Re: Partial Counts in SOLR

2014-03-19 Thread Salman Akram

Anyone?

On Mon, Mar 17, 2014 at 12:03 PM, Salman Akram
salman.ak...@northbaysolutions.net wrote:

Below is one of the sample slow queries; it takes minutes!

((stock or share*) w/10 (sale or sell* or sold or bought or buy* or
purchase* or repurchase*)) w/10 (executive or director)

If a filter is used it goes in fq, but what can be done about a plain
keyword search?

On Sun, Mar 16, 2014 at 4:37 AM, Erick Erickson
erickerick...@gmail.com wrote:

What are your complex queries? You say that your app will very rarely
see the same query, thus you aren't using caches... But if you can move
some of your clauses to fq clauses, then the filterCache might well be
used to good effect.
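
A minimal sketch of the fq suggestion above, assuming the restrictive clauses can be separated from the keyword part of the query. The field names (year, doc_type) are hypothetical and not from this thread; each fq value is sent as its own parameter so that Solr can cache it independently of the keyword query.

```python
from urllib.parse import urlencode, parse_qs

def build_params(keywords, filters):
    """Put the keyword part in q and each restrictive clause in its own fq."""
    params = [("q", keywords)]
    # One fq per clause, so each clause can be cached and reused separately.
    params += [("fq", clause) for clause in filters]
    return urlencode(params)

# Hypothetical field names, used only for illustration.
qs = build_params("stock sale", ["year:2014", "doc_type:filing"])
decoded = parse_qs(qs)
print(decoded["q"], decoded["fq"])
```

This only helps when the same filter clauses recur across requests; a query that is entirely free-text, as described in this thread, gains nothing from it.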

Re: Partial Counts in SOLR

2014-03-19 Thread Erick Erickson

Yes, that'll be slow. Wildcards are, at best, interesting and at worst
resource-intensive, especially when you're also using this kind of
positional information.

Consider looking at the problem sideways. That is, what is your
purpose in searching for, say, buy*? You want to find buy, buying,
buyers, etc.? Would you get better results if you just stemmed and
omitted the wildcards?

Do you have a restricted vocabulary that would allow you to define
synonyms for the important words and all their variants at index
time and use that?

Finally, of course, you could shard your index (or add more shards if
you're already sharding) if you really _must_ support these kinds of
queries and can't work around the problem.

Best,
Erick
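
A toy sketch of the trade-off described above, under stated assumptions: the term list stands in for the index vocabulary and the variant map stands in for a stemmer or an index-time synonym list. Neither is Solr code; both are hypothetical and only illustrate why a trailing wildcard touches many terms while a normalized token touches one.

```python
# Hypothetical vocabulary standing in for the indexed terms.
TERMS = ["buy", "buyer", "buyers", "buying", "buyout",
         "sell", "seller", "selling", "sold"]

def expand_prefix(prefix, terms):
    """Terms that a trailing-wildcard query such as buy* has to visit."""
    return [t for t in terms if t.startswith(prefix)]

# Hand-written variant map: a stand-in for stemming or index-time synonyms.
CANONICAL = {"buyer": "buy", "buyers": "buy", "buying": "buy",
             "seller": "sell", "selling": "sell", "sold": "sell"}

def normalize(token):
    """Map a token to its canonical form, as would happen at index time."""
    token = token.lower()
    return CANONICAL.get(token, token)

wildcard_terms = expand_prefix("buy", TERMS)
indexed = [normalize(t) for t in "Directors were buying shares".split()]
print(len(wildcard_terms), indexed)
```

With normalization done at index time, the query can use the single term buy inside the proximity clause instead of expanding buy* at search time. Note that the wildcard also matches buyout, which the variant map deliberately leaves alone.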

Re: Partial Counts in SOLR

2014-03-19 Thread Salman Akram

This was one example. Users can even add phrase searches with
wildcards/proximity etc., so we can't really use stemming.

Sharding is definitely something we are already looking into.

Re: Partial Counts in SOLR

2014-03-17 Thread Salman Akram

Below is one of the sample slow queries; it takes minutes!

((stock or share*) w/10 (sale or sell* or sold or bought or buy* or
purchase* or repurchase*)) w/10 (executive or director)

If a filter is used it goes in fq, but what can be done about a plain
keyword search?

Re: Partial Counts in SOLR

2014-03-15 Thread Erick Erickson

On Thu, Mar 13, 2014 at 7:22 AM, Salman Akram
salman.ak...@northbaysolutions.net wrote:
1- SOLR 4.6
2- We do but right now I am talking about plain keyword queries just sorted
by date. Once this is better will start looking into caches which we
already changed a little.
3- As I said the contents are not stored in this index. Some other metadata
fields are but with normal queries its super fast so I guess even if I
change there it will be a minor difference. We have SSD and quite fast too.
4- That's something we need to do but even in low workload those queries
take a lot of time
5- Every 10 mins and currently no auto warming as user queries are rarely
same and also once its fully warmed those queries are still slow.
6- Nops.

Re: Partial Counts in SOLR

2014-03-13 Thread Salman Akram

Well, some of the searches take minutes.

Below are some stats about the particular index that I am talking about:

Index size = 400GB (using CommonGrams; without them the index is around
180GB)
Position file = 280GB
Total docs = 170 million (indexed for searching only; for highlighting, the
contents are stored in another index)
Avg doc size = a few hundred KB
RAM = 384GB (the machine hosts other indexes too, but the OS cache can still
hold 60-80% of the total index)

Phrase queries run pretty fast with CG, but complex versions of wildcard and
proximity queries can be really slow. I know using CG will make them slow,
but they just take too long. By default sorting is on date, but users have
a few other parameters on which they can sort.

I wanted to avoid creating multiple indexes (maybe based on years), but it
seems that's the only feasible way to search on partial data.
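
A rough sketch of the per-year idea mentioned above, assuming one index per year and assuming each index returns its matches newest first. The function and the in-memory "indexes" are hypothetical stand-ins; in practice each lookup would be a separate Solr request.

```python
def search_by_year(indexes_by_year, matches, wanted):
    """Query the newest year first and stop once enough results are found."""
    results, searched = [], []
    for year in sorted(indexes_by_year, reverse=True):
        searched.append(year)
        # Stand-in for sending the query to that year's index.
        results.extend(doc for doc in indexes_by_year[year] if matches(doc))
        if len(results) >= wanted:
            break
    return results[:wanted], searched

# Hypothetical data: each list is already ordered newest first.
indexes = {
    2012: ["stock a", "bond b"],
    2013: ["stock c"],
    2014: ["stock d", "stock e"],
}

def has_stock(doc):
    return "stock" in doc

first_page, years_searched = search_by_year(indexes, has_stock, 2)
print(first_page, years_searched)
```

The cost is that the total match count across all years is no longer known, which is the trade-off the thread is about.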





Re: Partial Counts in SOLR

2014-03-13 Thread Dmitry Kan

1. What is your Solr version? In the 4.x family, proximity searches have
been optimized, among other query types.
2. Do you use filter queries? What do the cache utilization ratios look
like? Tune the caches (i.e. bump up the respective cache sizes) if you
have low hit ratios and many evictions.
3. Can you avoid storing some fields and only index them? When a field is
stored and retrieved in the result, there are a couple of disk seeks
per field, so the search slows down. Consider SSDs.
4. Do you monitor your system in terms of RAM / cache stats / GC? Do you
observe STW GC pauses?
5. How often do you commit? Do you have autowarming / external warming
configured?
6. If you use faceting, consider storing DocValues for facet fields.

Some Solr wiki docs:
https://wiki.apache.org/solr/SolrPerformanceProblems?highlight=%28%28SolrPerformanceFactors%29%29
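
Point 2 above can be turned into a quick check. This is a minimal sketch, assuming you have read the lookups, hits and evictions counters for a cache from the Solr admin statistics; the 0.5 threshold is an arbitrary assumption, not a Solr recommendation.

```python
def cache_health(lookups, hits, evictions, min_hit_ratio=0.5):
    """Return (hit ratio, whether the cache looks undersized).

    min_hit_ratio is an assumed cutoff chosen for illustration only.
    """
    hit_ratio = hits / lookups if lookups else 0.0
    # Low hit ratio together with evictions suggests entries are being
    # pushed out before they can be reused.
    needs_tuning = lookups > 0 and hit_ratio < min_hit_ratio and evictions > 0
    return hit_ratio, needs_tuning

print(cache_health(1000, 200, 300))
```

A low hit ratio with no evictions points elsewhere: the queries simply do not repeat, which matches what is reported later in this thread.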

Re: Partial Counts in SOLR

2014-03-13 Thread Salman Akram

1- SOLR 4.6
2- We do, but right now I am talking about plain keyword queries just sorted
by date. Once this is better we will start looking into caches, which we
have already changed a little.
3- As I said, the contents are not stored in this index. Some other metadata
fields are, but normal queries are super fast, so I guess changing that
would make only a minor difference. We have SSDs, and quite fast ones too.
4- That's something we need to do, but even under low workload those queries
take a lot of time.
5- Every 10 mins, and currently no autowarming, as user queries are rarely
the same; also, once it's fully warmed those queries are still slow.
6- Nope.

Re: Partial Counts in SOLR

2014-03-12 Thread Dmitry Kan

As Hoss pointed out above, different projects have different requirements.
Some want to sort by date of ingestion in reverse, which means that having
posting lists organized in reverse order with early termination is
the way to go (there is no such feature in Solr directly). Other projects want
to collect all docs matching a query and then sort by rank, but you cannot
guarantee that the most recently inserted document is the most relevant in
terms of your ranking.
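
A toy sketch of the two cases above, written in plain Python rather than against Solr or Lucene. It assumes documents can be visited newest first for the early-termination case; all names and the data are hypothetical.

```python
import heapq

def newest_k_early_stop(docs_newest_first, matches, k):
    """Docs arrive newest first, so the first k matches are the answer."""
    hits, scanned = [], 0
    for doc in docs_newest_first:
        scanned += 1
        if matches(doc):
            hits.append(doc)
            if len(hits) == k:
                break  # early termination: nothing older can rank higher
    return hits, scanned

def newest_k_full_scan(docs_any_order, matches, k):
    """Without a useful visiting order, every doc has to be examined."""
    matched = [d for d in docs_any_order if matches(d)]
    return heapq.nlargest(k, matched, key=lambda d: d["date"]), len(docs_any_order)

def has_stock(doc):
    return doc["text"] == "stock"

# Hypothetical corpus: date equals id, every even doc matches.
docs = [{"id": i, "date": i, "text": "stock" if i % 2 == 0 else "bond"}
        for i in range(1000)]
newest_first = sorted(docs, key=lambda d: d["date"], reverse=True)

early, early_scanned = newest_k_early_stop(newest_first, has_stock, 3)
full, full_scanned = newest_k_full_scan(docs, has_stock, 3)
print([d["id"] for d in early], early_scanned, full_scanned)
```

Both functions return the same three documents, but the first one stops after a handful of docs. The shortcut is only valid because the visiting order matches the sort order; it does not apply when sorting by relevance score.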


Do your current searches take too long?






-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan

Re: Partial Counts in SOLR

2014-03-11 Thread Salman Akram

It's a long video and I will definitely go through it, but it seems this is
not possible with SOLR as it is?

I just thought it would be quite a common issue; I mean, generally for
search engines it's more important to show the first page of results, rather
than using timeAllowed, which might not even return a single result.

Thanks!


-- 
Regards,

Salman Akram

Re: Partial Counts in SOLR

2014-03-10 Thread Dmitry Kan

Salman,

It looks like what you describe has been implemented at Twitter.

Presentation from the recent Lucene / Solr Revolution conference in Dublin:
http://www.youtube.com/watch?v=AguWva8P_DI






-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan

Re: Partial Counts in SOLR

2014-03-08 Thread Salman Akram

The issue with timeAllowed is that you never know whether it will return a
minimum number of docs or not.

I do want docs to be sorted by date, but it seems it's not possible
for Solr to start searching from recent docs and stop after finding a certain
number of docs... any other tweak?

Thanks


On Saturday, March 8, 2014, Chris Hostetter hossman_luc...@fucit.org
wrote:


 : Reason: In an index with millions of documents I don't want to know that a
 : certain query matched 1 million docs (of course it will take time to
 : calculate that). Why not just stop looking for more results, let's say
 : after it finds 100 docs? Possible??

 but if you care about sorting, ie: you want the top 100 documents sorted
 by score, or sorted by date, you still have to collect all 1 million
 matches in order to know what the first 100 are.

 if you really don't care about sorting, you can use the timeAllowed
 option to tell the searching method to do the best job it can in an
 (approximated) limited amount of time, and then pretend that the docs
 collected so far represent the total number of matches...


 https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThetimeAllowedParameter


 -Hoss
 http://www.lucidworks.com/



-- 
Regards,

Salman Akram
Project Manager - Intelligize
NorthBay Solutions
410-G4 Johar Town, Lahore
Off: +92-42-35290152

Cell: +92-302-8495621

Re: Partial Counts in SOLR

2014-03-07 Thread Gora Mohanty

On 7 March 2014 15:18, Salman Akram salman.ak...@northbaysolutions.net wrote:
 All,

 Is it possible to get partial counts in SOLR? The idea is to get the count,
 but if it's above a certain limit then just return that limit.

 Reason: In an index with millions of documents I don't want to know that a
 certain query matched 1 million docs (of course it will take time to
 calculate that). Why not just stop looking for more results, let's say
 after it finds 100 docs? Possible??

 e.g. Something similar that we can do in MySQL:

 SELECT COUNT(*) FROM ( (SELECT * FROM table where 1 = 1) LIMIT 100) Alias

The response to the /select Solr URL has a numFound attribute that is the
number of matches.

Regards,
Gora

Re: Partial Counts in SOLR

2014-03-07 Thread Dmitry Kan

You limit the number of results by using the rows parameter. Your query,
however, may hit more documents (reported in numFound of the response) than
will be returned to you, as rows prescribes.
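
To make the rows / numFound distinction concrete, here is a minimal Python
sketch. The host, collection name and sample response are made up for
illustration; only the q, rows and wt parameters and the general shape of the
JSON response come from Solr itself.

```python
import json
from urllib.parse import urlencode

# Hypothetical host and collection name, used only for illustration.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def build_select_url(query, rows):
    """Build a /select URL that asks Solr to return at most `rows` documents."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return SOLR_SELECT + "?" + urlencode(params)


# Made-up response in the shape of Solr's JSON output: numFound counts every
# match, while docs holds only the page limited by rows.
SAMPLE_RESPONSE = """
{"responseHeader": {"status": 0, "QTime": 5},
 "response": {"numFound": 1000000, "start": 0,
              "docs": [{"id": "1"}, {"id": "2"}]}}
"""


def summarize(raw_json):
    """Return (total matches, number of documents actually returned)."""
    body = json.loads(raw_json)["response"]
    return body["numFound"], len(body["docs"])


if __name__ == "__main__":
    print(build_select_url("stock", 2))
    print(summarize(SAMPLE_RESPONSE))
```

Note that rows only trims what is sent back; numFound is still the full match
count, which is the cost being asked about in this thread.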


On Fri, Mar 7, 2014 at 11:48 AM, Salman Akram 
salman.ak...@northbaysolutions.net wrote:

 All,

 Is it possible to get partial counts in SOLR? The idea is to get the count,
 but if it's above a certain limit then just return that limit.

 Reason: In an index with millions of documents I don't want to know that a
 certain query matched 1 million docs (of course it will take time to
 calculate that). Why not just stop looking for more results, let's say
 after it finds 100 docs? Possible??

 e.g. Something similar that we can do in MySQL:

 SELECT COUNT(*) FROM ( (SELECT * FROM table where 1 = 1) LIMIT 100) Alias


 --
 Regards,

 Salman Akram




-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan

Re: Partial Counts in SOLR

2014-03-07 Thread Salman Akram

I know about numFound. That's where the issue is.

On a complex query that takes minutes, I think a major chunk of that time
is spent calculating numFound, which I don't need. Let's say I just need
the first 100 docs and then want SOLR to STOP looking further to populate
numFound.

Let's say I just don't want SOLR to return numFound to me. Is that possible?
Also, would it really help performance?

In MySQL you can simply stop it from looking beyond a certain count when
computing the total, and that gives a considerable improvement for complex
queries, but that's not an inverted index so I'm not sure how it works in
SOLR...


On Fri, Mar 7, 2014 at 3:17 PM, Gora Mohanty g...@mimirtech.com wrote:

 On 7 March 2014 15:18, Salman Akram salman.ak...@northbaysolutions.net
 wrote:
  All,
 
  Is it possible to get partial counts in SOLR? The idea is to get the count,
  but if it's above a certain limit then just return that limit.
 
  Reason: In an index with millions of documents I don't want to know that a
  certain query matched 1 million docs (of course it will take time to
  calculate that). Why not just stop looking for more results, let's say
  after it finds 100 docs? Possible??
 
  e.g. Something similar that we can do in MySQL:
 
  SELECT COUNT(*) FROM ( (SELECT * FROM table where 1 = 1) LIMIT 100) Alias

 The response to the /select Solr URL has a numFound attribute that is the
 number of matches.

 Regards,
 Gora




-- 
Regards,

Salman Akram

Re: Partial Counts in SOLR

2014-03-07 Thread Chris Hostetter


: Reason: In an index with millions of documents I don't want to know that a
: certain query matched 1 million docs (of course it will take time to
: calculate that). Why not just stop looking for more results, let's say
: after it finds 100 docs? Possible??

but if you care about sorting, i.e. you want the top 100 documents sorted
by score, or sorted by date, you still have to collect all 1 million
matches in order to know what the first 100 are.
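
A toy Python model of that point (this is not Lucene's collector code, and
the (date, doc_id) tuples are an assumption made for the example): matches
arrive in an order unrelated to the sort key, so a bounded heap keeps memory
small but every match still has to be examined.

```python
import heapq


def newest_k(matches, k):
    """Return the k newest (date, doc_id) pairs and how many matches were examined.

    `matches` may arrive in any order (for example index order), so the
    newest document could be the very last one seen.
    """
    examined = 0

    def counted():
        nonlocal examined
        for match in matches:
            examined += 1
            yield match

    # heapq.nlargest keeps only k items in memory, but it still consumes
    # the whole input before it can return a result.
    top = heapq.nlargest(k, counted())
    return top, examined


if __name__ == "__main__":
    sample = [(d, "doc%d" % d) for d in (3, 9, 1, 7, 5)]
    print(newest_k(sample, 2))
```

Stopping after the first k matches would return some k documents, just not
necessarily the newest ones.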

if you really don't care about sorting, you can use the timeAllowed
option to tell the searching method to do the best job it can in an
(approximately) limited amount of time, and then pretend that the docs
collected so far represent the total number of matches...

https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThetimeAllowedParameter
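
A minimal Python sketch of using timeAllowed from a client, assuming the JSON
response format. The sample responses and the 2000 ms budget are invented for
the example; timeAllowed (in milliseconds) and the partialResults flag in the
response header are the Solr pieces it relies on.

```python
import json
from urllib.parse import urlencode


def build_params(query, rows, time_allowed_ms):
    """Encode query parameters, including a time budget in milliseconds."""
    return urlencode({
        "q": query,
        "rows": rows,
        "timeAllowed": time_allowed_ms,
        "wt": "json",
    })


def is_partial(raw_json):
    """True if Solr flagged the response as cut short by timeAllowed."""
    header = json.loads(raw_json).get("responseHeader", {})
    return bool(header.get("partialResults", False))


# Made-up responses: one cut short by the time budget, one complete.
PARTIAL = '{"responseHeader": {"status": 0, "partialResults": true}, "response": {"numFound": 4200, "start": 0, "docs": []}}'
COMPLETE = '{"responseHeader": {"status": 0}, "response": {"numFound": 17, "start": 0, "docs": []}}'

if __name__ == "__main__":
    print(build_params("stock", 100, 2000))
    print(is_partial(PARTIAL), is_partial(COMPLETE))
```

When the flag is set, numFound covers only the documents collected before the
time ran out, so a client should treat it as a lower bound rather than the
true total.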


-Hoss
http://www.lucidworks.com/
