Re: Slow Response for less volume

2018-10-24 Thread Deepak Goel
Are you getting errors in JMeter?

On Wed, 24 Oct 2018, 21:49 Amjad Khan,  wrote:

> Hi,
>
> We recently moved to Solr Cloud (Google) with 4 nodes and have a very
> limited amount of data.
>
> We are facing a very weird issue here: the Solr cluster's response time for
> queries is high when we have a low number of hits, but the moment we run our
> test to hit the Solr cluster hard, we see better responses, around 10 ms.
>
> Any clue will be appreciated.
>
> Thanks


Re: Slow Response for less volume

2018-10-24 Thread Walter Underwood
If your cache is 2048 entries, then every one of those 1600 queries is in cache.

Our logs typically have about a million lines, with distinct queries 
distributed according to the Zipf law. Some common queries, a long tail, that 
sort of thing.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 24, 2018, at 10:02 AM, Amjad Khan  wrote:
> 
> Thanks Wunder for this prompt response.
> 
> We are testing with 1,600 different search texts in JMeter, and the test
> keeps running continuously. Running continuously means the cache has been
> built, so there should be better responses now. Shouldn't there?
> 
> Thanks



Re: Slow Response for less volume

2018-10-24 Thread Walter Underwood
But a zero size cache doesn’t give realistic benchmarks. It makes things slower 
than they will be in production.

We do this:

1. Collect production logs.
2. Split the logs into a warming log and a benchmark log. The warming log 
should be at least as large as the query result cache.
3. Run the warming log with four threads (unlikely to overload the system).
4. Run the benchmark with a controlled requests/minute and enough threads to 
keep up with that. Might be a few hundred with a large, slow cluster. Run for 
at least an hour.
5. Analyze the results into percentile response times for each request handler. 
Warn about any errors or a benchmark that takes too long.

Then repeat. Oh, yeah, load the prod content first.
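
A minimal sketch of steps 3 and 4 in Python (the log file names, the
collection URL, and the thread counts are all placeholders, and the fixed
requests/minute pacing is omitted for brevity):

    import queue
    import threading
    import time
    import urllib.parse
    import urllib.request

    SOLR = "http://localhost:8983/solr/mycollection/select"  # placeholder URL

    def fire(q):
        """Send one query and return its latency in seconds."""
        url = SOLR + "?" + urllib.parse.urlencode({"q": q, "rows": 10})
        t0 = time.time()
        with urllib.request.urlopen(url) as resp:
            resp.read()
        return time.time() - t0

    def replay(path, threads):
        """Replay one query per line of a log file across a pool of workers."""
        work = queue.Queue()
        times = []
        lock = threading.Lock()

        def worker():
            while True:
                q = work.get()
                if q is None:  # sentinel: no more work
                    return
                t = fire(q)
                with lock:
                    times.append(t)

        pool = [threading.Thread(target=worker) for _ in range(threads)]
        for th in pool:
            th.start()
        with open(path) as f:
            for line in f:
                if line.strip():
                    work.put(line.strip())
        for _ in pool:
            work.put(None)
        for th in pool:
            th.join()
        return times

    if __name__ == "__main__":
        replay("warming.log", threads=4)             # step 3: fill the caches
        times = replay("benchmark.log", threads=50)  # step 4: the measured run
        times.sort()
        for pct in (50, 95, 99):
            ms = times[int(len(times) * pct / 100) - 1] * 1000
            print("p%d: %.1f ms" % (pct, ms))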

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 24, 2018, at 9:52 AM, Erick Erickson  wrote:
> 
> You can set your queryResultCache and filterCache "size" parameter to
> zero in solrconfig.xml to disable those caches.



Re: Slow Response for less volume

2018-10-24 Thread Amjad Khan
Thanks Erick,

But don't you think that disabling the cache will increase the response time 
instead of solving the problem here?


> On Oct 24, 2018, at 12:52 PM, Erick Erickson  wrote:
> 
> queryResultCache



Re: Slow Response for less volume

2018-10-24 Thread Amjad Khan
Thanks Wunder for this prompt response.

We are testing with 1,600 different search texts in JMeter, and the test keeps 
running continuously. Running continuously means the cache has been built, 
so there should be better responses now. Shouldn't there?

Thanks



> On Oct 24, 2018, at 12:20 PM, Walter Underwood  wrote:
> 
> Are you testing with a small number of queries? If your cache is larger than 
> the number of queries in your benchmark, the first round will load the cache, 
> then everything will be super fast.
> 
> Load testing a system with caches is hard.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)



Re: Slow Response for less volume

2018-10-24 Thread Erick Erickson
You can set your queryResultCache and filterCache "size" parameter to
zero in solrconfig.xml to disable those caches.
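
For reference, those entries live in the <query> section of solrconfig.xml;
a sketch (the class and other attributes shown are just the common defaults):

    <query>
      <!-- size="0" disables the cache entirely -->
      <filterCache class="solr.FastLRUCache" size="0" initialSize="0" autowarmCount="0"/>
      <queryResultCache class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>
    </query>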
On Wed, Oct 24, 2018 at 9:21 AM Walter Underwood  wrote:
>
> Are you testing with a small number of queries? If your cache is larger than 
> the number of queries in your benchmark, the first round will load the cache, 
> then everything will be super fast.
>
> Load testing a system with caches is hard.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)


Re: Slow Response for less volume

2018-10-24 Thread Walter Underwood
Are you testing with a small number of queries? If your cache is larger than 
the number of queries in your benchmark, the first round will load the cache, 
then everything will be super fast.

Load testing a system with caches is hard.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 24, 2018, at 9:19 AM, Amjad Khan  wrote:
> 
> Hi,
> 
> We recently moved to Solr Cloud (Google) with 4 nodes and have a very 
> limited amount of data.
> 
> We are facing a very weird issue here: the Solr cluster's response time for 
> queries is high when we have a low number of hits, but the moment we run our 
> test to hit the Solr cluster hard, we see better responses, around 10 ms.
> 
> Any clue will be appreciated.
> 
> Thanks



Re: slow response

2009-09-09 Thread Grant Ingersoll
Do you need 10K results at a time or are you just getting the top 10  
or so in a set of 10K?  Also, are you retrieving really large stored  
fields?  If you add debugQuery=true to your request, Solr will return  
timing information for the various components.
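
For example (hypothetical host and query; the per-component timings appear
in the debug section of the response):

    http://localhost:8983/solr/select?q=%22some+phrase%22&rows=10&debugQuery=true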



On Sep 9, 2009, at 10:10 AM, Elaine Li wrote:


Hi,

I have 20 million docs in Solr. If my query returns more than
10,000 results, the response time is very, very long. How can I
resolve this problem? Can I slice my docs into pieces and let the
query operate on one piece at a time, so the response time and
response data are more manageable? Thanks.

Elaine


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: slow response

2009-09-09 Thread Alex Baranov
There is a good article on how to scale the Lucene/Solr solution:

http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr

Also, if you have a heavy load on the server (a large number of concurrent
requests), then I'd suggest considering loading the index into RAM. It worked
well for me on a project with 140+ million documents and 30 concurrent
user requests per second. If your index can be placed in RAM, you can reduce
the architecture complexity.

Alex Baranov




Re: slow response

2009-09-09 Thread Constantijn Visinescu
Just wondering, is there an easy way to load the whole index into RAM?



Re: slow response

2009-09-09 Thread Alex Baranov
Please, take a look at

http://issues.apache.org/jira/browse/SOLR-1379
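
Assuming the patch from that issue, wiring it in would be a one-line change
in solrconfig.xml (a sketch; the index must fit in the heap, and a
RAM-resident index is not persisted across restarts):

    <directoryFactory name="DirectoryFactory" class="solr.RAMDirectoryFactory"/>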

Alex.



Re: slow response

2009-09-09 Thread gwk

Hi Elaine,

I think you need to provide us with some more information on what 
exactly you are trying to achieve. From your question I first assumed you 
wanted paging (getting the first 10 results, then the next 10, etc.). But 
rereading it ("slice my docs into pieces"), I now think you might've 
meant that you only want to retrieve certain fields from each document. 
For that you can use the fl parameter 
(http://wiki.apache.org/solr/CommonQueryParameters#head-db2785986af2355759faaaca53dc8fd0b012d1ab). 
Hope this helps.


Regards,

gwk

Elaine Li wrote:

I want to get the 10K results, not just the top 10.
The fields are regular language sentences, they are not large.

Is clustering the technique for what I am doing?



Re: slow response

2009-09-09 Thread Elaine Li
gwk,

Sorry for the confusion. I am doing simple phrase searches among the
sentences, which could be in English or another language. Each doc has
only several ID numbers and the sentence itself.

I did not know about paging. It sounds like what I need. How do I
achieve paging in Solr?

I also need to store all the results in my own tables in JavaScript
to use for connecting with other applications.

Elaine




Re: slow response

2009-09-09 Thread gwk

Hi Elaine,

You can page your result set with the rows and start parameters 
(http://wiki.apache.org/solr/CommonQueryParameters). So, for example, to 
get the first 100 results one would use the parameters rows=100&start=0, 
and the second 100 results rows=100&start=100, etc.
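
For example, with a hypothetical host and query:

    http://localhost:8983/solr/select?q=%22some+phrase%22&rows=100&start=0     (results 1-100)
    http://localhost:8983/solr/select?q=%22some+phrase%22&rows=100&start=100   (results 101-200)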


Regards,

gwk



Re: slow response

2009-09-09 Thread Elaine Li
gwk, thanks a lot.

Elaine



Re: Slow response times using *:*

2008-01-31 Thread Yonik Seeley
On Jan 31, 2008 10:43 AM, Andy Blower [EMAIL PROTECTED] wrote:

 I'm evaluating SOLR/Lucene for our needs and currently looking at performance,
 since 99% of the functionality we're looking for is provided. The index
 contains 18.4 million records and is 58 GB in size. Most queries are
 acceptably quick, once the filters are cached. The filters select one or
 more of three subsets of the data and then intersect with around 15 other
 subsets of data depending on a user subscription.

 We're returning facets on several fields, and sometimes a blank (q=*:*)
 query is run purely to get the facets for all of the data that the user can
 access. This information is turned into browse information and can be
 different for each user.

 Running performance tests using JMeter sequentially with a single user,
 these blank queries are slower than the normal queries, but still in the
 1-4 sec range. Unfortunately, if I increase the number of test threads so that
 more than one of the blank queries is submitted while one is already being
 processed, everything grinds to a halt and the responses to these blank
 queries can take up to 125 secs to be returned!

*:* maps to MatchAllDocsQuery, which for each document needs to check
if it's deleted (that's a synchronized call, and can be a bottleneck).

A cheap workaround is that if you know of a term that is in every
document, (or a field in every document that has very few terms), then
substitute a query on that for *:*
Substituting one of your filters as the base query might also work.
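
For example, if every document has a status field that is always live
(hypothetical field names and values), keep the filters and swap only the
base query:

    q=*:*&fq=subscription:abc   becomes   q=status:live&fq=subscription:abc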

 This surprises me because the filter query submitted has usually already
 been submitted along with a normal query, and so should be cached in the
 filter cache. Surely all solr needs to do is return a handful of fields for
 the first 100 records in the list from the cache - or so I thought.

To calculate the DocSet (the set of all documents matching *:* and
your filters), Solr can just use its caches, as long as *:* and the
filters have been used before.

*But*, to retrieve the top 10 documents matching *:* and your filters,
the query must be re-run.  That is probably where the time is being
spent.  Since you aren't looking for relevancy scores at all, but just
faceting, it seems like we could potentially optimize this in Solr.

In the future, we could also do some query optimization by sometimes
combining filters with the base query.

-Yonik


Re: Slow response times using *:*

2008-01-31 Thread Shalin Shekhar Mangar
I can't give you a definitive answer based on the data you've provided.
However, do you really need to get *all* facets? Can't you limit them with
the facet.limit parameter? Are you planning to run multiple *:* queries with
all facets turned on, on a 58 GB index, in a live system? I don't think that's
a good idea.

As for the 125 seconds, I think it is probably because of paging issues. Are
you faceting on multivalued or tokenized fields? In that case, Solr uses
field queries, which consume a lot of memory if the number of unique terms
is large.


-- 
Regards,
Shalin Shekhar Mangar.


Re: Slow response times using *:*

2008-01-31 Thread Andy Blower

Actually I do need all facets for a field, although I've just realised that
the tests are limited to only 100. Oops. So it should be worse in
reality... erk.

Since that's what we do with our current search engine, Solr has to be able
to compete with this. The fields are a mix of non-multivalued, non-tokenized
fields and others which are. I've yet to experiment with this.

Thanks.





Re: Slow response times using *:*

2008-01-31 Thread Walter Underwood
How often does the index change? Can you use an HTTP cache and do this
once for each new index?

wunder




Re: Slow response times using *:*

2008-01-31 Thread Andy Blower


Yonik Seeley wrote:
 
 *:* maps to MatchAllDocsQuery, which for each document needs to check
 if it's deleted (that's a synchronized call, and can be a bottleneck).
 

Why does this need to check if documents are deleted if normal queries
don't? Is there any way of disabling this, since I can be sure this isn't
the case after indexing and optimizing?


Yonik Seeley wrote:
 
 A cheap workaround is that if you know of a term that is in every
 document, (or a field in every document that has very few terms), then
 substitute a query on that for *:*
 Substituting one of your filters as the base query might also work.
 

Would duplicating one of my filters cause any issues? That would be easy.
Otherwise I'll try the substitution and see if it helps much.


Yonik Seeley wrote:
 
 This surprises me because the filter query submitted has usually already
 been submitted along with a normal query, and so should be cached in the
 filter cache. Surely all solr needs to do is return a handful of fields
 for
 the first 100 records in the list from the cache - or so I thought.
 
 To calculate the DocSet (the set of all documents matching *:* and
 your filters), Solr can just use it's caches as long as *:* and the
 filters have been used before.
 
 *But*, to retrieve the top 10 documents matching *:* and your filters,
 the query must be re-run.  That is probably where the time is being
 spent.  Since you aren't looking for relevancy scores at all, but just
 faceting, it seems like we could potentially optimize this in Solr.
 

I'm actually retrieving the first 100 in my tests, which will be necessary
in one of the two scenarios we use blank queries for. The other scenario
doesn't require any docs at all - just the facets, and I've not put that in
my tests. What would the situation be if I specified a sort order for the
facets and/or retrieved no docs at all? I'd be sorting the facets
alphabetically, which is currently done by my app rather than the search
engine. (since I sometimes have to merge facets from more than one field)

I had assumed that no doc would be considered more relevant than any other
without any query terms - i.e. filter query terms wouldn't affect relevance.
This seems sensible to me, but maybe that's only because our current search
engine works that way. 

Regarding optimization, I certainly think that being able to access all
facets for subsets of the indexed data (defined by the filter query) is an
incredibly useful feature. My search engine usage may not be very common
though. What it means to us is that we can drive all aspects of our sites
from the search engine, not just the obvious search forms.


Yonik Seeley wrote:
 
 In the future, we could also do some query optimization by sometimes
 combining filters with the base query.
 
 -Yonik
 
 

Sorry, that flew over my head...

Thanks very much for your help. I wish I had more time during this
evaluation to delve into the code. I don't suppose there's a document with a
guided tour of the codebase anywhere, is there? ;-)


P.S. I re-ran my tests without returning facets whilst writing this and
didn't get the slowdowns with 4 or 10 threads. Does this help?





Re: Slow response times using *:*

2008-01-31 Thread Mike Klaas

On 31-Jan-08, at 9:41 AM, Andy Blower wrote:

> I'm actually retrieving the first 100 in my tests, which will be necessary
> in one of the two scenarios we use blank queries for. The other scenario
> doesn't require any docs at all - just the facets, and I've not put that in
> my tests. What would the situation be if I specified a sort order for the
> facets and/or retrieved no docs at all? I'd be sorting the facets
> alphabetically, which is currently done by my app rather than the search
> engine. (since I sometimes have to merge facets from more than one field)

First question: what is the use of retrieving 100 documents if there
is no defined sort order?

The situation could be optimized in Solr, but there is a related case
that _is_ optimized and should be almost as fast.  If you

a) don't ask for the document score in the field list (fl)
b) enable useFilterForSortedQuery in solrconfig.xml
c) specify _some_ sort order other than score

then Solr will do cached bitset intersections only.  It will also do
sorting, but that may not be terribly expensive.  If it is close to
the desired performance, it would be relatively easy to patch Solr to
skip that step.

(Note: this is query sort, not facet sort.)
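
A sketch of what (a)-(c) look like together (hypothetical host, field names,
and filter; the element goes in the <query> section of solrconfig.xml):

    <useFilterForSortedQuery>true</useFilterForSortedQuery>

    http://localhost:8983/solr/select?q=*:*&fq=subscription:abc&sort=id+asc&fl=id&rows=100&facet=true&facet.field=subject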

> I had assumed that no doc would be considered more relevant than any other
> without any query terms - i.e. filter query terms wouldn't affect relevance.
> This seems sensible to me, but maybe that's only because our current search
> engine works that way.

It won't, but it will still try to calculate the score if you ask it
to (all docs will score the same, though).

> Regarding optimization, I certainly think that being able to access all
> facets for subsets of the indexed data (defined by the filter query) is an
> incredibly useful feature. My search engine usage may not be very common
> though. What it means to us is that we can drive all aspects of our sites
> from the search engine, not just the obvious search forms.

I also use this feature.  It would be useful to optimize the case
where rows=0.

-Mike


Re: Slow response

2007-09-14 Thread Tom Hill
Hi Mike,

Thanks for clarifying what has been a bit of a black box to me.

A couple of questions, to increase my understanding, if you don't mind.

If I am only using fields with multiValued=false, with a type of string
or integer  (untokenized), does solr automatically use approach 2? Or is
this something I have to actively configure?

And is approach 2 better than 1? Or vice versa? Or is the answer "it
depends"? :-)

If, as I suspect, the answer is "it depends", are there any general
guidelines on when to use one approach or the other?

Thanks,

Tom


Re: Slow response

2007-09-14 Thread Mike Klaas

On 14-Sep-07, at 3:38 PM, Tom Hill wrote:

> If I am only using fields with multiValued=false, with a type of string
> or integer (untokenized), does Solr automatically use approach 2? Or is
> this something I have to actively configure?

It'll happen automatically.

> And is approach 2 better than 1? Or vice versa? Or is the answer "it
> depends"? :-)

It depends :)

> If, as I suspect, the answer is "it depends", are there any general
> guidelines on when to use one approach or the other?

Yeah, it usually depends on how many unique facet values there are,
how many documents are returned in the query, and how much memory you
have.  1 is usually faster when there are few terms; 2 is usually
faster when there are many terms.

Things can be further complicated by additional parameters, like
facet.enum.cache.minDf (http://wiki.apache.org/solr/SimpleFacetParameters#head-3ea6fc5d1056447295c38c9675e35ce06fd95f97)

-Mike


RE: Slow response

2007-09-06 Thread Aaron Hammond
Thank you for your response; this does shed some light on the subject.
Our basic question was why we were seeing slower responses the smaller
our result set got.

Currently we are searching about 1.2 million documents, with each source
document about 2 KB, but we do duplicate some of the data. I bumped up my
filterCache to 5 million, and the 2nd search I did for a non-indexed
term came back in 2.1 seconds, so that is much improved. I am a little
concerned about having this value so high, but this is our problem and we
will play with it.

I do have a few follow-up questions. First, in regard to the
filterCache: once a single search has been done and facets requested, as
long as new facets aren't requested and the cache size is large enough,
the filters will remain in the cache, correct?

Also, you mention that faceting is more a function of the number of
terms in the field. The 2 fields causing our problems are
Authors and Subjects. If we divided up the data that made these facets
into more specific fields (Primary author, Secondary author, etc.), would
this perform better? The number of facet fields would increase, but
the unique terms for a given facet would be fewer.

Thanks again for all your help.

Aaron


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Thursday, September 06, 2007 4:17 PM
To: solr-user@lucene.apache.org
Subject: Re: Slow response

On 9/6/07, Aaron Hammond [EMAIL PROTECTED] wrote:
 I am pretty new to Solr and this is my first post to this list so
please
 forgive me if I make any glaring errors.

 Here's my problem. When I do a search using the Solr admin interface
for
 a term that I know does not exist in my index the QTime is about 1ms.
 However, if I add facets to the search the response takes more than 20
 seconds (and sometimes longer) to return. Here is the slow URL -

Faceting on multi-value fields is more a function of the number of
terms in the field (and their distribution) rather than the number of
hits for a query.  That said, perhaps faceting should be able to bail
out if there are no hits.

Is your question more about why faceting takes so long in general, or
why it takes so long if there are no results?  If you haven't, try
optimizing your index for faceting in general.  How many docs do
you have in your index?

As a side note, the way multi-valued faceting currently works, it's
actually normally faster if the query returns a large number of hits.

-Yonik


Re: Slow response

2007-09-06 Thread Mike Klaas

On 6-Sep-07, at 3:16 PM, Aaron Hammond wrote:

> Thank you for your response; this does shed some light on the subject.
> Our basic question was why we were seeing slower responses the smaller
> our result set got.
>
> I do have a few follow-up questions. First, in regard to the
> filterCache: once a single search has been done and facets requested, as
> long as new facets aren't requested and the cache size is large enough,
> the filters will remain in the cache, correct?
>
> Also, you mention that faceting is more a function of the number of
> terms in the field. The 2 fields causing our problems are
> Authors and Subjects. If we divided up the data that made these facets
> into more specific fields (Primary author, Secondary author, etc.), would
> this perform better?

There are essentially two facet computation strategies:

1. cached bitsets: a bitset for each term is generated and
intersected with the query result bitset.  This is more general and
performs well up to a few thousand terms.

2. field enumeration: cache the field contents, and generate counts
using this data.  Relatively independent of the number of unique terms, but
requires at most a single facet value per field per document.

So, if you factor author into Primary author/Secondary author, where
each is guaranteed to only have one value per doc, this could greatly
accelerate your faceting.  There are probably fewer unique subjects,
so strategy 1 is likely fine.

To use strategy 2, just make sure that multiValued="false" is set for
those fields in schema.xml
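
For example (hypothetical field names; the string type keeps the values
untokenized):

    <field name="primary_author" type="string" indexed="true" stored="true" multiValued="false"/>
    <field name="secondary_author" type="string" indexed="true" stored="true" multiValued="false"/>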


-Mike


Re: Slow response

2007-09-06 Thread Mike Klaas


On 6-Sep-07, at 3:25 PM, Mike Klaas wrote:

> To use strategy 2, just make sure that multiValued="false" is set
> for those fields in schema.xml

I forgot to mention that strategy 2 also requires a single token for
each doc (see http://wiki.apache.org/solr/FAQ#head-14f9f2d84fb2cd1ff389f97f19acdb6ca55e4cd3)


-Mike