Re: Slow Response for less volume
Are you getting errors in JMeter?

> On Wed, 24 Oct 2018, 21:49 Amjad Khan wrote:
>
> Hi,
>
> We recently moved to Solr Cloud (Google) with 4 nodes and have a very
> limited amount of data.
>
> We are facing a very weird issue here: the cluster's query response time
> is high when we have a low number of hits, but the moment we run our test
> to hit the Solr cluster hard, we see better responses of around 10 ms.
>
> Any clue will be appreciated.
>
> Thanks
Re: Slow Response for less volume
If your cache holds 2048 entries, then every one of those 1600 queries is in
the cache. Our logs typically have about a million lines, with distinct
queries distributed according to Zipf's law: some common queries, a long
tail, that sort of thing.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
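For readers who want to reproduce this kind of query distribution when building a benchmark, here is a minimal sketch. The vocabulary size, exponent, and sample count are illustrative, not taken from Walter's logs:

```python
import random

def zipf_queries(vocab, n, s=1.1, seed=42):
    """Sample n queries from `vocab` with Zipf-like frequencies:
    the k-th most popular query gets weight 1 / k**s."""
    rng = random.Random(seed)
    weights = [1.0 / (k ** s) for k in range(1, len(vocab) + 1)]
    return rng.choices(vocab, weights=weights, k=n)

# A million log lines over a 10,000-term vocabulary: a few head
# queries dominate, and most of the vocabulary forms the long tail.
queries = zipf_queries([f"term{i}" for i in range(10_000)], 1_000_000)
```

Replaying a log shaped like this, instead of a small fixed query set, keeps the cache hit rate realistic during load tests.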
Re: Slow Response for less volume
But a zero-size cache doesn’t give realistic benchmarks. It makes things
slower than they will be in production. We do this:

1. Collect production logs.
2. Split the logs into a warming log and a benchmark log. The warming log
   should be at least as large as the query result cache.
3. Run the warming log with four threads (unlikely to overload the system).
4. Run the benchmark at a controlled requests/minute rate with enough
   threads to keep up with it. That might be a few hundred threads for a
   large, slow cluster. Run for at least an hour.
5. Analyze the results into percentile response times for each request
   handler. Warn about any errors or a benchmark that takes too long.

Then repeat. Oh, yeah: load the production content first.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
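Step 5 of this procedure might be sketched as follows, using nearest-rank percentiles. The sample latencies are made up:

```python
def percentiles(latencies_ms, points=(50, 90, 95, 99)):
    """Summarize a benchmark run as percentile response times
    (step 5 of the procedure described above)."""
    ordered = sorted(latencies_ms)
    out = {}
    for p in points:
        # Nearest-rank percentile: index p% of the way into the sorted list.
        idx = min(len(ordered) - 1, int(len(ordered) * p / 100))
        out[f"p{p}"] = ordered[idx]
    return out

# A toy run: mostly fast responses with a couple of slow outliers,
# which show up in the tail percentiles but not the median.
print(percentiles([12, 15, 9, 120, 14, 11, 300, 13, 10, 16]))
```

Percentiles (rather than averages) are the usual way to report this, because a cache-heavy workload produces a bimodal latency distribution that an average hides.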
Re: Slow Response for less volume
Thanks, Erick. But won't disabling the cache increase the response time
rather than solve the problem here?

> On Oct 24, 2018, at 12:52 PM, Erick Erickson wrote:
>
> queryResultCache
Re: Slow Response for less volume
Thanks, Wunder, for the prompt response.

We are testing with 1600 different search strings in JMeter, and the test
keeps running continuously. Running continuously means the cache has been
built, so there should be better response times by now. Shouldn't there?

Thanks
Re: Slow Response for less volume
You can set the "size" parameter of your queryResultCache and filterCache
to zero in solrconfig.xml to disable those caches.
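As a sketch, the corresponding solrconfig.xml entries might look like this. The class and autowarm attributes shown are common defaults, not something specified in the thread:

```xml
<query>
  <!-- size="0" disables the cache entirely, for benchmarking only -->
  <queryResultCache class="solr.LRUCache" size="0"
                    initialSize="0" autowarmCount="0"/>
  <filterCache class="solr.LRUCache" size="0"
               initialSize="0" autowarmCount="0"/>
</query>
```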
Re: Slow Response for less volume
Are you testing with a small number of queries? If your cache is larger
than the number of queries in your benchmark, the first round will load the
cache, and then everything will be super fast.

Load testing a system with caches is hard.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
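Walter's point can be illustrated with a toy LRU cache. This is a sketch only, not Solr's actual queryResultCache implementation: when the benchmark replays fewer distinct queries than the cache has entries, only the first round ever misses.

```python
from collections import OrderedDict

class LRUCache:
    """Toy query-result cache with least-recently-used eviction."""
    def __init__(self, size):
        self.size, self.data = size, OrderedDict()
        self.hits = self.misses = 0

    def get(self, key):
        # Combined lookup-or-load: a miss "computes" and stores the result.
        if key in self.data:
            self.data.move_to_end(key)
            self.hits += 1
        else:
            self.misses += 1
            if len(self.data) >= self.size:
                self.data.popitem(last=False)  # evict least recently used
            self.data[key] = "result"

cache = LRUCache(2048)
queries = [f"q{i}" for i in range(1600)]  # fewer queries than cache slots
for _ in range(10):                       # ten benchmark rounds
    for q in queries:
        cache.get(q)
print(cache.misses, cache.hits)  # only the first round misses
```

After the first pass, every request is a cache hit, which is exactly why such a benchmark reports unrealistically fast responses.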
Re: slow response
Do you need 10K results at a time, or are you just getting the top 10 or so
out of a set of 10K? Also, are you retrieving really large stored fields?

If you add debugQuery=true to your request, Solr will return timing
information for the various components.

On Sep 9, 2009, at 10:10 AM, Elaine Li wrote:

> Hi,
>
> I have 20 million docs on Solr. If my query would return more than 10,000
> results, the response time will be very, very long. How do I resolve such
> a problem? Can I slice my docs into pieces and let the query operate
> within one piece at a time, so the response time and response data will
> be more manageable?
>
> Thanks.
> Elaine

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
Solr/Lucene: http://www.lucidimagination.com/search
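A request with debugQuery enabled might be built like this. The host, collection name, and query string are hypothetical, not from the thread:

```python
from urllib.parse import urlencode

# Hypothetical endpoint; adjust host and collection for your deployment.
base = "http://localhost:8983/solr/collection1/select"
params = {
    "q": 'sentence:"simple phrase"',  # illustrative query
    "rows": 10,
    "debugQuery": "true",  # asks Solr to include per-component timings
}
url = f"{base}?{urlencode(params)}"
print(url)
```

The response then carries a debug section with the time spent in each search component, which narrows down whether querying, retrieval, or something else dominates.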
Re: slow response
There is a good article on how to scale a Lucene/Solr solution:

http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr

Also, if you have heavy load on the server (a large number of concurrent
requests), then I'd suggest considering loading the index into RAM. That
worked well for me on a project with 140+ million documents and 30
concurrent user requests per second. If your index can be placed in RAM,
you can reduce the architecture complexity.

Alex Baranov
Re: slow response
Just wondering, is there an easy way to load the whole index into RAM?
Re: slow response
Please take a look at http://issues.apache.org/jira/browse/SOLR-1379

Alex.
Re: slow response
Hi Elaine,

I think you need to provide us with some more information on what exactly
you are trying to achieve. From your question I assumed you wanted paging
(getting the first 10 results, then the next 10, etc.), but reading it
again ("slice my docs into pieces"), I now think you might've meant that
you only want to retrieve certain fields from each document. For that you
can use the fl parameter
(http://wiki.apache.org/solr/CommonQueryParameters#head-db2785986af2355759faaaca53dc8fd0b012d1ab).

Hope this helps.

Regards,
gwk

Elaine Li wrote:
> I want to get the 10K results, not just the top 10. The fields are
> regular language sentences; they are not large. Is clustering the
> technique for what I am doing?
Re: slow response
gwk,

Sorry for the confusion. I am doing simple phrase searches among the
sentences, which could be in English or another language. Each doc has only
several ID numbers and the sentence itself.

I did not know about paging. It sounds like it is what I need. How do I
achieve paging with Solr? I also need to store all the results into my own
tables in JavaScript, to use for connecting with other applications.

Elaine
Re: slow response
Hi Elaine,

You can page your result set with the rows and start parameters
(http://wiki.apache.org/solr/CommonQueryParameters). So, for example, to
get the first 100 results one would use the parameters rows=100&start=0,
and the second 100 results with rows=100&start=100, etc.

Regards,
gwk
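The rows/start arithmetic that gwk describes can be sketched as a small generator. The page size and total hit count here are illustrative:

```python
def pages(total, page_size=100):
    """Yield (start, rows) parameter pairs for walking a result set
    of `total` hits, page_size documents at a time."""
    start = 0
    while start < total:
        yield start, min(page_size, total - start)
        start += page_size

# Walking 10,000 hits 100 at a time produces 100 requests:
params = [f"rows={rows}&start={start}" for start, rows in pages(10_000)]
print(params[0], params[1], params[-1])
```

Each yielded pair becomes one Solr request, so the client fetches the full 10K result set in manageable chunks instead of one enormous response.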
Re: slow response
gwk, thanks a lot.

Elaine
Re: Slow response times using *:*
On Jan 31, 2008 10:43 AM, Andy Blower [EMAIL PROTECTED] wrote:

> I'm evaluating Solr/Lucene for our needs, and am currently looking at
> performance, since 99% of the functionality we're looking for is
> provided. The index contains 18.4 million records and is 58 GB in size.
> Most queries are acceptably quick once the filters are cached. The
> filters select one or more of three subsets of the data, and then
> intersect with around 15 other subsets of data, depending on a user
> subscription.
>
> We're returning facets on several fields, and sometimes a blank (q=*:*)
> query is run purely to get the facets for all of the data that the user
> can access. This information is turned into browse information and can
> be different for each user. Running performance tests with JMeter
> sequentially with a single user, these blank queries are slower than the
> normal queries, but still in the 1-4 sec range. Unfortunately, if I
> increase the number of test threads so that more than one of the blank
> queries is submitted while one is already being processed, everything
> grinds to a halt, and the responses to these blank queries can take up
> to 125 secs to be returned!

*:* maps to MatchAllDocsQuery, which for each document needs to check
whether it's deleted (that's a synchronized call, and can be a bottleneck).
A cheap workaround: if you know of a term that is in every document (or a
field in every document that has very few terms), then substitute a query
on that for *:*. Substituting one of your filters as the base query might
also work.

> This surprises me because the filter query submitted has usually already
> been submitted along with a normal query, and so should be cached in the
> filter cache. Surely all Solr needs to do is return a handful of fields
> for the first 100 records in the list from the cache - or so I thought.

To calculate the DocSet (the set of all documents matching *:* and your
filters), Solr can just use its caches, as long as *:* and the filters have
been used before. *But*, to retrieve the top 10 documents matching *:* and
your filters, the query must be re-run. That is probably where the time is
being spent. Since you aren't looking for relevancy scores at all, but just
faceting, it seems like we could potentially optimize this in Solr.

In the future, we could also do some query optimization by sometimes
combining filters with the base query.

-Yonik
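Yonik's substitution workaround might be sketched as a client-side rewrite step. The `all_docs` field and its value are hypothetical; they assume you index a constant-valued field on every document:

```python
def rewrite_match_all(params, substitute="all_docs:yes"):
    """Replace a *:* base query with a cheap term query that matches
    every document, per the workaround described above. The
    `all_docs:yes` term assumes a field indexed with the same value
    on every document."""
    q = params.get("q", "").strip()
    if q in ("*:*", ""):
        rewritten = dict(params)
        rewritten["q"] = substitute
        return rewritten
    return params  # ordinary queries pass through unchanged

print(rewrite_match_all({"q": "*:*", "facet": "true"}))
```

A term query skips MatchAllDocsQuery's per-document deleted check, which is the synchronized call Yonik identifies as the bottleneck under concurrency.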
Re: Slow response times using *:*
I can't give you a definitive answer based on the data you've provided.
However, do you really need to get *all* facets? Can't you limit them with
facet.limit? Are you planning to run multiple *:* queries with all facets
turned on, against a 58 GB index, in a live system? I don't think that's a
good idea.

As for the 125 seconds, I think it is probably because of paging issues.
Are you faceting on multivalued or tokenized fields? In that case, Solr
uses field queries, which consume a lot of memory if the number of unique
terms is large.

On Jan 31, 2008 9:13 PM, Andy Blower [EMAIL PROTECTED] wrote:

> Can anyone tell me what might be causing this dramatic slowdown? Any
> suggestions for solutions would be gratefully received.
>
> Thanks,
> Andy.
>
> --
> View this message in context:
> http://www.nabble.com/Slow-response-times-using-*%3A*-tp15206563p15206563.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
Regards,
Shalin Shekhar Mangar.
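A facets-only request with a capped facet.limit, along the lines Shalin suggests, might look like this. The field names are hypothetical:

```python
from urllib.parse import urlencode

# facet.limit caps how many constraint counts Solr returns per field;
# rows=0 skips document retrieval entirely when only facets are needed.
params = [
    ("q", "*:*"),
    ("rows", "0"),
    ("facet", "true"),
    ("facet.field", "subject"),   # assumed field name
    ("facet.field", "language"),  # assumed field name
    ("facet.limit", "100"),       # top 100 values per field, not all
]
qs = "/select?" + urlencode(params)
print(qs)
```

Capping the facet list bounds both the response size and the work spent counting rare terms in fields with large vocabularies.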
Re: Slow response times using *:*
Actually, I do need all facets for a field, although I've just realised
that the tests are limited to only 100. Oops. So it should be worse in
reality... erk. Since that's what we do with our current search engine,
Solr has to be able to compete with it.

The fields are a mix of non-multivalued, non-tokenized fields and others
which are. I've yet to experiment with this.

Thanks.
Re: Slow response times using *:*
How often does the index change? Can you use an HTTP cache and do this once
for each new index?

wunder
Re: Slow response times using *:*
Yonik Seeley wrote: *:* maps to MatchAllDocsQuery, which for each document needs to check if it's deleted (that's a synchronized call, and can be a bottleneck). Why does this need to check if documents are deleted if normal queries don't? Is there any way of disabling this since I can be sure this isn't the case after indexing and optimizing. Yonik Seeley wrote: A cheap workaround is that if you know of a term that is in every document, (or a field in every document that has very few terms), then substitute a query on that for *:* Substituting one of your filters as the base query might also work. Would duplicating one of my filters cause any issues? That would be easy. Otherwise I'll try the substitution and see if it helps much. Yonik Seeley wrote: This surprises me because the filter query submitted has usually already been submitted along with a normal query, and so should be cached in the filter cache. Surely all solr needs to do is return a handful of fields for the first 100 records in the list from the cache - or so I thought. To calculate the DocSet (the set of all documents matching *:* and your filters), Solr can just use it's caches as long as *:* and the filters have been used before. *But*, to retrieve the top 10 documents matching *:* and your filters, the query must be re-run. That is probably where the time is being spent. Since you aren't looking for relevancy scores at all, but just faceting, it seems like we could potentially optimize this in Solr. I'm actually retrieving the first 100 in my tests, which will be necessary in one of the two scenarios we use blank queries for. The other scenario doesn't require any docs at all - just the facets, and I've not put that in my tests. What would the situation be if I specified a sort order for the facets and/or retrieved no docs at all? I'd be sorting the facets alphabetically, which is currently done by my app rather than the search engine. 
(since I sometimes have to merge facets from more than one field) I had assumed that no doc would be considered more relevant than any other without any query terms - i.e. filter query terms wouldn't affect relevance. This seems sensible to me, but maybe that's only because our current search engine works that way. Regarding optimization, I certainly think that being able to access all facets for subsets of the indexed data (defined by the filter query) is an incredibly useful feature. My search engine usage may not be very common though. What it means to us is that we can drive all aspects of our sites from the search engine, not just the obvious search forms. Yonik Seeley wrote: In the future, we could also do some query optimization by sometimes combining filters with the base query. -Yonik Sorry, that flew over my head... Thanks very much for your help. I wish I had more time during this evaluation to delve into the code. I don't suppose there's a document with a guided tour of the codebase anywhere, is there? ;-) P.S. I re-ran my tests without returning facets whilst writing this and didn't get the slowdowns with 4 or 10 threads; does this help? -- View this message in context: http://www.nabble.com/Slow-response-times-using-*%3A*-tp15206563p15209605.html Sent from the Solr - User mailing list archive at Nabble.com.
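Yonik's suggested workaround (substituting a query on an always-present term for *:*) can be sketched as follows. This is a hand-rolled illustration, not Solr code; the field name all_docs and the filter values are hypothetical:

```python
from urllib.parse import urlencode

def facet_params(base_query, filters, facet_fields, rows=100):
    """Build a Solr-style query string for a faceted request."""
    params = [("q", base_query), ("rows", str(rows)), ("facet", "true")]
    params += [("fq", f) for f in filters]
    params += [("facet.field", f) for f in facet_fields]
    return urlencode(params)

# The expensive form: MatchAllDocsQuery checks deletion status per document.
slow = facet_params("*:*", ["collection:journals"], ["subject"])

# The workaround: query on a term known to occur in every document
# (all_docs:1 is a hypothetical field added at index time for this purpose).
fast = facet_params("all_docs:1", ["collection:journals"], ["subject"])
print(fast)
```

One could also promote one of the fq filters to the q parameter, as Yonik suggests, since scoring is irrelevant here.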
Re: Slow response times using *:*
On 31-Jan-08, at 9:41 AM, Andy Blower wrote: Yonik Seeley wrote: This surprises me because the filter query submitted has usually already been submitted along with a normal query, and so should be cached in the filter cache. Surely all Solr needs to do is return a handful of fields for the first 100 records in the list from the cache - or so I thought. To calculate the DocSet (the set of all documents matching *:* and your filters), Solr can just use its caches as long as *:* and the filters have been used before. *But*, to retrieve the top 10 documents matching *:* and your filters, the query must be re-run. That is probably where the time is being spent. Since you aren't looking for relevancy scores at all, but just faceting, it seems like we could potentially optimize this in Solr. I'm actually retrieving the first 100 in my tests, which will be necessary in one of the two scenarios we use blank queries for. The other scenario doesn't require any docs at all - just the facets, and I've not put that in my tests. What would the situation be if I specified a sort order for the facets and/or retrieved no docs at all? I'd be sorting the facets alphabetically, which is currently done by my app rather than the search engine (since I sometimes have to merge facets from more than one field). First question: what is the use of retrieving 100 documents if there is no defined sort order? The situation could be optimized in Solr, but there is a related case that _is_ optimized that should be almost as fast. If you a) don't ask for the document score in the field list (fl), b) enable useFilterForSortedQuery in solrconfig.xml, and c) specify _some_ sort order other than score, then Solr will do cached bitset intersections only. It will also do sorting, but that may not be terribly expensive. If it is close to the desired performance, it would be relatively easy to patch Solr to not do that step. (Note: this applies to query sort, not facet sort.) 
I had assumed that no doc would be considered more relevant than any other without any query terms - i.e. filter query terms wouldn't affect relevance. This seems sensible to me, but maybe that's only because our current search engine works that way. It won't, but it will still try to calculate the score if you ask it to (all docs will score the same, though). Regarding optimization, I certainly think that being able to access all facets for subsets of the indexed data (defined by the filter query) is an incredibly useful feature. My search engine usage may not be very common though. What it means to us is that we can drive all aspects of our sites from the search engine, not just the obvious search forms. I also use this feature. It would be useful to optimize the case where rows=0. -Mike
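Mike's three conditions above can be made concrete. A minimal sketch, assuming a stock solrconfig.xml (the sort field dateadded and filter value are hypothetical):

```xml
<!-- solrconfig.xml: allow Solr to answer sorted, unscored queries
     purely from cached filter bitset intersections -->
<useFilterForSortedQuery>true</useFilterForSortedQuery>
```

The matching request then (a) omits score from fl, (b) relies on the flag above, and (c) sorts on something other than score, e.g. q=*:*&fq=collection:journals&fl=id,title&sort=dateadded desc&rows=100.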
Re: Slow response
Hi Mike, Thanks for clarifying what has been a bit of a black box to me. A couple of questions, to increase my understanding, if you don't mind. If I am only using fields with multiValued=false, with a type of string or integer (untokenized), does Solr automatically use approach 2? Or is this something I have to actively configure? And is approach 2 better than 1? Or vice versa? Or is the answer it depends? :-) If, as I suspect, the answer was it depends, are there any general guidelines on when to use one approach or the other? Thanks, Tom On 9/6/07, Mike Klaas [EMAIL PROTECTED] wrote: On 6-Sep-07, at 3:25 PM, Mike Klaas wrote: There are essentially two facet computation strategies: 1. cached bitsets: a bitset for each term is generated and intersected with the query result bitset. This is more general and performs well up to a few thousand terms. 2. field enumeration: cache the field contents, and generate counts using this data. Relatively independent of #unique terms, but requires at most a single facet value per field per document. So, if you factor author into Primary author/Secondary author, where each is guaranteed to only have one value per doc, this could greatly accelerate your faceting. There are probably fewer unique subjects, so strategy 1 is likely fine. To use strategy 2, just make sure that multiValued=false is set for those fields in schema.xml I forgot to mention that strategy 2 also requires a single token for each doc (see http://wiki.apache.org/solr/FAQ#head-14f9f2d84fb2cd1ff389f97f19acdb6ca55e4cd3) -Mike
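The two strategies Mike describes can be mimicked in miniature. A toy sketch only (real Solr operates on Lucene bitsets and term enumerations, not Python sets):

```python
from collections import Counter, defaultdict

# Toy corpus: doc id -> values of a multi-valued "author" field.
docs = {0: ["smith", "jones"], 1: ["smith"], 2: ["lee"], 3: ["jones"]}
matches = {0, 1, 3}  # docs matching the query plus filters

# Strategy 1 (cached bitsets): one doc set per unique term,
# intersected with the result set. Cost grows with #unique terms.
term_docsets = defaultdict(set)
for doc_id, authors in docs.items():
    for a in authors:
        term_docsets[a].add(doc_id)
bitset_counts = {t: len(ds & matches) for t, ds in term_docsets.items()}

# Strategy 2 (field enumeration): walk only the matching docs and tally
# the single cached value per doc. This is why it requires
# multiValued=false and one token per doc -- here doc 0's second
# author would simply be lost.
first_author = {doc_id: authors[0] for doc_id, authors in docs.items()}
enum_counts = Counter(first_author[d] for d in matches)

print(bitset_counts)      # counts over all unique terms, including zeros
print(dict(enum_counts))  # counts only for values seen among the matches
```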
Re: Slow response
On 14-Sep-07, at 3:38 PM, Tom Hill wrote: Hi Mike, Thanks for clarifying what has been a bit of a black box to me. A couple of questions, to increase my understanding, if you don't mind. If I am only using fields with multiValued=false, with a type of string or integer (untokenized), does Solr automatically use approach 2? Or is this something I have to actively configure? It'll happen automatically. And is approach 2 better than 1? Or vice versa? Or is the answer it depends? :-) It depends :) If, as I suspect, the answer was it depends, are there any general guidelines on when to use one approach or the other? Yeah, it usually depends on how many unique facet values there are, how many documents are returned in the query, and how much memory you have. 1 is usually faster when there are few terms; 2 is usually faster when there are many terms. Things can be further complicated by additional parameters, like facet.enum.cache.minDf (http://wiki.apache.org/solr/SimpleFacetParameters#head-3ea6fc5d1056447295c38c9675e35ce06fd95f97) -Mike On 9/6/07, Mike Klaas [EMAIL PROTECTED] wrote: On 6-Sep-07, at 3:25 PM, Mike Klaas wrote: There are essentially two facet computation strategies: 1. cached bitsets: a bitset for each term is generated and intersected with the query result bitset. This is more general and performs well up to a few thousand terms. 2. field enumeration: cache the field contents, and generate counts using this data. Relatively independent of #unique terms, but requires at most a single facet value per field per document. So, if you factor author into Primary author/Secondary author, where each is guaranteed to only have one value per doc, this could greatly accelerate your faceting. There are probably fewer unique subjects, so strategy 1 is likely fine. 
To use strategy 2, just make sure that multiValued=false is set for those fields in schema.xml I forgot to mention that strategy 2 also requires a single token for each doc (see http://wiki.apache.org/solr/FAQ#head-14f9f2d84fb2cd1ff389f97f19acdb6ca55e4cd3) -Mike
RE: Slow response
Thank you for your response, this does shed some light on the subject. Our basic question was why we were seeing slower responses the smaller our result set got. Currently we are searching about 1.2 million documents with the source document about 2KB, but we do duplicate some of the data. I bumped up my filterCache to 5 million, and the 2nd search I did for a non-indexed term came back in 2.1 seconds, so that is much improved. I am a little concerned about having this value so high, but this is our problem and we will play with it. I do have a few follow-up questions. First, in regards to the filterCache: once a single search has been done and facets requested, as long as new facets aren't requested and the size is large enough, the filters will remain in the cache, correct? Also, you mention that faceting is more a function of the number of terms in the field. The 2 fields causing our problems are Authors and Subjects. If we divided up the data that made these facets into more specific fields (Primary author, secondary author, etc.) would this perform better? So the number of facet fields would increase but the unique terms for a given facet should be less. Thanks again for all your help. Aaron -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Thursday, September 06, 2007 4:17 PM To: solr-user@lucene.apache.org Subject: Re: Slow response On 9/6/07, Aaron Hammond [EMAIL PROTECTED] wrote: I am pretty new to Solr and this is my first post to this list, so please forgive me if I make any glaring errors. Here's my problem. When I do a search using the Solr admin interface for a term that I know does not exist in my index, the QTime is about 1ms. However, if I add facets to the search, the response takes more than 20 seconds (and sometimes longer) to return. 
Here is the slow URL - Faceting on multi-valued fields is more a function of the number of terms in the field (and their distribution) rather than the number of hits for a query. That said, perhaps faceting should be able to bail out if there are no hits. Is your question more about why faceting takes so long in general, or why it takes so long if there are no results? If you haven't, try optimizing your index. For faceting in general: how many docs do you have in your index? As a side note, the way multi-valued faceting currently works, it's actually normally faster if the query returns a large number of hits. -Yonik
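Yonik's point that faceting cost tracks the number of unique terms rather than the number of hits can be modeled with a toy sketch (purely illustrative; Solr's real implementation intersects Lucene bitsets):

```python
def facet_counts(term_docsets, matches):
    """Strategy-1 faceting: one intersection per unique term,
    performed whether or not the query matched anything."""
    work = 0
    counts = {}
    for term, docset in term_docsets.items():
        work += 1  # one intersection per unique term, regardless of hits
        counts[term] = len(docset & matches)
    return counts, work

# 10,000 unique facet terms spread over a handful of documents.
term_docsets = {f"author_{i}": {i % 5} for i in range(10_000)}

_, work_zero_hits = facet_counts(term_docsets, set())       # zero hits
_, work_many_hits = facet_counts(term_docsets, {0, 1, 2, 3, 4})
print(work_zero_hits, work_many_hits)
```

The work done is identical either way, which is why a zero-hit query can still spend 20+ seconds faceting unless the bail-out Yonik mentions is added.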
Re: Slow response
On 6-Sep-07, at 3:16 PM, Aaron Hammond wrote: Thank you for your response, this does shed some light on the subject. Our basic question was why we were seeing slower responses the smaller our result set got. Currently we are searching about 1.2 million documents with the source document about 2KB, but we do duplicate some of the data. I bumped up my filterCache to 5 million, and the 2nd search I did for a non-indexed term came back in 2.1 seconds, so that is much improved. I am a little concerned about having this value so high, but this is our problem and we will play with it. I do have a few follow-up questions. First, in regards to the filterCache: once a single search has been done and facets requested, as long as new facets aren't requested and the size is large enough, the filters will remain in the cache, correct? Also, you mention that faceting is more a function of the number of terms in the field. The 2 fields causing our problems are Authors and Subjects. If we divided up the data that made these facets into more specific fields (Primary author, secondary author, etc.) would this perform better? So the number of facet fields would increase but the unique terms for a given facet should be less. There are essentially two facet computation strategies: 1. cached bitsets: a bitset for each term is generated and intersected with the query result bitset. This is more general and performs well up to a few thousand terms. 2. field enumeration: cache the field contents, and generate counts using this data. Relatively independent of #unique terms, but requires at most a single facet value per field per document. So, if you factor author into Primary author/Secondary author, where each is guaranteed to only have one value per doc, this could greatly accelerate your faceting. There are probably fewer unique subjects, so strategy 1 is likely fine. To use strategy 2, just make sure that multiValued=false is set for those fields in schema.xml -Mike
Re: Slow response
On 6-Sep-07, at 3:25 PM, Mike Klaas wrote: There are essentially two facet computation strategies: 1. cached bitsets: a bitset for each term is generated and intersected with the query result bitset. This is more general and performs well up to a few thousand terms. 2. field enumeration: cache the field contents, and generate counts using this data. Relatively independent of #unique terms, but requires at most a single facet value per field per document. So, if you factor author into Primary author/Secondary author, where each is guaranteed to only have one value per doc, this could greatly accelerate your faceting. There are probably fewer unique subjects, so strategy 1 is likely fine. To use strategy 2, just make sure that multiValued=false is set for those fields in schema.xml I forgot to mention that strategy 2 also requires a single token for each doc (see http://wiki.apache.org/solr/FAQ#head-14f9f2d84fb2cd1ff389f97f19acdb6ca55e4cd3) -Mike