Re: Retrieving 1000 records at a time
Thanks Shawn!

Best,
Mark.

On Wed, Feb 17, 2016 at 7:48 PM, Shawn Heisey wrote:

> On 2/17/2016 3:49 PM, Mark Robinson wrote:
> > I have around 121 fields, out of which 12 are indexed and almost all
> > 121 are stored. Average size of a doc is 10KB.
> >
> > I was checking for start=0, rows=1000. We were querying a Solr
> > instance on another server, and I think network lag might have come
> > into the picture also.
> >
> > I did not go for any caching, as I wanted good response time on the
> > very first query.
>
> Stored fields, which contain the data that is returned to the client in
> the response, are compressed on disk. Uncompressing this data can
> contribute to the time on a slow query, but I do not think it can
> explain 30 seconds of delay. Very large documents can be particularly
> slow to decompress, but you have indicated that each entire document is
> about 10K in size, which is not huge.
>
> It is more likely that the delay is caused by one of two things,
> possibly both:
>
> * Extremely long garbage collection pauses, due to a heap that is too
>   small or a VERY large heap (beyond 32GB) with inadequate GC tuning.
> * Not enough system memory to effectively cache the index.
>
> Some additional info that may be helpful in tracking this down further:
>
> * For each core on one machine, the size on disk of the data directory.
> * For each core, the number of documents and the number of deleted
>   documents.
> * The max heap size for the Solr JVM.
> * Whether there is more than one Solr instance per server.
> * The total installed memory size in the server.
> * Whether or not the server is used for other applications.
> * What operating system the server is running.
> * Whether the index is distributed or contained in a single core.
> * Whether Solr is in SolrCloud mode or not.
> * The Solr version.
>
> Thanks,
> Shawn
Re: Retrieving 1000 records at a time
On 2/17/2016 3:49 PM, Mark Robinson wrote:
> I have around 121 fields, out of which 12 are indexed and almost all
> 121 are stored. Average size of a doc is 10KB.
>
> I was checking for start=0, rows=1000. We were querying a Solr
> instance on another server, and I think network lag might have come
> into the picture also.
>
> I did not go for any caching, as I wanted good response time on the
> very first query.

Stored fields, which contain the data that is returned to the client in
the response, are compressed on disk. Uncompressing this data can
contribute to the time on a slow query, but I do not think it can
explain 30 seconds of delay. Very large documents can be particularly
slow to decompress, but you have indicated that each entire document is
about 10K in size, which is not huge.

It is more likely that the delay is caused by one of two things,
possibly both:

* Extremely long garbage collection pauses, due to a heap that is too
  small or a VERY large heap (beyond 32GB) with inadequate GC tuning.
* Not enough system memory to effectively cache the index.

Some additional info that may be helpful in tracking this down further:

* For each core on one machine, the size on disk of the data directory.
* For each core, the number of documents and the number of deleted
  documents.
* The max heap size for the Solr JVM.
* Whether there is more than one Solr instance per server.
* The total installed memory size in the server.
* Whether or not the server is used for other applications.
* What operating system the server is running.
* Whether the index is distributed or contained in a single core.
* Whether Solr is in SolrCloud mode or not.
* The Solr version.

Thanks,
Shawn
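Shawn's first checklist item, the on-disk size of each core's data directory, can be gathered with a short script and compared against free RAM to judge whether the OS can cache the index. This is an illustrative sketch, not something from the thread; the core path in the comment is hypothetical.

```python
import os

def dir_size_bytes(path):
    """Recursively total the size of every file under `path` --
    e.g. a Solr core's data directory, to compare against system memory
    available for the OS page cache."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

# Hypothetical core path; if this is much larger than free memory,
# "not enough system memory to cache the index" becomes the prime suspect.
# print(dir_size_bytes("/var/solr/data/mycore/data") / 2**30, "GiB")
```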
Re: Retrieving 1000 records at a time
Thanks Joel and Chris!

I have around 121 fields, out of which 12 are indexed and almost all
121 are stored. Average size of a doc is 10KB.

I was checking for start=0, rows=1000. We were querying a Solr instance
on another server, and I think network lag might have come into the
picture also.

I did not go for any caching, as I wanted good response time on the
very first query.

Thanks much for the links and suggestions. I will go thru each of them.

Best,
Mark.

On Wed, Feb 17, 2016 at 5:26 PM, Chris Hostetter wrote:

> : I have a requirement where I need to retrieve 10000 to 15000 records
> : at a time from SOLR.
> : With 20 or 100 records everything happens in milliseconds.
> : When it goes to 1000, 10000 it is taking more time... like even 30
> : seconds.
>
> So far all you've really told us about your setup is that some queries
> with "rows=1000" are slow -- but you haven't really told us anything
> else we can help you with. For example, it's not obvious if you mean
> that you are using start=0 in all of those queries and they are slow,
> or if you mean you are paginating through results (ie: increasing the
> start param) 1000 at a time and it starts getting slow as you page
> deeply.
>
> You also haven't told us anything about the fields you are returning --
> how many are there? What data types are they? Are they large string
> values?
>
> How are you measuring the time? Are you sure network lag, or
> client-side processing of the data as Solr returns it, isn't the bulk
> of the time you are measuring? What does the QTime in the Solr
> responses for these slow queries say?
> My best guesses are that either: you are doing deep paging and
> conflating the increased response time for deep results with an
> increase in response time for large rows params (because you are
> getting "deeper" faster with a large rows #), or you are seeing an
> increase in processing time on the client due to the large volume of
> data being returned -- possibly even with SolrJ, which is designed to
> parse the entire response into Java data structures by default before
> returning to the client.
>
> Without more concrete information, it's hard to give you advice beyond
> guesses.
>
> Potentially helpful links...
>
> https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
> https://lucidworks.com/blog/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/
> https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets
> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
> https://lucene.apache.org/solr/5_4_0/solr-solrj/org/apache/solr/client/solrj/io/stream/expr/StreamFactory.html
>
> -Hoss
> http://www.lucidworks.com/
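Chris's question about QTime versus the time Mark measured can be checked mechanically: Solr reports its internal query time in the response header, so subtracting it from the client's wall-clock measurement isolates network transfer and response parsing. A minimal sketch, assuming Solr's standard JSON `responseHeader` shape:

```python
import json

def split_latency(raw_json, elapsed_ms):
    """Separate Solr's internal query time (QTime, milliseconds) from
    everything else the client measured: network transfer plus
    client-side parsing of the response."""
    qtime = json.loads(raw_json)["responseHeader"]["QTime"]
    return {"qtime_ms": qtime, "overhead_ms": elapsed_ms - qtime}
```

If `overhead_ms` dominates a 30-second request, the slowness is not in Solr's query execution but in moving and parsing the large response.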
Re: Retrieving 1000 records at a time
: I have a requirement where I need to retrieve 10000 to 15000 records
: at a time from SOLR.
: With 20 or 100 records everything happens in milliseconds.
: When it goes to 1000, 10000 it is taking more time... like even 30
: seconds.

So far all you've really told us about your setup is that some queries
with "rows=1000" are slow -- but you haven't really told us anything
else we can help you with. For example, it's not obvious if you mean
that you are using start=0 in all of those queries and they are slow,
or if you mean you are paginating through results (ie: increasing the
start param) 1000 at a time and it starts getting slow as you page
deeply.

You also haven't told us anything about the fields you are returning --
how many are there? What data types are they? Are they large string
values?

How are you measuring the time? Are you sure network lag, or client-side
processing of the data as Solr returns it, isn't the bulk of the time
you are measuring? What does the QTime in the Solr responses for these
slow queries say?

My best guesses are that either: you are doing deep paging and
conflating the increased response time for deep results with an increase
in response time for large rows params (because you are getting "deeper"
faster with a large rows #), or you are seeing an increase in processing
time on the client due to the large volume of data being returned --
possibly even with SolrJ, which is designed to parse the entire response
into Java data structures by default before returning to the client.

Without more concrete information, it's hard to give you advice beyond
guesses.

Potentially helpful links...
https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
https://lucidworks.com/blog/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/
https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets
https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
https://lucene.apache.org/solr/5_4_0/solr-solrj/org/apache/solr/client/solrj/io/stream/expr/StreamFactory.html

-Hoss
http://www.lucidworks.com/
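The cursor-based iteration described in the first two links works by sending `cursorMark=*` on the first request and threading each response's `nextCursorMark` back into the next request, stopping when the mark repeats. A sketch of that loop; the `fetch_page` callable is a stand-in for an actual Solr request, not a real API:

```python
def iterate_all(fetch_page, rows=1000):
    """Deep-page a full result set with Solr's cursorMark protocol.
    `fetch_page(cursor, rows)` must return (docs, next_cursor), where
    next_cursor is the response's nextCursorMark. The underlying query
    must sort on a unique field (e.g. "id asc") for cursors to work."""
    cursor = "*"
    while True:
        docs, next_cursor = fetch_page(cursor, rows)
        yield from docs
        if next_cursor == cursor:  # Solr repeats the mark when exhausted
            return
        cursor = next_cursor
```

Unlike increasing `start`, this keeps each page roughly constant-cost no matter how deep you go, which addresses the deep-paging guess above.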
Re: Retrieving 1000 records at a time
Also, are you ranking documents by score?

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Feb 17, 2016 at 1:59 PM, Joel Bernstein wrote:

> A few questions for you: What types of fields, and how many fields,
> will you be retrieving? What version of Solr are you using?
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, Feb 17, 2016 at 1:37 PM, Mark Robinson wrote:
>
>> Hi,
>>
>> I have a requirement where I need to retrieve 10000 to 15000 records
>> at a time from SOLR.
>> With 20 or 100 records everything happens in milliseconds.
>> When it goes to 1000, 10000 it is taking more time... like even 30
>> seconds.
>>
>> Will Solr be able to return 10000 records at a time in less than,
>> say, 200 milliseconds?
>>
>> I have read that disk reads are costly, so we have to batch results;
>> the fewer the records retrieved in a batch, the faster the response
>> when using SOLR.
>>
>> So is Solr straight away a NO candidate in a situation where 10000
>> records should be retrieved in <=200 ms?
>>
>> A quick response would be very helpful.
>>
>> Thanks!
>> Mark
Re: Retrieving 1000 records at a time
A few questions for you: What types of fields, and how many fields, will
you be retrieving? What version of Solr are you using?

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Feb 17, 2016 at 1:37 PM, Mark Robinson wrote:

> Hi,
>
> I have a requirement where I need to retrieve 10000 to 15000 records
> at a time from SOLR.
> With 20 or 100 records everything happens in milliseconds.
> When it goes to 1000, 10000 it is taking more time... like even 30
> seconds.
>
> Will Solr be able to return 10000 records at a time in less than, say,
> 200 milliseconds?
>
> I have read that disk reads are costly, so we have to batch results;
> the fewer the records retrieved in a batch, the faster the response
> when using SOLR.
>
> So is Solr straight away a NO candidate in a situation where 10000
> records should be retrieved in <=200 ms?
>
> A quick response would be very helpful.
>
> Thanks!
> Mark
Retrieving 1000 records at a time
Hi,

I have a requirement where I need to retrieve 10000 to 15000 records at
a time from SOLR.
With 20 or 100 records everything happens in milliseconds.
When it goes to 1000, 10000 it is taking more time... like even 30
seconds.

Will Solr be able to return 10000 records at a time in less than, say,
200 milliseconds?

I have read that disk reads are costly, so we have to batch results; the
fewer the records retrieved in a batch, the faster the response when
using SOLR.

So is Solr straight away a NO candidate in a situation where 10000
records should be retrieved in <=200 ms?

A quick response would be very helpful.

Thanks!
Mark
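For bulk retrieval like this, one lever that often matters at rows=1000 and above is the `fl` parameter: restricting it to the fields actually needed means Solr decompresses and ships far less stored data per document. A hypothetical request-builder sketch; the server URL and field names are made up for illustration:

```python
from urllib.parse import urlencode

def select_url(base, q, fields, rows, start=0):
    """Build a /select URL that returns only the named stored fields,
    cutting per-document decompression and network transfer."""
    params = {
        "q": q,
        "fl": ",".join(fields),  # only these stored fields are returned
        "rows": rows,
        "start": start,
        "wt": "json",
    }
    return "{}/select?{}".format(base.rstrip("/"), urlencode(params))

# Hypothetical usage:
# select_url("http://solr:8983/solr/mycore", "*:*", ["id", "name"], 1000)
```

Whether this alone gets 10000 records under 200 ms depends on the factors discussed upthread (field count, heap, page cache, network), but it is usually the cheapest thing to try first.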