Re: Optimal size for queries?
On Wed, Apr 15, 2020 at 10:09:59AM +0100, Colvin Cowie wrote:
> Hi, I can't answer the question as to what the optimal size of rows per
> request is. I would expect it to depend on the number of stored fields
> being marshaled, and their type, and your hardware.

It was a somewhat naive question, but I wasn't sure how to ask a
better one. Having thought a bit more, I expect that the eventual
solution to my problem will include a number of different changes,
including larger pages, tuning several caches, providing a progress
indicator to the user, and (as you point out below) re-thinking how I
ask Solr for so many documents.

> But using start + rows is a *bad thing* for deep paging. You need to use
> cursorMark, which looks like it was added in 4.7 originally
> https://issues.apache.org/jira/browse/SOLR-5463
> There's a description on the newer reference guide
> https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html#fetching-a-large-number-of-sorted-results-cursors
> and in the 4.10 PDF on page 305
> https://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-4.10.pdf
>
> http://yonik.com/solr/paging-and-deep-paging/

Thank you for the links. I think these will be very helpful.

--
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu
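On the progress indicator mentioned above: every Solr select response reports the total hit count as response.numFound, so the first page already gives a denominator for a progress display. What follows is only a minimal sketch, assuming the response was requested with wt=json and decoded into a dict. The helper names total_hits and percent_done are hypothetical, and the sample numbers are made up for illustration.

```python
def total_hits(response):
    """Read the total match count from a decoded wt=json select response."""
    return response["response"]["numFound"]


def percent_done(fetched, num_found):
    """Whole-number percentage of documents retrieved so far."""
    if num_found <= 0:
        return 100  # nothing to fetch, so treat the job as complete
    return min(100, fetched * 100 // num_found)


# Hand-written stand-in for a first-page response; the count is invented.
first_page = {"response": {"numFound": 1200000, "start": 0, "docs": []}}
total = total_hits(first_page)
print(percent_done(867000, total))
```

The percentage is clamped at 100 because the index may change while the pages are being fetched, so the fetched count can drift past the original numFound.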
Re: Optimal size for queries?
Hi, I can't answer the question as to what the optimal size of rows per
request is. I would expect it to depend on the number of stored fields
being marshaled, and their type, and your hardware.

But using start + rows is a *bad thing* for deep paging. You need to use
cursorMark, which looks like it was added in 4.7 originally
https://issues.apache.org/jira/browse/SOLR-5463
There's a description on the newer reference guide
https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html#fetching-a-large-number-of-sorted-results-cursors
and in the 4.10 PDF on page 305
https://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-4.10.pdf

http://yonik.com/solr/paging-and-deep-paging/

On Fri, 10 Apr 2020 at 19:05, Mark H. Wood wrote:
> I need to pull a *lot* of records out of a core, to be statistically
> analyzed and the stat.s presented to the user, who is sitting at a
> browser waiting. So far I haven't seen a way to calculate the stat.s
> I need in Solr itself. It's difficult to know the size of the total
> result, so I'm running the query repeatedly and windowing the results
> with 'start' and 'rows'. I just guessed that a window of 1000
> documents would be reasonable. We currently have about 48GB in the
> core.
>
> The product uses Solr 4.10. Yes, I know that's very old.
>
> What I got is that every three seconds or so I get another 1000
> documents, totalling around 500KB per response. For a user request
> for a large range, this is taking way longer than the user's browser
> is willing to wait. The single CPU on my test box is at 99%
> continuously, and Solr's memory use is around 90% of 8GB. The test
> hardware is a VMWare guest on an 'Intel(R) Xeon(R) Gold 6150 CPU @
> 2.70GHz'.
>
> A sample query:
>
> 0:0:0:0:0:0:0:1 - - [10/Apr/2020:13:34:18 -0400] "GET
> /solr/statistics/select?q=*%3A*=1000=%2Btype%3A0+%2BbundleName%3AORIGINAL+%2Bstatistics_type%3Aview=%2BisBot%3Afalse=%2Btime%3A%5B2018-01-01T05%3A00%3A00Z+TO+2020-01-01T04%3A59%3A59Z%5D=time+asc=867000=javabin=2
> HTTP/1.1" 200 497475 "-"
> "Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0"
>
> As you can see, my test was getting close to 1000 windows. It's still
> going. I don't know how far along that is.
>
> So I'm wondering:
>
> o how can I do better than guessing that 1000 is a good window size?
>   How big a response is too big?
>
> o what else should I be thinking about?
>
> o given that my test on a full-sized copy of the live data has been
>   running for an hour and is still going, is it totally impractical
>   to expect that I can improve the process enough to give a response
>   to an ad-hoc query while-you-wait?
>
> --
> Mark H. Wood
> Lead Technology Analyst
>
> University Library
> Indiana University - Purdue University Indianapolis
> 755 W. Michigan Street
> Indianapolis, IN 46202
> 317-274-0749
> www.ulib.iupui.edu
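The cursorMark approach recommended in this reply can be sketched as a small paging loop. This is an illustration under stated assumptions, not the poster's code: it talks to Solr over plain HTTP with wt=json rather than through SolrJ, the function names http_fetch and iter_docs are hypothetical, and the uniqueKey field is assumed to be called id. A cursor request starts with cursorMark=*, must not use a non-zero start, needs a sort that includes the uniqueKey field as a tie-breaker, and ends when Solr returns the same nextCursorMark that was sent.

```python
import json
import urllib.parse
import urllib.request


def http_fetch(base_url, params):
    """Send one select request and return the decoded JSON response."""
    url = base_url + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_docs(base_url, query, filters, sort, rows=1000, fetch=http_fetch):
    """Yield every matching document, one cursor page at a time.

    `sort` must include the uniqueKey field (assumed here to be `id`),
    otherwise Solr rejects the cursor request.
    """
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        params = [("q", query), ("rows", rows), ("sort", sort),
                  ("wt", "json"), ("cursorMark", cursor)]
        params += [("fq", f) for f in filters]
        page = fetch(base_url, params)
        for doc in page["response"]["docs"]:
            yield doc
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:  # unchanged mark means no more results
            break
        cursor = next_cursor
```

A call might look like iter_docs("http://localhost:8983/solr/statistics/select", "*:*", ["+isBot:false"], "time asc,id asc"), where the host and port are assumptions. The fetch parameter exists only so the loop can be exercised without a running Solr.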
Optimal size for queries?
I need to pull a *lot* of records out of a core, to be statistically
analyzed and the stat.s presented to the user, who is sitting at a
browser waiting. So far I haven't seen a way to calculate the stat.s
I need in Solr itself. It's difficult to know the size of the total
result, so I'm running the query repeatedly and windowing the results
with 'start' and 'rows'. I just guessed that a window of 1000
documents would be reasonable. We currently have about 48GB in the
core.

The product uses Solr 4.10. Yes, I know that's very old.

What I got is that every three seconds or so I get another 1000
documents, totalling around 500KB per response. For a user request
for a large range, this is taking way longer than the user's browser
is willing to wait. The single CPU on my test box is at 99%
continuously, and Solr's memory use is around 90% of 8GB. The test
hardware is a VMWare guest on an 'Intel(R) Xeon(R) Gold 6150 CPU @
2.70GHz'.

A sample query:

0:0:0:0:0:0:0:1 - - [10/Apr/2020:13:34:18 -0400] "GET
/solr/statistics/select?q=*%3A*=1000=%2Btype%3A0+%2BbundleName%3AORIGINAL+%2Bstatistics_type%3Aview=%2BisBot%3Afalse=%2Btime%3A%5B2018-01-01T05%3A00%3A00Z+TO+2020-01-01T04%3A59%3A59Z%5D=time+asc=867000=javabin=2
HTTP/1.1" 200 497475 "-"
"Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0"

As you can see, my test was getting close to 1000 windows. It's still
going. I don't know how far along that is.

So I'm wondering:

o how can I do better than guessing that 1000 is a good window size?
  How big a response is too big?

o what else should I be thinking about?

o given that my test on a full-sized copy of the live data has been
  running for an hour and is still going, is it totally impractical
  to expect that I can improve the process enough to give a response
  to an ad-hoc query while-you-wait?

--
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu
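A rough way to see why the windows slow down as 'start' grows: to serve a request with start=S and rows=R, Solr has to rank the top S+R matches and then throw away the first S. The sketch below only counts those ranked entries across a whole run, using the figures from the message (1000 windows of 1000 documents). It is a back-of-the-envelope model with hypothetical function names, not a measurement, and it ignores caching and the cost of scanning the matching documents.

```python
def ranked_entries_start_rows(windows, rows):
    """Entries ranked in total when window k is fetched with start=k*rows."""
    return sum((k + 1) * rows for k in range(windows))


def ranked_entries_fixed_page(windows, rows):
    """Entries ranked in total if each request only had to rank `rows`."""
    return windows * rows


deep = ranked_entries_start_rows(1000, 1000)
flat = ranked_entries_fixed_page(1000, 1000)
print(deep, flat, deep / flat)
```

Under this model the cost of start/rows windowing grows with the square of the number of windows, so changing the window size alone does not remove the problem. The sample request in the message, at start=867000, would already be ranking 868000 entries to return 1000 documents.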