On 12/22/2014 04:27 PM, Erick Erickson wrote:
> Have you read Hossman's blog here?
> https://lucidworks.com/blog/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/#referrer=solr.pl

Oh thanks, that's a pretty interesting read. The scale we're investigating is several orders of magnitude larger than what was tested there, so I'm still a bit worried.
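
For anyone who hasn't tried it yet, the approach from that post boils
down to a loop like the following (a minimal SolrJ 4.7+ sketch; the
URL, core name, and uniqueKey field are placeholders, adjust for your
setup):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.params.CursorMarkParams;

    public class CursorWalk {
        public static void main(String[] args) throws Exception {
            SolrServer solr =
                new HttpSolrServer("http://localhost:8983/solr/collection1");
            SolrQuery q = new SolrQuery("*:*");
            q.setRows(1000);
            // Cursors require a deterministic sort that ends on the
            // uniqueKey field ("id" here).
            q.setSort(SolrQuery.SortClause.asc("id"));

            String cursorMark = CursorMarkParams.CURSOR_MARK_START; // "*"
            while (true) {
                q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
                QueryResponse rsp = solr.query(q);
                for (SolrDocument doc : rsp.getResults()) {
                    // process each document here
                }
                String next = rsp.getNextCursorMark();
                if (cursorMark.equals(next)) {
                    break; // cursor didn't advance: nothing left to read
                }
                cursorMark = next;
            }
        }
    }

The nice property is that every request carries the cursorMark from the
previous response, so per-request work stays roughly constant no matter
how deep you are, instead of growing with the page number.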

> Because if you're trying this and _still_ getting bad performance we
> need to know.

I'll definitely keep you posted when our test results on larger indexes (~50 billion documents) come in, but sadly that won't be any time soon (infrastructure sucks). The largest index I currently have access to is about a billion documents. Paging there is a nightmare, but the Solr version is too old to support cursors, so I'm afraid I can't offer any useful data.

Does anyone have any performance data on multi-billion-document indexes? With or without SolrCloud?

Bram:
> One minor pedantic clarification: the first round trip only returns
> the id and sort criteria (score by default), not the whole document,
> although the effect is the same. As you page deeper into the corpus,
> the default implementation returns rows * (pageNum + 1) entries from
> each shard. Even worse, each node has to _sort_ that many entries...
> Then a second call is made to fetch the page's worth of docs.

I was trying to keep it short and sweet, but yes, that's the way I think it works ;-)
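
To put rough numbers on that formula: with rows=10, landing on the
500th page means each shard sorts and returns its top 10 * 500 = 5,000
id/score pairs, and the coordinating node then has to merge
numShards * 5,000 entries just to hand back the 10 docs on that page.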

> That said, though, it's pretty easy to argue that the 500th page is
> pretty useless; nobody will ever hit the "next page" button 499 times.

Nobody will hit "next page" 499 times, but a lot of our users skip to the last page quite often. Maybe I should make *that* as hard as possible. Hmm.

Thanks for the tips!

 - Bram