On 12/22/2014 04:27 PM, Erick Erickson wrote:
> Have you read Hossman's blog here?
> https://lucidworks.com/blog/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/#referrer=solr.pl

Oh thanks, that's a pretty interesting read. The scale we're investigating is several orders of magnitude larger than what was tested there, so I'm still a bit worried.
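
For anyone who hasn't tried it yet, the approach from that post boils
down to a loop like the following (a minimal SolrJ 4.7+ sketch; the
URL, core name, and uniqueKey field are placeholders, adjust for your
setup):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.params.CursorMarkParams;

    public class CursorWalk {
        public static void main(String[] args) throws Exception {
            SolrServer solr =
                new HttpSolrServer("http://localhost:8983/solr/collection1");
            SolrQuery q = new SolrQuery("*:*");
            q.setRows(1000);
            // Cursors require a deterministic sort that ends on the
            // uniqueKey field ("id" here).
            q.setSort(SolrQuery.SortClause.asc("id"));

            String cursorMark = CursorMarkParams.CURSOR_MARK_START; // "*"
            while (true) {
                q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
                QueryResponse rsp = solr.query(q);
                for (SolrDocument doc : rsp.getResults()) {
                    // process each document here
                }
                String next = rsp.getNextCursorMark();
                if (cursorMark.equals(next)) {
                    break; // cursor didn't advance: nothing left to read
                }
                cursorMark = next;
            }
        }
    }

The nice property is that every request carries the cursorMark from the
previous response, so per-request work stays roughly constant no matter
how deep you are, instead of growing with the page number.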

> Because if you're trying this and _still_ getting bad performance we
> need to know.

I'll definitely keep you posted when our test results on larger indexes (~50 billion documents) come in, but sadly that won't be any time soon (infrastructure sucks). The largest index I currently have access to is about a billion documents. Paging there is a nightmare, but the Solr version is too old to support cursors, so I'm afraid I can't offer any useful data.

Does anyone have any performance data on multi-billion-document indexes? With or without SolrCloud?

Bram:
> One minor pedantic clarification: the first round trip only returns
> the id and sort criteria (score by default), not the whole document,
> although the effect is the same. As you page deeper into the corpus,
> the default implementation returns rows * (pageNum + 1) entries from
> each shard. Even worse, each node has to _sort_ that many entries...
> Then a second call is made to fetch the page's worth of docs.

I was trying to keep it short and sweet, but yes, that's the way I think it works ;-)
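
To put rough numbers on that formula: with rows=10, landing on the
500th page means each shard sorts and returns its top 10 * 500 = 5,000
id/score pairs, and the coordinating node then has to merge
numShards * 5,000 entries just to hand back the 10 docs on that page.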

> That said, though, it's pretty easy to argue that the 500th page is
> pretty useless; nobody will ever hit the "next page" button 499 times.

Nobody will hit "next page" 499 times, but a lot of our users skip to the last page quite often. Maybe I should make *that* as hard as possible. Hmm.

Thanks for the tips!

 - Bram