Re: Why are cursor mark queries recommended over regular start, rows combination?

Chris Hostetter Tue, 13 Mar 2018 14:05:31 -0700

: > 3) Lastly, it is not clear the role of export handler. It seems that the
: > export handler would also have to do exactly the same kind of thing as
: > start=0 and rows=1000,000. And that again means bad performance.
        
: <3> First, streaming requests can only return docValues="true"
: fields. Second, most streaming operations require sorting on something
: besides score. Within those constraints, streaming will be _much_
: faster and more efficient than cursorMark. Without tuning I saw 200K
: rows/second returned for streaming, the bottleneck will be the speed
: that the client can read from the network. First of all you only
: execute one query rather than one query per N rows. Second, in the
: cursorMark case, to return a document you and assuming that any field
: you return is docValues=false


Just to clarify, there is a big difference between the /export handler 
and "streaming expressions".

Unless something has changed drastically in the past few releases, the 
/export handler does *NOT* support exporting a full *collection* in 
SolrCloud -- it only operates on an individual core (aka: shard/replica).

Streaming expressions is a feature that does work in Cloud mode, and can 
make calls to the /export handler on a replica of each shard in order to 
process the data of an entire collection -- but when doing so it has to 
aggregate *ALL* the results from every shard in memory on the 
coordinating node -- meaning that (in addition to the docValues caveat) 
streaming expressions require you to "spend" a lot of RAM on one node, 
as a trade-off against spending more time & multiple requests to get 
the same data from cursorMark...

https://lucene.apache.org/solr/guide/exporting-result-sets.html
https://lucene.apache.org/solr/guide/streaming-expressions.html
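
To make the distinction concrete, here is a rough sketch of what the two 
kinds of request look like. The host, core, collection and field names 
are all made up for illustration, and the helper functions are 
hypothetical; the only parts taken from Solr itself are the /export and 
/stream paths, the q/fl/sort/expr parameters, and the search(...) 
expression with qt="/export".

```python
from urllib.parse import urlencode


def export_url(core_url, q, fl, sort):
    """Build a /export request against ONE core (shard/replica).

    core_url is a hypothetical base such as
    http://localhost:8983/solr/mycoll_shard1_replica1
    Every field in fl and sort is assumed to have docValues="true".
    """
    return core_url + "/export?" + urlencode({"q": q, "fl": fl, "sort": sort})


def stream_url(collection_url, collection, q, fl, sort):
    """Build a /stream request whose expression covers a whole collection.

    The search(...) expression asks each shard's /export handler for its
    share of the data; the names passed in are illustrative only.
    """
    expr = 'search({}, q="{}", fl="{}", sort="{}", qt="/export")'.format(
        collection, q, fl, sort
    )
    return collection_url + "/stream?" + urlencode({"expr": expr})


if __name__ == "__main__":
    # Hypothetical single-core export vs. collection-wide streaming expression.
    print(export_url("http://localhost:8983/solr/mycoll_shard1_replica1",
                     "*:*", "id,price", "id asc"))
    print(stream_url("http://localhost:8983/solr/mycoll", "mycoll",
                     "*:*", "id,price", "id asc"))
```

The first URL only ever sees the documents in that one core; the second 
is sent to the collection and fans out to the shards on your behalf.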

An additional perk of cursorMark that may be relevant to the OP is that 
you can "stop" tailing a cursor at any time (ie: if you're post-processing 
the results client side and decide you have "enough" results) but a similar 
feature isn't available (AFAICT) from streaming expressions...

https://lucene.apache.org/solr/guide/pagination-of-results.html#tailing-a-cursor
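
A minimal client-side sketch of that "stop whenever you like" behaviour, 
assuming a uniqueKey field named "id" and a hypothetical Solr base URL. 
The fetch callable is an assumption of this sketch so the loop can be 
exercised without a running server; what comes from Solr is the 
cursorMark=* starting value, the nextCursorMark response key, the 
requirement that the sort include the uniqueKey field, and the rule that 
you are done when nextCursorMark equals the cursorMark you sent.

```python
import json
import urllib.parse
import urllib.request


def solr_select(base_url, params):
    """Issue one /select request and return the decoded JSON response.

    base_url is a hypothetical value such as http://localhost:8983/solr/mycoll
    """
    url = base_url + "/select?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_cursor(fetch, query, sort="id asc", rows=1000):
    """Yield documents one at a time, requesting a new page only on demand.

    fetch(params) must return a dict shaped like Solr's JSON response.
    The sort must include the uniqueKey field (assumed here to be "id").
    """
    cursor = "*"  # initial cursorMark value
    while True:
        page = fetch({"q": query, "sort": sort, "rows": rows,
                      "cursorMark": cursor, "wt": "json"})
        for doc in page["response"]["docs"]:
            yield doc
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:
            return  # same mark came back: no more results
        cursor = next_cursor


if __name__ == "__main__":
    # Hypothetical usage: stop as soon as the client has "enough" results.
    base = "http://localhost:8983/solr/mycoll"
    wanted = []
    for doc in iter_cursor(lambda p: solr_select(base, p), "*:*"):
        wanted.append(doc)
        if len(wanted) >= 2500:
            break  # no further requests are sent after this point
```

Because the generator only asks for the next page when the caller pulls 
another document, breaking out of the loop is all it takes to stop; 
nothing needs to be cancelled on the Solr side.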


-Hoss
http://www.lucidworks.com/
