I'm pretty sure you can use Streaming Expressions to get all the rows
back from a sharded collection without chewing up lots of memory.

             sort="id asc",

On a sharded SolrCloud installation, I believe you'll get all the rows back.
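For reference, here's a rough sketch of building that kind of search() expression, sending it to a collection's /stream handler, and walking the tuples that come back. The collection name, base URL and field list are placeholders. It assumes the usual JSON response shape, where tuples arrive under result-set/docs and the last one carries "EOF": true. It also assumes that an error tuple carries an EXCEPTION key. The HTTP call itself is left out. A real client would parse the body incrementally rather than hand over an already-parsed dict.

```python
from urllib.parse import urlencode


def build_search_expr(collection, fields, sort, query="*:*"):
    """Build a search() streaming expression that reads from /export.

    qt="/export" makes search() pull from each shard's /export handler,
    so every field used in fl and sort needs docValues.
    """
    fl = ",".join(fields)
    return (f'search({collection}, q="{query}", fl="{fl}", '
            f'sort="{sort}", qt="/export")')


def build_stream_url(base_url, collection, expr):
    """URL for the collection's /stream handler, expression in `expr`."""
    return f"{base_url}/{collection}/stream?" + urlencode({"expr": expr})


def iter_tuples(response):
    """Yield data tuples from a parsed /stream JSON response.

    Tuples live under result-set/docs and the last one is an EOF marker.
    Assumption: a failed stream ends with a tuple holding an EXCEPTION key.
    """
    for doc in response["result-set"]["docs"]:
        if "EXCEPTION" in doc:
            raise RuntimeError(doc["EXCEPTION"])
        if doc.get("EOF"):
            return  # end-of-stream marker, not a real document
        yield doc
```

For example, `build_stream_url("http://localhost:8983/solr", "mycollection", build_search_expr("mycollection", ["id"], "id asc"))` gives the GET URL for a hypothetical collection called mycollection.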

1> Some while ago you couldn't _stop_ the stream part way through.
Down in the SolrJ world, you could read from a stream for a while and
call close() on it, but that would just spin in the background until it
reached EOF. Search JIRA if you need to (I can't find the issue
right now; 6.6 IIRC is OK and, of course, 7.3).
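For comparison, here's a minimal cursorMark loop of the kind Hoss describes in the quoted thread. `fetch_page` is a hypothetical callable standing in for whatever sends the /select request. That request carries the cursorMark parameter and a sort that ends on the uniqueKey field. The loop ends when nextCursorMark comes back unchanged. Because it's a generator, stopping early is just a matter of not asking for more, and no further requests go out.

```python
from typing import Callable, Iterator


def tail_cursor(fetch_page: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every doc for a query using Solr's cursorMark protocol.

    fetch_page(cursor_mark) is a hypothetical callable that sends a /select
    request with cursorMark=<cursor_mark> (and a sort that ends on the
    uniqueKey field) and returns the parsed JSON response.
    """
    cursor = "*"  # every cursor starts at "*"
    while True:
        rsp = fetch_page(cursor)
        yield from rsp["response"]["docs"]
        next_cursor = rsp["nextCursorMark"]
        if next_cursor == cursor:  # mark didn't advance: no more results
            return
        cursor = next_cursor
```

A caller that decides it has "enough" can simply stop iterating, or call close() on the generator, and the next page is never requested.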

This shouldn't chew up memory since each shard's stream is already sorted,
so the results can be merged as they arrive, and what you get in the
response is the ordered set of tuples.
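The reason sorted streams don't need much memory is that merging already-sorted inputs only requires holding the current head of each one. Here's a minimal Python illustration of that idea using heapq.merge. It's not Solr's actual merge code, and the "shard streams" are just plain iterators of dicts.

```python
import heapq
from typing import Callable, Iterable, Iterator


def merge_shard_streams(shard_streams: Iterable[Iterator[dict]],
                        key: Callable[[dict], object]) -> Iterator[dict]:
    """Lazily merge per-shard tuple streams that are already sorted by key.

    heapq.merge holds just one pending tuple per input, so memory depends
    on the number of shards, not on how many documents they return.
    """
    return heapq.merge(*shard_streams, key=key)
```

The key function has to order tuples the same way the shards sorted them. With `sort="id asc"` on a string id, `key=lambda t: t["id"]` is a reasonable stand-in.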

Some of the join streams _do_ have to hold all the results in memory,
so look at the docs if you wind up using those.
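For contrast, hashJoin is one of the join streams that the reference guide says reads one side (the hashed stream) entirely into memory. A rough Python sketch of a hash join shows why. The `right` input gets loaded into a dict before any output is produced, while `left` is still consumed one tuple at a time. The names here are illustrative, not Solr's API.

```python
from typing import Iterable, Iterator


def hash_join(left: Iterable[dict], right: Iterable[dict],
              on: str) -> Iterator[dict]:
    """Inner-join two tuple streams on field `on`, hash-join style.

    The whole `right` stream is buffered in a dict before the first output,
    so memory grows with its size; `left` is read one tuple at a time and
    neither side needs to be sorted.
    """
    table: dict = {}
    for t in right:  # buffer the entire hashed side
        table.setdefault(t[on], []).append(t)
    for t in left:
        for match in table.get(t[on], []):
            yield {**match, **t}  # left-hand fields win on name clashes
```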


On Wed, Mar 14, 2018 at 9:20 AM, S G <sg.online.em...@gmail.com> wrote:
> Thanks everybody. This is a lot of good information.
> And we should try to update the documentation too, to help users
> make the right choice.
> I can take a stab at this if someone can point me to how to update the
> documentation.
> Thanks
> SG
> On Tue, Mar 13, 2018 at 2:04 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote:
>> : > 3) Lastly, the role of the export handler is not clear. It seems that
>> : > the export handler would also have to do exactly the same kind of thing
>> : > as start=0 and rows=1000000. And that again means bad performance.
>> : <3> First, streaming requests can only return docValues="true"
>> : fields. Second, most streaming operations require sorting on something
>> : besides score. Within those constraints, streaming will be _much_
>> : faster and more efficient than cursorMark. Without tuning I saw 200K
>> : rows/second returned for streaming; the bottleneck will be the speed
>> : at which the client can read from the network. First of all, you only
>> : execute one query rather than one query per N rows. Second, in the
>> : cursorMark case, to return a document you and assuming that any field
>> : you return is docValues=false
>> Just to clarify, there is a big difference between the /export handler
>> and "streaming expressions".
>> Unless something has changed drastically in the past few releases, the
>> /export handler does *NOT* support exporting a full *collection* in Solr
>> Cloud -- it only operates on an individual core (aka: shard/replica).
>> Streaming expressions is a feature that does work in Cloud mode, and can
>> make calls to the /export handler on a replica of each shard in order to
>> process the data of an entire collection -- but when doing so it has to
>> aggregate *ALL* the results from every shard in memory on the
>> coordinating node -- meaning that (in addition to the docValues caveat)
>> streaming expressions require you to "spend" a lot of RAM on one
>> node as a trade-off for spending more time & multiple requests to get the
>> same data from cursorMark...
>> https://lucene.apache.org/solr/guide/exporting-result-sets.html
>> https://lucene.apache.org/solr/guide/streaming-expressions.html
>> An additional perk of cursorMark that may be relevant to the OP is that
>> you can "stop" tailing a cursor at any time (ie: if you're post-processing
>> the results client side and decide you have "enough" results), but a similar
>> feature isn't available (AFAICT) from streaming expressions...
>> https://lucene.apache.org/solr/guide/pagination-of-results.html#tailing-a-cursor
>> -Hoss
>> http://www.lucidworks.com/
