Greg and I are talking about the same type of parallelism. We do the same thing - if I know there are 10,000 results, we can chunk them up across multiple worker threads up front without having to page through the results. We know there are 10 chunks of 1,000, so we can have one thread process rows 0-999 while another thread starts on rows 1000-1999 at the same time.
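A minimal sketch of that chunking scheme in Python, assuming the classic start/rows style of deep paging that Greg describes. `fetch_chunk()` is a hypothetical stand-in for a real Solr request against the same sorted result set (`&start=<start>&rows=<rows>`):

```python
from concurrent.futures import ThreadPoolExecutor

def make_chunks(total, chunk_size):
    """Split a known result count into (start, rows) pairs,
    e.g. 10,000 docs in chunks of 1,000 -> (0, 1000), (1000, 1000), ..."""
    return [(start, min(chunk_size, total - start))
            for start in range(0, total, chunk_size)]

def fetch_chunk(start, rows):
    # Hypothetical worker: in real use this would issue a Solr query
    # with &start=<start>&rows=<rows> over the same ordered result set.
    return ("fetched", start, rows)

chunks = make_chunks(10_000, 1_000)
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda c: fetch_chunk(*c), chunks))
```

Because the total is known up front, every worker can start immediately - no thread has to wait for the previous page's response, which is the property cursorMark paging doesn't give you for free.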
The only idea I've had so far is that you could have a single thread up front iterate through the entire result set, perhaps asking for 'null' from the fl param (to make the responses more lightweight), and record all the next cursorMark tokens - then just fire those off to the workers as you get them. Depending on the amount of processing being done for each response it might give you some optimizations from being multi-threaded... or maybe the overhead of calculating the cursorMarks isn't worth the effort. Haven't tried it either way yet.

Mike

On Mon, Mar 17, 2014 at 6:54 PM, Greg Pendlebury <greg.pendleb...@gmail.com> wrote:

> Sorry, I meant one thread requesting records 1 - 1000, whilst the next
> thread requests 1001 - 2000 from the same ordered result set. We've
> observed several of our customers trying to harvest our data with
> multi-threaded scripts that work like this. I thought it would not work
> using cursor marks... but:
>
> A) I could be wrong, and
> B) I could be talking about parallel in a different way to Mike.
>
> Ta,
> Greg
>
>
> On 18 March 2014 10:24, Yonik Seeley <yo...@heliosearch.com> wrote:
>
> > On Mon, Mar 17, 2014 at 7:14 PM, Greg Pendlebury
> > <greg.pendleb...@gmail.com> wrote:
> > > My suspicion is that it won't work in parallel
> >
> > Deep paging with cursorMark does work with distributed search
> > (assuming that's what you meant by "parallel"... querying sub-shards
> > in parallel?).
> >
> > -Yonik
> > http://heliosearch.org - solve Solr GC pauses with off-heap filters
> > and fieldcache
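Mike's "scout thread" idea can be sketched like this. `solr_page()` below is a simulated stand-in for a real Solr request (`/select?...&cursorMark=<mark>&fl=<fields>`); a real client would stop when `nextCursorMark` repeats rather than on an empty page, and the field list and termination logic here are illustrative assumptions, not Solr's API:

```python
from concurrent.futures import ThreadPoolExecutor

DOCS = list(range(25))   # pretend index of 25 docs
PAGE = 10                # rows per request

def solr_page(cursor_mark, fields="id"):
    """Simulated Solr response: returns (docs, nextCursorMark).
    Stands in for /select?...&cursorMark=<mark>&fl=<fields>&rows=10."""
    start = 0 if cursor_mark == "*" else int(cursor_mark)
    docs = DOCS[start:start + PAGE]
    return docs, str(start + len(docs))

def harvest_marks():
    """Scout pass: one thread walks the result set cheaply,
    requesting minimal fields and keeping only the cursorMarks."""
    marks, mark = [], "*"
    while True:
        docs, next_mark = solr_page(mark, fields="id")  # lightweight fl
        if not docs:
            break
        marks.append(mark)
        mark = next_mark
    return marks

def process_page(mark):
    # Worker: re-fetch the same page with the full field list and
    # do the (potentially expensive) per-response processing here.
    docs, _ = solr_page(mark, fields="*")
    return docs

marks = harvest_marks()
with ThreadPoolExecutor(max_workers=3) as pool:
    pages = list(pool.map(process_page, marks))
```

As the thread notes, whether this wins depends on the per-page processing cost: the scout pass still pays for one sequential walk of the whole result set, so the savings come only from parallelizing the full fetch and processing behind it.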