Greg and I are talking about the same type of parallel.

We do the same thing - if I know there are 10,000 results, we can chunk
that up across multiple worker threads up front without having to page
through the results.  We know there are 10 chunks of 1,000, so one thread
can process results 0-999 while another thread starts on 1000-1999 at the
same time.
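A quick sketch of that chunking, assuming a known total hit count (the
function name `chunk_ranges` is just illustrative, not from any Solr
client library):

```python
# Sketch: split a known result count into (start, rows) windows that
# independent workers can fetch concurrently via start/rows paging.
def chunk_ranges(total, chunk_size):
    """Yield (start, rows) pairs covering `total` results."""
    for start in range(0, total, chunk_size):
        yield start, min(chunk_size, total - start)

# 10,000 results in chunks of 1,000 -> ten (start, rows) windows
ranges = list(chunk_ranges(10_000, 1_000))
```

Each (start, rows) pair can then be handed to a worker as an ordinary
start/rows query, which is exactly where deep paging gets expensive and
cursorMark becomes tempting.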

The only idea I've had so far is that you could have a single thread up
front iterate through the entire result set, perhaps asking for 'null' from
the fl param (to make the response more lightweight), and record all
the nextCursorMark tokens - then just fire those off to the workers as you
get them. Depending on the amount of processing being done for each
response, it might give you some optimizations from being
multi-threaded... or maybe the overhead of calculating the cursorMarks
isn't worth the effort.  Haven't tried either way yet.
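Roughly what I mean, as a toy sketch: one lightweight pass records the
cursor marks, then workers re-fetch each page in parallel. `solr_query`
here is a stand-in for a real Solr request (it fakes a 10-doc index,
with the cursor mark as a plain offset; real Solr starts at
cursorMark=* and signals the end by returning the same nextCursorMark
you sent):

```python
from concurrent.futures import ThreadPoolExecutor

DOCS = list(range(10))  # pretend index
PAGE = 3                # rows per request

def solr_query(cursor_mark, fields=True):
    """Fake Solr request: the cursor mark is just an offset here."""
    start = int(cursor_mark)
    page = DOCS[start:start + PAGE]
    next_mark = str(start + len(page))
    return (page if fields else []), next_mark

def collect_cursor_marks():
    """Single-threaded lightweight pass that records every mark."""
    marks, mark = [], "0"
    while True:
        _, next_mark = solr_query(mark, fields=False)  # fl-less pass
        if next_mark == mark:  # no more results
            break
        marks.append(mark)
        mark = next_mark
    return marks

def fetch_page(mark):
    """Worker: re-run the query at a recorded mark with full fields."""
    docs, _ = solr_query(mark)
    return docs

marks = collect_cursor_marks()
with ThreadPoolExecutor(max_workers=4) as pool:
    pages = list(pool.map(fetch_page, marks))
```

Whether the up-front serial pass pays for itself would depend entirely
on how heavy the per-page processing is relative to the paging itself.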

Mike


On Mon, Mar 17, 2014 at 6:54 PM, Greg Pendlebury
<greg.pendleb...@gmail.com> wrote:

> Sorry, I meant one thread requesting records 1 - 1000, whilst the next
> thread requests 1001 - 2000 from the same ordered result set. We've
> observed several of our customers trying to harvest our data with
> multi-threaded scripts that work like this. I thought it would not work
> using cursor marks... but:
>
> A) I could be wrong, and
> B) I could be talking about parallel in a different way to Mike.
>
> Ta,
> Greg
>
>
>
> On 18 March 2014 10:24, Yonik Seeley <yo...@heliosearch.com> wrote:
>
> > On Mon, Mar 17, 2014 at 7:14 PM, Greg Pendlebury
> > <greg.pendleb...@gmail.com> wrote:
> > > My suspicion is that it won't work in parallel
> >
> > Deep paging with cursorMark does work with distributed search
> > (assuming that's what you meant by "parallel"... querying sub-shards
> > in parallel?).
> >
> > -Yonik
> > http://heliosearch.org - solve Solr GC pauses with off-heap filters
> > and fieldcache
> >
>