Re: Solrj : ConcurrentUpdateSolrClient based on QueueSize and Time

Santosh Narayan Fri, 23 Feb 2018 04:49:22 -0800

Thanks Jason. Hope this can be fixed in the next update of SolrJ.



On Thu, Feb 22, 2018 at 10:49 AM, Jason Gerlowski <gerlowsk...@gmail.com>
wrote:

> My apologies Santosh.  I added that comment a few releases back based
> on a misunderstanding I've only recently been disabused of.  I will
> correct it.
>
> Anyway, Shawn's explanation above is correct.  The queueSize parameter
> doesn't control batching, as he clarified.  Sorry for the trouble.
>
> Best,
>
> Jason
>
> On Wed, Feb 21, 2018 at 8:50 PM, Santosh Narayan
> <santosh.narayan....@gmail.com> wrote:
> > Thanks for the explanation Shawn. Very helpful. I think I got misled by
> > the JavaDoc text for
> > *ConcurrentUpdateSolrClient.Builder.withQueueSize*
> >     /**
> >      * The number of documents to batch together before sending to Solr.
> >      * If not set, this defaults to 10.
> >      */
> >     public Builder withQueueSize(int queueSize) {
> >       if (queueSize <= 0) {
> >         throw new IllegalArgumentException("queueSize must be a positive integer.");
> >       }
> >       this.queueSize = queueSize;
> >       return this;
> >     }
> >
> >
> >
> > On Thu, Feb 22, 2018 at 9:41 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> >
> >> On 2/21/2018 7:41 AM, Santosh Narayan wrote:
> >> > Maybe it is my understanding of the documentation. As per the
> >> > JavaDoc, ConcurrentUpdateSolrClient
> >> > buffers all added documents and writes them into open HTTP
> >> > connections.
> >> >
> >> > So I thought that this class would buffer documents on the client side
> >> > itself till the QueueSize is reached and then send all the cached
> >> > documents together in one HTTP request. Is this not the case?
> >>
> >> That's not how it's designed.
> >>
> >> What ConcurrentUpdateSolrClient does differently than HttpSolrClient or
> >> CloudSolrClient is return control immediately to your program when you
> >> send an update, and begin processing that update in the background.  If
> >> you send a LOT of updates very quickly, then the queue will get larger,
> >> and will typically be processed in parallel by multiple threads.  The
> >> client won't wait for the queue to fill.  Processing of the first update
> >> you send should begin right after you add it.
> >>
> >> Something to consider:  Because control is returned to your program
> >> immediately, and the response is always a success, your program will
> >> never be informed about any problems with your adds when you use the
> >> concurrent client.  The concurrent client is a great choice for initial
> >> bulk indexing, because it offers multi-threaded indexing without any
> >> need to handle the threads yourself.  But you don't get any kind of
> >> error handling.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
>
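
For anyone who wants the size-or-time batching described in the subject line, a minimal sketch of doing it on the client side follows. It is not part of SolrJ: the class name BatchingBuffer, its methods, and the injected clock are all hypothetical and use only the Java standard library. The assumption is that the sink would be a callback that sends the batch with a blocking client (for example by passing the list to SolrClient.add) and handles any exception there, which also gives back the error handling Shawn mentions is missing with the concurrent client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

// Hypothetical helper, not a SolrJ class: collects items and hands them to
// a sink as one batch once a size limit or an age limit is reached.
class BatchingBuffer<T> {
    private final int maxBatchSize;
    private final long maxAgeMillis;
    private final Consumer<List<T>> sink;   // e.g. a callback that sends the batch to Solr
    private final LongSupplier clock;       // injected so the timing can be tested
    private final List<T> buffer = new ArrayList<>();
    private long oldestItemAt;              // clock value when the current batch was started

    BatchingBuffer(int maxBatchSize, long maxAgeMillis,
                   Consumer<List<T>> sink, LongSupplier clock) {
        if (maxBatchSize <= 0 || maxAgeMillis <= 0) {
            throw new IllegalArgumentException("limits must be positive");
        }
        this.maxBatchSize = maxBatchSize;
        this.maxAgeMillis = maxAgeMillis;
        this.sink = sink;
        this.clock = clock;
    }

    // Adds one item; flushes if the batch is full or has grown too old.
    synchronized void add(T item) {
        if (buffer.isEmpty()) {
            oldestItemAt = clock.getAsLong();
        }
        buffer.add(item);
        if (buffer.size() >= maxBatchSize) {
            flush();
        } else {
            flushIfStale();
        }
    }

    // Flushes only if the oldest buffered item has waited at least maxAgeMillis.
    // Call this from a periodic task so an idle buffer still gets sent.
    synchronized void flushIfStale() {
        if (!buffer.isEmpty() && clock.getAsLong() - oldestItemAt >= maxAgeMillis) {
            flush();
        }
    }

    // Sends whatever is buffered as one batch; does nothing if the buffer is empty.
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<T> batch = new ArrayList<>(buffer);
        buffer.clear();
        sink.accept(batch);
    }
}
```

In real use the clock would be System::currentTimeMillis, flushIfStale would be called from something like a ScheduledExecutorService, and flush would be called once more before shutdown. Note that the sink runs while the buffer's lock is held, so a slow send blocks other callers of add; that keeps the sketch short but may not suit every indexing job.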
