Thanks Jason. Hope this can be fixed in the next update of SolrJ.

On Thu, Feb 22, 2018 at 10:49 AM, Jason Gerlowski <gerlowsk...@gmail.com> wrote:
> My apologies Santosh. I added that comment a few releases back based
> on a misunderstanding I've only recently been disabused of. I will
> correct it.
>
> Anyway, Shawn's explanation above is correct. The queueSize parameter
> doesn't control batching, as he clarified. Sorry for the trouble.
>
> Best,
>
> Jason
>
> On Wed, Feb 21, 2018 at 8:50 PM, Santosh Narayan
> <santosh.narayan....@gmail.com> wrote:
> > Thanks for the explanation Shawn. Very helpful. I think I got misled by the
> > JavaDoc text for
> > *ConcurrentUpdateSolrClient.Builder.withQueueSize*
> >
> > /**
> >  * The number of documents to batch together before sending to Solr. If
> >  * not set, this defaults to 10.
> >  */
> > public Builder withQueueSize(int queueSize) {
> >   if (queueSize <= 0) {
> >     throw new IllegalArgumentException("queueSize must be a positive integer.");
> >   }
> >   this.queueSize = queueSize;
> >   return this;
> > }
> >
> > On Thu, Feb 22, 2018 at 9:41 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> >
> >> On 2/21/2018 7:41 AM, Santosh Narayan wrote:
> >> > May be it is my understanding of the documentation. As per the
> >> > JavaDoc, ConcurrentUpdateSolrClient
> >> > buffers all added documents and writes them into open HTTP connections.
> >> >
> >> > So I thought that this class would buffer documents in the client side
> >> > itself till the QueueSize is reached and then send all the cached documents
> >> > together in one HTTP request. Is this not the case?
> >>
> >> That's not how it's designed.
> >>
> >> What ConcurrentUpdateSolrClient does differently than HttpSolrClient or
> >> CloudSolrClient is return control immediately to your program when you
> >> send an update, and begin processing that update in the background. If
> >> you send a LOT of updates very quickly, then the queue will get larger,
> >> and will typically be processed in parallel by multiple threads. The
> >> client won't wait for the queue to fill. Processing of the first update
> >> you send should begin right after you add it.
> >>
> >> Something to consider: Because control is returned to your program
> >> immediately, and the response is always a success, your program will
> >> never be informed about any problems with your adds when you use the
> >> concurrent client. The concurrent client is a great choice for initial
> >> bulk indexing, because it offers multi-threaded indexing without any
> >> need to handle the threads yourself. But you don't get any kind of
> >> error handling.
> >>
> >> Thanks,
> >> Shawn
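
For anyone finding this thread later, here is a rough sketch of the behaviour Shawn describes: a bounded queue drained by background threads, where work starts as soon as the first document arrives instead of waiting for the queue to fill. This is not SolrJ code and does not claim to mirror its internals. The class BackgroundUpdateQueue and its methods are hypothetical names made up for illustration, and the "send" step is replaced by a counter. The assumption being illustrated is that the queue size only limits how many documents may wait, not how many are sent together.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a queue-plus-workers client; NOT SolrJ's implementation. */
public class BackgroundUpdateQueue {
    private final BlockingQueue<String> queue;
    private final ExecutorService workers;
    private final AtomicInteger processed = new AtomicInteger();
    private volatile boolean closed = false;

    public BackgroundUpdateQueue(int queueSize, int threadCount) {
        if (queueSize <= 0 || threadCount <= 0) {
            throw new IllegalArgumentException("queueSize and threadCount must be positive integers.");
        }
        // In this sketch queueSize only bounds how many documents can be pending.
        this.queue = new ArrayBlockingQueue<>(queueSize);
        this.workers = Executors.newFixedThreadPool(threadCount);
        // Workers start draining immediately; nothing waits for the queue to fill.
        for (int i = 0; i < threadCount; i++) {
            workers.submit(this::drain);
        }
    }

    private void drain() {
        try {
            while (!closed || !queue.isEmpty()) {
                String doc = queue.poll(50, TimeUnit.MILLISECONDS);
                if (doc != null) {
                    // Stands in for writing the document to an open HTTP connection.
                    processed.incrementAndGet();
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Returns once the document is queued; blocks only while the queue is full. */
    public void add(String doc) {
        try {
            queue.put(doc);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while queueing", e);
        }
    }

    /** Stops accepting work, waits for the workers to finish, returns the processed count. */
    public int shutdownAndWait() {
        closed = true;
        workers.shutdown();
        try {
            workers.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed.get();
    }

    public static void main(String[] args) {
        // Queue capacity is 10, but only 3 documents are ever added.
        BackgroundUpdateQueue q = new BackgroundUpdateQueue(10, 2);
        q.add("doc-1");
        q.add("doc-2");
        q.add("doc-3");
        System.out.println("processed: " + q.shutdownAndWait());
    }
}
```

Note that add() in this sketch returns nothing about the outcome of the background work, which is the same trade-off Shawn points out: the caller gets control back right away but is not told whether a given document was handled successfully.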