The optimize step of the indexer takes almost 2 hours. The indexer adds batches of documents to org.apache.solr.client.solrj.impl.CloudSolrClient from multiple threads. Once all the documents are indexed, it invokes commit and optimize. I have seen that the optimize goes into the background after 10 minutes and the indexer exits. I am not sure why the indexer hangs for these 10 minutes. I have seen this behavior in multiple iterations of indexing the same data.
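For reference, the flow described above can be sketched roughly as follows. This is only an illustration of the order of operations, not the actual indexer code and not a reproduction of the background-optimize behavior. `SearchClient`, `RecordingClient` and `indexAll` are hypothetical names introduced for this sketch; in the real indexer the three calls would go to CloudSolrClient (the thread mentions `optimize(true, true, 1)`).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class IndexerSketch {

    /** Hypothetical stand-in for the Solr client; NOT a SolrJ type. */
    public interface SearchClient {
        void add(List<String> batch) throws Exception;
        void commit() throws Exception;
        void optimize(boolean waitFlush, boolean waitSearcher, int maxSegments) throws Exception;
    }

    /** In-memory fake that only records what was called (assumption, for the demo). */
    public static class RecordingClient implements SearchClient {
        public final AtomicInteger docCount = new AtomicInteger();
        public final List<String> events = Collections.synchronizedList(new ArrayList<>());

        @Override public void add(List<String> batch) { docCount.addAndGet(batch.size()); }
        @Override public void commit() { events.add("commit"); }
        @Override public void optimize(boolean waitFlush, boolean waitSearcher, int maxSegments) {
            events.add("optimize(maxSegments=" + maxSegments + ")");
        }
    }

    /** Adds all batches from a thread pool, waits for them, then commits and optimizes once. */
    public static void indexAll(SearchClient client, List<List<String>> batches, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (List<String> batch : batches) {
                // Callable (returns null) so that checked exceptions from add() propagate.
                pending.add(pool.submit(() -> {
                    client.add(batch);
                    return null;
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // block until every batch has been added; rethrows failures
            }
        } finally {
            pool.shutdown();
        }
        // Only after all adds have finished: one commit, then one optimize.
        client.commit();
        client.optimize(true, true, 1);
    }

    public static void main(String[] args) throws Exception {
        RecordingClient client = new RecordingClient();
        List<List<String>> batches = Arrays.asList(
                Arrays.asList("doc1", "doc2"),
                Arrays.asList("doc3", "doc4"),
                Arrays.asList("doc5"));
        indexAll(client, batches, 2);
        System.out.println(client.docCount.get() + " docs, then " + client.events);
    }
}
```

In this sketch the caller returns only when `optimize` returns, so whether the indexer really waits for the merge depends entirely on whether the client call itself blocks, which is the behavior in question here.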
I found nothing significant in the log that I can share. I can see the following in the log:

org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}

On Wed, May 27, 2015 at 10:59 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> All strange of course. What do your Solr logs show when this happens?
> And how reproducible is this?
>
> Best,
> Erick
>
> On Wed, May 27, 2015 at 4:00 AM, Upayavira <u...@odoko.co.uk> wrote:
> > In this case, optimising makes sense, once the index is generated, you
> > are not updating it.
> >
> > Upayavira
> >
> > On Wed, May 27, 2015, at 06:14 AM, Modassar Ather wrote:
> >> Our index has almost 100M documents running on SolrCloud of 5 shards and
> >> each shard has an index size of about 170+GB (for the record, we are not
> >> using stored fields - our documents are pretty large). We perform a full
> >> indexing every weekend and during the week there are no updates made to
> >> the index. Most of the queries that we run are pretty complex with
> >> hundreds of terms using PhraseQuery, BooleanQuery, SpanQuery, Wildcards,
> >> boosts etc. and take many minutes to execute. A difference of 10-20% is
> >> also a big advantage for us.
> >>
> >> We have been optimizing the index after indexing for years and it has
> >> worked well for us. Every once in a while, we upgrade Solr to the latest
> >> version and try without optimizing so that we can save the many hours it
> >> take to optimize such a huge index, but find optimized index work well
> >> for us.
> >>
> >> Erick I was indexing today the documents and saw the optimize happening
> >> in background.
> >>
> >> On Tue, May 26, 2015 at 9:12 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> >>
> >> > No results yet. I finished the test harness last night (not really a
> >> > unit test, a stand-alone program that endlessly adds stuff and tests
> >> > that every commit returns the correct number of docs).
> >> >
> >> > 8,000 cycles later there aren't any problems reported.
> >> >
> >> > Siiigggggh.
> >> >
> >> > On Tue, May 26, 2015 at 1:51 AM, Modassar Ather <modather1...@gmail.com> wrote:
> >> > > Hi,
> >> > >
> >> > > Erick you mentioned about a unit test to test the optimize running
> >> > > in background. Kindly share your findings if any.
> >> > >
> >> > > Thanks,
> >> > > Modassar
> >> > >
> >> > > On Mon, May 25, 2015 at 11:47 AM, Modassar Ather <modather1...@gmail.com> wrote:
> >> > >
> >> > >> Thanks everybody for your replies.
> >> > >>
> >> > >> I have noticed the optimization running in background every time I
> >> > >> indexed. This is 5 node cluster with solr-5.1.0 and uses the
> >> > >> CloudSolrClient. Kindly share your findings on this issue.
> >> > >>
> >> > >> Our index has almost 100M documents running on SolrCloud. We have
> >> > >> been optimizing the index after indexing for years and it has worked
> >> > >> well for us.
> >> > >>
> >> > >> Thanks,
> >> > >> Modassar
> >> > >>
> >> > >> On Fri, May 22, 2015 at 11:55 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> >> > >>
> >> > >>> Actually, I've recently seen very similar behavior in Solr 4.10.3,
> >> > >>> but involving hard commits openSearcher=true, see:
> >> > >>> https://issues.apache.org/jira/browse/SOLR-7572. Of course I can't
> >> > >>> reproduce this at will, siigggghhhh.
> >> > >>>
> >> > >>> A unit test should be very simple to write though, maybe I can get
> >> > >>> to it today.
> >> > >>>
> >> > >>> Erick
> >> > >>>
> >> > >>> On Fri, May 22, 2015 at 8:27 AM, Upayavira <u...@odoko.co.uk> wrote:
> >> > >>> >
> >> > >>> > On Fri, May 22, 2015, at 03:55 PM, Shawn Heisey wrote:
> >> > >>> >> On 5/21/2015 6:21 AM, Modassar Ather wrote:
> >> > >>> >> > I am using Solr-5.1.0. I have an indexer class which invokes
> >> > >>> >> > cloudSolrClient.optimize(true, true, 1). My indexer exits
> >> > >>> >> > after the invocation of optimize and the optimization keeps on
> >> > >>> >> > running in the background.
> >> > >>> >> > Kindly let me know if it is per design and how can I make my
> >> > >>> >> > indexer to wait until the optimization is over. Is there a
> >> > >>> >> > configuration/parameter I need to set for the same.
> >> > >>> >> >
> >> > >>> >> > Please note that the same indexer with
> >> > >>> >> > cloudSolrServer.optimize(true, true, 1) on Solr-4.10 used to
> >> > >>> >> > wait till the optimize was over before exiting.
> >> > >>> >>
> >> > >>> >> This is very odd, because I could not get HttpSolrServer to
> >> > >>> >> optimize in the background, even when that was what I wanted.
> >> > >>> >>
> >> > >>> >> I wondered if maybe the Cloud object behaves differently with
> >> > >>> >> regard to blocking until an optimize is finished ... except that
> >> > >>> >> there is no code for optimizing in CloudSolrClient at all ... so
> >> > >>> >> I don't know where the different behavior would actually be
> >> > >>> >> happening.
> >> > >>> >
> >> > >>> > A more important question is, why are you optimising? Generally
> >> > >>> > it isn't recommended anymore as it reduces the natural
> >> > >>> > distribution of documents amongst segments and makes future
> >> > >>> > merges more costly.
> >> > >>> >
> >> > >>> > Upayavira