I have not set any timeout in the indexer except the ZooKeeper client timeout, which is 30 seconds. I simply call client.close() at the end of indexing. With solr-4.10.3 and org.apache.solr.client.solrj.impl.CloudSolrServer, the same code did not leave the optimize running in the background.
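To make the client-side timeout theory discussed in this thread concrete, here is a minimal sketch in plain Java. It does not use SolrJ and makes no claim about what Solr actually does; it only shows how a caller that stops waiting can exit while the work it requested keeps running. The names ClientTimeoutDemo, Result and run are hypothetical and exist only for this illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only: the "server" is a local executor, not Solr.
final class ClientTimeoutDemo {

    /** Outcome of one simulated request. */
    static final class Result {
        final boolean clientTimedOut;      // the caller gave up waiting
        final boolean serverWorkFinished;  // the long-running work still completed

        Result(boolean clientTimedOut, boolean serverWorkFinished) {
            this.clientTimedOut = clientTimedOut;
            this.serverWorkFinished = serverWorkFinished;
        }
    }

    /**
     * Starts a task that takes serverWorkMillis (standing in for an optimize)
     * and waits for it for at most clientTimeoutMillis.
     */
    static Result run(long serverWorkMillis, long clientTimeoutMillis) throws Exception {
        ExecutorService server = Executors.newSingleThreadExecutor();
        CountDownLatch serverDone = new CountDownLatch(1);
        try {
            Future<?> request = server.submit(() -> {
                try {
                    Thread.sleep(serverWorkMillis); // simulated long-running work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                serverDone.countDown();
            });

            boolean timedOut = false;
            try {
                // The call is synchronous, but only up to the client's timeout.
                request.get(clientTimeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // The caller stops waiting; the task is deliberately not cancelled.
                timedOut = true;
            }

            // Check separately whether the work completed on the "server" side.
            boolean finished = serverDone.await(serverWorkMillis + 2000, TimeUnit.MILLISECONDS);
            return new Result(timedOut, finished);
        } finally {
            server.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Result r = run(500, 100);
        System.out.println("client timed out: " + r.clientTimedOut);
        System.out.println("server work finished anyway: " + r.serverWorkFinished);
    }
}
```

If something like this is what is happening, the indexer would see the call return (or fail) once its own wait ends, while the server-side work continues. Whether that matches the 10 minutes I observe is still only a guess.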

On Fri, May 29, 2015 at 11:13 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> Are you timing out on the client request? The theory here is that it's
> still a synchronous call, but you're just timing out at the client
> level. At that point, the optimize is still running it's just the
> connection has been dropped....
>
> Shot in the dark.
> Erick
>
> On Thu, May 28, 2015 at 10:31 PM, Modassar Ather <modather1...@gmail.com> wrote:
>
> > I could not notice it but with my past experience of commit which used to
> > take around 2 minutes is now taking around 8 seconds. I think this is also
> > running as background.
> >
> > On Fri, May 29, 2015 at 10:52 AM, Modassar Ather <modather1...@gmail.com> wrote:
> >
> >> The indexer takes almost 2 hours to optimize. It has a multi-threaded add
> >> of batches of documents to
> >> org.apache.solr.client.solrj.impl.CloudSolrClient.
> >> Once all the documents are indexed it invokes commit and optimize. I have
> >> seen that the optimize goes into background after 10 minutes and indexer
> >> exits.
> >> I am not sure why this 10 minutes it hangs on indexer. This behavior I
> >> have seen in multiple iteration of the indexing of same data.
> >>
> >> There is nothing significant I found in log which I can share. I can see
> >> following in log.
> >> org.apache.solr.update.DirectUpdateHandler2; start
> >> commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
> >>
> >> On Wed, May 27, 2015 at 10:59 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> >>
> >>> All strange of course. What do your Solr logs show when this happens?
> >>> And how reproducible is this?
> >>>
> >>> Best,
> >>> Erick
> >>>
> >>> On Wed, May 27, 2015 at 4:00 AM, Upayavira <u...@odoko.co.uk> wrote:
> >>> > In this case, optimising makes sense, once the index is generated, you
> >>> > are not updating It.
> >>> >
> >>> > Upayavira
> >>> >
> >>> > On Wed, May 27, 2015, at 06:14 AM, Modassar Ather wrote:
> >>> >> Our index has almost 100M documents running on SolrCloud of 5 shards and
> >>> >> each shard has an index size of about 170+GB (for the record, we are not
> >>> >> using stored fields - our documents are pretty large). We perform a full
> >>> >> indexing every weekend and during the week there are no updates made to
> >>> >> the index. Most of the queries that we run are pretty complex with hundreds
> >>> >> of terms using PhraseQuery, BooleanQuery, SpanQuery, Wildcards, boosts etc.
> >>> >> and take many minutes to execute. A difference of 10-20% is also a big
> >>> >> advantage for us.
> >>> >>
> >>> >> We have been optimizing the index after indexing for years and it has
> >>> >> worked well for us. Every once in a while, we upgrade Solr to the latest
> >>> >> version and try without optimizing so that we can save the many hours it
> >>> >> take to optimize such a huge index, but find optimized index work well
> >>> >> for us.
> >>> >>
> >>> >> Erick I was indexing today the documents and saw the optimize happening
> >>> >> in background.
> >>> >>
> >>> >> On Tue, May 26, 2015 at 9:12 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> >>> >>
> >>> >> > No results yet. I finished the test harness last night (not really a
> >>> >> > unit test, a stand-alone program that endlessly adds stuff and tests
> >>> >> > that every commit returns the correct number of docs).
> >>> >> >
> >>> >> > 8,000 cycles later there aren't any problems reported.
> >>> >> >
> >>> >> > Siiigggggh.
> >>> >> >
> >>> >> > On Tue, May 26, 2015 at 1:51 AM, Modassar Ather <modather1...@gmail.com> wrote:
> >>> >> > > Hi,
> >>> >> > >
> >>> >> > > Erick you mentioned about a unit test to test the optimize running in
> >>> >> > > background. Kindly share your findings if any.
> >>> >> > >
> >>> >> > > Thanks,
> >>> >> > > Modassar
> >>> >> > >
> >>> >> > > On Mon, May 25, 2015 at 11:47 AM, Modassar Ather <modather1...@gmail.com> wrote:
> >>> >> > >
> >>> >> > >> Thanks everybody for your replies.
> >>> >> > >>
> >>> >> > >> I have noticed the optimization running in background every time I
> >>> >> > >> indexed. This is 5 node cluster with solr-5.1.0 and uses the
> >>> >> > >> CloudSolrClient. Kindly share your findings on this issue.
> >>> >> > >>
> >>> >> > >> Our index has almost 100M documents running on SolrCloud. We have been
> >>> >> > >> optimizing the index after indexing for years and it has worked well for
> >>> >> > >> us.
> >>> >> > >>
> >>> >> > >> Thanks,
> >>> >> > >> Modassar
> >>> >> > >>
> >>> >> > >> On Fri, May 22, 2015 at 11:55 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> >>> >> > >>
> >>> >> > >>> Actually, I've recently seen very similar behavior in Solr 4.10.3, but
> >>> >> > >>> involving hard commits openSearcher=true, see:
> >>> >> > >>> https://issues.apache.org/jira/browse/SOLR-7572. Of course I can't
> >>> >> > >>> reproduce this at will, siigggghhhh.
> >>> >> > >>>
> >>> >> > >>> A unit test should be very simple to write though, maybe I can get to it
> >>> >> > >>> today.
> >>> >> > >>>
> >>> >> > >>> Erick
> >>> >> > >>>
> >>> >> > >>> On Fri, May 22, 2015 at 8:27 AM, Upayavira <u...@odoko.co.uk> wrote:
> >>> >> > >>> >
> >>> >> > >>> > On Fri, May 22, 2015, at 03:55 PM, Shawn Heisey wrote:
> >>> >> > >>> >> On 5/21/2015 6:21 AM, Modassar Ather wrote:
> >>> >> > >>> >> > I am using Solr-5.1.0. I have an indexer class which invokes
> >>> >> > >>> >> > cloudSolrClient.optimize(true, true, 1). My indexer exits after the
> >>> >> > >>> >> > invocation of optimize and the optimization keeps on running in the
> >>> >> > >>> >> > background.
> >>> >> > >>> >> > Kindly let me know if it is per design and how can I make my indexer to
> >>> >> > >>> >> > wait until the optimization is over. Is there a configuration/parameter I
> >>> >> > >>> >> > need to set for the same.
> >>> >> > >>> >> >
> >>> >> > >>> >> > Please note that the same indexer with cloudSolrServer.optimize(true, true,
> >>> >> > >>> >> > 1) on Solr-4.10 used to wait till the optimize was over before exiting.
> >>> >> > >>> >>
> >>> >> > >>> >> This is very odd, because I could not get HttpSolrServer to optimize in
> >>> >> > >>> >> the background, even when that was what I wanted.
> >>> >> > >>> >>
> >>> >> > >>> >> I wondered if maybe the Cloud object behaves differently with regard to
> >>> >> > >>> >> blocking until an optimize is finished ... except that there is no code
> >>> >> > >>> >> for optimizing in CloudSolrClient at all ... so I don't know where the
> >>> >> > >>> >> different behavior would actually be happening.
> >>> >> > >>> >
> >>> >> > >>> > A more important question is, why are you optimising? Generally it isn't
> >>> >> > >>> > recommended anymore as it reduces the natural distribution of documents
> >>> >> > >>> > amongst segments and makes future merges more costly.
> >>> >> > >>> >
> >>> >> > >>> > Upayavira