Re: Index optimize runs in background.

Modassar Ather Thu, 28 May 2015 22:23:25 -0700

The index takes almost 2 hours to optimize. The indexer adds batches of
documents from multiple threads to
org.apache.solr.client.solrj.impl.CloudSolrClient.
Once all the documents are indexed, it invokes commit and optimize. I have
seen that the optimize goes into the background after 10 minutes and the
indexer exits.
I am not sure why the indexer hangs for these 10 minutes. I have seen this
behavior in multiple iterations of indexing the same data.
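
For reference, here is a minimal sketch of the indexer flow described above:
batches added from a thread pool, then commit, then a timed optimize. It is
not the actual indexer. IndexClient, StubClient and indexAndOptimize are
hypothetical names, and the stub only sleeps in place of a real optimize. In
the real indexer the three calls would go to CloudSolrClient, with
optimize(true, true, 1) as in the original post. Comparing the measured time
with the server logs is one way to see whether the call returned before the
optimize finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class IndexerSketch {

    /** Hypothetical stand-in for the three client calls the indexer makes. */
    interface IndexClient {
        void add(List<String> batch) throws Exception;
        void commit() throws Exception;
        void optimize(boolean waitFlush, boolean waitSearcher, int maxSegments) throws Exception;
    }

    /** In-memory stub so the sketch runs without a cluster; optimize only sleeps. */
    static class StubClient implements IndexClient {
        final AtomicInteger docs = new AtomicInteger();
        final long optimizeMillis;
        volatile boolean optimized = false;

        StubClient(long optimizeMillis) { this.optimizeMillis = optimizeMillis; }

        public void add(List<String> batch) { docs.addAndGet(batch.size()); }

        public void commit() { /* nothing to flush in the stub */ }

        public void optimize(boolean waitFlush, boolean waitSearcher, int maxSegments)
                throws InterruptedException {
            Thread.sleep(optimizeMillis); // assumption: stands in for the merge work
            optimized = true;
        }
    }

    /** Adds all batches from a thread pool, commits, optimizes; returns optimize time in ms. */
    static long indexAndOptimize(IndexClient client, List<List<String>> batches, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (List<String> batch : batches) {
                pending.add(pool.submit(() -> { client.add(batch); return null; }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for every add; rethrows a failed add
            }
        } finally {
            pool.shutdown();
        }
        client.commit();
        long start = System.nanoTime();
        client.optimize(true, true, 1); // waitFlush, waitSearcher, maxSegments
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            List<String> batch = new ArrayList<>();
            for (int j = 0; j < 100; j++) {
                batch.add("doc-" + i + "-" + j);
            }
            batches.add(batch);
        }
        StubClient client = new StubClient(200);
        long ms = indexAndOptimize(client, batches, 4);
        System.out.println("docs=" + client.docs.get() + " optimizeMs=" + ms);
    }
}
```

If the logged optimize time is much shorter than the time the server takes to
finish merging, the client call is returning early.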


I found nothing significant in the logs that I can share. I can see the
following in the log:
org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}

On Wed, May 27, 2015 at 10:59 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> All strange of course. What do your Solr logs show when this happens?
> And how reproducible is this?
>
> Best,
> Erick
>
> On Wed, May 27, 2015 at 4:00 AM, Upayavira <u...@odoko.co.uk> wrote:
> > In this case, optimising makes sense: once the index is generated, you
> > are not updating it.
> >
> > Upayavira
> >
> > On Wed, May 27, 2015, at 06:14 AM, Modassar Ather wrote:
> >> Our index has almost 100M documents running on SolrCloud of 5 shards and
> >> each shard has an index size of about 170+GB (for the record, we are not
> >> using stored fields - our documents are pretty large). We perform a full
> >> indexing every weekend and during the week there are no updates made to
> >> the index. Most of the queries that we run are pretty complex with
> >> hundreds of terms using PhraseQuery, BooleanQuery, SpanQuery, Wildcards,
> >> boosts etc.
> >> and take many minutes to execute. A difference of 10-20% is also a big
> >> advantage for us.
> >>
> >> We have been optimizing the index after indexing for years and it has
> >> worked well for us. Every once in a while, we upgrade Solr to the latest
> >> version and try without optimizing so that we can save the many hours it
> >> takes to optimize such a huge index, but we find the optimized index
> >> works well for us.
> >>
> >> Erick, I was indexing the documents today and saw the optimize happening
> >> in the background.
> >>
> >> On Tue, May 26, 2015 at 9:12 PM, Erick Erickson <erickerick...@gmail.com>
> >> wrote:
> >>
> >> > No results yet. I finished the test harness last night (not really a
> >> > unit test, a stand-alone program that endlessly adds stuff and tests
> >> > that every commit returns the correct number of docs).
> >> >
> >> > 8,000 cycles later there aren't any problems reported.
> >> >
> >> > Siiigggggh.
> >> >
> >> >
> >> > On Tue, May 26, 2015 at 1:51 AM, Modassar Ather <modather1...@gmail.com>
> >> > wrote:
> >> > > Hi,
> >> > >
> >> > > Erick, you mentioned a unit test to test the optimize running in the
> >> > > background. Kindly share your findings, if any.
> >> > >
> >> > > Thanks,
> >> > > Modassar
> >> > >
> >> > > On Mon, May 25, 2015 at 11:47 AM, Modassar Ather <modather1...@gmail.com>
> >> > > wrote:
> >> > >
> >> > >> Thanks everybody for your replies.
> >> > >>
> >> > >> I have noticed the optimization running in the background every time I
> >> > >> indexed. This is a 5 node cluster with solr-5.1.0 and uses the
> >> > >> CloudSolrClient. Kindly share your findings on this issue.
> >> > >>
> >> > >> Our index has almost 100M documents running on SolrCloud. We have been
> >> > >> optimizing the index after indexing for years and it has worked well for
> >> > >> us.
> >> > >>
> >> > >> Thanks,
> >> > >> Modassar
> >> > >>
> >> > >> On Fri, May 22, 2015 at 11:55 PM, Erick Erickson <erickerick...@gmail.com>
> >> > >> wrote:
> >> > >>
> >> > >>> Actually, I've recently seen very similar behavior in Solr 4.10.3, but
> >> > >>> involving hard commits with openSearcher=true, see:
> >> > >>> https://issues.apache.org/jira/browse/SOLR-7572. Of course I can't
> >> > >>> reproduce this at will, siigggghhhh.
> >> > >>>
> >> > >>> A unit test should be very simple to write though, maybe I can get to it
> >> > >>> today.
> >> > >>>
> >> > >>> Erick
> >> > >>>
> >> > >>>
> >> > >>>
> >> > >>> On Fri, May 22, 2015 at 8:27 AM, Upayavira <u...@odoko.co.uk> wrote:
> >> > >>> >
> >> > >>> >
> >> > >>> > On Fri, May 22, 2015, at 03:55 PM, Shawn Heisey wrote:
> >> > >>> >> On 5/21/2015 6:21 AM, Modassar Ather wrote:
> >> > >>> >> > I am using Solr-5.1.0. I have an indexer class which invokes
> >> > >>> >> > cloudSolrClient.optimize(true, true, 1). My indexer exits after the
> >> > >>> >> > invocation of optimize and the optimization keeps on running in the
> >> > >>> >> > background.
> >> > >>> >> > Kindly let me know if this is by design and how I can make my indexer
> >> > >>> >> > wait until the optimization is over. Is there a configuration/parameter
> >> > >>> >> > I need to set for this?
> >> > >>> >> >
> >> > >>> >> > Please note that the same indexer with cloudSolrServer.optimize(true,
> >> > >>> >> > true, 1) on Solr-4.10 used to wait till the optimize was over before
> >> > >>> >> > exiting.
> >> > >>> >>
> >> > >>> >> This is very odd, because I could not get HttpSolrServer to optimize in
> >> > >>> >> the background, even when that was what I wanted.
> >> > >>> >>
> >> > >>> >> I wondered if maybe the Cloud object behaves differently with regard to
> >> > >>> >> blocking until an optimize is finished ... except that there is no code
> >> > >>> >> for optimizing in CloudSolrClient at all ... so I don't know where the
> >> > >>> >> different behavior would actually be happening.
> >> > >>> >
> >> > >>> > A more important question is, why are you optimising? Generally it isn't
> >> > >>> > recommended anymore as it reduces the natural distribution of documents
> >> > >>> > amongst segments and makes future merges more costly.
> >> > >>> >
> >> > >>> > Upayavira
> >> > >>>
> >> > >>
> >> > >>
> >> >
>
