I have not set any timeout in the indexer other than the ZooKeeper client
timeout, which is 30 seconds. I simply call client.close() at the end of
indexing. With solr-4.10.3 and
org.apache.solr.client.solrj.impl.CloudSolrServer, the same code did not
leave the optimize running in the background.
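
For reference, the client-timeout theory quoted below can be illustrated
without Solr at all. This is only a minimal sketch in plain Java, and it
assumes nothing about SolrJ internals: the class ClientTimeoutSketch and the
methods startSlowWork, waitForResponse and awaitCompletion are hypothetical
names, and the sleeping task merely stands in for a long-running optimize.
It shows that a caller can stop waiting while the work it requested keeps
running to completion.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

public class ClientTimeoutSketch {

    // Hypothetical stand-in for the server: runs slow work on its own thread
    // and sets the flag only when that work has really finished.
    static Future<?> startSlowWork(ExecutorService server, AtomicBoolean finished, long workMillis) {
        return server.submit(() -> {
            try {
                Thread.sleep(workMillis);
                finished.set(true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    // Hypothetical stand-in for the client: waits at most timeoutMillis.
    // Returns false when the wait timed out; the task itself is NOT cancelled.
    static boolean waitForResponse(Future<?> response, long timeoutMillis) {
        try {
            response.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Blocks until the work is done, with no timeout.
    static void awaitCompletion(Future<?> response) {
        try {
            response.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ExecutorService server = Executors.newSingleThreadExecutor();
        AtomicBoolean finished = new AtomicBoolean(false);

        // The work needs 300 ms, but the client only waits 50 ms.
        Future<?> response = startSlowWork(server, finished, 300);
        System.out.println("answered in time: " + waitForResponse(response, 50));

        // The work was never cancelled, so it still runs to completion.
        awaitCompletion(response);
        System.out.println("work finished: " + finished.get());
        server.shutdown();
    }
}
```

If something like this is what happens here, the indexer would return from
the call once its wait expires even though the optimize itself is still in
progress on the Solr side.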

On Fri, May 29, 2015 at 11:13 AM, Erick Erickson <erickerick...@gmail.com>
wrote:

> Are you timing out on the client request? The theory here is that it's
> still a synchronous call, but you're just timing out at the client
> level. At that point, the optimize is still running; it's just that the
> connection has been dropped.
>
> Shot in the dark.
> Erick
>
> On Thu, May 28, 2015 at 10:31 PM, Modassar Ather <modather1...@gmail.com>
> wrote:
> > I could not notice it, but in my past experience a commit used to take
> > around 2 minutes and is now taking around 8 seconds. I think this is also
> > running in the background.
> >
> > On Fri, May 29, 2015 at 10:52 AM, Modassar Ather <modather1...@gmail.com>
> > wrote:
> >
> >> The indexer takes almost 2 hours to optimize. It adds batches of
> >> documents from multiple threads to
> >> org.apache.solr.client.solrj.impl.CloudSolrClient.
> >> Once all the documents are indexed, it invokes commit and optimize. I
> >> have seen that the optimize goes into the background after 10 minutes and
> >> the indexer exits.
> >> I am not sure why the indexer hangs for these 10 minutes. I have seen
> >> this behavior in multiple iterations of indexing the same data.
> >>
> >> I found nothing significant in the log that I can share. I can see the
> >> following in the log:
> >> org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
> >>
> >> On Wed, May 27, 2015 at 10:59 PM, Erick Erickson <erickerick...@gmail.com>
> >> wrote:
> >>
> >>> All strange of course. What do your Solr logs show when this happens?
> >>> And how reproducible is this?
> >>>
> >>> Best,
> >>> Erick
> >>>
> >>> On Wed, May 27, 2015 at 4:00 AM, Upayavira <u...@odoko.co.uk> wrote:
> >>> > In this case, optimising makes sense: once the index is generated, you
> >>> > are not updating it.
> >>> >
> >>> > Upayavira
> >>> >
> >>> > On Wed, May 27, 2015, at 06:14 AM, Modassar Ather wrote:
> >>> >> Our index has almost 100M documents running on SolrCloud with 5 shards,
> >>> >> and each shard has an index size of about 170+GB (for the record, we are
> >>> >> not using stored fields - our documents are pretty large). We perform a
> >>> >> full indexing every weekend, and during the week no updates are made to
> >>> >> the index. Most of the queries that we run are pretty complex, with
> >>> >> hundreds of terms using PhraseQuery, BooleanQuery, SpanQuery, wildcards,
> >>> >> boosts etc., and take many minutes to execute. A difference of 10-20% is
> >>> >> also a big advantage for us.
> >>> >>
> >>> >> We have been optimizing the index after indexing for years and it has
> >>> >> worked well for us. Every once in a while, we upgrade Solr to the latest
> >>> >> version and try without optimizing so that we can save the many hours it
> >>> >> takes to optimize such a huge index, but we find that the optimized index
> >>> >> works well for us.
> >>> >>
> >>> >> Erick, I was indexing the documents today and saw the optimize happening
> >>> >> in the background.
> >>> >>
> >>> >> On Tue, May 26, 2015 at 9:12 PM, Erick Erickson <erickerick...@gmail.com>
> >>> >> wrote:
> >>> >>
> >>> >> > No results yet. I finished the test harness last night (not really a
> >>> >> > unit test, a stand-alone program that endlessly adds stuff and tests
> >>> >> > that every commit returns the correct number of docs).
> >>> >> >
> >>> >> > 8,000 cycles later there aren't any problems reported.
> >>> >> >
> >>> >> > Siiigggggh.
> >>> >> >
> >>> >> >
> >>> >> > On Tue, May 26, 2015 at 1:51 AM, Modassar Ather <modather1...@gmail.com>
> >>> >> > wrote:
> >>> >> > > Hi,
> >>> >> > >
> >>> >> > > Erick, you mentioned a unit test to test the optimize running in the
> >>> >> > > background. Kindly share your findings, if any.
> >>> >> > >
> >>> >> > > Thanks,
> >>> >> > > Modassar
> >>> >> > >
> >>> >> > > On Mon, May 25, 2015 at 11:47 AM, Modassar Ather <modather1...@gmail.com>
> >>> >> > > wrote:
> >>> >> > >
> >>> >> > >> Thanks everybody for your replies.
> >>> >> > >>
> >>> >> > >> I have noticed the optimization running in the background every time I
> >>> >> > >> indexed. This is a 5 node cluster with solr-5.1.0 and uses the
> >>> >> > >> CloudSolrClient. Kindly share your findings on this issue.
> >>> >> > >>
> >>> >> > >> Our index has almost 100M documents running on SolrCloud. We have been
> >>> >> > >> optimizing the index after indexing for years and it has worked well for
> >>> >> > >> us.
> >>> >> > >>
> >>> >> > >> Thanks,
> >>> >> > >> Modassar
> >>> >> > >>
> >>> >> > >> On Fri, May 22, 2015 at 11:55 PM, Erick Erickson <erickerick...@gmail.com>
> >>> >> > >> wrote:
> >>> >> > >>
> >>> >> > >>> Actually, I've recently seen very similar behavior in Solr 4.10.3, but
> >>> >> > >>> involving hard commits with openSearcher=true, see:
> >>> >> > >>> https://issues.apache.org/jira/browse/SOLR-7572. Of course I can't
> >>> >> > >>> reproduce this at will, siigggghhhh.
> >>> >> > >>>
> >>> >> > >>> A unit test should be very simple to write though, maybe I can get to it
> >>> >> > >>> today.
> >>> >> > >>>
> >>> >> > >>> Erick
> >>> >> > >>>
> >>> >> > >>>
> >>> >> > >>>
> >>> >> > >>> On Fri, May 22, 2015 at 8:27 AM, Upayavira <u...@odoko.co.uk> wrote:
> >>> >> > >>> >
> >>> >> > >>> >
> >>> >> > >>> > On Fri, May 22, 2015, at 03:55 PM, Shawn Heisey wrote:
> >>> >> > >>> >> On 5/21/2015 6:21 AM, Modassar Ather wrote:
> >>> >> > >>> >> > I am using Solr-5.1.0. I have an indexer class which invokes
> >>> >> > >>> >> > cloudSolrClient.optimize(true, true, 1). My indexer exits after the
> >>> >> > >>> >> > invocation of optimize and the optimization keeps on running in the
> >>> >> > >>> >> > background.
> >>> >> > >>> >> > Kindly let me know if this is by design and how I can make my indexer
> >>> >> > >>> >> > wait until the optimization is over. Is there a configuration/parameter
> >>> >> > >>> >> > I need to set for this?
> >>> >> > >>> >> >
> >>> >> > >>> >> > Please note that the same indexer with cloudSolrServer.optimize(true,
> >>> >> > >>> >> > true, 1) on Solr-4.10 used to wait till the optimize was over before
> >>> >> > >>> >> > exiting.
> >>> >> > >>> >>
> >>> >> > >>> >> This is very odd, because I could not get HttpSolrServer to optimize in
> >>> >> > >>> >> the background, even when that was what I wanted.
> >>> >> > >>> >>
> >>> >> > >>> >> I wondered if maybe the Cloud object behaves differently with regard to
> >>> >> > >>> >> blocking until an optimize is finished ... except that there is no code
> >>> >> > >>> >> for optimizing in CloudSolrClient at all ... so I don't know where the
> >>> >> > >>> >> different behavior would actually be happening.
> >>> >> > >>> >
> >>> >> > >>> > A more important question is, why are you optimising? Generally it isn't
> >>> >> > >>> > recommended anymore as it reduces the natural distribution of documents
> >>> >> > >>> > amongst segments and makes future merges more costly.
> >>> >> > >>> >
> >>> >> > >>> > Upayavira
> >>> >> > >>>
> >>> >> > >>
> >>> >> > >>
> >>> >> >
> >>>
> >>
> >>
>
