Re: Solr Hangs During Updates for over 10 minutes

Erick Erickson Wed, 10 Jul 2013 05:39:26 -0700

Jed:

I'm not sure changing Java runtime is any less scary than upgrading Solr....


Wait, I know! Ask your manager if you can do both at once <evil smirk>. I have
a t-shirt that says "I don't test, but when I do it's in production"...

Erick

On Wed, Jul 10, 2013 at 8:08 AM, Jed Glazner <jglaz...@adobe.com> wrote:
> Hey Daniel,
>
> Thanks for the response.  I think we'll give this a try to see if this
> helps.
>
> Jed.
>
> On 7/10/13 10:48 AM, "Daniel Collins" <danwcoll...@gmail.com> wrote:
>
>>We had something similar: update times suddenly spiking for no obvious
>>reason.  It never got quite as bad as your case in terms of the other
>>knock-on effects, but we certainly saw updates jumping from 10ms up to
>>30000ms.  All our external queues backed up and we rejected some updates,
>>then after a while things quietened down.
>>
>>We were running Solr 4.3.0 but with Java 6 and the CMS GC.  We swapped to
>>Java 7, G1 GC (and increased heap size from 8Gb to 12Gb) and the problem
>>went away.
>>
>>Now, I admit it's not exactly the same as your case, since we never had
>>the follow-on effects, but I'd consider Java 7 and the G1 GC.  It has
>>certainly reduced the "spikes" in our indexing times.
>>
>>We run the following settings now (the usual caveats apply, it might not
>>work for you).
>>
>>    GC_OPTIONS="-XX:+AggressiveOpts -XX:+UseG1GC -XX:+UseStringCache
>>-XX:+OptimizeStringConcat -XX:-UseSplitVerifier -XX:+UseNUMA
>>-XX:MaxGCPauseMillis=50 -XX:GCPauseIntervalMillis=1000"
>>
>>I set MaxGCPauseMillis/GCPauseIntervalMillis to try to minimise
>>application pauses.  That's our goal: if we have to use more memory in the
>>short term then so be it, but we can't afford application pauses,
>>because we are using NRT (soft commits every 1s, hard commits every 60s)
>>and we get a lot of updates.
>>
>>I know there have been other discussions of G1 and it has received mixed
>>results overall, but for us it seems to be a winner.
>>
>>Hope that helps,
>>
>>
>>On 10 July 2013 08:32, Jed Glazner <jglaz...@adobe.com> wrote:
>>
>>> We are planning an upgrade to 4.4 but it's still weeks out. We offer a
>>> high availability search service and there are a number of changes in
>>> 4.4 that are not backward compatible (i.e. Clusterstate.json and no
>>> solr.xml). So there must be lots of testing. Additionally, this upgrade
>>> cannot be performed without downtime.
>>>
>>> Regardless, I need to find a band-aid right now.  Does anyone know if
>>> it's possible to set the timeout for distributed update requests
>>> to/from the leader?  Currently we see it's set to 0.  Maybe via a -D
>>> startup param, or something?
>>>
>>> Jed
>>>
>>> On 7/10/13 1:23 AM, "Otis Gospodnetic" <otis.gospodne...@gmail.com> wrote:
>>>
>>> >Hi Jed,
>>> >
>>> >This is really with Solr 4.0?  If so, it may be wiser to jump on 4.4
>>> >that is about to be released.  We did not have fun working with 4.0 in
>>> >SolrCloud mode a few months ago.  You will save time, hair, and money
>>> >if you convince your manager to let you use Solr 4.4. :)
>>> >
>>> >Otis
>>> >--
>>> >Solr & ElasticSearch Support -- http://sematext.com/
>>> >Performance Monitoring -- http://sematext.com/spm
>>> >
>>> >
>>> >
>>> >On Tue, Jul 9, 2013 at 4:44 PM, Jed Glazner <jglaz...@adobe.com> wrote:
>>> >> Hi Shawn,
>>> >>
>>> >> I have been trying to duplicate this problem without success for the
>>> >> last 2 weeks, which is one reason I'm getting flustered.  It seems
>>> >> reasonable to be able to duplicate it but I can't.
>>> >>
>>> >> We do have a story to upgrade, but it is still weeks if not months
>>> >> before that gets rolled out to production.
>>> >>
>>> >> We have another cluster running the same version but with 8 shards and
>>> >> 8 replicas, with each shard at 100GB, and with more load and more
>>> >> indexing requests, without this problem.  But we send docs in batches
>>> >> there and all fields are stored, whereas the trouble index has only 1
>>> >> or 2 stored fields and we only send docs 1 at a time.
>>> >>
>>> >> Could that have anything to do with it?
>>> >>
>>> >> Jed
>>> >>
>>> >>
>>> >> Sent from Samsung Mobile
>>> >>
>>> >>
>>> >>
>>> >> -------- Original message --------
>>> >> From: Shawn Heisey <s...@elyograg.org>
>>> >> Date: 07.09.2013 18:33 (GMT+01:00)
>>> >> To: solr-user@lucene.apache.org
>>> >> Subject: Re: Solr Hangs During Updates for over 10 minutes
>>> >>
>>> >>
>>> >> On 7/9/2013 9:50 AM, Jed Glazner wrote:
>>> >>> I'll give you the high level before delving deep into setup etc. I
>>> >>> have been struggling at work with a seemingly random problem where
>>> >>> Solr will hang for 10-15 minutes during updates.  This outage always
>>> >>> seems to be immediately preceded by an EOF exception on the replica.
>>> >>> Then 10-15 minutes later we see an exception on the leader for a
>>> >>> socket timeout to the replica.  The leader will then tell the replica
>>> >>> to recover, which in most cases it does, and then the outage is over.
>>> >>>
>>> >>> Here are the setup details:
>>> >>>
>>> >>> We are currently using Solr 4.0.0 with an external ZK ensemble of 5
>>> >>>machines.
>>> >>
>>> >> After 4.0.0 was released, a *lot* of problems with SolrCloud surfaced
>>> >> and have since been fixed.  You're five releases and about nine months
>>> >> behind what's current.  My recommendation: Upgrade to 4.3.1, ensure
>>> >> your configuration is up to date with changes to the example config
>>> >> between 4.0.0 and 4.3.1, and reindex.  Ideally, you should set up a
>>> >> 4.0.0 testbed, duplicate your current problem, and upgrade the testbed
>>> >> to see if the problem goes away.  A testbed will also give you practice
>>> >> for a smooth upgrade of your production system.
>>> >>
>>> >> Thanks,
>>> >> Shawn
>>> >>
>>>
>>>
>
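
For readers following the question in the thread about sending docs one at a
time versus in batches: the sketch below shows one way a client could group
documents into fixed-size batches before posting them. It is only an
illustration and not something taken from the thread. The helper names
(batched, to_update_payloads) and the batch size are hypothetical, and it
assumes the documents are posted as a JSON array to Solr's JSON update
handler. Nothing here is claimed to fix the hang being discussed.

```python
import json
from itertools import islice
from typing import Dict, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def to_update_payloads(docs: Iterable[Dict], size: int) -> Iterator[str]:
    """Serialize each batch of documents as one JSON array string.

    Each string is meant to be the body of a single update request, so that
    `size` documents travel in one request instead of `size` requests.
    (Assumption: the target is a JSON update handler that accepts an array.)
    """
    for batch in batched(docs, size):
        yield json.dumps(batch)


if __name__ == "__main__":
    # Hypothetical documents; field names are placeholders.
    docs = [{"id": str(i), "title": "doc %d" % i} for i in range(5)]
    for body in to_update_payloads(docs, size=2):
        print(body)
```

Sending the request itself is left out on purpose, since the URL, timeouts and
commit policy depend on the cluster in question.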
