Re: Solr Hangs During Updates for over 10 minutes

Otis Gospodnetic Wed, 10 Jul 2013 11:02:27 -0700

+1 for G1.  This week a client of ours switched to G1 after seeing
stop-the-world (STW) pauses with CMS, and they are happy with the result.
I can't share their JVM metrics from SPM, but I can share ours:
http://blog.sematext.com/2013/06/24/g1-cms-java-garbage-collector/
(HBase, not Solr, but we've seen the same effect with ElasticSearch
for example, so I'm optimistic about seeing the same effects with
Solr, too).
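For anyone who wants to try the switch, here is a minimal sketch of starting a Solr 4.x node under G1 with GC logging turned on, so pause times can be compared against a CMS baseline. It makes several assumptions: the Jetty `start.jar` launcher from the Solr 4.x example directory, a fixed 12 GB heap (the size Daniel mentions further down), a `logs/` directory for the GC log, and a Java 7 JVM. The G1 flags are a subset of Daniel's `GC_OPTIONS`. `start_solr` and `DRY_RUN` are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Sketch only: heap size, log path and the start.jar launcher are assumptions.
# The G1 flags below are taken from the GC_OPTIONS quoted in this thread.
GC_OPTIONS="-XX:+UseG1GC -XX:MaxGCPauseMillis=50 -XX:GCPauseIntervalMillis=1000"
# Assumed fixed 12 GB heap (min = max) so the JVM never resizes the heap.
HEAP_OPTIONS="-Xms12g -Xmx12g"
# Java 7 GC logging flags, so pause times can be checked after the switch.
GC_LOG_OPTIONS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:logs/gc.log"

# Hypothetical helper: print the command when DRY_RUN=1, otherwise replace
# the current shell with the JVM.
start_solr() {
  # The option variables are deliberately unquoted so that they split
  # into separate JVM arguments.
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo java $HEAP_OPTIONS $GC_OPTIONS $GC_LOG_OPTIONS -jar start.jar
  else
    exec java $HEAP_OPTIONS $GC_OPTIONS $GC_LOG_OPTIONS -jar start.jar
  fi
}
```

Running `DRY_RUN=1 start_solr` prints the full command line for review before anything starts. Once the node is running, `logs/gc.log` shows the actual pause times.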


Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Wed, Jul 10, 2013 at 4:48 AM, Daniel Collins <danwcoll...@gmail.com> wrote:
> We had something similar, with update times suddenly spiking for
> no obvious reason.  We never got quite as bad as you in terms of the other
> knock-on effects, but we certainly saw updates jumping from 10 ms up to
> 30000 ms; all our external queues backed up and we rejected some updates,
> then after a while things quietened down.
>
> We were running Solr 4.3.0 but with Java 6 and the CMS GC.  We switched to
> Java 7 and the G1 GC (and increased the heap size from 8 GB to 12 GB), and
> the problem went away.
>
> Now, I admit it's not exactly the same as your case (we never had the
> follow-on effects), but I'd consider Java 7 and the G1 GC; it has certainly
> reduced the "spikes" in our indexing times.
>
> We run the following settings now (the usual caveats apply: they might not
> work for you).
>
>     GC_OPTIONS="-XX:+AggressiveOpts -XX:+UseG1GC -XX:+UseStringCache
> -XX:+OptimizeStringConcat -XX:-UseSplitVerifier -XX:+UseNUMA
> -XX:MaxGCPauseMillis=50 -XX:GCPauseIntervalMillis=1000"
>
> I set MaxGCPauseMillis/GCPauseIntervalMillis to try to minimise
> application pauses.  That's our goal: if we have to use more memory in the
> short term, then so be it, but we can't afford application pauses
> because we are using NRT (soft commits every 1s, hard commits every 60s)
> and we get a lot of updates.
>
> I know there have been other discussions on G1 and it has received mixed
> results overall, but for us it seems to be a winner.
>
> Hope that helps,
>
>
> On 10 July 2013 08:32, Jed Glazner <jglaz...@adobe.com> wrote:
>
>> We are planning an upgrade to 4.4, but it's still weeks out. We offer a
>> high-availability search service, and a number of changes in 4.4 are not
>> backward compatible (e.g. clusterstate.json and no solr.xml), so there
>> will have to be a lot of testing. Additionally, this upgrade cannot be
>> performed without downtime.
>>
>> Regardless, I need to find a band-aid right now.  Does anyone know if it's
>> possible to set the timeout for distributed update requests to/from the
>> leader?  Currently we see it's set to 0.  Maybe via a -D startup param, or
>> something?
>>
>> Jed
>>
>> On 7/10/13 1:23 AM, "Otis Gospodnetic" <otis.gospodne...@gmail.com> wrote:
>>
>> >Hi Jed,
>> >
>> >This is really with Solr 4.0?  If so, it may be wiser to jump to 4.4,
>> >which is about to be released.  We did not have fun working with 4.0 in
>> >SolrCloud mode a few months ago.  You will save time, hair, and money
>> >if you convince your manager to let you use Solr 4.4. :)
>> >
>> >Otis
>> >--
>> >Solr & ElasticSearch Support -- http://sematext.com/
>> >Performance Monitoring -- http://sematext.com/spm
>> >
>> >
>> >
>> >On Tue, Jul 9, 2013 at 4:44 PM, Jed Glazner <jglaz...@adobe.com> wrote:
>> >> Hi Shawn,
>> >>
>> >> I have been trying to duplicate this problem without success for the
>> >> last 2 weeks, which is one reason I'm getting flustered.  It seems like
>> >> it should be reproducible, but I can't reproduce it.
>> >>
>> >> We do have a story to upgrade, but it is still weeks if not months
>> >> before that gets rolled out to production.
>> >>
>> >> We have another cluster running the same version, but with 8 shards and
>> >> 8 replicas, each shard at 100 GB, and with more load and more indexing
>> >> requests, without this problem.  There, though, we send docs in batches
>> >> and all fields are stored, whereas the troubled index has only 1 or 2
>> >> stored fields and we send docs 1 at a time.
>> >>
>> >> Could that have anything to do with it?
>> >>
>> >> Jed
>> >>
>> >>
>> >> Sent from Samsung Mobile
>> >>
>> >>
>> >>
>> >> -------- Original message --------
>> >> From: Shawn Heisey <s...@elyograg.org>
>> >> Date: 07.09.2013 18:33 (GMT+01:00)
>> >> To: solr-user@lucene.apache.org
>> >> Subject: Re: Solr Hangs During Updates for over 10 minutes
>> >>
>> >>
>> >> On 7/9/2013 9:50 AM, Jed Glazner wrote:
>> >>> I'll give you the high level before delving deep into setup etc. I
>> >>> have been struggling at work with a seemingly random problem where Solr
>> >>> hangs for 10-15 minutes during updates.  This outage always seems
>> >>> to be immediately preceded by an EOF exception on the replica.  Then,
>> >>> 10-15 minutes later, we see an exception on the leader for a socket
>> >>> timeout to the replica.  The leader then tells the replica to
>> >>> recover, which in most cases it does, and then the outage is over.
>> >>>
>> >>> Here are the setup details:
>> >>>
>> >>> We are currently using Solr 4.0.0 with an external ZK ensemble of 5
>> >>>machines.
>> >>
>> >> After 4.0.0 was released, a *lot* of problems with SolrCloud surfaced
>> >> and have since been fixed.  You're five releases and about nine months
>> >> behind what's current.  My recommendation: Upgrade to 4.3.1, ensure your
>> >> configuration is up to date with changes to the example config between
>> >> 4.0.0 and 4.3.1, and reindex.  Ideally, you should set up a 4.0.0
>> >> testbed, duplicate your current problem, and upgrade the testbed to see
>> >> if the problem goes away.  A testbed will also give you practice for a
>> >> smooth upgrade of your production system.
>> >>
>> >> Thanks,
>> >> Shawn
>> >>
>>
>>
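Jed's comparison of the two clusters (batched updates on the healthy one, single-document updates on the troubled one) suggests one experiment: batching updates on the troubled index. Nobody in the thread confirms that batching avoids the hang, so this is only a sketch to try. It assumes one JSON document per line on standard input, the default `collection1` core on `localhost:8983`, and a batch size of 100. `make_batches` and `post_batches` are hypothetical helper names. The sketch relies on the Solr 4.x `/update` handler accepting a JSON array of documents when the request is sent with `Content-Type: application/json`.

```shell
#!/bin/sh
# Assumed endpoint and batch size; both can be overridden via the environment.
SOLR_UPDATE_URL="${SOLR_UPDATE_URL:-http://localhost:8983/solr/collection1/update}"
BATCH_SIZE="${BATCH_SIZE:-100}"

# Read one JSON document per line on stdin and print one JSON array
# per batch of BATCH_SIZE documents (the last batch may be smaller).
make_batches() {
  awk -v n="$BATCH_SIZE" '
    NF == 0 { next }    # skip blank lines
    { buf = (count == 0) ? $0 : buf "," $0; count++ }
    count == n { print "[" buf "]"; buf = ""; count = 0 }
    END { if (count > 0) print "[" buf "]" }
  '
}

# Send each batch as a single update request instead of one request per
# document. No explicit commit is sent: the index's autoCommit and
# autoSoftCommit settings are assumed to control visibility.
post_batches() {
  make_batches | while IFS= read -r batch; do
    curl -sS -H 'Content-Type: application/json' \
      --data-binary "$batch" "$SOLR_UPDATE_URL"
  done
}
```

For example, `post_batches < docs.jsonl` (a hypothetical input file) turns N single-document requests into roughly N/100 requests.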

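Jed reports that each outage starts with an EOF exception on the replica and ends with a socket timeout logged by the leader 10-15 minutes later. Pulling both events out of the replica and leader logs makes it easier to line up their timestamps. This minimal sketch assumes the exceptions appear in the logs under their Java class names (`java.io.EOFException`, `java.net.SocketTimeoutException`) and that the log files are passed as arguments. The function name and any log paths are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print every EOFException / SocketTimeoutException
# line from the given log files, prefixed with file name and line number.
find_update_failures() {
  # Appending /dev/null makes grep print the file name even when only
  # one log file is given (portable alternative to GNU grep -H).
  grep -nE 'EOFException|SocketTimeoutException' "$@" /dev/null
}
```

Usage might look like `find_update_failures replica/logs/solr.log leader/logs/solr.log`, run against both nodes.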