Hey Daniel,

Thanks for the response. I think we'll give this a try to see if it helps.

Jed.

On 7/10/13 10:48 AM, "Daniel Collins" <danwcoll...@gmail.com> wrote:

>We had something similar in terms of update times suddenly spiking up for
>no obvious reason. We never got quite as bad as you in terms of the other
>knock-on effects, but we certainly saw updates jumping from 10ms up to
>30000ms, all our external queues backed up and we rejected some updates,
>then after a while things quietened down.
>
>We were running Solr 4.3.0 but with Java 6 and the CMS GC. We swapped to
>Java 7, G1 GC (and increased heap size from 8GB to 12GB) and the problem
>went away.
>
>Now, I admit it's not exactly the same as your case, we never had the
>follow-on effects, but I'd consider Java 7 and the G1 GC, it has certainly
>reduced the "spikes" in our indexing times.
>
>We run the following settings now (the usual caveats apply, it might not
>work for you).
>
>GC_OPTIONS="-XX:+AggressiveOpts -XX:+UseG1GC -XX:+UseStringCache -XX:+OptimizeStringConcat -XX:-UseSplitVerifier -XX:+UseNUMA -XX:MaxGCPauseMillis=50 -XX:GCPauseIntervalMillis=1000"
>
>I set the MaxGCPauseMillis/GCPauseIntervalMillis to try to minimise
>application pauses, that's our goal. If we have to use more memory in the
>short term then so be it, but we couldn't afford application pauses,
>because we are using NRT (soft commits every 1s, hard commits every 60s)
>and we get a lot of updates.
>
>I know there have been other discussions on G1 and it has received mixed
>results overall, but for us, it seems to be a winner.
>
>Hope that helps,
>
>On 10 July 2013 08:32, Jed Glazner <jglaz...@adobe.com> wrote:
>
>> We are planning an upgrade to 4.4 but it's still weeks out. We offer a
>> high availability search service and there are a number of changes in
>> 4.4 that are not backward compatible (i.e. Clusterstate.json and no
>> solr.xml), so there must be lots of testing. Additionally, this upgrade
>> cannot be performed without downtime.
>>
>> Regardless, I need to find a band-aid right now.
>> Does anyone know if it's possible to set the timeout for distributed
>> update requests to/from the leader? Currently we see it's set to 0.
>> Maybe via a -D startup param, or something?
>>
>> Jed
>>
>> On 7/10/13 1:23 AM, "Otis Gospodnetic" <otis.gospodne...@gmail.com> wrote:
>>
>> >Hi Jed,
>> >
>> >This is really with Solr 4.0? If so, it may be wiser to jump on 4.4
>> >that is about to be released. We did not have fun working with 4.0 in
>> >SolrCloud mode a few months ago. You will save time, hair, and money
>> >if you convince your manager to let you use Solr 4.4. :)
>> >
>> >Otis
>> >--
>> >Solr & ElasticSearch Support -- http://sematext.com/
>> >Performance Monitoring -- http://sematext.com/spm
>> >
>> >On Tue, Jul 9, 2013 at 4:44 PM, Jed Glazner <jglaz...@adobe.com> wrote:
>> >> Hi Shawn,
>> >>
>> >> I have been trying to duplicate this problem without success for the
>> >> last 2 weeks, which is one reason I'm getting flustered. It seems
>> >> reasonable to be able to duplicate it but I can't.
>> >>
>> >> We do have a story to upgrade but that is still weeks if not months
>> >> before that gets rolled out to production.
>> >>
>> >> We have another cluster running the same version, but with 8 shards
>> >> and 8 replicas, each shard at 100GB, and with more load and more
>> >> indexing requests, without this problem. There we send docs in
>> >> batches and all fields are stored, whereas the trouble index has only
>> >> 1 or 2 stored fields and we only send docs 1 at a time.
>> >>
>> >> Could that have anything to do with it?
>> >>
>> >> Jed
>> >>
>> >> Sent from Samsung Mobile
>> >>
>> >> -------- Original message --------
>> >> From: Shawn Heisey <s...@elyograg.org>
>> >> Date: 07.09.2013 18:33 (GMT+01:00)
>> >> To: solr-user@lucene.apache.org
>> >> Subject: Re: Solr Hangs During Updates for over 10 minutes
>> >>
>> >> On 7/9/2013 9:50 AM, Jed Glazner wrote:
>> >>> I'll give you the high level before delving deep into setup etc. I
>> >>> have been struggling at work with a seemingly random problem where
>> >>> Solr will hang for 10-15 minutes during updates. This outage always
>> >>> seems to be immediately preceded by an EOF exception on the replica.
>> >>> Then 10-15 minutes later we see an exception on the leader for a
>> >>> socket timeout to the replica. The leader will then tell the replica
>> >>> to recover, which in most cases it does, and then the outage is over.
>> >>>
>> >>> Here are the setup details:
>> >>>
>> >>> We are currently using Solr 4.0.0 with an external ZK ensemble of 5
>> >>> machines.
>> >>
>> >> After 4.0.0 was released, a *lot* of problems with SolrCloud surfaced
>> >> and have since been fixed. You're five releases and about nine months
>> >> behind what's current. My recommendation: Upgrade to 4.3.1, ensure
>> >> your configuration is up to date with changes to the example config
>> >> between 4.0.0 and 4.3.1, and reindex. Ideally, you should set up a
>> >> 4.0.0 testbed, duplicate your current problem, and upgrade the
>> >> testbed to see if the problem goes away. A testbed will also give you
>> >> practice for a smooth upgrade of your production system.
>> >>
>> >> Thanks,
>> >> Shawn
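
On the batching question raised in the thread (the healthy cluster sends docs in batches, the troubled index gets them one at a time), here is a minimal client-side sketch of grouping documents before sending them, in case it helps when trying to reproduce the difference on a testbed. Nothing in it comes from the original posters: the function names, the field names and the batch size of 3 are made up for illustration, and it assumes the target Solr's JSON update handler accepts a JSON array of documents. It only builds the payloads; the HTTP POST itself is left out, and whether batching has any bearing on the hang is an open question.

```python
import json
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def batch_docs(docs: Iterable[Dict], size: int) -> Iterator[List[Dict]]:
    """Yield successive lists of at most `size` documents (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def to_update_payload(batch: List[Dict]) -> bytes:
    """Serialise one batch as a UTF-8 JSON array of documents."""
    return json.dumps(batch).encode("utf-8")


if __name__ == "__main__":
    # Illustrative documents only; real field names depend on the schema.
    docs = ({"id": str(i), "title": f"doc {i}"} for i in range(7))
    for batch in batch_docs(docs, 3):
        payload = to_update_payload(batch)
        # A real client would POST `payload` to the update handler here.
        print(len(batch), "docs,", len(payload), "bytes")
```

With a batch size of 1 this degenerates to the one-doc-per-request pattern, so the same script can drive both cases on a test cluster.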