Jed: I'm not sure changing Java runtime is any less scary than upgrading Solr....
Wait, I know! Ask your manager if you can do both at once <evil smirk>.
I have a t-shirt that says "I don't test, but when I do it's in
production"...

Erick

On Wed, Jul 10, 2013 at 8:08 AM, Jed Glazner <jglaz...@adobe.com> wrote:

> Hey Daniel,
>
> Thanks for the response. I think we'll give this a try to see if this
> helps.
>
> Jed.
>
> On 7/10/13 10:48 AM, "Daniel Collins" <danwcoll...@gmail.com> wrote:
>
>>We had something similar in terms of update times suddenly spiking up for
>>no obvious reason. We never got quite as bad as you in terms of the other
>>knock-on effects, but we certainly saw updates jumping from 10ms up to
>>30000ms, all our external queues backed up and we rejected some updates,
>>then after a while things quietened down.
>>
>>We were running Solr 4.3.0 but with Java 6 and the CMS GC. We swapped to
>>Java 7, G1 GC (and increased heap size from 8Gb to 12Gb) and the problem
>>went away.
>>
>>Now, I admit it's not exactly the same as your case, we never had the
>>follow-on effects, but I'd consider Java 7 and the G1 GC, it has certainly
>>reduced the "spikes" in our indexing times.
>>
>>We run the following settings now (the usual caveats apply, it might not
>>work for you).
>>
>> GC_OPTIONS="-XX:+AggressiveOpts -XX:+UseG1GC -XX:+UseStringCache
>>-XX:+OptimizeStringConcat -XX:-UseSplitVerifier -XX:+UseNUMA
>>-XX:MaxGCPauseMillis=50 -XX:GCPauseIntervalMillis=1000"
>>
>>I set the MaxGCPauseMillis/GCPauseIntervalMillis to try to minimise
>>application pauses, that's our goal, if we have to use more memory in the
>>short term then so be it, but we couldn't afford application pauses,
>>because we are using NRT (soft commits every 1s, hard commits every 60s)
>>and we get a lot of updates.
>>
>>I know there have been other discussions on G1 and it has received mixed
>>results overall, but for us, it seems to be a winner.
>>
>>Hope that helps,
>>
>>
>>On 10 July 2013 08:32, Jed Glazner <jglaz...@adobe.com> wrote:
>>
>>> We are planning an upgrade to 4.4 but it's still weeks out. We offer a
>>> high availability search service and there are a number of changes in
>>> 4.4 that are not backward compatible. (i.e. Clusterstate.json and no
>>> solr.xml)
>>> So there must be lots of testing, additionally this upgrade cannot be
>>> performed without downtime.
>>>
>>> Regardless, I need to find a band-aid right now. Does anyone know if
>>> it's possible to set the timeout for distributed update requests
>>> to/from the leader? Currently we see it's set to 0. Maybe via -D
>>> startup param, or something?
>>>
>>> Jed
>>>
>>> On 7/10/13 1:23 AM, "Otis Gospodnetic" <otis.gospodne...@gmail.com>
>>> wrote:
>>>
>>> >Hi Jed,
>>> >
>>> >This is really with Solr 4.0? If so, it may be wiser to jump on 4.4
>>> >that is about to be released. We did not have fun working with 4.0 in
>>> >SolrCloud mode a few months ago. You will save time, hair, and money
>>> >if you convince your manager to let you use Solr 4.4. :)
>>> >
>>> >Otis
>>> >--
>>> >Solr & ElasticSearch Support -- http://sematext.com/
>>> >Performance Monitoring -- http://sematext.com/spm
>>> >
>>> >
>>> >
>>> >On Tue, Jul 9, 2013 at 4:44 PM, Jed Glazner <jglaz...@adobe.com> wrote:
>>> >> Hi Shawn,
>>> >>
>>> >> I have been trying to duplicate this problem without success for the
>>> >> last 2 weeks which is one reason I'm getting flustered. It seems
>>> >> reasonable to be able to duplicate it but I can't.
>>> >>
>>> >> We do have a story to upgrade but that is still weeks if not months
>>> >> before that gets rolled out to production.
>>> >>
>>> >> We have another cluster running the same version but with 8 shards
>>> >> and 8 replicas with each shard at 100gb and more load and more
>>> >> indexing requests without this problem but we send docs in batches
>>> >> here and all fields are stored.
>>> >> Whereas the trouble index has only 1 or 2 stored fields and we only
>>> >> send docs 1 at a time.
>>> >>
>>> >> Could that have anything to do with it?
>>> >>
>>> >> Jed
>>> >>
>>> >>
>>> >> Sent from Samsung Mobile
>>> >>
>>> >>
>>> >>
>>> >> -------- Original message --------
>>> >> From: Shawn Heisey <s...@elyograg.org>
>>> >> Date: 07.09.2013 18:33 (GMT+01:00)
>>> >> To: solr-user@lucene.apache.org
>>> >> Subject: Re: Solr Hangs During Updates for over 10 minutes
>>> >>
>>> >>
>>> >> On 7/9/2013 9:50 AM, Jed Glazner wrote:
>>> >>> I'll give you the high level before delving deep into setup etc. I
>>> >>> have been struggling at work with a seemingly random problem when
>>> >>> Solr will hang for 10-15 minutes during updates. This outage always
>>> >>> seems to be immediately preceded by an EOF exception on the replica.
>>> >>> Then 10-15 minutes later we see an exception on the leader for a
>>> >>> socket timeout to the replica. The leader will then tell the replica
>>> >>> to recover which in most cases it does and then the outage is over.
>>> >>>
>>> >>> Here are the setup details:
>>> >>>
>>> >>> We are currently using Solr 4.0.0 with an external ZK ensemble of 5
>>> >>> machines.
>>> >>
>>> >> After 4.0.0 was released, a *lot* of problems with SolrCloud surfaced
>>> >> and have since been fixed. You're five releases and about nine months
>>> >> behind what's current. My recommendation: Upgrade to 4.3.1, ensure
>>> >> your configuration is up to date with changes to the example config
>>> >> between 4.0.0 and 4.3.1, and reindex. Ideally, you should set up a
>>> >> 4.0.0 testbed, duplicate your current problem, and upgrade the
>>> >> testbed to see if the problem goes away. A testbed will also give you
>>> >> practice for a smooth upgrade of your production system.
>>> >>
>>> >> Thanks,
>>> >> Shawn
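
For anyone wanting to try the GC settings Daniel quotes above, here is a rough sketch of how they might be assembled into a launch command. The flags and the 12 GB heap come from his message. The HEAP_OPTIONS and JAVA_CMD variable names, setting -Xms equal to -Xmx, and launching through the start.jar in the Solr 4.x example directory are assumptions for illustration only. The command is printed rather than run, so it can be checked against your own JVM version first.

```shell
#!/bin/sh
# GC flags exactly as quoted in the thread (Java 7 with G1).
# Verify each flag against your own JVM before relying on it.
GC_OPTIONS="-XX:+AggressiveOpts -XX:+UseG1GC -XX:+UseStringCache \
-XX:+OptimizeStringConcat -XX:-UseSplitVerifier -XX:+UseNUMA \
-XX:MaxGCPauseMillis=50 -XX:GCPauseIntervalMillis=1000"

# 12 GB heap as mentioned in the thread.
# Assumption: -Xms is set equal to -Xmx; the thread does not say this.
HEAP_OPTIONS="-Xms12g -Xmx12g"

# Assumption: Solr is started via the Jetty start.jar from the 4.x
# example directory; adjust for your own deployment.
JAVA_CMD="java $HEAP_OPTIONS $GC_OPTIONS -jar start.jar"

# Print the command instead of executing it.
echo "$JAVA_CMD"
```

As Daniel says, the usual caveats apply: these values suited his NRT workload and might not suit yours.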