Hi Joe,

How are you sure all 3 shards are roughly the same size?  Can you share
what you run/see that shows you that?
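
If it helps, two hypothetical ways to check (the paths, host, and core
names below are made up, not from your setup):

  # Index size on disk, per replica, straight from the filesystem:
  du -sh /path/to/solr/collection1_shard1_replica1/data/index

  # Or via the Cores admin API, whose STATUS response reports the
  # index size for each core:
  curl 'http://localhost:8983/solr/admin/cores?action=STATUS'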

Are you sure queries are evenly distributed?  Something like SPM
<http://sematext.com/spm/> should give you insight into that.

How big are your caches?
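
For reference, cache sizes live in solrconfig.xml; the numbers below are
illustrative defaults rather than a recommendation:

  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

Very large size or autowarmCount values there can pin a lot of heap.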

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Sat, May 31, 2014 at 5:54 PM, Joe Gresock <jgres...@gmail.com> wrote:

> Interesting thought about the routing.  Our document ids are in 3 parts:
>
> <10-digit identifier>!<epoch timestamp>!<format>
>
> e.g., 5/12345678!130000025603!TEXT
>
> Each object has an identifier, and there may be multiple versions of the
> object, hence the timestamp.  We like to be able to pull back all of the
> versions of an object at once, hence the routing scheme.
>
> The nature of the identifier is that a great many of them begin with a
> certain number.  I'd be interested to know more about the hashing scheme
> used for the document routing.  Perhaps the first character gives it more
> weight as to which shard it lands in?
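>
> Having since skimmed CompositeIdRouter, it appears to MurmurHash3-hash
> the prefix and the rest of the id separately, then combine the top 16
> bits of the prefix hash with the bottom 16 bits of the rest.  So the
> whole prefix is hashed (the first character gets no special weight),
> but every document sharing a prefix lands on the same shard.  Below is
> a rough sketch of the two-part case, my own illustration rather than
> Solr's actual source (ids with two "!"s like ours are, if I read the
> release notes right, split across three hash segments in newer Solr
> versions, but the principle is the same):
>
>     import java.nio.charset.StandardCharsets;
>
>     public class CompositeIdHashSketch {
>
>         // Stand-in for Solr's Hash.murmurhash3_x86_32 (same algorithm,
>         // seed 0): standard MurmurHash3, 32-bit variant.
>         static int murmur3(byte[] d) {
>             final int c1 = 0xcc9e2d51, c2 = 0x1b873593;
>             int h = 0;
>             int end = d.length & ~3;
>             for (int i = 0; i < end; i += 4) {
>                 int k = (d[i] & 0xff) | ((d[i + 1] & 0xff) << 8)
>                       | ((d[i + 2] & 0xff) << 16) | (d[i + 3] << 24);
>                 k *= c1; k = Integer.rotateLeft(k, 15); k *= c2;
>                 h ^= k; h = Integer.rotateLeft(h, 13); h = h * 5 + 0xe6546b64;
>             }
>             int k = 0;                       // tail: 1-3 leftover bytes
>             switch (d.length & 3) {
>                 case 3: k = (d[end + 2] & 0xff) << 16;
>                 case 2: k |= (d[end + 1] & 0xff) << 8;
>                 case 1: k |= d[end] & 0xff;
>                         k *= c1; k = Integer.rotateLeft(k, 15); k *= c2;
>                         h ^= k;
>             }
>             h ^= d.length;                   // finalization mix
>             h ^= h >>> 16; h *= 0x85ebca6b; h ^= h >>> 13;
>             h *= 0xc2b2ae35; h ^= h >>> 16;
>             return h;
>         }
>
>         static int murmur3(String s) {
>             return murmur3(s.getBytes(StandardCharsets.UTF_8));
>         }
>
>         // "prefix!rest": top 16 bits from the prefix hash, bottom 16
>         // from the rest, so the prefix alone decides the shard range.
>         static int compositeHash(String id) {
>             int bang = id.indexOf('!');
>             if (bang < 0) return murmur3(id);   // plain id: full hash
>             int prefixHash = murmur3(id.substring(0, bang));
>             int restHash = murmur3(id.substring(bang + 1));
>             return (prefixHash & 0xFFFF0000) | (restHash & 0x0000FFFF);
>         }
>
>         public static void main(String[] args) {
>             // Same identifier, different version/format: same top 16
>             // bits, so both land in the same shard's hash range.
>             System.out.printf("%08x%n", compositeHash("1234567890!130000025603!TEXT"));
>             System.out.printf("%08x%n", compositeHash("1234567890!130000099999!PDF"));
>         }
>     }
>
> If a handful of very hot prefixes happen to hash into the same shard's
> range, that shard does all the work for them, which could explain uneven
> load even with even index sizes.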
>
> It seems strange that certain of the most highly-searched documents would
> happen to fall on this shard, but you may be onto something.   We'll scrape
> through some non-distributed queries and see what we can find.
>
>
> On Sat, May 31, 2014 at 1:47 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> > This is very weird.
> >
> > Are you sure that all the Java versions are identical? And all the JVM
> > parameters are the same? Grasping at straws here.
> >
> > More grasping at straws: I'm a little suspicious that you are using
> > routing. You say that the indexes are about the same size, but is it
> > possible that your routing is somehow loading the problem shard
> > abnormally? By that I mean: are the documents on that shard somehow
> > different, or do they have a drastically higher number of hits than the
> > other shards?
> >
> > You can fire queries at individual shards with &distrib=false and NOT
> > have them go to other shards; if you can isolate the problem queries,
> > that might shed some light on the problem.
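> >
> > For example (host, core name, and query here are made up):
> >
> >     curl 'http://solr1:8983/solr/collection1_shard1_replica1/select?q=id:foo&distrib=false&debugQuery=true'
> >
> > debugQuery=true also returns a timing breakdown per search component,
> > which can help show where a given query gets expensive on that shard.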
> >
> >
> > Best
> > er...@baffled.com
> >
> >
> > On Sat, May 31, 2014 at 8:33 AM, Joe Gresock <jgres...@gmail.com> wrote:
> >
> > > It has taken as little as 2 minutes to happen the last time we tried.
> > > It basically happens upon high query load (peak user hours during the
> > > day).  When we reduce functionality by disabling most searches, it
> > > stabilizes.  So it really is only under high query load.  Our ingest
> > > rate is fairly low.
> > >
> > > It happens no matter how many nodes in the shard are up.
> > >
> > >
> > > Joe
> > >
> > >
> > > On Sat, May 31, 2014 at 11:04 AM, Jack Krupansky
> > > <j...@basetechnology.com> wrote:
> > >
> > > > When you restart, how long does it take to hit the problem? And how
> > > > much query or update activity is happening in that time? Is there
> > > > any other activity showing up in the log?
> > > >
> > > > If you bring up only a single node in that problematic shard, do
> > > > you still see the problem?
> > > >
> > > > -- Jack Krupansky
> > > >
> > > > -----Original Message-----
> > > > From: Joe Gresock
> > > > Sent: Saturday, May 31, 2014 9:34 AM
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Uneven shard heap usage
> > > >
> > > >
> > > > Hi folks,
> > > >
> > > > I'm trying to figure out why one shard of an evenly-distributed
> > > > 3-shard cluster would suddenly start running out of heap space,
> > > > after 9+ months of stable performance.  We're using the "!"
> > > > delimiter in our ids to distribute the documents, and indeed the
> > > > disk sizes of our shards are very similar (31-32GB on disk per
> > > > replica).
> > > >
> > > > Our setup is:
> > > > 9 VMs with 16GB RAM, 8 vcpus (with a 4:1 oversubscription ratio, so
> > > > basically 2 physical CPUs), 24GB disk
> > > > 3 shards, 3 replicas per shard (1 leader, 2 replicas, whatever).  We
> > > > reserve 10g heap for each Solr instance.
> > > > Also 3 zookeeper VMs, which are very stable
> > > >
> > > > Since the troubles started, we've been monitoring all 9 with
> > > > jvisualvm, and shards 2 and 3 keep a steady amount of heap space
> > > > reserved, always showing flat horizontal lines (with some minor gc).
> > > > They're using 4-5GB of heap, and when we force gc using jvisualvm,
> > > > they drop to 1GB usage.  Shard 1, however, quickly shows a steep
> > > > upward slope, and eventually hits concurrent mode failures in the gc
> > > > logs, requiring us to restart the instances when they can no longer
> > > > do anything but gc.
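> > > >
> > > > (For context, the gc logs mentioned are the standard HotSpot ones,
> > > > enabled with flags along these lines; illustrative rather than our
> > > > exact settings:
> > > >
> > > >     -Xmx10g -XX:+UseConcMarkSweepGC -verbose:gc
> > > >     -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log
> > > >
> > > > A "concurrent mode failure" means CMS couldn't finish a concurrent
> > > > collection before the heap filled, forcing a stop-the-world full
> > > > gc.)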
> > > >
> > > > We've tried ruling out physical host problems by moving all 3 Shard
> > > > 1 replicas to different hosts that are underutilized; however, we
> > > > still get the same problem.  We'll keep working on ruling out
> > > > infrastructure issues, but I wanted to ask the questions here in
> > > > case it makes sense:
> > > >
> > > > * Does it make sense that all the replicas on one shard of a
> > > > cluster would have heap problems, when the other shards' replicas
> > > > do not, assuming a fairly even data distribution?
> > > > * One thing we changed recently was to make all of our fields
> > > > stored, instead of only half of them.  This was to support atomic
> > > > updates.  Can stored fields, even though lazily loaded, cause
> > > > problems like this?  (See the lazy-loading note below.)
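> > > >
> > > > (Lazy-loading note: lazy field loading is the solrconfig.xml
> > > > setting below, and the documentCache still holds whatever fields do
> > > > get loaded, so large stored fields plus a big documentCache can
> > > > still use significant heap:
> > > >
> > > >     <enableLazyFieldLoading>true</enableLazyFieldLoading>
> > > > )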
> > > >
> > > > Thanks for any input,
> > > > Joe
> > > >
> > > >
> > >
> >
>
> --
> I know what it is to be in need, and I know what it is to have plenty.  I
> have learned the secret of being content in any and every situation,
> whether well fed or hungry, whether living in plenty or in want.  I can do
> all this through him who gives me strength.    *-Philippians 4:12-13*
>
