RE: Cluster Wide Pauses

Geoff Hendrey Fri, 14 Jan 2011 10:07:09 -0800

This is not an answer to your question, but just an anecdote on cluster
pauses/slowdowns. We had horrible problems with cluster wide pauses. I
think there were several keys to getting this resolved:


1) we used the default settings recommended for bulk inserts:
http://people.apache.org/~jdcryans/HUG8/HUG8-rawson.pdf
2) we upgraded to HBase 0.20.6 b/c there was a deadlock bug in prior
versions that basically just caused the entire cluster to "go to sleep"

Finally, we had a very strange problem which took 3 weeks of debugging
to get to the bottom of. I don't expect that this is your problem, but
I'll just throw it out there. Most bulk HBase data-producing M/R jobs
are going to do some processing, then write the data from the reducer
into hbase (using autoflush=false and disabling the WAL). Since the
reducers all receive keys in the same order, this causes all the
reducers to load the same HBase region simultaneously. We had this
"great idea" that if we reversed the keys that we wrote out of our
mapper, then un-reversed them in the reducer, that our reducers would be
randomly writing to different region servers, not hitting a single
region in lock step. Now, I have some theories on why this seemingly
innocuous approach repeatedly destroyed our entire HBase database. I
won't wax philosophical here, but one thing is certain: any table
created via batch inserts of randomized keys got totally hosed. Scans
became dirt slow and compactions ran constantly, even *days* after the
table was created. None of these problems made a whole lot of sense,
which is why it took 3-4 weeks of debugging for us to back this "key
randomizing" out of our code. The hosed tables actually had to be
dropped for the problem, and the ensuing chaos, to totally abate. Until
we dropped the tables, the region server logs showed constant
compaction. Like I said, it sounds crazy, but this definitely was the
cause of our problem. I'm fully expecting a lot of "you're crazy"
responses to this email, but we repeatedly reproduced the issue, and the
fix was to stop the "key reversing". We just had to live with all the
reducers loading individual regions in lock step, as this was really not
a big deal (at least not as big a deal as hosing the entire
installation).
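A toy Python simulation can make the "lock step" effect concrete. It is a sketch under stated assumptions, not code from this thread: the row keys row000 to row999, the four regions split at row250/row500/row750, the four reducers, and the crc32-based stand-in for the MapReduce hash partitioner are all hypothetical, as are the helper names. It only models the write order the key reversal was meant to produce; it does not model the compaction trouble described above.

```python
import bisect
import zlib

# Hypothetical region boundaries (not from the thread): region i holds
# row keys from SPLITS[i-1] (inclusive) up to SPLITS[i] (exclusive).
SPLITS = ["row250", "row500", "row750"]  # -> 4 regions


def region_of(key: str) -> int:
    """Index of the region whose key range contains `key`."""
    return bisect.bisect_right(SPLITS, key)


def partition(key: str, num_reducers: int) -> int:
    """Stand-in for a hash partitioner (crc32 keeps runs deterministic)."""
    return zlib.crc32(key.encode()) % num_reducers


def reducer_streams(keys, num_reducers, reverse_keys):
    """Return, per reducer, the order in which it would write rows to HBase.

    MapReduce sorts each reducer's input by the key the mapper emitted.
    With reverse_keys=True the mapper emits key[::-1] and the reducer
    un-reverses it before writing, so write order follows the reversed key.
    """
    streams = [[] for _ in range(num_reducers)]
    for key in keys:
        emitted = key[::-1] if reverse_keys else key
        streams[partition(emitted, num_reducers)].append(emitted)
    return [
        [k[::-1] if reverse_keys else k for k in sorted(stream)]
        for stream in streams
    ]


def regions_hit_per_step(streams):
    """At each write step, count how many distinct regions the reducers hit."""
    steps = min(len(s) for s in streams)
    return [len({region_of(s[t]) for s in streams}) for t in range(steps)]


if __name__ == "__main__":
    keys = [f"row{i:03d}" for i in range(1000)]
    for reverse in (False, True):
        hits = regions_hit_per_step(reducer_streams(keys, 4, reverse))
        print(f"reverse_keys={reverse}: avg regions per step = "
              f"{sum(hits) / len(hits):.2f}")
```

With plain keys the average should stay close to one region per step, because every reducer walks the key space in the same order; with reversed keys each step is spread over several regions. That spread was the point of the trick, and the message above is a warning that it came with costs this model does not capture.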

-g

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Christopher Tarnas
Sent: Friday, January 14, 2011 9:54 AM
To: [email protected]
Subject: Re: Cluster Wide Pauses

Thanks - I was not sure and had not received a response from the list on
my related question earlier this week.

It does seem like compactions are related to my problem. If I understand
correctly, raising hbase.hregion.memstore.block.multiplier gives it more
of a buffer for that before writes are blocked while compactions happen.
Is that right? I'm writing via Thrift (about 30 clients) to a 5 node
cluster when I see this problem. There is no I/O wait, so I don't think
it is disk bound, and it is not CPU starved. I'm waiting on IT to get me
access to Ganglia for the network info.
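The question above can be made concrete with a little arithmetic. The Python sketch below is a simplified model of the two write-blocking conditions discussed in this thread, not HBase's actual code path. It assumes a 64 MB hbase.hregion.memstore.flush.size (an assumption; check the value in your own configuration), the blockingStoreFiles value of 7 in the examples is only an illustrative lower setting, and the helpers memstore_block_threshold and updates_blocked are hypothetical names.

```python
MB = 1024 * 1024


def memstore_block_threshold(flush_size: int, block_multiplier: int) -> int:
    """Memstore size (bytes) past which a region blocks new updates."""
    return block_multiplier * flush_size


def updates_blocked(memstore_size: int, store_files: int, *, flush_size: int,
                    block_multiplier: int, blocking_store_files: int) -> bool:
    """Simplified model of the two write-blocking conditions in this thread.

    - memstore pressure: the region's memstore grew past
      hbase.hregion.memstore.block.multiplier * flush size
    - compaction backlog: a store holds more files than
      hbase.hstore.blockingStoreFiles (the HBase docs describe updates to
      the region as blocked until a compaction completes)
    Hypothetical model: it ignores hbase.hstore.blockingWaitTime and any
    region-server-wide memstore limits.
    """
    memstore_full = memstore_size > memstore_block_threshold(flush_size, block_multiplier)
    too_many_files = store_files > blocking_store_files
    return memstore_full or too_many_files


if __name__ == "__main__":
    flush = 64 * MB  # ASSUMED flush size, not taken from the thread
    print(memstore_block_threshold(flush, 2) // MB)   # 128
    print(memstore_block_threshold(flush, 12) // MB)  # 768
    # A region with a 200 MB memstore and 3 store files:
    print(updates_blocked(200 * MB, 3, flush_size=flush,
                          block_multiplier=2, blocking_store_files=7))    # True
    print(updates_blocked(200 * MB, 3, flush_size=flush,
                          block_multiplier=12, blocking_store_files=16))  # False
```

Under these assumptions, raising the multiplier from 2 to 12 moves the per-region blocking point from 128 MB to 768 MB of memstore. That is more room for compactions to catch up, but also more heap a single region may hold, which may be why the extra RAM on the nodes matters here.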

-chris

On Fri, Jan 14, 2011 at 11:29 AM, Jonathan Gray <[email protected]> wrote:

> These are a different kind of pause (those caused by blockingStoreFiles).
>
> This is HBase stepping in and actually blocking updates to a region because
> compactions have not been able to keep up with the write load.  It could
> manifest itself in the same way, but this is different from the shorter
> pauses caused by periodic offlining of regions during balancing and splits.
>
> Wayne, have you confirmed in your RegionServer logs that the pauses are
> associated with splits or region movement, and that you are not seeing the
> blocking store files issue?
>
> JG
>
> > -----Original Message-----
> > From: [email protected] [mailto:[email protected]] On Behalf Of Christopher Tarnas
> > Sent: Friday, January 14, 2011 7:29 AM
> > To: [email protected]
> > Subject: Re: Cluster Wide Pauses
> >
> > I have been seeing similar problems and found that by raising
> > hbase.hregion.memstore.block.multiplier to above 12 (the default is 2)
> > and hbase.hstore.blockingStoreFiles to 16, I managed to reduce the
> > frequency of the pauses during loads.  My nodes are pretty beefy (48 GB
> > of RAM), so I had room to experiment.
> >
> > From what I understand, that gave the regionservers more buffer before
> > they had to halt the world to catch up. The pauses still happen, but
> > their impact is less now.
> >
> > -chris
> >
> > On Fri, Jan 14, 2011 at 8:34 AM, Wayne <[email protected]> wrote:
> >
> > > We have not found any smoking gun here. Most likely these are region
> > > splits on a quickly growing/hot region that all clients get caught
> > > waiting for.
> > >
> > >
> > > On Thu, Jan 13, 2011 at 7:49 AM, Wayne <[email protected]> wrote:
> > >
> > > > Thank you for the lead! We will definitely look closer at the OS logs.
> > > >
> > > >
> > > > On Thu, Jan 13, 2011 at 6:59 AM, Tatsuya Kawano <[email protected]> wrote:
> > > >
> > > >>
> > > >> Hi Wayne,
> > > >>
> > > >> > We are seeing some TCP Resets on all nodes at the same time, and
> > > >> > sometimes quite a lot of them.
> > > >>
> > > >>
> > > >> Have you checked this article from Andrei and Cosmin? They had a
> > > >> busy firewall that caused a network blackout.
> > > >>
> > > >> http://hstack.org/hbase-performance-testing/
> > > >>
> > > >> Maybe it's not your case, but just to be sure.
> > > >>
> > > >> Thanks,
> > > >>
> > > >> --
> > > >> Tatsuya Kawano (Mr.)
> > > >> Tokyo, Japan
> > > >>
> > > >>
> > > >> On Jan 13, 2011, at 4:52 AM, Wayne <[email protected]> wrote:
> > > >>
> > > >> > We are seeing some TCP Resets on all nodes at the same time, and
> > > >> > sometimes quite a lot of them. We have yet to correlate the pauses
> > > >> > to the TCP resets, but I am starting to wonder if this is partly a
> > > >> > network problem. Does Gigabit Ethernet break down on high volume
> > > >> > nodes? Do high volume nodes use 10G or InfiniBand?
> > > >> >
> > > >> >
> > > >> > On Wed, Jan 12, 2011 at 1:52 PM, Stack <[email protected]> wrote:
> > > >> >
> > > >> >> Jon asks that you describe your loading in the issue.  Would you
> > > >> >> mind doing so?  Ted, stick up in the issue the workload and
> > > >> >> configs you are running, if you don't mind.  I'd like to try it
> > > >> >> over here.
> > > >> >> Thanks lads,
> > > >> >> St.Ack
> > > >> >>
> > > >> >>
> > > >> >> On Wed, Jan 12, 2011 at 9:03 AM, Wayne <[email protected]> wrote:
> > > >> >>> Added: https://issues.apache.org/jira/browse/HBASE-3438.
> > > >> >>>
> > > >> >>> On Wed, Jan 12, 2011 at 11:40 AM, Wayne <[email protected]> wrote:
> > > >> >>>
> > > >> >>>> We are using 0.89.20100924, r1001068
> > > >> >>>>
> > > >> >>>> We are seeing it during heavy write load (which is all the time),
> > > >> >>>> but yesterday we had read load as well as write load and saw both
> > > >> >>>> reads and writes stop for 10+ seconds. The region size is the
> > > >> >>>> biggest clue we have found from our tests: if we set up a new
> > > >> >>>> cluster with a 1GB max region size and start to load heavily, we
> > > >> >>>> see this a lot, for long time frames. Maybe the bigger file gets
> > > >> >>>> hung up more easily with a split? Your description below also
> > > >> >>>> fits, in that early on the load is not balanced, so it is easier
> > > >> >>>> to stop everything on one node. I will file a JIRA. I will also
> > > >> >>>> try to dig deeper into the logs during the pauses to find a node
> > > >> >>>> that might be stuck in a split.
> > > >> >>>>
> > > >> >>>>
> > > >> >>>>
> > > >> >>>> On Wed, Jan 12, 2011 at 11:17 AM, Stack <[email protected]> wrote:
> > > >> >>>>
> > > >> >>>>> On Tue, Jan 11, 2011 at 2:34 PM, Wayne <[email protected]> wrote:
> > > >> >>>>>> We have very frequent cluster wide pauses that stop all reads and
> > > >> >>>>>> writes for seconds.
> > > >> >>>>>
> > > >> >>>>> All reads and all writes?
> > > >> >>>>>
> > > >> >>>>> I've seen the pause too for writes.  It's something I've always
> > > >> >>>>> meant to look into.  Friso postulates one cause.  Another that
> > > >> >>>>> we've talked of is a region taking a while to come back online
> > > >> >>>>> after a split or a rebalance for whatever reason.  Client loading
> > > >> >>>>> might be 'random', spraying over lots of random regions, but they
> > > >> >>>>> all get stuck waiting on one particular region to come back online.
> > > >> >>>>>
> > > >> >>>>> I suppose reads could be blocked for the same reason if all are
> > > >> >>>>> trying to read from the offlined region.
> > > >> >>>>>
> > > >> >>>>> What version of hbase are you using?  Splits should be faster in
> > > >> >>>>> 0.90 now that the split daughters come up on the same regionserver.
> > > >> >>>>>
> > > >> >>>>> Sorry I don't have a better answer for you.  Need to dig in.
> > > >> >>>>>
> > > >> >>>>> File a JIRA.  If you want to help out some, stick some data up in
> > > >> >>>>> it.  Some suggestions would be to enable logging of when we look up
> > > >> >>>>> region locations in the client and then note when requests go to
> > > >> >>>>> zero.  Can you figure out what region the clients are waiting on
> > > >> >>>>> (if they are waiting on any)?  If you can pull out a particular
> > > >> >>>>> one, try and elicit its history at the time of blockage.  Is it
> > > >> >>>>> being moved or mid-split?  I suppose it makes sense that bigger
> > > >> >>>>> regions would make the situation 'worse'.  I can take a look at it
> > > >> >>>>> too.
> > > >> >>>>>
> > > >> >>>>> St.Ack
> > > >> >>>>>
> > > >> >>>>>
> > > >> >>>>>
> > > >> >>>>>
> > > >> >>>>>> We are constantly loading data to this cluster of 10 nodes.
> > > >> >>>>>> These pauses can happen as frequently as every minute but
> > > >> >>>>>> sometimes are not seen for 15+ minutes. Basically, watching the
> > > >> >>>>>> region server list with request counts is the only evidence of
> > > >> >>>>>> what is going on. All reads and writes totally stop, and if there
> > > >> >>>>>> is ever any activity, it is on the node hosting the .META. table,
> > > >> >>>>>> with a request count of region count + 1. This problem seems to
> > > >> >>>>>> be worse with a larger region size. We tried a 1GB region size
> > > >> >>>>>> and saw this more than we saw actual activity (and stopped using
> > > >> >>>>>> a larger region size because of it). We went back to the default
> > > >> >>>>>> region size and it was better, but we had too many regions, so
> > > >> >>>>>> now we are up to 512M for a region size and we are seeing it
> > > >> >>>>>> more again.
> > > >> >>>>>>
> > > >> >>>>>> Does anyone know what this is? We have dug into all of the logs
> > > >> >>>>>> to find some sort of pause but are not able to find anything. Is
> > > >> >>>>>> this a WAL (HLog) roll? Is this a region split or compaction? Of
> > > >> >>>>>> course our biggest fear is a GC pause on the master, but we do
> > > >> >>>>>> not have Java logging turned on for the master to tell. What
> > > >> >>>>>> could possibly stop the entire cluster from working for seconds
> > > >> >>>>>> at a time very frequently?
> > > >> >>>>>>
> > > >> >>>>>> Thanks in advance for any ideas of what could be causing this.
> > > >> >>>>>>
> > > >> >>>>>
> > > >> >>>>
> > > >> >>>>
> > > >> >>>
> > > >> >>
> > > >>
> > > >>
> > > >
> > >
>
