Re: Cluster Wide Pauses

Wayne Thu, 27 Jan 2011 06:54:31 -0800

This is great, thanks. We had narrowed down our problem to a memstore flush,
and this confirms why.


On Wed, Jan 26, 2011 at 10:51 PM, Todd Lipcon <[email protected]> wrote:

> Hey all,
>
> I spent some time this afternoon looking into this issue and think I found a
> good culprit:
> https://issues.apache.org/jira/browse/HBASE-3483
>
> If you've been having this problem, please watch that JIRA for a patch to
> try (the one up there now is OK but not great).
>
> -Todd
>
> On Fri, Jan 14, 2011 at 10:06 AM, Geoff Hendrey <[email protected]> wrote:
>
> > This is not an answer to your question, but just an anecdote on cluster
> > pauses/slowdowns. We had horrible problems with cluster wide pauses. I
> > think there were several keys to getting this resolved:
> >
> > 1) we used the default settings recommended for bulk inserts:
> > http://people.apache.org/~jdcryans/HUG8/HUG8-rawson.pdf
> > 2) we upgraded to hbase 0.20.6 b/c there was a deadlock bug in prior
> > versions that basically just caused the entire cluster to "go to sleep"
> >
> > Finally, we had a very strange problem which took 3 weeks of debugging
> > to get to the bottom of. I don't expect that this is your problem, but
> > I'll just throw it out there. Most bulk HBase data-producing M/R jobs
> > are going to do some processing, then write the data from the reducer
> > into hbase (using autoflush=false and disabling the WAL). Since the
> > reducers all receive keys in the same order, this causes all the
> > reducers to load the same HBase region simultaneously. We had this
> > "great idea" that if we reversed the keys that we wrote out of our
> > mapper, then un-reversed them in the reducer, that our reducers would be
> > randomly writing to different region servers, not hitting a single
> > region in lock step. Now, I have some theories on why this seemingly
> > innocuous approach repeatedly destroyed our entire Hbase database. I
> > won't wax philosophical here, but one thing is certain: Any table
> > created via batch inserts of randomized keys got totally hosed. Scans
> > became dirt slow and compactions ran constantly, even *days* after the
> > table was created. None of these problems made a whole lot of sense,
> > which is why it took 3-4 weeks of debugging for us to back this "key
> > randomizing" out of our code. The hosed tables actually had to be
> > dropped for the problem and the ensuing chaos to totally abate. Until we
> > dropped the tables, the region server logs showed constant
> > compaction. Like I said, it sounds crazy, but this definitely was the
> > cause of our problem. I'm fully expecting a lot of "you're crazy"
> > responses to this email, but we repeatedly reproduced the issue, and the
> > fix was to stop the "key reversing". We just had to live with all the
> > reducers loading individual regions in lock step, as this was really not
> > a big deal (at least not as big a deal as hosing the entire
> > installation).
> >
> > -g
> >
> > -----Original Message-----
> > From: [email protected] [mailto:[email protected]] On Behalf Of Christopher
> > Tarnas
> > Sent: Friday, January 14, 2011 9:54 AM
> > To: [email protected]
> > Subject: Re: Cluster Wide Pauses
> >
> > Thanks - I was not sure and had not received a response from the list on my
> > related question earlier this week.
> >
> > It does seem like compactions are related to my problem. If I understand
> > correctly, does raising hbase.hregion.memstore.block.multiplier give it more
> > of a buffer before writes are blocked while compactions happen? I'm
> > writing via thrift (about 30 clients) to a 5 node cluster when I see this
> > problem. There is no io wait so I don't think it is disk bound, and it is
> > not CPU starved. I'm waiting on IT to get me access to ganglia for the
> > network info.
> >
> > -chris
> >
> > On Fri, Jan 14, 2011 at 11:29 AM, Jonathan Gray <[email protected]> wrote:
> >
> > > These are a different kind of pause (those caused by blockingStoreFiles).
> > >
> > > This is HBase stepping in and actually blocking updates to a region because
> > > compactions have not been able to keep up with the write load.  It could
> > > manifest itself in the same way but this is different than shorter pauses
> > > caused by periodic offlining of regions during balancing and splits.
> > >
> > > Wayne, have you confirmed in your RegionServer logs that the pauses are
> > > associated with splits or region movement, and that you are not seeing the
> > > blocking store files issue?
> > >
> > > JG
> > >
> > > > -----Original Message-----
> > > > From: [email protected] [mailto:[email protected]] On Behalf Of Christopher
> > > > Tarnas
> > > > Sent: Friday, January 14, 2011 7:29 AM
> > > > To: [email protected]
> > > > Subject: Re: Cluster Wide Pauses
> > > >
> > > > I have been seeing similar problems and found that by raising the
> > > > hbase.hregion.memstore.block.multiplier
> > > > to above 12 (default is two) and the hbase.hstore.blockingStoreFiles to 16 I
> > > > managed to reduce the frequency of the pauses during loads.  My nodes are
> > > > pretty beefy (48 GB of ram) so I had room to experiment.
> > > >
> > > > From what I understand that gave the regionservers more buffer before
> > > > they had to halt the world to catch up. The pauses still happen but their
> > > > impact is less now.
> > > >
> > > > -chris
> > > >
> > > > On Fri, Jan 14, 2011 at 8:34 AM, Wayne <[email protected]> wrote:
> > > >
> > > > > We have not found any smoking gun here. Most likely these are region
> > > > > splits on a quickly growing/hot region that all clients get caught
> > > > > waiting for.
> > > > >
> > > > >
> > > > > On Thu, Jan 13, 2011 at 7:49 AM, Wayne <[email protected]> wrote:
> > > > >
> > > > > > Thank you for the lead! We will definitely look closer at the OS logs.
> > > > > >
> > > > > >
> > > > > > On Thu, Jan 13, 2011 at 6:59 AM, Tatsuya Kawano <[email protected]> wrote:
> > > > > >
> > > > > >>
> > > > > >> Hi Wayne,
> > > > > >>
> > > > > > >> > We are seeing some TCP Resets on all nodes at the same time, and sometimes
> > > > > > >> > quite a lot of them.
> > > > > > >>
> > > > > > >>
> > > > > > >> Have you checked this article from Andrei and Cosmin? They had a
> > > > > > >> busy firewall that caused a network blackout.
> > > > > > >>
> > > > > > >> http://hstack.org/hbase-performance-testing/
> > > > > > >>
> > > > > > >> Maybe it's not your case, but just to be sure.
> > > > > >>
> > > > > >> Thanks,
> > > > > >>
> > > > > >> --
> > > > > >> Tatsuya Kawano (Mr.)
> > > > > >> Tokyo, Japan
> > > > > >>
> > > > > >>
> > > > > >> On Jan 13, 2011, at 4:52 AM, Wayne <[email protected]> wrote:
> > > > > >>
> > > > > > >> > We are seeing some TCP Resets on all nodes at the same time, and sometimes
> > > > > > >> > quite a lot of them. We have yet to correlate the pauses to the TCP resets
> > > > > > >> > but I am starting to wonder if this is partly a network problem.
> > > > > > >> > Does Gigabit Ethernet break down on high volume nodes? Do high volume nodes
> > > > > > >> > use 10G or Infiniband?
> > > > > >> >
> > > > > >> >
> > > > > > >> > On Wed, Jan 12, 2011 at 1:52 PM, Stack <[email protected]> wrote:
> > > > > > >> >
> > > > > > >> >> Jon asks that you describe your loading in the issue.  Would you
> > > > > > >> >> mind doing so?  Ted, stick up in the issue the workload and
> > > > > > >> >> configs you are running if you don't mind.  I'd like to try it
> > > > > > >> >> over here.
> > > > > >> >> Thanks lads,
> > > > > >> >> St.Ack
> > > > > >> >>
> > > > > >> >>
> > > > > > >> >> On Wed, Jan 12, 2011 at 9:03 AM, Wayne <[email protected]> wrote:
> > > > > > >> >>> Added: https://issues.apache.org/jira/browse/HBASE-3438.
> > > > > > >> >>>
> > > > > > >> >>> On Wed, Jan 12, 2011 at 11:40 AM, Wayne <[email protected]> wrote:
> > > > > >> >>>
> > > > > >> >>>> We are using 0.89.20100924, r1001068
> > > > > >> >>>>
> > > > > > >> >>>> We are seeing it during heavy write load (which is all the time), but
> > > > > > >> >>>> yesterday we had read load as well as write load and saw both reads and
> > > > > > >> >>>> writes stop for 10+ seconds. The region size is the biggest clue we have
> > > > > > >> >>>> found from our tests, as setting up a new cluster with a 1GB max region size
> > > > > > >> >>>> and starting to load heavily we will see this a lot for long, long time
> > > > > > >> >>>> frames. Maybe the bigger file gets hung up more easily with a split? Your
> > > > > > >> >>>> description below also fits in that early on the load is not balanced so it
> > > > > > >> >>>> is easier to stop everything on one node as the balance is not great early
> > > > > > >> >>>> on. I will file a JIRA. I will also try to dig deeper into the logs during
> > > > > > >> >>>> the pauses to find a node that might be stuck in a split.
> > > > > >> >>>>
> > > > > >> >>>>
> > > > > >> >>>>
> > > > > > >> >>>> On Wed, Jan 12, 2011 at 11:17 AM, Stack <[email protected]> wrote:
> > > > > > >> >>>>
> > > > > > >> >>>>> On Tue, Jan 11, 2011 at 2:34 PM, Wayne <[email protected]> wrote:
> > > > > > >> >>>>>> We have very frequent cluster wide pauses that stop all reads and writes
> > > > > > >> >>>>>> for seconds.
> > > > > >> >>>>>
> > > > > >> >>>>> All reads and all writes?
> > > > > >> >>>>>
> > > > > > >> >>>>> I've seen the pause too for writes.  It's something I've always meant
> > > > > > >> >>>>> to look into.  Friso postulates one cause.  Another that we've talked
> > > > > > >> >>>>> of is a region taking a while to come back on line after a split or a
> > > > > > >> >>>>> rebalance for whatever reason.  Client loading might be 'random'
> > > > > > >> >>>>> spraying over lots of random regions but they all get stuck waiting on
> > > > > > >> >>>>> one particular region to come back online.
> > > > > >> >>>>>
> > > > > > >> >>>>> I suppose reads could be blocked for same reason if all are trying to
> > > > > > >> >>>>> read from the offlined region.
> > > > > >> >>>>>
> > > > > > >> >>>>> What version of hbase are you using?  Splits should be faster in 0.90
> > > > > > >> >>>>> now that the split daughters come up on the same regionserver.
> > > > > >> >>>>>
> > > > > > >> >>>>> Sorry I don't have a better answer for you.  Need to dig in.
> > > > > >> >>>>>
> > > > > > >> >>>>> File a JIRA.  If you want to help out some, stick some data up in it.
> > > > > > >> >>>>> Some suggestions would be to enable logging of when we lookup region
> > > > > > >> >>>>> locations in client and then note when requests go to zero.  Can you
> > > > > > >> >>>>> figure what region the clients are waiting on (if they are waiting on
> > > > > > >> >>>>> any)?  If you can pull out a particular one, try and elicit
> > > > > > >> >>>>> its history at time of blockage.  Is it being moved or
> > > > > > >> >>>>> mid-split?  I suppose it makes sense that bigger regions
> > > > > > >> >>>>> would make the situation 'worse'.  I can take a look at it too.
> > > > > >> >>>>>
> > > > > >> >>>>> St.Ack
> > > > > >> >>>>>
> > > > > >> >>>>>
> > > > > >> >>>>>
> > > > > >> >>>>>
> > > > > > >> >>>>>> We are constantly loading data to this cluster of 10 nodes.
> > > > > > >> >>>>>> These pauses can happen as frequently as every minute but sometimes are not
> > > > > > >> >>>>>> seen for 15+ minutes. Basically watching the Region server list with request
> > > > > > >> >>>>>> counts is the only evidence of what is going on. All reads and writes
> > > > > > >> >>>>>> totally stop and if there is ever any activity it is on the node hosting the
> > > > > > >> >>>>>> .META. table with a request count of region count + 1. This problem seems to
> > > > > > >> >>>>>> be worse with a larger region size. We tried a 1GB region size and saw this
> > > > > > >> >>>>>> more than we saw actual activity (and stopped using a larger region size
> > > > > > >> >>>>>> because of it). We went back to the default region size and it was better,
> > > > > > >> >>>>>> but we had too many regions so now we are up to 512M for a region size and
> > > > > > >> >>>>>> we are seeing it more again.
> > > > > >> >>>>>>
> > > > > > >> >>>>>> Does anyone know what this is? We have dug into all of the logs to find some
> > > > > > >> >>>>>> sort of pause but are not able to find anything. Is this a wal hlog roll?
> > > > > > >> >>>>>> Is this a region split or compaction? Of course our biggest fear is a GC
> > > > > > >> >>>>>> pause on the master but we do not have java logging turned on with the
> > > > > > >> >>>>>> master to tell. What could possibly stop the entire cluster from working for
> > > > > > >> >>>>>> seconds at a time very frequently?
> > > > > >> >>>>>>
> > > > > > >> >>>>>> Thanks in advance for any ideas of what could be causing this.
> > > > > >> >>>>>>
> > > > > >> >>>>>
> > > > > >> >>>>
> > > > > >> >>>>
> > > > > >> >>>
> > > > > >> >>
> > > > > >>
> > > > > >>
> > > > > >
> > > > >
> > >
> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
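
For reference, a rough sketch of the arithmetic behind the
hbase.hregion.memstore.block.multiplier setting discussed in the thread. A
region blocks updates once its memstore grows to the flush size times the
multiplier, so raising the multiplier from 2 to 12 gives writers much more
headroom before they stall. The 64 MB flush size
(hbase.hregion.memstore.flush.size) is an assumption here, since nobody in the
thread says what value they ran with, and the helper name is made up for
illustration.

```python
MB = 1024 * 1024


def blocking_memstore_bytes(flush_size_bytes: int, block_multiplier: int) -> int:
    """Memstore size at which a region starts blocking updates."""
    return flush_size_bytes * block_multiplier


# Assumed value for hbase.hregion.memstore.flush.size (not stated in the thread).
flush_size = 64 * MB

# 2 is the default mentioned in the thread; 12 is the raised value Chris tried.
for multiplier in (2, 12):
    limit = blocking_memstore_bytes(flush_size, multiplier)
    print(f"multiplier={multiplier}: updates block at {limit // MB} MB per region")
```

The extra headroom is per region, so it has to fit in the regionserver heap
alongside everything else, which is why this is easier to try on nodes with a
lot of RAM.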
