We turned off auto-splitting by setting our maximum region size to something
very large (100GB). We split regions manually when they become too unwieldy
from a compaction point of view.
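
For reference, here's a rough sketch of what that looks like (the table name
and exact size are placeholders, not our real configs) -- you can raise
hbase.hregion.max.filesize cluster-wide in hbase-site.xml, or set MAX_FILESIZE
per table through the Java admin API:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class RaiseMaxRegionSize {
      public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
          TableName table = TableName.valueOf("my_table"); // placeholder name
          HTableDescriptor desc = admin.getTableDescriptor(table);
          // ~100GB before the split policy even considers splitting; regions
          // then get split by hand when they become unwieldy for compactions.
          desc.setMaxFileSize(100L * 1024 * 1024 * 1024);
          admin.modifyTable(table, desc);
        }
      }
    }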

We do use BufferedMutators in a number of places. They are pretty
straightforward, and definitely improve performance. The main lesson
learned there is to use small buffer sizes. You'll get most of the
benefit from just a 1MB buffer, but if you want to go higher than that,
you should aim for less than half of your G1GC region size. Anything at
or above that threshold is allocated as a humongous object, which has
implications for garbage collection. The blog post I linked earlier goes
into humongous objects:
http://product.hubspot.com/blog/g1gc-fundamentals-lessons-from-taming-garbage-collection#HumongousObjects.
We've seen them be very bad for GC performance when many of them come in
at once.
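
As a concrete (if simplified) example of how we size them -- the table and
column names below are placeholders, and 1MB is the conservative end of what
I described above:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.BufferedMutator;
    import org.apache.hadoop.hbase.client.BufferedMutatorParams;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SmallBufferExample {
      public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf)) {
          // Keep the write buffer well under half the G1 region size so a
          // buffered batch never becomes a humongous allocation.
          BufferedMutatorParams params =
              new BufferedMutatorParams(TableName.valueOf("my_table")) // placeholder
                  .writeBufferSize(1L * 1024 * 1024); // 1MB
          try (BufferedMutator mutator = conn.getBufferedMutator(params)) {
            Put put = new Put(Bytes.toBytes("row-1"));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
            mutator.mutate(put);
          } // close() flushes anything still buffered
        }
      }
    }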

So for us, most of our regionservers run 40GB+ heaps, for which we use 32MB
G1GC regions. With 32MB G1GC regions, we aim for all buffered mutators to
use buffer sizes below 16MB -- and we actually go further and limit them to
around 10MB just to be safe. We do the same for reads -- we try to limit
all scanner and multiget responses to less than 10MB.
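
To make the read side concrete (again a sketch with placeholder names, not
our exact tuning): the regionserver JVM gets -XX:G1HeapRegionSize=32m, and on
the client we cap how much a single scan RPC can return:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;

    public class BoundedScanExample {
      public static void main(String[] args) throws IOException {
        // Regionserver JVM flags (illustrative): -XX:+UseG1GC -XX:G1HeapRegionSize=32m
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("my_table"))) { // placeholder
          Scan scan = new Scan();
          scan.setCaching(100);                      // rows fetched per RPC
          scan.setMaxResultSize(10L * 1024 * 1024);  // ~10MB of data per RPC response
          try (ResultScanner scanner = table.getScanner(scan)) {
            for (Result result : scanner) {
              // process each row
            }
          }
        }
      }
    }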

We've created a dashboard in our internal monitoring system that shows the
count of requests we consider too large, for all applications (we have many
hundreds of deployed applications hitting these clusters). It's up to the
individual teams that own those applications to drive that count down to 0.
We've also built a detention queue into HBase (similar to quotas): if an
application is doing something that adversely affects the rest of the
system -- for instance, spamming too-large requests or running badly
filtered scans -- we can put it in the queue based on its username.
Applications in the detention queue use their own RPC handlers, which we
can aggressively limit, or reject outright if need be, to preserve the
cluster.
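
The detention queue itself is an internal patch, so I can't point you at
code for it, but for a rough idea of the closest lever in stock HBase
(1.1+ with hbase.quota.enabled=true), the quota API can throttle a specific
user -- the username and limits below are just placeholders:

    import java.io.IOException;
    import java.util.concurrent.TimeUnit;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.quotas.QuotaSettings;
    import org.apache.hadoop.hbase.quotas.QuotaSettingsFactory;
    import org.apache.hadoop.hbase.quotas.ThrottleType;

    public class ThrottleNoisyUser {
      public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
          // Throttle the offending application's user to 100 requests/sec.
          QuotaSettings throttle = QuotaSettingsFactory.throttleUser(
              "noisy-app-user", ThrottleType.REQUEST_NUMBER, 100, TimeUnit.SECONDS);
          admin.setQuota(throttle);
        }
      }
    }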

Hope this helps

On Wed, Apr 27, 2016 at 2:54 PM Saad Mufti <saad.mu...@gmail.com> wrote:

> Hi Bryan,
>
> In Hubspot do you use a single shared (per-JVM) BufferedMutator anywhere in
> an attempt to get better performance? Any lessons learned from any
> attempts? Has it hurt or helped?
>
> Also do you have any experience with write performance in conjunction with
> auto-splitting activity kicking in, either with BufferedMutator or
> separately with just direct Put's?
>
> Thanks.
>
> ----
> Saad
>
>
>
>
> On Wed, Apr 27, 2016 at 2:22 PM, Bryan Beaudreault <bbeaudrea...@hubspot.com> wrote:
>
> > Hey Ted,
> >
> > Actually, gc_log_visualizer is open-sourced, I will ask the author to
> > update the post with links: https://github.com/HubSpot/gc_log_visualizer
> >
> > The author was taking a foundational approach with this blog post. We do
> > use ParallelGC for backend non-API deployables, such as kafka consumers
> > and long running daemons, etc. However, we treat HBase like our API's, in
> > that it must have low latency requests. So we use G1GC for HBase.
> >
> > Expect another blog post from another HubSpot engineer soon, with all the
> > details on how we approached G1GC tuning for HBase. I will update this
> > list when it's published, and will put some pressure on that author to
> > get it out there :)
> >
> > On Wed, Apr 27, 2016 at 2:01 PM Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > > Bryan:
> > > w.r.t. gc_log_visualizer, is there a plan to open source it?
> > >
> > > bq. while backend throughput will be better/cheaper with ParallelGC.
> > >
> > > Does the above mean that HBase servers are still using ParallelGC?
> > >
> > > Thanks
> > >
> > > On Wed, Apr 27, 2016 at 7:39 AM, Bryan Beaudreault <bbeaudrea...@hubspot.com> wrote:
> > >
> > > > We have 6 production clusters and all of them are tuned differently,
> > > > so I'm not sure there is a setting I could easily give you. It really
> > > > depends on the usage. One of our devs wrote a blog post on G1GC
> > > > fundamentals recently. It's rather long, but could be worth a read:
> > > >
> > > > http://product.hubspot.com/blog/g1gc-fundamentals-lessons-from-taming-garbage-collection
> > > >
> > > > We will also have a blog post coming out in the next week or so that
> > > > talks specifically to tuning G1GC for HBase. I can update this thread
> > > > when that's available.
> > > >
> > > > On Tue, Apr 26, 2016 at 8:08 PM Saad Mufti <saad.mu...@gmail.com> wrote:
> > > >
> > > > > That is interesting. Would it be possible for you to share what GC
> > > > > settings you ended up on that gave you the most predictable
> > > > > performance?
> > > > >
> > > > > Thanks.
> > > > >
> > > > > ----
> > > > > Saad
> > > > >
> > > > > On Tue, Apr 26, 2016 at 11:56 AM, Bryan Beaudreault <bbeaudrea...@hubspot.com> wrote:
> > > > >
> > > > > > We were seeing this for a while with our CDH5 HBase clusters too.
> > > > > > We eventually correlated it very closely to GC pauses. Through
> > > > > > heavily tuning our GC we were able to drastically reduce the logs,
> > > > > > by keeping most GC's under 100ms.
> > > > > >
> > > > > > On Tue, Apr 26, 2016 at 6:25 AM Saad Mufti <saad.mu...@gmail.com> wrote:
> > > > > >
> > > > > > > From what I can see in the source code, the default is actually
> > > > > > > even lower at 100 ms (can be overridden with
> > > > > > > hbase.regionserver.hlog.slowsync.ms).
> > > > > > >
> > > > > > > ----
> > > > > > > Saad
> > > > > > >
> > > > > > > On Tue, Apr 26, 2016 at 3:13 AM, Kevin Bowling <kevin.bowl...@kev009.com> wrote:
> > > > > > >
> > > > > > > > I see similar log spam while the system has reasonable
> > > > > > > > performance. Was the 250ms default chosen with SSDs and 10ge in
> > > > > > > > mind or something? I guess I'm surprised a sync write several
> > > > > > > > times through JVMs to 2 remote datanodes would be expected to
> > > > > > > > consistently happen that fast.
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > >
> > > > > > > > On Mon, Apr 25, 2016 at 12:18 PM, Saad Mufti <saad.mu...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > In our large HBase cluster based on CDH 5.5 in AWS, we're
> > > > > > > > > constantly seeing the following messages in the region server
> > > > > > > > > logs:
> > > > > > > > >
> > > > > > > > > 2016-04-25 14:02:55,178 INFO
> > > > > > > > > org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 258 ms,
> > > > > > > > > current pipeline:
> > > > > > > > > [DatanodeInfoWithStorage[10.99.182.165:50010,DS-281d4c4f-23bd-4541-bedb-946e57a0f0fd,DISK],
> > > > > > > > > DatanodeInfoWithStorage[10.99.182.236:50010,DS-f8e7e8c9-6fa0-446d-a6e5-122ab35b6f7c,DISK],
> > > > > > > > > DatanodeInfoWithStorage[10.99.182.195:50010,DS-3beae344-5a4a-4759-ad79-a61beabcc09d,DISK]]
> > > > > > > > >
> > > > > > > > > These happen regularly while HBase appears to be operating
> > > > > > > > > normally with decent read and write performance. We do have
> > > > > > > > > occasional performance problems when regions are
> > > > > > > > > auto-splitting, and at first I thought this was related, but
> > > > > > > > > now I see it happens all the time.
> > > > > > > > >
> > > > > > > > > Can someone explain what this really means and whether we
> > > > > > > > > should be concerned? I tracked down the source code that
> > > > > > > > > outputs it in
> > > > > > > > >
> > > > > > > > > hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
> > > > > > > > >
> > > > > > > > > but after going through the code I think I'd need to know
> > > > > > > > > much more about the code to glean anything from it or the
> > > > > > > > > associated JIRA ticket
> > > > > > > > > https://issues.apache.org/jira/browse/HBASE-11240.
> > > > > > > > >
> > > > > > > > > Also, what is this "pipeline" the ticket and code talk about?
> > > > > > > > >
> > > > > > > > > Thanks in advance for any information and/or clarification
> > > > > > > > > anyone can provide.
> > > > > > > > >
> > > > > > > > > ----
> > > > > > > > >
> > > > > > > > > Saad
