Hey Ted,

Actually, gc_log_visualizer is open source; I will ask the author to
update the post with a link: https://github.com/HubSpot/gc_log_visualizer

The author was taking a foundational approach with this blog post. We do
use ParallelGC for backend non-API deployables, such as Kafka consumers and
long-running daemons. However, we treat HBase like our APIs, in that it
must serve low-latency requests. So we use G1GC for HBase.
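
To make that split concrete, here is a minimal sketch of the two JVM
profiles (heap sizes and flag values are placeholders, not our exact
production settings; HBASE_REGIONSERVER_OPTS is the standard hbase-env.sh
hook, while the backend JVM_OPTS line just stands in for however the
non-HBase daemons get launched):

  # Throughput-oriented backend daemons (Kafka consumers, etc.)
  export JVM_OPTS="-Xms8g -Xmx8g -XX:+UseParallelGC"

  # Latency-sensitive HBase region servers, in hbase-env.sh
  export HBASE_REGIONSERVER_OPTS="-Xms16g -Xmx16g -XX:+UseG1GC -XX:MaxGCPauseMillis=100"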

Expect a follow-up post from another HubSpot engineer soon, with all the
details on how we approached G1GC tuning for HBase. I will update this list
when it's published, and will put some pressure on that author to get it
out there :)

On Wed, Apr 27, 2016 at 2:01 PM Ted Yu <yuzhih...@gmail.com> wrote:

> Bryan:
> w.r.t. gc_log_visualizer, is there a plan to open source it?
>
> bq. while backend throughput will be better/cheaper with ParallelGC.
>
> Does the above mean that the HBase servers are still using ParallelGC?
>
> Thanks
>
> On Wed, Apr 27, 2016 at 7:39 AM, Bryan Beaudreault <bbeaudrea...@hubspot.com> wrote:
>
> > We have 6 production clusters and all of them are tuned differently, so
> > I'm not sure there is a setting I could easily give you. It really
> > depends on the usage. One of our devs wrote a blog post on G1GC
> > fundamentals recently. It's rather long, but could be worth a read:
> >
> > http://product.hubspot.com/blog/g1gc-fundamentals-lessons-from-taming-garbage-collection
> >
> > We will also have a blog post coming out in the next week or so that
> > talks specifically to tuning G1GC for HBase. I can update this thread
> > when that's available.
> >
> > On Tue, Apr 26, 2016 at 8:08 PM Saad Mufti <saad.mu...@gmail.com> wrote:
> >
> > > That is interesting. Would it be possible for you to share what GC
> > > settings you ended up on that gave you the most predictable
> > > performance?
> > >
> > > Thanks.
> > >
> > > ----
> > > Saad
> > >
> > >
> > > On Tue, Apr 26, 2016 at 11:56 AM, Bryan Beaudreault <bbeaudrea...@hubspot.com> wrote:
> > >
> > > > We were seeing this for a while with our CDH5 HBase clusters too. We
> > > > eventually correlated it very closely to GC pauses. By heavily tuning
> > > > our GC we were able to drastically reduce the logs, keeping most GCs
> > > > under 100ms.
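> > > >
> > > > For reference, that kind of pause-target tuning is driven by a small
> > > > set of JVM flags; a minimal illustrative starting point (placeholder
> > > > values, not our production settings) looks like:
> > > >
> > > >   -XX:+UseG1GC
> > > >   -XX:MaxGCPauseMillis=100
> > > >   -XX:+PrintGCDetails -XX:+PrintGCDateStamps
> > > >   -Xloggc:/var/log/hbase/gc.log
> > > >
> > > > The GC logging flags are what gave us the data to correlate the
> > > > pauses with the slow sync messages.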
> > > >
> > > > On Tue, Apr 26, 2016 at 6:25 AM Saad Mufti <saad.mu...@gmail.com> wrote:
> > > >
> > > > > From what I can see in the source code, the default is actually
> > > > > even lower at 100 ms (can be overridden with
> > > > > hbase.regionserver.hlog.slowsync.ms).
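> > > > >
> > > > > If you wanted to relax the threshold rather than chase the syncs,
> > > > > the override would go in hbase-site.xml in the usual way (the
> > > > > value here is just an example):
> > > > >
> > > > >   <property>
> > > > >     <name>hbase.regionserver.hlog.slowsync.ms</name>
> > > > >     <value>250</value>
> > > > >   </property>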
> > > > >
> > > > > ----
> > > > > Saad
> > > > >
> > > > >
> > > > > On Tue, Apr 26, 2016 at 3:13 AM, Kevin Bowling <kevin.bowl...@kev009.com> wrote:
> > > > >
> > > > > > I see similar log spam while the system has reasonable
> > > > > > performance. Was the 250ms default chosen with SSDs and 10GbE in
> > > > > > mind or something? I guess I'm surprised that a sync which passes
> > > > > > through several JVMs to 2 remote datanodes would be expected to
> > > > > > consistently happen that fast.
> > > > > >
> > > > > > Regards,
> > > > > >
> > > > > > On Mon, Apr 25, 2016 at 12:18 PM, Saad Mufti <saad.mu...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > In our large HBase cluster based on CDH 5.5 in AWS, we're
> > > > > > > constantly seeing the following messages in the region server
> > > > > > > logs:
> > > > > > >
> > > > > > > 2016-04-25 14:02:55,178 INFO
> > > > > > > org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync
> > > > > > > cost: 258 ms, current pipeline:
> > > > > > > [DatanodeInfoWithStorage[10.99.182.165:50010,DS-281d4c4f-23bd-4541-bedb-946e57a0f0fd,DISK],
> > > > > > > DatanodeInfoWithStorage[10.99.182.236:50010,DS-f8e7e8c9-6fa0-446d-a6e5-122ab35b6f7c,DISK],
> > > > > > > DatanodeInfoWithStorage[10.99.182.195:50010,DS-3beae344-5a4a-4759-ad79-a61beabcc09d,DISK]]
> > > > > > >
> > > > > > >
> > > > > > > These happen regularly while HBase appears to be operating
> > > > > > > normally with decent read and write performance. We do have
> > > > > > > occasional performance problems when regions are
> > > > > > > auto-splitting, and at first I thought this was related, but
> > > > > > > now I see it happens all the time.
> > > > > > >
> > > > > > >
> > > > > > > Can someone explain what this really means and whether we
> > > > > > > should be concerned? I tracked down the source code that
> > > > > > > outputs it in
> > > > > > > hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
> > > > > > > but after going through the code I think I'd need to know much
> > > > > > > more about it to glean anything from it or the associated JIRA
> > > > > > > ticket https://issues.apache.org/jira/browse/HBASE-11240.
> > > > > > >
> > > > > > > Also, what is this "pipeline" that the ticket and code talk about?
> > > > > > >
> > > > > > > Thanks in advance for any information and/or clarification
> > > > > > > anyone can provide.
> > > > > > >
> > > > > > > ----
> > > > > > >
> > > > > > > Saad
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
