We saw this for a while on our CDH5 HBase clusters too. We eventually
correlated it very closely with GC pauses. By heavily tuning our GC to keep
most pauses under 100 ms, we were able to drastically reduce the number of
these log messages.
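
For anyone who wants to check for the same correlation, here is a rough
sketch in Python, not the exact tooling we used. It pulls the timestamp and
cost out of "Slow sync cost" lines in the format quoted below, and checks
whether a GC pause falls close to each one. The function names, the
one-second window, and the idea of passing GC pause times in as a plain list
are all assumptions for illustration; GC log formats vary with JVM flags, so
parsing those is left to the caller.

```python
import re
from datetime import datetime

# Matches the FSHLog line format quoted in this thread (log line unwrapped).
SLOW_SYNC = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3}) .*Slow sync cost: (\d+) ms"
)


def slow_sync_events(lines):
    """Return a list of (timestamp, cost_ms) for each 'Slow sync cost' line."""
    events = []
    for line in lines:
        m = SLOW_SYNC.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            ts = ts.replace(microsecond=int(m.group(2)) * 1000)
            events.append((ts, int(m.group(3))))
    return events


def near_pause(event_ts, pause_times, window_s=1.0):
    """True if any GC pause time lies within window_s seconds of the event.

    pause_times is a hypothetical list of datetime objects that the caller
    has already extracted from the region server's GC log.
    """
    return any(
        abs((event_ts - p).total_seconds()) <= window_s for p in pause_times
    )
```

If most slow-sync events come back with near_pause(...) true, GC is a likely
culprit; if not, the HDFS pipeline itself is the next place to look.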

On Tue, Apr 26, 2016 at 6:25 AM Saad Mufti <saad.mu...@gmail.com> wrote:

> From what I can see in the source code, the default is actually even lower
> at 100 ms (can be overridden with hbase.regionserver.hlog.slowsync.ms).
>
> ----
> Saad
>
>
> On Tue, Apr 26, 2016 at 3:13 AM, Kevin Bowling <kevin.bowl...@kev009.com>
> wrote:
>
> > I see similar log spam while the system has reasonable performance.  Was
> > the 250ms default chosen with SSDs and 10GbE in mind or something?  I
> > guess I'm surprised a sync write several times through JVMs to 2 remote
> > datanodes would be expected to consistently happen that fast.
> >
> > Regards,
> >
> > On Mon, Apr 25, 2016 at 12:18 PM, Saad Mufti <saad.mu...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > In our large HBase cluster based on CDH 5.5 in AWS, we're constantly
> > > seeing the following messages in the region server logs:
> > >
> > > 2016-04-25 14:02:55,178 INFO
> > > org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 258 ms,
> > > current pipeline:
> > > [DatanodeInfoWithStorage[10.99.182.165:50010,DS-281d4c4f-23bd-4541-bedb-946e57a0f0fd,DISK],
> > > DatanodeInfoWithStorage[10.99.182.236:50010,DS-f8e7e8c9-6fa0-446d-a6e5-122ab35b6f7c,DISK],
> > > DatanodeInfoWithStorage[10.99.182.195:50010,DS-3beae344-5a4a-4759-ad79-a61beabcc09d,DISK]]
> > >
> > >
> > > These happen regularly while HBase appears to be operating normally
> > > with decent read and write performance. We do have occasional
> > > performance problems when regions are auto-splitting, and at first I
> > > thought this was related, but now I see it happens all the time.
> > >
> > >
> > > Can someone explain what this really means and whether we should be
> > > concerned? I tracked down the source code that outputs it in
> > >
> > > hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
> > >
> > > but after going through the code I think I'd need to know much more
> > > about the code to glean anything from it or the associated JIRA ticket
> > > https://issues.apache.org/jira/browse/HBASE-11240.
> > >
> > > Also, what is this "pipeline" the ticket and code talk about?
> > >
> > > Thanks in advance for any information and/or clarification anyone can
> > > provide.
> > >
> > > ----
> > >
> > > Saad
> > >
> >
>
