Hi,

In our large HBase cluster based on CDH 5.5 in AWS, we're constantly seeing the following messages in the region server logs:

2016-04-25 14:02:55,178 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 258 ms, current pipeline: [DatanodeInfoWithStorage[10.99.182.165:50010,DS-281d4c4f-23bd-4541-bedb-946e57a0f0fd,DISK], DatanodeInfoWithStorage[10.99.182.236:50010,DS-f8e7e8c9-6fa0-446d-a6e5-122ab35b6f7c,DISK], DatanodeInfoWithStorage[10.99.182.195:50010,DS-3beae344-5a4a-4759-ad79-a61beabcc09d,DISK]]

These happen regularly while HBase appears to be operating normally, with decent read and write performance. We do have occasional performance problems when regions are auto-splitting, and at first I thought this was related, but now I see it happens all the time.

Can someone explain what this message really means, and whether we should be concerned? I tracked down the source code that outputs it in hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java, but after going through it I think I'd need to know much more about the code to glean anything from it or from the associated JIRA ticket https://issues.apache.org/jira/browse/HBASE-11240.

Also, what is this "pipeline" that the ticket and code talk about?

Thanks in advance for any information and/or clarification anyone can provide.

----
Saad
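
P.S. To put numbers on "constantly", here is a rough Python sketch for pulling the sync cost and the datanode addresses out of these lines. It is not part of HBase. The names `parse_slow_sync` and `summarize` are made up for illustration, and the regular expressions simply assume the message format quoted above, so they may need adjusting for other versions.

```python
import re
from collections import Counter

# Assumed format, taken from the single log line quoted in this post.
SLOW_SYNC = re.compile(r"Slow sync cost: (\d+) ms, current pipeline: \[(.*)\]")
DATANODE = re.compile(r"DatanodeInfoWithStorage\[([\d.]+):(\d+)")

# The quoted message, used here only as sample input.
SAMPLE = (
    "2016-04-25 14:02:55,178 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: "
    "Slow sync cost: 258 ms, current pipeline: "
    "[DatanodeInfoWithStorage[10.99.182.165:50010,DS-281d4c4f-23bd-4541-bedb-946e57a0f0fd,DISK], "
    "DatanodeInfoWithStorage[10.99.182.236:50010,DS-f8e7e8c9-6fa0-446d-a6e5-122ab35b6f7c,DISK], "
    "DatanodeInfoWithStorage[10.99.182.195:50010,DS-3beae344-5a4a-4759-ad79-a61beabcc09d,DISK]]"
)


def parse_slow_sync(line):
    """Return (cost_ms, [datanode_ip, ...]) for a slow-sync line, else None."""
    match = SLOW_SYNC.search(line)
    if match is None:
        return None
    cost_ms = int(match.group(1))
    nodes = [ip for ip, _port in DATANODE.findall(match.group(2))]
    return cost_ms, nodes


def summarize(lines):
    """Collect all sync costs and count how often each datanode IP appears."""
    costs = []
    per_node = Counter()
    for line in lines:
        parsed = parse_slow_sync(line)
        if parsed is None:
            continue  # not a slow-sync message
        cost_ms, nodes = parsed
        costs.append(cost_ms)
        per_node.update(nodes)
    return costs, per_node


if __name__ == "__main__":
    costs, per_node = summarize([SAMPLE])
    print(costs, per_node.most_common())
```

Feeding a whole region server log into `summarize` would show whether the slow syncs cluster around particular datanodes or are spread evenly, which might help narrow the question down.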
