RegionServer aborting and shutting down

Kristoffer Sjögren Thu, 22 Sep 2016 01:18:01 -0700

Hi

We are running OpenTSDB 2.2 with HBase 1.1.2 and are having problems
with RegionServers that shut down sporadically due to alleged GC
pauses.

We run 2 OpenTSDB machines and 30 RegionServers with 8 GB heaps. The
RegionServers are co-located with DataNodes and YARN jobs. Each
RegionServer receives around 1000 req/s.

Even though the logs [1] say it is a GC pause, our monitoring does not
report any such pause. The rather suspicious log line is "wal.FSHLog:
Slow sync cost: 56257 ms" (roughly 56 seconds), which appears just
after the GC pause detector's warning, and then the RegionServer
aborts. CPU, memory and network look fine.
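For context on why the pause detector and GC monitoring can disagree: a sleep-based pause detector only measures how much longer a thread slept than it asked to, so it can fire for stalls that are not GC at all. The Java sketch below illustrates that general approach. Hadoop's and HBase's JvmPauseMonitor work along these lines, but the class name PauseDetectorSketch, the interval and the thresholds here are hypothetical. It also compares GC collection counts before and after each sleep to distinguish a GC pause from a stall elsewhere, such as the host swapping or starving the JVM of CPU.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.concurrent.TimeUnit;

/**
 * Minimal sketch of a sleep-based pause detector (hypothetical class).
 * Interval and thresholds are illustrative assumptions, not HBase settings.
 */
class PauseDetectorSketch {
    static final long SLEEP_INTERVAL_MS = 500;    // assumed polling interval
    static final long INFO_THRESHOLD_MS = 1_000;  // assumed "info" threshold
    static final long WARN_THRESHOLD_MS = 10_000; // assumed "warn" threshold

    /** Time the thread stalled beyond the sleep it asked for (never negative). */
    static long extraPauseMs(long requestedSleepMs, long measuredElapsedMs) {
        return Math.max(0, measuredElapsedMs - requestedSleepMs);
    }

    /** Maps a stall length to a log level, the way a pause detector might. */
    static String classify(long extraMs) {
        if (extraMs > WARN_THRESHOLD_MS) return "WARN";
        if (extraMs > INFO_THRESHOLD_MS) return "INFO";
        return "OK";
    }

    /** Sum of collection counts over all collectors; -1 (unsupported) is skipped. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) total += count;
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        // A real monitor loops forever on a daemon thread; five rounds suffice here.
        for (int i = 0; i < 5; i++) {
            long gcBefore = totalGcCount();
            long start = System.nanoTime();
            Thread.sleep(SLEEP_INTERVAL_MS);
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            long extra = extraPauseMs(SLEEP_INTERVAL_MS, elapsedMs);
            String level = classify(extra);
            if (!level.equals("OK")) {
                // The detector only knows the whole process stalled. If no
                // collections happened during the stall, the cause likely lies
                // outside GC (e.g. swapping or CPU starvation on a busy host).
                boolean gcRan = totalGcCount() != gcBefore;
                System.out.println(level + ": pause of approximately " + extra + " ms, "
                        + (gcRan ? "GC ran during the pause" : "no GCs detected"));
            }
        }
    }
}
```

Running it as java PauseDetectorSketch on an idle machine normally prints nothing; output appears only when the process stalls for longer than the assumed thresholds.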

We have had this problem for a long time and have been troubleshooting
thoroughly, but we are still clueless.
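To see how often, and how badly, the slow WAL syncs occur across a RegionServer log, a small scanner can help. This is a hypothetical helper (the class name SlowSyncScanner is made up), and it assumes only that each occurrence contains the text "Slow sync cost: <N> ms", as in the FSHLog line quoted above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts "Slow sync cost" durations from a RegionServer log. */
class SlowSyncScanner {
    // Assumes each occurrence contains exactly this text, as in the FSHLog line quoted above.
    private static final Pattern SLOW_SYNC = Pattern.compile("Slow sync cost: (\\d+) ms");

    /** Returns every slow-sync duration, in milliseconds, in the order found. */
    static List<Long> slowSyncCosts(Iterable<String> lines) {
        List<Long> costs = new ArrayList<>();
        for (String line : lines) {
            Matcher m = SLOW_SYNC.matcher(line);
            if (m.find()) {
                costs.add(Long.parseLong(m.group(1)));
            }
        }
        return costs;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SlowSyncScanner <regionserver-log-file>");
            System.exit(2);
        }
        // ISO-8859-1 maps every byte to a char, so stray non-UTF-8 bytes cannot abort the read.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        List<Long> costs = slowSyncCosts(lines);
        long worst = 0;
        for (long cost : costs) {
            worst = Math.max(worst, cost);
        }
        System.out.println(costs.size() + " slow syncs, worst " + worst + " ms");
    }
}
```

Running it as java SlowSyncScanner hbase.log prints the number of slow syncs found and the worst duration.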

Any advice would be helpful.

Cheers,
-Kristoffer

[1] https://www.dropbox.com/s/m2cuutcdh81itay/hbase.log?dl=0
