Ankit,

First of all, you need to make sure that your systems do not swap (they do, I am pretty sure). There are two reasons why systems go to swap:
1. The default setting for 'vm.swappiness' (60) plus high memory pressure (not your case).
2. No high memory pressure, but not enough free memory in a particular zone of allocation when 'vm.swappiness=0'.

I think 2. is what you have. Your boxes have at least 2 CPUs (NUMA nodes), probably 4. That means Linux divides the overall RAM into 2-4 zones (16-32GB in size). It is possible that one of the zones (the one where the compaction thread runs?) runs out of free pages and file-backed pages during compaction, and Linux starts swapping (or even kills the process).

To verify this you will need to:

1. Confirm that you have si/so events in vmstat (swapping) during compaction.
2. Dig into '/proc/*pid*/numa_maps <http://linux.die.net/man/5/numa_maps>' and verify that you have uneven memory allocation between zones. Your output will be something similar to:

aaaaad3e000 default anon=13240527 dirty=13223315 swapcache=3440324 active=13202235 N0=7865429 N1=5375098

There are two zones here: N0 and N1. Memory is given in number of pages (4K).

If you confirm both 1 and 2, you should change the NUMA kernel memory allocation policy from the default (local) to interleave all:

cmd="/usr/bin/numactl --interleave all $cmd"

Check 'man numactl' for how to run an application with different NUMA policies.

-Vlad

On Wed, Jul 15, 2015 at 8:23 AM, lars hofhansl <[email protected]> wrote:

> We're running fine with a 31g heap (31 to be able to make use of compressed
> oops) after a lot of tuning. Maybe your pattern is different...?
>
> Or... Since it is ParNew on a 1GB-only small gen taking that much time...
> Maybe you ran into this: http://www.evanjones.ca/jvm-mmap-pause.html ?
>
> -- Lars
>
> From: Vladimir Rodionov <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Monday, July 13, 2015 10:16 AM
> Subject: Re: Regionservers going down during compaction
>
> Ankit,
>
> -Xms31744m -Xmx31744m seems too high.
>
> You run on SSDs and you probably do not need a large (on-heap) block cache.
> Large heap + major compaction can result in bad GC behavior and cluster
> instability. It's very hard to tune. Unless you are 100% sure that 30GB is
> absolutely necessary, I would suggest reducing the heap.
>
> -Vlad
>
> On Mon, Jul 13, 2015 at 8:28 AM, Dave Latham <[email protected]> wrote:
>
> > What JDK are you using? I've seen such behavior when a machine was
> > swapping. Can you tell if there was any swap in use?
> >
> > On Mon, Jul 13, 2015 at 3:24 AM, Ankit Singhal
> > <[email protected]> wrote:
> > > Hi Team,
> > >
> > > We are seeing regionservers going down whenever major compaction is
> > > triggered on a table (8.5TB in size).
> > > Can anybody help with the resolution or give pointers to resolve this?
> > >
> > > Below are the current observations:
> > >
> > > The above behaviour is seen even when compaction is run on already-compacted
> > > tables. Load average seems to be normal and under 4 (for a 32-core machine).
> > > Except for bad-datanode and JVM pause errors, no other error is seen in the
> > > logs.
> > >
> > > Cluster configuration:
> > >
> > > 79 nodes
> > > 32-core machines, 64GB RAM, 1.2TB SSDs
> > >
> > > JVM opts:
> > >
> > > export HBASE_OPTS="$HBASE_OPTS -XX:+UseParNewGC -XX:+PerfDisableSharedMem
> > > -XX:+UseConcMarkSweepGC -XX:ErrorFile={{log_dir}}/hs_err_pid%p.log"
> > > $HBASE_REGIONSERVER_OPTS -XX:+PerfDisableSharedMem -XX:PermSize=128m
> > > -XX:MaxPermSize=256m -XX:+UseCMSInitiatingOccupancyOnly -Xmn1024m
> > > -XX:CMSInitiatingOccupancyFraction=70 -Xms31744m -Xmx31744m
> > >
> > > hbase-site.xml: PFA
> > >
> > > GC logs:
> > >
> > > 2015-07-12T23:15:29.485-0700: 9260.407: [GC2015-07-12T23:15:29.485-0700:
> > > 9260.407: [ParNew: 839872K->947K(943744K), 0.0324180 secs]
> > > 1431555K->592630K(32401024K), 0.0325930 secs]
> > > [Times: user=0.72 sys=0.00, real=0.03 secs]
> > >
> > > 2015-07-12T23:15:30.532-0700: 9261.454: [GC2015-07-12T23:15:30.532-0700:
> > > 9261.454: [ParNew: 839859K->1017K(943744K), 31.0324970 secs]
> > > 1431542K->592702K(32401024K), 31.0326950 secs]
> > > [Times: user=0.89 sys=0.02, real=31.03 secs]
> > >
> > > 2015-07-12T23:16:02.490-0700: 9293.412: [GC2015-07-12T23:16:02.490-0700:
> > > 9293.412: [ParNew: 839929K->1100K(943744K), 0.0319400 secs]
> > > 1431614K->592785K(32401024K), 0.0321580 secs]
> > > [Times: user=0.71 sys=0.00, real=0.03 secs]
> > >
> > > 2015-07-12T23:16:03.747-0700: 9294.669: [GC2015-07-12T23:16:03.747-0700:
> > > 9294.669: [ParNew: 840012K->894K(943744K), 0.0304370 secs]
> > > 1431697K->592579K(32401024K), 0.0305330 secs]
> > > [Times: user=0.67 sys=0.01, real=0.03 secs]
> > >
> > > Heap
> > >
> > > par new generation total 943744K, used 76608K [0x00007f54d4000000,
> > > 0x00007f5514000000, 0x00007f5514000000)
> > >
> > > eden space 838912K, 9% used [0x00007f54d4000000, 0x00007f54d89f0728,
> > > 0x00007f5507340000)
> > >
> > > from space 104832K, 0% used [0x00007f5507340000, 0x00007f550741fab0,
> > > 0x00007f550d9a0000)
> > >
> > > to space 104832K, 0% used
> > > [0x00007f550d9a0000, 0x00007f550d9a0000,
> > > 0x00007f5514000000)
> > >
> > > concurrent mark-sweep generation total 31457280K, used 591685K
> > > [0x00007f5514000000, 0x00007f5c94000000, 0x00007f5c94000000)
> > >
> > > concurrent-mark-sweep perm gen total 131072K, used 44189K
> > > [0x00007f5c94000000, 0x00007f5c9c000000, 0x00007f5ca4000000)
> > >
> > > Regionserver logs:
> > >
> > > 2015-07-12 23:16:01,565 WARN [regionserver60020.periodicFlusher]
> > > util.Sleeper: We slept 38712ms instead of 10000ms, this is likely due to a
> > > long garbage collecting pause and it's usually bad, see
> > > http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > >
> > > 2015-07-12 23:16:01,565 WARN [ResponseProcessor for block
> > > BP-552832523-xxx.xxx.xxx.xxx-1433419204036:blk_1075292594_1595455]
> > > hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block
> > > BP-552832523-xxx.xxx.xxx.xxxxxx.xxx.xxx.xxx-1433419204036:blk_1075292594_1595455
> > >
> > > java.io.EOFException: Premature EOF: no length prefix available
> > >     at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2208)
> > >     at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:176)
> > >     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:868)
> > >
> > > 2015-07-12 23:16:01,565 INFO [regionserver60020-SendThread(<hostname>:2181)]
> > > zookeeper.ClientCnxn: Client session timed out, have not heard from server
> > > in 41080ms for sessionid 0x24e76a76ee6dc50, closing socket connection and
> > > attempting reconnect
> > >
> > > 2015-07-12 23:16:01,565 INFO [regionserver60020-SendThread(<hostname>:2181)]
> > > zookeeper.ClientCnxn: Client session timed out, have not heard from server
> > > in 39748ms for sessionid 0x34e76a76f05e006, closing socket connection and
> > > attempting reconnect
> > >
> > > 2015-07-12 23:16:01,565 INFO
> > > [regionserver60020-smallCompactions-1436759027218-SendThread(<hostname>:2181)]
> > > zookeeper.ClientCnxn: Client session timed out, have not heard from server
> > > in 42697ms for sessionid 0x14e76a7707202a2, closing socket connection and
> > > attempting reconnect
> > >
> > > 2015-07-12 23:16:01,565 INFO [regionserver60020-SendThread(<hostname>:2181)]
> > > zookeeper.ClientCnxn: Client session timed out, have not heard from server
> > > in 33764ms for sessionid 0x14e76a77071dd59, closing socket connection and
> > > attempting reconnect
> > >
> > > 2015-07-12 23:16:01,565 WARN [ResponseProcessor for block
> > > BP-552832523-xxx.xxx.xxx.xxx-1433419204036:blk_1075293683_1596593]
> > > hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block
> > > BP-552832523-xxx.xxx.xxx.xxx-1433419204036:blk_1075293683_1596593
> > >
> > > java.io.EOFException: Premature EOF: no length prefix available
> > >     at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2208)
> > >     at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:176)
> > >     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:868)
> > >
> > > 2015-07-12 23:16:01,565 WARN [regionserver60020] util.Sleeper: We slept
> > > 33688ms instead of 3000ms, this is likely due to a long garbage collecting
> > > pause and it's usually bad, see
> > > http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > >
> > > Regards,
> > > Ankit Singhal
> >
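
The numa_maps check suggested at the top of the thread can be automated. A minimal sketch in Python: it parses the per-node page counts (the `N0=...`/`N1=...` fields from numa_maps(5)) out of one line and reports how uneven the allocation is. The sample line is the one quoted in the thread; the ratio threshold you act on is your own call.

```python
import re

def node_pages(numa_maps_line):
    """Extract per-NUMA-node page counts (the N<node>=<pages> fields)."""
    return {int(n): int(p) for n, p in re.findall(r"N(\d+)=(\d+)", numa_maps_line)}

def imbalance_ratio(pages):
    """Largest node allocation divided by the smallest; 1.0 means even."""
    return max(pages.values()) / min(pages.values())

line = ("aaaaad3e000 default anon=13240527 dirty=13223315 "
        "swapcache=3440324 active=13202235 N0=7865429 N1=5375098")
pages = node_pages(line)
print(pages)                             # {0: 7865429, 1: 5375098}
print(round(imbalance_ratio(pages), 2))  # 1.46
```

Run against every line of /proc/&lt;pid&gt;/numa_maps for the regionserver process, a sustained ratio well above 1.0 on the large anonymous mappings would support the uneven-zone diagnosis above.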

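The 31-second ParNew pause in the GC log above has a telltale shape: `real=31.03` while `user=0.89 sys=0.02`, i.e. the wall-clock time vastly exceeds the CPU time spent, so the JVM was stalled by something outside GC work (swapping, or the perf-data mmap write-back described at the evanjones.ca link). A small sketch that flags such pauses in lines of this log format (the 10x factor and 1-second floor are arbitrary illustration values, not from the thread):

```python
import re

# Matches the "[Times: user=... sys=..., real=... secs]" tail of a GC log line.
TIMES_RE = re.compile(r"user=([\d.]+) sys=([\d.]+), real=([\d.]+) secs")

def stalled_pauses(gc_lines, factor=10.0):
    """Return (user, sys, real) tuples where wall-clock time exceeds CPU
    time by `factor` and is over 1s -- pauses not spent doing GC work."""
    hits = []
    for line in gc_lines:
        m = TIMES_RE.search(line)
        if not m:
            continue
        user, sys_, real = map(float, m.groups())
        if real > factor * (user + sys_) and real > 1.0:
            hits.append((user, sys_, real))
    return hits

log = [
    "[Times: user=0.72 sys=0.00, real=0.03 secs]",
    "[Times: user=0.89 sys=0.02, real=31.03 secs]",  # the suspicious pause
]
print(stalled_pauses(log))  # [(0.89, 0.02, 31.03)]
```

If such pauses correlate with si/so activity in vmstat, that points at swapping; if swap stays at zero, the mmap-pause theory from Lars's link becomes more likely.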