-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44573/
-----------------------------------------------------------
(Updated March 10, 2016, 12:49 p.m.)


Review request for Ambari, Arpit Gupta, enis, and Vitalyi Brodetskyi.


Bugs: AMBARI-15355
    https://issues.apache.org/jira/browse/AMBARI-15355


Repository: ambari


Description
-------

On one of the clusters I did an EU from 2.2.x to 2.3.x. During the upgrade, the HBase service checks for region servers failed, so the upgrade was paused. Region server start fails with this error:

{code}
2016-03-03 19:55:31,203 ERROR [regionserver:16020] regionserver.HRegionServer: Failed init
java.lang.OutOfMemoryError: Direct buffer memory
	at java.nio.Bits.reserveMemory(Bits.java:658)
	at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
	at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
	at org.apache.hadoop.hbase.util.ByteBufferArray.<init>(ByteBufferArray.java:65)
	at org.apache.hadoop.hbase.io.hfile.bucket.ByteBufferIOEngine.<init>(ByteBufferIOEngine.java:47)
	at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getIOEngineFromName(BucketCache.java:307)
	at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.<init>(BucketCache.java:217)
	at org.apache.hadoop.hbase.io.hfile.CacheConfig.getBucketCache(CacheConfig.java:614)
	at org.apache.hadoop.hbase.io.hfile.CacheConfig.getL2(CacheConfig.java:553)
	at org.apache.hadoop.hbase.io.hfile.CacheConfig.instantiateBlockCache(CacheConfig.java:637)
	at org.apache.hadoop.hbase.io.hfile.CacheConfig.<init>(CacheConfig.java:231)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1361)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:899)
	at java.lang.Thread.run(Thread.java:745)
2016-03-03 19:55:31,206 FATAL [regionserver:16020] regionserver.RSRpcServices: Run out of memory; RSRpcServices will abort itself immediately
java.lang.OutOfMemoryError: Direct buffer memory
	at java.nio.Bits.reserveMemory(Bits.java:658)
	at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
	at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
	at org.apache.hadoop.hbase.util.ByteBufferArray.<init>(ByteBufferArray.java:65)
	at org.apache.hadoop.hbase.io.hfile.bucket.ByteBufferIOEngine.<init>(ByteBufferIOEngine.java:47)
	at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getIOEngineFromName(BucketCache.java:307)
	at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.<init>(BucketCache.java:217)
	at org.apache.hadoop.hbase.io.hfile.CacheConfig.getBucketCache(CacheConfig.java:614)
	at org.apache.hadoop.hbase.io.hfile.CacheConfig.getL2(CacheConfig.java:553)
	at org.apache.hadoop.hbase.io.hfile.CacheConfig.instantiateBlockCache(CacheConfig.java:637)
	at org.apache.hadoop.hbase.io.hfile.CacheConfig.<init>(CacheConfig.java:231)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1361)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:899)
	at java.lang.Thread.run(Thread.java:745)
2016-03-03 19:55:35,138 INFO [main] zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-3485--1, built on 12/16/2015 02:35 GMT
{code}

The issue is not that hbase_max_direct_memory_size is unset. The value coming from "HBase off-heap MaxDirectMemorySize", which I assume feeds the hbase_max_direct_memory_size template variable, is set to 12288, but the bucket cache is set to 18G:

hbase.bucketcache.size 18432

Since the bucket cache is an off-heap cache, hbase_max_direct_memory_size should be > hbase.bucketcache.size.

This was seen on the following cluster: https://s.c:8443/#/main/services/HBASE/configs
The bucket cache config (which is an off-heap cache) is set to 18G, but hbase_max_direct_memory_size is set to 12G. hbase_max_direct_memory_size should always be higher than the off-heap cache size. Both are configured from Ambari.
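
As a rough illustration of the constraint described above, here is a minimal Python sketch. It is not the code from this patch: the helper name and its arguments are hypothetical, and it assumes both sizes are given in MB, as the quoted config values suggest (18432 for 18G).

```python
def direct_memory_problem(bucketcache_size_mb, max_direct_memory_mb):
    """Return a message if the direct-memory limit cannot hold the bucket cache, else None.

    Hypothetical helper for illustration only; not part of stack_advisor.py.
    """
    # The bucket cache is allocated off-heap, so the direct-memory limit
    # has to be strictly larger than the cache size.
    if max_direct_memory_mb <= bucketcache_size_mb:
        return ("hbase_max_direct_memory_size (%d MB) should be greater than "
                "hbase.bucketcache.size (%d MB)"
                % (max_direct_memory_mb, bucketcache_size_mb))
    return None


# Values reported on the failing cluster: 18G bucket cache, 12G direct memory.
print(direct_memory_problem(18432, 12288))
```

With the values from the failing cluster the helper returns a message; with a limit above 18432 it returns None.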
It reproduces on nodes with a lot of RAM (>23 GB); the stack advisor is what recommends the value of hbase_max_direct_memory_size.


Diffs (updated)
-----

  ambari-server/src/main/resources/stacks/HDP/2.2/services/stack_advisor.py 2518528
  ambari-server/src/test/python/stacks/2.3/common/test_stack_advisor.py 11818ba

Diff: https://reviews.apache.org/r/44573/diff/


Testing
-------

mvn clean test


Thanks,

Dmitro Lisnichenko