On Fri, Jul 18, 2014 at 4:46 PM, Jane Tao <[email protected]> wrote:
Hi there,
Our goal is to fully utilize the free RAM on each node/region server for
HBase. At the same time, we do not want to incur too much pressure from GC
(garbage collection). Based on Ted's suggestion, we are trying to use the
bucket cache.
However, we are not sure:
Sorry. Config is a little complicated at the moment. It has had some
cleanup in trunk. Meantime...
- The relation between XX:MaxDirectMemorySize and java heap size. Is
MaxDirectMemorySize part of the java heap size?
No. It is a cap on how much direct (offheap) memory the JVM may allocate,
separate from the java heap. Here is a bit of a note I just added to the
refguide:
<para>The default maximum direct memory varies by JVM. Traditionally it is
64M, some relation to the allocated heap size (-Xmx), or no limit at all
(JDK7, apparently). HBase servers use direct memory: in particular, with
short-circuit reading enabled, the hosted DFSClient allocates direct memory
buffers, and if you do offheap block caching you will be using direct memory
too. When starting your JVM, make sure the
<varname>-XX:MaxDirectMemorySize</varname> setting in
<filename>conf/hbase-env.sh</filename> is set to some value higher than what
you have allocated to your offheap block cache
(<varname>hbase.bucketcache.size</varname>). It should cover the offheap
block cache and then some for DFSClient usage. How much the DFSClient uses
is not easy to quantify; it is roughly the number of open hfiles *
<varname>hbase.dfs.client.read.shortcircuit.buffer.size</varname>, where
hbase.dfs.client.read.shortcircuit.buffer.size is set to 128k in HBase (see
the default configurations in <filename>hbase-default.xml</filename>).
</para>
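To make that sizing rule concrete, here is a rough back-of-envelope sketch.
The function name and the hfile count are made up for illustration; only the
128k buffer size and the bucket-cache figure come from this thread:

```python
# Hypothetical sizing helper: offheap block cache plus an estimate of the
# DFSClient's short-circuit read buffers (open hfiles * buffer size).
def direct_memory_needed_mb(offheap_blockcache_mb, open_hfiles,
                            shortcircuit_buffer_kb=128):
    # hbase.dfs.client.read.shortcircuit.buffer.size defaults to 128k in HBase
    dfsclient_mb = open_hfiles * shortcircuit_buffer_kb / 1024
    return offheap_blockcache_mb + dfsclient_mb

# e.g. a 6144 MB bucket cache and ~1000 open hfiles:
print(direct_memory_needed_mb(6144, 1000))  # 6269.0 MB
```

So -XX:MaxDirectMemorySize would need to be comfortably above whatever this
comes out to for your workload.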
- The relation between XX:MaxDirectMemorySize and hbase.bucketcache.size.
Are they equal?
XX:MaxDirectMemorySize should be larger than hbase.bucketcache.size. They
should not be equal. See note above for why.
- How to adjust hbase.bucketcache.percentage.in.combinedcache?
Or just leave it as is. To adjust, just set it to other than the default
which is 0.9 (0.9 of hbase.bucketcache.size). This configuration has been
removed from trunk because it is confusing.
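If it helps, here is how I read that split. A sketch only, assuming the
0.9/0.1 division described above; the function is illustrative, not an HBase
API:

```python
# With hbase.bucketcache.percentage.in.combinedcache = 0.9 (the default),
# 90% of hbase.bucketcache.size goes to the offheap bucket cache and the
# remaining 10% to the onheap LRU cache.
def combined_cache_split(bucketcache_size_mb, pct_in_combinedcache=0.9):
    bucket_mb = bucketcache_size_mb * pct_in_combinedcache
    lru_mb = bucketcache_size_mb - bucket_mb
    return bucket_mb, lru_mb

# With the 6144 MB / 0.8 settings from the config further down:
print(combined_cache_split(6144, 0.8))
```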
Right now, we have the following configuration. Does it make sense?
- java heap size of each hbase region server to 12 GB
- -XX:MaxDirectMemorySize to be 6GB
Why not set it to 48G since you have the RAM?
- hbase-site.xml :
<property>
<name>hbase.offheapcache.percentage</name>
<value>0</value>
</property>
This setting is not needed. 0 is the default.
<property>
<name>hbase.bucketcache.ioengine</name>
<value>offheap</value>
</property>
<property>
<name>hbase.bucketcache.percentage.in.combinedcache</name>
<value>0.8</value>
</property>
Or you could just undo this setting and go with the default which is 0.9.
<property>
<name>hbase.bucketcache.size</name>
<value>6144</value>
</property>
Adjust this to be 40000? (smile).
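To spell out the arithmetic behind the 48G / 40000 suggestion: assuming
hbase.bucketcache.size is given in MB (which the 6144 value above suggests),
and with a 4 GB OS reserve and 2 GB of DFSClient headroom that are my
assumptions rather than anything from this thread:

```python
# Back-of-envelope for one 64 GB node running a single RegionServer.
node_ram_gb = 64
heap_gb = 12                 # region server -Xmx (from this thread)
os_and_other_gb = 4          # assumed reserve for OS, DataNode, etc.
bucketcache_gb = node_ram_gb - heap_gb - os_and_other_gb   # offheap cache
max_direct_gb = bucketcache_gb + 2   # headroom for DFSClient direct buffers
# hbase.bucketcache.size is in MB, so ~48 GB lands near the suggested 40000+
print(bucketcache_gb, max_direct_gb)  # 48 50
```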
Let us know how it goes.
What version of HBase are you running? Thanks.
St.Ack
Thanks,
Jane
On 7/17/2014 3:05 PM, Ted Yu wrote:
Have you considered using BucketCache ?
Please read 9.6.4.1 under
http://hbase.apache.org/book.html#regionserver.arch
Note: remember to verify the config values against the HBase release
you're using.
Cheers
On Thu, Jul 17, 2014 at 2:53 PM, Jane Tao <[email protected]> wrote:
Hi Ted,
In my case, there is a 6 Node HBase cluster setup (running on Oracle
BDA).
Each node has plenty of RAM (64GB) and CPU cores. Several articles seem to
suggest that it is not a good idea to allocate too much RAM to the region
server's heap setting.
If each region server has 10GB heap and there is only one region server
per node, then
I have 10x6=60GB for the whole HBase cluster. This setting works for ~100M
rows but incurs heavy GC activity on the region servers when loading
billions of rows.
Basically, I need a configuration that can fully utilize the free RAM on
each node for HBase.
Thanks,
Jane
On 7/16/2014 4:17 PM, Ted Yu wrote:
Jane:
Can you briefly describe the use case where multiple region servers are
needed on the same host ?
Cheers
On Wed, Jul 16, 2014 at 3:14 PM, Dhaval Shah <
[email protected]
wrote:
It's certainly possible (at least via the command line) but probably very
messy. You will need different ports, different log files, different pid
files, and possibly even different configs on the same machine.
Regards,
Dhaval
________________________________
From: Jane Tao <[email protected]>
To: [email protected]
Sent: Wednesday, 16 July 2014 6:06 PM
Subject: multiple region servers at one machine
Hi there,
Is it possible to run multiple region servers on one machine/node? If so,
how do I start multiple region servers from the command line or via
Cloudera Manager?
Thanks,
Jane