> -----Original Message-----
> From: Jacques [mailto:[email protected]]
> Sent: Friday, July 30, 2010 1:16 PM
> To: [email protected]
> Subject: Memory Consumption and Processing questions
> 
> Hello all,
> 
> I'm planning an hbase implementation and had some questions I was
> hoping someone could help with.
> 
> 1. Can someone give me a basic overview of how memory is used in Hbase?
> Various places on the web people state that 16-24gb is the minimum for
> region servers if they also operate as hdfs/mr nodes.  Assuming that
> hdfs/mr nodes consume ~8gb, that leaves a "minimum" of 8-16gb for
> hbase.  It seems like lots of people suggest using even 24gb+ for
> hbase.  Why so much?  Is it simply to avoid gc problems?  To have data
> in memory for fast random reads?  Or?

Where exactly are you reading this from?  I'm not actually aware of people 
using 24GB+ heaps for HBase.

I would not recommend using less than 4GB for RegionServers.  Beyond that, it 
very much depends on your application.  8GB is often sufficient but I've seen 
as much as 16GB used in production.

You need at least 4GB because of GC.  General experience has been that below 
that size, the CMS collector does not work well.
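For reference, heap size and the collector are set in conf/hbase-env.sh.  The 
values below are illustrative, not recommendations; tune for your workload:

```shell
# conf/hbase-env.sh (illustrative values, not a recommendation)

# Heap for HBase daemons, in MB; ~4GB is a practical floor for CMS.
export HBASE_HEAPSIZE=8000

# Use the concurrent collector, and start CMS cycles early enough
# to avoid long stop-the-world pauses under heavy write load.
export HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70"
```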

Memory is used primarily for the MemStores (write cache) and Block Cache (read 
cache).  In addition, memory is allocated as part of normal operations to store 
in-memory state and to serve reads.
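As a concrete sketch, both caches are sized as fractions of the RegionServer 
heap in hbase-site.xml.  The property names and values below reflect the 
0.20.x-era defaults; check hbase-default.xml for your version:

```xml
<!-- hbase-site.xml: illustrative cache sizing (0.20.x-era defaults) -->

<!-- Upper bound on total MemStore (write cache) usage, as a fraction
     of the RegionServer heap; updates block once this is reached. -->
<property>
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <value>0.4</value>
</property>

<!-- Fraction of heap given to the Block Cache (read cache). -->
<property>
  <name>hfile.block.cache.size</name>
  <value>0.2</value>
</property>
```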

> 2. What types of things put more/less pressure on memory?  I saw
> insinuation that insert speed can create substantial memory pressure.
> What type of relative memory pressure do scanners, random reads,
> random writes, region quantity and compactions cause?

Writes are buffered and flushed to disk when the write buffer reaches a local 
or global limit.  The local limit (per region) defaults to 64MB.  The global 
limit is based on the total amount of heap available (default, I think, is 
40%).  So there is interplay between how much heap you have and how many 
regions are actively written to.  If you have too many regions and not enough 
memory for them all to reach the local/region limit, you end up flushing 
undersized files.
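To make that interplay concrete, the per-region flush threshold is a single 
setting (the 64MB default, again from the 0.20.x-era hbase-default.xml):

```xml
<!-- hbase-site.xml: per-region MemStore flush threshold (64MB default).
     A region's write buffer is flushed to a new file at this size. -->
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>67108864</value>
</property>
```

Rough arithmetic: with an 8GB heap and a 40% global limit, about 3.2GB is 
available for MemStores, so only around 50 actively written regions can fill a 
64MB buffer before the global limit starts forcing early, undersized flushes.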

Scanning/random reading will utilize the block cache, if configured to.  The 
more room for the block cache, the more data you can keep in-memory.  Reads 
from the block cache are significantly faster than non-cached reads, obviously.

Compactions are not generally an issue.

> 2. How cpu intensive are the region servers?  It seems like most of
> their performance is based on i/o.  (I've noted the caution against
> starving region servers of cycles--which seems primarily focused on
> avoiding zk timeout -> region reassignment problems.)  Does anyone
> suggest, or recommend against, dedicating only one or two cores to a
> region server?  Do individual compactions benefit from multiple cores
> or are they single-threaded?

I would dedicate at least one core to a region server, but as we add more and 
more concurrency, it may become important to have two cores available.  Many 
things, like compactions, are single-threaded today, but there's a very good 
chance you will be able to configure multiple threads in the next major 
release.

> 3. What are the memory and cpu resource demands of the master server?
> It seems like more and more of that load is moving to zk.

Not too much.  I'm putting a change in TRUNK right now that keeps all region 
assignments in the master, so there is some memory usage, but not much.  I 
would think 2GB heap and 1-2 cores is sufficient.

> 4. General HDFS question-- when the namenode dies, what happens to the
> datanodes and how does that relate to Hbase?  E.g., can hbase continue
> to operate in a read-only mode (assuming no datanode/regionserver
> failures post namenode failure)?

Today, HBase will probably die ungracefully once it actually starts hitting the 
dead NN.  There are some open JIRAs about HBase behavior under different HDFS 
faults and being as graceful as possible when they happen, including HBASE-2183 
about riding over an HDFS restart.

> 
> Thanks for your help,
> Jacques
