Thanks, that was very helpful. Regarding 24GB: I saw people using servers with 32GB of memory (a recent thread here, and hstack.org). I extrapolated the HBase share since it seems most people use ~8GB for HDFS/MR.
-Jacques

On Sun, Aug 1, 2010 at 11:39 AM, Jonathan Gray <[email protected]> wrote:
>
> > -----Original Message-----
> > From: Jacques [mailto:[email protected]]
> > Sent: Friday, July 30, 2010 1:16 PM
> > To: [email protected]
> > Subject: Memory Consumption and Processing questions
> >
> > Hello all,
> >
> > I'm planning an HBase implementation and had some questions I was
> > hoping someone could help with.
> >
> > 1. Can someone give me a basic overview of how memory is used in HBase?
> > Various places on the web, people state that 16-24GB is the minimum for
> > region servers if they also operate as HDFS/MR nodes. Assuming that
> > HDFS/MR nodes consume ~8GB, that leaves a "minimum" of 8-16GB for
> > HBase. It seems like lots of people are suggesting use of even 24GB+
> > for HBase. Why so much? Is it simply to avoid GC problems? To have
> > data in memory for fast random reads? Or?
>
> Where exactly are you reading this from? I'm not actually aware of
> people using 24GB+ heaps for HBase.
>
> I would not recommend using less than 4GB for RegionServers. Beyond
> that, it very much depends on your application. 8GB is often sufficient,
> but I've seen as much as 16GB used in production.
>
> You need at least 4GB because of GC. General experience has been that
> below that, the CMS GC does not work well.
>
> Memory is used primarily for the MemStores (write cache) and the Block
> Cache (read cache). In addition, memory is allocated as part of normal
> operations to store in-memory state and in processing reads.
>
> > 2. What types of things put more/less pressure on memory? I saw
> > insinuation that insert speed can create substantial memory pressure.
> > What type of relative memory pressure do scanners, random reads,
> > random writes, region quantity, and compactions cause?
>
> Writes are buffered and flushed to disk when the write buffer gets to a
> local or global limit. The local limit (per region) defaults to 64MB.
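The local flush limit above (and the global limit and block cache sizing discussed next) are controlled from hbase-site.xml. A minimal sketch using the 0.20-era property names with their defaults; names and values should be verified against the release in use:

```xml
<!-- hbase-site.xml (sketch; defaults shown are the 0.20-era values) -->
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <!-- per-region MemStore flush threshold: 64MB -->
  <value>67108864</value>
</property>
<property>
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <!-- global MemStore cap, as a fraction of heap: 40% -->
  <value>0.4</value>
</property>
<property>
  <name>hfile.block.cache.size</name>
  <!-- fraction of heap given to the block cache (0.2 was the era default) -->
  <value>0.2</value>
</property>
```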
> The global limit is based on the total amount of heap available (the
> default, I think, is 40%). So there is interplay between how much heap
> you have and how many regions are actively written to. If you have too
> many regions and not enough memory to allow them to hit the local/region
> limit, you end up flushing undersized files.
>
> Scanning/random reading will utilize the block cache, if configured to.
> The more room for the block cache, the more data you can keep in memory.
> Reads from the block cache are significantly faster than non-cached
> reads, obviously.
>
> Compactions are not generally an issue.
>
> > 3. How CPU-intensive are the region servers? It seems like most of
> > their performance is based on I/O. (I've noted the caution against
> > starving region servers of cycles, which seems primarily focused on
> > avoiding ZK-timeout -> region-reassignment problems.) Does anyone
> > suggest or recommend against dedicating only one or two cores to a
> > region server? Do individual compactions benefit from multiple cores,
> > or are they single-threaded?
>
> I would dedicate at least one core to a region server, but as we add
> more and more concurrency, it may become important to have two cores
> available. Many things, like compactions, are only single-threaded
> today, but there's a very good chance you will be able to configure
> multiple threads in the next major release.
>
> > 4. What are the memory and CPU resource demands of the master server?
> > It seems like more and more of that load is moving to ZK.
>
> Not too much. I'm putting a change in TRUNK right now that keeps all
> region assignments in the master, so there is some memory usage, but not
> much. I would think a 2GB heap and 1-2 cores is sufficient.
>
> > 5. General HDFS question: when the namenode dies, what happens to the
> > datanodes, and how does that relate to HBase?
> > E.g., can HBase continue to operate in a read-only mode (assuming no
> > datanode/regionserver failures post-namenode failure)?
>
> Today, HBase will probably die ungracefully once it does start to hit
> the NN. There are some open JIRAs about HBase behavior under different
> HDFS faults and trying to be as graceful as possible when they happen,
> including HBASE-2183 about riding over an HDFS restart.
>
> > Thanks for your help,
> > Jacques
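The heap and GC guidance in the thread (at least 4GB, since CMS misbehaves below that) is applied through hbase-env.sh. A minimal sketch assuming an 8GB RegionServer heap; the CMS occupancy threshold shown is a common starting point, not a value from the thread:

```shell
# hbase-env.sh (sketch). HBASE_HEAPSIZE is in megabytes.
export HBASE_HEAPSIZE=8000

# Use CMS, per the thread's guidance, and start concurrent collections
# before the old generation fills; 70% is an illustrative threshold.
export HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly"
```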
