Hello all, I'm planning an HBase implementation and had some questions I was hoping someone could help with.
1. Can someone give me a basic overview of how memory is used in HBase? Various places on the web state that 16-24GB is the minimum for region servers if they also operate as HDFS/MR nodes. Assuming the HDFS/MR daemons consume ~8GB, that leaves a "minimum" of 8-16GB for HBase. Yet many people seem to suggest 24GB+ for HBase alone. Why so much? Is it simply to avoid GC problems? To keep data in memory for fast random reads? Or something else?

2. What kinds of things put more or less pressure on memory? I've seen it suggested that insert speed can create substantial memory pressure. What relative memory pressure do scanners, random reads, random writes, region count, and compactions cause?

3. How CPU-intensive are the region servers? It seems like most of their performance is bound by I/O. (I've noted the caution against starving region servers of cycles, which seems primarily aimed at avoiding ZK timeout -> region reassignment problems.) Does anyone recommend for or against dedicating only one or two cores to a region server? Do individual compactions benefit from multiple cores, or are they single-threaded?

4. What are the memory and CPU demands of the master server? It seems like more and more of that load is moving to ZK.

5. A general HDFS question: when the namenode dies, what happens to the datanodes, and how does that relate to HBase? E.g., can HBase continue to operate in a read-only mode (assuming no datanode/regionserver failures after the namenode failure)?

Thanks for your help,
Jacques
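P.S. For context on question 1, here's roughly the heap split I had been assuming for a combined region server + HDFS/MR node. These values are my guesses, not recommendations — part of why I'm asking:

```shell
# hbase-env.sh -- my assumed split on a 24GB node (values in MB)
# ~8GB heap for the region server; is this too small in practice?
export HBASE_HEAPSIZE=8000

# hadoop-env.sh -- default heap for the datanode/tasktracker daemons
export HADOOP_HEAPSIZE=1000
```

The remainder would go to MR child task JVMs and the OS page cache, which is where my "~8GB for HDFS/MR" figure comes from.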
