Have you looked at the logs of the region servers? That is a good first place to look. How many regions are in your system? If you are using MSLAB, it reserves 2MB/region as a buffer -- that can add up when you have lots of regions.
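
As a rough way to see how the MSLAB buffers add up, here is a minimal sketch. It assumes only the 2 MB/region figure mentioned above and the 4 GB region server heap from the original question; the region counts are hypothetical, so plug in your own.

```python
# Assumption: 2 MB of MSLAB buffer per region, the figure quoted above.
MSLAB_MB_PER_REGION = 2


def mslab_overhead_mb(regions: int) -> int:
    """Heap (in MB) reserved by MSLAB buffers for `regions` regions on one server."""
    return regions * MSLAB_MB_PER_REGION


def mslab_heap_fraction(regions: int, heap_mb: int) -> float:
    """Share of the region server heap taken by those buffers."""
    return mslab_overhead_mb(regions) / heap_mb


if __name__ == "__main__":
    heap_mb = 4 * 1024  # 4 GB region server heap, as in the original question
    for regions in (100, 500, 1000):  # hypothetical region counts per server
        overhead = mslab_overhead_mb(regions)
        share = mslab_heap_fraction(regions, heap_mb)
        print(f"{regions} regions: {overhead} MB reserved, {share:.0%} of heap")
```

If the share comes out large for your actual region count, that alone could explain heap pressure on a 4 GB region server.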
Given so little information, all my guesses are going to be wild, but they might help:

4 GB may not be enough for your current load. Have you considered changing your memory allocation, giving less to your map/reduce jobs and more to HBase?

What is your key distribution like? Are you writing to all regions equally, or are you hotspotting on one region?

Check your cell/row sizes. Are they really large (e.g. cells > 1 MB; rows > 100 MB)? Increasing region size should help here, but there may be an issue with your RAM allocation for HBase.

Are you sure that you are not overloading the machine memory? How much RAM do you allocate for map/reduce jobs? How do you distribute your processes over machines? Does your master run namenode, hmaster, jobtracker, and zookeeper, while your slaves run datanode, tasktracker, and hregionserver? If so, then your memory allocation is:

4 GB for regionserver
1 GB for OS
1 GB for datanode
1 GB for tasktracker
9/6 GB for M/R

So, are you sure that all of your m/r tasks take less than 1 GB?

Dave

-----Original Message-----
From: Oleg Ruchovets [mailto:[email protected]]
Sent: Tuesday, August 23, 2011 2:15 AM
To: [email protected]
Subject: how to make tuning for hbase (every couple of days hbase region sever/s crashe)

Hi,

Our environment: hbase 90.2 (10 machines).
We have a 10-machine grid:
the master has 48G RAM,
the slave machines have 16G RAM.
The Region Server process has 4G RAM.
The Zookeeper process has 2G RAM.
We have 4 map/2 reduce slots per machine.

We write from an m/r job to hbase (2 jobs a day). For 3 months the system worked without any problem, but now a region server crashes every 3-4 days.

What we have done so far:
1) We run major compaction manually once a day.
2) We increased the region size to prevent automatic splits.

Question:
What is the way to tune HBase? How do we debug such a problem? It is still not clear to me what the root cause of the region server crashes is.

We started from this post:
http://search-hadoop.com/m/HDoK22ikTCI/M%252FR+vs+hbase+problem+in+production&subj=M+R+vs+hbase+problem+in+production

Regards,
Oleg.
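
The per-slave memory budget in Dave's reply can be checked with a few lines of arithmetic. This is only a sketch under the numbers given in the thread (16 GB per slave, 4 map and 2 reduce slots, and the reservations Dave lists); it assumes ZooKeeper does not run on the slaves, as in the layout Dave describes. If it does, add its 2 GB to the reserved set.

```python
# Figures taken from the thread; the dictionary keys are just labels.
SLAVE_RAM_GB = 16
RESERVED_GB = {
    "regionserver": 4,
    "os": 1,
    "datanode": 1,
    "tasktracker": 1,
}
TASK_SLOTS = 4 + 2  # 4 map + 2 reduce slots per machine


def mr_budget_gb(total_gb: float, reserved: dict) -> float:
    """RAM left for map/reduce child tasks once everything else is reserved."""
    return total_gb - sum(reserved.values())


def per_slot_gb(total_gb: float, reserved: dict, slots: int) -> float:
    """Average RAM available to each concurrently running task."""
    return mr_budget_gb(total_gb, reserved) / slots


if __name__ == "__main__":
    budget = mr_budget_gb(SLAVE_RAM_GB, RESERVED_GB)
    share = per_slot_gb(SLAVE_RAM_GB, RESERVED_GB, TASK_SLOTS)
    print(f"M/R budget: {budget} GB, about {share:.2f} GB per task slot")
```

If the real footprint of the tasks (heap plus JVM overhead) exceeds the per-slot figure, the machine starts swapping and the region server is likely to suffer first.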
