Hi,

We're currently running into problems with a MapReduce job over a complete HBase table: we can't seem to find a balance for dfs.datanode.max.xcievers -- set too low we get "xceiverCount X exceeds the limit of concurrent xcievers", and set higher we eventually get OutOfMemoryErrors on the datanodes instead.
When we run a MapReduce job over the complete table, we inevitably end up hitting one of the two errors above; the same job with a more restrictive Scan (a startRow and a stopRow) runs without problems.

An important detail is that the table being scanned has a large disparity in the size of the stored values: one column family holds values that are generally around 256 kB each, while the other column families hold values closer to 256 bytes. The hbase.hregion.max.filesize setting is still at the default (256 MB), which means we end up with HFiles of around 256 MB for the big column family and HFiles of around 256 kB for the other column families.

dfs.datanode.max.xcievers is currently set to 2048, and this is running on a 5-node cluster. The table in question has about 7 million rows, and we're using Cloudera CDH3 (HBase 0.89.20100924 and Hadoop 0.20.2).

As far as I have been able to discover, the correct thing to do (or to have done) is to set hbase.hregion.max.filesize to a larger value so that the table is spread over a smaller number of regions, which as I understand it would probably solve the issue here.

My questions are:

1. Is my analysis about needing a larger hbase.hregion.max.filesize correct?
2. Is there something else we can do to resolve this?
3. Am I correct in assuming that the best way to resolve this now is to make the hbase.hregion.max.filesize setting larger and then use the org.apache.hadoop.hbase.util.Merge tool to merge the existing regions, as discussed at http://osdir.com/ml/general/2010-12/msg00534.html ?

Any help on this would be greatly appreciated.

Thanks,
Gabriel
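
P.S. For reference, the restricted job that does run is set up roughly like the sketch below. The table name, row keys and mapper are simplified placeholders rather than our real ones, and the caching/cache-block settings are just the usual recommendations, nothing we've tuned. The full-table job is identical except that the Scan has no startRow/stopRow.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;

public class RestrictedScanJob {

  // Placeholder mapper; the real job does the actual per-row work here.
  static class RowMapper extends TableMapper<ImmutableBytesWritable, NullWritable> {
    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context context)
        throws IOException, InterruptedException {
      context.write(key, NullWritable.get());
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "restricted-scan");
    job.setJarByClass(RestrictedScanJob.class);

    Scan scan = new Scan();
    // Restricting the key range is what keeps the job healthy for us;
    // leaving out the start/stop row (full-table scan) is what eventually
    // triggers the xceiver / OutOfMemory errors.
    scan.setStartRow(Bytes.toBytes("row-000000"));   // placeholder start key
    scan.setStopRow(Bytes.toBytes("row-100000"));    // placeholder stop key
    scan.setCaching(100);          // rows fetched per RPC
    scan.setCacheBlocks(false);    // don't pollute the block cache from an MR scan

    TableMapReduceUtil.initTableMapperJob(
        "our_table",               // placeholder table name
        scan,
        RowMapper.class,
        ImmutableBytesWritable.class,
        NullWritable.class,
        job);
    job.setNumReduceTasks(0);      // map-only in this sketch

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}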

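And this is roughly the kind of per-table change I had in mind for question 3 -- raising hbase.hregion.max.filesize via the table descriptor. The table name and the 1 GB value are only examples, and the exact HBaseAdmin calls may differ slightly on 0.89; as I understand it this only affects future splits, which is why the existing regions would still need the Merge step.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class RaiseMaxFilesize {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    byte[] tableName = Bytes.toBytes("our_table");   // placeholder table name

    // Fetch the current descriptor and bump the per-table MAX_FILESIZE
    // (1 GB here is just an example value).
    HTableDescriptor desc = admin.getTableDescriptor(tableName);
    desc.setMaxFileSize(1024L * 1024 * 1024);

    // The table has to be offline while the descriptor is modified.
    admin.disableTable(tableName);
    admin.modifyTable(tableName, desc);
    admin.enableTable(tableName);
  }
}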