Hi Lars,

All of the max heap sizes are left at their default values (i.e. 1000 MB).
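For reference, those defaults are just the stock heap settings from the
*-env.sh files shipped with the distribution -- roughly something like the
following (exact paths may differ in your install; these values are the
shipped defaults, not anything we tuned):

    # conf/hadoop-env.sh -- max heap, in MB, for the Hadoop daemons
    # (NameNode, DataNode, JobTracker, TaskTracker)
    export HADOOP_HEAPSIZE=1000

    # conf/hbase-env.sh -- max heap, in MB, for the HBase daemons
    # (HMaster, RegionServer)
    export HBASE_HEAPSIZE=1000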
The OOMEs that I encountered in the data nodes happened only when I put
dfs.datanode.max.xcievers unrealistically high (8192) in an effort to
escape the "xceiverCount X exceeds the limit of concurrent xcievers"
errors. The datanodes weren't really having hard crashes, but they were
getting OOMEs and becoming unusable until a restart.

- Gabriel

On Mon, Dec 6, 2010 at 12:33 PM, Lars George <[email protected]> wrote:
> Hi Gabriel,
>
> What max heap do you give the various daemons? It is really odd that
> you see OOMEs; I would like to know what has consumed the heap. Are
> you saying the Hadoop DataNodes actually crash with the OOME?
>
> Lars
>
> On Mon, Dec 6, 2010 at 9:02 AM, Gabriel Reid <[email protected]> wrote:
>> Hi,
>>
>> We're currently running into issues with running a MapReduce job over
>> a complete HBase table - we can't seem to find a balance between
>> having dfs.datanode.max.xcievers set too low (and getting
>> "xceiverCount X exceeds the limit of concurrent xcievers" errors) and
>> getting OutOfMemoryErrors on the datanodes.
>>
>> When trying to run a MapReduce job on the complete table we inevitably
>> hit one of the two errors above -- using a more restrictive Scan with
>> a startRow and stopRow for the job runs without problems.
>>
>> An important note is that the table being scanned has a large
>> disparity in the size of the values being stored -- one column family
>> contains values that are generally around 256 kB in size, while the
>> other column families in the table contain values that are closer to
>> 256 bytes. The hbase.hregion.max.filesize setting is still at the
>> default (256 MB), meaning that we have HFiles for the big column
>> family that are around 256 MB, and HFiles for the other columns that
>> are around 256 kB. The dfs.datanode.max.xcievers setting is currently
>> at 2048, and this is running on a 5-node cluster.
>>
>> The table in question has about 7 million rows, and we're using
>> Cloudera CDH3 (HBase 0.89.20100924 and Hadoop 0.20.2).
>>
>> As far as I have been able to discover, the correct thing to do (or to
>> have done) is to set hbase.hregion.max.filesize to a larger value so
>> that we end up with a smaller number of regions, which as I understand
>> it would probably solve the issue here.
>>
>> My questions are:
>> 1. Is my analysis about needing a larger hbase.hregion.max.filesize correct?
>> 2. Is there something else that we can do to resolve this?
>> 3. Am I correct in assuming that the best way to resolve this now is
>> to make the hbase.hregion.max.filesize setting larger, and then use
>> the org.apache.hadoop.hbase.util.Merge tool as discussed at
>> http://osdir.com/ml/general/2010-12/msg00534.html ?
>>
>> Any help on this would be greatly appreciated.
>>
>> Thanks,
>>
>> Gabriel
>>
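PS: in case it's useful to anyone following along, the "more restrictive
Scan" mentioned above is just the usual TableMapReduceUtil job setup with a
start and stop row, roughly like the sketch below. The table name, row keys,
and mapper class are placeholders rather than our actual ones, and "job" is
an already-created org.apache.hadoop.mapreduce.Job.

    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.util.Bytes;

    // Restrict the scan to a slice of the table instead of the whole thing.
    Scan scan = new Scan();
    scan.setStartRow(Bytes.toBytes("row-000000"));  // inclusive, placeholder key
    scan.setStopRow(Bytes.toBytes("row-100000"));   // exclusive, placeholder key
    scan.setCaching(500);          // fetch rows to the mapper in batches
    scan.setCacheBlocks(false);    // don't churn the block cache from an MR scan

    // Wire the scan into the MapReduce job; MyTableMapper extends TableMapper.
    TableMapReduceUtil.initTableMapperJob(
        "mytable",                      // placeholder table name
        scan,
        MyTableMapper.class,            // placeholder mapper class
        ImmutableBytesWritable.class,   // mapper output key class
        Result.class,                   // mapper output value class
        job);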
