Hi,

This problem is widely known, but I'm not able to come up with a decent solution for it.
I'm scanning 1,000,000+ rows from one table in order to index their content. Each row is around 100 KB. The problem is that I keep getting this exception:

    Exception in thread "org.apache.hadoop.dfs.datanode$dataxceiveser...@82d37"
    java.lang.OutOfMemoryError: unable to create new native thread

This is a Hadoop exception and it causes the DataNode to go down, so I decreased dfs.datanode.max.xcievers from 4048 to 512. Well, that led me to another problem:

    java.io.IOException: xceiverCount 513 exceeds the limit of concurrent xcievers 512

This time the DataNode doesn't die, and neither does HBase, but my scan, and the whole indexing process, suffer a lot.

After reading different posts about this issue, I have the impression that HBase can't handle these limits transparently for the user. The scanner is a sequential process, so I expected it to free the Hadoop resources it has already used in order to make room for new data requests to HDFS.

What am I missing? Should I slow down the scanning process? Should I scan portions of the table sequentially (roughly like the sketch in the P.S. below) instead of doing a full scan over all 1,000,000+ rows? Is there a timeout after which unused Hadoop resources get released?

Thanks in advance,
Lucas
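
P.S. To make the last question more concrete, here is a rough sketch of what I mean by scanning the table in portions and closing the scanner after each chunk. I'm assuming the Scan/ResultScanner client API here; the table name, column family, row-key boundaries and the index(row) call are just placeholders for my actual setup, not real code from my job.

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    HBaseConfiguration conf = new HBaseConfiguration();
    HTable table = new HTable(conf, "content");          // placeholder table name

    // Split the key space into a few ranges instead of one full scan.
    byte[][] boundaries = {
        Bytes.toBytes("row-0000000"),
        Bytes.toBytes("row-0250000"),
        Bytes.toBytes("row-0500000"),
        Bytes.toBytes("row-0750000"),
        Bytes.toBytes("row-1000000")
    };

    for (int i = 0; i < boundaries.length - 1; i++) {
        Scan scan = new Scan(boundaries[i], boundaries[i + 1]);
        scan.addFamily(Bytes.toBytes("doc"));             // placeholder column family
        scan.setCaching(10);                              // rows are ~100 KB, so keep caching small
        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result row : scanner) {
                index(row);                               // my indexing code (placeholder)
            }
        } finally {
            scanner.close();                              // release scanner resources after each chunk
        }
    }

Would chunking it like this actually give the DataNode a chance to release xceiver threads between ranges, or does it make no difference compared to one long sequential scan?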