Are you running on EC2? Couldn't you simply up the heap size for the Java processes?
I do not think there is a hard and fast rule for how many xcievers you need; trial and error is common. Or, if you have enough heap, simply set it high, like 4096, and that usually works fine. It all depends on how many regions and column families you have on each server.

Lars

On Wed, Nov 17, 2010 at 5:31 PM, Lucas Nazário dos Santos <nazario.lu...@gmail.com> wrote:
> I'm using Linux, the Amazon beta version that they recently released. I'm
> not very familiar with Linux, so I think the kernel version
> is 2.6.34.7-56.40.amzn1.x86_64. Hadoop version is 0.20.2 and HBase version
> is 0.20.6. Hadoop and HBase have 2 GB each and they are not swapping.
>
> Besides all the other questions I posed, I have one more. How can I calculate
> the maximum number of xcievers? Is there a formula?
>
> Lucas
>
>
>
> On Wed, Nov 17, 2010 at 2:12 PM, Lars George <lars.geo...@gmail.com> wrote:
>
>> Hi Lucas,
>>
>> What OS are you on? What kernel version? What are your Hadoop and HBase
>> versions? How much heap do you assign to each Java process?
>>
>> Lars
>>
>> On Wed, Nov 17, 2010 at 3:05 PM, Lucas Nazário dos Santos
>> <nazario.lu...@gmail.com> wrote:
>> > Hi,
>> >
>> > This problem is widely known, but I'm not able to come up with a decent
>> > solution for it.
>> >
>> > I'm scanning 1,000,000+ rows from one table in order to index their
>> > content. Each row is around 100 KB. The problem is that I keep getting
>> > the exception:
>> >
>> > Exception in thread "org.apache.hadoop.dfs.datanode$dataxceiveser...@82d37"
>> > java.lang.OutOfMemoryError: unable to create new native thread
>> >
>> > This is a Hadoop exception and it causes the DataNode to go down, so I
>> > decreased dfs.datanode.max.xcievers from 4048 to 512. Well, that led me
>> > to another problem:
>> >
>> > java.io.IOException: xceiverCount 513 exceeds the limit of concurrent
>> > xcievers 512
>> >
>> > This time neither the DataNode nor HBase dies, but my scan, and the whole
>> > indexing process, suffers a lot.
>> >
>> > After reading different posts about this issue, I have the impression that
>> > HBase can't handle these limits transparently for the user. The scanner is
>> > a sequential process, so I thought it would free Hadoop resources it had
>> > already used in order to make room for new requests for data from HDFS.
>> > What am I missing? Should I slow down the scanning process? Should I scan
>> > portions of the table sequentially instead of doing a full scan of all
>> > 1,000,000+ rows? Is there a timeout so unused Hadoop resources can be
>> > released?
>> >
>> > Thanks in advance,
>> > Lucas
>> >
>>
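
For reference, dfs.datanode.max.xcievers is a DataNode-side setting, so it goes into hdfs-site.xml on each DataNode and takes effect after a DataNode restart. A minimal sketch, using the 4096 value suggested above (the property name keeps Hadoop's historical misspelling):

  <!-- hdfs-site.xml on every DataNode; restart the DataNodes afterwards -->
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>

Each xciever is a thread, so raising the ceiling also raises the DataNode's potential thread count; if the process is short on memory or thread stack space, the "unable to create new native thread" error can come back at the higher limit.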
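
On the question of scanning portions of the table instead of one monolithic full scan: one common approach is to scan in bounded chunks, close the scanner after each chunk so the server-side scanner is released, and resume from the last row seen. Below is a minimal sketch against the 0.20-era client API; the table name "documents", the column family "content", the chunk size, and the index() hook are placeholders, not anything taken from the thread.

  import java.io.IOException;

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HConstants;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.util.Bytes;

  public class ChunkedIndexer {

      // Placeholder table and column family names.
      private static final byte[] TABLE  = Bytes.toBytes("documents");
      private static final byte[] FAMILY = Bytes.toBytes("content");
      private static final int CHUNK_ROWS = 10000;  // rows per scanner before reopening

      public static void main(String[] args) throws IOException {
          HBaseConfiguration conf = new HBaseConfiguration();
          HTable table = new HTable(conf, TABLE);

          byte[] startRow = HConstants.EMPTY_START_ROW;
          boolean more = true;

          while (more) {
              Scan scan = new Scan(startRow);
              scan.addFamily(FAMILY);
              // Rows are ~100 KB, so keep caching modest to avoid huge RPC responses.
              scan.setCaching(10);

              ResultScanner scanner = table.getScanner(scan);
              int seen = 0;
              byte[] lastRow = null;
              try {
                  for (Result result : scanner) {
                      index(result);          // hypothetical indexing hook
                      lastRow = result.getRow();
                      if (++seen >= CHUNK_ROWS) {
                          break;              // end this chunk and release the scanner
                      }
                  }
              } finally {
                  scanner.close();            // releases the server-side scanner and its lease
              }

              if (lastRow == null || seen < CHUNK_ROWS) {
                  more = false;               // table exhausted
              } else {
                  // Resume just after the last row processed in this chunk.
                  startRow = Bytes.add(lastRow, new byte[] { 0 });
              }
          }
      }

      private static void index(Result result) {
          // Placeholder: feed the row into whatever indexer you are using.
      }
  }

Closing the scanner promptly and keeping the per-RPC caching small limits how much the client asks of the cluster at any one moment, but it does not by itself lower the xciever count much: as noted above, that is driven mostly by how many regions and store files each region server keeps open, so chunked scanning is a complement to, not a substitute for, a sensible dfs.datanode.max.xcievers value.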