Based on what we saw, there's no reason not to bump it up to something north of 32K or even 64K. Granted, our data nodes have 32GB of memory and we don't have users on the machines, so setting a 64K ulimit -n is really just noise.

I think most Unix/Linux distributions default the number of files a user can keep open simultaneously to 1024, but with today's machines, if you don't have a lot of users, you can really bump it up. If you're creating a Linux image for your nodes, you may just want to make the default for all users a soft 64K and a hard 128K. YMMV

-Mike
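For reference, on a typical PAM-based Linux the image-wide defaults described above would be set along these lines; this is only a sketch, and exact file locations can vary by distribution:

    # /etc/security/limits.conf -- default open-file limits for all users,
    # per the soft 64K / hard 128K suggestion above
    *  soft  nofile  65536
    *  hard  nofile  131072

    # confirm from a fresh login:
    ulimit -Sn    # soft limit, should report 65536
    ulimit -Hn    # hard limit, should report 131072

The new limits only apply to sessions started after the change (and pam_limits has to be enabled), so the Hadoop and HBase daemons need to be restarted from a fresh login for them to take effect.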
> Date: Wed, 17 Nov 2010 23:02:41 +0100
> Subject: Re: Xceiver problem
> From: lars.geo...@gmail.com
> To: user@hbase.apache.org
>
> That is what I was also thinking about, thanks for jumping in Todd.
>
> I was simply not sure if that is just on .27 or all after that one and
> the defaults have never been increased.
>
> On Wed, Nov 17, 2010 at 8:24 PM, Todd Lipcon <t...@cloudera.com> wrote:
> > On that new of a kernel you'll also need to increase your epoll limit.
> > Some tips about that here:
> >
> > http://www.cloudera.com/blog/2009/03/configuration-parameters-what-can-you-just-ignore/
> >
> > Thanks
> > -Todd
> >
> > On Wed, Nov 17, 2010 at 9:10 AM, Lars George <lars.geo...@gmail.com> wrote:
> >
> >> Are you running on EC2? Couldn't you simply up the heap size for the
> >> java processes?
> >>
> >> I do not think there is a hard and fast rule to how many xcievers you
> >> need; trial and error is common. Or if you have enough heap, simply set
> >> it high, like 4096, and that usually works fine. It all depends on how
> >> many regions and column families you have on each server.
> >>
> >> Lars
> >>
> >> On Wed, Nov 17, 2010 at 5:31 PM, Lucas Nazário dos Santos
> >> <nazario.lu...@gmail.com> wrote:
> >> > I'm using Linux, the Amazon beta version that they recently released.
> >> > I'm not very familiar with Linux, so I think the kernel version
> >> > is 2.6.34.7-56.40.amzn1.x86_64. Hadoop version is 0.20.2 and HBase
> >> > version is 0.20.6. Hadoop and HBase have 2 GB each and they are not
> >> > swapping.
> >> >
> >> > Besides all other questions I posed, I have one more. How can I
> >> > calculate the maximum number of xcievers? Is there a formula?
> >> >
> >> > Lucas
> >> >
> >> > On Wed, Nov 17, 2010 at 2:12 PM, Lars George <lars.geo...@gmail.com>
> >> > wrote:
> >> >
> >> >> Hi Lucas,
> >> >>
> >> >> What OS are you on? What kernel version? What is your Hadoop and HBase
> >> >> version? How much heap do you assign to each Java process?
> >> >>
> >> >> Lars
> >> >>
> >> >> On Wed, Nov 17, 2010 at 3:05 PM, Lucas Nazário dos Santos
> >> >> <nazario.lu...@gmail.com> wrote:
> >> >> > Hi,
> >> >> >
> >> >> > This problem is widely known, but I'm not able to come up with a
> >> >> > decent solution for it.
> >> >> >
> >> >> > I'm scanning 1.000.000+ rows from one table in order to index their
> >> >> > content. Each row has around 100 KB. The problem is that I keep
> >> >> > getting the exception:
> >> >> >
> >> >> > Exception in thread
> >> >> > "org.apache.hadoop.dfs.datanode$dataxceiveser...@82d37"
> >> >> > java.lang.OutOfMemoryError: unable to create new native thread
> >> >> >
> >> >> > This is a Hadoop exception and it causes the DataNode to go down,
> >> >> > so I decreased dfs.datanode.max.xcievers from 4048 to 512. Well,
> >> >> > that led me to another problem:
> >> >> >
> >> >> > java.io.IOException: xceiverCount 513 exceeds the limit of
> >> >> > concurrent xcievers 512
> >> >> >
> >> >> > This time the DataNode doesn't die, nor HBase, but my scan, and the
> >> >> > whole indexing process, suffers a lot.
> >> >> >
> >> >> > After reading different posts about this issue, I have the
> >> >> > impression that HBase can't handle these limits transparently for
> >> >> > the user. The scanner is a sequential process, so I thought it would
> >> >> > free Hadoop resources already used in order to make room for new
> >> >> > requests for data under HDFS. What am I missing? Should I slow down
> >> >> > the scanning process? Should I scan portions of the table
> >> >> > sequentially instead of doing a full scan over all 1.000.000+ rows?
> >> >> > Is there a timeout so unused Hadoop resources can be released?
> >> >> >
> >> >> > Thanks in advance,
> >> >> > Lucas
> >> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
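For reference, the xceiver limit discussed in the quoted thread is a DataNode setting; a minimal sketch of the hdfs-site.xml entry for Hadoop 0.20, using the 4096 value Lars suggests (the property name keeps Hadoop's historical misspelling, and DataNodes need a restart for it to take effect):

    <!-- conf/hdfs-site.xml on each DataNode -->
    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>4096</value>
    </property>

The epoll limit Todd mentions is a separate kernel-level setting; the Cloudera post linked above covers it.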