I can get about 1,000 regions per node operating comfortably on a 5-node c1.xlarge EC2 cluster using:

Somewhere out of /etc/rc.local:

    echo "root soft nofile 65536" >> /etc/security/limits.conf
    echo "root hard nofile 65536" >> /etc/security/limits.conf
    sysctl -w fs.file-max=65536
    sysctl -w fs.epoll.max_user_instances=65536 > /dev/null 2>&1
    ulimit -n 65536

In hdfs-site.xml:

    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>10000</value>
    </property>
    <property>
      <name>dfs.datanode.handler.count</name>
      <value>10</value>
    </property>

(We use 0.20-append, so we also enable dfs.support.append globally here and link hdfs-site.xml into hbase/conf as well.)

In hbase-site.xml:

    <property>
      <name>hbase.regionserver.handler.count</name>
      <value>100</value>
    </property>
    <property>
      <name>dfs.datanode.socket.write.timeout</name>
      <value>0</value>
    </property>
    <property>
      <name>zookeeper.session.timeout</name>
      <value>60000</value>
    </property>
    <!-- setting this means you must manually kick off major compactions -->
    <property>
      <name>hbase.hregion.majorcompaction</name>
      <value>0</value>
    </property>

And in hbase-env.sh:

    export HBASE_MASTER_OPTS="-Xms1000m -Xmx1000m -Xmn256m -XX:+UseConcMarkSweepGC -XX:+AggressiveOpts -XX:+UseCompressedOops -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/mnt/hbase/logs/hbase-master-gc.log"
    export HBASE_REGIONSERVER_OPTS="-Xms4000m -Xmx4000m -Xmn256m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=85 -XX:+AggressiveOpts -XX:+UseCompressedOops -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/mnt/hbase/logs/hbase-regionserver-gc.log"

YMMV, but hope that helps.

Best regards,

   - Andy
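
For anyone applying the settings above, a quick way to sanity-check that the limits actually stuck is a minimal sketch like the following; the jps-based PID lookup is an assumption about your setup (same user, one DataNode per host), not something from this thread:

    ulimit -n                                   # per-shell open-file limit
    cat /proc/sys/fs/file-max                   # system-wide file handle ceiling
    cat /proc/sys/fs/epoll/max_user_instances   # epoll limit, on kernels that expose it
    # Limits the running DataNode actually inherited (lookup is illustrative):
    DN_PID=$(jps | awk '/DataNode/{print $1}')
    grep 'Max open files' /proc/"$DN_PID"/limits

Checking /proc/<pid>/limits matters because limits.conf changes only apply to processes started after a fresh login, not to daemons already running.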
--- On Thu, 11/18/10, Michael Segel <michael_se...@hotmail.com> wrote:

> From: Michael Segel <michael_se...@hotmail.com>
> Subject: RE: Xceiver problem
> To: user@hbase.apache.org
> Date: Thursday, November 18, 2010, 8:08 AM
>
> Based on what we saw... there shouldn't be a reason why you
> don't bump it up to something north of 32K or even 64K.
> Granted, our data nodes have 32 GB of memory and we don't have
> users on the machines, so setting ulimit -n to 64K is really
> just noise.
>
> I think most Unix/Linux systems default the number of files a
> user can simultaneously keep open to 1024, but with today's
> machines, if you don't have a lot of users, you can really bump
> it up. If you're creating a Linux image for your nodes, you may
> just want to make the default for all users a soft 64K and a
> hard 128K.
>
> YMMV
>
> -Mike
>
> > Date: Wed, 17 Nov 2010 23:02:41 +0100
> > Subject: Re: Xceiver problem
> > From: lars.geo...@gmail.com
> > To: user@hbase.apache.org
> >
> > That is what I was also thinking about, thanks for jumping in Todd.
> >
> > I was simply not sure if that is just on .27 or on all kernels
> > after that one, and whether the defaults were ever increased.
> >
> > On Wed, Nov 17, 2010 at 8:24 PM, Todd Lipcon <t...@cloudera.com> wrote:
> > > On that new of a kernel you'll also need to increase your
> > > epoll limit. Some tips about that here:
> > >
> > > http://www.cloudera.com/blog/2009/03/configuration-parameters-what-can-you-just-ignore/
> > >
> > > Thanks
> > > -Todd
> > >
> > > On Wed, Nov 17, 2010 at 9:10 AM, Lars George <lars.geo...@gmail.com> wrote:
> > >
> > >> Are you running on EC2? Couldn't you simply up the heap size
> > >> for the Java processes?
> > >>
> > >> I do not think there is a hard and fast rule for how many
> > >> xcievers you need; trial and error is common. Or, if you have
> > >> enough heap, simply set it high, like 4096, and that usually
> > >> works fine. It all depends on how many regions and column
> > >> families you have on each server.
> > >>
> > >> Lars
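
On the "is there a formula?" question that comes up further down the thread: there is no exact one, but one rough heuristic is to multiply regions per region server by column families and by average store files per family, since each open store file can pin a DataNode xceiver thread. A back-of-envelope sketch, where every number is an illustrative assumption rather than a measurement:

    # Rough sizing sketch only -- not an official formula.
    REGIONS_PER_RS=1000   # regions hosted per region server (illustrative)
    FAMILIES=2            # column families per region (illustrative)
    FILES_PER_FAMILY=4    # average store files per family (illustrative)
    echo $((REGIONS_PER_RS * FAMILIES * FILES_PER_FAMILY))   # -> 8000

Pad generously from there; with these assumed inputs the result lands in the neighborhood of the 10000 used in the config at the top of this message.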
> > >>
> > >> On Wed, Nov 17, 2010 at 5:31 PM, Lucas Nazário dos Santos
> > >> <nazario.lu...@gmail.com> wrote:
> > >> > I'm using Linux, the Amazon beta version that they recently
> > >> > released. I'm not very familiar with Linux, so I think the
> > >> > kernel version is 2.6.34.7-56.40.amzn1.x86_64. The Hadoop
> > >> > version is 0.20.2 and the HBase version is 0.20.6. Hadoop and
> > >> > HBase have 2 GB each and they are not swapping.
> > >> >
> > >> > Besides all the other questions I posed, I have one more. How
> > >> > can I calculate the maximum number of xcievers? Is there a
> > >> > formula?
> > >> >
> > >> > Lucas
> > >> >
> > >> > On Wed, Nov 17, 2010 at 2:12 PM, Lars George <lars.geo...@gmail.com> wrote:
> > >> >
> > >> >> Hi Lucas,
> > >> >>
> > >> >> What OS are you on? What kernel version? What are your Hadoop
> > >> >> and HBase versions? How much heap do you assign to each Java
> > >> >> process?
> > >> >>
> > >> >> Lars
> > >> >>
> > >> >> On Wed, Nov 17, 2010 at 3:05 PM, Lucas Nazário dos Santos
> > >> >> <nazario.lu...@gmail.com> wrote:
> > >> >> > Hi,
> > >> >> >
> > >> >> > This problem is widely known, but I'm not able to come up
> > >> >> > with a decent solution for it.
> > >> >> >
> > >> >> > I'm scanning 1,000,000+ rows from one table in order to
> > >> >> > index their content. Each row has around 100 KB. The problem
> > >> >> > is that I keep getting the exception:
> > >> >> >
> > >> >> > Exception in thread
> > >> >> > "org.apache.hadoop.dfs.datanode$dataxceiveser...@82d37"
> > >> >> > java.lang.OutOfMemoryError: unable to create new native thread
> > >> >> >
> > >> >> > This is a Hadoop exception, and it causes the DataNode to go
> > >> >> > down, so I decreased dfs.datanode.max.xcievers from 4048 to
> > >> >> > 512. Well, that led me to another problem:
> > >> >> >
> > >> >> > java.io.IOException: xceiverCount 513 exceeds the limit of
> > >> >> > concurrent xcievers 512
> > >> >> >
> > >> >> > This time the DataNode doesn't die, and neither does HBase,
> > >> >> > but my scan, and the whole indexing process, suffers a lot.
> > >> >> >
> > >> >> > After reading different posts about this issue, I have the
> > >> >> > impression that HBase can't handle these limits transparently
> > >> >> > for the user. The scanner is a sequential process, so I
> > >> >> > thought it would free Hadoop resources already used in order
> > >> >> > to make room for new requests for data under HDFS. What am I
> > >> >> > missing? Should I slow down the scanning process? Should I
> > >> >> > scan portions of the table sequentially instead of doing a
> > >> >> > full scan over all 1,000,000+ rows? Is there a timeout so
> > >> >> > unused Hadoop resources can be released?
> > >> >> >
> > >> >> > Thanks in advance,
> > >> >> > Lucas
> > >
> > > --
> > > Todd Lipcon
> > > Software Engineer, Cloudera
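
One more note on the "unable to create new native thread" error quoted above: it usually means a thread or process ceiling was hit, not that the Java heap ran out. A minimal sketch of what to check; the -Xss value and the hadoop-env.sh placement are illustrative assumptions, not settings confirmed anywhere in this thread:

    ulimit -u   # max user processes/threads; each xceiver is a DataNode thread
    ulimit -n   # open-file limit, per the settings discussed above
    # A smaller per-thread stack lets more xceiver threads fit in the same
    # address space, e.g. (illustrative) in hadoop-env.sh:
    #   export HADOOP_OPTS="$HADOOP_OPTS -Xss256k"

Lowering dfs.datanode.max.xcievers, as Lucas did, trades the OutOfMemoryError for the xceiverCount IOException; raising the thread headroom instead lets the limit stay high enough for the scan.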