I found the problem, so I thought I would post it here for future reference.

The problem was an IPv6-enabled network. IPv6 was already disabled in HDFS 
(HADOOP_OPTS=-Djava.net.preferIPv4Stack=true) and in HBase 
(-Djava.net.preferIPv4Stack=true), but on some of the machines in the 
cluster it had not been disabled in the kernel (through sysctl). 
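
A minimal sketch for checking the kernel-level state on each host, assuming 
Linux and the standard procfs tree under /proc/sys/net/ipv6/conf (the file 
behind the net.ipv6.conf.*.disable_ipv6 sysctl keys). The function name 
ipv6_state is made up for illustration and is not part of any tool mentioned 
in this thread.

```python
from pathlib import Path

# Standard Linux procfs location for per-interface IPv6 settings.
IPV6_CONF = Path("/proc/sys/net/ipv6/conf")


def ipv6_state(conf_dir=IPV6_CONF):
    """Return {interface: True if IPv6 is disabled} for each interface entry.

    An empty dict means the directory is missing, which is what one would
    expect when the IPv6 stack is not loaded at all (e.g. disabled at boot).
    """
    conf_dir = Path(conf_dir)
    if not conf_dir.is_dir():
        return {}
    state = {}
    for entry in sorted(conf_dir.iterdir()):
        flag = entry / "disable_ipv6"
        if flag.is_file():
            # The kernel exposes "1" for disabled and "0" for enabled.
            state[entry.name] = flag.read_text().strip() == "1"
    return state


if __name__ == "__main__":
    state = ipv6_state()
    if not state:
        print("no IPv6 sysctl tree found: IPv6 appears to be off in the kernel")
    for iface, disabled in state.items():
        print(f"{iface}: {'disabled' if disabled else 'ENABLED'}")
```

Running this on every node would show which hosts still have an interface 
with IPv6 enabled, even when the JVM flags above are set.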

So HBase was using IPv6 for its services on some of the hosts. My guess is 
that at the start of every workload HBase tries to resolve AAAA records, 
which eventually times out. It then resolves to the IPv4 address, and that is 
when operations start running at the normal rate. 
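
One way to test that guess is to time name resolution separately for each 
address family from a client machine. This is only a sketch under 
assumptions: it measures the system resolver through Python's 
socket.getaddrinfo, not HBase's own lookup path, and the helper name 
time_lookup and the "localhost" placeholder are illustrative.

```python
import socket
import time


def time_lookup(host, family):
    """Return (elapsed seconds, sorted addresses or an error string)."""
    start = time.monotonic()
    try:
        infos = socket.getaddrinfo(host, None, family, socket.SOCK_STREAM)
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the address string for both IPv4 and IPv6.
        result = sorted({info[4][0] for info in infos})
    except socket.gaierror as exc:
        result = f"lookup failed: {exc}"
    return time.monotonic() - start, result


if __name__ == "__main__":
    host = "localhost"  # placeholder: use a region server or ZooKeeper host
    for label, family in (("IPv4", socket.AF_INET), ("IPv6", socket.AF_INET6)):
        elapsed, result = time_lookup(host, family)
        print(f"{label}: {elapsed:.3f}s -> {result}")
```

If the IPv6 lookup takes seconds while the IPv4 lookup returns at once, that 
would be consistent with the timeout described above.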

On the same note, surprisingly, on one of the hosts disabling IPv6 through 
sysctl (persisted in sysctl.conf) was not enough to stop HBase from using 
IPv6 communication. I had to disable IPv6 in grub (the default grub cmdline 
in /etc/default/grub) on this host. 

Once there was *no IPv6 whatsoever* in the cluster, the YCSB clients started 
running operations on HBase immediately.

Thanks,
Akshay



________________________________
 From: Akshay Singh <[email protected]>
To: "[email protected]" <[email protected]> 
Sent: Tuesday, 15 January 2013 10:36 AM
Subject: Re: Slow start of HBase operations with YCSB, possibly because of 
zookeeper ?
 
Thanks Samar.

You are right that YCSB writes data to a single table, 'usertable', but I see 
very slow operations (on the order of 1-2 operations/second) even for the 
read/update workload, not only for inserts. So the region is already split 
across multiple RSs before I start my transaction workload.

And the keys are fairly random in YCSB, so I doubt that the slow operations 
are due to the table initially being limited to one region.

As far as I can tell, this has something to do with ZooKeeper: as I said in 
the original mail, if I increase 
"hbase.zookeeper.watcher.sync.connected.wait" (to 10 sec), I don't see the 
exceptions thrown by ZooKeeperWatcher that I see with the default value of 
2s. I have a stand-alone ZooKeeper instance, to which all RSs connect.

Is there any other component I should monitor closely?

Thanks,
Akshay



________________________________
From: samar kumar <[email protected]>
To: [email protected] 
Sent: Tuesday, 15 January 2013 3:58 AM
Subject: Re: Slow start of HBase operations with YCSB, possibly because of 
zookeeper ?

YCSB would be writing all data to one table. So initially, when the table
is small or has just been created, all the writes would go to one RS. As the
table grows, the region is split across different RSs. This would allow
parallel writes if the keys are random, and could possibly make the writes
faster.
Samar

On 15/01/13 6:34 AM, "Akshay Singh" <[email protected]> wrote:

> 
>Hi hbase users,
>
>I am running HBase (on top of HDFS) in
>distributed mode (on 8 VMs), and things like JPS look fine on all the
>machines in the cluster. I am also able to run hbase shell and
>interact with HBase through it. But when I want to benchmark my HBase
>cluster with YCSB (Yahoo! Cloud System Benchmark,
>https://github.com/brianfrankcooper/YCSB/) I see this weird problem:
>HBase operations start slowly and only pick up later.
>
>Basically when I start the YCSB
>workload from a client machine, I see these problems in chronological
>order :
>
>1) ERROR zookeeper.ZooKeeperWatcher: ZK
>is null on connection event
>
>###########
>ERROR zookeeper.ZooKeeperWatcher: ZK is null on connection event -- see stack trace for the stack trace when constructor was called on this zkw
>java.lang.Exception: ZKW CONSTRUCTOR STACK TRACE FOR DEBUGGING
>at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:142)
>at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:126)
>at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:1322)
>at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.ensureZookeeperTrackers(HConnectionManager.java:584)
>at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:827)
>at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:810)
>at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:232)
>at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:172)
>at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:131)
>at com.yahoo.ycsb.db.HBaseClient.getHTable(HBaseClient.java:155)
>###########
>
>2) org.apache.zookeeper.ClientCnxn -
>Error while calling watcher
>java.lang.NullPointerException: ZK
>is null
>
>############
>ERROR org.apache.zookeeper.ClientCnxn - Error while calling watcher
>java.lang.NullPointerException: ZK is null
>at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:334)
>at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:271)
>at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521)
>at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
>############
>
>3) And then finally it starts the
>operations on HBase (which means ZooKeeper is running fine and can be
>connected to)
>
>4) The operations remain below 10
>ops/sec for the first 60-70 sec, and then grow gradually to reach around
>1300 ops/sec (the normally expected number)
>
>Here are the actual logs :: http://pastebin.com/NC1zKwRF
>
>I am running
>1) Hadoop-1.0.1
>2) HBase-0.94.1
>3) Zookeeper-3.3.6
>4) Java 1.6.0_24 (openJDK-6)
>5) OS : Ubuntu-11.10
>6) YCSB-0.14
>
>What I have already tried :
>
>1) Checked my DNS setting (just to be
>sure .. using synced /etc/hosts file) .. no luck
>2) Increasing
>"hbase.zookeeper.watcher.sync.connected.wait" to 10000
>(default: 2000); this gets rid of the "ZK is null ****" errors,
>but the slow start is still an issue, with no improvement.
>
>I am clueless as to what may be the
>reason behind this 'slowly picking up' behavior of my set-up.
>Please advise.
>
>Thanks,
>Akshay
