Thanks for sharing, Akshay. I think the solution should be part of the HBase reference guide.
On Fri, Jan 18, 2013 at 7:55 AM, Akshay Singh <[email protected]> wrote:
> I found the problem, so I thought I would post it here for future
> reference.
>
> The problem was an IPv6-enabled network. IPv6 was already disabled in HDFS
> (HADOOP_OPTS=-Djava.net.preferIPv4Stack=true) and in HBase
> (-Djava.net.preferIPv4Stack=true), but on some of the machines in the
> cluster it was not disabled in the kernel (through sysctl).
>
> So HBase was using IPv6 for its services on some of the hosts. I am
> guessing that at the start of every workload HBase tries to resolve AAAA
> records, which eventually times out. It then resolves to an IPv4 address,
> and that's when operations start at the normal rate.
>
> On the same note, surprisingly, on one of the hosts disabling IPv6 through
> sysctl (persisted in sysctl.conf) was not enough to stop HBase from using
> IPv6 communication. I had to disable IPv6 in grub (the default grub cmdline
> in /etc/default/grub) on this host.
>
> After there was *no IPv6 whatsoever* in the cluster, YCSB clients started
> doing operations on HBase immediately.
>
> Thanks,
> Akshay
>
>
> ________________________________
> From: Akshay Singh <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Tuesday, 15 January 2013 10:36 AM
> Subject: Re: Slow start of HBase operations with YCSB, possibly because of
> zookeeper ?
>
> Thanks Samar.
>
> You are right that YCSB writes data to a single table, 'usertable', but I
> see very slow operations (on the order of 1-2 operations/second) even for
> the read/update workload, not only for inserts. So the region is already
> split across multiple RS before I start my transaction workload.
>
> And the keys are fairly random in YCSB, so I doubt that the slow operations
> are due to the table initially being limited to one region.
>
> To my knowledge this should have something to do with ZooKeeper because, as
> I said in the original mail, if I increase
> "hbase.zookeeper.watcher.sync.connected.wait" (to 10 sec) I don't see the
> exceptions thrown by ZooKeeperWatcher that I see with the default value of
> 2s. I have a stand-alone ZooKeeper instance, to which all RS connect.
>
> Any other component I should closely monitor?
>
> Thanks,
> Akshay
>
>
> ________________________________
> From: samar kumar <[email protected]>
> To: [email protected]
> Sent: Tuesday, 15 January 2013 3:58 AM
> Subject: Re: Slow start of HBase operations with YCSB, possibly because of
> zookeeper ?
>
> YCSB would be writing all data to one table. So initially, when the table
> is small or just created, all the writes would go to one RS. As the table
> grows, the region is split across different RS. That would allow parallel
> writes, if the keys are random, and could possibly make the writes faster.
> Samar
>
> On 15/01/13 6:34 AM, "Akshay Singh" <[email protected]> wrote:
>
> >
> >Hi hbase users,
> >
> >I am running HBase (on top of HDFS) in
> >distributed mode (on 8 VMs), and things like JPS look fine on all the
> >machines in the cluster. I am also able to run hbase shell and
> >interact with HBase through it. But when I want to benchmark my HBase
> >cluster with YCSB (Yahoo! Cloud Serving Benchmark,
> >https://github.com/brianfrankcooper/YCSB/) I see this weird problem
> >of slow start of the HBase operations and then picking up later.
> >
> >Basically when I start the YCSB
> >workload from a client machine, I see these problems in chronological
> >order:
> >
> >1) ERROR zookeeper.ZooKeeperWatcher: ZK
> >is null on connection event
> >
> >###########
> >ERROR zookeeper.ZooKeeperWatcher: ZK is null on connection event -- see stack trace for the stack trace when constructor was called on this zkw
> >java.lang.Exception: ZKW CONSTRUCTOR STACK TRACE FOR DEBUGGING
> >    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:142)
> >    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:126)
> >    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:1322)
> >    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.ensureZookeeperTrackers(HConnectionManager.java:584)
> >    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:827)
> >    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:810)
> >    at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:232)
> >    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:172)
> >    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:131)
> >    at com.yahoo.ycsb.db.HBaseClient.getHTable(HBaseClient.java:155)
> >###########
> >
> >2) org.apache.zookeeper.ClientCnxn -
> >Error while calling watcher
> >java.lang.NullPointerException: ZK
> >is null
> >
> >############
> >ERROR org.apache.zookeeper.ClientCnxn - Error while calling watcher
> >java.lang.NullPointerException: ZK is null
> >    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:334)
> >    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:271)
> >    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521)
> >    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
> >############
> >
> >3) And then finally it starts the
> >operations on HBase (which means ZooKeeper is running fine and can be
> >connected to)
> >
> >4) The operation rate remains below 10
> >ops/sec for the first 60-70 sec, and then grows gradually to reach around
> >1300 ops/sec (the normally expected number)
> >
> >Here are the actual logs: http://pastebin.com/NC1zKwRF
> >
> >I am running
> >1) Hadoop-1.0.1
> >2) HBase-0.94.1
> >3) Zookeeper-3.3.6
> >4) Java 1.6.0_24 (openJDK-6)
> >5) OS : Ubuntu-11.10
> >6) YCSB-0.14
> >
> >What I have already tried:
> >
> >1) Checked my DNS settings (just to be
> >sure; I am using a synced /etc/hosts file): no luck
> >2) Increased
> >"hbase.zookeeper.watcher.sync.connected.wait" to 10000
> >(default: 2000). This gets rid of the "ZK is null ****" errors,
> >but slow start is still the issue, with no improvement.
> >
> >I am clueless as to what may be the
> >reason behind this 'slowly picking up' behavior of my set-up.
> >Please advise.
> >
> >Thanks,
> >Akshay
>
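
The fix reported at the top of this thread depends on IPv6 being off on every host, not only in the JVM options. As a rough way to audit that per host, here is a minimal sketch, assuming Linux hosts. The thread does not name the exact settings, so the paths /proc/sys/net/ipv6/conf/all/disable_ipv6 and /proc/cmdline and the ipv6.disable=1 boot parameter are assumptions about how "sysctl" and "grub" were used, and all function names are hypothetical.

```python
"""Report whether IPv6 looks disabled on a Linux host (a sketch, not a definitive check)."""
from pathlib import Path

# Assumed locations; neither path is named in the thread.
SYSCTL_FLAG = Path("/proc/sys/net/ipv6/conf/all/disable_ipv6")
KERNEL_CMDLINE = Path("/proc/cmdline")


def ipv6_disabled_by_sysctl(flag_text):
    """Return True if the content of the disable_ipv6 flag file is '1'."""
    return flag_text.strip() == "1"


def ipv6_disabled_at_boot(cmdline_text):
    """Return True if the kernel command line contains ipv6.disable=1."""
    return "ipv6.disable=1" in cmdline_text.split()


def read_or_none(path):
    """Return the file's text, or None if it is missing or unreadable."""
    try:
        return path.read_text()
    except OSError:
        return None


def describe_host():
    """Summarise the IPv6 state of the local host as a short string."""
    flag = read_or_none(SYSCTL_FLAG)
    cmdline = read_or_none(KERNEL_CMDLINE) or ""
    if ipv6_disabled_at_boot(cmdline):
        return "IPv6 disabled on the kernel command line"
    if flag is None:
        return "no IPv6 sysctl entry found (IPv6 may not be loaded)"
    if ipv6_disabled_by_sysctl(flag):
        return "IPv6 disabled through sysctl"
    return "IPv6 is still enabled"


if __name__ == "__main__":
    print(describe_host())
```

Running this on each node and comparing the output would show which hosts differ. A host that reports "IPv6 disabled through sysctl" may still need the boot-time setting, as one host in this thread did.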
