Thanks Buttler. I will read this link. Thanks for resolving my queries. :)
-----Original Message-----
From: Buttler, David [mailto:[email protected]]
Sent: Tuesday, September 20, 2011 10:46 PM
To: [email protected]
Subject: RE: Queries on Zookeeper failure and RegionServer restartup

Have you looked at this: http://hbase.apache.org/book.html#zookeeper

Inline...

-----Original Message-----
From: Stuti Awasthi [mailto:[email protected]]
Sent: Tuesday, September 20, 2011 9:32 AM
To: [email protected]
Subject: RE: Queries on Zookeeper failure and RegionServer restartup

Hi David,

Thanks for your response. I am not clear on a few things here:

1. Odd number of nodes in your zookeeper ensemble. Why is it required? Can you please explain with an example? Does that mean that if I have 3 nodes on which I am running zookeeper and 1 of them has failed, then the cluster will work, and if 2 out of 3 have failed then the cluster will be down?

Buttler> Yes, this is correct.

2. "you do realize that you have to have a majority of zookeeper nodes alive for zookeeper to work," Please explain this.

Buttler> Zookeeper needs a quorum of nodes. The algorithm that zookeeper uses defines a quorum as a simple majority, i.e. more than half. If you have 4 nodes, and 2 die, then you have only 2 nodes alive, which is exactly half, not "more than half". Zookeeper will then assume that it can no longer function. Therefore, the advice in the book is to have an odd number of nodes so that you will never be in the case of having "exactly" half of your nodes working.

Thanks

-----Original Message-----
From: Buttler, David [mailto:[email protected]]
Sent: Tuesday, September 20, 2011 9:08 PM
To: [email protected]
Subject: RE: Queries on Zookeeper failure and RegionServer restartup

Wait, you do realize that you have to have a majority of zookeeper nodes alive for zookeeper to work, right? That means that you get lower reliability with two nodes than one node: if either node goes down, zookeeper will give up. This also implies that you need to have an odd number of nodes in your zookeeper ensemble.

Also, hbase requires synchronized time across the cluster. You can't rely on the built-in clocks to keep time synchronized to a close enough delta over a reasonable period of time (e.g. after a month things will fall apart). Luckily this is a solved problem: ntp

Dave

-----Original Message-----
From: Stuti Awasthi [mailto:[email protected]]
Sent: Tuesday, September 20, 2011 4:40 AM
To: [email protected]
Subject: RE: Queries on Zookeeper failure and RegionServer restartup

Hi Ramkrishna,

Thanks for the reply. I set the system date and rechecked; now the region servers are starting.

Thanks
Stuti

-----Original Message-----
From: Ramkrishna S Vasudevan [mailto:[email protected]]
Sent: Tuesday, September 20, 2011 1:56 PM
To: [email protected]
Subject: RE: Queries on Zookeeper failure and RegionServer restartup

Regarding the ClockOutOfSyncException, just check whether all nodes in your cluster have the same time set. This problem comes up when there are time differences between nodes.

Best Regards
Ram

-----Original Message-----
From: Stuti Awasthi [mailto:[email protected]]
Sent: Tuesday, September 20, 2011 1:28 PM
To: [email protected]
Subject: Queries on Zookeeper failure and RegionServer restartup

Hi all,

I have a 2 node cluster. I run Regionserver and Zookeeper on both nodes, Master on one and Backup Master on the other.

Here is what I did: I stopped Zookeeper on 1 node and after that I was unable to access Hbase.

ERROR: org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to connect to ZooKeeper but the connection closes immediately. This could be a sign that the server has too many connections (30 is the default). Consider inspecting your ZK server logs for that error and then make sure you are reusing HBaseConfiguration as often as you can. See HTable's javadoc for more information.

Queries:

1. If the cluster becomes inaccessible when one of the zookeeper nodes goes down, then why are we running multiple zookeeper nodes?

2. Is there some way that the cluster can stay accessible if only one of the zookeeper nodes is working?

Some other test: if I stop RegionServer and Master on 1 node, then the Backup Master becomes Master and I can access the Hbase cluster, but when I try to restart the Region server on the same node on which I shut it down, it gives me the following error. How to fix this?

2011-09-20 12:06:03,647 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=master,60020,1316500563205, load=(requests=0, regions=0, usedHeap=22, maxHeap=993): Unhandled exception: org.apache.hadoop.hbase.ClockOutOfSyncException: Server master,60020,1316500563205 has been rejected; Reported time is too far out of sync with master. Time difference of 352381ms > max allowed of 30000ms
org.apache.hadoop.hbase.ClockOutOfSyncException: org.apache.hadoop.hbase.ClockOutOfSyncException: Server master,60020,1316500563205 has been rejected; Reported time is too far out of sync with master. Time difference of 352381ms > max allowed of 30000ms
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:96)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:80)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1515)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.tryReportForDuty(HRegionServer.java:1479)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:571)
    at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.ClockOutOfSyncException: Server master,60020,1316500563205 has been rejected; Reported time is too far out of sync with master. Time difference of 352381ms > max allowed of 30000ms
    at org.apache.hadoop.hbase.master.ServerManager.checkClockSkew(ServerManager.java:181)
    at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:129)
    at org.apache.hadoop.hbase.master.HMaster.regionServerStartup(HMaster.java:615)

Your inputs are required.

Thanks
Stuti
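The majority rule discussed in this thread can be checked with a little arithmetic. The following is a minimal sketch, not ZooKeeper code: the function names are hypothetical, and it assumes only what the thread states, namely that a quorum is "more than half" of the ensemble.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of nodes that is strictly more than half."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many nodes can fail while a majority is still alive."""
    return ensemble_size - quorum_size(ensemble_size)


def has_quorum(ensemble_size: int, alive: int) -> bool:
    """True if the surviving nodes still form a strict majority."""
    return alive >= quorum_size(ensemble_size)


if __name__ == "__main__":
    # Print quorum size and tolerated failures for small ensembles.
    for n in range(1, 6):
        print(f"nodes={n} quorum={quorum_size(n)} "
              f"tolerated_failures={tolerated_failures(n)}")
```

Under this rule, 2 nodes tolerate no failures (the same as 1 node, but with twice the chance that some node is down), 3 nodes tolerate one failure, and 4 nodes still tolerate only one, which is why an odd ensemble size is advised.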

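The ClockOutOfSyncException in the log above comes down to a threshold comparison. The sketch below only mirrors the comparison reported in the log message ("Time difference of 352381ms > max allowed of 30000ms"); it is not HBase's implementation, the function names are hypothetical, and the strict "greater than" is an assumption taken from the ">" in that message.

```python
MAX_SKEW_MS = 30_000  # "max allowed of 30000ms" from the log message


def clock_skew_ms(master_time_ms: int, server_time_ms: int) -> int:
    """Absolute difference between the two clocks, in milliseconds."""
    return abs(master_time_ms - server_time_ms)


def is_rejected(master_time_ms: int, server_time_ms: int,
                max_skew_ms: int = MAX_SKEW_MS) -> bool:
    """True if the skew exceeds the allowed maximum (assumed strict '>')."""
    return clock_skew_ms(master_time_ms, server_time_ms) > max_skew_ms


if __name__ == "__main__":
    # The skew reported in the thread: 352381 ms, roughly 5.9 minutes.
    print(is_rejected(0, 352_381))
```

A skew of 352381 ms is almost six minutes, far beyond the 30 second limit, which is consistent with the region server starting normally once the system date was corrected.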