I modified zoo.cfg and propagated it to all my nodes and the same problem occurred.
At this point I am going to place my NameNode, SecondaryNameNode, RSs, ZKs,
HMaster, and a few DNs on the same physical switch and see if the issue goes
away.

Thanks again for reviewing my logs.

---
Jay Wilson

On 7/3/2012 9:40 PM, Amandeep Khurana wrote:
> Jay,
>
> You need to modify the zoo.cfg to reflect the quorum.
>
> server.0=localhost:2888:3888 will change to something like
>
> server.0=zk_host_1:2888:3888
> server.1=zk_host_2:2888:3888
> server.3=zk_host_3:2888:3888
>
> The same config needs to be on all the zookeeper hosts.
>
> Also, I assume it's a self-managed ZK.
>
> Secondly, I'm seeing session timeouts between RS and ZK, which means there is
> something going on because of which RS is not able to talk to ZK. This could
> happen due to the following reasons:
>
> 1. RS is loaded and is not able to communicate with ZK. This could be due to
> a GC pause as well. Based on what you are saying, there is nothing happening
> on the cluster, so that should not be the case.
>
> 2. The network is acting up. It is very much possible that packets are
> getting dropped. I have seen that happen myself and it was really hard to
> debug. The NoRouteToHostExceptions hint at that. I'm seeing those in your RS
> logs too, although that's to do with it not being able to talk to HDFS:
>
>> 2012-07-03 18:47:25,161 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
>> createBlockOutputStream 172.18.0.18:50010 java.net.NoRouteToHostException:
>> No route to host
>
> Do you have monitoring in place? Can you get more info on what's going on on
> the hosts and the network?
>
> Also, you can collocate datanodes and region servers, which is not what you
> have done currently.
>
> What's the hardware config on these boxes?
>
> -Amandeep
>
>
> On Tuesday, July 3, 2012 at 8:16 PM, Jay Wilson wrote:
>
>> First, thank you for looking at this for me.
>>
>> Second, the network is up. It is dedicated to the cluster and it appears
>> stable.
>>
>> Third, I haven't modified the zoo.cfg; however, I have put it on
>> pastebin. I made all my zookeeper changes in hbase-site.xml
>>
>> zoo.cfg -- http://pastebin.com/download.php?i=askC9VRG
>> hbase-site.xml -- http://pastebin.com/download.php?i=DkLGr57G
>>
>> HMASTER LOG -- http://pastebin.com/download.php?i=i4U52cWf
>>
>> ZK (devrackA-03) -- http://pastebin.com/download.php?i=CRyQFKFF
>> ZK (devrackA-04) -- http://pastebin.com/download.php?i=WAqAhjdh
>> ZK (devrackA-05) -- http://pastebin.com/download.php?i=cS1Gm19x
>>
>> RS (devrackA-06) -- http://pastebin.com/download.php?i=XayB2HeX
>> RS (devrackB-07) -- http://pastebin.com/download.php?i=RQZ45a8j
>> RS (devrackB-08) -- http://pastebin.com/download.php?i=ZDZD0z7B
>>
>> ---
>> Jay Wilson
>>
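
The zoo.cfg advice in this thread (replace the localhost entry with real hostnames, and keep the file identical on every ZooKeeper host) can be checked mechanically. The following is a minimal sketch, not part of ZooKeeper or HBase: the helper names parse_quorum, quorum_problems, and configs_agree are hypothetical, and it assumes plain server.N=host:port:port lines as shown above, so entries in any other form are silently skipped.

```python
import re

# Matches lines such as "server.1=zk_host_2:2888:3888".
# Assumption: only this simple host:peer_port:election_port form is used.
SERVER_LINE = re.compile(r"^server\.(\d+)=([^:\s]+):(\d+):(\d+)$")

# Hostnames that only make sense for a single-machine setup.
LOOPBACK_NAMES = {"localhost", "127.0.0.1"}


def parse_quorum(cfg_text):
    """Return {server_id: (host, peer_port, election_port)} from zoo.cfg text."""
    servers = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = SERVER_LINE.match(line)
        if match:
            sid, host, peer, election = match.groups()
            servers[int(sid)] = (host, int(peer), int(election))
    return servers


def quorum_problems(servers):
    """List server entries that point at a loopback name."""
    problems = []
    for sid, (host, _peer, _election) in sorted(servers.items()):
        if host in LOOPBACK_NAMES:
            problems.append(f"server.{sid} points at {host}")
    return problems


def configs_agree(cfg_texts):
    """True if every zoo.cfg text declares the same set of server entries."""
    parsed = [parse_quorum(text) for text in cfg_texts]
    return all(p == parsed[0] for p in parsed)


if __name__ == "__main__":
    default_cfg = "tickTime=2000\nclientPort=2181\nserver.0=localhost:2888:3888\n"
    print(quorum_problems(parse_quorum(default_cfg)))
```

Running the check against the copy of zoo.cfg collected from each ZooKeeper host would show quickly whether any node still carries the localhost entry or a different server list. It says nothing about network reachability, which has to be tested separately.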
