I modified zoo.cfg and propagated it to all my nodes and the same
problem occurred.
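
For reference, a minimal sketch of how the quorum entries in zoo.cfg can be generated so that every node gets an identical list. The host names below are the placeholders from the quoted message, and the helper name quorum_lines is hypothetical; the ports 2888 and 3888 are the ones used in the quoted example. Each ZooKeeper host also needs a myid file in its dataDir whose number matches its own server.N entry.

```python
def quorum_lines(hosts, peer_port=2888, election_port=3888, first_id=0):
    """Build zoo.cfg 'server.N=host:peer_port:election_port' lines.

    hosts is an ordered list of ZooKeeper host names (placeholders here).
    Ids are assigned consecutively starting at first_id.
    """
    return [
        f"server.{first_id + i}={host}:{peer_port}:{election_port}"
        for i, host in enumerate(hosts)
    ]


# Placeholder host names taken from the quoted example, not real machines.
for line in quorum_lines(["zk_host_1", "zk_host_2", "zk_host_3"]):
    print(line)
```

The same generated block would be appended to zoo.cfg on every ZooKeeper host, so the files cannot drift apart.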

At this point I am going to place my NameNode, SecondaryNameNode, RSs,
ZKs, HMaster, and a few DNs on the same physical switch and see if the
issue goes away.
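
Before and after the move, a rough TCP reachability check run from a region server host might help confirm whether the network is the problem. This is only a sketch: the function names check_port and report are hypothetical, and the targets in the usage note assume the default ZooKeeper client port 2181 plus the DataNode address from the quoted log line.

```python
import socket


def check_port(host, port, timeout=3.0):
    """Attempt a TCP connection and return (ok, detail).

    Any failure (refused, timed out, no route to host, DNS error) is
    reported as ok=False with the exception type and message.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:
        return False, f"{type(exc).__name__}: {exc}"


def report(targets, timeout=3.0):
    """Return one status line per (host, port) pair in targets."""
    lines = []
    for host, port in targets:
        ok, detail = check_port(host, port, timeout=timeout)
        status = "OK" if ok else "FAIL"
        lines.append(f"{status} {host}:{port} {detail}")
    return lines
```

For example, print("\n".join(report([("devrackA-03", 2181), ("172.18.0.18", 50010)]))) would try one ZK host and the DataNode from the log. A successful connect only shows the port is reachable at that moment; intermittent packet loss would need the check repeated over time.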

Thanks again for reviewing my logs.

---
Jay Wilson


On 7/3/2012 9:40 PM, Amandeep Khurana wrote:
> Jay, 
> 
> You need to modify the zoo.cfg to reflect the quorum.
> 
> server.0=localhost:2888:3888 will change to something like
> 
> server.0=zk_host_1:2888:3888
> server.1=zk_host_2:2888:3888
> server.2=zk_host_3:2888:3888
> 
> The same config needs to be on all the zookeeper hosts.
> 
> Also, I assume it's a self managed ZK.
> 
> Secondly, I'm seeing session timeouts between RS and ZK, which means there is 
> something going on because of which RS is not able to talk to ZK. This could 
> happen due to the following reasons:
> 
> 1. RS is loaded and is not able to communicate with ZK. This could be due to 
> a GC pause as well. Based on what you are saying, there is nothing happening 
> on the cluster, so that should not be the case.
> 
> 2. The network is acting up. It is very much possible that packets are 
> getting dropped. I have seen that happen myself and it was really hard to 
> debug. The NoRouteToHostExceptions hint at that. I'm seeing those in your RS 
> logs too, although that's to do with it not being able to talk to HDFS:
> 
>> 2012-07-03 18:47:25,161 INFO org.apache.hadoop.hdfs.DFSClient: Exception in 
>> createBlockOutputStream 172.18.0.18:50010 java.net.NoRouteToHostException: 
>> No route to host
> Do you have monitoring in place? Can you get more info on what's going on on 
> the hosts and the network?
> 
> Also, you can collocate datanodes and region servers, which is not what you 
> have done currently.
> 
> What's the hardware config on these boxes?
> 
> -Amandeep 
> 
> 
> On Tuesday, July 3, 2012 at 8:16 PM, Jay Wilson wrote:
> 
>> First, thank you for looking at this for me.
>>
>> Second, the network is up. It is dedicated to the cluster and it appears
>> stable.
>>
>> Third, I haven't modified the zoo.cfg; however, I have put it on
>> pastebin. I made all my zookeeper changes in hbase-site.xml
>>
>> zoo.cfg -- http://pastebin.com/download.php?i=askC9VRG
>> hbase-site.xml -- http://pastebin.com/download.php?i=DkLGr57G
>>
>> HMASTER LOG -- http://pastebin.com/download.php?i=i4U52cWf
>>
>> ZK (devrackA-03) -- http://pastebin.com/download.php?i=CRyQFKFF
>> ZK (devrackA-04) -- http://pastebin.com/download.php?i=WAqAhjdh
>> ZK (devrackA-05) -- http://pastebin.com/download.php?i=cS1Gm19x
>>
>> RS (devrackA-06) -- http://pastebin.com/download.php?i=XayB2HeX
>> RS (devrackB-07) -- http://pastebin.com/download.php?i=RQZ45a8j
>> RS (devrackB-08) -- http://pastebin.com/download.php?i=ZDZD0z7B
>>
>> ---
>> Jay Wilson
>>


