I suppose NN2 is the standby. Please check that ZKFC2 is alive before stopping the network on NN1.

> On Mar 15, 2014, at 10:53, dlmarion <[email protected]> wrote:
>
> Apache Hadoop 2.3.0
>
> -------- Original message --------
> From: Azuryy
> Date: 03/14/2014 10:45 PM (GMT-05:00)
> To: [email protected]
> Subject: Re: HA NN Failover question
>
> Which Hadoop version are you using?
>
> On Mar 15, 2014, at 9:29, dlmarion <[email protected]> wrote:
>
>> Server 1: NN1 and ZKFC1
>> Server 2: NN2 and ZKFC2
>> Server 3: Journal1 and ZK1
>> Server 4: Journal2 and ZK2
>> Server 5: Journal3 and ZK3
>> Server 6+: Datanode
>>
>> All in the same rack. I would expect the ZKFC from the active name node
>> server to lose its lock and the other ZKFC to tell the standby namenode that
>> it should become active (I’m assuming that’s how it works).
>>
>> - Dave
>>
>> From: Juan Carlos [mailto:[email protected]]
>> Sent: Friday, March 14, 2014 9:12 PM
>> To: [email protected]
>> Subject: Re: HA NN Failover question
>>
>> Hi Dave,
>> How many ZooKeeper servers do you have, and where are they?
>>
>> Juan Carlos Fernández Rodríguez
>>
>> On 15/03/2014, at 01:21, dlmarion <[email protected]> wrote:
>>
>> I was doing some testing with HA NN today. I set up two NN with active
>> failover (ZKFC) using sshfence. I tested that it’s working on both NN by
>> doing ‘kill -9 <pid>’ on the active NN. When I did this on the active node,
>> the standby would become the active and everything seemed to work. Next, I
>> logged onto the active NN and did a ‘service network stop’ to simulate a
>> NIC/network failure. The standby did not become the active in this scenario.
>> In fact, it remained in standby mode and complained in the log that it could
>> not communicate with (what was) the active NN. I was unable to find anything
>> relevant via searches in Google or in Jira. Does anyone have experience
>> successfully testing this? I’m hoping that it is just a configuration
>> problem.
>>
>> FWIW, when the network was restarted on the active NN, it failed over almost
>> immediately.
>>
>> Thanks,
>>
>> Dave
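
For reference, the setup described in the original message (automatic failover through ZKFC, fencing with sshfence) would usually correspond to properties along the lines of the sketch below. This is only an illustration, not Dave's actual configuration: the private key path and the ZooKeeper host names are placeholders assumed for the example, with the hosts standing in for Servers 3-5 from the layout above. One thing that may be worth checking against it is that sshfence works by SSHing to the node being fenced, so it needs that node to be reachable over the network.

```xml
<!-- hdfs-site.xml (sketch; values are assumptions, not taken from the thread) -->
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- sshfence connects to the other NameNode host over SSH to kill the process -->
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <!-- hypothetical key path; use the key your HDFS user really has -->
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hdfs/.ssh/id_rsa</value>
</property>

<!-- core-site.xml (sketch) -->
<property>
  <!-- placeholder host names for the three ZooKeeper servers -->
  <name>ha.zookeeper.quorum</name>
  <value>server3:2181,server4:2181,server5:2181</value>
</property>
```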
