Could you have also prevented the standby from communicating with Zookeeper?

Chris

On Mar 14, 2014 8:22 PM, "dlmarion" <[email protected]> wrote:
> I was doing some testing with HA NN today. I set up two NN with active
> failover (ZKFC) using sshfence. I tested that it's working on both NN by
> doing 'kill -9 <pid>' on the active NN. When I did this on the active node,
> the standby would become the active and everything seemed to work. Next, I
> logged onto the active NN and did a 'service network stop' to simulate a
> NIC/network failure. The standby did not become the active in this
> scenario. In fact, it remained in standby mode and complained in the log
> that it could not communicate with (what was) the active NN. I was unable
> to find anything relevant via searches in Google or in Jira. Does anyone have
> experience successfully testing this? I'm hoping that it is just a
> configuration problem.
>
> FWIW, when the network was restarted on the active NN, it failed over
> almost immediately.
>
> Thanks,
>
> Dave
