Hi Emmanuel,

Thank you for the suggestion. If I understand it correctly, fencing can be configured to shut down the node whose ethernet interface has gone down. That appears to be a sound suggestion, but I still have a few questions.
Queries:
* Is the test case "put down the ethernet interface" not a valid one?
* Why is the node unable to detect that it is cut off from the cluster and shut the services down as per the "no-quorum-policy" configuration?

Regards,
Debabrata

On 14/02/16 19:31, "emmanuel segura" <[email protected]> wrote:

>Use fencing, and after you have configured fencing, use iptables to
>test your cluster: with iptables you can block ports 5404 and 5405.
>
>2016-02-14 14:09 GMT+01:00 Debabrata Pani <[email protected]>:
>> Hi,
>> We ran into some problems when we pulled down the ethernet interface
>> using "ifconfig eth0 down".
>>
>> Our cluster has the following configuration and resources:
>>
>> * Two network interfaces: eth0 and lo (local)
>> * 3 nodes, with one node put in maintenance mode
>> * no-quorum-policy=stop
>> * stonith-enabled=false
>> * PostgreSQL Master/Slave
>> * VIP master and VIP replication IPs
>> * The VIPs run on the node where the PostgreSQL Master is running
>>
>>
>> The two test cases that we executed are as follows:
>>
>> * Introduce a delay on the ethernet interface of the PostgreSQL PRIMARY
>>   node (command: tc qdisc add dev eth0 root netem delay 8000ms)
>> * `ifconfig eth0 down` on the PostgreSQL PRIMARY node
>>
>> We expected both of these test cases to test for network problems in
>> the cluster.
>>
>>
>> In the first case (ethernet interface delay):
>>
>> * The cluster is divided into a "partition WITH quorum" and a
>>   "partition WITHOUT quorum".
>> * The partition WITHOUT quorum shuts down all the services.
>> * The partition WITH quorum takes over the PostgreSQL PRIMARY role and
>>   the VIPs.
>>
>> Everything went as expected. Wow!
>>
>>
>> In the second case (ethernet interface down):
>>
>> We see lots of errors like the following on the node:
>>
>> Feb 12 14:09:48 corosync [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
>> Feb 12 14:09:49 corosync [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
>> Feb 12 14:09:51 corosync [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
>>
>> But `crm_mon -Afr` (run on the node whose eth0 is down) always shows
>> the cluster as fully formed:
>>
>> * It shows all the nodes as UP.
>> * It shows itself as the node running the PostgreSQL PRIMARY (as was
>>   the case before the ethernet interface was put down).
>>
>> `crm_mon -Afr` on the OTHER nodes tells a different story:
>>
>> * They show the node whose eth0 is down as down.
>> * One of the other two nodes takes over the PostgreSQL PRIMARY.
>>
>> This leads to a split-brain situation, which was gracefully avoided in
>> the test case where only a delay was introduced into the interface.
>>
>>
>> Questions:
>>
>> * Is this a known issue with Pacemaker when the ethernet interface is
>>   pulled down?
>> * Is this an incorrect way of testing the cluster? There is some
>>   information regarding the same in this thread:
>>   http://www.gossamer-threads.com/lists/linuxha/pacemaker/59738
>>
>>
>> Regards,
>> Deba
>>
>>
>> _______________________________________________
>> Users mailing list: [email protected]
>> http://clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
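Emmanuel's suggestion to test with iptables instead of `ifconfig eth0 down` can be sketched as a pair of shell helpers. This is a minimal sketch, not a tested procedure, and it rests on a few assumptions: corosync uses its default UDP ports 5404 and 5405 (mcastport 5405 and mcastport - 1; check the totem section of corosync.conf), the helpers are run as root on the node to be isolated, and `corosync_partition`, `run` and `DRY_RUN` are hypothetical names chosen for illustration. The interface stays up and only corosync traffic is dropped, in both directions.

```shell
# Hypothetical helpers (not part of corosync or Pacemaker).
# Source this file as root on the node you want to isolate.

# Assumption: default corosync ports (mcastport 5405 and mcastport - 1).
COROSYNC_PORTS="5404:5405"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# corosync_partition block|unblock
corosync_partition() {
  case "$1" in
    block)   flag="-A" ;;  # append DROP rules for corosync traffic
    unblock) flag="-D" ;;  # delete exactly the same rules again
    *) echo "usage: corosync_partition block|unblock" >&2; return 1 ;;
  esac
  # Drop corosync packets arriving at, and leaving, this node.
  run iptables "$flag" INPUT -p udp --dport "$COROSYNC_PORTS" -j DROP
  run iptables "$flag" OUTPUT -p udp --dport "$COROSYNC_PORTS" -j DROP
}
```

For example, `DRY_RUN=1 corosync_partition block` only prints the two iptables commands, `corosync_partition block` applies them, and `corosync_partition unblock` removes them. Deleting with `-D` and the identical rule specification, rather than flushing the chains, leaves any other firewall rules on the node untouched.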
