Re: [ClusterLabs] Pacemaker issue when ethernet interface is pulled down

Debabrata Pani Sun, 14 Feb 2016 06:50:54 -0800

Hi Emmanuel,

Thank you for the suggestion.
If I understand correctly, fencing can be configured to shut down the node
on which the ethernet interface has gone down.
That appears to be a sound suggestion, but I still have a few queries.


Queries:
* Is the test case "put down the ethernet interface" not a valid one?
* Why is the node unable to detect that it is cut off from the cluster and
shut the services down as per the "no-quorum-policy" configuration?
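
For reference, this is roughly how I understand the suggested iptables test.
It is only a sketch: it assumes corosync is using UDP ports 5404 and 5405
(the ports mentioned in the reply below), and the helper names
block_corosync / unblock_corosync and the DRY_RUN variable are my own, not
part of any tool. By default it only prints the commands instead of running
them.

```shell
#!/bin/sh
# Sketch: simulate a network partition by dropping corosync traffic
# while leaving the ethernet interface itself up.
# Assumption: corosync uses UDP ports 5404 and 5405.
# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 and run
# as root to actually change the firewall rules.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Drop corosync traffic in both directions on this node.
block_corosync() {
    for port in 5404 5405; do
        run iptables -A INPUT -p udp --dport "$port" -j DROP
        run iptables -A OUTPUT -p udp --dport "$port" -j DROP
    done
}

# Remove the rules added by block_corosync.
unblock_corosync() {
    for port in 5404 5405; do
        run iptables -D INPUT -p udp --dport "$port" -j DROP
        run iptables -D OUTPUT -p udp --dport "$port" -j DROP
    done
}
```

Calling block_corosync on one node with DRY_RUN=0 should cut that node off
from the others at the corosync level, and unblock_corosync should restore
the traffic afterwards.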


Regards,
Debabrata

On 14/02/16 19:31, "emmanuel segura" <[email protected]> wrote:

>Use fencing, and once you have configured fencing, use iptables to test
>your cluster: with iptables you can block ports 5404 and 5405.
>
>2016-02-14 14:09 GMT+01:00 Debabrata Pani <[email protected]>:
>> Hi,
>> We ran into some problems when we pull down the ethernet interface using
>> "ifconfig eth0 down"
>>
>> Our cluster has the following configuration and resources:
>>
>> * Two network interfaces: eth0 and lo (local)
>> * 3 nodes, with one node put in maintenance mode
>> * no-quorum-policy=stop
>> * stonith-enabled=false
>> * Postgresql Master/Slave
>> * vip master and vip replication IPs
>> * The VIPs run on the node where the Postgresql Master is running
>>
>>
>> The two test cases that we executed are as follows:
>>
>> 1. Introduce delay in the ethernet interface of the postgresql PRIMARY node
>>    (command: tc qdisc add dev eth0 root netem delay 8000ms)
>> 2. `ifconfig eth0 down` on the postgresql PRIMARY node
>>
>> We expected both of these test cases to test for network problems in the
>> cluster.
>>
>>
>> In the first case (ethernet interface delay):
>>
>> * The cluster is divided into a "partition WITH quorum" and a "partition
>>   WITHOUT quorum"
>> * The partition WITHOUT quorum shuts down all the services
>> * The partition WITH quorum takes over as Postgresql PRIMARY and VIPs
>>
>> Everything as expected. Wow!
>>
>>
>> In the second case (ethernet interface down):
>>
>> We see lots of errors like the following on the node:
>>
>> Feb 12 14:09:48 corosync [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
>> Feb 12 14:09:49 corosync [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
>> Feb 12 14:09:51 corosync [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
>>
>> But `crm_mon -Afr` (from the node whose eth0 is down) always shows the
>> cluster to be fully formed.
>>
>> * It shows all the nodes as UP
>> * It shows itself as the one running the postgresql PRIMARY (as was the case
>>   before the ethernet interface was put down)
>>
>> `crm_mon -Afr` on the OTHER nodes shows a different story:
>>
>> * They show the other node as down
>> * One of the other two nodes takes over the postgresql PRIMARY
>>
>> This leads to a split brain situation, which was gracefully avoided in the
>> test case where only "delay is introduced into the interface".
>>
>>
>> Questions:
>>
>> * Is it a known issue with pacemaker when the ethernet interface is pulled
>>   down?
>> * Is it an incorrect way of testing the cluster? There is some information
>>   regarding the same in this thread:
>>   http://www.gossamer-threads.com/lists/linuxha/pacemaker/59738
>>
>>
>> Regards,
>> Deba
>>
>>
>> _______________________________________________
>> Users mailing list: [email protected]
>> http://clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>
>
>
>
>-- 
>  .~.
>  /V\
> //  \\
>/(   )\
>^`~'^
>


