Re: [ClusterLabs] Two node cluster goes into split brain scenario during CPU intensive tasks

Somanath Jeeva Tue, 25 Jun 2019 04:13:42 -0700


With Regards
Somanath Thilak J

-----Original Message-----
From: Jan Friesse <[email protected]> 
Sent: Monday, June 24, 2019 12:23
To: Cluster Labs - All topics related to open-source clustering welcomed 
<[email protected]>; Somanath Jeeva <[email protected]>
Subject: Re: [ClusterLabs] Two node cluster goes into split brain scenario 
during CPU intensive tasks

Somanath,

> > Hi All,
> > 
> > I have a two node cluster with multicast (udp) transport. The multicast IP 
> > used is 224.1.1.1.

> Would you mind giving UDPU (unicast) a try? For a two node cluster there is 
> going to be no difference in terms of speed/throughput.

Sure, we will try with UDPU.

> > 
> > Whenever there is a CPU intensive task, the pcs cluster goes into a split 
> > brain scenario and doesn't recover automatically. We have to manually 
> > restart the services to bring both nodes online again. 
> > Before the nodes go into split brain, the corosync log shows:
> > 
> > May 24 15:10:02 server1 corosync[4745]:  [TOTEM ] Retransmit List: 7c 7e
> > May 24 15:10:02 server1 corosync[4745]:  [TOTEM ] Retransmit List: 7c 7e
> > May 24 15:10:02 server1 corosync[4745]:  [TOTEM ] Retransmit List: 7c 7e
> > May 24 15:10:02 server1 corosync[4745]:  [TOTEM ] Retransmit List: 7c 7e
> > May 24 15:10:02 server1 corosync[4745]:  [TOTEM ] Retransmit List: 7c 7e

> This usually happens when:
> - multicast is somehow rate-limited on the switch side (configuration/bad switch 
> implementation/...)
> - the MTU of the network is smaller than 1500 bytes and fragmentation is not allowed 
> -> try reducing totem.netmtu

I tried reducing netmtu to as low as 500, but the issue still occurs.
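
For reference, both settings discussed above (the unicast transport and the reduced MTU) belong to the totem section of corosync.conf. A minimal sketch, assuming corosync 2.x; the hostname server2 and the node IDs are placeholders that do not come from the logs, and sections not discussed in this thread (quorum, logging) are left out:

```
totem {
    version: 2
    # unicast UDP instead of multicast
    transport: udpu
    # lowest value tried so far; the default is 1500
    netmtu: 500
}

nodelist {
    node {
        ring0_addr: server1
        nodeid: 1
    }
    node {
        # placeholder name for the second node
        ring0_addr: server2
        nodeid: 2
    }
}
```

With udpu every member has to be listed in nodelist, and corosync needs to be restarted on both nodes for a transport change to take effect.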

> Regards,

> Honza


> > May 24 15:51:42 server1 corosync[4745]:  [TOTEM ] A processor failed, forming new configuration.
> > May 24 16:41:42 server1 corosync[4745]:  [TOTEM ] A new membership (10.241.31.12:29276) was formed. Members left: 1
> > May 24 16:41:42 server1 corosync[4745]:  [TOTEM ] Failed to receive the leave message. failed: 1
> > 
> > Is there any way we can overcome this, or could this be due to multicast 
> > issues on the network side?
> > 
> > With Regards
> > Somanath Thilak J
> > 

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/