Re: [ClusterLabs] temporary loss of quorum when member starts to rejoin

2020-04-07 Thread Sherrard Burton
On 4/7/20 4:09 AM, Jan Friesse wrote: Sherrard and Andrei On 4/6/20 4:10 PM, Andrei Borzenkov wrote: 06.04.2020 20:57, Sherrard Burton wrote: It looks like some timing issue or race condition. After reboot node manages to contact qnetd first, before connection to other node is

Re: [ClusterLabs] temporary loss of quorum when member starts to rejoin

2020-04-07 Thread Jehan-Guillaume de Rorthais
On Tue, 7 Apr 2020 14:13:35 -0400 Sherrard Burton wrote: > On 4/7/20 1:16 PM, Andrei Borzenkov wrote: > > 07.04.2020 00:21, Sherrard Burton wrote: > >>> > >>> It looks like some timing issue or race condition. After reboot node > >>> manages to contact qnetd first, before connection to other

Re: [ClusterLabs] temporary loss of quorum when member starts to rejoin

2020-04-07 Thread Sherrard Burton
On 4/7/20 1:16 PM, Andrei Borzenkov wrote: 07.04.2020 00:21, Sherrard Burton wrote: It looks like some timing issue or race condition. After reboot node manages to contact qnetd first, before connection to other node is established. Qnetd behaves as documented - it sees two equal size

Re: [ClusterLabs] temporary loss of quorum when member starts to rejoin

2020-04-07 Thread Andrei Borzenkov
07.04.2020 00:21, Sherrard Burton wrote: >> >> It looks like some timing issue or race condition. After reboot node >> manages to contact qnetd first, before connection to other node is >> established. Qnetd behaves as documented - it sees two equal size >> partitions and favors the partition that

[ClusterLabs] how to properly add/delete qdevice for an existing cluster

2020-04-07 Thread Sherrard Burton
please forgive me if i have overlooked the answer somewhere. i have an existing cluster that is already configured with a qdevice. i now wish to update that configuration to point at a different qdevice. background: for the sake of working through the initial configuration details, tuning,

Re: [ClusterLabs] temporary loss of quorum when member starts to rejoin

2020-04-07 Thread Sherrard Burton
i could not determine which of these sub-threads to include this in, so i am going to (reluctantly) top-post it. i switched the transport to udp, and in limited testing i seem to not be hitting the race condition. of course i have no idea whether this will behave consistently, or which part

Re: [ClusterLabs] temporary loss of quorum when member starts to rejoin

2020-04-07 Thread Sherrard Burton
On 4/7/20 8:40 AM, Jan Friesse wrote: Sherrard, On 4/7/20 12:53 AM, Strahil Nikolov wrote: Hi Sherrard, Have you tried to increase the qnet timers in the corosync.conf ? Strahil, i have actually reduced the qnet timers in order to improve failover response time, per Jan's

Re: [ClusterLabs] temporary loss of quorum when member starts to rejoin

2020-04-07 Thread Sherrard Burton
On 4/7/20 4:09 AM, Jan Friesse wrote: Sherrard and Andrei On 4/6/20 4:10 PM, Andrei Borzenkov wrote: 06.04.2020 20:57, Sherrard Burton wrote: On 4/6/20 1:20 PM, Sherrard Burton wrote: On 4/6/20 12:35 PM, Andrei Borzenkov wrote: 06.04.2020 17:05, Sherrard Burton wrote: from the

Re: [ClusterLabs] Antw: [EXT] Re: temporary loss of quorum when member starts to rejoin

2020-04-07 Thread Sherrard Burton
On 4/7/20 2:50 AM, Ulrich Windl wrote: Andrei Borzenkov wrote on 06.04.2020 at 22:10 in message <17546_1586203904_5E8B8D00_17546_12_1_73cdd72d-c884-05a4-6c64-2e354912c28f@gmail com>: [...] I cannot reproduce it, but I also do not use knet. From documentation I have impression that

Re: [ClusterLabs] temporary loss of quorum when member starts to rejoin

2020-04-07 Thread Jan Friesse
Sherrard, On 4/7/20 12:53 AM, Strahil Nikolov wrote: Hi Sherrard, Have you tried to increase the qnet timers in the corosync.conf ? Strahil, i have actually reduced the qnet timers in order to improve failover response time, per Jan's suggestion on the thread '[ClusterLabs] >

Re: [ClusterLabs] temporary loss of quorum when member starts to rejoin

2020-04-07 Thread Sherrard Burton
On 4/7/20 12:53 AM, Strahil Nikolov wrote: Hi Sherrard, Have you tried to increase the qnet timers in the corosync.conf ? Strahil, i have actually reduced the qnet timers in order to improve failover response time, per Jan's suggestion on the thread '[ClusterLabs] reducing

Re: [ClusterLabs] temporary loss of quorum when member starts to rejoin

2020-04-07 Thread Jan Friesse
Sherrard and Andrei On 4/6/20 4:10 PM, Andrei Borzenkov wrote: 06.04.2020 20:57, Sherrard Burton wrote: On 4/6/20 1:20 PM, Sherrard Burton wrote: On 4/6/20 12:35 PM, Andrei Borzenkov wrote: 06.04.2020 17:05, Sherrard Burton wrote: from the quorum node: ... Apr 05 23:10:17 debug

[ClusterLabs] Antw: [EXT] Re: temporary loss of quorum when member starts to rejoin

2020-04-07 Thread Ulrich Windl
>>> Andrei Borzenkov wrote on 06.04.2020 at 22:10 in message <17546_1586203904_5E8B8D00_17546_12_1_73cdd72d-c884-05a4-6c64-2e354912c28f@gmail com>: [...] > I cannot reproduce it, but I also do not use knet. From documentation I > have impression that knet has artificial delay before it