Re: [ClusterLabs] reducing corosync-qnetd "response time"

2019-10-25 Thread Sherrard Burton
On 10/25/19 3:17 AM, Jan Friesse wrote: Sherrard Burton wrote: On 10/24/19 1:30 PM, Andrei Borzenkov wrote: 24.10.2019 16:54, Sherrard Burton wrote: background: we are upgrading a (very) old HA cluster running heartbeat DRBD and NFS, with no stonith, to a much more modern

Re: [ClusterLabs] active/passive resource config

2019-10-25 Thread Sherrard Burton
On 10/25/19 2:03 AM, jyd wrote: Hi: I want to use pacemaker to manage a resource named A. I want A started on only one node; only when that node is down or A cannot be started on it should A be started on another node. And config a virtual ip resource for A, the virtual ip

[ClusterLabs] reducing corosync-qnetd "response time"

2019-10-24 Thread Sherrard Burton
background: we are upgrading a (very) old HA cluster running heartbeat DRBD and NFS, with no stonith, to a much more modern implementation. for the existing cluster, as well as the new one, the disk space requirements make running a full three-node cluster infeasible, so i am trying to

Re: [ClusterLabs] reducing corosync-qnetd "response time"

2019-10-24 Thread Sherrard Burton
On 10/24/19 1:30 PM, Andrei Borzenkov wrote: 24.10.2019 16:54, Sherrard Burton wrote: background: we are upgrading a (very) old HA cluster running heartbeat DRBD and NFS, with no stonith, to a much more modern implementation. for the existing cluster, as well as the new one, the disk space

Re: [ClusterLabs] temporary loss of quorum when member starts to rejoin

2020-04-08 Thread Sherrard Burton
On 4/8/20 1:09 PM, Andrei Borzenkov wrote: 08.04.2020 10:12, Jan Friesse wrote: Sherrard, i could not determine which of these sub-threads to include this in, so i am going to (reluctantly) top-post it. i switched the transport to udp, and in limited testing i seem to not be hitting the

Re: [ClusterLabs] temporary loss of quorum when member starts to rejoin

2020-04-07 Thread Sherrard Burton
On 4/7/20 8:40 AM, Jan Friesse wrote: Sherrard, On 4/7/20 12:53 AM, Strahil Nikolov wrote: Hi Sherrard, Have you tried to increase the qnet timers in the corosync.conf ? Strahil, i have actually reduced the qnet timers in order to improve failover response time, per Jan's

Re: [ClusterLabs] temporary loss of quorum when member starts to rejoin

2020-04-07 Thread Sherrard Burton
On 4/7/20 4:09 AM, Jan Friesse wrote: Sherrard and Andrei On 4/6/20 4:10 PM, Andrei Borzenkov wrote: 06.04.2020 20:57, Sherrard Burton wrote: On 4/6/20 1:20 PM, Sherrard Burton wrote: On 4/6/20 12:35 PM, Andrei Borzenkov wrote: 06.04.2020 17:05, Sherrard Burton wrote: from the quorum node: ... Ap

Re: [ClusterLabs] temporary loss of quorum when member starts to rejoin

2020-04-07 Thread Sherrard Burton
On 4/7/20 4:09 AM, Jan Friesse wrote: Sherrard and Andrei On 4/6/20 4:10 PM, Andrei Borzenkov wrote: 06.04.2020 20:57, Sherrard Burton wrote: On 4/6/20 1:20 PM, Sherrard Burton wrote: On 4/6/20 12:35 PM, Andrei Borzenkov wrote: 06.04.2020 17:05, Sherrard Burton wrote: from

[ClusterLabs] how to properly add/delete qdevice for an existing cluster

2020-04-07 Thread Sherrard Burton
please forgive me if i have overlooked the answer somewhere. i have an existing cluster that is already configured with a qdevice. i now wish to update that configuration to point at a different qdevice. background: for the sake of working through the initial configuration details, tuning,

Re: [ClusterLabs] temporary loss of quorum when member starts to rejoin

2020-04-07 Thread Sherrard Burton
On 4/7/20 12:53 AM, Strahil Nikolov wrote: Hi Sherrard, Have you tried to increase the qnet timers in the corosync.conf ? Strahil, i have actually reduced the qnet timers in order to improve failover response time, per Jan's suggestion on the thread '[ClusterLabs] reducing

Re: [ClusterLabs] Antw: [EXT] Re: temporary loss of quorum when member starts to rejoin

2020-04-07 Thread Sherrard Burton
On 4/7/20 2:50 AM, Ulrich Windl wrote: Andrei Borzenkov wrote on 06.04.2020 at 22:10 in message <17546_1586203904_5E8B8D00_17546_12_1_73cdd72d-c884-05a4-6c64-2e354912c28f@gmail com>: [...] I cannot reproduce it, but I also do not use knet. From documentation I have impression that

Re: [ClusterLabs] temporary loss of quorum when member starts to rejoin

2020-04-07 Thread Sherrard Burton
On 4/7/20 1:16 PM, Andrei Borzenkov wrote: 07.04.2020 00:21, Sherrard Burton wrote: It looks like some timing issue or race condition. After reboot node manages to contact qnetd first, before connection to other node is established. Qnetd behaves as documented - it sees two equal size

Re: [ClusterLabs] temporary loss of quorum when member starts to rejoin

2020-04-07 Thread Sherrard Burton
On 4/7/20 4:09 AM, Jan Friesse wrote: Sherrard and Andrei On 4/6/20 4:10 PM, Andrei Borzenkov wrote: 06.04.2020 20:57, Sherrard Burton wrote: It looks like some timing issue or race condition. After reboot node manages to contact qnetd first, before connection to other node is established

[ClusterLabs] temporary loss of quorum when member starts to rejoin

2020-04-06 Thread Sherrard Burton
...or at least that's what i think is happening :-) two-node cluster, plus quorum-only node. testing the behavior when active node is gracefully rebooted. all seems well initially. resources are migrated, come up and function as expected. but, when the rebooted node starts to come back up,

Re: [ClusterLabs] temporary loss of quorum when member starts to rejoin

2020-04-06 Thread Sherrard Burton
On 4/6/20 12:35 PM, Andrei Borzenkov wrote: 06.04.2020 17:05, Sherrard Burton wrote: from the quorum node: ... Apr 05 23:10:17 debug   Client :::192.168.250.50:54462 (cluster xen-nfs01_xen-nfs02, node_id 1) sent quorum node list. Apr 05 23:10:17 debug msg seq num = 6 Apr 05 23:10

Re: [ClusterLabs] temporary loss of quorum when member starts to rejoin

2020-04-06 Thread Sherrard Burton
On 4/6/20 1:20 PM, Sherrard Burton wrote: On 4/6/20 12:35 PM, Andrei Borzenkov wrote: 06.04.2020 17:05, Sherrard Burton wrote: from the quorum node: ... Apr 05 23:10:17 debug   Client :::192.168.250.50:54462 (cluster xen-nfs01_xen-nfs02, node_id 1) sent quorum node list. Apr 05 23

Re: [ClusterLabs] temporary loss of quorum when member starts to rejoin

2020-04-06 Thread Sherrard Burton
On 4/6/20 12:35 PM, Andrei Borzenkov wrote: 06.04.2020 17:05, Sherrard Burton wrote: ...or at least that's what i think is happening :-) two-node cluster, plus quorum-only node. testing the behavior when active node is gracefully rebooted. all seems well initially. resources are migrated

Re: [ClusterLabs] temporary loss of quorum when member starts to rejoin

2020-04-06 Thread Sherrard Burton
On 4/6/20 4:10 PM, Andrei Borzenkov wrote: 06.04.2020 20:57, Sherrard Burton wrote: On 4/6/20 1:20 PM, Sherrard Burton wrote: On 4/6/20 12:35 PM, Andrei Borzenkov wrote: 06.04.2020 17:05, Sherrard Burton wrote: from the quorum node: ... Apr 05 23:10:17 debug   Client ::

Re: [ClusterLabs] qdevice up and running -- but questions

2020-04-12 Thread Sherrard Burton
On 4/11/20 6:52 PM, Eric Robinson wrote: 1. What command can I execute on the qdevice node which tells me which client nodes are connected and alive? i use corosync-qnetd-tool -v -l 2. In the output of the pcs qdevice status command, what is the meaning of…     Vote: 

Re: [ClusterLabs] how to properly add/delete qdevice for an existing cluster

2020-04-08 Thread Sherrard Burton
On 4/8/20 5:13 AM, Jan Friesse wrote: please forgive me if i have overlooked the answer somewhere. i have an existing cluster that is already configured with a qdevice. i now wish to update that configuration to point at a different qdevice. background: for the sake of working through the

Re: [ClusterLabs] Antw: [EXT] Re: temporary loss of quorum when member starts to rejoin

2020-04-08 Thread Sherrard Burton
On 4/8/20 3:09 AM, Ulrich Windl wrote: Jehan-Guillaume de Rorthais wrote on 07.04.2020 at 23:02 in message <20140_1586293370_5E8CEA7A_20140_452_1_20200407230230.1bc9b7b0@firost>: On Tue, 7 Apr 2020 14:13:35 -0400 Sherrard Burton wrote: [...] But the best protection is to d