Re: [ClusterLabs] Strange lost quorum with qdevice

2019-08-09 Thread Олег Самойлов
> On 9 Aug 2019, at 9:25, Jan Friesse wrote: > Please do not set dpd_interval that high. dpd_interval on qnetd side is not > about how often the ping is sent. Could you please retry your test with > dpd_interval=1000? I'm pretty sure it will work then. > > Honza Yep. As far as I
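
A small model of Jan's point, under stated assumptions: read dpd_interval as the period at which qnetd runs its dead-peer-detection (DPD) sweep over connected qdevice clients, not as a ping rate. The timeout formula (client heartbeat interval times a coefficient), the parameter name `timeout_coefficient` and the numbers below are illustrative assumptions, not values taken from corosync-qnetd.

```python
# Hypothetical model of dead peer detection (DPD) timing on a qnetd-like server.
# Assumptions (not taken from corosync-qnetd itself):
#   * a client counts as dead once it has been silent for longer than
#     heartbeat_ms * timeout_coefficient;
#   * the server only runs that check once every dpd_interval_ms.


def worst_case_detection_ms(heartbeat_ms: int, timeout_coefficient: float,
                            dpd_interval_ms: int) -> float:
    """Upper bound on how long a silent client can go unnoticed.

    The client may cross its timeout just after a sweep has run, so in the
    worst case it is only noticed one full dpd_interval later.
    """
    timeout_ms = heartbeat_ms * timeout_coefficient
    return timeout_ms + dpd_interval_ms


if __name__ == "__main__":
    # Illustrative numbers only: same heartbeat, two different sweep periods.
    for dpd in (1000, 60000):
        print(dpd, worst_case_detection_ms(8000, 1.5, dpd))
```

Under this model, raising dpd_interval from 1000 ms to 60000 ms adds up to a minute before a silent client is noticed, which is consistent with Jan's advice to retry with dpd_interval=1000.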

Re: [ClusterLabs] Gracefully stop nodes one by one with disk-less sbd

2019-08-09 Thread Yan Gao
On 8/9/19 6:40 PM, Andrei Borzenkov wrote: > On 09.08.2019 16:34, Yan Gao wrote: >> Hi, >> >> With disk-less sbd, it's fine to stop cluster service from the cluster >> nodes all at the same time. >> >> But if the nodes are stopped one by one, for example with a 3-node cluster, >> after stopping the 2nd

Re: [ClusterLabs] Gracefully stop nodes one by one with disk-less sbd

2019-08-09 Thread Andrei Borzenkov
On 09.08.2019 16:34, Yan Gao wrote: > Hi, > > With disk-less sbd, it's fine to stop cluster service from the cluster > nodes all at the same time. > > But if the nodes are stopped one by one, for example with a 3-node cluster, > after stopping the 2nd node, the only remaining node resets itself

[ClusterLabs] Ubuntu 18.04 and corosync-qdevice

2019-08-09 Thread Nickle, Richard
I've built a two-node DRBD cluster with SBD and STONITH, following advice from ClusterLabs, LinBit, Beekhof's blog on SBD. I still cannot get automated failover when I down one of the nodes. I thought that perhaps I needed to have an odd-numbered quorum so I attempted to follow the
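
Why an odd number of votes matters in this setup can be checked with simple majority arithmetic. The sketch assumes one vote per node, a quorum device worth one vote that backs the surviving node, and expected votes that stay at the full total; it does not model corosync's special `two_node` mode, which changes the threshold. The helper names are illustrative.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for a simple majority of expected_votes."""
    return expected_votes // 2 + 1


def survives_peer_loss(node_votes: int, qdevice_votes: int) -> bool:
    """Can one node of a two-node cluster stay quorate once its peer is gone?

    Assumptions: both nodes carry node_votes votes, the quorum device gives
    qdevice_votes to the surviving node, and expected votes stay at the full
    total (no special two-node handling).
    """
    expected = 2 * node_votes + qdevice_votes
    surviving = node_votes + qdevice_votes
    return surviving >= quorum_threshold(expected)


if __name__ == "__main__":
    print("two nodes only:", survives_peer_loss(1, 0))
    print("two nodes + qdevice:", survives_peer_loss(1, 1))
```

Without a third vote, the surviving node holds 1 of 2 votes, which is not a majority; with the device vote it holds 2 of 3. Whether qdevice actually grants its vote to a given node depends on the configured qdevice algorithm, which this sketch does not model.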

Re: [ClusterLabs] corosync.service (and sbd.service) are not stopped on pacemaker shutdown when corosync-qdevice is used

2019-08-09 Thread Ken Gaillot
On Fri, 2019-08-09 at 08:19 +, Roger Zhou wrote: > > On 8/9/19 3:39 PM, Jan Friesse wrote: > > Roger Zhou wrote: > > > > > > On 8/9/19 2:27 PM, Roger Zhou wrote: > > > > > > > > On 7/29/19 12:24 AM, Andrei Borzenkov wrote: > > > > > corosync.service sets StopWhenUnneeded=yes which

[ClusterLabs] Gracefully stop nodes one by one with disk-less sbd

2019-08-09 Thread Yan Gao
Hi, With disk-less sbd, it's fine to stop cluster service from the cluster nodes all at the same time. But if the nodes are stopped one by one, for example with a 3-node cluster, after stopping the 2nd node, the only remaining node resets itself with: Aug 09 14:30:20 opensuse150-1 sbd[1079]:
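
A minimal sketch of the quorum arithmetic behind this report, assuming one vote per node and that expected_votes stays at 3 while nodes are stopped one by one, i.e. nothing lowers it (votequorum options such as `last_man_standing` or `allow_downscale` are meant for that). With disk-less sbd a node without quorum is expected to self-fence through its watchdog, which is consistent with the reset described above. The helper names are illustrative.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Simple majority of the expected votes (one vote per node assumed)."""
    return expected_votes // 2 + 1


def quorum_after_sequential_stops(total_nodes: int) -> list[tuple[int, bool]]:
    """For each number of stopped nodes, report whether the rest is quorate.

    Assumes expected_votes stays at total_nodes while nodes are stopped
    one by one, i.e. no option recalculates it downwards.
    """
    threshold = quorum_threshold(total_nodes)
    results = []
    for stopped in range(total_nodes):
        remaining = total_nodes - stopped
        results.append((remaining, remaining >= threshold))
    return results


if __name__ == "__main__":
    for remaining, quorate in quorum_after_sequential_stops(3):
        print(f"{remaining} node(s) running -> quorate: {quorate}")
```

Stopping the first node leaves 2 of 3 votes (still a majority); stopping the second leaves 1 of 3, so the last node loses quorum.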

Re: [ClusterLabs] corosync.service (and sbd.service) are not stopped on pacemaker shutdown when corosync-qdevice is used

2019-08-09 Thread Roger Zhou
On 8/9/19 3:39 PM, Jan Friesse wrote: > Roger Zhou wrote: >> >> On 8/9/19 2:27 PM, Roger Zhou wrote: >>> >>> On 7/29/19 12:24 AM, Andrei Borzenkov wrote: corosync.service sets StopWhenUnneeded=yes which normally stops it when pacemaker is shut down. >> >> One more thought, >> >>

Re: [ClusterLabs] Strange lost quorum with qdevice

2019-08-09 Thread Jan Friesse
Andrei Borzenkov wrote: On Fri, Aug 9, 2019 at 9:25 AM Jan Friesse wrote: Олег Самойлов wrote: Hello all. I have a test bed with several virtual machines to test pacemaker. I simulate random failure on one of the nodes. The cluster will be on several data centres, so there is not

Re: [ClusterLabs] corosync.service (and sbd.service) are not stopped on pacemaker shutdown when corosync-qdevice is used

2019-08-09 Thread Jan Friesse
Roger Zhou wrote: On 8/9/19 2:27 PM, Roger Zhou wrote: On 7/29/19 12:24 AM, Andrei Borzenkov wrote: corosync.service sets StopWhenUnneeded=yes which normally stops it when pacemaker is shut down. One more thought, does it make sense to add "RefuseManualStop=true" to pacemaker.service? The same

Re: [ClusterLabs] corosync.service (and sbd.service) are not stopped on pacemaker shutdown when corosync-qdevice is used

2019-08-09 Thread Roger Zhou
On 8/9/19 2:27 PM, Roger Zhou wrote: > > On 7/29/19 12:24 AM, Andrei Borzenkov wrote: >> corosync.service sets StopWhenUnneeded=yes which normally stops it when >> pacemaker is shut down. One more thought, does it make sense to add "RefuseManualStop=true" to pacemaker.service? The same for
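
Roger's suggestion amounts to a systemd drop-in for pacemaker.service that sets `RefuseManualStop=` in its `[Unit]` section. The sketch below only renders and writes such a drop-in under a caller-supplied root directory; the file name `50-refuse-manual-stop.conf` and the helper functions are hypothetical, and this was a proposal in the thread, not a shipped default.

```python
import tempfile
from pathlib import Path


def render_dropin(refuse_manual_stop: bool = True) -> str:
    """Return the text of a drop-in that sets RefuseManualStop= in [Unit]."""
    value = "true" if refuse_manual_stop else "false"
    return f"[Unit]\nRefuseManualStop={value}\n"


def write_dropin(unit: str, root: Path,
                 name: str = "50-refuse-manual-stop.conf") -> Path:
    """Write the drop-in to <root>/etc/systemd/system/<unit>.d/<name>.

    `root` lets the sketch run against a scratch directory instead of the
    live system; the file name is a hypothetical choice.
    """
    dropin_dir = root / "etc" / "systemd" / "system" / f"{unit}.d"
    dropin_dir.mkdir(parents=True, exist_ok=True)
    path = dropin_dir / name
    path.write_text(render_dropin())
    return path


if __name__ == "__main__":
    # Demonstrate against a temporary directory, not the real /etc.
    with tempfile.TemporaryDirectory() as scratch:
        written = write_dropin("pacemaker.service", Path(scratch))
        print(written.relative_to(scratch))
        print(written.read_text(), end="")
```

On a real system the file would live under /etc/systemd/system/pacemaker.service.d/ and take effect after `systemctl daemon-reload`; a direct `systemctl stop pacemaker.service` would then be refused, so pacemaker could only be stopped indirectly, for example as a dependency of corosync.service.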

Re: [ClusterLabs] Strange lost quorum with qdevice

2019-08-09 Thread Andrei Borzenkov
On Fri, Aug 9, 2019 at 9:25 AM Jan Friesse wrote: > > Олег Самойлов wrote: > > Hello all. > > > > I have a test bed with several virtual machines to test pacemaker. I > > simulate random failure on one of the nodes. The cluster will be on several > > data centres, so there is no stonith

Re: [ClusterLabs] corosync.service (and sbd.service) are not stopped on pacemaker shutdown when corosync-qdevice is used

2019-08-09 Thread Roger Zhou
On 7/29/19 12:24 AM, Andrei Borzenkov wrote: > corosync.service sets StopWhenUnneeded=yes which normally stops it when > pacemaker is shut down. `systemctl stop corosync.service` is the right command to stop the whole cluster stack. It stops pacemaker and corosync-qdevice first, and stops SBD too.

Re: [ClusterLabs] Strange lost quorum with qdevice

2019-08-09 Thread Jan Friesse
Олег Самойлов wrote: Hello all. I have a test bed with several virtual machines to test pacemaker. I simulate random failure on one of the nodes. The cluster will be on several data centres, so there is no stonith device, instead I use qnetd on the third data centre and watchdog