Re: [ClusterLabs] Tuchanka

2020-10-05 Thread Олег Самойлов
> On 2 Oct 2020, at 17:19, Klaus Wenninger wrote:
>
>>> My English is poor, I'll try to find other words. My primary and main task was to create a prototype for an automatic deployment system. So I used only the same technique that will be used on the real hardware servers: RedHat dvd

Re: [ClusterLabs] Tuchanka

2020-10-02 Thread Олег Самойлов
> On 29 Sep 2020, at 11:34, Jehan-Guillaume de Rorthais wrote:
>
> Vagrant uses VirtualBox by default, which supports softdog, but it supports many other virtualization platforms, including e.g. libvirt/kvm where you can use a virtualized watchdog card.
>
> Vagrant can use Chef,
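For reference, a virtualized watchdog card can be added to a libvirt/kvm guest roughly as follows (a sketch only; the guest name and file path are placeholders, and i6300esb is the usual emulated model):

    # describe the emulated watchdog card
    cat > watchdog.xml <<'EOF'
    <watchdog model='i6300esb' action='reset'/>
    EOF
    # attach it persistently to an existing guest
    virsh attach-device my-guest watchdog.xml --config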

Re: [ClusterLabs] Tuchanka

2020-09-25 Thread Олег Самойлов
Sorry for the late reply. I was on leave, and after that there were some problems at my work.
> On 3 Sep 2020, at 17:23, Jehan-Guillaume de Rorthais wrote:
>
> Hi,
>
> Thanks for sharing.
>
> I had a very quick glance at your project. I wonder if you were aware of some existing projects/scripts that

[ClusterLabs] Tuchanka

2020-09-02 Thread Олег Самойлов
Hi all. I have developed a test bed to test highly available clusters based on Pacemaker and PostgreSQL. The combination of words "test bed" was given to me by a dictionary. For a Russian this is rather funny, so please tell me, is this a suitable phrase for it? The test bed is deployed on

[ClusterLabs] multi-site clusters vs disaster recovery clusters

2020-02-05 Thread Олег Самойлов
Hi all. I am reading the documentation about the (new to me) Pacemaker that comes with RedHat 8, and I see two different chapters which both try to solve exactly the same problem. One is CONFIGURING DISASTER RECOVERY CLUSTERS (pcs dr): this is about infrastructure to create two different
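For context, the pcs dr workflow that chapter describes looks roughly like this (a sketch; `recovery-node` is a placeholder and the exact subcommand syntax should be checked against `pcs dr help` for your pcs version):

    # on the primary site: record which node represents the recovery site
    pcs dr set-recovery-site recovery-node
    # show the stored DR configuration and the status of both sites
    pcs dr config
    pcs dr status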

Re: [ClusterLabs] Strange lost quorum with qdevice

2019-08-13 Thread Олег Самойлов
> On 13 Aug 2019, at 15:55, Jan Friesse wrote:
>
> There is going to be a slightly different solution (set these timeouts based on the corosync token timeout) which I'm working on, but it's kind of a huge amount of work and not super high prio (a workaround exists), so no ETA yet.
Will it

Re: [ClusterLabs] Strange lost quorum with qdevice

2019-08-13 Thread Олег Самойлов
> On 12 Aug 2019, at 8:46, Jan Friesse wrote:
>
> Let me try to bring some light into this:
>
> - dpd_interval is a qnetd variable for how often qnetd walks through the list of all clients (qdevices) and checks the timestamp of the last sent message. If the diff between the current timestamp and the last sent

Re: [ClusterLabs] Strange lost quorum with qdevice

2019-08-09 Thread Олег Самойлов
> On 9 Aug 2019, at 9:25, Jan Friesse wrote:
>
> Please do not set dpd_interval that high. dpd_interval on the qnetd side is not about how often the ping is sent. Could you please retry your test with dpd_interval=1000? I'm pretty sure it will work then.
>
> Honza
Yep. As far as I

[ClusterLabs] Strange lost quorum with qdevice

2019-08-08 Thread Олег Самойлов
Hello all. I have a test bed with several virtual machines to test Pacemaker. I simulate a random failure on one of the nodes. The cluster will span several data centres, so there is no stonith device; instead I use qnetd on the third data centre and a watchdog (softdog). And sometimes (not
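A minimal sketch of that kind of setup, for reference (the qnetd hostname is a placeholder and the ffsplit algorithm is only one possible choice):

    # load the software watchdog on every cluster node
    modprobe softdog
    echo softdog > /etc/modules-load.d/softdog.conf
    # point the cluster at a qnetd server running in the third data centre
    pcs quorum device add model net host=qnetd.dc3.example.com algorithm=ffsplit
    pcs quorum status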

Re: [ClusterLabs] How to correctly stop cluster with active stonith watchdog?

2019-04-30 Thread Олег Самойлов
> On 30 Apr 2019, at 19:38, Andrei Borzenkov wrote:
>
> On 30.04.2019 19:34, Олег Самойлов wrote:
>>
>>> No. I simply want a reliable way to shut down the whole cluster (for maintenance).
>>
>> The official way is `pcs cluster stop --all`.
>

Re: [ClusterLabs] How to correctly stop cluster with active stonith watchdog?

2019-04-30 Thread Олег Самойлов
> No. I simply want a reliable way to shut down the whole cluster (for maintenance).
The official way is `pcs cluster stop --all`. But it has not always worked as expected for me.
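For the archive, the full-cluster maintenance sequence being discussed is roughly:

    # stop pacemaker and corosync on every node at once
    pcs cluster stop --all
    # ... maintenance work ...
    # bring the whole cluster back and verify
    pcs cluster start --all
    pcs status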

Re: [ClusterLabs] How to correctly stop cluster with active stonith watchdog?

2019-04-30 Thread Олег Самойлов
Maybe you will be interested in the `allow_downscale: 1` option: https://www.systutorials.com/docs/linux/man/5-votequorum/
> On 30 Apr 2019, at 7:07, Andrei Borzenkov wrote:
>
> As soon as a majority of nodes are stopped, the remaining nodes are out of quorum and the watchdog reboot kicks in.
>
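The option goes into the quorum section of corosync.conf; a minimal fragment as a sketch (see votequorum(5) for its caveats before relying on it):

    # /etc/corosync/corosync.conf (fragment, on every node)
    quorum {
        provider: corosync_votequorum
        allow_downscale: 1
    }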

Re: [ClusterLabs] SBD as watchdog daemon

2019-04-16 Thread Олег Самойлов
Well, I checked this PR https://github.com/ClusterLabs/sbd/pull/27 from the author's repository https://github.com/jjd27/sbd/tree/cluster-quorum. The problem still exists. When corosync is frozen on one node, both nodes are rebooted. Don't apply this PR.
> On 16 Apr 2019, at 19:13, Klaus Wenninger

Re: [ClusterLabs] SBD as watchdog daemon

2019-04-16 Thread Олег Самойлов
> On 16 Apr 2019, at 16:21, Klaus Wenninger wrote:
>
> On 4/16/19 3:12 PM, Олег Самойлов wrote:
>> Okay, it looks like I found where it must be fixed.
>>
>> sbd-cluster.c
>>
>>    /* TODO - Make a CPG call and only call

Re: [ClusterLabs] SBD as watchdog daemon

2019-04-15 Thread Олег Самойлов
> On 14 Apr 2019, at 10:12, Andrei Borzenkov wrote:
Thanks for the explanation, I think this would be a good addition to the SBD manual. (The SBD manual needs it.) But my problem lies elsewhere. I investigated SBD. A common watchdog daemon is much simpler: one infinite loop that checks some tests
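The "simple watchdog daemon" idea described here can be sketched in a few lines of shell; check_everything_ok is a placeholder for whatever health tests are wanted, and /dev/watchdog is the usual device node:

    #!/bin/sh
    # placeholder health check: replace with real tests
    check_everything_ok() {
        systemctl is-active --quiet corosync &&
        systemctl is-active --quiet pacemaker
    }
    exec 3> /dev/watchdog        # opening the device arms the watchdog
    while :; do
        if check_everything_ok; then
            printf '.' >&3       # pet the watchdog while the checks pass
        fi                       # once petting stops, the kernel reboots the node
        sleep 5                  # must stay well below the watchdog timeout
    done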

[ClusterLabs] SBD as watchdog daemon

2019-04-13 Thread Олег Самойлов
Hi all. I am developing an HA PostgreSQL cluster for 2 or 3 datacenters. In case of a datacenter failure (blackout) the fencing will not work and will prevent switching to the working DC, so I disabled the fencing. The cluster's operation is based on a quorum, and I added a quorum device on a third DC in
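For reference, a watchdog-only (diskless) sbd setup along these lines is usually something like the following sketch; the timeout values are illustrative (stonith-watchdog-timeout should be roughly twice SBD_WATCHDOG_TIMEOUT):

    # /etc/sysconfig/sbd on every node (diskless, watchdog only):
    #   SBD_WATCHDOG_DEV=/dev/watchdog
    #   SBD_WATCHDOG_TIMEOUT=5
    # enable the sbd service on all nodes, then tell pacemaker to rely on it
    pcs stonith sbd enable
    pcs property set stonith-watchdog-timeout=10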

Re: [ClusterLabs] SBD as watchdog daemon

2019-04-12 Thread Олег Самойлов
> On 11 Apr 2019, at 20:00, Klaus Wenninger wrote:
>
> On 4/11/19 5:27 PM, Олег Самойлов wrote:
>> Hi all.
>> I am developing an HA PostgreSQL cluster for 2 or 3 datacenters. In case of a datacenter failure (blackout) the fencing will not work and will prevent to

[ClusterLabs] SBD as watchdog daemon

2019-04-11 Thread Олег Самойлов
Hi all. I am developing an HA PostgreSQL cluster for 2 or 3 datacenters. In case of a datacenter failure (blackout) the fencing will not work and will prevent switching to the working DC, so I disabled the fencing. The cluster's operation is based on a quorum, and I added a quorum device on a third DC in