Re: [ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

2017-12-05 Thread Andrei Borzenkov
05.12.2017 13:34, Gao,Yan wrote:
> On 12/05/2017 08:57 AM, Dejan Muhamedagic wrote:
>> On Mon, Dec 04, 2017 at 09:55:46PM +0300, Andrei Borzenkov wrote:
>>> 04.12.2017 14:48, Gao,Yan wrote:
 On 12/02/2017 07:19 PM, Andrei Borzenkov wrote:
> 30.11.2017 13:48, Gao,Yan wrote:
>> On 11/22/2017 08:01 PM, Andrei Borzenkov wrote:
>>> SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two node cluster with
>>> VM on VSphere using shared VMDK as SBD. During basic tests by
>>> killing
>>> corosync and forcing STONITH pacemaker was not started after reboot.
>>> In logs I see during boot
>>>
>>> Nov 22 16:04:56 sapprod01s crmd[3151]: crit: We were allegedly
>>> just fenced by sapprod01p for sapprod01p
>>> Nov 22 16:04:56 sapprod01s pacemakerd[3137]:  warning: The crmd
>>> process (3151) can no longer be respawned,
>>> Nov 22 16:04:56 sapprod01s pacemakerd[3137]:   notice: Shutting down
>>> Pacemaker
>>>
>>> SBD timeouts are 60s for watchdog and 120s for msgwait. It seems
>>> that
>>> stonith with SBD always takes msgwait (at least, visually host is
>>> not
>>> declared as OFFLINE until 120s passed). But VM reboots lightning fast
>>> and is up and running long before timeout expires.
>>>
>>> I think I have seen similar report already. Is it something that can
>>> be fixed by SBD/pacemaker tuning?
>> SBD_DELAY_START=yes in /etc/sysconfig/sbd is the solution.
>>
>
> I tried it (on openSUSE Tumbleweed which is what I have at hand, it
> has
> SBD 1.3.0) and with SBD_DELAY_START=yes sbd does not appear to watch
> disk at all.
 It simply waits that long on startup before starting the rest of the
 cluster stack to make sure the fencing that targeted it has
 returned. It
 intentionally doesn't watch anything during this period of time.

>>>
>>> Unfortunately it waits too long.
>>>
>>> ha1:~ # systemctl status sbd.service
>>> ● sbd.service - Shared-storage based fencing daemon
>>>     Loaded: loaded (/usr/lib/systemd/system/sbd.service; enabled; vendor
>>> preset: disabled)
>>>     Active: failed (Result: timeout) since Mon 2017-12-04 21:47:03 MSK;
>>> 4min 16s ago
>>>    Process: 1861 ExecStop=/usr/bin/kill -TERM $MAINPID (code=exited,
>>> status=0/SUCCESS)
>>>    Process: 2058 ExecStart=/usr/sbin/sbd $SBD_OPTS -p /var/run/sbd.pid
>>> watch (code=killed, signa
>>>   Main PID: 1792 (code=exited, status=0/SUCCESS)
>>>
>>> Dec 04 21:45:32 ha1 systemd[1]: Starting Shared-storage based fencing
>>> daemon...
>>> Dec 04 21:47:02 ha1 systemd[1]: sbd.service: Start operation timed out.
>>> Terminating.
>>> Dec 04 21:47:03 ha1 systemd[1]: Failed to start Shared-storage based
>>> fencing daemon.
>>> Dec 04 21:47:03 ha1 systemd[1]: sbd.service: Unit entered failed state.
>>> Dec 04 21:47:03 ha1 systemd[1]: sbd.service: Failed with result
>>> 'timeout'.
>>>
>>> But the real problem is that even though SBD failed to start, the whole
>>> cluster stack continues to run; and because SBD blindly trusts in
>>> well-behaved nodes, fencing appears to succeed after the timeout ... without
>>> anyone taking any action on the poison pill ...
>>
>> That's something I always wondered about: if a node is capable of
>> reading a poison pill, then before shutdown it could also write an
>> "I'm leaving" message into its slot. Wouldn't that make sbd more
>> reliable? Any reason not to implement that?
> Probably it's not considered necessary :) SBD is a fencing mechanism
> which only needs to ensure fencing works.

I'm sorry, but SBD has zero chance of ensuring that fencing works. Recently I
did a storage vMotion of a VM with the shared VMDK used for SBD - it silently
created a copy of the VMDK that was indistinguishable from the original. As a
result, each VM ran with its own copy. Of course fencing did not work - but
each VM *assumed* it had worked, because it posted the message and waited for
the timeout ...

I would expect the "monitor" action of the SBD fencing agent to actually test
whether messages are seen by the remote node(s) ...
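For what it's worth, the messaging path can at least be exercised by hand with
sbd's "test" message, which the peer's sbd daemon merely logs; the device path
below is a placeholder and ha2 is the test node from this thread:

ha1:~ # sbd -d /dev/disk/by-id/<sbd-device> message ha2 test
# on ha2 the daemon should log receipt of the test message, e.g.:
ha2:~ # journalctl -u sbd.service | grep -i test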

> SBD on the fencing target is
> either there eating the pill or getting reset by its watchdog; otherwise
> it's not there, which is supposed to imply that the whole cluster stack is
> not running, so it doesn't need to actually eat the pill.
> 
> How systemd should handle the service dependencies is another topic...
> 


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

2017-12-05 Thread Andrei Borzenkov
05.12.2017 12:59, Gao,Yan wrote:
> On 12/04/2017 07:55 PM, Andrei Borzenkov wrote:
>> 04.12.2017 14:48, Gao,Yan wrote:
>>> On 12/02/2017 07:19 PM, Andrei Borzenkov wrote:
 30.11.2017 13:48, Gao,Yan wrote:
> On 11/22/2017 08:01 PM, Andrei Borzenkov wrote:
>> SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two node cluster with
>> VM on VSphere using shared VMDK as SBD. During basic tests by killing
>> corosync and forcing STONITH pacemaker was not started after reboot.
>> In logs I see during boot
>>
>> Nov 22 16:04:56 sapprod01s crmd[3151]: crit: We were allegedly
>> just fenced by sapprod01p for sapprod01p
>> Nov 22 16:04:56 sapprod01s pacemakerd[3137]:  warning: The crmd
>> process (3151) can no longer be respawned,
>> Nov 22 16:04:56 sapprod01s pacemakerd[3137]:   notice: Shutting down
>> Pacemaker
>>
>> SBD timeouts are 60s for watchdog and 120s for msgwait. It seems that
>> stonith with SBD always takes msgwait (at least, visually host is not
>> declared as OFFLINE until 120s passed). But VM reboots lightning fast
>> and is up and running long before timeout expires.
>>
>> I think I have seen similar report already. Is it something that can
>> be fixed by SBD/pacemaker tuning?
> SBD_DELAY_START=yes in /etc/sysconfig/sbd is the solution.
>

 I tried it (on openSUSE Tumbleweed which is what I have at hand, it has
 SBD 1.3.0) and with SBD_DELAY_START=yes sbd does not appear to watch
 disk at all.
>>> It simply waits that long on startup before starting the rest of the
>>> cluster stack to make sure the fencing that targeted it has returned. It
>>> intentionally doesn't watch anything during this period of time.
>>>
>>
>> Unfortunately it waits too long.
>>
>> ha1:~ # systemctl status sbd.service
>> ● sbd.service - Shared-storage based fencing daemon
>>     Loaded: loaded (/usr/lib/systemd/system/sbd.service; enabled; vendor
>> preset: disabled)
>>     Active: failed (Result: timeout) since Mon 2017-12-04 21:47:03 MSK;
>> 4min 16s ago
>>    Process: 1861 ExecStop=/usr/bin/kill -TERM $MAINPID (code=exited,
>> status=0/SUCCESS)
>>    Process: 2058 ExecStart=/usr/sbin/sbd $SBD_OPTS -p /var/run/sbd.pid
>> watch (code=killed, signa
>>   Main PID: 1792 (code=exited, status=0/SUCCESS)
>>
>> Dec 04 21:45:32 ha1 systemd[1]: Starting Shared-storage based fencing
>> daemon...
>> Dec 04 21:47:02 ha1 systemd[1]: sbd.service: Start operation timed out.
>> Terminating.
>> Dec 04 21:47:03 ha1 systemd[1]: Failed to start Shared-storage based
>> fencing daemon.
>> Dec 04 21:47:03 ha1 systemd[1]: sbd.service: Unit entered failed state.
>> Dec 04 21:47:03 ha1 systemd[1]: sbd.service: Failed with result
>> 'timeout'.
>>
>> But the real problem is that even though SBD failed to start, the whole
>> cluster stack continues to run; and because SBD blindly trusts in
>> well-behaved nodes, fencing appears to succeed after the timeout ... without
>> anyone taking any action on the poison pill ...
> The start of sbd reaches systemd's timeout for starting units, and systemd
> proceeds...
> 

Do you consider this normal and intended behavior? Again - currently it is
possible that the cluster stack starts without working STONITH, and because
there is no confirmation whether stonith via SBD worked at all, we end up in
split brain.
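A quick, admittedly crude way to check that fencing is more than configuration
on paper is to list what the cluster actually registered and then fence a node
on purpose; ha2 is the test cluster peer from this thread, and the second
command really does reset it:

ha1:~ # stonith_admin --list-registered
ha1:~ # stonith_admin --reboot ha2     # destructive, test clusters only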

> TimeoutStartSec should be configured accordingly in sbd.service to be
> longer than msgwait.
> 

And where is that documented? You did not say it earlier,
/etc/sysconfig/sbd does not say it, and "man sbd" does not say it. How are
users supposed to be aware of this?
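For reference, the override Yan is describing would be an ordinary systemd
drop-in; the 180 s value below is only an example chosen to exceed the 120 s
msgwait from this thread (the stock 90 s default is what expired at 21:47:02
in the log above):

ha1:~ # systemctl edit sbd.service
# add to the generated override file:
[Service]
TimeoutStartSec=180
# systemctl edit reloads automatically; run "systemctl daemon-reload"
# if the drop-in file was created by hand instead.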


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

2017-12-05 Thread Gao,Yan

On 12/05/2017 08:57 AM, Dejan Muhamedagic wrote:

On Mon, Dec 04, 2017 at 09:55:46PM +0300, Andrei Borzenkov wrote:

04.12.2017 14:48, Gao,Yan wrote:

On 12/02/2017 07:19 PM, Andrei Borzenkov wrote:

30.11.2017 13:48, Gao,Yan wrote:

On 11/22/2017 08:01 PM, Andrei Borzenkov wrote:

SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two node cluster with
VM on VSphere using shared VMDK as SBD. During basic tests by killing
corosync and forcing STONITH pacemaker was not started after reboot.
In logs I see during boot

Nov 22 16:04:56 sapprod01s crmd[3151]: crit: We were allegedly
just fenced by sapprod01p for sapprod01p
Nov 22 16:04:56 sapprod01s pacemakerd[3137]:  warning: The crmd
process (3151) can no longer be respawned,
Nov 22 16:04:56 sapprod01s pacemakerd[3137]:   notice: Shutting down
Pacemaker

SBD timeouts are 60s for watchdog and 120s for msgwait. It seems that
stonith with SBD always takes msgwait (at least, visually host is not
declared as OFFLINE until 120s passed). But VM reboots lightning fast
and is up and running long before timeout expires.

I think I have seen similar report already. Is it something that can
be fixed by SBD/pacemaker tuning?

SBD_DELAY_START=yes in /etc/sysconfig/sbd is the solution.



I tried it (on openSUSE Tumbleweed which is what I have at hand, it has
SBD 1.3.0) and with SBD_DELAY_START=yes sbd does not appear to watch
disk at all.

It simply waits that long on startup before starting the rest of the
cluster stack to make sure the fencing that targeted it has returned. It
intentionally doesn't watch anything during this period of time.



Unfortunately it waits too long.

ha1:~ # systemctl status sbd.service
● sbd.service - Shared-storage based fencing daemon
Loaded: loaded (/usr/lib/systemd/system/sbd.service; enabled; vendor
preset: disabled)
Active: failed (Result: timeout) since Mon 2017-12-04 21:47:03 MSK;
4min 16s ago
   Process: 1861 ExecStop=/usr/bin/kill -TERM $MAINPID (code=exited,
status=0/SUCCESS)
   Process: 2058 ExecStart=/usr/sbin/sbd $SBD_OPTS -p /var/run/sbd.pid
watch (code=killed, signa
  Main PID: 1792 (code=exited, status=0/SUCCESS)

Dec 04 21:45:32 ha1 systemd[1]: Starting Shared-storage based fencing
daemon...
Dec 04 21:47:02 ha1 systemd[1]: sbd.service: Start operation timed out.
Terminating.
Dec 04 21:47:03 ha1 systemd[1]: Failed to start Shared-storage based
fencing daemon.
Dec 04 21:47:03 ha1 systemd[1]: sbd.service: Unit entered failed state.
Dec 04 21:47:03 ha1 systemd[1]: sbd.service: Failed with result 'timeout'.

But the real problem is that even though SBD failed to start, the whole
cluster stack continues to run; and because SBD blindly trusts in
well-behaved nodes, fencing appears to succeed after the timeout ... without
anyone taking any action on the poison pill ...


That's something I always wondered about: if a node is capable of
reading a poison pill, then before shutdown it could also write an
"I'm leaving" message into its slot. Wouldn't that make sbd more
reliable? Any reason not to implement that?
Probably it's not considered necessary :) SBD is a fencing mechanism
which only needs to ensure fencing works. SBD on the fencing target is
either there eating the pill or getting reset by its watchdog; otherwise
it's not there, which is supposed to imply that the whole cluster stack is
not running, so it doesn't need to actually eat the pill.
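The timeouts this behaviour relies on live in the device header and can be
checked directly; the device path is a placeholder and the values shown are
the ones discussed in this thread:

ha1:~ # sbd -d /dev/disk/by-id/<sbd-device> dump
# among other fields:
#   Timeout (watchdog) : 60
#   Timeout (msgwait)  : 120
# msgwait is normally kept at least twice the watchdog timeout, so a node
# that stops eating the pill is reset by its watchdog well before the
# fencing side considers the action complete.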


How systemd should handle the service dependencies is another topic...

Regards,
  Yan





Thanks,

Dejan




___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

2017-12-05 Thread Gao,Yan

On 12/04/2017 07:55 PM, Andrei Borzenkov wrote:

04.12.2017 14:48, Gao,Yan wrote:

On 12/02/2017 07:19 PM, Andrei Borzenkov wrote:

30.11.2017 13:48, Gao,Yan wrote:

On 11/22/2017 08:01 PM, Andrei Borzenkov wrote:

SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two node cluster with
VM on VSphere using shared VMDK as SBD. During basic tests by killing
corosync and forcing STONITH pacemaker was not started after reboot.
In logs I see during boot

Nov 22 16:04:56 sapprod01s crmd[3151]: crit: We were allegedly
just fenced by sapprod01p for sapprod01p
Nov 22 16:04:56 sapprod01s pacemakerd[3137]:  warning: The crmd
process (3151) can no longer be respawned,
Nov 22 16:04:56 sapprod01s pacemakerd[3137]:   notice: Shutting down
Pacemaker

SBD timeouts are 60s for watchdog and 120s for msgwait. It seems that
stonith with SBD always takes msgwait (at least, visually host is not
declared as OFFLINE until 120s passed). But VM reboots lightning fast
and is up and running long before timeout expires.

I think I have seen similar report already. Is it something that can
be fixed by SBD/pacemaker tuning?

SBD_DELAY_START=yes in /etc/sysconfig/sbd is the solution.



I tried it (on openSUSE Tumbleweed which is what I have at hand, it has
SBD 1.3.0) and with SBD_DELAY_START=yes sbd does not appear to watch
disk at all.

It simply waits that long on startup before starting the rest of the
cluster stack to make sure the fencing that targeted it has returned. It
intentionally doesn't watch anything during this period of time.



Unfortunately it waits too long.

ha1:~ # systemctl status sbd.service
● sbd.service - Shared-storage based fencing daemon
Loaded: loaded (/usr/lib/systemd/system/sbd.service; enabled; vendor
preset: disabled)
Active: failed (Result: timeout) since Mon 2017-12-04 21:47:03 MSK;
4min 16s ago
   Process: 1861 ExecStop=/usr/bin/kill -TERM $MAINPID (code=exited,
status=0/SUCCESS)
   Process: 2058 ExecStart=/usr/sbin/sbd $SBD_OPTS -p /var/run/sbd.pid
watch (code=killed, signa
  Main PID: 1792 (code=exited, status=0/SUCCESS)

Dec 04 21:45:32 ha1 systemd[1]: Starting Shared-storage based fencing
daemon...
Dec 04 21:47:02 ha1 systemd[1]: sbd.service: Start operation timed out.
Terminating.
Dec 04 21:47:03 ha1 systemd[1]: Failed to start Shared-storage based
fencing daemon.
Dec 04 21:47:03 ha1 systemd[1]: sbd.service: Unit entered failed state.
Dec 04 21:47:03 ha1 systemd[1]: sbd.service: Failed with result 'timeout'.

But the real problem is that even though SBD failed to start, the whole
cluster stack continues to run; and because SBD blindly trusts in
well-behaved nodes, fencing appears to succeed after the timeout ... without
anyone taking any action on the poison pill ...
The start of sbd reaches systemd's timeout for starting units, and systemd
proceeds...


TimeoutStartSec should be configured accordingly in sbd.service to be
longer than msgwait.


Regards,
  Yan




ha1:~ # systemctl show sbd.service -p RequiredBy
RequiredBy=corosync.service

but

ha1:~ # systemctl status corosync.service
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/usr/lib/systemd/system/corosync.service; static;
vendor preset: disabled)
Active: active (running) since Mon 2017-12-04 21:45:33 MSK; 7min ago
  Docs: man:corosync
man:corosync.conf
man:corosync_overview
   Process: 1860 ExecStop=/usr/share/corosync/corosync stop (code=exited,
status=0/SUCCESS)
   Process: 2059 ExecStart=/usr/share/corosync/corosync start
(code=exited, status=0/SUCCESS)
  Main PID: 2073 (corosync)
 Tasks: 2 (limit: 4915)
CGroup: /system.slice/corosync.service
└─2073 corosync

and

ha1:~ # crm_mon -1r
Stack: corosync
Current DC: ha1 (version 1.1.17-3.3-36d2962a8) - partition with quorum
Last updated: Mon Dec  4 21:53:24 2017
Last change: Mon Dec  4 21:47:25 2017 by hacluster via crmd on ha1

2 nodes configured
1 resource configured

Online: [ ha1 ha2 ]

Full list of resources:

  stonith-sbd   (stonith:external/sbd): Started ha1

and if I now sever the connection between the two nodes, I will get two
single-node clusters, each believing it won ...
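For context, the two-node quorum behaviour that lets both halves "win" comes
from corosync.conf; a typical configuration of such a cluster looks roughly
like this (illustrative excerpt, not taken from the poster's setup):

quorum {
        provider: corosync_votequorum
        two_node: 1
        # two_node lets each partition keep quorum after a split and
        # implicitly enables wait_for_all, which only guards startup -
        # so working fencing is the only real protection here.
}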




___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

2017-12-05 Thread Dejan Muhamedagic
On Mon, Dec 04, 2017 at 09:55:46PM +0300, Andrei Borzenkov wrote:
> 04.12.2017 14:48, Gao,Yan wrote:
> > On 12/02/2017 07:19 PM, Andrei Borzenkov wrote:
> >> 30.11.2017 13:48, Gao,Yan wrote:
> >>> On 11/22/2017 08:01 PM, Andrei Borzenkov wrote:
>  SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two node cluster with
>  VM on VSphere using shared VMDK as SBD. During basic tests by killing
>  corosync and forcing STONITH pacemaker was not started after reboot.
>  In logs I see during boot
> 
>  Nov 22 16:04:56 sapprod01s crmd[3151]: crit: We were allegedly
>  just fenced by sapprod01p for sapprod01p
>  Nov 22 16:04:56 sapprod01s pacemakerd[3137]:  warning: The crmd
>  process (3151) can no longer be respawned,
>  Nov 22 16:04:56 sapprod01s pacemakerd[3137]:   notice: Shutting down
>  Pacemaker
> 
>  SBD timeouts are 60s for watchdog and 120s for msgwait. It seems that
>  stonith with SBD always takes msgwait (at least, visually host is not
>  declared as OFFLINE until 120s passed). But VM reboots lightning fast
>  and is up and running long before timeout expires.
> 
>  I think I have seen similar report already. Is it something that can
>  be fixed by SBD/pacemaker tuning?
> >>> SBD_DELAY_START=yes in /etc/sysconfig/sbd is the solution.
> >>>
> >>
> >> I tried it (on openSUSE Tumbleweed which is what I have at hand, it has
> >> SBD 1.3.0) and with SBD_DELAY_START=yes sbd does not appear to watch
> >> disk at all. 
> > It simply waits that long on startup before starting the rest of the
> > cluster stack to make sure the fencing that targeted it has returned. It
> > intentionally doesn't watch anything during this period of time.
> > 
> 
> Unfortunately it waits too long.
> 
> ha1:~ # systemctl status sbd.service
> ● sbd.service - Shared-storage based fencing daemon
>Loaded: loaded (/usr/lib/systemd/system/sbd.service; enabled; vendor
> preset: disabled)
>Active: failed (Result: timeout) since Mon 2017-12-04 21:47:03 MSK;
> 4min 16s ago
>   Process: 1861 ExecStop=/usr/bin/kill -TERM $MAINPID (code=exited,
> status=0/SUCCESS)
>   Process: 2058 ExecStart=/usr/sbin/sbd $SBD_OPTS -p /var/run/sbd.pid
> watch (code=killed, signa
>  Main PID: 1792 (code=exited, status=0/SUCCESS)
> 
> Dec 04 21:45:32 ha1 systemd[1]: Starting Shared-storage based fencing
> daemon...
> Dec 04 21:47:02 ha1 systemd[1]: sbd.service: Start operation timed out.
> Terminating.
> Dec 04 21:47:03 ha1 systemd[1]: Failed to start Shared-storage based
> fencing daemon.
> Dec 04 21:47:03 ha1 systemd[1]: sbd.service: Unit entered failed state.
> Dec 04 21:47:03 ha1 systemd[1]: sbd.service: Failed with result 'timeout'.
> 
> But the real problem is that even though SBD failed to start, the whole
> cluster stack continues to run; and because SBD blindly trusts in
> well-behaved nodes, fencing appears to succeed after the timeout ... without
> anyone taking any action on the poison pill ...

That's something I always wondered about: if a node is capable of
reading a poison pill, then before shutdown it could also write an
"I'm leaving" message into its slot. Wouldn't that make sbd more
reliable? Any reason not to implement that?
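Purely to illustrate the idea - sbd does not do anything like this today - the
closest existing building block would be the "clear" message sent to the
node's own slot from a shutdown hook; the device path is a placeholder:

ha1:~ # sbd -d /dev/disk/by-id/<sbd-device> message $(uname -n) clear
# a hypothetical "graceful leave" could clear (or specially mark) the
# node's own slot during an orderly shutdown, so peers could tell a
# clean exit apart from a node that has to be fenced.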

Thanks,

Dejan

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

2017-12-04 Thread Andrei Borzenkov
04.12.2017 14:48, Gao,Yan wrote:
> On 12/02/2017 07:19 PM, Andrei Borzenkov wrote:
>> 30.11.2017 13:48, Gao,Yan wrote:
>>> On 11/22/2017 08:01 PM, Andrei Borzenkov wrote:
 SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two node cluster with
 VM on VSphere using shared VMDK as SBD. During basic tests by killing
 corosync and forcing STONITH pacemaker was not started after reboot.
 In logs I see during boot

 Nov 22 16:04:56 sapprod01s crmd[3151]: crit: We were allegedly
 just fenced by sapprod01p for sapprod01p
 Nov 22 16:04:56 sapprod01s pacemakerd[3137]:  warning: The crmd
 process (3151) can no longer be respawned,
 Nov 22 16:04:56 sapprod01s pacemakerd[3137]:   notice: Shutting down
 Pacemaker

 SBD timeouts are 60s for watchdog and 120s for msgwait. It seems that
 stonith with SBD always takes msgwait (at least, visually host is not
 declared as OFFLINE until 120s passed). But VM reboots lightning fast
 and is up and running long before timeout expires.

 I think I have seen similar report already. Is it something that can
 be fixed by SBD/pacemaker tuning?
>>> SBD_DELAY_START=yes in /etc/sysconfig/sbd is the solution.
>>>
>>
>> I tried it (on openSUSE Tumbleweed which is what I have at hand, it has
>> SBD 1.3.0) and with SBD_DELAY_START=yes sbd does not appear to watch
>> disk at all. 
> It simply waits that long on startup before starting the rest of the
> cluster stack to make sure the fencing that targeted it has returned. It
> intentionally doesn't watch anything during this period of time.
> 

Unfortunately it waits too long.

ha1:~ # systemctl status sbd.service
● sbd.service - Shared-storage based fencing daemon
   Loaded: loaded (/usr/lib/systemd/system/sbd.service; enabled; vendor
preset: disabled)
   Active: failed (Result: timeout) since Mon 2017-12-04 21:47:03 MSK;
4min 16s ago
  Process: 1861 ExecStop=/usr/bin/kill -TERM $MAINPID (code=exited,
status=0/SUCCESS)
  Process: 2058 ExecStart=/usr/sbin/sbd $SBD_OPTS -p /var/run/sbd.pid
watch (code=killed, signa
 Main PID: 1792 (code=exited, status=0/SUCCESS)

Dec 04 21:45:32 ha1 systemd[1]: Starting Shared-storage based fencing
daemon...
Dec 04 21:47:02 ha1 systemd[1]: sbd.service: Start operation timed out.
Terminating.
Dec 04 21:47:03 ha1 systemd[1]: Failed to start Shared-storage based
fencing daemon.
Dec 04 21:47:03 ha1 systemd[1]: sbd.service: Unit entered failed state.
Dec 04 21:47:03 ha1 systemd[1]: sbd.service: Failed with result 'timeout'.

But the real problem is that even though SBD failed to start, the whole
cluster stack continues to run; and because SBD blindly trusts in
well-behaved nodes, fencing appears to succeed after the timeout ... without
anyone taking any action on the poison pill ...

ha1:~ # systemctl show sbd.service -p RequiredBy
RequiredBy=corosync.service

but

ha1:~ # systemctl status corosync.service
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/usr/lib/systemd/system/corosync.service; static;
vendor preset: disabled)
   Active: active (running) since Mon 2017-12-04 21:45:33 MSK; 7min ago
 Docs: man:corosync
   man:corosync.conf
   man:corosync_overview
  Process: 1860 ExecStop=/usr/share/corosync/corosync stop (code=exited,
status=0/SUCCESS)
  Process: 2059 ExecStart=/usr/share/corosync/corosync start
(code=exited, status=0/SUCCESS)
 Main PID: 2073 (corosync)
Tasks: 2 (limit: 4915)
   CGroup: /system.slice/corosync.service
   └─2073 corosync

and

ha1:~ # crm_mon -1r
Stack: corosync
Current DC: ha1 (version 1.1.17-3.3-36d2962a8) - partition with quorum
Last updated: Mon Dec  4 21:53:24 2017
Last change: Mon Dec  4 21:47:25 2017 by hacluster via crmd on ha1

2 nodes configured
1 resource configured

Online: [ ha1 ha2 ]

Full list of resources:

 stonith-sbd(stonith:external/sbd): Started ha1

and if I now sever the connection between the two nodes, I will get two
single-node clusters, each believing it won ...
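The dependency wiring can also be checked from corosync's side; whether the
shipped units order corosync after sbd is worth verifying per version, since
Requires= alone only blocks a dependent unit when the failing dependency is
also ordered before it:

ha1:~ # systemctl show corosync.service -p Requires -p After
ha1:~ # systemctl cat corosync.service sbd.service
# without After=sbd.service, corosync can start in parallel and keep
# running even though sbd later times out - which would explain the
# state shown above.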

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

2017-12-04 Thread Gao,Yan

On 12/02/2017 07:19 PM, Andrei Borzenkov wrote:

30.11.2017 13:48, Gao,Yan wrote:

On 11/22/2017 08:01 PM, Andrei Borzenkov wrote:

SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two node cluster with
VM on VSphere using shared VMDK as SBD. During basic tests by killing
corosync and forcing STONITH pacemaker was not started after reboot.
In logs I see during boot

Nov 22 16:04:56 sapprod01s crmd[3151]: crit: We were allegedly
just fenced by sapprod01p for sapprod01p
Nov 22 16:04:56 sapprod01s pacemakerd[3137]:  warning: The crmd
process (3151) can no longer be respawned,
Nov 22 16:04:56 sapprod01s pacemakerd[3137]:   notice: Shutting down
Pacemaker

SBD timeouts are 60s for watchdog and 120s for msgwait. It seems that
stonith with SBD always takes msgwait (at least, visually host is not
declared as OFFLINE until 120s passed). But VM reboots lightning fast
and is up and running long before timeout expires.

I think I have seen similar report already. Is it something that can
be fixed by SBD/pacemaker tuning?

SBD_DELAY_START=yes in /etc/sysconfig/sbd is the solution.



I tried it (on openSUSE Tumbleweed which is what I have at hand, it has
SBD 1.3.0) and with SBD_DELAY_START=yes sbd does not appear to watch
disk at all. 
It simply waits that long on startup before starting the rest of the 
cluster stack to make sure the fencing that targeted it has returned. It 
intentionally doesn't watch anything during this period of time.


Regards,
  Yan



First, at startup no slot is allocated for the node at all
(confirmed with "sbd list"). I manually allocated slots for both nodes;
then I saw that the stonith agent does post a "reboot" message (confirmed with
"sbd list" again), yet sbd never reacts to it. Even after a system reboot,
the message on disk is not cleared.

Removing SBD_DELAY_START and restarting pacemaker (with an implicit SBD
restart) immediately cleared the pending messages.




___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

2017-12-02 Thread Andrei Borzenkov
30.11.2017 13:48, Gao,Yan wrote:
> On 11/22/2017 08:01 PM, Andrei Borzenkov wrote:
>> SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two node cluster with
>> VM on VSphere using shared VMDK as SBD. During basic tests by killing
>> corosync and forcing STONITH pacemaker was not started after reboot.
>> In logs I see during boot
>>
>> Nov 22 16:04:56 sapprod01s crmd[3151]: crit: We were allegedly
>> just fenced by sapprod01p for sapprod01p
>> Nov 22 16:04:56 sapprod01s pacemakerd[3137]:  warning: The crmd
>> process (3151) can no longer be respawned,
>> Nov 22 16:04:56 sapprod01s pacemakerd[3137]:   notice: Shutting down
>> Pacemaker
>>
>> SBD timeouts are 60s for watchdog and 120s for msgwait. It seems that
>> stonith with SBD always takes msgwait (at least, visually host is not
>> declared as OFFLINE until 120s passed). But VM reboots lightning fast
>> and is up and running long before timeout expires.
>>
>> I think I have seen similar report already. Is it something that can
>> be fixed by SBD/pacemaker tuning?
> SBD_DELAY_START=yes in /etc/sysconfig/sbd is the solution.
> 

I tried it (on openSUSE Tumbleweed which is what I have at hand, it has
SBD 1.3.0) and with SBD_DELAY_START=yes sbd does not appear to watch
disk at all. First, at startup no slot is allocated for the node at all
(confirmed with "sbd list"). I manually allocated slots for both nodes;
then I saw that the stonith agent does post a "reboot" message (confirmed with
"sbd list" again), yet sbd never reacts to it. Even after a system reboot,
the message on disk is not cleared.

Removing SBD_DELAY_START and restarting pacemaker (with an implicit SBD
restart) immediately cleared the pending messages.
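The slot handling described above maps onto the sbd command line tool as
follows; the device path is a placeholder:

ha1:~ # sbd -d /dev/disk/by-id/<sbd-device> list          # show slots and pending messages
ha1:~ # sbd -d /dev/disk/by-id/<sbd-device> allocate ha1  # allocate slots by hand
ha1:~ # sbd -d /dev/disk/by-id/<sbd-device> allocate ha2
ha1:~ # sbd -d /dev/disk/by-id/<sbd-device> message ha1 clear   # drop a pending message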

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

2017-11-30 Thread Andrei Borzenkov
On Thu, Nov 30, 2017 at 1:48 PM, Gao,Yan  wrote:
> On 11/22/2017 08:01 PM, Andrei Borzenkov wrote:
>>
>> SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two node cluster with
>> VM on VSphere using shared VMDK as SBD. During basic tests by killing
>> corosync and forcing STONITH pacemaker was not started after reboot.
>> In logs I see during boot
>>
>> Nov 22 16:04:56 sapprod01s crmd[3151]: crit: We were allegedly
>> just fenced by sapprod01p for sapprod01p
>> Nov 22 16:04:56 sapprod01s pacemakerd[3137]:  warning: The crmd
>> process (3151) can no longer be respawned,
>> Nov 22 16:04:56 sapprod01s pacemakerd[3137]:   notice: Shutting down
>> Pacemaker
>>
>> SBD timeouts are 60s for watchdog and 120s for msgwait. It seems that
>> stonith with SBD always takes msgwait (at least, visually host is not
>> declared as OFFLINE until 120s passed). But VM reboots lightning fast
>> and is up and running long before timeout expires.
>>
>> I think I have seen similar report already. Is it something that can
>> be fixed by SBD/pacemaker tuning?
>
> SBD_DELAY_START=yes in /etc/sysconfig/sbd is the solution.
>

Sounds promising. Is it enough? The comment in /etc/sysconfig/sbd says
"Whether to delay after starting sbd on boot for "msgwait" seconds.", but as
I understand it, the stonith agent timeout is 2 * msgwait.
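A minimal sketch of the two pieces involved, assuming the 120 s msgwait from
this thread; the device path is a placeholder and 144 s is simply msgwait plus
a safety margin, not an official formula:

# /etc/sysconfig/sbd (excerpt)
SBD_DEVICE="/dev/disk/by-id/<sbd-device>"
SBD_DELAY_START="yes"

# pacemaker side: give fencing more time than msgwait
ha1:~ # crm configure property stonith-timeout=144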

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

2017-11-30 Thread Gao,Yan

On 11/22/2017 08:01 PM, Andrei Borzenkov wrote:

SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two node cluster with
VM on VSphere using shared VMDK as SBD. During basic tests by killing
corosync and forcing STONITH pacemaker was not started after reboot.
In logs I see during boot

Nov 22 16:04:56 sapprod01s crmd[3151]: crit: We were allegedly
just fenced by sapprod01p for sapprod01p
Nov 22 16:04:56 sapprod01s pacemakerd[3137]:  warning: The crmd
process (3151) can no longer be respawned,
Nov 22 16:04:56 sapprod01s pacemakerd[3137]:   notice: Shutting down Pacemaker

SBD timeouts are 60s for watchdog and 120s for msgwait. It seems that
stonith with SBD always takes msgwait (at least, visually host is not
declared as OFFLINE until 120s passed). But VM reboots lightning fast
and is up and running long before timeout expires.

I think I have seen similar report already. Is it something that can
be fixed by SBD/pacemaker tuning?

SBD_DELAY_START=yes in /etc/sysconfig/sbd is the solution.

Regards,
  Yan



I can provide full logs tomorrow if needed.

TIA

-andrei





___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

2017-11-26 Thread Andrei Borzenkov
22.11.2017 22:45, Klaus Wenninger wrote:
>>
>> Nov 22 16:04:56 sapprod01s crmd[3151]: crit: We were allegedly
>> just fenced by sapprod01p for sapprod01p
>> Nov 22 16:04:56 sapprod01s pacemakerd[3137]:  warning: The crmd
>> process (3151) can no longer be respawned,
>> Nov 22 16:04:56 sapprod01s pacemakerd[3137]:   notice: Shutting down 
>> Pacemaker
>>
>> SBD timeouts are 60s for watchdog and 120s for msgwait. It seems that
>> stonith with SBD always takes msgwait (at least, visually host is not
>> declared as OFFLINE until 120s passed). But VM reboots lightning fast
>> and is up and running long before timeout expires.
>>
>> I think I have seen similar report already. Is it something that can
>> be fixed by SBD/pacemaker tuning?
> I don't know it from sbd, but I have seen cases where fencing using
> the cycle method with machines that boot quickly leads to
> strange behavior.
> If you configure sbd not to clear the disk slot on startup
> (SBD_START_MODE=clean), it should be left to the other
> side to do that, which should prevent the other node from
> coming up while the side doing the fencing is still waiting. You might
> change the method from cycle to off/on to make the fencing
> side clean the slot.
> 
>>
>> I can provide full logs tomorrow if needed.
> Yes would be interesting to see more ...
> 

crm_report attached (it's from a different, trivial test cluster). Actually I
can reliably reproduce it as long as the node is rebooted and pacemaker is
started before the stonith agent has confirmed the node kill.

Unfortunately, in the case of SBD I cannot set the stonith timeout too low, as
we need to account for possible storage path failover.


hb_report-Sun-26-Nov-2017.tar.bz2
Description: application/bzip
___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

2017-11-22 Thread Andrei Borzenkov
22.11.2017 22:45, Klaus Wenninger wrote:
> On 11/22/2017 08:01 PM, Andrei Borzenkov wrote:
>> SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two node cluster with
>> VM on VSphere using shared VMDK as SBD. During basic tests by killing
>> corosync and forcing STONITH pacemaker was not started after reboot.
>> In logs I see during boot
> Using a two node cluster with a single shared disk might
> be dangerous if using sbd before 1.3.1. (if pacemaker-watcher
> is enabled a loss of the virtual-disk will make the node
> fall back to quorum  - which doesn't really tell much in case
> of two node clusters - so your disk will possibly become a
> single point of failure - even worse you will get corruption
> if the disk is lost - the side that is still able to write to the
> disk will think it has fenced the other while that doesn't see
> the poison-pill but is still happy having quorum due to the
> two node corosync feature)
>>

Given one single external shared storage array, is there much advantage
in adding more devices? I just followed the SUSE best practices paper and
documentation:

One Device
The most simple implementation. It is appropriate for clusters where all
of your data is on the same shared storage.

https://www.suse.com/docrep/documents/crfn7g3wji/sap_hana_sr_cost_optimized_scenario_12_sp1.pdf

(the cluster is configured basically as in the latter link, with names adjusted).

I suppose VSphere adds a possible source of corruption, so having
several devices across different datastores may be worth considering.
Unfortunately I had no response to my general question about SBD in a
virtual environment, so it is probably not that common ... :)
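If several devices are used, sbd takes them as a semicolon-separated list and
keeps working as long as a majority of them is reachable; the paths below are
placeholders for disks on three different datastores:

# /etc/sysconfig/sbd (excerpt)
SBD_DEVICE="/dev/disk/by-id/<ds1-disk>;/dev/disk/by-id/<ds2-disk>;/dev/disk/by-id/<ds3-disk>"
# with three devices, losing any single datastore still leaves a
# majority, which removes the single point of failure Klaus describes.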

>> Nov 22 16:04:56 sapprod01s crmd[3151]: crit: We were allegedly
>> just fenced by sapprod01p for sapprod01p
>> Nov 22 16:04:56 sapprod01s pacemakerd[3137]:  warning: The crmd
>> process (3151) can no longer be respawned,
>> Nov 22 16:04:56 sapprod01s pacemakerd[3137]:   notice: Shutting down 
>> Pacemaker
>>
>> SBD timeouts are 60s for watchdog and 120s for msgwait. It seems that
>> stonith with SBD always takes msgwait (at least, visually host is not
>> declared as OFFLINE until 120s passed). But VM reboots lightning fast
>> and is up and running long before timeout expires.
>>
>> I think I have seen similar report already. Is it something that can
>> be fixed by SBD/pacemaker tuning?
> I don't know it from sbd, but I have seen cases where fencing using
> the cycle method with machines that boot quickly leads to
> strange behavior.
> If you configure sbd not to clear the disk slot on startup
> (SBD_START_MODE=clean), it should be left to the other
> side to do that, which should prevent the other node from
> coming up while the side doing the fencing is still waiting.

That's what already happens, and that is what I would like to (be able to) avoid.

> You might
> change the method from cycle to off/on to make the fencing
> side clean the slot.
> 

Hmm ... but what would power on a system that has powered itself off via SBD?

Also, this is not clear from the SBD documentation - does it behave
differently when stonith is set to reboot versus power cycle?
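What Klaus suggests maps onto the generic pacemaker fencing parameter
pcmk_reboot_action, which turns "reboot" requests into "off" for this device.
This is a sketch only; on an existing cluster the parameter would be added to
the already-defined stonith-sbd resource rather than re-creating it:

ha1:~ # crm configure primitive stonith-sbd stonith:external/sbd \
        params pcmk_reboot_action=off
# the victim then stays down until powered on by other means, which is
# exactly the trade-off raised above.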

>>
>> I can provide full logs tomorrow if needed.
> Yes would be interesting to see more ...
> 

OK, today I set up another cluster; I will see whether I get the same behavior
and collect logs then.

> If what I'm writing doesn't make too much sense
> to you this might be due to me not really knowing
> how sbd is configured with SLES ;-)
> 

It does make all sorts of sense; I'm just not that deep into this stuff.

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org