Re: [ClusterLabs] Antw: Re: Pacemaker kill does not cause node fault ???

2017-02-06 Thread Ken Gaillot
On 02/06/2017 03:28 AM, Ulrich Windl wrote:
>>>> RaSca wrote on 03.02.2017 at 14:00 in
> message
> <0de64981-904f-5bdb-c98f-9c59ee47b...@miamammausalinux.org>:
> 
>> On 03/02/2017 11:06, Ferenc Wágner wrote:
>>> Ken Gaillot  writes:
>>>
>>>> On 01/10/2017 04:24 AM, Stefan Schloesser wrote:
>>>>
>>>>> I am currently testing a 2 node cluster under Ubuntu 16.04. The setup
>>>>> seems to be working ok including the STONITH.
>>>>> For test purposes I issued a "pkill -f pace" killing all pacemaker
>>>>> processes on one node.
>>>>>
>>>>> Result:
>>>>> The node is marked as "pending", all resources stay on it. If I
>>>>> manually kill a resource it is not noticed. On the other node a drbd
>>>>> "promote" command fails (drbd is still running as master on the first
>>>>> node).
>>>>
>>>> I suspect that, when you kill pacemakerd, systemd respawns it quickly
>>>> enough that fencing is unnecessary. Try "pkill -f pace; systemctl stop
>>>> pacemaker".
>>>
>>> What exactly is "quickly enough"?
>>
>> What Ken is saying is that Pacemaker, as a service managed by systemd,
>> has this option in its service definition file
>> (/usr/lib/systemd/system/pacemaker.service):
>>
>> Restart=on-failure
>>
>> As explained in [1], systemd restarts the process immediately if it
>> ends for some unexpected reason (like a forced kill).
> 
> Isn't the question: is crmd a process that is expected to die (and thus need
> restarting)? Or wouldn't one prefer to debug this situation? I fear that
> restarting it might just cover up some fatal failure...

If crmd or corosync dies, the node will be fenced (if fencing is enabled
and working). If one of crmd's persistent connections (such as to the
cib) fails, crmd will exit, so the outcome is the same. But the other
daemons (such as pacemakerd or attrd) can die and respawn without any
risk to services.
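
To see the difference described above on a test cluster, a narrower test
than "pkill -f pace" is to kill one daemon at a time and look at the
cluster status afterwards. The following is only a sketch: the helper
names (run, kill_one_daemon, show_status) and the DRY_RUN switch are made
up for illustration, and the daemon names are the ones used in this
thread, which may differ in other Pacemaker versions. With DRY_RUN=1, the
default here, it only prints what it would do.

```shell
#!/bin/sh
# Sketch only: helper names and DRY_RUN are illustrative assumptions.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Send SIGKILL to exactly one daemon, matched by exact process name.
kill_one_daemon() {
    run pkill -9 -x "$1"
}

# Show a one-shot snapshot of the cluster status.
show_status() {
    run crm_mon -1
}

kill_one_daemon attrd
show_status
```

Running it with DRY_RUN=0 and crmd in place of attrd should, per the
above, get the node fenced, so only try that where fencing the node is
acceptable.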

The failure will be logged, but it will not be reported in cluster
status, so there is a chance of not noticing it.
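
Since such a respawn only shows up in the logs, one way to avoid missing
it is to scan the log from time to time. A minimal sketch, assuming the
log is a plain text file: the function name count_daemon_events and the
keyword list are assumptions, not actual Pacemaker message formats, so
adjust them to what the logs really contain.

```shell
#!/bin/sh
# count_daemon_events LOGFILE DAEMON...
# For each daemon, print "<daemon> <count>", where count is the number of
# log lines that mention the daemon together with one of the keywords.
KEYWORDS='exited|respawn|terminated|signal'   # assumed, not Pacemaker's real wording

count_daemon_events() {
    log="$1"
    shift
    for daemon in "$@"; do
        # First keep lines with a keyword, then count those naming the daemon.
        count=$(grep -i -E "$KEYWORDS" "$log" | grep -c -w -- "$daemon" || true)
        echo "$daemon $count"
    done
}
```

For example: count_daemon_events /var/log/syslog attrd pacemakerd, where
the log path is again an assumption that depends on the distribution and
the logging setup.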

> 
>>
>> [1] https://www.freedesktop.org/software/systemd/man/systemd.service.html 
>>
>> -- 
>> RaSca
>> ra...@miamammausalinux.org 

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Antw: Re: Pacemaker kill does not cause node fault ???

2017-02-06 Thread Ulrich Windl
>>> RaSca wrote on 03.02.2017 at 14:00 in
message
<0de64981-904f-5bdb-c98f-9c59ee47b...@miamammausalinux.org>:

> On 03/02/2017 11:06, Ferenc Wágner wrote:
>> Ken Gaillot  writes:
>> 
>>> On 01/10/2017 04:24 AM, Stefan Schloesser wrote:
>>>
>>>> I am currently testing a 2 node cluster under Ubuntu 16.04. The setup
>>>> seems to be working ok including the STONITH.
>>>> For test purposes I issued a "pkill -f pace" killing all pacemaker
>>>> processes on one node.
>>>>
>>>> Result:
>>>> The node is marked as "pending", all resources stay on it. If I
>>>> manually kill a resource it is not noticed. On the other node a drbd
>>>> "promote" command fails (drbd is still running as master on the first
>>>> node).
>>>
>>> I suspect that, when you kill pacemakerd, systemd respawns it quickly
>>> enough that fencing is unnecessary. Try "pkill -f pace; systemctl stop
>>> pacemaker".
>> 
>> What exactly is "quickly enough"?
> 
> What Ken is saying is that Pacemaker, as a service managed by systemd,
> has this option in its service definition file
> (/usr/lib/systemd/system/pacemaker.service):
> 
> Restart=on-failure
> 
> As explained in [1], systemd restarts the process immediately if it
> ends for some unexpected reason (like a forced kill).

Isn't the question: is crmd a process that is expected to die (and thus need
restarting)? Or wouldn't one prefer to debug this situation? I fear that
restarting it might just cover up some fatal failure...
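
As for what "immediately" means here: with Restart=on-failure, systemd
waits for RestartSec before restarting the service, and that defaults to
100ms unless the unit overrides it. A minimal sketch for checking what a
given unit file sets; the function name unit_setting is hypothetical, and
the path is the one quoted above, which may differ by distribution.

```shell
#!/bin/sh
# unit_setting FILE KEY
# Print the last value assigned to KEY in a systemd unit file,
# or nothing if the file does not set it.
unit_setting() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}

UNIT=/usr/lib/systemd/system/pacemaker.service   # path as quoted in this thread
if [ -r "$UNIT" ]; then
    echo "Restart=$(unit_setting "$UNIT" Restart)"
    # An empty value here means the unit file leaves RestartSec at its default.
    echo "RestartSec=$(unit_setting "$UNIT" RestartSec)"
fi
```

This reads only the one file and ignores any drop-in overrides;
"systemctl cat pacemaker" shows the unit together with its drop-ins.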

> 
> [1] https://www.freedesktop.org/software/systemd/man/systemd.service.html 
> 
> -- 
> RaSca
> ra...@miamammausalinux.org 



