[ClusterLabs] New status reporting for starting/stopping resources in 1.1.19-8.el7

2019-08-30 Thread Chris Walker
Hello, The 1.1.19-8 EL7 version of Pacemaker contains a commit ‘Feature: crmd: default record-pending to TRUE’ that is not in the ClusterLabs Github repo. This commit changes the reporting for resources that are in the process of starting and stopping for (at least) crm_mon and crm_resource

Re: [ClusterLabs] why is node fenced ?

2019-08-12 Thread Chris Walker
When ha-idg-1 started Pacemaker around 17:43, it did not see ha-idg-2, for example: Aug 09 17:43:05 [6318] ha-idg-1 pacemakerd: info: pcmk_quorum_notification: Quorum retained | membership=1320 members=1. After ~20s (dc-deadtime parameter), ha-idg-2 is marked 'unclean' and STONITHed as

Re: [ClusterLabs] [EXTERNAL] Re: "node is unclean" leads to gratuitous reboot

2019-07-11 Thread Chris Walker
On 7/11/19 6:52 AM, Users wrote: On Thu, Jul 11, 2019 at 12:58 PM Lars Ellenberg wrote: On Wed, Jul 10, 2019 at 06:15:56PM +, Michael Powell wrote: Thanks to you and Andrei for your responses. In our particular situation, we want to be able to operate

Re: [ClusterLabs] HA domain controller fences newly joined node after fence_ipmilan delay even if transition was aborted.

2018-12-18 Thread Chris Walker
Looks like rhino66-left was scheduled for fencing because it was not present 20 seconds (the dc-deadtime parameter) after rhino66-right started Pacemaker (startup fencing). I can think of a couple of ways to allow all nodes to survive if they come up far apart in time (i.e., farther apart than

[ClusterLabs] short circuiting the corosync token timeout

2018-08-10 Thread Chris Walker
Hello, Before Pacemaker can declare a node as 'offline', the Corosync layer must first declare that the node is no longer part of the cluster after waiting a full token timeout.  For example, if I manually STONITH a node with 'crm -F node fence node2', even if the fence operation happens

Re: [ClusterLabs] STONITH not communicated back to initiator until token expires

2017-04-26 Thread Chris Walker
: crmd requests STONITH; stonith-ng successfully STONITHs node; corosync communicates membership change to stonith-ng; stonith-ng communicates successful STONITH to crmd; cluster reacts to down node. Thanks, Chris On Wed, Apr 5, 2017 at 5:07 PM, Chris Walker <christopher.wal...@gmail.com> wrote:

Re: [ClusterLabs] STONITH not communicated back to initiator until token expires

2017-03-13 Thread Chris Walker
Thanks for your reply Digimer. On Mon, Mar 13, 2017 at 1:35 PM, Digimer <li...@alteeve.ca> wrote: > On 13/03/17 12:07 PM, Chris Walker wrote: > > Hello, > > > > On our two-node EL7 cluster (pacemaker: 1.1.15-11.el7_3.4; corosync: > > 2.4.0-4; libqb: 1.0-1), >

[ClusterLabs] STONITH not communicated back to initiator until token expires

2017-03-13 Thread Chris Walker
Hello, On our two-node EL7 cluster (pacemaker: 1.1.15-11.el7_3.4; corosync: 2.4.0-4; libqb: 1.0-1), it looks like successful STONITH operations are not communicated from stonith-ng back to the initiator (in this case, crmd) until the STONITHed node is removed from the cluster when Corosync notices

Re: [ClusterLabs] question about dc-deadtime

2017-01-10 Thread Chris Walker
On Mon, Jan 9, 2017 at 6:55 PM, Andrew Beekhof <abeek...@redhat.com> wrote: > On Fri, Dec 16, 2016 at 8:52 AM, Chris Walker > <christopher.wal...@gmail.com> wrote: > > Thanks for your response Ken. I'm puzzled ... in my case node remain > > UNCLEAN (offline) until

Re: [ClusterLabs] question about dc-deadtime

2016-12-15 Thread Chris Walker
15, 2016 at 3:26 PM, Ken Gaillot <kgail...@redhat.com> wrote: > On 12/15/2016 02:00 PM, Chris Walker wrote: > > Hello, > > > > I have a quick question about dc-deadtime. I believe that Digimer and > > others on this list might have already addressed this, b

[ClusterLabs] question about dc-deadtime

2016-12-15 Thread Chris Walker
Hello, I have a quick question about dc-deadtime. I believe that Digimer and others on this list might have already addressed this, but I want to make sure I'm not missing something. If my understanding is correct, dc-deadtime sets the amount of time that must elapse before a cluster is formed