Re: [ClusterLabs] Problems with master/slave failovers

2019-07-04 Thread Harvey Shepherd
I would tend to agree with you on this matter, Andrei. To me it makes more sense 
for Pacemaker to prioritise maintaining a master over restarting a failed 
resource. If master scores are set in a sensible manner, the promoted master 
would immediately be given a high score, so the other instance coming back 
online later would not cause a second failover. It becomes much harder to 
maintain the desired master scores when the outcome depends on how long the 
failed resource takes to restart: it then becomes a matter of timing whether 
Pacemaker fails the resource over or simply restarts and re-promotes it on the 
failed node. I'm pretty sure that's why I've been seeing the behaviour I've 
reported.

Regards,
Harvey


From: Users  on behalf of Andrei Borzenkov 

Sent: Wednesday, 3 July 2019 8:59 p.m.
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: EXTERNAL: Re: [ClusterLabs] Problems with master/slave failovers

On Wed, Jul 3, 2019 at 12:59 AM Ken Gaillot  wrote:
>
> On Mon, 2019-07-01 at 23:30 +, Harvey Shepherd wrote:
> > > The "transition summary" is just a resource-by-resource list, not
> > > the
> > > order things will be done. The "executing cluster transition"
> > > section
> > > is the order things are being done.
> >
> > Thanks Ken. I think that's where the problem is originating. If you
> > look at the "executing cluster transition" section, it's actually
> > restarting the failed king instance BEFORE promoting the remaining
> > in-service slave. When the failed resource comes back online, that
> > adjusts the master scores, resulting in the transition being aborted.
> > Both nodes then end up having the same master score for the king
> > resource, and Pacemaker decides to re-promote the original master. I
> > would have expected Pacemaker's priority to be to ensure that there
> > was a master available first, then to restart the failed instance in
> > slave mode. Is there a way to configure it to do that?
>
> No, that's intentional behavior. Starts are done before promotes so
> that promotion scores are in their final state before ultimately
> choosing the master.

There are applications that take tens of minutes to start, while
failover is near-instantaneous. Enforcing a slave restart before
promotion means an extended period of service unavailability. At the
very least this must be configurable.

> Otherwise, you'd end up in the same final
> situation, but the master would fail over first then fail back.
>

Now, really - while a resource agent is of course free to throw dice to
decide master scores, in all real-life cases I am familiar with the
master score is determined by the underlying application state. If an
agent comes up and sees another instance running as master, it is highly
unlikely that the agent will voluntarily force the master role away. And
if that happens, I'd say the agent is buggy and it is not Pacemaker's
job to work around it.

> It's up to the agent to set master scores in whatever fashion it
> considers ideal.
>

Except that Pacemaker makes a started resource a prerequisite for that.
In real life it may not even be possible to start the former master
before re-configuring it.


Re: [ClusterLabs] Problems with master/slave failovers

2019-07-03 Thread Andrei Borzenkov
On Wed, Jul 3, 2019 at 12:59 AM Ken Gaillot  wrote:
>
> On Mon, 2019-07-01 at 23:30 +, Harvey Shepherd wrote:
> > > The "transition summary" is just a resource-by-resource list, not
> > > the
> > > order things will be done. The "executing cluster transition"
> > > section
> > > is the order things are being done.
> >
> > Thanks Ken. I think that's where the problem is originating. If you
> > look at the "executing cluster transition" section, it's actually
> > restarting the failed king instance BEFORE promoting the remaining
> > in-service slave. When the failed resource comes back online, that
> > adjusts the master scores, resulting in the transition being aborted.
> > Both nodes then end up having the same master score for the king
> > resource, and Pacemaker decides to re-promote the original master. I
> > would have expected Pacemaker's priority to be to ensure that there
> > was a master available first, then to restart the failed instance in
> > slave mode. Is there a way to configure it to do that?
>
> No, that's intentional behavior. Starts are done before promotes so
> that promotion scores are in their final state before ultimately
> choosing the master.

There are applications that take tens of minutes to start, while
failover is near-instantaneous. Enforcing a slave restart before
promotion means an extended period of service unavailability. At the
very least this must be configurable.

> Otherwise, you'd end up in the same final
> situation, but the master would fail over first then fail back.
>

Now, really - while a resource agent is of course free to throw dice to
decide master scores, in all real-life cases I am familiar with the
master score is determined by the underlying application state. If an
agent comes up and sees another instance running as master, it is highly
unlikely that the agent will voluntarily force the master role away. And
if that happens, I'd say the agent is buggy and it is not Pacemaker's
job to work around it.

> It's up to the agent to set master scores in whatever fashion it
> considers ideal.
>

Except that Pacemaker makes a started resource a prerequisite for that.
In real life it may not even be possible to start the former master
before re-configuring it.


Re: [ClusterLabs] Problems with master/slave failovers

2019-07-02 Thread Ken Gaillot
On Mon, 2019-07-01 at 23:30 +, Harvey Shepherd wrote:
> > The "transition summary" is just a resource-by-resource list, not
> > the
> > order things will be done. The "executing cluster transition"
> > section
> > is the order things are being done.
> 
> Thanks Ken. I think that's where the problem is originating. If you
> look at the "executing cluster transition" section, it's actually
> restarting the failed king instance BEFORE promoting the remaining
> in-service slave. When the failed resource comes back online, that
> adjusts the master scores, resulting in the transition being aborted.
> Both nodes then end up having the same master score for the king
> resource, and Pacemaker decides to re-promote the original master. I
> would have expected Pacemaker's priority to be to ensure that there
> was a master available first, then to restart the failed instance in
> slave mode. Is there a way to configure it to do that?

No, that's intentional behavior. Starts are done before promotes so
that promotion scores are in their final state before ultimately
choosing the master. Otherwise, you'd end up in the same final
situation, but the master would fail over first then fail back.

It's up to the agent to set master scores in whatever fashion it
considers ideal.

You definitely want to separate the constraints for primitive and clone
dependencies of the king resource. The primitives are currently failing
over before the master has stopped because they're only ordered after
the king resource in any role, and since the slave is active, they can
start there.

> > 
> > Current cluster status:
> > Online: [ primary secondary ]
> > 
> >  stk_shared_ip  (ocf::heartbeat:IPaddr2):   Started secondary
> >  Clone Set: ms_king_resource [king_resource] (promotable)
> >  king_resource  (ocf::aviat:king-resource-ocf):  FAILED primary
> >  Slaves: [ secondary ]
> >  Clone Set: ms_servant1 [servant1]
> >  Started: [ primary secondary ]
> >  Clone Set: ms_servant2 [servant2] (promotable)
> >  Masters: [ primary ]
> >  Slaves: [ secondary ]
> >  Clone Set: ms_servant3 [servant3] (promotable)
> >  Masters: [ primary ]
> >  Slaves: [ secondary ]
> >  servant4   (lsb:servant4):   Started primary
> >  servant5   (lsb:servant5):   Started primary
> >  servant6   (lsb:servant6):   Started primary
> >  servant7   (lsb:servant7):   Started primary
> >  servant8   (lsb:servant8):   Started primary
> >  Resource Group: servant9_active_disabled
> >  servant9_resource1   (lsb:servant9_resource1):   Started primary
> >  servant9_resource2   (lsb:servant9_resource2):   Started primary
> >  servant10  (lsb:servant10):  Started primary
> >  servant11  (lsb:servant11):  Started primary
> >  servant12  (lsb:servant12):  Started primary
> >  servant13  (lsb:servant13):  Started primary
> > 
> > Transition Summary:
> >  * Recover    king_resource:0    ( Slave primary )
> >  * Promote    king_resource:1    ( Slave -> Master secondary )
> >  * Demote     servant2:0         ( Master -> Slave primary )
> >  * Promote    servant2:1         ( Slave -> Master secondary )
> >  * Demote     servant3:0         ( Master -> Slave primary )
> >  * Promote    servant3:1         ( Slave -> Master secondary )
> >  * Move       servant4           ( primary -> secondary )
> >  * Move       servant5           ( primary -> secondary )
> >  * Move       servant6           ( primary -> secondary )
> >  * Move       servant7           ( primary -> secondary )
> >  * Move       servant8           ( primary -> secondary )
> >  * Move       servant9_resource1 ( primary -> secondary )
> >  * Move       servant9_resource2 ( primary -> secondary )
> >  * Move       servant10          ( primary -> secondary )
> >  * Move       servant11          ( primary -> secondary )
> >  * Move       servant12          ( primary -> secondary )
> >  * Move       servant13          ( primary -> secondary )
> > 
> > Executing cluster transition:
> >  * Pseudo action:   ms_king_resource_pre_notify_stop_0
> >  * Pseudo action:   ms_servant2_pre_notify_demote_0
> >  * Resource action: servant3   cancel=1 on primary
> >  * Resource action: servant3   cancel=11000 on secondary
> >  * Pseudo action:   ms_servant3_pre_notify_demote_0
> >  * Resource action: servant4   stop on primary
> >  * Resource action: servant5   stop on primary
> >  * Resource action: servant6   stop on primary
> >  * Resource action: servant7   stop on primary
> >  * Resource action: servant8   stop on primary
> >  * Pseudo action:   servant9_active_disabled_stop_0
> >  * Resource action: servant9_resource2   stop on primary
> >  * Resource action: servant10  stop on primary
> >  * Resource action: servant11  stop on primary
> >  * Resource action: 

Re: [ClusterLabs] Problems with master/slave failovers

2019-07-01 Thread Harvey Shepherd
I initially thought it was only in one direction, but it actually isn't. It's 
just that occasionally, if the timing is just right, the failover manages to 
succeed. Besides, I don't think that has any bearing on why Pacemaker is trying 
to restart the failed resource instance before promoting the slave.

From: Users  on behalf of Andrei Borzenkov 

Sent: Tuesday, 2 July 2019 3:42 p.m.
To: users@clusterlabs.org
Subject: EXTERNAL: Re: [ClusterLabs] Problems with master/slave failovers

02.07.2019 2:30, Harvey Shepherd wrote:
>> The "transition summary" is just a resource-by-resource list, not the
>> order things will be done. The "executing cluster transition" section
>> is the order things are being done.
>
> Thanks Ken. I think that's where the problem is originating. If you look at 
> the "executing cluster transition" section, it's actually restarting the 
> failed king instance BEFORE promoting the remaining in-service slave. When 
> the failed resource comes back online, that adjusts the master scores, 
> resulting in the transition being aborted. Both nodes then end up having the 
> same master score for the king resource, and Pacemaker decides to re-promote 
> the original master.

It does not explain why it happens only in one direction. Unless your
resource agent is doing something differently in each case, but that is
something only you can check.

> I would have expected Pacemaker's priority to be to ensure that there was a 
> master available first, then to restart the failed instance in slave mode. Is 
> there a way to configure it to do that?
>
>>
>> Current cluster status:
>> Online: [ primary secondary ]
>>
>>  stk_shared_ip  (ocf::heartbeat:IPaddr2):   Started secondary
>>  Clone Set: ms_king_resource [king_resource] (promotable)
>>  king_resource  (ocf::aviat:king-resource-ocf):  FAILED primary
>>  Slaves: [ secondary ]
>>  Clone Set: ms_servant1 [servant1]
>>  Started: [ primary secondary ]
>>  Clone Set: ms_servant2 [servant2] (promotable)
>>  Masters: [ primary ]
>>  Slaves: [ secondary ]
>>  Clone Set: ms_servant3 [servant3] (promotable)
>>  Masters: [ primary ]
>>  Slaves: [ secondary ]
>>  servant4   (lsb:servant4):   Started primary
>>  servant5   (lsb:servant5):   Started primary
>>  servant6   (lsb:servant6):   Started primary
>>  servant7   (lsb:servant7):   Started primary
>>  servant8   (lsb:servant8):   Started primary
>>  Resource Group: servant9_active_disabled
>>  servant9_resource1   (lsb:servant9_resource1):   Started primary
>>  servant9_resource2   (lsb:servant9_resource2):   Started primary
>>  servant10  (lsb:servant10):  Started primary
>>  servant11  (lsb:servant11):  Started primary
>>  servant12  (lsb:servant12):  Started primary
>>  servant13  (lsb:servant13):  Started primary
>>
>> Transition Summary:
>>  * Recover    king_resource:0    ( Slave primary )
>>  * Promote    king_resource:1    ( Slave -> Master secondary )
>>  * Demote     servant2:0         ( Master -> Slave primary )
>>  * Promote    servant2:1         ( Slave -> Master secondary )
>>  * Demote     servant3:0         ( Master -> Slave primary )
>>  * Promote    servant3:1         ( Slave -> Master secondary )
>>  * Move       servant4           ( primary -> secondary )
>>  * Move       servant5           ( primary -> secondary )
>>  * Move       servant6           ( primary -> secondary )
>>  * Move       servant7           ( primary -> secondary )
>>  * Move       servant8           ( primary -> secondary )
>>  * Move       servant9_resource1 ( primary -> secondary )
>>  * Move       servant9_resource2 ( primary -> secondary )
>>  * Move       servant10          ( primary -> secondary )
>>  * Move       servant11          ( primary -> secondary )
>>  * Move       servant12          ( primary -> secondary )
>>  * Move       servant13          ( primary -> secondary )
>>
>> Executing cluster transition:
>>  * Pseudo action:   ms_king_resource_pre_notify_stop_0
>>  * Pseudo action:   ms_servant2_pre_notify_demote_0
>>  * Resource action: servant3   cancel=1 on primary
>>  * Resource action: servant3   cancel=11000 on secondary
>>  * Pseudo action:   ms_servant3_pre_notify_demote_0
>>  * Resource action: servant4   stop on primary
>>  * 

Re: [ClusterLabs] Problems with master/slave failovers

2019-07-01 Thread Andrei Borzenkov
02.07.2019 2:30, Harvey Shepherd wrote:
>> The "transition summary" is just a resource-by-resource list, not the
>> order things will be done. The "executing cluster transition" section
>> is the order things are being done.
> 
> Thanks Ken. I think that's where the problem is originating. If you look at 
> the "executing cluster transition" section, it's actually restarting the 
> failed king instance BEFORE promoting the remaining in-service slave. When 
> the failed resource comes back online, that adjusts the master scores, 
> resulting in the transition being aborted. Both nodes then end up having the 
> same master score for the king resource, and Pacemaker decides to re-promote 
> the original master.

It does not explain why it happens only in one direction. Unless your
resource agent is doing something differently in each case, but that is
something only you can check.

> I would have expected Pacemaker's priority to be to ensure that there was a 
> master available first, then to restart the failed instance in slave mode. Is 
> there a way to configure it to do that?
> 
>>
>> Current cluster status:
>> Online: [ primary secondary ]
>>
>>  stk_shared_ip  (ocf::heartbeat:IPaddr2):   Started secondary
>>  Clone Set: ms_king_resource [king_resource] (promotable)
>>  king_resource  (ocf::aviat:king-resource-ocf):  FAILED primary
>>  Slaves: [ secondary ]
>>  Clone Set: ms_servant1 [servant1]
>>  Started: [ primary secondary ]
>>  Clone Set: ms_servant2 [servant2] (promotable)
>>  Masters: [ primary ]
>>  Slaves: [ secondary ]
>>  Clone Set: ms_servant3 [servant3] (promotable)
>>  Masters: [ primary ]
>>  Slaves: [ secondary ]
>>  servant4   (lsb:servant4):   Started primary
>>  servant5   (lsb:servant5):   Started primary
>>  servant6   (lsb:servant6):   Started primary
>>  servant7   (lsb:servant7):   Started primary
>>  servant8   (lsb:servant8):   Started primary
>>  Resource Group: servant9_active_disabled
>>  servant9_resource1   (lsb:servant9_resource1):   Started primary
>>  servant9_resource2   (lsb:servant9_resource2):   Started primary
>>  servant10  (lsb:servant10):  Started primary
>>  servant11  (lsb:servant11):  Started primary
>>  servant12  (lsb:servant12):  Started primary
>>  servant13  (lsb:servant13):  Started primary
>>
>> Transition Summary:
>>  * Recover    king_resource:0    ( Slave primary )
>>  * Promote    king_resource:1    ( Slave -> Master secondary )
>>  * Demote     servant2:0         ( Master -> Slave primary )
>>  * Promote    servant2:1         ( Slave -> Master secondary )
>>  * Demote     servant3:0         ( Master -> Slave primary )
>>  * Promote    servant3:1         ( Slave -> Master secondary )
>>  * Move       servant4           ( primary -> secondary )
>>  * Move       servant5           ( primary -> secondary )
>>  * Move       servant6           ( primary -> secondary )
>>  * Move       servant7           ( primary -> secondary )
>>  * Move       servant8           ( primary -> secondary )
>>  * Move       servant9_resource1 ( primary -> secondary )
>>  * Move       servant9_resource2 ( primary -> secondary )
>>  * Move       servant10          ( primary -> secondary )
>>  * Move       servant11          ( primary -> secondary )
>>  * Move       servant12          ( primary -> secondary )
>>  * Move       servant13          ( primary -> secondary )
>>
>> Executing cluster transition:
>>  * Pseudo action:   ms_king_resource_pre_notify_stop_0
>>  * Pseudo action:   ms_servant2_pre_notify_demote_0
>>  * Resource action: servant3   cancel=1 on primary
>>  * Resource action: servant3   cancel=11000 on secondary
>>  * Pseudo action:   ms_servant3_pre_notify_demote_0
>>  * Resource action: servant4   stop on primary
>>  * Resource action: servant5   stop on primary
>>  * Resource action: servant6   stop on primary
>>  * Resource action: servant7   stop on primary
>>  * Resource action: servant8   stop on primary
>>  * Pseudo action:   servant9_active_disabled_stop_0
>>  * Resource action: servant9_resource2   stop on primary
>>  * Resource action: servant10  stop on primary
>>  * Resource action: servant11  stop on primary
>>  * Resource action: servant12  stop on primary
>>  * Resource action: servant13  stop on primary
>>  * Resource action: king_resource   notify on primary
>>  * Resource action: king_resource   notify on secondary
>>  * Pseudo action:   ms_king_resource_confirmed-pre_notify_stop_0
>>  * Pseudo action:   ms_king_resource_stop_0
>>  * Resource action: servant2   notify on primary
>>  * Resource action: servant2   notify on secondary
>>  * Pseudo action:   ms_servant2_confirmed-pre_notify_demote_0
>>  * Pseudo action:   ms_servant2_demote_0
>>  * 

Re: [ClusterLabs] Problems with master/slave failovers

2019-07-01 Thread Harvey Shepherd
> The "transition summary" is just a resource-by-resource list, not the
> order things will be done. The "executing cluster transition" section
> is the order things are being done.

Thanks Ken. I think that's where the problem is originating. If you look at the 
"executing cluster transition" section, it's actually restarting the failed 
king instance BEFORE promoting the remaining in-service slave. When the failed 
resource comes back online, that adjusts the master scores, resulting in the 
transition being aborted. Both nodes then end up having the same master score 
for the king resource, and Pacemaker decides to re-promote the original master. 
I would have expected Pacemaker's priority to be to ensure that there was a 
master available first, then to restart the failed instance in slave mode. Is 
there a way to configure it to do that?

>
> Current cluster status:
> Online: [ primary secondary ]
>
>  stk_shared_ip  (ocf::heartbeat:IPaddr2):   Started secondary
>  Clone Set: ms_king_resource [king_resource] (promotable)
>  king_resource  (ocf::aviat:king-resource-ocf):  FAILED primary
>  Slaves: [ secondary ]
>  Clone Set: ms_servant1 [servant1]
>  Started: [ primary secondary ]
>  Clone Set: ms_servant2 [servant2] (promotable)
>  Masters: [ primary ]
>  Slaves: [ secondary ]
>  Clone Set: ms_servant3 [servant3] (promotable)
>  Masters: [ primary ]
>  Slaves: [ secondary ]
>  servant4   (lsb:servant4):   Started primary
>  servant5   (lsb:servant5):   Started primary
>  servant6   (lsb:servant6):   Started primary
>  servant7   (lsb:servant7):   Started primary
>  servant8   (lsb:servant8):   Started primary
>  Resource Group: servant9_active_disabled
>  servant9_resource1   (lsb:servant9_resource1):   Started primary
>  servant9_resource2   (lsb:servant9_resource2):   Started primary
>  servant10  (lsb:servant10):  Started primary
>  servant11  (lsb:servant11):  Started primary
>  servant12  (lsb:servant12):  Started primary
>  servant13  (lsb:servant13):  Started primary
>
> Transition Summary:
>  * Recover    king_resource:0    ( Slave primary )
>  * Promote    king_resource:1    ( Slave -> Master secondary )
>  * Demote     servant2:0         ( Master -> Slave primary )
>  * Promote    servant2:1         ( Slave -> Master secondary )
>  * Demote     servant3:0         ( Master -> Slave primary )
>  * Promote    servant3:1         ( Slave -> Master secondary )
>  * Move       servant4           ( primary -> secondary )
>  * Move       servant5           ( primary -> secondary )
>  * Move       servant6           ( primary -> secondary )
>  * Move       servant7           ( primary -> secondary )
>  * Move       servant8           ( primary -> secondary )
>  * Move       servant9_resource1 ( primary -> secondary )
>  * Move       servant9_resource2 ( primary -> secondary )
>  * Move       servant10          ( primary -> secondary )
>  * Move       servant11          ( primary -> secondary )
>  * Move       servant12          ( primary -> secondary )
>  * Move       servant13          ( primary -> secondary )
>
> Executing cluster transition:
>  * Pseudo action:   ms_king_resource_pre_notify_stop_0
>  * Pseudo action:   ms_servant2_pre_notify_demote_0
>  * Resource action: servant3   cancel=1 on primary
>  * Resource action: servant3   cancel=11000 on secondary
>  * Pseudo action:   ms_servant3_pre_notify_demote_0
>  * Resource action: servant4   stop on primary
>  * Resource action: servant5   stop on primary
>  * Resource action: servant6   stop on primary
>  * Resource action: servant7   stop on primary
>  * Resource action: servant8   stop on primary
>  * Pseudo action:   servant9_active_disabled_stop_0
>  * Resource action: servant9_resource2   stop on primary
>  * Resource action: servant10  stop on primary
>  * Resource action: servant11  stop on primary
>  * Resource action: servant12  stop on primary
>  * Resource action: servant13  stop on primary
>  * Resource action: king_resource   notify on primary
>  * Resource action: king_resource   notify on secondary
>  * Pseudo action:   ms_king_resource_confirmed-pre_notify_stop_0
>  * Pseudo action:   ms_king_resource_stop_0
>  * Resource action: servant2   notify on primary
>  * Resource action: servant2   notify on secondary
>  * Pseudo action:   ms_servant2_confirmed-pre_notify_demote_0
>  * Pseudo action:   ms_servant2_demote_0
>  * Resource action: servant3   notify on primary
>  * Resource action: servant3   notify on secondary
>  * Pseudo action:   ms_servant3_confirmed-pre_notify_demote_0
>  * Pseudo action:   ms_servant3_demote_0
>  * Resource action: servant4   start on secondary
>  * Resource action: servant5   start 

Re: [ClusterLabs] Problems with master/slave failovers

2019-07-01 Thread Ken Gaillot
On Sun, 2019-06-30 at 11:13 +, Harvey Shepherd wrote:
> >> There is an ordering constraint - everything must be started after
> the king resource. But even if this constraint didn't exist I don't
> see that it should logically make any difference due to all the non-
> clone resources being colocated with the master of the king resource.
> Surely it would make no sense for Pacemaker to start or move
> colocated resources until a master king resource has been elected?
> >> 
> >> [XML constraint configuration stripped by the list archive. The
> >> surviving fragments show a rsc_colocation with score="INFINITY"
> >> containing a resource_set with sequential="false", a resource_set
> >> with role="Master", and a rsc_order with first="ms_servant2"
> >> then="servant2_dependents".]
> 
> >This ordering constraint is satisfied by the slave of ms_servant2. The
> slave is already started at the point the failover happens, so pacemaker
> is free to start all other resources immediately. If you intend to order
> against the master, you need first-action="promote" then-action="start".
> 
> As I mentioned in my last message I have trouble with using first-
> action="promote" because some of the dependents are clone resources.
> I just tried it again and the dependent clones only start on the
> master node with this setting. What I really need is a first-
> action="promote | demote" setting, but this isn't available. I tried
> adding two separate rules but Pacemaker doesn't like that and none of
> the dependents start.

Simply leaving off first-action allows the dependency against either
the master or slave role. It sounds like you're mixing primitives and clones
in the same colocation, and want the primitives only with the master
role but the clones with any role -- that will require separate
constraints (the primitives colocated with the master role and ordered
after promote, the clones colocated without specifying the role and
ordered without specifying the action).
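
For illustration, a minimal sketch of that split using cibadmin and the
resource names from this thread; the constraint ids are invented, not
from the actual configuration:

    # Primitive: colocate with the master role, order after promote.
    cibadmin -C -o constraints -X '<rsc_colocation
      id="servant4-with-king-master" rsc="servant4"
      with-rsc="ms_king_resource" with-rsc-role="Master" score="INFINITY"/>'
    cibadmin -C -o constraints -X '<rsc_order
      id="servant4-after-king-promote" first="ms_king_resource"
      first-action="promote" then="servant4" then-action="start"/>'

    # Clone: colocate without specifying a role, order without an action.
    cibadmin -C -o constraints -X '<rsc_colocation
      id="servant1-with-king" rsc="ms_servant1"
      with-rsc="ms_king_resource" score="INFINITY"/>'
    cibadmin -C -o constraints -X '<rsc_order
      id="servant1-after-king" first="ms_king_resource" then="ms_servant1"/>'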

> 
> From: Harvey Shepherd
> Sent: Sunday, 30 June 2019 5:34 p.m.
> To: Cluster Labs - All topics related to open-source clustering
> welcomed
> Subject: Re: EXTERNAL: Re: [ClusterLabs] Problems with master/slave
> failovers
>  
> Thanks Andrei for the time you've taken to look at this issue for me.
> What's actually happening with the master preference scores is that
> initially when the master fails, the other node does have a higher
> preference to become master. Pacemaker tries and fails multiple times
> to perform the failover, resulting in the "transition aborted" logs
> that I posted previously. By the time all that has happened, the
> original master has restarted and therefore has the same master
> preference as the original slave, hence pacemaker just re-promotes
> the same master. Occasionally the failover is successful, but I think
> it's down to luck with timing. 
> 
> I think that the root cause is that pacemaker is trying to move the
> servant resources prior to promoting the king master. Your last
> suggestion about changing the ordering constraint to depend on
> promotion of the master/slave resource rather than when it starts
> makes sense, and I'll try changing that and see if it makes a
> difference. I have tried that in the past however and had trouble
> with clone resources that are dependents only starting on the master
> node with that setting. I'll try again though and let you know how it
> goes. 
> 
> Thanks, 
> Harvey 
> 
> On 30 Jun 2019 5:14 pm, Andrei Borzenkov  wrote:
> > 28.06.2019 9:45, Andrei Borzenkov wrote:
> > > On Fri, Jun 28, 2019 at 7:24 AM Harvey Shepherd
> > >  wrote:
> > >>
> > >> Hi All,
> > >>
> > >>
> > >> I'm running Pacemaker 2.0.2 on a two node cluster. It runs one
> > master/slave resource (I'll refer to it as the king resource) and
> > about 20 other resources which are a mixture of:
> > >>
> > >>
> > >> - resources that only run on the king resource master node
> > (colocation constraint with a score of INFINITY)
> > >>
> > >> 

Re: [ClusterLabs] Problems with master/slave failovers

2019-07-01 Thread Ken Gaillot
>  * Pseudo action:   ms_servant2_post_notify_promoted_0
>  * Pseudo action:   ms_servant3_post_notify_promoted_0
>  * Resource action: king_resource   notify on primary
>  * Resource action: king_resource   notify on secondary
>  * Pseudo action:   ms_king_resource_confirmed-post_notify_running_0
>  * Resource action: servant2   notify on primary
>  * Resource action: servant2   notify on secondary
>  * Pseudo action:   ms_servant2_confirmed-post_notify_promoted_0
>  * Resource action: servant3   notify on primary
>  * Resource action: servant3   notify on secondary
>  * Pseudo action:   ms_servant3_confirmed-post_notify_promoted_0
>  * Pseudo action:   ms_king_resource_pre_notify_promote_0
>  * Resource action: servant2   monitor=11000 on primary
>  * Resource action: servant2   monitor=1 on secondary
>  * Resource action: servant3   monitor=11000 on primary
>  * Resource action: servant3   monitor=1 on secondary
>  * Resource action: king_resource   notify on primary
>  * Resource action: king_resource   notify on secondary
>  * Pseudo action:   ms_king_resource_confirmed-pre_notify_promote_0
>  * Pseudo action:   ms_king_resource_promote_0
>  * Resource action: king_resource   promote on secondary
>  * Pseudo action:   ms_king_resource_promoted_0
>  * Pseudo action:   ms_king_resource_post_notify_promoted_0
>  * Resource action: king_resource   notify on primary
>  * Resource action: king_resource   notify on secondary
>  * Pseudo action:   ms_king_resource_confirmed-post_notify_promoted_0
>  * Resource action: king_resource   monitor=11000 on primary
>  * Resource action: king_resource   monitor=1 on secondary
> Using the original execution date of: 2019-06-29 02:33:03Z
> 
> Revised cluster status:
> Online: [ primary secondary ]
> 
>  stk_shared_ip  (ocf::heartbeat:IPaddr2):   Started secondary
>  Clone Set: ms_king_resource [king_resource] (promotable)
>  Masters: [ secondary ]
>  Slaves: [ primary ]
>  Clone Set: ms_servant1 [servant1]
>  Started: [ primary secondary ]
>  Clone Set: ms_servant2 [servant2] (promotable)
>  Masters: [ secondary ]
>  Slaves: [ primary ]
>  Clone Set: ms_servant3 [servant3] (promotable)
>  Masters: [ secondary ]
>  Slaves: [ primary ]
>  servant4   (lsb:servant4):   Started secondary
>  servant5   (lsb:servant5):   Started secondary
>  servant6   (lsb:servant6):   Started secondary
>  servant7   (lsb:servant7):   Started secondary
>  servant8   (lsb:servant8):   Started secondary
>  Resource Group: servant9_active_disabled
>  servant9_resource1   (lsb:servant9_resource1):   Started secondary
>  servant9_resource2   (lsb:servant9_resource2):   Started secondary
>  servant10  (lsb:servant10):  Started secondary
>  servant11  (lsb:servant11):  Started secondary
>  servant12  (lsb:servant12):  Started secondary
>  servant13  (lsb:servant13):  Started secondary
> 
> 
> I don't think that there is an issue with the CIB constraints
> configuration, otherwise the resources would not be able to start
> upon bootup, but I'll keep digging and report back if I find any
> cause.
> 
> Thanks again,
> Harvey
> 
> 
> From: Users  on behalf of Ken Gaillot
> 
> Sent: Saturday, 29 June 2019 3:10 a.m.
> To: Cluster Labs - All topics related to open-source clustering
> welcomed
> Subject: EXTERNAL: Re: [ClusterLabs] Problems with master/slave
> failovers
> 
> On Fri, 2019-06-28 at 07:36 +, Harvey Shepherd wrote:
> > Thanks for your reply Andrei. Whilst I understand what you say
> > about
> > the difficulties of diagnosing issues without all of the info, it's
> > a
> > compromise between a mailing list posting being very verbose in
> > which
> > case nobody wants to read it, and containing enough relevant
> > information for someone to be able to help. With 20+ resources
> > involved during a failover there are literally thousands of logs
> > generated, and it would be pointless to post them all.
> > 
> > I've tried to focus in on the king resource only to keep things
> > simple, as that is the only resource that can initiate a failover.
> > I
> > provided the real master scores and transition decisions made by
> > pacemaker at the times that I killed the king master resource by
> > showing the crm_simulate output from both tests, and the CIB config
> > is as described. As I mentioned, migration-threshold is set to zero
> > for all resources, so it shouldn't prevent a second failover.
> > 
> > Regarding the resource agent return codes, the failure is detected
> > by the 10s king resource master instance monitor operation, which
> > returns OCF_ERR_GENERIC because the resource is expected to be
> > running and isn't.

Re: [ClusterLabs] Problems with master/slave failovers

2019-06-30 Thread Harvey Shepherd
>> There is an ordering constraint - everything must be started after the king 
>> resource. But even if this constraint didn't exist I don't see that it 
>> should logically make any difference due to all the non-clone resources 
>> being colocated with the master of the king resource. Surely it would make 
>> no sense for Pacemaker to start or move colocated resources until a master 
>> king resource has been elected?
>>
>> [XML constraint configuration stripped by the list archive. The
>> surviving fragments show a rsc_colocation with score="INFINITY"
>> containing a resource_set with sequential="false", a resource_set
>> with role="Master", and a rsc_order with first="ms_servant2"
>> then="servant2_dependents".]

>This ordering constraint is satisfied by the slave of ms_servant2. The slave 
>is already started at the point the failover happens, so pacemaker is free to 
>start all other resources immediately. If you intend to order against the 
>master, you need first-action="promote" then-action="start".

As I mentioned in my last message I have trouble with using 
first-action="promote" because some of the dependents are clone resources. I 
just tried it again and the dependent clones only start on the master node with 
this setting. What I really need is a first-action="promote | demote" setting, 
but this isn't available. I tried adding two separate rules but Pacemaker 
doesn't like that and none of the dependents start.



From: Harvey Shepherd
Sent: Sunday, 30 June 2019 5:34 p.m.
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: EXTERNAL: Re: [ClusterLabs] Problems with master/slave failovers

Thanks Andrei for the time you've taken to look at this issue for me. What's 
actually happening with the master preference scores is that initially when the 
master fails, the other node does have a higher preference to become master. 
Pacemaker tries and fails multiple times to perform the failover, resulting in 
the "transition aborted" logs that I posted previously. By the time all that 
has happened, the original master has restarted and therefore has the same 
master preference as the original slave, hence pacemaker just re-promotes the 
same master. Occasionally the failover is successful, but I think it's down to 
luck with timing.

I think that the root cause is that pacemaker is trying to move the servant 
resources prior to promoting the king master. Your last suggestion about 
changing the ordering constraint to depend on promotion of the master/slave 
resource rather than when it starts makes sense, and I'll try changing that and 
see if it makes a difference. I have tried that in the past however and had 
trouble with clone resources that are dependents only starting on the master 
node with that setting. I'll try again though and let you know how it goes.

Thanks,
Harvey

On 30 Jun 2019 5:14 pm, Andrei Borzenkov  wrote:
28.06.2019 9:45, Andrei Borzenkov wrote:
> On Fri, Jun 28, 2019 at 7:24 AM Harvey Shepherd
>  wrote:
>>
>> Hi All,
>>
>>
>> I'm running Pacemaker 2.0.2 on a two node cluster. It runs one master/slave 
>> resource (I'll refer to it as the king resource) and about 20 other 
>> resources which are a mixture of:
>>
>>
>> - resources that only run on the king resource master node (colocation 
>> constraint with a score of INFINITY)
>>
>> - clone resources that run on both nodes
>>
>> - two other master/slave resources where the masters runs on the same node 
>> as the king resource master (colocation constraint with a score of INFINITY)
>>
>>
>> I'll refer to the above set of resources as servant resources.
>>
>>
>> All servant resources have a resource-stickiness of zero and the king 
>> resource has a resource-stickiness of 100. There is an ordering constraint 
>> that the king resource must start before all servant resources. The king 
>> resource is controlled by an OCF script that uses crm_master to set the 
>> preferred master for the king resource (current master has value 100, 
>> current slave is 5, unassigned role or resource failure is 1) - I've 
>> verified that these values are being set as expected upon 
>> promotion/demotion/failure etc, via the logs. That's pretty much all of the 
>> configuration - there is no configuration around node preferences and 
>> migration-threshold is zero for everything.
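
For illustration, the crm_master calls such an agent might make look
roughly like this (a sketch based on the values described above, not
Harvey's actual script):

    # After a successful promote (current master):
    crm_master -l reboot -v 100
    # After starting, or demoting to, the slave role:
    crm_master -l reboot -v 5
    # On failure or while the role is unassigned:
    crm_master -l reboot -v 1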

Re: [ClusterLabs] Problems with master/slave failovers

2019-06-29 Thread Andrei Borzenkov
28.06.2019 9:45, Andrei Borzenkov wrote:
> On Fri, Jun 28, 2019 at 7:24 AM Harvey Shepherd
>  wrote:
>>
>> Hi All,
>>
>>
>> I'm running Pacemaker 2.0.2 on a two node cluster. It runs one master/slave 
>> resource (I'll refer to it as the king resource) and about 20 other 
>> resources which are a mixture of:
>>
>>
>> - resources that only run on the king resource master node (colocation 
>> constraint with a score of INFINITY)
>>
>> - clone resources that run on both nodes
>>
>> - two other master/slave resources where the masters runs on the same node 
>> as the king resource master (colocation constraint with a score of INFINITY)
>>
>>
>> I'll refer to the above set of resources as servant resources.
>>
>>
>> All servant resources have a resource-stickiness of zero and the king 
>> resource has a resource-stickiness of 100. There is an ordering constraint 
>> that the king resource must start before all servant resources. The king 
>> resource is controlled by an OCF script that uses crm_master to set the 
>> preferred master for the king resource (current master has value 100, 
>> current slave is 5, unassigned role or resource failure is 1) - I've 
>> verified that these values are being set as expected upon 
>> promotion/demotion/failure etc, via the logs. That's pretty much all of the 
>> configuration - there is no configuration around node preferences and 
>> migration-threshold is zero for everything.
>>
>>
>> What I'm trying to achieve is fairly simple:
>>
>>
>> 1. If any servant resource fails on either node, it is simply restarted. 
>> These resources should never failover onto the other node because of 
>> colocation with the king resource, and they should not contribute in any way 
>> to deciding whether the king resource should failover (which is why they 
>> have a resource-stickiness of zero).
>>
>> 2. If the slave instance of the king resource fails, it should simply be 
>> restarted and again no failover should occur.
>>
>> 3. If the master instance of the king resource fails, then its slave 
>> instance should immediately be promoted, and the failed instance should be 
>> restarted. Failover of all servant resources should then occur due to the 
>> colocation dependency.
>>
>>
>> It's number 3 above that I'm having trouble with. If I kill the master king 
>> resource instance it behaves as I expect - everything fails over and the 
>> king resource is restarted on the new slave. If I then kill the master 
>> instance of the king resource again however, instead of failing back over to 
>> its original node, it restarts and promotes back to master on the same node. 
>> This is not what I want.
>>
> 
> migration-threshold is the first thing that comes to mind. Another
> possibility is a hard error returned by the resource agent that forces
> the resource off the node.
> 
> But please realize that without the actual configuration and logs at the
> time the undesired behavior happens, it just becomes a game of riddles.
> 
>>
>> The relevant output from crm_simulate for the two tests is shown below. Can 
>> anyone suggest what might be going wrong? Whilst I really like the concept 
>> of crm_simulate, I can't find a good description of how to interpret the 
>> output and I don't understand the difference between clone_color and 
>> native_color, or the difference between "promotion scores" and the various 
>> instances of "allocation scores", nor does it really tell me what is 
>> contributing to the scores. Where does the -INFINITY allocation score come 
>> from for example?
>>
>>
>> Thanks,
>>
>> Harvey
>>
>>
>>
>> FIRST KING RESOURCE MASTER FAILURE (CORRECT BEHAVIOUR - MASTER NODE FAILOVER 
>> OCCURS)
>>
>>
>>  Clone Set: ms_king_resource [king_resource] (promotable)
>>  king_resource  (ocf::aviat:king-resource-ocf):  FAILED Master secondary
>> clone_color: ms_king_resource allocation score on primary: 0
>> clone_color: ms_king_resource allocation score on secondary: 0
>> clone_color: king_resource:0 allocation score on primary: 0
>> clone_color: king_resource:0 allocation score on secondary: 101
>> clone_color: king_resource:1 allocation score on primary: 200
>> clone_color: king_resource:1 allocation score on secondary: 0
>> native_color: king_resource:1 allocation score on primary: 200
>> native_color: king_resource:1 allocation score on secondary: 0
>> native_color: king_resource:0 allocation score on primary: -INFINITY
>> native_color: king_resource:0 allocation score on secondary: 101
>> king_resource:1 promotion score on primary: 100
>> king_resource:0 promotion score on secondary: 1
>>  * Recover    king_resource:0    ( Master -> Slave secondary )
>>  * Promote    king_resource:1    ( Slave -> Master primary )
>>  * Resource action: king_resource   cancel=1 on secondary
>>  * Resource action: king_resource   cancel=11000 on primary
>>  * Pseudo action:   ms_king_resource_pre_notify_demote_0
>>  * Resource action: king_resource   notify on secondary
>>  * Resource action: 

Re: [ClusterLabs] Problems with master/slave failovers

2019-06-29 Thread Andrei Borzenkov
29.06.2019 8:05, Harvey Shepherd wrote:
> There is an ordering constraint - everything must be started after the king 
> resource. But even if this constraint didn't exist I don't see that it should 
> logically make any difference due to all the non-clone resources being 
> colocated with the master of the king resource. Surely it would make no sense 
> for Pacemaker to start or move colocated resources until a master king 
> resource has been elected?
> 
> [XML constraint configuration stripped by the list archive. The
> surviving fragments show a rsc_colocation with score="INFINITY",
> resource sets, and a rsc_order with first="ms_servant2"
> then="servant2_dependents".]
> 

This ordering constraint is satisfied by the slave of ms_servant2. The
slave is already started at the point the failover happens, so pacemaker
is free to start all other resources immediately. If you intend to order
against the master, you need first-action="promote" then-action="start".
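
Spelled out, such an ordering constraint might look like this (the
constraint id is invented; servant2_dependents is assumed to be the
group or set referenced by the stripped configuration above):

    cibadmin --modify -o constraints -X '<rsc_order
      id="order-servant2-dependents" first="ms_servant2"
      first-action="promote" then="servant2_dependents"
      then-action="start"/>'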


Re: [ClusterLabs] Problems with master/slave failovers

2019-06-28 Thread Harvey Shepherd
There is an ordering constraint - everything must be started after the king 
resource. But even if this constraint didn't exist I don't see that it should 
logically make any difference due to all the non-clone resources being 
colocated with the master of the king resource. Surely it would make no sense 
for Pacemaker to start or move colocated resources until a master king resource 
has been elected?

[XML constraint configuration stripped by the list archive; see the
hedged reconstruction below.]
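A hedged reconstruction of the stripped section, inferred from the
surviving attribute fragments elsewhere in the thread and the prose
description (the ids and the set membership are guesses, not the actual
CIB):

cat > constraints.xml <<'EOF'
<constraints>
  <rsc_colocation id="servants-with-king-master" score="INFINITY">
    <resource_set id="servants" sequential="false">
      <resource_ref id="servant4"/>
      <resource_ref id="servant5"/>
      <!-- ...remaining servant resources... -->
    </resource_set>
    <resource_set id="king-master" role="Master">
      <resource_ref id="ms_king_resource"/>
    </resource_set>
  </rsc_colocation>
  <!-- servant2_dependents is presumably a group or tag defined elsewhere -->
  <rsc_order id="order-servant2-deps" first="ms_servant2"
             then="servant2_dependents"/>
</constraints>
EOF
cibadmin --replace -o constraints -x constraints.xml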

Regards,
Harvey

From: Users  on behalf of Andrei Borzenkov 

Sent: Saturday, 29 June 2019 4:13 p.m.
To: users@clusterlabs.org
Subject: EXTERNAL: Re: [ClusterLabs] Problems with master/slave failovers

29.06.2019 6:01, Harvey Shepherd wrote:
>
> As you can see, it eventually gives up in the transition attempt and starts a 
> new one. Eventually the failed king resource master has had time to come back 
> online and it then just promotes it again and forgets about trying to 
> failover. I'm not sure if the cluster transition actions listed by 
> crm_simulate are in the order in which Pacemaker tries to carry out the 
> operations, but if so the order is wrong. It should be stopping all servant 
> resources on the failed king master, then failing over the king resource, 
> then migrating the servant resources to the new master node. Instead it seems 
> to be trying to migrate all the servant resources over first, with the king 
> master failover scheduled near the bottom, which won't work due to the 
> colocation constraint with the king master.

Unless you configured explicit ordering between resources, pacemaker is
free to choose any order.

Re: [ClusterLabs] Problems with master/slave failovers

2019-06-28 Thread Harvey Shepherd
 Slaves: [ primary ]
 Clone Set: ms_servant3 [servant3] (promotable)
 Masters: [ secondary ]
 Slaves: [ primary ]
 servant4   (lsb:servant4):   Started secondary
 servant5   (lsb:servant5):   Started secondary
 servant6   (lsb:servant6):   Started secondary
 servant7   (lsb:servant7):   Started secondary
 servant8   (lsb:servant8):   Started secondary
 Resource Group: servant9_active_disabled
 servant9_resource1   (lsb:servant9_resource1):   Started secondary
 servant9_resource2   (lsb:servant9_resource2):   Started secondary
 servant10  (lsb:servant10):  Started secondary
 servant11  (lsb:servant11):  Started secondary
 servant12  (lsb:servant12):  Started secondary
 servant13  (lsb:servant13):  Started secondary


I don't think that there is an issue with the CIB constraints configuration, 
otherwise the resources would not be able to start upon bootup, but I'll keep 
digging and report back if I find any cause.

Thanks again,
Harvey


From: Users  on behalf of Ken Gaillot 

Sent: Saturday, 29 June 2019 3:10 a.m.
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: EXTERNAL: Re: [ClusterLabs] Problems with master/slave failovers

On Fri, 2019-06-28 at 07:36 +, Harvey Shepherd wrote:
> Thanks for your reply Andrei. Whilst I understand what you say about
> the difficulties of diagnosing issues without all of the info, it's a
> compromise between a mailing list posting being very verbose in which
> case nobody wants to read it, and containing enough relevant
> information for someone to be able to help. With 20+ resources
> involved during a failover there are literally thousands of logs
> generated, and it would be pointless to post them all.
>
> I've tried to focus in on the king resource only to keep things
> simple, as that is the only resource that can initiate a failover. I
> provided the real master scores and transition decisions made by
> pacemaker at the times that I killed the king master resource by
> showing the crm_simulator output from both tests, and the CIB config
> is ss described. As I mentioned, migration-threshold is set to zero
> for all resources, so it shouldn't prevent a second failover.
>
> Regarding the resource agent return codes, the failure is detected by
> the 10s king resource master instance monitor operation, which
> returns OCF_ERR_GENERIC because the resource is expected to be
> running and isn't (the OCF resource agent developers guide states
> that monitor should only return OCF_NOT_RUNNING if there is no error
> condition that caused the resource to stop).
>
> What would be really helpful would be if you or someone else could
> help me decipher the crm_simulate output:

I've been working with Pacemaker for years and still look at those
scores only after exhausting all other investigation.

It isn't AI, but the complexity is somewhat similar in that it's not
really possible to boil down the factors that went into a decision in a
few human-readable sentences. We do have a project planned to provide
some insight in human-readable form.

But if you really want the headache:

> 1. What is the difference between clone_color and native_color?

native_color is scores added by the resource as a primitive resource,
i.e. the resource being cloned. clone_color is scores added by the
resource as a clone, i.e. the internal abstraction that allows a
primitive resource to run in multiple places. All it really means is
that different C functions added the scores, which is pretty useless
without staring at the source code of those functions.

> 2. What is the difference between "promotion scores" and "allocation
> scores" and why does the output show several instances of each?

Allocation is placement of particular resources (including individual
clone instances) to particular nodes; promotion is selecting an
instance to be master.

The multiple occurrences are due to multiple factors going into the
final score.

> 3. How does pacemaker use those scores to decide whether to failover?

It doesn't -- it uses them to determine where to failover. Whether to
failover is determined by fail-count and resource operation history
(and affected by configured policies such as on-fail, failure-timeout,
and migration-threshold).

> 4. Why is there a -INFINITY score on one node?

That sometimes requires trace-level debugging and following the path
through the source code. Which I don't recommend unless you're wanting
to make this a full-time gig :)

At this level of investigation, I usually start with giving
crm_simulate -V, which will show up to info-level logs. If that
doesn't make it clear, add another -V for debug logs, and then another
-V for trace logs, but that stretches the bounds of human
intelligibility. Somewhat more helpful is PCMK_trace_tags=<resource>
before crm_simulate, which will give some trace-level output for the
given resource without swamping you with infinite detail.

Re: [ClusterLabs] Problems with master/slave failovers

2019-06-28 Thread Ken Gaillot
On Fri, 2019-06-28 at 07:36 +, Harvey Shepherd wrote:
> Thanks for your reply Andrei. Whilst I understand what you say about
> the difficulties of diagnosing issues without all of the info, it's a
> compromise between a mailing list posting being very verbose in which
> case nobody wants to read it, and containing enough relevant
> information for someone to be able to help. With 20+ resources
> involved during a failover there are literally thousands of logs
> generated, and it would be pointless to post them all.
> 
> I've tried to focus in on the king resource only to keep things
> simple, as that is the only resource that can initiate a failover. I
> provided the real master scores and transition decisions made by
> pacemaker at the times that I killed the king master resource by
> showing the crm_simulate output from both tests, and the CIB config
> is as described. As I mentioned, migration-threshold is set to zero
> for all resources, so it shouldn't prevent a second failover. 
> 
> Regarding the resource agent return codes, the failure is detected by
> the 10s king resource master instance monitor operation, which
> returns OCF_ERR_GENERIC because the resource is expected to be
> running and isn't (the OCF resource agent developers guide states
> that monitor should only return OCF_NOT_RUNNING if there is no error
> condition that caused the resource to stop).
> 
> What would be really helpful would be if you or someone else could
> help me decipher the crm_simulate output:

I've been working with Pacemaker for years and still look at those
scores only after exhausting all other investigation.

It isn't AI, but the complexity is somewhat similar in that it's not
really possible to boil down the factors that went into a decision in a
few human-readable sentences. We do have a project planned to provide
some insight in human-readable form.

But if you really want the headache:

> 1. What is the difference between clone_color and native_color? 

native_color is scores added by the resource as a primitive resource,
i.e. the resource being cloned. clone_color is scores added by the
resource as a clone, i.e. the internal abstraction that allows a
primitive resource to run in multiple places. All it really means is
that different C functions added the scores, which is pretty useless
without staring at the source code of those functions.

> 2. What is the difference between "promotion scores" and "allocation
> scores" and why does the output show several instances of each? 

Allocation is placement of particular resources (including individual
clone instances) to particular nodes; promotion is selecting an
instance to be master.

The multiple occurrences are due to multiple factors going into the
final score.

> 3. How does pacemaker use those scores to decide whether to failover?

It doesn't -- it uses them to determine where to failover. Whether to
failover is determined by fail-count and resource operation history
(and affected by configured policies such as on-fail, failure-timeout,
and migration-threshold).
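
For reference, the failure history feeding those decisions can be
inspected and cleared from the command line (the resource and node names
here are the ones used in this thread):

    # Show the current fail count for the resource on a node.
    crm_failcount --query -r king_resource -N primary
    # Clear the failure history so the resource is re-evaluated.
    crm_resource --cleanup -r king_resource -N primary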

> 4. Why is there a -INFINITY score on one node? 

That sometimes requires trace-level debugging and following the path
through the source code. Which I don't recommend unless you're wanting
to make this a full-time gig :)

At this level of investigation, I usually start with giving
crm_simulate -V, which will show up to info-level logs. If that
doesn't make it clear, add another -V for debug logs, and then another
-V for trace logs, but that stretches the bounds of human
intelligibility. Somewhat more helpful is PCMK_trace_tags=<resource>
before crm_simulate, which will give some trace-level output for
the given resource without swamping you with infinite detail. For
clones it's best to use PCMK_trace_tags=<clone>,<instance>,
and sometimes even <instance>:0, etc.
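
Concretely, those invocations might look like the following (the
pe-input file name is a placeholder for a saved transition input):

    # Replay a saved transition with increasing verbosity.
    crm_simulate -S -x pe-input.xml -V
    crm_simulate -S -x pe-input.xml -VVV

    # Trace only the tagged resource and one clone instance of it.
    PCMK_trace_tags=king_resource,king_resource:0 \
        crm_simulate -S -x pe-input.xml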

> Thanks again for your help. 
> 
> 
> 
> On 28 Jun 2019 6:46 pm, Andrei Borzenkov  wrote:
> > On Fri, Jun 28, 2019 at 7:24 AM Harvey Shepherd
> >  wrote:
> > >
> > > Hi All,
> > >
> > >
> > > I'm running Pacemaker 2.0.2 on a two node cluster. It runs one
> > master/slave resource (I'll refer to it as the king resource) and
> > about 20 other resources which are a mixture of:
> > >
> > >
> > > - resources that only run on the king resource master node
> > (colocation constraint with a score of INFINITY)
> > >
> > > - clone resources that run on both nodes
> > >
> > > - two other master/slave resources where the masters runs on the
> > same node as the king resource master (colocation constraint with a
> > score of INFINITY)
> > >
> > >
> > > I'll refer to the above set of resources as servant resources.
> > >
> > >
> > > All servant resources have a resource-stickiness of zero and the
> > king resource has a resource-stickiness of 100. There is an
> > ordering constraint that the king resource must start before all
> > servant resources. The king resource is controlled by an OCF script
> > that uses crm_master to set the preferred master for the king
> > resource 

Re: [ClusterLabs] Problems with master/slave failovers

2019-06-28 Thread Harvey Shepherd
Thanks for your reply Andrei. Whilst I understand what you say about the 
difficulties of diagnosing issues without all of the info, it's a compromise 
between a mailing list posting being very verbose in which case nobody wants to 
read it, and containing enough relevant information for someone to be able to 
help. With 20+ resources involved during a failover there are literally 
thousands of logs generated, and it would be pointless to post them all.

I've tried to focus in on the king resource only to keep things simple, as that 
is the only resource that can initiate a failover. I provided the real master 
scores and transition decisions made by pacemaker at the times that I killed 
the king master resource by showing the crm_simulator output from both tests, 
and the CIB config is ss described. As I mentioned, migration-threshold is set 
to zero for all resources, so it shouldn't prevent a second failover.

Regarding the resource agent return codes, the failure is detected by the 10s 
king resource master instance monitor operation, which returns OCF_ERR_GENERIC 
because the resource is expected to be running and isn't (the OCF resource 
agent developers guide states that monitor should only return OCF_NOT_RUNNING 
if there is no error condition that caused the resource to stop).
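
As an illustration of that convention, a monitor function along those
lines might look like the sketch below (the pid file path and the
king_is_master helper are hypothetical; ocf-shellfuncs is assumed to be
sourced for the OCF_* constants):

    king_monitor() {
        # Never started or cleanly stopped: no error condition.
        [ -f /var/run/king.pid ] || return $OCF_NOT_RUNNING
        # Pid file exists but the process is gone: the resource was
        # expected to be running and isn't, so report a real error.
        kill -0 "$(cat /var/run/king.pid)" 2>/dev/null || return $OCF_ERR_GENERIC
        # Running: report the current role of this promotable instance.
        if king_is_master; then
            return $OCF_RUNNING_MASTER
        fi
        return $OCF_SUCCESS
    }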

What would be really helpful would be if you or someone else could help me 
decipher the crm_simulate output:

1. What is the difference between clone_color and native_color?
2. What is the difference between "promotion scores" and "allocation scores" 
and why does the output show several instances of each?
3. How does pacemaker use those scores to decide whether to failover?
4. Why is there a -INFINITY score on one node?

Thanks again for your help.




Re: [ClusterLabs] Problems with master/slave failovers

2019-06-28 Thread Andrei Borzenkov
On Fri, Jun 28, 2019 at 7:24 AM Harvey Shepherd wrote:
>
> Hi All,
>
>
> I'm running Pacemaker 2.0.2 on a two node cluster. It runs one master/slave 
> resource (I'll refer to it as the king resource) and about 20 other resources 
> which are a mixture of:
>
>
> - resources that only run on the king resource master node (colocation 
> constraint with a score of INFINITY)
>
> - clone resources that run on both nodes
>
> - two other master/slave resources where the masters run on the same node as 
> the king resource master (colocation constraint with a score of INFINITY)
>
>
> I'll refer to the above set of resources as servant resources.
>
>
> All servant resources have a resource-stickiness of zero and the king 
> resource has a resource-stickiness of 100. There is an ordering constraint 
> that the king resource must start before all servant resources. The king 
> resource is controlled by an OCF script that uses crm_master to set the 
> preferred master for the king resource (current master has value 100, current 
> slave is 5, unassigned role or resource failure is 1) - I've verified that 
> these values are being set as expected upon promotion/demotion/failure etc, 
> via the logs. That's pretty much all of the configuration - there is no 
> configuration around node preferences and migration-threshold is zero for 
> everything.
>
>
> What I'm trying to achieve is fairly simple:
>
>
> 1. If any servant resource fails on either node, it is simply restarted. 
> These resources should never fail over onto the other node because of 
> colocation with the king resource, and they should not contribute in any way 
> to deciding whether the king resource should fail over (which is why they 
> have a resource-stickiness of zero).
>
> 2. If the slave instance of the king resource fails, it should simply be 
> restarted and again no failover should occur.
>
> 3. If the master instance of the king resource fails, then its slave instance 
> should immediately be promoted, and the failed instance should be restarted. 
> Failover of all servant resources should then occur due to the colocation 
> dependency.
>
>
> It's number 3 above that I'm having trouble with. If I kill the master king 
> resource instance it behaves as I expect - everything fails over and the king 
> resource is restarted on the new slave. If I then kill the master instance of 
> the king resource again however, instead of failing back over to its original 
> node, it restarts and promotes back to master on the same node. This is not 
> what I want.
>

migration-threshold is the first thing that comes to mind. Another
possibility is a hard error returned by the resource agent that forces
the resource off the node.
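
For example, fail counts can be checked with something like:

  crm_mon -1f                              # one-shot status with fail counts
  crm_failcount --query -r king_resource   # fail count for one resource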

But please realize that without the actual configuration and logs from the
time the undesired behavior happens, this just becomes a game of riddles.
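
crm_report can bundle the configuration and logs from the relevant window
into a single archive, e.g.:

  crm_report -f "2019-06-27 00:00" /tmp/failover-report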


[ClusterLabs] Problems with master/slave failovers

2019-06-27 Thread Harvey Shepherd
Hi All,


I'm running Pacemaker 2.0.2 on a two node cluster. It runs one master/slave 
resource (I'll refer to it as the king resource) and about 20 other resources 
which are a mixture of:


- resources that only run on the king resource master node (colocation 
constraint with a score of INFINITY)

- clone resources that run on both nodes

- two other master/slave resources where the masters run on the same node as 
the king resource master (colocation constraint with a score of INFINITY)


I'll refer to the above set of resources as servant resources.


All servant resources have a resource-stickiness of zero and the king resource 
has a resource-stickiness of 100. There is an ordering constraint that the king 
resource must start before all servant resources. The king resource is 
controlled by an OCF script that uses crm_master to set the preferred master 
for the king resource (current master has value 100, current slave is 5, 
unassigned role or resource failure is 1) - I've verified that these values are 
being set as expected upon promotion/demotion/failure etc, via the logs. That's 
pretty much all of the configuration - there is no configuration around node 
preferences and migration-threshold is zero for everything.
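
A sketch of the crm_master calls such an agent typically makes (the scores 
are the ones described above; exactly where each call is made is 
agent-specific):

  crm_master -l reboot -v 100   # this instance is the current master
  crm_master -l reboot -v 5     # this instance is the current slave
  crm_master -l reboot -v 1     # role unassigned, or instance just failed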


What I'm trying to achieve is fairly simple:


1. If any servant resource fails on either node, it is simply restarted. These 
resources should never fail over onto the other node because of colocation 
with the king resource, and they should not contribute in any way to deciding 
whether the king resource should fail over (which is why they have a 
resource-stickiness of zero).

2. If the slave instance of the king resource fails, it should simply be 
restarted and again no failover should occur.

3. If the master instance of the king resource fails, then its slave instance 
should immediately be promoted, and the failed instance should be restarted. 
Failover of all servant resources should then occur due to the colocation 
dependency.
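
Expressed as CIB constraints, the relationships described above would look 
roughly like this (servant_rsc is a placeholder name; a sketch, not the 
actual configuration):

  <rsc_colocation id="servant-with-king-master" rsc="servant_rsc"
      with-rsc="ms_king_resource" with-rsc-role="Master" score="INFINITY"/>
  <rsc_order id="king-before-servants" first="ms_king_resource"
      first-action="start" then="servant_rsc" then-action="start"/>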


It's number 3 above that I'm having trouble with. If I kill the master king 
resource instance it behaves as I expect - everything fails over and the king 
resource is restarted on the new slave. If I then kill the master instance of 
the king resource again however, instead of failing back over to its original 
node, it restarts and promotes back to master on the same node. This is not 
what I want.


The relevant output from crm_simulate for the two tests is shown below. Can 
anyone suggest what might be going wrong? Whilst I really like the concept of 
crm_simulate, I can't find a good description of how to interpret the output 
and I don't understand the difference between clone_color and native_color, or 
the difference between "promotion scores" and the various instances of 
"allocation scores", nor does it really tell me what is contributing to the 
scores. Where does the -INFINITY allocation score come from for example?


Thanks,

Harvey



FIRST KING RESOURCE MASTER FAILURE (CORRECT BEHAVIOUR - MASTER NODE FAILOVER 
OCCURS)


 Clone Set: ms_king_resource [king_resource] (promotable)
 king_resource      (ocf::aviat:king-resource-ocf):     FAILED Master secondary
clone_color: ms_king_resource allocation score on primary: 0
clone_color: ms_king_resource allocation score on secondary: 0
clone_color: king_resource:0 allocation score on primary: 0
clone_color: king_resource:0 allocation score on secondary: 101
clone_color: king_resource:1 allocation score on primary: 200
clone_color: king_resource:1 allocation score on secondary: 0
native_color: king_resource:1 allocation score on primary: 200
native_color: king_resource:1 allocation score on secondary: 0
native_color: king_resource:0 allocation score on primary: -INFINITY
native_color: king_resource:0 allocation score on secondary: 101
king_resource:1 promotion score on primary: 100
king_resource:0 promotion score on secondary: 1
 * Recover    king_resource:0     (  Master -> Slave secondary )
 * Promote    king_resource:1     (   Slave -> Master primary )
 * Resource action: king_resource   cancel=1 on secondary
 * Resource action: king_resource   cancel=11000 on primary
 * Pseudo action:   ms_king_resource_pre_notify_demote_0
 * Resource action: king_resource   notify on secondary
 * Resource action: king_resource   notify on primary
 * Pseudo action:   ms_king_resource_confirmed-pre_notify_demote_0
 * Pseudo action:   ms_king_resource_demote_0
 * Resource action: king_resource   demote on secondary
 * Pseudo action:   ms_king_resource_demoted_0
 * Pseudo action:   ms_king_resource_post_notify_demoted_0
 * Resource action: king_resource   notify on secondary
 * Resource action: king_resource   notify on primary
 * Pseudo action:   ms_king_resource_confirmed-post_notify_demoted_0
 * Pseudo action:   ms_king_resource_pre_notify_stop_0
 * Resource action: king_resource   notify on secondary
 * Resource action: king_resource   notify on primary
 * Pseudo action:
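
One plausible reading of the scores above, assuming an instance's allocation 
score on its active node is its crm_master value plus the resource-stickiness 
of 100: 101 on secondary = 1 (post-failure master score) + 100, and 200 on 
primary = 100 (master score) + 100. The -INFINITY for king_resource:0 on 
primary would then simply reflect that two instances of the same clone cannot 
share a node: once king_resource:1 is allocated to primary, king_resource:0 
is banned there.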