Re: [ClusterLabs] crmsh resource failcount does not appear to work

2018-01-01 Thread Andrei Borzenkov
02.01.2018 06:48, Ken Gaillot wrote:
> On Wed, 2017-12-27 at 14:03 +0300, Andrei Borzenkov wrote:
>> On Wed, Dec 27, 2017 at 11:40 AM, Kristoffer Grönlund wrote:
>>>
>>> Andrei Borzenkov writes:
>>>
>>>> As far as I can tell, pacemaker acts on failcount attributes
>>>> qualified by operation name, while crm sets/queries unqualified
>>>> attribute; I do not see any syntax to set fail-count for specific
>>>> operation in crmsh.
>>>
>>> crmsh uses crm_attribute to get the failcount. It could be that this
>>> usage has stopped working as of 1.1.17..
>>>
>>
>> There is probably a misunderstanding. The problem is which attribute
>> is used, not how it is set. crmsh sets (and as far as I can tell has
>> always set) an attribute named fail-count-<resource>, while pacemaker
>> internally sets and queries attributes named
>> fail-count-<resource>#<operation>.
>>
>> It is possible that this has changed in recent pacemaker versions of
>> course ... yep, here is crm_failcount commit that implemented new
>> (per-operation) failcounts. Which means "crm resource failcount set"
>> without qualifying by operation is simply not valid ... actually
>> crm_failcount will refuse to set failcount at all (only clear it).
> 
> Hmm, I didn't realize crm shell supported setting a fail count.
> 
> We discourage setting a fail count attribute directly as of 1.1.17, as
> having a fail count without any failed operation history or last
> failure time can be confusing to users (no failures would show up in
> status, yet failure recovery behavior would be in effect, and failure
> timeouts would not work properly).
> 
> It is possible to set the new per-operation attributes directly, if
> that capability is still desired, but I'm not sure there's a good
> reason to do so.
> 
> crm_failcount is a better choice than crm_attribute for querying and
> clearing fail count attributes, as it will handle summing per-operation 
> fail counts if a resource total fail count is desired. Clearing a fail
> count is now equivalent to crm_resource --cleanup, so it keeps the
> operation history and last failure times consistent.
> 

The problem is that neither "crm resource failcount show" nor "crm
resource failcount delete" works anymore - that is how I hit this issue
in the first place. I do not particularly care whether it is possible to
set failcounts, although I can see it could be useful for testing.

If it is decided to allow setting them, maybe crmsh could default to the
"monitor" operation if none is explicitly given - that is likely what
most users mean, since in normal operation the failures we expect are
recurring monitor errors.

Although I suppose that crmsh should really be using crm_failcount,
which would make support for "set" a matter for core pacemaker.
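
To make the default-to-monitor idea concrete, here is a minimal sketch in
Python (the language crmsh is written in). It is not crmsh code: the
function name fail_count_attr and its parameters are hypothetical, and the
fail-count-<resource>#<operation>_<interval> layout is only inferred from
the cibadmin output in this thread. The caller would still have to supply
the interval of the operation.

```python
def fail_count_attr(resource: str, interval: int, operation: str = "monitor") -> str:
    """Build a per-operation fail-count attribute name.

    Hypothetical helper, not part of crmsh or pacemaker. The name layout
    is an assumption inferred from the cibadmin output in this thread.
    """
    if not resource or "#" in resource:
        raise ValueError(f"invalid resource name: {resource!r}")
    if interval < 0:
        raise ValueError("interval must not be negative")
    return f"fail-count-{resource}#{operation}_{interval}"


if __name__ == "__main__":
    # No operation given, so "monitor" is assumed.
    print(fail_count_attr("rsc_Stateful_1", 1))
    # An explicit operation overrides the default.
    print(fail_count_attr("rsc_Stateful_1", 0, operation="start"))
```

With the resource from my test cluster, the first call yields the same
name that cibadmin shows for the monitor failure.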

> FYI the per-operation fail counts are not particularly useful now, but
> they will make future failure handling enhancements possible, e.g.
> configuring start-failure-is-fatal per resource, or ignoring a certain
> number of monitor failures before recovering while still recovering
> immediately for other operation failures.
> 
>>
>> https://github.com/ClusterLabs/pacemaker/commit/8323616179dc3f8038c6a69e7323757bd1feacb1#diff-6e58482648938fd488a920b9902daac4
>>
>>
>>>
>>> Cheers,
>>> Kristoffer
>>>

>>>> ha1:~ # rpm -q crmsh
>>>> crmsh-4.0.0+git.1511604050.816cb0f5-1.1.noarch
>>>> ha1:~ # crm_mon -1rf
>>>> Stack: corosync
>>>> Current DC: ha2 (version 1.1.17-3.3-36d2962a8) - partition with quorum
>>>> Last updated: Sun Dec 24 10:55:54 2017
>>>> Last change: Sun Dec 24 10:55:47 2017 by hacluster via crmd on ha2
>>>>
>>>> 2 nodes configured
>>>> 4 resources configured
>>>>
>>>> Online: [ ha1 ha2 ]
>>>>
>>>> Full list of resources:
>>>>
>>>>  stonith-sbd  (stonith:external/sbd): Started ha1
>>>>  rsc_dummy_1  (ocf::pacemaker:Dummy): Started ha2
>>>>  Master/Slave Set: ms_Stateful_1 [rsc_Stateful_1]
>>>>  Masters: [ ha1 ]
>>>>  Slaves: [ ha2 ]
>>>>
>>>> Migration Summary:
>>>> * Node ha2:
>>>> * Node ha1:
>>>> ha1:~ # echo xxx > /run/Stateful-rsc_Stateful_1.state
>>>> ha1:~ # crm_failcount -G -r rsc_Stateful_1
>>>> scope=status  name=fail-count-rsc_Stateful_1 value=1
>>>> ha1:~ # crm resource failcount rsc_Stateful_1 show ha1
>>>> scope=status  name=fail-count-rsc_Stateful_1 value=0
>>>> ha1:~ # crm resource failcount rsc_Stateful_1 set ha1 4
>>>> ha1:~ # crm_failcount -G -r rsc_Stateful_1
>>>> scope=status  name=fail-count-rsc_Stateful_1 value=1
>>>> ha1:~ # crm resource failcount rsc_Stateful_1 show ha1
>>>> scope=status  name=fail-count-rsc_Stateful_1 value=4
>>>> ha1:~ # cibadmin -Q | grep fail-count
>>>>   <nvpair id="status-1084752129-fail-count-rsc_Stateful_1.monitor_1"
>>>> name="fail-count-rsc_Stateful_1#monitor_1" value="1"/>
>>>>   <nvpair ... name="fail-count-rsc_Stateful_1" value="4"/>
>>>> ha1:~ #
>>>>
>>>> ___
>>>> Users mailing list: Users@clusterlabs.org
 

Re: [ClusterLabs] crmsh resource failcount does not appear to work

2018-01-01 Thread Ken Gaillot
On Wed, 2017-12-27 at 14:03 +0300, Andrei Borzenkov wrote:
> On Wed, Dec 27, 2017 at 11:40 AM, Kristoffer Grönlund wrote:
> > 
> > Andrei Borzenkov writes:
> > 
> > > As far as I can tell, pacemaker acts on failcount attributes
> > > qualified by operation name, while crm sets/queries unqualified
> > > attribute; I do not see any syntax to set fail-count for specific
> > > operation in crmsh.
> > 
> > crmsh uses crm_attribute to get the failcount. It could be that this
> > usage has stopped working as of 1.1.17..
> > 
> 
> There is probably a misunderstanding. The problem is which attribute
> is used, not how it is set. crmsh sets (and as far as I can tell has
> always set) an attribute named fail-count-<resource>, while pacemaker
> internally sets and queries attributes named
> fail-count-<resource>#<operation>.
> 
> It is possible that this has changed in recent pacemaker versions of
> course ... yep, here is crm_failcount commit that implemented new
> (per-operation) failcounts. Which means "crm resource failcount set"
> without qualifying by operation is simply not valid ... actually
> crm_failcount will refuse to set failcount at all (only clear it).

Hmm, I didn't realize crm shell supported setting a fail count.

We discourage setting a fail count attribute directly as of 1.1.17, as
having a fail count without any failed operation history or last
failure time can be confusing to users (no failures would show up in
status, yet failure recovery behavior would be in effect, and failure
timeouts would not work properly).

It is possible to set the new per-operation attributes directly, if
that capability is still desired, but I'm not sure there's a good
reason to do so.

crm_failcount is a better choice than crm_attribute for querying and
clearing fail count attributes, as it will handle summing per-operation 
fail counts if a resource total fail count is desired. Clearing a fail
count is now equivalent to crm_resource --cleanup, so it keeps the
operation history and last failure times consistent.
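
As a rough illustration of the summing described above, here is a short
Python sketch. It only shows the idea and is not how crm_failcount is
actually implemented; the function name total_fail_count is hypothetical,
the attribute names are taken from the cibadmin output in this thread, and
values are assumed to be plain integers (a non-numeric value such as
INFINITY is not handled).

```python
from typing import Dict


def total_fail_count(attrs: Dict[str, str], resource: str) -> int:
    """Sum the per-operation fail counts of one resource.

    attrs maps status attribute names to their string values. Only names
    of the form fail-count-<resource>#... are counted, so an old-style
    unqualified fail-count-<resource> attribute is ignored.
    """
    prefix = f"fail-count-{resource}#"
    total = 0
    for name, value in attrs.items():
        if name.startswith(prefix):
            total += int(value)
    return total


if __name__ == "__main__":
    # The two attributes present in the reporter's CIB.
    attrs = {
        "fail-count-rsc_Stateful_1#monitor_1": "1",
        "fail-count-rsc_Stateful_1": "4",
    }
    print(total_fail_count(attrs, "rsc_Stateful_1"))
```

With the two attributes from the report, only the per-operation one
contributes, which is consistent with crm_failcount -G still returning 1
after the unqualified attribute was set to 4.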

FYI the per-operation fail counts are not particularly useful now, but
they will make future failure handling enhancements possible, e.g.
configuring start-failure-is-fatal per resource, or ignoring a certain
number of monitor failures before recovering while still recovering
immediately for other operation failures.

> 
> https://github.com/ClusterLabs/pacemaker/commit/8323616179dc3f8038c6a69e7323757bd1feacb1#diff-6e58482648938fd488a920b9902daac4
-- 
Ken Gaillot 


Re: [ClusterLabs] crmsh resource failcount does not appear to work

2017-12-27 Thread Kristoffer Grönlund
Andrei Borzenkov  writes:

> As far as I can tell, pacemaker acts on failcount attributes qualified
> by operation name, while crm sets/queries unqualified attribute; I do
> not see any syntax to set fail-count for specific operation in crmsh.

crmsh uses crm_attribute to get the failcount. It could be that this
usage has stopped working as of 1.1.17..

Cheers,
Kristoffer


-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] crmsh resource failcount does not appear to work

2017-12-27 Thread Andrei Borzenkov
On Wed, Dec 27, 2017 at 11:40 AM, Kristoffer Grönlund wrote:
>
> Andrei Borzenkov  writes:
>
> > As far as I can tell, pacemaker acts on failcount attributes qualified
> > by operation name, while crm sets/queries unqualified attribute; I do
> > not see any syntax to set fail-count for specific operation in crmsh.
>
> crmsh uses crm_attribute to get the failcount. It could be that this
> usage has stopped working as of 1.1.17..
>

There is probably a misunderstanding. The problem is which attribute is
used, not how it is set. crmsh sets (and as far as I can tell has always
set) an attribute named fail-count-<resource>, while pacemaker
internally sets and queries attributes named
fail-count-<resource>#<operation>.

It is possible that this has changed in recent pacemaker versions of
course ... yep, here is crm_failcount commit that implemented new
(per-operation) failcounts. Which means "crm resource failcount set"
without qualifying by operation is simply not valid ... actually
crm_failcount will refuse to set failcount at all (only clear it).

https://github.com/ClusterLabs/pacemaker/commit/8323616179dc3f8038c6a69e7323757bd1feacb1#diff-6e58482648938fd488a920b9902daac4
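
To spell out the difference between the two naming schemes, here is a
small Python sketch that splits an attribute name into its parts. The
pattern is my assumption, based only on the names that show up in the
cibadmin output in this thread (resource, then "#", then operation and a
numeric interval joined by "_"); parse_fail_count_name and FailCountName
are made-up names, not pacemaker or crmsh APIs.

```python
import re
from typing import NamedTuple, Optional

# Assumed layouts, inferred from the cibadmin output in this thread:
#   fail-count-<resource>                          (unqualified)
#   fail-count-<resource>#<operation>_<interval>   (per-operation)
_FAIL_COUNT = re.compile(
    r"^fail-count-(?P<rsc>[^#]+)(?:#(?P<op>.+)_(?P<interval>\d+))?$"
)


class FailCountName(NamedTuple):
    resource: str
    operation: Optional[str]  # None for the unqualified form
    interval: Optional[int]   # None for the unqualified form


def parse_fail_count_name(name: str) -> Optional[FailCountName]:
    """Return the parts of a fail-count attribute name, or None if the
    name does not look like a fail-count attribute."""
    match = _FAIL_COUNT.match(name)
    if match is None:
        return None
    interval = match.group("interval")
    return FailCountName(
        resource=match.group("rsc"),
        operation=match.group("op"),
        interval=int(interval) if interval is not None else None,
    )


if __name__ == "__main__":
    for attr in ("fail-count-rsc_Stateful_1#monitor_1", "fail-count-rsc_Stateful_1"):
        print(attr, "->", parse_fail_count_name(attr))
```

The name crmsh writes parses with no operation at all, which is why
pacemaker never looks at it.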




[ClusterLabs] crmsh resource failcount does not appear to work

2017-12-24 Thread Andrei Borzenkov
As far as I can tell, pacemaker acts on failcount attributes qualified
by operation name, while crm sets and queries an unqualified attribute;
I do not see any syntax to set the fail-count for a specific operation
in crmsh.

ha1:~ # rpm -q crmsh
crmsh-4.0.0+git.1511604050.816cb0f5-1.1.noarch
ha1:~ # crm_mon -1rf
Stack: corosync
Current DC: ha2 (version 1.1.17-3.3-36d2962a8) - partition with quorum
Last updated: Sun Dec 24 10:55:54 2017
Last change: Sun Dec 24 10:55:47 2017 by hacluster via crmd on ha2

2 nodes configured
4 resources configured

Online: [ ha1 ha2 ]

Full list of resources:

 stonith-sbd  (stonith:external/sbd): Started ha1
 rsc_dummy_1  (ocf::pacemaker:Dummy): Started ha2
 Master/Slave Set: ms_Stateful_1 [rsc_Stateful_1]
 Masters: [ ha1 ]
 Slaves: [ ha2 ]

Migration Summary:
* Node ha2:
* Node ha1:
ha1:~ # echo xxx > /run/Stateful-rsc_Stateful_1.state
ha1:~ # crm_failcount -G -r rsc_Stateful_1
scope=status  name=fail-count-rsc_Stateful_1 value=1
ha1:~ # crm resource failcount rsc_Stateful_1 show ha1
scope=status  name=fail-count-rsc_Stateful_1 value=0
ha1:~ # crm resource failcount rsc_Stateful_1 set ha1 4
ha1:~ # crm_failcount -G -r rsc_Stateful_1
scope=status  name=fail-count-rsc_Stateful_1 value=1
ha1:~ # crm resource failcount rsc_Stateful_1 show ha1
scope=status  name=fail-count-rsc_Stateful_1 value=4
ha1:~ # cibadmin -Q | grep fail-count
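  <nvpair id="status-1084752129-fail-count-rsc_Stateful_1.monitor_1" name="fail-count-rsc_Stateful_1#monitor_1" value="1"/>
  <nvpair ... name="fail-count-rsc_Stateful_1" value="4"/>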
ha1:~ #
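
For anyone who wants to check their own CIB for both kinds of attribute
without grep, here is a minimal Python sketch that pulls the fail-count
attributes out of CIB XML. The sample fragment is only modelled on the two
attributes reported by cibadmin in this session: the wrapper elements and
the id of the second nvpair are placeholders I made up, and
fail_count_attrs is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Sample modelled on this session. The wrapper elements and the id
# "example-id" are placeholders for illustration only.
SAMPLE = """
<transient_attributes id="1084752129">
  <instance_attributes id="status-1084752129">
    <nvpair id="status-1084752129-fail-count-rsc_Stateful_1.monitor_1"
            name="fail-count-rsc_Stateful_1#monitor_1" value="1"/>
    <nvpair id="example-id"
            name="fail-count-rsc_Stateful_1" value="4"/>
  </instance_attributes>
</transient_attributes>
"""


def fail_count_attrs(xml_text: str) -> Dict[str, str]:
    """Map every fail-count-* nvpair name in the XML to its value."""
    root = ET.fromstring(xml_text.strip())
    return {
        nv.get("name"): nv.get("value")
        for nv in root.iter("nvpair")
        if nv.get("name", "").startswith("fail-count-")
    }


if __name__ == "__main__":
    for name, value in fail_count_attrs(SAMPLE).items():
        # A "#" in the name marks the per-operation form.
        kind = "per-operation" if "#" in name else "unqualified"
        print(f"{kind}: {name} = {value}")
```

On the sample this lists one per-operation attribute with value 1 and one
unqualified attribute with value 4, the same split the commands above show.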
