Re: [ClusterLabs] Antw: Re: [Question] About a change of crm_failcount.

renayama19661014 Thu, 09 Feb 2017 02:28:19 -0800

Hi Ken,


> 1. Return a "hard" error such as OCF_ERR_ARGS or OCF_ERR_PERM. When
> Pacemaker gets one of these errors from an agent, it will ban the
> resource from that node (until the failure is cleared).

The first suggestion does not work well.

Even if the RA returns OCF_ERR_ARGS or OCF_ERR_PERM, the check runs in the RA's 
pre-promote notify handling, so the error is returned from a notify action.
Pacemaker does not record notify (pre-promote) errors in the CIB.

 * https://github.com/ClusterLabs/pacemaker/blob/master/crmd/lrm.c#L2411

Because the error is not recorded in the CIB, pengine cannot treat it as a 
"hard" error.


> 2. Use crm_resource --ban instead. This would ban the resource from that
> node until the user removes the ban with crm_resource --clear (or by
> deleting the ban constraint from the configuration).

The second suggestion works well.
I intend to adopt the second suggestion.
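
For reference, here is a minimal sketch of the ban and the later cleanup as they could be run from the command line. It only uses the long forms of the crm_resource options mentioned in this thread (--ban, --clear) plus --resource, --node and --quiet. The helper names (run, ban_resource, clear_ban), the DRY_RUN variable, and the resource and node names are hypothetical and exist only for illustration. By default the script prints the commands instead of executing them.

```shell
#!/bin/sh
# Dry-run sketch: prints the crm_resource commands instead of running them.
# Set DRY_RUN=0 on a real cluster node to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper: echo the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Ban the resource from a node (what the RA would request).
ban_resource() {
    run crm_resource --ban --resource "$1" --node "$2" --quiet
}

# Remove the ban again once the administrator has repaired the node.
clear_ban() {
    run crm_resource --clear --resource "$1" --node "$2"
}

# Hypothetical resource and node names, for illustration only.
ban_resource pgsql-example node-a
clear_ban pgsql-example node-a
```

The ban stays in the configuration until it is cleared, so the second command is a manual step for the administrator and not something the RA would do itself.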

As another method, I think crm_resource -F could also be used. What do you 
think?
Because crm_resource -F generates a pseudo-failure, I think last-failure would 
be handled without a problem as well.

crm_resource -F may be usable, but I will adopt crm_resource -B because the RA 
wants to stop the pgsql resource completely.

``` @pgsql RA

pgsql_pre_promote() {
(snip)
            if [ "$cmp_location" != "$my_master_baseline" ]; then
                ocf_exit_reason "My data is newer than new master's one. New master's location : $master_baseline"
                exec_with_retry 0 $CRM_RESOURCE -B -r $OCF_RESOURCE_INSTANCE -N $NODENAME -Q
                return $OCF_ERR_GENERIC
            fi
(snip)
    CRM_FAILCOUNT="${HA_SBIN_DIR}/crm_failcount"
    CRM_RESOURCE="${HA_SBIN_DIR}/crm_resource"
```

I will test the behavior a little more and then send a patch.


Best Regards,
Hideo Yamauchi.



----- Original Message -----
> From: Ulrich Windl <[email protected]>
> To: [email protected]; [email protected]
> Cc: 
> Date: 2017/2/6, Mon 17:44
> Subject: [ClusterLabs] Antw: Re: [Question] About a change of crm_failcount.
> 
>>>>  Ken Gaillot <[email protected]> wrote on 02.02.2017 at 19:33 in message <[email protected]>:
>>  On 02/02/2017 12:23 PM, [email protected] wrote:
>>>  Hi All,
>>> 
>>>  After the following fix, users can no longer set a value other than zero with crm_failcount.
>>> 
>>>   - [Fix: tools: implement crm_failcount command-line options correctly]
>>>     - https://github.com/ClusterLabs/pacemaker/commit/95db10602e8f646eefed335414e40a994498cafd#diff-6e58482648938fd488a920b9902daac4
>>> 
>>>  However, the pgsql RA sets the fail count to INFINITY in its script.
>>> 
>>>  ```
>>>  (snip)
>>>      CRM_FAILCOUNT="${HA_SBIN_DIR}/crm_failcount"
>>>  (snip)
>>>      ocf_exit_reason "My data is newer than new master's one. New master's location : $master_baseline"
>>>      exec_with_retry 0 $CRM_FAILCOUNT -r $OCF_RESOURCE_INSTANCE -U $NODENAME -v INFINITY
>>>      return $OCF_ERR_GENERIC
>>>  (snip)
>>>  ```
>>> 
>>>  As far as I can tell, only the pgsql RA is affected.
>>> 
>>>  Could you revise crm_failcount so that it can set a value other than zero again?
>>>  If that is not possible, we will modify the pgsql RA to use crm_attribute instead.
>>> 
>>>  Best Regards,
>>>  Hideo Yamauchi.
>> 
>>  Hmm, I didn't realize that was used. I changed it because it's not a
>>  good idea to set fail-count without also changing last-failure and
>>  having a failed op in the LRM history. I'll have to think about what the
>>  best alternative is.
> 
> The question also is whether the RA can achieve the same effect otherwise. I 
> thought CRM sets the failcount, not the RA...
> 
>> 
>>  _______________________________________________
>>  Users mailing list: [email protected] 
>>  http://lists.clusterlabs.org/mailman/listinfo/users 
>> 
>>  Project Home: http://www.clusterlabs.org 
>>  Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
>>  Bugs: http://bugs.clusterlabs.org 

