Re: [ClusterLabs] PCMK_OCF_DEGRADED (_MASTER): exit codes are mapped to PCMK_OCF_UNKNOWN_ERROR

2017-03-06 Thread Ken Gaillot
On 03/06/2017 04:15 PM, Lars Ellenberg wrote:
> On Mon, Mar 06, 2017 at 12:35:18PM -0600, Ken Gaillot wrote:
>> diff --git a/lrmd/lrmd.c b/lrmd/lrmd.c
>> index 724edb7..39a7dd1 100644
>> --- a/lrmd/lrmd.c
>> +++ b/lrmd/lrmd.c
>> @@ -800,11 +800,40 @@ hb2uniform_rc(const char *action, int rc, const 
>> char *stdout_data)
>>  static int
>>  ocf2uniform_rc(int rc)
>>  {
>> -if (rc < 0 || rc > PCMK_OCF_FAILED_MASTER) {
>> -return PCMK_OCF_UNKNOWN_ERROR;

 Let's simply use > PCMK_OCF_OTHER_ERROR here, since that's guaranteed to
 be the high end.

 Lars, do you want to test that?
>>>
>>> Why would we want to filter at all, then?
>>>
>>> I get it that we may want to map non-ocf agent exit codes
>>> into the "ocf" range,
>>> but why mask exit codes from "ocf" agents at all (in lrmd)?
>>
>> It's probably unnecessarily paranoid, but I guess the idea is to check
>> that the agent at least returns something in the expected range for OCF
> 
> Well, yes. But, if we are going to allow the range 0 to 199,
> I don't see any reason to hide the range 200 to 255.

Ideally, all non-LSB OCF return codes should be in the LSB spec's
150-199 range reserved for application use, since OCF aims for some
degree of LSB compatibility. RUNNING_MASTER and FAILED_MASTER got into
the wild before that policy was set, so they're exceptions. But it's
reasonable to require that all future OCF codes fall into that range.

>> (perhaps it's not complying with the spec, or complying with a newer
>> version of the spec than we can handle).
> 
> Or, as in the case of PCMK_OCF_DEGRADED, complying with a newer version
> of the spec that probably would have been handled fine, if it wasn't for
> the unneccesary paranoia ;-)
> 
> Lars

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] PCMK_OCF_DEGRADED (_MASTER): exit codes are mapped to PCMK_OCF_UNKNOWN_ERROR

2017-03-06 Thread Lars Ellenberg
On Mon, Mar 06, 2017 at 12:35:18PM -0600, Ken Gaillot wrote:
>  diff --git a/lrmd/lrmd.c b/lrmd/lrmd.c
>  index 724edb7..39a7dd1 100644
>  --- a/lrmd/lrmd.c
>  +++ b/lrmd/lrmd.c
>  @@ -800,11 +800,40 @@ hb2uniform_rc(const char *action, int rc, const 
>  char *stdout_data)
>   static int
>   ocf2uniform_rc(int rc)
>   {
>  -if (rc < 0 || rc > PCMK_OCF_FAILED_MASTER) {
>  -return PCMK_OCF_UNKNOWN_ERROR;
> >>
> >> Let's simply use > PCMK_OCF_OTHER_ERROR here, since that's guaranteed to
> >> be the high end.
> >>
> >> Lars, do you want to test that?
> > 
> > Why would we want to filter at all, then?
> > 
> > I get it that we may want to map non-ocf agent exit codes
> > into the "ocf" range,
> > but why mask exit codes from "ocf" agents at all (in lrmd)?
> 
> It's probably unnecessarily paranoid, but I guess the idea is to check
> that the agent at least returns something in the expected range for OCF

Well, yes. But, if we are going to allow the range 0 to 199,
I don't see any reason to hide the range 200 to 255.

> (perhaps it's not complying with the spec, or complying with a newer
> version of the spec than we can handle).

Or, as in the case of PCMK_OCF_DEGRADED, complying with a newer version
of the spec that probably would have been handled fine, if it wasn't for
the unneccesary paranoia ;-)

Lars


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] PCMK_OCF_DEGRADED (_MASTER): exit codes are mapped to PCMK_OCF_UNKNOWN_ERROR

2017-03-06 Thread Ken Gaillot
On 03/06/2017 10:55 AM, Lars Ellenberg wrote:
> On Thu, Mar 02, 2017 at 05:31:33PM -0600, Ken Gaillot wrote:
>> On 03/01/2017 05:28 PM, Andrew Beekhof wrote:
>>> On Tue, Feb 28, 2017 at 12:06 AM, Lars Ellenberg
>>>  wrote:
 When I recently tried to make use of the DEGRADED monitoring results,
 I found out that it does still not work.

 Because LRMD choses to filter them in ocf2uniform_rc(),
 and maps them to PCMK_OCF_UNKNOWN_ERROR.

 See patch suggestion below.

 It also filters away the other "special" rc values.
 Do we really not want to see them in crmd/pengine?
>>>
>>> I would think we do.
> 
 Note: I did build it, but did not use this yet,
 so I have no idea if the rest of the implementation of the DEGRADED
 stuff works as intended or if there are other things missing as well.
>>>
>>> failcount might be the other place that needs some massaging.
>>> specifically, not incrementing it when a degraded rc comes through
>>
>> I think that's already taken care of.
>>
 Thougts?\
>>>
>>> looks good to me
>>>

 diff --git a/lrmd/lrmd.c b/lrmd/lrmd.c
 index 724edb7..39a7dd1 100644
 --- a/lrmd/lrmd.c
 +++ b/lrmd/lrmd.c
 @@ -800,11 +800,40 @@ hb2uniform_rc(const char *action, int rc, const char 
 *stdout_data)
  static int
  ocf2uniform_rc(int rc)
  {
 -if (rc < 0 || rc > PCMK_OCF_FAILED_MASTER) {
 -return PCMK_OCF_UNKNOWN_ERROR;
>>
>> Let's simply use > PCMK_OCF_OTHER_ERROR here, since that's guaranteed to
>> be the high end.
>>
>> Lars, do you want to test that?
> 
> Why would we want to filter at all, then?
> 
> I get it that we may want to map non-ocf agent exit codes
> into the "ocf" range,
> but why mask exit codes from "ocf" agents at all (in lrmd)?
> 
> Lars

It's probably unnecessarily paranoid, but I guess the idea is to check
that the agent at least returns something in the expected range for OCF
(perhaps it's not complying with the spec, or complying with a newer
version of the spec than we can handle).

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] PCMK_OCF_DEGRADED (_MASTER): exit codes are mapped to PCMK_OCF_UNKNOWN_ERROR

2017-03-06 Thread Lars Ellenberg
On Thu, Mar 02, 2017 at 05:31:33PM -0600, Ken Gaillot wrote:
> On 03/01/2017 05:28 PM, Andrew Beekhof wrote:
> > On Tue, Feb 28, 2017 at 12:06 AM, Lars Ellenberg
> >  wrote:
> >> When I recently tried to make use of the DEGRADED monitoring results,
> >> I found out that it does still not work.
> >>
> >> Because LRMD choses to filter them in ocf2uniform_rc(),
> >> and maps them to PCMK_OCF_UNKNOWN_ERROR.
> >>
> >> See patch suggestion below.
> >>
> >> It also filters away the other "special" rc values.
> >> Do we really not want to see them in crmd/pengine?
> > 
> > I would think we do.

> >> Note: I did build it, but did not use this yet,
> >> so I have no idea if the rest of the implementation of the DEGRADED
> >> stuff works as intended or if there are other things missing as well.
> > 
> > failcount might be the other place that needs some massaging.
> > specifically, not incrementing it when a degraded rc comes through
> 
> I think that's already taken care of.
> 
> >> Thougts?\
> > 
> > looks good to me
> > 
> >>
> >> diff --git a/lrmd/lrmd.c b/lrmd/lrmd.c
> >> index 724edb7..39a7dd1 100644
> >> --- a/lrmd/lrmd.c
> >> +++ b/lrmd/lrmd.c
> >> @@ -800,11 +800,40 @@ hb2uniform_rc(const char *action, int rc, const char 
> >> *stdout_data)
> >>  static int
> >>  ocf2uniform_rc(int rc)
> >>  {
> >> -if (rc < 0 || rc > PCMK_OCF_FAILED_MASTER) {
> >> -return PCMK_OCF_UNKNOWN_ERROR;
> 
> Let's simply use > PCMK_OCF_OTHER_ERROR here, since that's guaranteed to
> be the high end.
> 
> Lars, do you want to test that?

Why would we want to filter at all, then?

I get it that we may want to map non-ocf agent exit codes
into the "ocf" range,
but why mask exit codes from "ocf" agents at all (in lrmd)?

Lars


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] PCMK_OCF_DEGRADED (_MASTER): exit codes are mapped to PCMK_OCF_UNKNOWN_ERROR

2017-03-02 Thread Ken Gaillot
On 03/01/2017 05:28 PM, Andrew Beekhof wrote:
> On Tue, Feb 28, 2017 at 12:06 AM, Lars Ellenberg
>  wrote:
>> When I recently tried to make use of the DEGRADED monitoring results,
>> I found out that it does still not work.
>>
>> Because LRMD choses to filter them in ocf2uniform_rc(),
>> and maps them to PCMK_OCF_UNKNOWN_ERROR.
>>
>> See patch suggestion below.
>>
>> It also filters away the other "special" rc values.
>> Do we really not want to see them in crmd/pengine?
> 
> I would think we do.
> 
>> Why does LRMD think it needs to outsmart the pengine?
> 
> Because the person that implemented the feature incorrectly assumed
> the rc would be passed back unmolested.
> 
>>
>> Note: I did build it, but did not use this yet,
>> so I have no idea if the rest of the implementation of the DEGRADED
>> stuff works as intended or if there are other things missing as well.
> 
> failcount might be the other place that needs some massaging.
> specifically, not incrementing it when a degraded rc comes through

I think that's already taken care of.

>> Thougts?\
> 
> looks good to me
> 
>>
>> diff --git a/lrmd/lrmd.c b/lrmd/lrmd.c
>> index 724edb7..39a7dd1 100644
>> --- a/lrmd/lrmd.c
>> +++ b/lrmd/lrmd.c
>> @@ -800,11 +800,40 @@ hb2uniform_rc(const char *action, int rc, const char 
>> *stdout_data)
>>  static int
>>  ocf2uniform_rc(int rc)
>>  {
>> -if (rc < 0 || rc > PCMK_OCF_FAILED_MASTER) {
>> -return PCMK_OCF_UNKNOWN_ERROR;

Let's simply use > PCMK_OCF_OTHER_ERROR here, since that's guaranteed to
be the high end.

Lars, do you want to test that?

>> +switch (rc) {
>> +default:
>> +   return PCMK_OCF_UNKNOWN_ERROR;
>> +
>> +case PCMK_OCF_OK:
>> +case PCMK_OCF_UNKNOWN_ERROR:
>> +case PCMK_OCF_INVALID_PARAM:
>> +case PCMK_OCF_UNIMPLEMENT_FEATURE:
>> +case PCMK_OCF_INSUFFICIENT_PRIV:
>> +case PCMK_OCF_NOT_INSTALLED:
>> +case PCMK_OCF_NOT_CONFIGURED:
>> +case PCMK_OCF_NOT_RUNNING:
>> +case PCMK_OCF_RUNNING_MASTER:
>> +case PCMK_OCF_FAILED_MASTER:
>> +
>> +case PCMK_OCF_DEGRADED:
>> +case PCMK_OCF_DEGRADED_MASTER:
>> +   return rc;
>> +
>> +#if 0
>> +   /* What about these?? */
> 
> yes, these should get passed back as-is too
> 
>> +/* 150-199 reserved for application use */
>> +PCMK_OCF_CONNECTION_DIED = 189, /* Operation failure implied by 
>> disconnection of the LRM API to a local or remote node */
>> +
>> +PCMK_OCF_EXEC_ERROR= 192, /* Generic problem invoking the agent */
>> +PCMK_OCF_UNKNOWN   = 193, /* State of the service is unknown - used 
>> for recording in-flight operations */
>> +PCMK_OCF_SIGNAL= 194,
>> +PCMK_OCF_NOT_SUPPORTED = 195,
>> +PCMK_OCF_PENDING   = 196,
>> +PCMK_OCF_CANCELLED = 197,
>> +PCMK_OCF_TIMEOUT   = 198,
>> +PCMK_OCF_OTHER_ERROR   = 199, /* Keep the same codes as PCMK_LSB */
>> +#endif
>>  }
>> -
>> -return rc;
>>  }
>>
>>  static int

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] PCMK_OCF_DEGRADED (_MASTER): exit codes are mapped to PCMK_OCF_UNKNOWN_ERROR

2017-03-01 Thread Andrew Beekhof
On Tue, Feb 28, 2017 at 12:06 AM, Lars Ellenberg
 wrote:
> When I recently tried to make use of the DEGRADED monitoring results,
> I found out that it does still not work.
>
> Because LRMD choses to filter them in ocf2uniform_rc(),
> and maps them to PCMK_OCF_UNKNOWN_ERROR.
>
> See patch suggestion below.
>
> It also filters away the other "special" rc values.
> Do we really not want to see them in crmd/pengine?

I would think we do.

> Why does LRMD think it needs to outsmart the pengine?

Because the person that implemented the feature incorrectly assumed
the rc would be passed back unmolested.

>
> Note: I did build it, but did not use this yet,
> so I have no idea if the rest of the implementation of the DEGRADED
> stuff works as intended or if there are other things missing as well.

failcount might be the other place that needs some massaging.
specifically, not incrementing it when a degraded rc comes through

>
> Thougts?\

looks good to me

>
> diff --git a/lrmd/lrmd.c b/lrmd/lrmd.c
> index 724edb7..39a7dd1 100644
> --- a/lrmd/lrmd.c
> +++ b/lrmd/lrmd.c
> @@ -800,11 +800,40 @@ hb2uniform_rc(const char *action, int rc, const char 
> *stdout_data)
>  static int
>  ocf2uniform_rc(int rc)
>  {
> -if (rc < 0 || rc > PCMK_OCF_FAILED_MASTER) {
> -return PCMK_OCF_UNKNOWN_ERROR;
> +switch (rc) {
> +default:
> +   return PCMK_OCF_UNKNOWN_ERROR;
> +
> +case PCMK_OCF_OK:
> +case PCMK_OCF_UNKNOWN_ERROR:
> +case PCMK_OCF_INVALID_PARAM:
> +case PCMK_OCF_UNIMPLEMENT_FEATURE:
> +case PCMK_OCF_INSUFFICIENT_PRIV:
> +case PCMK_OCF_NOT_INSTALLED:
> +case PCMK_OCF_NOT_CONFIGURED:
> +case PCMK_OCF_NOT_RUNNING:
> +case PCMK_OCF_RUNNING_MASTER:
> +case PCMK_OCF_FAILED_MASTER:
> +
> +case PCMK_OCF_DEGRADED:
> +case PCMK_OCF_DEGRADED_MASTER:
> +   return rc;
> +
> +#if 0
> +   /* What about these?? */

yes, these should get passed back as-is too

> +/* 150-199 reserved for application use */
> +PCMK_OCF_CONNECTION_DIED = 189, /* Operation failure implied by 
> disconnection of the LRM API to a local or remote node */
> +
> +PCMK_OCF_EXEC_ERROR= 192, /* Generic problem invoking the agent */
> +PCMK_OCF_UNKNOWN   = 193, /* State of the service is unknown - used 
> for recording in-flight operations */
> +PCMK_OCF_SIGNAL= 194,
> +PCMK_OCF_NOT_SUPPORTED = 195,
> +PCMK_OCF_PENDING   = 196,
> +PCMK_OCF_CANCELLED = 197,
> +PCMK_OCF_TIMEOUT   = 198,
> +PCMK_OCF_OTHER_ERROR   = 199, /* Keep the same codes as PCMK_LSB */
> +#endif
>  }
> -
> -return rc;
>  }
>
>  static int
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org