Re: [ClusterLabs] PCMK_OCF_DEGRADED (_MASTER): exit codes are mapped to PCMK_OCF_UNKNOWN_ERROR
On 03/06/2017 04:15 PM, Lars Ellenberg wrote: > On Mon, Mar 06, 2017 at 12:35:18PM -0600, Ken Gaillot wrote: >> diff --git a/lrmd/lrmd.c b/lrmd/lrmd.c >> index 724edb7..39a7dd1 100644 >> --- a/lrmd/lrmd.c >> +++ b/lrmd/lrmd.c >> @@ -800,11 +800,40 @@ hb2uniform_rc(const char *action, int rc, const >> char *stdout_data) >> static int >> ocf2uniform_rc(int rc) >> { >> -if (rc < 0 || rc > PCMK_OCF_FAILED_MASTER) { >> -return PCMK_OCF_UNKNOWN_ERROR; Let's simply use > PCMK_OCF_OTHER_ERROR here, since that's guaranteed to be the high end. Lars, do you want to test that? >>> >>> Why would we want to filter at all, then? >>> >>> I get it that we may want to map non-ocf agent exit codes >>> into the "ocf" range, >>> but why mask exit codes from "ocf" agents at all (in lrmd)? >> >> It's probably unnecessarily paranoid, but I guess the idea is to check >> that the agent at least returns something in the expected range for OCF > > Well, yes. But, if we are going to allow the range 0 to 199, > I don't see any reason to hide the range 200 to 255. Ideally, all non-LSB OCF return codes should be in the LSB spec's 150-199 range reserved for application use, since OCF aims for some degree of LSB compatibility. RUNNING_MASTER and FAILED_MASTER got into the wild before that policy was set, so they're exceptions. But it's reasonable to require that all future OCF codes fall into that range. >> (perhaps it's not complying with the spec, or complying with a newer >> version of the spec than we can handle). > > Or, as in the case of PCMK_OCF_DEGRADED, complying with a newer version > of the spec that probably would have been handled fine, if it wasn't for > the unneccesary paranoia ;-) > > Lars ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] PCMK_OCF_DEGRADED (_MASTER): exit codes are mapped to PCMK_OCF_UNKNOWN_ERROR
On Mon, Mar 06, 2017 at 12:35:18PM -0600, Ken Gaillot wrote: > diff --git a/lrmd/lrmd.c b/lrmd/lrmd.c > index 724edb7..39a7dd1 100644 > --- a/lrmd/lrmd.c > +++ b/lrmd/lrmd.c > @@ -800,11 +800,40 @@ hb2uniform_rc(const char *action, int rc, const > char *stdout_data) > static int > ocf2uniform_rc(int rc) > { > -if (rc < 0 || rc > PCMK_OCF_FAILED_MASTER) { > -return PCMK_OCF_UNKNOWN_ERROR; > >> > >> Let's simply use > PCMK_OCF_OTHER_ERROR here, since that's guaranteed to > >> be the high end. > >> > >> Lars, do you want to test that? > > > > Why would we want to filter at all, then? > > > > I get it that we may want to map non-ocf agent exit codes > > into the "ocf" range, > > but why mask exit codes from "ocf" agents at all (in lrmd)? > > It's probably unnecessarily paranoid, but I guess the idea is to check > that the agent at least returns something in the expected range for OCF Well, yes. But, if we are going to allow the range 0 to 199, I don't see any reason to hide the range 200 to 255. > (perhaps it's not complying with the spec, or complying with a newer > version of the spec than we can handle). Or, as in the case of PCMK_OCF_DEGRADED, complying with a newer version of the spec that probably would have been handled fine, if it wasn't for the unneccesary paranoia ;-) Lars ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] PCMK_OCF_DEGRADED (_MASTER): exit codes are mapped to PCMK_OCF_UNKNOWN_ERROR
On 03/06/2017 10:55 AM, Lars Ellenberg wrote: > On Thu, Mar 02, 2017 at 05:31:33PM -0600, Ken Gaillot wrote: >> On 03/01/2017 05:28 PM, Andrew Beekhof wrote: >>> On Tue, Feb 28, 2017 at 12:06 AM, Lars Ellenberg >>> wrote: When I recently tried to make use of the DEGRADED monitoring results, I found out that it does still not work. Because LRMD choses to filter them in ocf2uniform_rc(), and maps them to PCMK_OCF_UNKNOWN_ERROR. See patch suggestion below. It also filters away the other "special" rc values. Do we really not want to see them in crmd/pengine? >>> >>> I would think we do. > Note: I did build it, but did not use this yet, so I have no idea if the rest of the implementation of the DEGRADED stuff works as intended or if there are other things missing as well. >>> >>> failcount might be the other place that needs some massaging. >>> specifically, not incrementing it when a degraded rc comes through >> >> I think that's already taken care of. >> Thougts?\ >>> >>> looks good to me >>> diff --git a/lrmd/lrmd.c b/lrmd/lrmd.c index 724edb7..39a7dd1 100644 --- a/lrmd/lrmd.c +++ b/lrmd/lrmd.c @@ -800,11 +800,40 @@ hb2uniform_rc(const char *action, int rc, const char *stdout_data) static int ocf2uniform_rc(int rc) { -if (rc < 0 || rc > PCMK_OCF_FAILED_MASTER) { -return PCMK_OCF_UNKNOWN_ERROR; >> >> Let's simply use > PCMK_OCF_OTHER_ERROR here, since that's guaranteed to >> be the high end. >> >> Lars, do you want to test that? > > Why would we want to filter at all, then? > > I get it that we may want to map non-ocf agent exit codes > into the "ocf" range, > but why mask exit codes from "ocf" agents at all (in lrmd)? > > Lars It's probably unnecessarily paranoid, but I guess the idea is to check that the agent at least returns something in the expected range for OCF (perhaps it's not complying with the spec, or complying with a newer version of the spec than we can handle). ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] PCMK_OCF_DEGRADED (_MASTER): exit codes are mapped to PCMK_OCF_UNKNOWN_ERROR
On Thu, Mar 02, 2017 at 05:31:33PM -0600, Ken Gaillot wrote: > On 03/01/2017 05:28 PM, Andrew Beekhof wrote: > > On Tue, Feb 28, 2017 at 12:06 AM, Lars Ellenberg > > wrote: > >> When I recently tried to make use of the DEGRADED monitoring results, > >> I found out that it does still not work. > >> > >> Because LRMD choses to filter them in ocf2uniform_rc(), > >> and maps them to PCMK_OCF_UNKNOWN_ERROR. > >> > >> See patch suggestion below. > >> > >> It also filters away the other "special" rc values. > >> Do we really not want to see them in crmd/pengine? > > > > I would think we do. > >> Note: I did build it, but did not use this yet, > >> so I have no idea if the rest of the implementation of the DEGRADED > >> stuff works as intended or if there are other things missing as well. > > > > failcount might be the other place that needs some massaging. > > specifically, not incrementing it when a degraded rc comes through > > I think that's already taken care of. > > >> Thougts?\ > > > > looks good to me > > > >> > >> diff --git a/lrmd/lrmd.c b/lrmd/lrmd.c > >> index 724edb7..39a7dd1 100644 > >> --- a/lrmd/lrmd.c > >> +++ b/lrmd/lrmd.c > >> @@ -800,11 +800,40 @@ hb2uniform_rc(const char *action, int rc, const char > >> *stdout_data) > >> static int > >> ocf2uniform_rc(int rc) > >> { > >> -if (rc < 0 || rc > PCMK_OCF_FAILED_MASTER) { > >> -return PCMK_OCF_UNKNOWN_ERROR; > > Let's simply use > PCMK_OCF_OTHER_ERROR here, since that's guaranteed to > be the high end. > > Lars, do you want to test that? Why would we want to filter at all, then? I get it that we may want to map non-ocf agent exit codes into the "ocf" range, but why mask exit codes from "ocf" agents at all (in lrmd)? Lars ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] PCMK_OCF_DEGRADED (_MASTER): exit codes are mapped to PCMK_OCF_UNKNOWN_ERROR
On 03/01/2017 05:28 PM, Andrew Beekhof wrote: > On Tue, Feb 28, 2017 at 12:06 AM, Lars Ellenberg > wrote: >> When I recently tried to make use of the DEGRADED monitoring results, >> I found out that it does still not work. >> >> Because LRMD choses to filter them in ocf2uniform_rc(), >> and maps them to PCMK_OCF_UNKNOWN_ERROR. >> >> See patch suggestion below. >> >> It also filters away the other "special" rc values. >> Do we really not want to see them in crmd/pengine? > > I would think we do. > >> Why does LRMD think it needs to outsmart the pengine? > > Because the person that implemented the feature incorrectly assumed > the rc would be passed back unmolested. > >> >> Note: I did build it, but did not use this yet, >> so I have no idea if the rest of the implementation of the DEGRADED >> stuff works as intended or if there are other things missing as well. > > failcount might be the other place that needs some massaging. > specifically, not incrementing it when a degraded rc comes through I think that's already taken care of. >> Thougts?\ > > looks good to me > >> >> diff --git a/lrmd/lrmd.c b/lrmd/lrmd.c >> index 724edb7..39a7dd1 100644 >> --- a/lrmd/lrmd.c >> +++ b/lrmd/lrmd.c >> @@ -800,11 +800,40 @@ hb2uniform_rc(const char *action, int rc, const char >> *stdout_data) >> static int >> ocf2uniform_rc(int rc) >> { >> -if (rc < 0 || rc > PCMK_OCF_FAILED_MASTER) { >> -return PCMK_OCF_UNKNOWN_ERROR; Let's simply use > PCMK_OCF_OTHER_ERROR here, since that's guaranteed to be the high end. Lars, do you want to test that? >> +switch (rc) { >> +default: >> + return PCMK_OCF_UNKNOWN_ERROR; >> + >> +case PCMK_OCF_OK: >> +case PCMK_OCF_UNKNOWN_ERROR: >> +case PCMK_OCF_INVALID_PARAM: >> +case PCMK_OCF_UNIMPLEMENT_FEATURE: >> +case PCMK_OCF_INSUFFICIENT_PRIV: >> +case PCMK_OCF_NOT_INSTALLED: >> +case PCMK_OCF_NOT_CONFIGURED: >> +case PCMK_OCF_NOT_RUNNING: >> +case PCMK_OCF_RUNNING_MASTER: >> +case PCMK_OCF_FAILED_MASTER: >> + >> +case PCMK_OCF_DEGRADED: >> +case PCMK_OCF_DEGRADED_MASTER: >> + return rc; >> + >> +#if 0 >> + /* What about these?? */ > > yes, these should get passed back as-is too > >> +/* 150-199 reserved for application use */ >> +PCMK_OCF_CONNECTION_DIED = 189, /* Operation failure implied by >> disconnection of the LRM API to a local or remote node */ >> + >> +PCMK_OCF_EXEC_ERROR= 192, /* Generic problem invoking the agent */ >> +PCMK_OCF_UNKNOWN = 193, /* State of the service is unknown - used >> for recording in-flight operations */ >> +PCMK_OCF_SIGNAL= 194, >> +PCMK_OCF_NOT_SUPPORTED = 195, >> +PCMK_OCF_PENDING = 196, >> +PCMK_OCF_CANCELLED = 197, >> +PCMK_OCF_TIMEOUT = 198, >> +PCMK_OCF_OTHER_ERROR = 199, /* Keep the same codes as PCMK_LSB */ >> +#endif >> } >> - >> -return rc; >> } >> >> static int ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] PCMK_OCF_DEGRADED (_MASTER): exit codes are mapped to PCMK_OCF_UNKNOWN_ERROR
On Tue, Feb 28, 2017 at 12:06 AM, Lars Ellenberg wrote: > When I recently tried to make use of the DEGRADED monitoring results, > I found out that it does still not work. > > Because LRMD choses to filter them in ocf2uniform_rc(), > and maps them to PCMK_OCF_UNKNOWN_ERROR. > > See patch suggestion below. > > It also filters away the other "special" rc values. > Do we really not want to see them in crmd/pengine? I would think we do. > Why does LRMD think it needs to outsmart the pengine? Because the person that implemented the feature incorrectly assumed the rc would be passed back unmolested. > > Note: I did build it, but did not use this yet, > so I have no idea if the rest of the implementation of the DEGRADED > stuff works as intended or if there are other things missing as well. failcount might be the other place that needs some massaging. specifically, not incrementing it when a degraded rc comes through > > Thougts?\ looks good to me > > diff --git a/lrmd/lrmd.c b/lrmd/lrmd.c > index 724edb7..39a7dd1 100644 > --- a/lrmd/lrmd.c > +++ b/lrmd/lrmd.c > @@ -800,11 +800,40 @@ hb2uniform_rc(const char *action, int rc, const char > *stdout_data) > static int > ocf2uniform_rc(int rc) > { > -if (rc < 0 || rc > PCMK_OCF_FAILED_MASTER) { > -return PCMK_OCF_UNKNOWN_ERROR; > +switch (rc) { > +default: > + return PCMK_OCF_UNKNOWN_ERROR; > + > +case PCMK_OCF_OK: > +case PCMK_OCF_UNKNOWN_ERROR: > +case PCMK_OCF_INVALID_PARAM: > +case PCMK_OCF_UNIMPLEMENT_FEATURE: > +case PCMK_OCF_INSUFFICIENT_PRIV: > +case PCMK_OCF_NOT_INSTALLED: > +case PCMK_OCF_NOT_CONFIGURED: > +case PCMK_OCF_NOT_RUNNING: > +case PCMK_OCF_RUNNING_MASTER: > +case PCMK_OCF_FAILED_MASTER: > + > +case PCMK_OCF_DEGRADED: > +case PCMK_OCF_DEGRADED_MASTER: > + return rc; > + > +#if 0 > + /* What about these?? */ yes, these should get passed back as-is too > +/* 150-199 reserved for application use */ > +PCMK_OCF_CONNECTION_DIED = 189, /* Operation failure implied by > disconnection of the LRM API to a local or remote node */ > + > +PCMK_OCF_EXEC_ERROR= 192, /* Generic problem invoking the agent */ > +PCMK_OCF_UNKNOWN = 193, /* State of the service is unknown - used > for recording in-flight operations */ > +PCMK_OCF_SIGNAL= 194, > +PCMK_OCF_NOT_SUPPORTED = 195, > +PCMK_OCF_PENDING = 196, > +PCMK_OCF_CANCELLED = 197, > +PCMK_OCF_TIMEOUT = 198, > +PCMK_OCF_OTHER_ERROR = 199, /* Keep the same codes as PCMK_LSB */ > +#endif > } > - > -return rc; > } > > static int > > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org