> On 11 Feb 2015, at 14:32, Alan Robertson <al...@unix.sh> wrote:
>
>> On 02/10/2015 03:33 PM, Lars Ellenberg wrote:
>>> On Tue, Feb 10, 2015 at 02:45:06PM -0700, Alan Robertson wrote:
>>>> On 02/10/2015 02:09 PM, Lars Ellenberg wrote:
>>>> On Tue, Feb 10, 2015 at 09:44:40PM +0100, Lars Ellenberg wrote:
>>>>>> Then we take it from there,
>>>>>> and do the necessary overhaul of this OCF RA API spec.
>>>>>>
>>>>>> I will followup with a list of items that need to be addressed
>>>>>> (as I remember them from the discussions we had in Brno).
>>>> * reserve new exit codes for a probe/monitor action
>>>>
>>>>   "running (Started/Slave), but degraded"
>>>>   "running (Master), but degraded"
>>> Conventional monitoring systems also provide statuses which indicate a
>>> marginal condition - "working, but barely" kind of thing.
>> Can you give an example of something that is
>>   working properly
>>   working "degraded"
>>   working "barely"
>>
>> Thing is: there is usually nothing pacemaker can do about this
>> (but to record that status in the CIB, and thus make it digestible
>> by crm_mon, and all sorts of UIs).
>> Which means we do not really benefit from that distinction.
>>
>> What we want to achieve by introducing these additional exit codes
>> is that an operator who only occasionally checks crm_mon output
>> or equivalent, and has no proper alerting via additional tactical
>> monitoring, will not be misled by a resource state of "Running".
>>
>> Example:
>> A DRBD Primary lost its disk for some reason.
>> Right now it would still show up as "Running Master" in crm_mon.
>> Two days later, the network (or the peer) has some hickup,
>> and the resource, and everything depending on it, fails.
>>
>> Had the operator seen "DEGRADED" in crm_mon,
>> he might have taken action two days earlier.
>>
>> I don't see how an additional "DEGRADED HELP ME URGENTLY"
>> would improve the situation further.
>>
>> We can already "enrich" the feedback from the resource agent via free
>> form text messages, so we could have more than the exit code alone.
>>
>> Which in fact means that we could also just forget about the additional
>> exit codes, and instead specify that some not-so-free form text message
>> would be recognized as "internal health state" of the resource.
>
> I'm not using the OCF RA with Pacemaker. I use it for alerting in
> Assimilation. Free-form would work, but then it couldn't be free form -
> it would have to have some structure to it ;-). Certainly exit codes
> would be useful here to allow the free form text to be free form.
>
> If you made the "free form" text to be JSON, then you could eliminate
> exit codes altogether - but I think the strict structure of exit codes
> serves a purpose of making it easier to decipher the meaning of what was
> observed.

Exactly. Exit codes are authoritative; the text is optional and intended
to be purely informational for the benefit of the end user.

_______________________________________________
OCF mailing list
OCF@lists.community.tummy.com
http://lists.community.tummy.com/mailman/listinfo/ocf
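
To make the "exit code is authoritative, text is informational" split
concrete, here is a rough sketch of how a consumer of monitor results
might treat the two. It is only an illustration under assumptions: the
values 190 and 191 for the proposed "degraded" codes are placeholders I
picked for the example (nothing in this thread reserves them), and the
names MonitorResult, interpret_monitor and render are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Existing OCF monitor exit codes.
OCF_SUCCESS = 0
OCF_NOT_RUNNING = 7
OCF_RUNNING_MASTER = 8

# ASSUMPTION: placeholder values for the proposed "degraded" codes.
OCF_DEGRADED = 190
OCF_DEGRADED_MASTER = 191

# exit code -> (state shown to the operator, degraded flag)
_STATES = {
    OCF_SUCCESS: ("Running", False),
    OCF_NOT_RUNNING: ("Stopped", False),
    OCF_RUNNING_MASTER: ("Running Master", False),
    OCF_DEGRADED: ("Running", True),
    OCF_DEGRADED_MASTER: ("Running Master", True),
}


@dataclass
class MonitorResult:
    state: str
    degraded: bool
    note: Optional[str]  # free-form text, carried along but never parsed


def interpret_monitor(exit_code: int, text: str = "") -> MonitorResult:
    """Derive the state from the exit code alone; keep the text as a note."""
    # Any code not listed above is treated as a failure in this sketch.
    state, degraded = _STATES.get(exit_code, ("Failed", False))
    return MonitorResult(state, degraded, text.strip() or None)


def render(name: str, result: MonitorResult) -> str:
    """Format one status line, roughly in the spirit of crm_mon output."""
    line = f"{name}: {result.state}"
    if result.degraded:
        line += " (DEGRADED)"
    if result.note:
        line += f" - {result.note}"
    return line
```

With these assumptions, a DRBD Primary that lost its disk and returned
the placeholder code 191 would be rendered with "(DEGRADED)" next to
"Running Master", while the same text attached to exit code 0 would
change nothing about the reported state.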