On 2008-06-27T14:52:08, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote:

> The fail-counts in lrmd will probably be available for
> inspection. And they would probably also expire after some time.
> What I suggested in the previous messages is actually missing
> the time dimension: There should be maximum failures within
> a period.
>
> > So I think that lrmd should always report failures like now,
> > and crm/cib should hold all the failed status and make a decision.
>
> Of course, it could be done like that as well, though that could
> make processing in crm much more complex.
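For illustration only, the "maximum failures within a period" idea quoted above could look roughly like the sketch below. This is not lrmd or CRM code; the class name `WindowedFailCount` and its parameters `max_failures` and `window_seconds` are hypothetical, and timestamps are passed in explicitly so the behaviour is easy to follow.

```python
from collections import deque


class WindowedFailCount:
    """Hypothetical fail-count that only counts failures inside a time window.

    A failure recorded at time t still counts at time `now`
    while now - t < window_seconds; after that it expires.
    """

    def __init__(self, max_failures, window_seconds):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._times = deque()  # failure timestamps, oldest first

    def _expire(self, now):
        # Drop failures that have fallen out of the window.
        while self._times and now - self._times[0] >= self.window_seconds:
            self._times.popleft()

    def record_failure(self, now):
        # Assumes timestamps arrive in non-decreasing order.
        self._times.append(now)
        self._expire(now)

    def count(self, now):
        self._expire(now)
        return len(self._times)

    def exceeded(self, now):
        # True once max_failures or more failures lie inside the window.
        return self.count(now) >= self.max_failures


if __name__ == "__main__":
    fc = WindowedFailCount(max_failures=3, window_seconds=60)
    for t in (0, 10, 20):
        fc.record_failure(t)
    print(fc.exceeded(20))  # three failures within 60 seconds
    print(fc.count(70))     # the failures at t=0 and t=10 have expired
```

Whether such a counter would belong in lrmd or in the CRM is exactly the question under discussion; the sketch takes no position on that.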
The CRM already implements all of the above for failures and restarts,
and tracks failcounts. This would be a fairly minor addition, not that I
think it would be a good one - RAs shouldn't report failures if there
wasn't a failure, period.

> > Another case we've met was when we wrote a RA to check for some hardware.
> > The status from the hardware rarely failed in very specific timing,
> > and retrying the check was just fine.
> That's what I often observed with some stonith devices.

This is a bug in the monitor operation.


Regards,
    Lars

-- 
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

_______________________________________________
Pacemaker mailing list
Pacemaker@clusterlabs.org
http://list.clusterlabs.org/mailman/listinfo/pacemaker