On 01/18/2012 01:02 PM, Dejan Muhamedagic wrote:
>> If I may restate;
>>
>> Out of band management devices (iLO, IPMI, w/e) have two fatal flaws
>> which make them unreliable as sole fence devices; They share their power
>> with the host and they (generally) have only one network link. If the
>> node's PSU fails, or if the network link/BMC fails, fencing fails.
>
> I thought we were talking about computers with two PSU. If both
> fail, that's already two faults and (our) clusters don't protect
> from multiple faults. As for the rest (network connection, etc)
> it's not shared with the host and if there's a failure in any of
> these components it should be detected by the next monitor
> operation on the stonith resource giving enough time to repair.
> In short, a fencing device is not a SPOF.

I was talking about what a fence needs in order to succeed. Say a node has
RPSU, with each cable going to a different PDU. For the fence method to
succeed, both actions must succeed (confirmed switching off both outlets).
So I was talking (in this case) about the actual fence action succeeding
or failing.

>> A PDU as a backup protects against this, but is not ideal as it can't
>> confirm a node's power state.
>
> Why is that? If you ask PDU to disconnect power to the host and
> that command succeeds how high is the probability that the CPU is
> still running? Or am I missing something?

Two cases where this fails, both PEBKAC, but still real. One; RPSU where
only one link was configured (or 2 or 3, whatever). Two; An admin moves
the power cable to another outlet sometime between original
configuration/testing and the need to fence. In both cases the PDU
reports success while the node keeps running. Never underestimate the
power of stupidity or the dangers of working late. :)

>> Red Hat clusters call these "Fence Methods", with each "method"
>> containing one or more fence "devices". With the IPMI, there is only one
>> device. With Redundant PSUs across two PDUs, you have two devices in the
>> "method". All devices in a method must succeed for the fence method to
>> succeed.
>>
>> It would, if nothing else, help people migrating to pacemaker from rhcs
>> if similar names were used.
>
> Pacemaker is already using terminology different from RHCS. I'm
> not at all against using similar (or same) names, but it's
> too late for that. Introducing RHCS specific names to co-exist
> with Pacemaker names... well, how is that going to help?
>
> Thanks,
>
> Dejan

If it's set, then it is set and there is no more discussion to be had.

To answer your question though; Come EL7 (or whenever Pacemaker gains
full support), as rgmanager is phased out, all the existing rhcs clusters
will need to be migrated. More to the point; The admins who manage those
clusters will need to be retrained.
I would argue that everything that can be done to smooth that migration
should be done, including seemingly trivial things like naming
conventions.

Cheers

-- 
Digimer
E-Mail:              digi...@alteeve.com
Freenode handle:     digimer
Papers and Projects: http://alteeve.com
Node Assassin:       http://nodeassassin.org
"omg my singularity battery is dead again.
stupid hawking radiation." - epitron

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
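
PS: To make the "all devices in a method must succeed" rule concrete,
here is a minimal Python sketch. It is not how rhcs or Pacemaker
implement fencing; the class names, the power_off callable and the
in-order fallback from one method to the next are all assumptions made
for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FenceDevice:
    """One fence action, e.g. "switch off outlet 3 on pdu1" (hypothetical)."""
    name: str
    power_off: Callable[[], bool]  # True only when the action is confirmed


@dataclass
class FenceMethod:
    """A named group of devices that must ALL succeed."""
    name: str
    devices: List[FenceDevice]

    def run(self) -> bool:
        # An empty method can never confirm anything, so treat it as failed.
        # all() stops at the first device that reports failure.
        return bool(self.devices) and all(d.power_off() for d in self.devices)


def fence_node(methods: List[FenceMethod]) -> str:
    """Try each method in order; return the name of the first that succeeds."""
    for method in methods:
        if method.run():
            return method.name
    raise RuntimeError("all fence methods failed")


if __name__ == "__main__":
    # IPMI is a single-device method; here the BMC is assumed unreachable.
    ipmi = FenceMethod("ipmi", [FenceDevice("bmc", lambda: False)])
    # RPSU across two PDUs: two devices, both outlets must be switched off.
    pdu = FenceMethod("pdu", [
        FenceDevice("pdu1:outlet3", lambda: True),
        FenceDevice("pdu2:outlet3", lambda: True),
    ])
    print(fence_node([ipmi, pdu]))
```

Note that the sketch trusts each device's return value, which is exactly
the weak spot of the PDU case: a mis-cabled outlet still returns success.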