On 02/07/2013, at 2:58 AM, Digimer <li...@alteeve.ca> wrote:

> On 07/01/2013 12:43 PM, Lars Marowsky-Bree wrote:
>> On 2013-07-01T11:53:29, Digimer <li...@alteeve.ca> wrote:
>>
>>> You are right, of course. Imagine though that the IPMI BMC's network
>>> port or cable could have silently failed some time before the node
>>> failed.
>>
>> Pacemaker can monitor the fencing device if you configure a monitor
>> action for it, for exactly this reason.
>
> My *very* initial testing of op monitor="30" didn't detect the failure
> or recovery of the fence device.

That might come down to the quality of the monitor action in the agent though.

> I may very well have screwed something
> up though... I still have a lot to learn.
>
> As an aside, RHEL 6.4 introduced 'fence_check' which will do the same if
> you cron/script it.
>
>>> Yes, this is two simultaneous failures so not an overall SPoF, but
>>> likely enough that it should be addressed.
>>
>> Yes ;-)
>>
>> While it's conceivable that the *fencing* network switch doesn't have a
>> dual power supply and thus is affected by the outage (and very very few
>> management boards have two network ports so that you could connect them
>> to two), the answer here could be to - at least for two node scenarios -
>> just connect the management ports to a dedicated NIC on the other node.
>> (A ring topology for multiple nodes is conceivable.)
>>
>> Then a single power failure could well cause both methods to fail.
>>
>> Still, it's a double failure that we, officially, don't protect against
>> in all scenarios (the power failure + whatever causes the fence).
>
> I protect against this scenario by using two switches and plugging IPMI
> into the first switch and the PDUs into the second switch. All nodes use
> bonded links with a leg in either switch. So the failure of an entire
> switch will not cause an interruption or the loss of fencing capabilities.
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without
> access to education?
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
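
For anyone who wants to sanity-check the two-switch layout Digimer describes (IPMI on the first switch, PDUs on the second), here is a rough sketch in Python. It is only a toy model of the reasoning, not anything Pacemaker or the fence agents provide; the names "ipmi", "pdu", "switch1" and "switch2" are made up for illustration.

```python
# Hypothetical model of the layout described in the thread: each fence
# method hangs off exactly one switch. All names are illustrative only.
FENCE_METHODS = {
    "ipmi": "switch1",
    "pdu": "switch2",
}


def surviving_methods(failed_switch, methods=FENCE_METHODS):
    """Return the fence methods still reachable after one switch fails."""
    return sorted(
        name for name, switch in methods.items() if switch != failed_switch
    )


def is_switch_spof(methods=FENCE_METHODS):
    """True if losing any single switch leaves no usable fence method."""
    switches = set(methods.values())
    return any(not surviving_methods(s, methods) for s in switches)


if __name__ == "__main__":
    for switch in sorted(set(FENCE_METHODS.values())):
        print(switch, "down ->", surviving_methods(switch))
    print("single switch is a SPoF:", is_switch_spof())
```

With both methods on the same switch, is_switch_spof() reports True, which is the case the split across two switches is meant to avoid. The model deliberately ignores power, so it says nothing about the double-failure case discussed above.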