On 2016-08-04 19:33, Digimer wrote:
On 04/08/16 07:21 PM, Dan Swartzendruber wrote:
On 2016-08-04 19:03, Digimer wrote:
On 04/08/16 06:56 PM, Dan Swartzendruber wrote:
I'm setting up an HA NFS server to serve up storage to a couple of
vSphere hosts. I have a virtual IP, and it depends on a ZFS resource
agent which imports or exports a pool. So far, with stonith disabled,
it all works perfectly. I was dubious about a 2-node solution, so I
created a 3rd node which runs as a virtual machine on one of the hosts.
All it is for is quorum.
So, looking at fencing next. The primary server is a PowerEdge R905,
which has DRAC for fencing. The backup storage node is a Supermicro
X9-SCL-F (with IPMI). So I would be using the DRAC agent for the former
and the ipmilan agent for the latter? I was reading about location
constraints, where you tell each instance of the fencing agent not to
run on the node that would be getting fenced. So, my first thought was
to configure the drac agent and tell it not to fence node 1, and
configure the ipmilan agent and tell it not to fence node 2. The thing
is, there is no agent available for the quorum node. Would it make more
sense instead to tell the drac agent to only run on node 2, and the
ipmilan agent to only run on node 1? Thanks!
This is a common mistake.
Fencing and quorum solve different problems and are not interchangeable.
In short:
Fencing is a tool when things go wrong.
Quorum is a tool when things are working.
The only impact that having quorum has with regard to fencing is that
it avoids a scenario where both nodes try to fence each other and the
faster one wins (which is itself OK). Even then, you can add 'delay=15'
to the fence device for the node you want to win, and it will win in
such a case. In the old days, quorum would also prevent a fence loop if
you started the cluster on boot and comms were down. Now though, you set
'wait_for_all' and you won't get a fence loop, so that solves that.
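
To make the effect of the delay concrete, here is a minimal sketch in Python of the race described above. It is only a toy model, not how Pacemaker or any fence agent is implemented: the class name, the function name and the latency figures are all made up for illustration, and it assumes both nodes start fencing at the same instant.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class FenceDevice:
    """Hypothetical stand-in for one configured fence device."""
    target: str          # node this device powers off
    latency: float       # seconds the fence action itself takes (made-up value)
    delay: float = 0.0   # extra wait before acting, like 'delay=15'


def fence_race_survivor(devices: Sequence[FenceDevice]) -> str:
    """Return the node left running when two nodes fence each other at once.

    The device that fires first kills its target, so the survivor is the
    target of the other (slower) device.
    """
    first = min(devices, key=lambda d: d.delay + d.latency)
    return next(d.target for d in devices if d is not first)


# The device that fences node1 carries the delay, so node1 is the node
# that wins, even though its fence action would otherwise be the faster one.
with_delay = [
    FenceDevice(target="node1", latency=2.0, delay=15.0),
    FenceDevice(target="node2", latency=3.0),
]
without_delay = [
    FenceDevice(target="node1", latency=2.0),
    FenceDevice(target="node2", latency=3.0),
]
print(fence_race_survivor(with_delay))     # node1
print(fence_race_survivor(without_delay))  # node2
```

The point of the sketch is that the delay sits on the device that would shoot the preferred node, which gives that node a head start in shooting its peer.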
Said another way: quorum is optional, fencing is not (people often get
that backwards).
As for DRAC vs IPMI, no, they are not two things. In fact, I am pretty
certain that fence_drac is a symlink to fence_ipmilan. All DRAC is (same
with iRMC, iLO, RSA, etc) is "IPMI + features". Fundamentally, the fence
action, rebooting the node, works via the basic IPMI standard using the
DRAC's BMC.
To do proper redundant fencing, which is a great idea, you want
something like switched PDUs. This is how we do it (with two-node
clusters). IPMI first, and if that fails, a pair of PDUs (one for each
PSU, each PDU going to independent UPSes) as backup.
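
The "IPMI first, PDUs as backup" ordering can be sketched as a list of fence levels tried in turn. This is an illustrative Python model only, not Pacemaker's code: the function and the three fake fence actions are hypothetical, and it assumes a level counts as successful only when every device in it succeeds (so both PDUs must cut power).

```python
from typing import Callable, Sequence

# A fence action takes a node name and reports whether it succeeded.
FenceAction = Callable[[str], bool]


def fence_with_fallback(node: str, levels: Sequence[Sequence[FenceAction]]) -> int:
    """Try each level in order and stop at the first one that succeeds.

    Returns the 1-based number of the level that fenced the node,
    or 0 if every level failed (the cluster would stay blocked).
    """
    for number, level in enumerate(levels, start=1):
        if all(action(node) for action in level):
            return number
    return 0


# Hypothetical devices: the BMC is unreachable, both PDUs respond.
def ipmi_dead(node: str) -> bool:
    return False


def pdu_a(node: str) -> bool:
    return True


def pdu_b(node: str) -> bool:
    return True


levels = [[ipmi_dead], [pdu_a, pdu_b]]
print(fence_with_fallback("node1", levels))         # 2
print(fence_with_fallback("node1", [[ipmi_dead]]))  # 0
```

With only the IPMI level configured, a dead BMC leaves nothing else to try, which is why the second level matters.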
Thanks for the quick response. I didn't mean to give the impression
that I didn't know the difference between quorum and fencing. The only
reason I (currently) have the quorum node was to prevent a deathmatch
(which I had read about elsewhere). If it is as simple as adding a
delay as you describe, I'm inclined to go that route. At least on
CentOS 7, fence_ipmilan and fence_drac are not the same: they are
both Python scripts that are totally different.
The delay is perfectly fine. We've shipped dozens of two-node systems
over the last five or so years and none have had trouble. Where node
failures have occurred, fencing operated properly and services were
recovered. So in my opinion, in the interest of minimizing complexity,
I recommend the two-node approach.
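
For background on why a two-node cluster needs its own quorum handling at all, here is a deliberately simplified Python model of majority quorum. The function name is hypothetical, and the two_node flag only imitates the idea behind corosync's votequorum option of that name; real votequorum has more behaviour (wait_for_all among it) than this shows.

```python
def has_quorum(total_votes: int, votes_present: int, two_node: bool = False) -> bool:
    """Simplified majority check, one vote per node.

    Normally more than half of all votes must be present. In the special
    two-node mode a single remaining vote is treated as enough, and
    fencing is what decides which node carries on.
    """
    if two_node and total_votes == 2:
        return votes_present >= 1
    return votes_present >= total_votes // 2 + 1


print(has_quorum(3, 2))                 # True: 3-node cluster, one node lost
print(has_quorum(2, 1))                 # False: plain majority, one node lost
print(has_quorum(2, 1, two_node=True))  # True: two-node mode, one node lost
```

Under plain majority rules a two-node cluster loses quorum as soon as either node fails, which is the gap the third quorum-only node was filling and which the two-node mode plus working fencing covers instead.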
As for the two agents not being symlinked, OK. It still doesn't change
the core point though, that both fence_ipmilan and fence_drac would be
acting on the same target.
Note: if you lose power to the mainboard (which we've seen; a failed
mainboard voltage regulator did this once), you lose the IPMI (DRAC)
BMC. This scenario will leave your cluster blocked without an external
secondary fence method, like switched PDUs.
cheers
Thanks!