On 10/4/2012 1:56 PM, Jim Klimov wrote:

What if the backup host is down (i.e. the ex-master after the failover)?
Will your failed-over pool accept no writes until both storage machines
are working?

What if internetworking between these two heads has a glitch, and as
a result both of them become masters of their private copies (mirror
halves), and perhaps both even manage to accept writes from clients?

This is the clustering part, which involves "fencing" around the node
which is considered dead, perhaps including a hardware reset request
just to make sure it's dead, before taking over resources it used to
master (STONITH - Shoot The Other Node In The Head). In particular,
cluster setups recommend that the heartbeats, which confirm that both
machines are indeed working, run over at least two separate wires (e.g.
serial and LAN) without active hardware (switches) in between, separate
from data networking.

This all makes a lot of sense. I didn't mean to imply there are no failure modes that can take you down entirely. I was aware of the split-brain issue. I was not sure what Richard was getting at...
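
To make the ordering Jim describes concrete, here is a rough sketch of the takeover decision: suspect the peer only when every independent heartbeat link is silent, fence it, and only then claim its resources. This is not taken from any real cluster stack; the names Node, peer_seems_dead, try_takeover and the fence_peer callback are all made up for illustration, and a real setup would rely on the cluster framework's own fencing agents.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Node:
    """Hypothetical storage head; is_master means it has imported the pool."""
    name: str
    is_master: bool = False


def peer_seems_dead(heartbeats: Dict[str, bool]) -> bool:
    """Suspect the peer only if ALL heartbeat links are silent.

    heartbeats maps a link name (e.g. "serial", "lan") to True when a
    heartbeat was seen recently. With fewer than two links we cannot tell
    a dead peer from a dead wire, so we never declare the peer dead.
    """
    return len(heartbeats) >= 2 and not any(heartbeats.values())


def try_takeover(node: Node,
                 heartbeats: Dict[str, bool],
                 fence_peer: Callable[[], bool]) -> str:
    """Return "standby", "refuse" or "master" for this decision round."""
    if not peer_seems_dead(heartbeats):
        # At least one link still hears the peer: do nothing.
        return "standby"
    if not fence_peer():
        # Fencing (e.g. a hardware reset request) could not be confirmed.
        # Taking over now would risk split-brain, so refuse.
        return "refuse"
    # Peer is confirmed down: safe to take over its resources.
    node.is_master = True
    return "master"


if __name__ == "__main__":
    backup = Node("head-b")
    links = {"serial": False, "lan": False}
    # The lambda stands in for a real fencing action that reports success.
    print(try_takeover(backup, links, fence_peer=lambda: True))
```

The point of the "refuse" branch is that a node which cannot confirm the other one is dead stays passive; that costs availability, but it avoids both heads accepting writes to their own mirror halves.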
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
