On Mon, Apr 23, 2012 at 02:23:20PM -0700, Kyle Lanclos wrote:

> However, this does jog another potential failure mode. Some of our older
> OpenBSD firewalls (going back to OpenBSD) will occasionally (maybe once a
> year) "lose" a network interface. If you logged in at the console of a
> host while it was in this state, the interface would look perfectly normal,
> but it would not pass any traffic. I callously worked around this by
> administratively cycling each network interface on the affected machine(s)
> on a weekly basis.
> 
> If we ran into this failure mode with our CARP firewalls, I'm assuming the
> master would keep right on thinking it was the master, and not attempt to
> demote itself.
> 
> While it is certainly helpful for self-demotion of a master to occur,
> it seems reasonable for self-promotion of a slave to also occur.

Without active probing, as with ifstated, there is no way to tell which
uplink is "up but not forwarding". It could be the master's, the
backup's, or both. Statistically, for every time you improve the
situation by failing over, there is a time you shoot yourself in the
foot doing the same. If you do nothing, you have the same odds, and
things remain simpler.

With ifstated, you only need to change one side's advskew so that the
election order reverses, then rely on carp's election process. For
instance, run ifstated only on the master, pinging the next hops on all
sides, and

 - when any ping fails (the first time), demote:
     increase own advskew above the backup's

 - when all pings succeed (again), promote:
     reset own advskew to the original value (below the backup's)
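The two rules above only act on transitions ("the first time", "again"),
which is what ifstated's states give you. A minimal Python sketch of that
logic, assuming hypothetical values (master advskew 0, demoted 200, backup
100) and a hypothetical interface name carp0; it only returns the
ifconfig command it would run and does no pinging itself:

```python
from typing import List, Optional

# Hypothetical values, not from the original post: the backup runs with
# advskew 100; the master normally uses 0 and 200 when demoted.
MASTER_ADVSKEW = 0
DEMOTED_ADVSKEW = 200
CARP_IF = "carp0"  # hypothetical interface name


class AdvskewWatcher:
    """Mimics an ifstated setup on the master: demote on the first failed
    ping, promote once every next hop answers again."""

    def __init__(self) -> None:
        self.demoted = False

    def update(self, ping_results: List[bool]) -> Optional[str]:
        """Return the ifconfig command to run, or None if nothing changes."""
        all_ok = all(ping_results)
        if not all_ok and not self.demoted:
            # first failure: raise advskew above the backup's
            self.demoted = True
            return f"ifconfig {CARP_IF} advskew {DEMOTED_ADVSKEW}"
        if all_ok and self.demoted:
            # all next hops reachable again: restore the original advskew
            self.demoted = False
            return f"ifconfig {CARP_IF} advskew {MASTER_ADVSKEW}"
        return None


if __name__ == "__main__":
    watcher = AdvskewWatcher()
    for results in ([True, True], [True, False], [False, False], [True, True]):
        print(watcher.update(results))
```

Run as a script, this prints None, the demote command, None (already
demoted, so nothing to do), and finally the promote command.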

In your example above, ifstated on the master would detect a ping
failure on one next hop and demote by increasing its advskew above the
backup's.

With preempt enabled, the master would lose the election on the other
interface and therefore fail over all its interfaces as a group to
backup state, while the backup would win the election and fail over all
its interfaces as a group to master state. In other words, self-promotion
of the backup happens only through the carp election.
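The election itself can be sketched in a few lines of Python. carp
advertises every advbase + advskew/256 seconds, and with preempt the host
advertising most often ends up master. This is a simplification: each
host is treated as a single unit (standing in for group failover of all
its carp interfaces), and timing, lost advertisements, demotion counters
and ties are not modelled. Host names and values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CarpHost:
    name: str
    advbase: int  # seconds
    advskew: int  # 0..254

    def interval(self) -> float:
        # carp advertises every advbase + advskew/256 seconds
        return self.advbase + self.advskew / 256


def elect(hosts: List[CarpHost]) -> str:
    """Name of the host that ends up master with preempt enabled:
    the one advertising most often (shortest interval). Ties are not
    modelled; min() simply returns the first host listed."""
    return min(hosts, key=lambda h: h.interval()).name


if __name__ == "__main__":
    master = CarpHost("fw1", advbase=1, advskew=0)    # hypothetical names/values
    backup = CarpHost("fw2", advbase=1, advskew=100)
    print(elect([master, backup]))  # fw1: normal operation
    master.advskew = 200            # ifstated demotes the master
    print(elect([master, backup]))  # fw2: backup wins, group fails over
    master.advskew = 0              # pings succeed again, advskew reset
    print(elect([master, backup]))  # fw1: preferred master fails back
```

Only the master's advskew ever changes; the backup is promoted purely
because it now advertises more often.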

When you fix the interface, ifstated will see all pings succeed again
and reset the advskew. The (preferred) master then wins the election
and fails back.

I don't think there is a case where it's helpful to run scripts on both
the master and the backup. You'd have to be careful to not introduce new
failure cases, for instance when a next hop is unreachable from both.

Daniel
