Re: Simultaneous CARP failover for multiple interfaces

2012-04-27 Thread Kyle Lanclos
I quoted from the OpenBSD FAQ:
 net.inet.carp.preempt Allow hosts within a redundancy group
   that have a better advbase and advskew
   to preempt the master. In addition, this
   option also enables failing over a group
   of interfaces together in the event that
   one interface goes down. If one physical
   CARP-enabled interface goes down, CARP
   will increase the demotion counter,
   carpdemote, by 1 on interface groups that
   the carp(4) interface is a member of, in
   effect causing all group members to fail-
   over together. 
 
 I did establish an interface group for my two CARP interfaces, but I did
 not do my failover tests while it was in that state. As I said, I clearly
 need to re-do my tests.

I re-did my tests this morning, and yes indeed, both CARP interfaces in
the CARP group switch automatically between the BACKUP and MASTER states
when I do abusive things to one of the connections. I say automatically
instead of simultaneously because there was a bit of a delay while the
reconnected link negotiated speed. I blame Cisco for that.

Thanks for all the assistance; I found this exchange quite helpful.

--Kyle


Re: Simultaneous CARP failover for multiple interfaces

2012-04-24 Thread Kyle Lanclos
Karl O. Pinc wrote:
 I didn't notice _any_ reference to pfsync in the original
 post.  Perhaps this is part of the problem?

I originally wrote:
 I have a pair of OpenBSD firewall/routers in a reasonably vanilla
 pf + pfsync + CARP configuration...

It sounds like using 'defer' may allow pf + pfsync to handle the issues
resulting from asymmetric routing of packets, as long as the asymmetry
is fully contained within the pfsync'd hosts.

I apologize if I gave too much airtime to the pf + pfsync aspects of
what I was trying to resolve; we largely worked around those by enabling
carp preemption.

--Kyle


Re: Simultaneous CARP failover for multiple interfaces

2012-04-24 Thread Daniel Hartmeier
On Mon, Apr 23, 2012 at 02:23:20PM -0700, Kyle Lanclos wrote:

 However, this does jog another potential failure mode. Some of our older
 OpenBSD firewalls (going back to OpenBSD) will occasionally (maybe once a
 year) lose a network interface. If you logged in at the console of a
 host while it was in this state, the interface would look perfectly normal,
 but it would not pass any traffic. I callously worked around this by
 administratively cycling each network interface on the affected machine(s)
 on a weekly basis.
 
 If we ran into this failure mode with our CARP firewalls, I'm assuming the
 master would keep right on thinking it was the master, and not attempt to
 demote itself.
 
 While it is certainly helpful for self-demotion of a master to occur,
 it seems reasonable for self-promotion of a slave to also occur.

Without any active probing, like with ifstated, there is no way to
distinguish which uplink is up but not forwarding. It could be either
the master's, or the backup's, or both. Statistically, for every time you
improve the situation by failing over, there is a time you shoot yourself
in the foot doing the same. If you do nothing, you have the same chances,
and things remain simpler.

With ifstated, you only need to change one side's advskew so their order
reverses, then rely on carp's election process. For instance, run
ifstated only on the master, pinging next hops on all sides, and

 - when any ping fails (the first time), demote:
 increase own advskew above the backup's

 - when all pings succeed (again), promote:
 reset own advskew to the original value (below the backup's)

In your example above, ifstated on the master would detect a ping
failure on one next hop and demote by increasing its advskew above the
backup's.

With preempt enabled, the master would lose election on the other
interface and therefore group failover all interfaces to backup state,
while the backup would win election and group failback all interfaces to
master state, i.e. self-promotion of the backup is done only through carp
election.

When you fix the interface, ifstated will see all pings succeed again,
and reset advskew. Now the (preferred) master wins election and fails
back.

I don't think there is a case where it's helpful to run scripts on both
the master and the backup. You'd have to be careful to not introduce new
failure cases, for instance when a next hop is unreachable from both.
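
The demote/promote scheme described above could be sketched roughly as
follows. This is an untested sketch, not a known-good configuration: the
interface names (carp0, carp1), the next-hop addresses (192.0.2.1,
198.51.100.1), the advskew values, and the assumption that the backup runs
with advskew 100 are all illustrative and do not come from this thread.

```shell
# Run as root on the preferred master only.
# Writes a two-state ifstated.conf: "promoted" while all next hops answer,
# "demoted" as soon as any ping fails.
cat > /etc/ifstated.conf <<'EOF'
init-state promoted

# True only while both (hypothetical) next hops answer a ping.
# Each ping is re-run every 10 seconds.
hops_ok = '( "ping -q -c 1 -w 1 192.0.2.1 > /dev/null" every 10 && \
	"ping -q -c 1 -w 1 198.51.100.1 > /dev/null" every 10 )'

state promoted {
	init {
		# Original advskew, below the backup's assumed value of 100.
		run "ifconfig carp0 advskew 0"
		run "ifconfig carp1 advskew 0"
	}
	if ! $hops_ok
		set-state demoted
}

state demoted {
	init {
		# Above the backup's advskew, so the backup wins the election.
		run "ifconfig carp0 advskew 200"
		run "ifconfig carp1 advskew 200"
	}
	if $hops_ok
		set-state promoted
}
EOF

# Check the file for syntax errors without starting the daemon.
ifstated -n -f /etc/ifstated.conf
```

Only the master runs this, so the backup's advskew never changes and the
two hosts cannot both demote themselves when a next hop is unreachable
from both.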

Daniel


Re: Simultaneous CARP failover for multiple interfaces

2012-04-23 Thread Daniel Hartmeier
On Mon, Apr 23, 2012 at 11:49:14AM -0700, Kyle Lanclos wrote:

 Where this presents a problem is if the current CARP master loses a single
 network interface (cable unplugged, isolated hardware failure, sysadmin
 failure, etc.), as opposed to the CARP master failing entirely. The slave
 will appropriately assume the master role for one CARP interface, but will
 *not* do so for the second.

Yes, it will:

 net.inet.carp.preempt   Allow virtual hosts to preempt each other.
 It is also used to failover carp interfaces
 as a group.  When the option is enabled and
 one of the carp enabled physical interfaces
 goes down, advskew is changed to 240 on all
 carp interfaces.  See also the first example.
 Disabled by default.

(i.e. this single sysctl knob enables both group failover and failback)

This covers link state change (unplugged cable) as well as
administrative down of the physical interface.
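
A minimal way to enable the knob and exercise the group failover, assuming
em0 is the physical interface under one CARP interface and carp0/carp1 are
the two CARP interfaces (all three names are placeholders):

```shell
# Enable preemption and group failover immediately (as root, on both hosts).
sysctl net.inet.carp.preempt=1

# Persist the setting across reboots.
echo 'net.inet.carp.preempt=1' >> /etc/sysctl.conf

# On the current master: simulate the failure of one physical interface.
ifconfig em0 down

# Inspect the "carp:" line of each interface. With preempt enabled, both
# should leave the MASTER state on this host, not just the one on em0.
ifconfig carp0
ifconfig carp1

# Restore the interface; the preferred master should then fail back.
ifconfig em0 up
```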

It does not cover the case where the link remains up, but the uplink
switch stops forwarding, for instance, but...

 We would like our otherwise nicely redundant firewall configuration to be
 resilient against this type of failure. Short of running a cron job every
 sixty seconds to check the interface state, is there some way we can
 automatically force the promotion of a CARP slave if a second CARP interface
 flips from slave to master?

... see ifstated(8), which can ping uplink hops and issue ifconfig advskew
changes to demote the master when appropriate.

Daniel


Re: Simultaneous CARP failover for multiple interfaces

2012-04-23 Thread Stuart Henderson
On 2012/04/23 11:49, Kyle Lanclos wrote:
 In order for our firewall to operate effectively, we use 'keep state'
 pf rules. We empirically determined that we must have CARP preemption
 enabled, otherwise pf cannot properly establish state for new TCP
 connections. If pfsync could be told to synchronize incomplete states,
 this issue might go away.

pfsync(4)'s defer option might help. There is a latency penalty on the
first packet of each connection, but it might be acceptable for your use
case.

 Where more than one firewall might actively handle packets, e.g. with
 certain ospfd(8), bgpd(8) or carp(4) configurations, it is beneficial to
 defer transmission of the initial packet of a connection.  The pfsync
 state insert message is sent immediately; the packet is queued until
 either this message is acknowledged by another system, or a timeout has
 expired.  This behaviour is enabled with the defer parameter to
 ifconfig(8).
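
As a rough illustration, turning this on might look like the following.
The sync interface name em2 is an assumption; use whatever interface
already carries the pfsync traffic.

```shell
# Enable deferral of the initial packet on the pfsync interface (as root).
# em2 is a placeholder for the dedicated sync link.
ifconfig pfsync0 syncdev em2 defer

# To persist across reboots, /etc/hostname.pfsync0 would contain:
#   up syncdev em2 defer

# Turn deferral off again if the added latency is not acceptable.
ifconfig pfsync0 -defer
```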


Re: Simultaneous CARP failover for multiple interfaces

2012-04-23 Thread Kyle Lanclos
Daniel Hartmeier wrote:
 Yes, it will:
 
  net.inet.carp.preempt   Allow virtual hosts to preempt each other.
  It is also used to failover carp interfaces
  as a group.  When the option is enabled and
  one of the carp enabled physical interfaces
  goes down, advskew is changed to 240 on all
  carp interfaces.  See also the first example.
  Disabled by default.

Whoops, thanks for pointing that out. I will re-do my tests (the setup is
at a remote site), and make sure that I'm done making things up.

I'm not sure where your quote came from; I see similar text in NetBSD docs,
but this is what I (now) find in the OpenBSD FAQ:

net.inet.carp.preempt   Allow hosts within a redundancy group
that have a better advbase and advskew
to preempt the master. In addition, this
option also enables failing over a group
of interfaces together in the event that
one interface goes down. If one physical
CARP-enabled interface goes down, CARP
will increase the demotion counter,
carpdemote, by 1 on interface groups that
the carp(4) interface is a member of, in
effect causing all group members to fail-
over together. 

http://www.openbsd.org/faq/pf/carp.html

I did establish an interface group for my two CARP interfaces, but I did
not do my failover tests while it was in that state. As I said, I clearly
need to re-do my tests.
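
For reference, the demotion counter the FAQ mentions can be inspected and
adjusted by hand. The sketch below uses the default "carp" interface group;
the value 50 is arbitrary.

```shell
# Show the current demotion counter for the "carp" interface group.
ifconfig -g carp

# Manually demote every carp interface in the group, e.g. before
# maintenance (as root).
ifconfig -g carp carpdemote 50

# Undo the manual demotion afterwards.
ifconfig -g carp -carpdemote 50
```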

 It does not cover the case where the link remains up, but the uplink
 switch stops forwarding, for instance, but...

At least for our case, a switch failure on either side of the firewall/router
represents a total loss of connectivity.

However, this does jog another potential failure mode. Some of our older
OpenBSD firewalls (going back to OpenBSD) will occasionally (maybe once a
year) lose a network interface. If you logged in at the console of a
host while it was in this state, the interface would look perfectly normal,
but it would not pass any traffic. I callously worked around this by
administratively cycling each network interface on the affected machine(s)
on a weekly basis.

If we ran into this failure mode with our CARP firewalls, I'm assuming the
master would keep right on thinking it was the master, and not attempt to
demote itself.

While it is certainly helpful for self-demotion of a master to occur,
it seems reasonable for self-promotion of a slave to also occur.

 ... see ifstated(8), which can ping uplink hops and issue ifconfig advskew
 changes to demote the master when appropriate.

Thanks, I'll look into that.

--Kyle