carp init delay

2013-04-03 Thread Camiel Dobbelaar


In some cases when a network port comes up, it does not indicate that 
the network is ready.  But on linkup, carp(4) will try to get out of the 
INIT state as soon as possible.  And because all is quiet it will decide 
to become master.


This then leads to master-master situations.

Here are two examples of when this can happen; there are probably more:

(1) spanning tree may be in effect, and the port is not yet forwarding

(2) a powering-up or rebooting switch that activates its ports
immediately, but does not forward anything until it is completely up
(this may be an OpenBSD bridge too)


I wonder if carp(4) needs an extra knob (*shudder*) to pause in the INIT 
state while the rest of the network gets ready after a linkup.


I see in the source code there are already two mechanisms/workarounds 
that are related, but a pause may be a bit more generic:

- sc_suppress
- sc_delayed_arp

Anyone else observe/fix this by other means?  Opinions?
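
To make the failure mode concrete, here is a minimal Python sketch of a node whose link comes up while the switch port is not yet forwarding. It is not carp(4) code. The `simulate` function, the `init_delay` knob, the 3-second silence limit, and the rule that a wrongly promoted node steps down as soon as it hears the existing master are all assumptions made for illustration.

```python
def simulate(init_delay, blackout, master_down=3, horizon=60):
    """Return the (time, state) transitions of a node whose link comes up at t=0.

    init_delay  : hypothetical knob, seconds to stay in INIT after linkup
    blackout    : seconds after linkup during which the switch drops frames
    master_down : seconds of silence after which a BACKUP promotes itself (assumed)
    horizon     : how many seconds to simulate
    """
    state = "INIT"
    deadline = None
    transitions = [(0, "INIT")]
    for t in range(horizon + 1):
        # Leave INIT only once the (hypothetical) pause has elapsed.
        if state == "INIT" and t >= init_delay:
            state = "BACKUP"
            deadline = t + master_down
            transitions.append((t, state))

        forwarding = t >= blackout
        if state != "INIT" and forwarding:
            # An advertisement from the existing master gets through.
            if state == "MASTER":
                # Assumed: the existing master wins and this node steps down.
                state = "BACKUP"
                transitions.append((t, state))
            deadline = t + master_down
        elif state == "BACKUP" and t >= deadline:
            # All is quiet, so the node decides to become master.
            state = "MASTER"
            transitions.append((t, state))
    return transitions


# No pause, 30 s of non-forwarding: the node is a second master from t=3 to t=30.
print(simulate(init_delay=0, blackout=30))
# Pause longer than the blackout: the node never becomes master.
print(simulate(init_delay=35, blackout=30))
```

In this toy model the pause only helps when it is at least as long as the time the port spends not forwarding, which is why a fixed delay is a workaround rather than a guarantee.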




Re: carp init delay

2013-04-03 Thread Stuart Henderson
On 2013/04/03 14:54, Camiel Dobbelaar wrote:
 
> Anyone else observe/fix this by other means?  Opinions?
 
 

slightly messy, though at least this also applies to the case with
things other than carp which could also have problems: add !sleep 5
or something in hostname.if for the physical interface...



Re: carp init delay

2013-04-03 Thread sven falempin
my 2 cents:
timing is always a problem; maybe you could arping the next hop and then
activate carp?
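
The "probe the next hop first" idea can be sketched as a small polling loop. This is only an illustration under stated assumptions: `wait_until_reachable` is a hypothetical helper, and `probe` stands for whatever check is used (an ARP request, a ping); no real command or carp interface call is shown here.

```python
import time


def wait_until_reachable(probe, timeout=30.0, interval=1.0,
                         clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it returns True or `timeout` seconds have passed.

    probe    : callable returning True once the next hop answers (hypothetical)
    timeout  : give up after this many seconds
    interval : pause between probes
    clock, sleep : injectable so the loop can be exercised without waiting
    """
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Example with a fake probe that succeeds on the third attempt.
answers = iter([False, False, True])
print(wait_until_reachable(lambda: next(answers), timeout=5, interval=0.01))
```

Only after the helper returns True would the carp interface be activated. Like the sleep, this covers the case where the local system is coming up, not events on the switch side.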




Re: carp init delay

2013-04-03 Thread Stuart Henderson
On 2013/04/03 15:43, Camiel Dobbelaar wrote:
 
 
> On 4/3/13 3:34 PM, Stuart Henderson wrote:
>>> In some cases when a network port comes up, it does not indicate that
>>> the network is ready.  But on linkup, carp(4) will try to get out of
>>> the INIT state as soon as possible.  And because all is quiet it will
>>> decide to become master.
>>> Anyone else observe/fix this by other means?  Opinions?
>>
>> slightly messy, though at least this also applies to the case with
>> things other than carp which could also have problems: add !sleep 5
>> or something in hostname.if for the physical interface...
>
> Yes, I already use that.  That solves the case where the system with
> the carp interfaces itself is rebooted.
>
> But not the other cases.  Hence I'd like that sleep applied always,
> instead of only when /etc/netstart is run.  :-)

Ah, I see what you mean. Still there are things other than carp
where this might also apply - for example the pfsync initial_bulk
that gets handled via if_linkstatehooks (but maybe also userland
things)..



Re: carp init delay

2013-04-03 Thread Camiel Dobbelaar


When the system with the carp interfaces comes up, a sleep in the 
hostname.if file works.  An arping might be an optimization of that.


But I'd like carp to react properly to events *outside* the system.

When I unplug/plug a network cable, spanning tree can kick in again on 
the switch.  *Then* I'd like carp to pause.


Or the example I mentioned earlier when a switch is powered off and on.

I think handling that belongs in the kernel, and not some userland 
voodoo (ifstated/cron scripts) to clean it up.   :-)













Re: carp init delay

2013-04-03 Thread Camiel Dobbelaar



On 4/3/13 3:54 PM, Stuart Henderson wrote:



> Ah, I see what you mean. Still there are things other than carp
> where this might also apply - for example the pfsync initial_bulk
> that gets handled via if_linkstatehooks (but maybe also userland
> things)..


Pausing carp may help the pfsync case too?

Wasn't the major problem there caused by the freshly booted backup going 
to master too soon and cancelling the bulk update?