Re: carp init delay

2013-04-03 Thread Camiel Dobbelaar



On 4/3/13 3:54 PM, Stuart Henderson wrote:
> On 2013/04/03 15:43, Camiel Dobbelaar wrote:
>> On 4/3/13 3:34 PM, Stuart Henderson wrote:
>>>> In some cases when a network port comes up, it does not indicate that
>>>> the network is ready.  But on linkup, carp(4) will try to get out of
>>>> the INIT state as soon as possible.  And because all is quiet it will
>>>> decide to become master.
>>>> Anyone else observe/fix this by other means?  Opinions?
>>>
>>> slightly messy, though at least this also applies to the case with
>>> things other than carp which could also have problems: add "!sleep 5"
>>> or something in hostname.if for the physical interface...
>>
>> Yes, I already use that.  That solves the case where the system with
>> the carp interfaces itself is rebooted.
>>
>> But not the other cases.  Hence I'd like that sleep applied always,
>> instead of only when /etc/netstart is run.  :-)
>
> Ah, I see what you mean. Still there are things other than carp
> where this might also apply - for example the pfsync initial_bulk
> that gets handled via if_linkstatehooks (but maybe also userland
> things)..

Pausing carp may help the pfsync case too?

Wasn't the major problem there caused by the freshly booted backup going 
to master too soon and cancelling the bulk update?





Re: carp init delay

2013-04-03 Thread Camiel Dobbelaar


When the system with the carp interfaces comes up, a sleep in the 
hostname.if file works.  An arping might be an optimization of that.


But I'd like carp to react properly to events *outside* the system.

When I unplug/plug a network cable, spanning tree can kick in again on 
the switch.  *Then* I'd like carp to pause.


Or the example I mentioned earlier when a switch is powered off and on.

I think handling that belongs in the kernel, and not some userland 
voodoo (ifstated/cron scripts) to clean it up.   :-)




On 4/3/13 3:37 PM, sven falempin wrote:
> my 2 cents:
> timing is always a problem, maybe you could arping the next hop and then
> activate the carp ?
>
> On Wed, Apr 3, 2013 at 9:34 AM, Stuart Henderson wrote:
>> On 2013/04/03 14:54, Camiel Dobbelaar wrote:
>>> In some cases when a network port comes up, it does not indicate that
>>> the network is ready.  But on linkup, carp(4) will try to get out of
>>> the INIT state as soon as possible.  And because all is quiet it will
>>> decide to become master.
>>>
>>> This then leads to master-master situations.
>>>
>>> Here are two examples when this can happen, there are probably more:
>>>
>>> (1) spanning tree may be in effect, and not yet forwarding
>>>
>>> (2) a powering-up or rebooting switch that activates its ports
>>> immediately, but does not forward anything while not completely up
>>> yet (this may be an openbsd bridge too)
>>>
>>> I wonder if carp(4) needs an extra knob (*shudder*) to pause in the
>>> INIT state while the rest of the network gets ready after a linkup.
>>>
>>> I see in the source code there are already two mechanisms/workarounds
>>> that are related, but a pause may be a bit more generic:
>>> - sc_suppress
>>> - sc_delayed_arp
>>>
>>> Anyone else observe/fix this by other means?  Opinions?
>>
>> slightly messy, though at least this also applies to the case with
>> things other than carp which could also have problems: add "!sleep 5"
>> or something in hostname.if for the physical interface...









Re: carp init delay

2013-04-03 Thread Camiel Dobbelaar



On 4/3/13 3:34 PM, Stuart Henderson wrote:
>> In some cases when a network port comes up, it does not indicate that
>> the network is ready.  But on linkup, carp(4) will try to get out of
>> the INIT state as soon as possible.  And because all is quiet it will
>> decide to become master.
>> Anyone else observe/fix this by other means?  Opinions?
>
> slightly messy, though at least this also applies to the case with
> things other than carp which could also have problems: add "!sleep 5"
> or something in hostname.if for the physical interface...

Yes, I already use that.  That solves the case where the system with the 
carp interfaces itself is rebooted.

But not the other cases.  Hence I'd like that sleep applied always, 
instead of only when /etc/netstart is run.  :-)




carp init delay

2013-04-03 Thread Camiel Dobbelaar


In some cases when a network port comes up, it does not indicate that 
the network is ready.  But on linkup, carp(4) will try to get out of the 
INIT state as soon as possible.  And because all is quiet it will decide 
to become master.


This then leads to master-master situations.

Here are two examples when this can happen, there are probably more:

(1) spanning tree may be in effect, and not yet forwarding

(2) a powering-up or rebooting switch that activates its ports 
immediately, but does not forward anything while not completely up yet 
(this may be an openbsd bridge too)


I wonder if carp(4) needs an extra knob (*shudder*) to pause in the INIT 
state while the rest of the network gets ready after a linkup.


I see in the source code there are already two mechanisms/workarounds 
that are related, but a pause may be a bit more generic:

- sc_suppress
- sc_delayed_arp

Anyone else observe/fix this by other means?  Opinions?
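
To make the idea of a pause a bit more concrete, here is a rough sketch of a hold-down timer that is armed on every linkup.  This is not carp(4) code: the names (carp_sketch, holddown, the carp_sketch_* functions and the SK_* states) are made up for illustration, and the sketch always goes to backup after the pause, which is an assumption and not necessarily what carp(4) would do.

```c
#include <stdbool.h>

/* Hypothetical states for the sketch; not the real carp(4) definitions. */
enum sketch_state { SK_INIT, SK_BACKUP, SK_MASTER };

struct carp_sketch {
	enum sketch_state state;
	long linkup_at;		/* time of the last linkup event, in seconds */
	long holddown;		/* how long to stay in INIT after a linkup */
};

/* Called on a linkup event: fall back to INIT and re-arm the hold-down. */
void
carp_sketch_linkup(struct carp_sketch *sc, long now)
{
	sc->state = SK_INIT;
	sc->linkup_at = now;
}

/* True once the hold-down since the last linkup has expired. */
bool
carp_sketch_may_leave_init(const struct carp_sketch *sc, long now)
{
	return (now - sc->linkup_at >= sc->holddown);
}

/*
 * Periodic tick: leave INIT only after the hold-down has expired.
 * The sketch always moves to BACKUP so it listens for advertisements
 * first; the real election logic is out of scope here.
 */
void
carp_sketch_tick(struct carp_sketch *sc, long now)
{
	if (sc->state == SK_INIT && carp_sketch_may_leave_init(sc, now))
		sc->state = SK_BACKUP;
}
```

With a holddown of 5, a linkup at t=100 keeps the interface in INIT until the tick at t=105, regardless of whether the linkup came from a reboot, a replugged cable or a switch power cycle.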




Re: help testing bridge diff

2012-09-24 Thread Camiel Dobbelaar
On Sun, 23 Sep 2012, Stefan Sperling wrote:

> On Thu, Sep 20, 2012 at 10:11:20AM +0200, Camiel Dobbelaar wrote:
> > I need help testing this bridge diff, as I cannot test (or even imagine) 
> > all the possible bridge setups.
> > 
> > It brings a nice speed improvement and simplifies the code.
> > 
> > Testing especially appreciated with gif, tun and vether interfaces in the 
> > bridge.
> > 
> > I can provide i386 and amd64 kernels to make it convenient.  :-)
> > 
> > Thanks!
> 
> The diff reads fine. I like the idea, storing a pointer to the bridge
> port itself makes much more sense than having everyone and their uncle
> loop over the iflist in the bridge softc to find the port.
> 
> I'll note that apart from making the output path more efficient this
> diff also hides the bridge iflist internals inside the bridge core code.
> Are you planning on changing the list to e.g. a tree going forward?

I think the list is ok as long as it's for simple bridge maintenance.  Not 
for lookups.

There is one performance killer still there that loops over the list to 
compare MAC addresses to see if the packet is for the machine itself.  I 
think it may be possible to use the bridge routecache for this.  So that 
MAC addresses of bridgeports point to "self".  There is also a carp 
check in there unfortunately, which may make this a bit harder.
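
Roughly what I have in mind for the "self" lookup, as a standalone sketch.  It is not the bridge routecache: the table layout, the hash and all sk_* names are invented here, and the carp check is left out completely.

```c
#include <stdint.h>
#include <string.h>

#define SK_ETHER_ADDR_LEN	6
#define SK_HASH_SIZE		64	/* must be a power of two */

/* One cache entry: a MAC address and whether it belongs to this machine. */
struct sk_rtnode {
	uint8_t	addr[SK_ETHER_ADDR_LEN];
	int	is_self;
	int	used;
};

struct sk_rtcache {
	struct sk_rtnode slot[SK_HASH_SIZE];
};

/* Toy hash over the six address bytes. */
static unsigned int
sk_hash(const uint8_t *a)
{
	unsigned int h = 0;
	int i;

	for (i = 0; i < SK_ETHER_ADDR_LEN; i++)
		h = h * 31 + a[i];
	return (h & (SK_HASH_SIZE - 1));
}

/* Insert or update an address (linear probing).  Returns -1 when full. */
int
sk_rt_insert(struct sk_rtcache *rc, const uint8_t *addr, int is_self)
{
	unsigned int i, h = sk_hash(addr);

	for (i = 0; i < SK_HASH_SIZE; i++) {
		struct sk_rtnode *n = &rc->slot[(h + i) & (SK_HASH_SIZE - 1)];

		if (!n->used ||
		    memcmp(n->addr, addr, SK_ETHER_ADDR_LEN) == 0) {
			memcpy(n->addr, addr, SK_ETHER_ADDR_LEN);
			n->is_self = is_self;
			n->used = 1;
			return (0);
		}
	}
	return (-1);
}

/* One hashed lookup instead of a walk over every bridge port. */
int
sk_rt_is_self(const struct sk_rtcache *rc, const uint8_t *addr)
{
	unsigned int i, h = sk_hash(addr);

	for (i = 0; i < SK_HASH_SIZE; i++) {
		const struct sk_rtnode *n =
		    &rc->slot[(h + i) & (SK_HASH_SIZE - 1)];

		if (!n->used)
			return (0);	/* unknown address */
		if (memcmp(n->addr, addr, SK_ETHER_ADDR_LEN) == 0)
			return (n->is_self);
	}
	return (0);
}
```

The point is only that the MAC addresses of the bridge ports would be entered once with is_self set, so the per-packet cost no longer grows with the number of ports.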
 
> I'm gonna test this on my firewalls. Let's see if it runs as well
> as it looks :)
> 
> There are two small changes buried in the diff which are unrelated
> to the overall change you're making to the bridge code, see below.

Yes, the mbuf.h change as well.  I'll commit those separately.

And a new diff.  A misplaced bracket in in_arpinput() caused make release 
to fail (as you noticed).


Index: dev/isa/if_ie.c
===
RCS file: /cvs/src/sys/dev/isa/if_ie.c,v
retrieving revision 1.35
diff -u -p -r1.35 if_ie.c
--- dev/isa/if_ie.c 28 Nov 2008 02:44:17 -  1.35
+++ dev/isa/if_ie.c 24 Sep 2012 15:14:52 -
@@ -1054,16 +1054,16 @@ check_eh(sc, eh, to_bpf)
 */
 #if NBPFILTER > 0
*to_bpf = (sc->sc_arpcom.ac_if.if_bpf != 0) ||
-   (sc->sc_arpcom.ac_if.if_bridge != NULL);
+   (sc->sc_arpcom.ac_if.if_bridgeport != NULL);
 #else
-   *to_bpf = (sc->sc_arpcom.ac_if.if_bridge != NULL);
+   *to_bpf = (sc->sc_arpcom.ac_if.if_bridgeport != NULL);
 #endif
/* If for us, accept and hand up to BPF */
if (ether_equal(eh->ether_dhost, sc->sc_arpcom.ac_enaddr))
return 1;
 
 #if NBPFILTER > 0
-   if (*to_bpf && sc->sc_arpcom.ac_if.if_bridge == NULL)
+   if (*to_bpf && sc->sc_arpcom.ac_if.if_bridgeport == NULL)
*to_bpf = 2; /* we don't need to see it */
 #endif
 
@@ -1095,9 +1095,9 @@ check_eh(sc, eh, to_bpf)
 */
 #if NBPFILTER > 0
*to_bpf = (sc->sc_arpcom.ac_if.if_bpf != 0) ||
-   (sc->sc_arpcom.ac_if.if_bridge != NULL);
+   (sc->sc_arpcom.ac_if.if_bridgeport != NULL);
 #else
-   *to_bpf = (sc->sc_arpcom.ac_if.if_bridge != NULL);
+   *to_bpf = (sc->sc_arpcom.ac_if.if_bridgeport != NULL);
 #endif
/* We want to see multicasts. */
if (eh->ether_dhost[0] & 1)
@@ -1109,7 +1109,7 @@ check_eh(sc, eh, to_bpf)
 
/* Anything else goes to BPF but nothing else. */
 #if NBPFILTER > 0
-   if (*to_bpf && sc->sc_arpcom.ac_if.if_bridge == NULL)
+   if (*to_bpf && sc->sc_arpcom.ac_if.if_bridgeport == NULL)
*to_bpf = 2;
 #endif
return 1;
Index: net/bridgestp.c
===
RCS file: /cvs/src/sys/net/bridgestp.c,v
retrieving revision 1.41
diff -u -p -r1.41 bridgestp.c
--- net/bridgestp.c 20 Sep 2012 14:10:18 -  1.41
+++ net/bridgestp.c 24 Sep 2012 15:14:52 -
@@ -1641,7 +1641,6 @@ void
 bstp_ifstate(void *arg)
 {
struct ifnet *ifp = (struct ifnet *)arg;
-   struct bridge_softc *sc;
struct bridge_iflist *p;
struct bstp_port *bp;
struct bstp_state *bs;
@@ -1649,16 +1648,11 @@ bstp_ifstate(void *arg)
 
if (ifp->if_type == IFT_BRIDGE)
return;
-   sc = (struct bridge_softc *)ifp->if_bridge;
 
s = splnet();
-   LIST_FOREACH(p, &sc->sc_iflist, next) {
-   if ((p->bif_flags & IFBIF_STP) == 0)
-   continue;
-   if (p->ifp == ifp)
-   break;
-   }
-   if (p == LIST

Re: proto cksum madness

2012-09-22 Thread Camiel Dobbelaar
On 21-9-2012 23:40, Stuart Henderson wrote:
> $ ifconfig vr0 hwfeatures|head -2
> vr0: flags=8b43 mtu 1500
> hwfeatures=8017
> 
> No problems noticed yet. (this is running i386).
> 
> $ ifconfig vlan6 hwfeatures|head -2
> vlan6: flags=8943 mtu 1500
> hwfeatures=0<>
> 
> Is it right/expected that CSUM_* aren't propagated to the vlan ifaces?

It is correct.  vr(4) does not have VLAN_HWTAGGING.

The comment in the code explains it:

> /*
>  * If the parent interface can do hardware-assisted
>  * VLAN encapsulation, then propagate its hardware-
>  * assisted checksumming flags.
>  *
>  * If the card cannot handle hardware tagging, it cannot
>  * possibly compute the correct checksums for tagged packets.
>  *
>  * This brings up another possibility, do cards exist which
>  * have all of these capabilities but cannot utilize them together?
>  */
> if (p->if_capabilities & IFCAP_VLAN_HWTAGGING)
> ifv->ifv_if.if_capabilities = p->if_capabilities &
> IFCAP_CSUM_MASK;
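
The same rule as a standalone function, in case that is easier to follow.  The bit values below are invented for the example (the real IFCAP_* constants come from the kernel headers), and sk_vlan_capabilities() is a hypothetical helper, not something in the tree.

```c
#include <stdint.h>

/* Illustrative bit values only; not the real IFCAP_* definitions. */
#define SK_IFCAP_CSUM_IPv4	0x01
#define SK_IFCAP_CSUM_TCPv4	0x02
#define SK_IFCAP_CSUM_UDPv4	0x04
#define SK_IFCAP_VLAN_HWTAGGING	0x20
#define SK_IFCAP_CSUM_MASK \
	(SK_IFCAP_CSUM_IPv4 | SK_IFCAP_CSUM_TCPv4 | SK_IFCAP_CSUM_UDPv4)

/*
 * Capabilities a vlan interface inherits from its parent: the checksum
 * flags, but only if the parent can also do hardware VLAN tagging.
 * Without hardware tagging nothing is propagated.
 */
uint32_t
sk_vlan_capabilities(uint32_t parent_caps)
{
	if (parent_caps & SK_IFCAP_VLAN_HWTAGGING)
		return (parent_caps & SK_IFCAP_CSUM_MASK);
	return (0);
}
```

A parent with checksum offload but no hardware tagging, like vr(4) here, ends up with no checksum flags on the vlan interface, which is what the empty hwfeatures line shows.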



help testing bridge diff

2012-09-20 Thread Camiel Dobbelaar
I need help testing this bridge diff, as I cannot test (or even imagine) 
all the possible bridge setups.

It brings a nice speed improvement and simplifies the code.

Testing especially appreciated with gif, tun and vether interfaces in the 
bridge.

I can provide i386 and amd64 kernels to make it convenient.  :-)

Thanks!


Index: dev/isa/if_ie.c
===
RCS file: /cvs/src/sys/dev/isa/if_ie.c,v
retrieving revision 1.35
diff -u -p -r1.35 if_ie.c
--- dev/isa/if_ie.c 28 Nov 2008 02:44:17 -  1.35
+++ dev/isa/if_ie.c 18 Sep 2012 09:55:59 -
@@ -1054,16 +1054,16 @@ check_eh(sc, eh, to_bpf)
 */
 #if NBPFILTER > 0
*to_bpf = (sc->sc_arpcom.ac_if.if_bpf != 0) ||
-   (sc->sc_arpcom.ac_if.if_bridge != NULL);
+   (sc->sc_arpcom.ac_if.if_bridgeport != NULL);
 #else
-   *to_bpf = (sc->sc_arpcom.ac_if.if_bridge != NULL);
+   *to_bpf = (sc->sc_arpcom.ac_if.if_bridgeport != NULL);
 #endif
/* If for us, accept and hand up to BPF */
if (ether_equal(eh->ether_dhost, sc->sc_arpcom.ac_enaddr))
return 1;
 
 #if NBPFILTER > 0
-   if (*to_bpf && sc->sc_arpcom.ac_if.if_bridge == NULL)
+   if (*to_bpf && sc->sc_arpcom.ac_if.if_bridgeport == NULL)
*to_bpf = 2; /* we don't need to see it */
 #endif
 
@@ -1095,9 +1095,9 @@ check_eh(sc, eh, to_bpf)
 */
 #if NBPFILTER > 0
*to_bpf = (sc->sc_arpcom.ac_if.if_bpf != 0) ||
-   (sc->sc_arpcom.ac_if.if_bridge != NULL);
+   (sc->sc_arpcom.ac_if.if_bridgeport != NULL);
 #else
-   *to_bpf = (sc->sc_arpcom.ac_if.if_bridge != NULL);
+   *to_bpf = (sc->sc_arpcom.ac_if.if_bridgeport != NULL);
 #endif
/* We want to see multicasts. */
if (eh->ether_dhost[0] & 1)
@@ -1109,7 +1109,7 @@ check_eh(sc, eh, to_bpf)
 
/* Anything else goes to BPF but nothing else. */
 #if NBPFILTER > 0
-   if (*to_bpf && sc->sc_arpcom.ac_if.if_bridge == NULL)
+   if (*to_bpf && sc->sc_arpcom.ac_if.if_bridgeport == NULL)
*to_bpf = 2;
 #endif
return 1;
Index: net/bridgestp.c
===
RCS file: /cvs/src/sys/net/bridgestp.c,v
retrieving revision 1.40
diff -u -p -r1.40 bridgestp.c
--- net/bridgestp.c 9 Jul 2011 04:53:33 -   1.40
+++ net/bridgestp.c 18 Sep 2012 09:56:00 -
@@ -1640,7 +1640,6 @@ void
 bstp_ifstate(void *arg)
 {
struct ifnet *ifp = (struct ifnet *)arg;
-   struct bridge_softc *sc;
struct bridge_iflist *p;
struct bstp_port *bp;
struct bstp_state *bs;
@@ -1648,16 +1647,11 @@ bstp_ifstate(void *arg)
 
if (ifp->if_type == IFT_BRIDGE)
return;
-   sc = (struct bridge_softc *)ifp->if_bridge;
 
s = splnet();
-   LIST_FOREACH(p, &sc->sc_iflist, next) {
-   if ((p->bif_flags & IFBIF_STP) == 0)
-   continue;
-   if (p->ifp == ifp)
-   break;
-   }
-   if (p == LIST_END(&sc->sc_iflist))
+   if ((p = (struct bridge_iflist *)ifp->if_bridgeport) == NULL)
+   goto done;
+   if ((p->bif_flags & IFBIF_STP) == 0)
goto done;
if ((bp = p->bif_stp) == NULL)
goto done;
@@ -2120,7 +2114,7 @@ bstp_ifsflags(struct bstp_port *bp, u_in
 int
 bstp_ioctl(struct ifnet *ifp, u_long cmd, caddr_t data)
 {
-   struct bridge_softc *sc = (struct bridge_softc *)ifp;
+   struct bridge_softc *sc = (struct bridge_softc *)ifp->if_softc;
struct bstp_state *bs = sc->sc_stp;
struct ifbrparam *ifbp = (struct ifbrparam *)data;
struct ifbreq *ifbr = (struct ifbreq *)data;
@@ -2137,15 +2131,8 @@ bstp_ioctl(struct ifnet *ifp, u_long cmd
err = ENOENT;
break;
}
-   if ((caddr_t)sc != ifs->if_bridge) {
-   err = ESRCH;
-   break;
-   }
-   LIST_FOREACH(p, &sc->sc_iflist, next) {
-   if (p->ifp == ifs)
-   break;
-   }
-   if (p == LIST_END(&sc->sc_iflist)) {
+   p = (struct bridge_iflist *)ifs->if_bridgeport;
+   if (p == NULL || p->bridge_sc != sc) {
err = ESRCH;
break;
}
Index: net/if.c
===
RCS file: /cvs/src/sys/net/if.c,v
retrieving revision 1.241
diff -u -p -r1.241 if.c
--- net/if.c 3 Jan 2012 23:41:51 -   1.241
+++ net/if.c 18 Sep 2012 09:56:00 -
@@ -531,7 +531,7 @@ if_detach(struct ifnet *ifp)
 
 #if NBRIDGE > 0
  

Re: bridge loop detection

2012-03-02 Thread Camiel Dobbelaar
On 2-3-2012 15:49, Matthieu Herrb wrote:
> On Fri, Mar 02, 2012 at 03:19:34PM +0100, Camiel Dobbelaar wrote:
>> I think the bridge loop detection in if_ethersubr.c can be removed.  It 
>> taxes all bridge output traffic, but I don't think it ever kicks in.
>>
>> It was added in 2001 by angelos:
>> http://www.openbsd.org/cgi-bin/cvsweb/src/sys/net/if_ethersubr.c.diff?r1=1.48;r2=1.49;f=h
>>
>> I'd say the following ethertypes are safe, they push packets further down 
>> the stack so they cannot be bridged again: ieee80211, trunk, vlan
>>
>> tun and gre cannot be part of a bridge.
> 
> While I don't understand the network stack very deeply, I for sure
> have tun interfaces (in tap mode, with link0 set) part of a bridge on
> my openvpn gateway:
> 
> bridge0: flags=41
> groups: bridge
> priority 32768 hellotime 2 fwddelay 15 maxage 20 holdcnt 6
> proto rstp
> tun0 flags=3
> port 14 ifpriority 0 ifcost 0
> vlan4 flags=3
> port 6 ifpriority 0 ifcost 0
> 
> 
> 

Yes, you are right, I missed the layer-2 tap mode.  It looks like it
only pushes packets further down as well (so they cannot be bridged
twice).  But the tun code is a little more challenging so not 100% sure yet.

If you can try the diff on this setup, that would be nice.



bridge loop detection

2012-03-02 Thread Camiel Dobbelaar
I think the bridge loop detection in if_ethersubr.c can be removed.  It 
taxes all bridge output traffic, but I don't think it ever kicks in.

It was added in 2001 by angelos:
http://www.openbsd.org/cgi-bin/cvsweb/src/sys/net/if_ethersubr.c.diff?r1=1.48;r2=1.49;f=h

I'd say the following ethertypes are safe, they push packets further down 
the stack so they cannot be bridged again: ieee80211, trunk, vlan

tun and gre cannot be part of a bridge.

vether discards all output.

That leaves gif, but that has its own loop detection.  Added by angelos in 
2001 as well, so I'm not sure what I'm missing.
http://www.openbsd.org/cgi-bin/cvsweb/src/sys/net/if_gif.c.diff?r1=1.18;r2=1.19;f=h

FreeBSD and NetBSD don't have it.


Can people with exotic bridge setups (two bridges on one machine, gif 
tunnels, etc) give this a spin?  It increases the collision counter 
(netstat -in) when a loop is broken.
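
For those who don't want to read ether_output(), this is a simplified model of the check the diff instruments.  It uses a small fixed array instead of mbuf tags, and every lt_* name is made up for the sketch; only the idea (remember which bridge a packet went through, count a collision when it shows up again) is taken from the code.

```c
#include <stddef.h>

#define LT_MAX_TAGS	4

struct lt_ifnet {
	const void	*if_bridge;	/* bridge we are a member of, or NULL */
	unsigned long	 if_collisions;	/* shown by netstat -in */
};

struct lt_pkt {
	const void	*tags[LT_MAX_TAGS];	/* bridges already passed */
	int		 ntags;
};

/*
 * Returns 1 if the packet should go through bridge processing,
 * 0 if it should be sent out directly, and -1 if no tag could be
 * attached (the model's stand-in for an allocation failure).
 */
int
lt_needs_bridge(struct lt_ifnet *ifp, struct lt_pkt *m)
{
	int i;

	if (ifp->if_bridge == NULL)
		return (0);

	for (i = 0; i < m->ntags; i++) {
		if (m->tags[i] == ifp->if_bridge) {
			/* Seen this bridge before: loop broken, count it. */
			ifp->if_collisions++;
			return (0);
		}
	}

	if (m->ntags >= LT_MAX_TAGS)
		return (-1);

	/* First pass through this bridge: remember it. */
	m->tags[m->ntags++] = ifp->if_bridge;
	return (1);
}
```

If the counter never moves on real setups, the check is dead weight on every bridged output packet.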

Index: if_ethersubr.c
===
RCS file: /cvs/src/sys/net/if_ethersubr.c,v
retrieving revision 1.151
diff -u -p -t -u -r1.151 if_ethersubr.c
--- if_ethersubr.c  9 Jul 2011 00:47:18 -   1.151
+++ if_ethersubr.c  2 Mar 2012 13:47:45 -
@@ -399,8 +399,10 @@ ether_output(ifp0, m0, dst, rt0)
 goto bad;
 }
 #endif
-if (!bcmp(&ifp->if_bridge, mtag + 1, sizeof(caddr_t)))
+if (!bcmp(&ifp->if_bridge, mtag + 1, sizeof(caddr_t))) {
+ifp->if_collisions++;
 break;
+}
 }
 if (mtag == NULL) {
 /* Attach a tag so we can detect loops */



Re: vitaminstore 24x7

2012-02-06 Thread Camiel Dobbelaar
Sorry everyone, wrong tech@   :-(


On 6-2-2012 9:31, Camiel Dobbelaar wrote:
> Vitaminstore is in 24x7 now.   They are allowed to contact us now when
> the server is down.
> 
> They did that anyway a few times, so not a lot changes.
> 
> What we do is solve hardware and resource problems, and maybe a very
> occasional iisreset.  For the rest they probably need their China devs.



vitaminstore 24x7

2012-02-06 Thread Camiel Dobbelaar
Vitaminstore is in 24x7 now.   They are allowed to contact us now when
the server is down.

They did that anyway a few times, so not a lot changes.  :-)

What we do is solve hardware and resource problems, and maybe a very
occasional iisreset.  For the rest they probably need their China devs.



Re: relayd imsg race

2012-01-02 Thread Camiel Dobbelaar
On Mon, 5 Dec 2011, Camiel Dobbelaar wrote:
> > Another might be to inhibit the processing of IMSG_HOST_STATUS only until
> > the configuration has been completed (that is after receiving 
> > IMSG_CFG_DONE):
> 
> I'm going to try this one.  I'm not sure how bad it is to discard
> messages though.

I tried it, and it does not work correctly.  Because the imsg is dropped 
while the process is marked inactive, you get desynchronized and this code 
for example still breaks:

if (host->check_cnt != st.check_cnt) {
log_debug("%s: host %d => %d", __func__,
host->conf.id, host->up);
fatalx("pfe_dispatch_hce: desynchronized");
}

Maybe that can be fixed, if we can assume that it's not bad to drop 
some status messages once in a while.

I tried another approach below: only start the processes if _all_ of them 
have loaded the config.  This should fix the configuration race after 
startup completely.

There's still a race while reloading though.  Some processes might still 
be active with an old config, while others may be busy purging their old 
config before loading the new one.  The right way would be to pause all 
the processes first.  But I'd say that's a separate problem.  :-)
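
The startup part boils down to a countdown in the parent.  A minimal sketch of just that logic, outside of relayd: sc_reload, sc_prefork_relay and the "2 + prefork" count are from the diff, while struct sk_relayd, the started flag and the two sk_* functions are stand-ins I made up so it compiles on its own.

```c
#include <stdbool.h>

struct sk_relayd {
	int	sc_prefork_relay;	/* number of preforked relay processes */
	int	sc_reload;		/* children that still have to confirm */
	bool	started;		/* set once everyone may start */
};

/* Config has been pushed out: HCE, PFE and every relay must confirm. */
void
sk_configure(struct sk_relayd *env)
{
	env->sc_reload = 2 + env->sc_prefork_relay;
	env->started = false;
}

/*
 * Called once per child that reports it has loaded the config.
 * Returns true only for the last confirmation, which is the moment
 * the parent would tell all children to start (IMSG_CTL_START).
 */
bool
sk_configure_done(struct sk_relayd *env)
{
	if (env->sc_reload <= 0)
		return (false);		/* nothing outstanding */
	if (--env->sc_reload > 0)
		return (false);		/* still waiting for others */
	env->started = true;
	return (true);
}
```

Since no child starts its checks before the last confirmation, hce cannot send host status for ids that pfe or a relay has not loaded yet.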

--
Cam


Index: hce.c
===
RCS file: /cvs/src/usr.sbin/relayd/hce.c,v
retrieving revision 1.61
diff -u -p -r1.61 hce.c
--- hce.c   12 Nov 2011 19:36:17 -  1.61
+++ hce.c   2 Jan 2012 13:57:40 -
@@ -355,6 +355,8 @@ hce_dispatch_parent(int fd, struct privs
break;
case IMSG_CFG_DONE:
config_getcfg(env, imsg);
+   break;
+   case IMSG_CTL_START:
hce_setup_events();
break;
case IMSG_CTL_RESET:
Index: parse.y
===
RCS file: /cvs/src/usr.sbin/relayd/parse.y,v
retrieving revision 1.159
diff -u -p -r1.159 parse.y
--- parse.y 21 Sep 2011 18:45:40 -  1.159
+++ parse.y 2 Jan 2012 13:57:40 -
@@ -2280,9 +2280,6 @@ load_config(const char *filename, struct
errors++;
}
 
-   if (TAILQ_EMPTY(conf->sc_relays))
-   conf->sc_prefork_relay = 0;
-
/* Cleanup relay list to inherit */
while ((rlay = TAILQ_FIRST(&relays)) != NULL) {
TAILQ_REMOVE(&relays, rlay, rl_entry);
Index: pfe.c
===
RCS file: /cvs/src/usr.sbin/relayd/pfe.c,v
retrieving revision 1.71
diff -u -p -r1.71 pfe.c
--- pfe.c   12 Nov 2011 19:36:17 -  1.71
+++ pfe.c   2 Jan 2012 13:57:40 -
@@ -203,6 +203,8 @@ pfe_dispatch_parent(int fd, struct privs
config_getcfg(env, imsg);
init_filter(env, imsg->fd);
init_tables(env);
+   break;
+   case IMSG_CTL_START:
pfe_setup_events();
pfe_sync();
break;
Index: relay.c
===
RCS file: /cvs/src/usr.sbin/relayd/relay.c,v
retrieving revision 1.143
diff -u -p -r1.143 relay.c
--- relay.c 21 Sep 2011 18:45:40 -  1.143
+++ relay.c 2 Jan 2012 13:57:40 -
@@ -2577,6 +2577,8 @@ relay_dispatch_parent(int fd, struct pri
break;
case IMSG_CFG_DONE:
config_getcfg(env, imsg);
+   break;
+   case IMSG_CTL_START:
relay_launch();
break;
case IMSG_CTL_RESET:
Index: relayd.c
===
RCS file: /cvs/src/usr.sbin/relayd/relayd.c,v
retrieving revision 1.104
diff -u -p -r1.104 relayd.c
--- relayd.c 4 Sep 2011 20:26:58 -   1.104
+++ relayd.c 2 Jan 2012 13:57:40 -
@@ -49,6 +49,7 @@
 __dead void usage(void);
 
 int parent_configure(struct relayd *);
+void parent_configure_done(struct relayd *);
 void parent_reload(struct relayd *, u_int, const char *);
 void parent_sig_handler(int, short, void *);
 void parent_shutdown(struct relayd *);
@@ -292,6 +293,9 @@ parent_configure(struct relayd *env)
TAILQ_FOREACH(rlay, env->sc_relays, rl_entry)
config_setrelay(env, rlay);
 
+   /* HCE, PFE and the preforked relays need to reload their config. */
+   env->sc_reload = 2 + env->sc_prefork_relay;
+
for (id = 0; id < PROC_MAX; id++) {
if (id == privsep_process)
continue;
@@ -308,7 +312,6 @@ parent_configure(struct relayd *env)
} else
s = -1;
 
-   env->sc_reload++;
proc_compose_imsg(env->sc_ps, id,

Re: relayd imsg race

2011-12-05 Thread Camiel Dobbelaar
On 5-12-2011 19:45, Sebastian Benoit wrote:
> I see relayd crashes like this: (1)
> fatal: relay_dispatch_pfe: invalid host id

> or like this: (2)
> fatal: pfe_dispatch_hce: invalid host id

> There is a race of the hce and the other children (pfe and relays)
> between loading the configuration and start of processing IMSG_HOST_STATUS
> messages.
> 
> The problem is that in hce_setup_events() the host checks are started before
> all children have all of the configuration.

Yes, I experienced the same thing, see:
http://marc.info/?l=openbsd-bugs&m=132207738531052&w=2

> A quick hack is to insert a sleep(1) at the beginning of hce_setup_events().

No, that does not work.  I've seen crashes with sleeps of up to 3 seconds on
my system.  And it is still a race.

> A fix might be to make 'invalid host id' non fatal:

That might lead to crashes later on, especially if the hce notifies
about new host ids that the other processes have not loaded yet.

> Another might be to inhibit the processing of IMSG_HOST_STATUS only until
> the configuration has been completed (that is after receiving IMSG_CFG_DONE):

I'm going to try this one.  I'm not sure how bad it is to discard
messages though.



Re: raise max value for tcp autosizing buffer [WAS: misc@ network tuning for high bandwidth and high latency]

2011-12-04 Thread Camiel Dobbelaar
On 4-12-2011 13:01, Sebastian Reitenbach wrote:
> the default maximum size of the tcp send and receive buffer used by the 
> autosizing algorithm is way too small, when trying to get maximum speed with 
> high bandwidth and high latency connections.

I have tweaked SB_MAX on a system too, but it was for UDP.

When running a busy Unbound resolver, the recommendation is to bump the
receive buffer to 4M or even 8M. See
http://unbound.net/documentation/howto_optimise.html

Otherwise a lot of queries are dropped when the cache is cold.
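
For reference, this is the application side of it: a small sketch that asks for a bigger receive buffer with SO_RCVBUF and reads back what the kernel reports.  sk_request_rcvbuf() is a name invented for the example; what happens when the request exceeds the limit (an error or silent clamping) depends on the system, so the sketch ignores the setsockopt() result and only trusts the value read back.

```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Ask for a receive buffer of `bytes` on socket `fd` and return the
 * size the kernel reports afterwards, or -1 if it cannot be read back.
 */
int
sk_request_rcvbuf(int fd, int bytes)
{
	int granted = 0;
	socklen_t len = sizeof(granted);

	/*
	 * This may fail or be clamped when the request is larger than
	 * the kernel allows, which is exactly where a limit such as
	 * SB_MAX comes in.  The result is checked by reading it back.
	 */
	(void)setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes));

	if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &granted, &len) == -1)
		return (-1);
	return (granted);
}
```

Calling it with 4 * 1024 * 1024 on the resolver's UDP socket and comparing the return value with the request shows quickly whether the kernel limit is in the way.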

I don't think there's a magic value that's right for everyone, so a
sysctl would be nice.  Maybe separate ones for tcp and udp.

I know similar sysctl's have been removed recently, and that they are
sometimes abused, but I'd say we have two valid use cases now.

So I'd love some more discussion.  :-)

--
Cam



Re: use M_PROTO1 in bridge output too

2011-11-03 Thread Camiel Dobbelaar
No one interested in this one?  I have another bridge speedup diff after 
this.

On Fri, 28 Oct 2011, Camiel Dobbelaar wrote:

> M_PROTO1 is used by if_bridge on the input path.  On the output path it's 
> used now only by if_bridge for if_gif.  I think we can use it generically 
> to mark packets as "processed by bridge" in the output path.
> 
> The diff simplifies things and avoids mtag checking and allocation so is 
> more efficient too.
> 
> The old code checks if a packet has passed the _same_ bridge already, but 
> as an interface can only be a member of one bridge I think the flag is 
> sufficient.
> 
> It looks like the only other user of M_PROTO1 is netbt/hci_link.c, but 
> that can be fixed if the diff is acceptable otherwise.
> 
> Tested lightly in a bridge/gif setup, but could use some more testing.
> (especially with ipsec in the mix too)
> 
> 
> Index: if_bridge.c
> ===
> RCS file: /cvs/src/sys/net/if_bridge.c,v
> retrieving revision 1.193
> diff -u -p -r1.193 if_bridge.c
> --- if_bridge.c   4 Jul 2011 06:54:49 -   1.193
> +++ if_bridge.c   28 Oct 2011 17:55:04 -
> @@ -2813,9 +2813,8 @@ bridge_ifenqueue(struct bridge_softc *sc
>  #if NGIF > 0
>   /* Packet needs etherip encapsulation. */
>   if (ifp->if_type == IFT_GIF) {
> - m->m_flags |= M_PROTO1;
> -
>   /* Count packets input into the gif from outside */
> + /* XXX do this in if_gif? */
>   ifp->if_ipackets++;
>   ifp->if_ibytes += m->m_pkthdr.len;
>   }
> @@ -2844,6 +2843,7 @@ bridge_ifenqueue(struct bridge_softc *sc
>   }
>  #endif
>   len = m->m_pkthdr.len;
> + m->m_flags |= M_PROTO1;
>   mflags = m->m_flags;
>   IFQ_ENQUEUE(&ifp->if_snd, m, NULL, error);
>   if (error) {
> Index: if_ethersubr.c
> ===
> RCS file: /cvs/src/sys/net/if_ethersubr.c,v
> retrieving revision 1.151
> diff -u -p -r1.151 if_ethersubr.c
> --- if_ethersubr.c 9 Jul 2011 00:47:18 -   1.151
> +++ if_ethersubr.c 28 Oct 2011 17:55:04 -
> @@ -382,40 +382,8 @@ ether_output(ifp0, m0, dst, rt0)
>* Interfaces that are bridge members need special handling
>* for output.
>*/
> - if (ifp->if_bridge) {
> - struct m_tag *mtag;
> -
> - /*
> -  * Check if this packet has already been sent out through
> -  * this bridge, in which case we simply send it out
> -  * without further bridge processing.
> -  */
> - for (mtag = m_tag_find(m, PACKET_TAG_BRIDGE, NULL); mtag;
> - mtag = m_tag_find(m, PACKET_TAG_BRIDGE, mtag)) {
> -#ifdef DEBUG
> - /* Check that the information is there */
> - if (mtag->m_tag_len != sizeof(caddr_t)) {
> - error = EINVAL;
> - goto bad;
> - }
> -#endif
> - if (!bcmp(&ifp->if_bridge, mtag + 1, sizeof(caddr_t)))
> - break;
> - }
> - if (mtag == NULL) {
> - /* Attach a tag so we can detect loops */
> - mtag = m_tag_get(PACKET_TAG_BRIDGE, sizeof(caddr_t),
> - M_NOWAIT);
> - if (mtag == NULL) {
> - error = ENOBUFS;
> - goto bad;
> - }
> - bcopy(&ifp->if_bridge, mtag + 1, sizeof(caddr_t));
> - m_tag_prepend(m, mtag);
> - error = bridge_output(ifp, m, NULL, NULL);
> - return (error);
> - }
> - }
> + if (ifp->if_bridge && !(m->m_flags & M_PROTO1))
> + return (bridge_output(ifp, m, NULL, NULL));
>  #endif
>   mflags = m->m_flags;
>   len = m->m_pkthdr.len;



use M_PROTO1 in bridge output too

2011-10-28 Thread Camiel Dobbelaar
M_PROTO1 is used by if_bridge on the input path.  On the output path it's 
used now only by if_bridge for if_gif.  I think we can use it generically 
to mark packets as "processed by bridge" in the output path.

The diff simplifies things and avoids mtag checking and allocation so is 
more efficient too.

The old code checks if a packet has passed the _same_ bridge already, but 
as an interface can only be a member of one bridge I think the flag is 
sufficient.

It looks like the only other user of M_PROTO1 is netbt/hci_link.c, but 
that can be fixed if the diff is acceptable otherwise.

Tested lightly in a bridge/gif setup, but could use some more testing.
(especially with ipsec in the mix too)
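
Stripped of the mbuf details, the new logic is just this.  A sketch only: the pf_* structs and functions and the flag value are invented so it stands alone, the real flag is M_PROTO1.

```c
#include <stddef.h>

/* Illustrative value; stands in for the real M_PROTO1 flag. */
#define PF_M_PROTO1	0x0010

struct pf_ifnet {
	const void	*if_bridge;	/* bridge we are a member of, or NULL */
};

struct pf_mbuf {
	unsigned int	 m_flags;
};

/*
 * Output path decision: hand the packet to the bridge only if the
 * interface is a bridge member and the bridge has not seen it yet.
 */
int
pf_output_needs_bridge(const struct pf_ifnet *ifp, const struct pf_mbuf *m)
{
	return (ifp->if_bridge != NULL && !(m->m_flags & PF_M_PROTO1));
}

/* What the bridge does just before enqueueing on a member interface. */
void
pf_bridge_mark(struct pf_mbuf *m)
{
	m->m_flags |= PF_M_PROTO1;
}
```

One flag test replaces the tag search and the tag allocation.  It relies on the assumption stated above, that an interface is a member of at most one bridge.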


Index: if_bridge.c
===
RCS file: /cvs/src/sys/net/if_bridge.c,v
retrieving revision 1.193
diff -u -p -r1.193 if_bridge.c
--- if_bridge.c 4 Jul 2011 06:54:49 -   1.193
+++ if_bridge.c 28 Oct 2011 17:55:04 -
@@ -2813,9 +2813,8 @@ bridge_ifenqueue(struct bridge_softc *sc
 #if NGIF > 0
/* Packet needs etherip encapsulation. */
if (ifp->if_type == IFT_GIF) {
-   m->m_flags |= M_PROTO1;
-
/* Count packets input into the gif from outside */
+   /* XXX do this in if_gif? */
ifp->if_ipackets++;
ifp->if_ibytes += m->m_pkthdr.len;
}
@@ -2844,6 +2843,7 @@ bridge_ifenqueue(struct bridge_softc *sc
}
 #endif
len = m->m_pkthdr.len;
+   m->m_flags |= M_PROTO1;
mflags = m->m_flags;
IFQ_ENQUEUE(&ifp->if_snd, m, NULL, error);
if (error) {
Index: if_ethersubr.c
===
RCS file: /cvs/src/sys/net/if_ethersubr.c,v
retrieving revision 1.151
diff -u -p -r1.151 if_ethersubr.c
--- if_ethersubr.c  9 Jul 2011 00:47:18 -   1.151
+++ if_ethersubr.c  28 Oct 2011 17:55:04 -
@@ -382,40 +382,8 @@ ether_output(ifp0, m0, dst, rt0)
 * Interfaces that are bridge members need special handling
 * for output.
 */
-   if (ifp->if_bridge) {
-   struct m_tag *mtag;
-
-   /*
-* Check if this packet has already been sent out through
-* this bridge, in which case we simply send it out
-* without further bridge processing.
-*/
-   for (mtag = m_tag_find(m, PACKET_TAG_BRIDGE, NULL); mtag;
-   mtag = m_tag_find(m, PACKET_TAG_BRIDGE, mtag)) {
-#ifdef DEBUG
-   /* Check that the information is there */
-   if (mtag->m_tag_len != sizeof(caddr_t)) {
-   error = EINVAL;
-   goto bad;
-   }
-#endif
-   if (!bcmp(&ifp->if_bridge, mtag + 1, sizeof(caddr_t)))
-   break;
-   }
-   if (mtag == NULL) {
-   /* Attach a tag so we can detect loops */
-   mtag = m_tag_get(PACKET_TAG_BRIDGE, sizeof(caddr_t),
-   M_NOWAIT);
-   if (mtag == NULL) {
-   error = ENOBUFS;
-   goto bad;
-   }
-   bcopy(&ifp->if_bridge, mtag + 1, sizeof(caddr_t));
-   m_tag_prepend(m, mtag);
-   error = bridge_output(ifp, m, NULL, NULL);
-   return (error);
-   }
-   }
+   if (ifp->if_bridge && !(m->m_flags & M_PROTO1))
+   return (bridge_output(ifp, m, NULL, NULL));
 #endif
mflags = m->m_flags;
len = m->m_pkthdr.len;



more sunix puc cards

2011-10-21 Thread Camiel Dobbelaar
Adapted from NetBSD, which in turn got the id's from Linux.

Not sure about the PUC_MAX_PORTS bump from 8 to 16 (and the comment), it 
grows pucdata.o from 13k to 23k (on i386) for just one card.


Index: sys/dev/pci/pcidevs
===
RCS file: /cvs/src/sys/dev/pci/pcidevs,v
retrieving revision 1.1624
diff -u -r1.1624 pcidevs
--- sys/dev/pci/pcidevs 9 Oct 2011 21:39:11 -   1.1624
+++ sys/dev/pci/pcidevs 11 Oct 2011 06:57:54 -
@@ -320,6 +320,7 @@
 vendor SYMPHONY2   0x1c1c  Symphony Labs
 vendor TEKRAM2 0x1de1  Tekram
 vendor TEHUTI  0x1fc9  Tehuti Networks
+vendor SUNIX2  0x1fd4  Sunix
 vendor HINT    0x3388  Hint
 vendor 3DLABS  0x3d3d  3D Labs
 vendor AVANCE2 0x4005  Avance Logic
@@ -5230,6 +5231,7 @@
 /* Sunix */
 product SUNIX 40XX 0x7168  40XX
 product SUNIX 4018A    0x7268  4018A
+product SUNIX2 50XX    0x1999  50XX
 
 /* Surecom products */
 product SURECOM NE34   0x0e34  NE-34
Index: sys/dev/pci/pucdata.c
===
RCS file: /cvs/src/sys/dev/pci/pucdata.c,v
retrieving revision 1.75
diff -u -r1.75 pucdata.c
--- sys/dev/pci/pucdata.c   9 Oct 2011 21:46:32 -   1.75
+++ sys/dev/pci/pucdata.c   11 Oct 2011 06:57:54 -
@@ -1463,6 +1463,120 @@
},
 
/*
+* SUNIX 50XX series of serial/parallel combo cards.
+* Tested with 5066A.
+*/
+   {   /* SUNIX 5008 1P */
+   {   PCI_VENDOR_SUNIX2, PCI_PRODUCT_SUNIX2_50XX, 0x1fd4, 0x0100 },
+   {   0x, 0x, 0x, 0xeff0 },
+   {
+   { PUC_PORT_TYPE_LPT, 0x14, 0x00, 0x00 },
+   },
+   },
+
+   {   /* SUNIX 5016 16S */
+   {   PCI_VENDOR_SUNIX2, PCI_PRODUCT_SUNIX2_50XX, 0x1fd4, 0x0010 },
+   {   0x, 0x, 0x, 0x },
+   {
+   { PUC_PORT_TYPE_COM, 0x10, 0x00, COM_FREQ * 8 },
+   { PUC_PORT_TYPE_COM, 0x10, 0x08, COM_FREQ * 8 },
+   { PUC_PORT_TYPE_COM, 0x10, 0x10, COM_FREQ * 8 },
+   { PUC_PORT_TYPE_COM, 0x10, 0x18, COM_FREQ * 8 },
+   { PUC_PORT_TYPE_COM, 0x14, 0x00, COM_FREQ * 8 },
+   { PUC_PORT_TYPE_COM, 0x14, 0x08, COM_FREQ * 8 },
+   { PUC_PORT_TYPE_COM, 0x14, 0x10, COM_FREQ * 8 },
+   { PUC_PORT_TYPE_COM, 0x14, 0x18, COM_FREQ * 8 },
+/*
+ * PUC_MAX_PORTS needs to be raised in order to reach these ports
+ */
+#if PUC_MAX_PORTS >= 16
+   { PUC_PORT_TYPE_COM, 0x14, 0x20, COM_FREQ * 8 },
+   { PUC_PORT_TYPE_COM, 0x14, 0x28, COM_FREQ * 8 },
+   { PUC_PORT_TYPE_COM, 0x14, 0x30, COM_FREQ * 8 },
+   { PUC_PORT_TYPE_COM, 0x14, 0x38, COM_FREQ * 8 },
+   { PUC_PORT_TYPE_COM, 0x14, 0x40, COM_FREQ * 8 },
+   { PUC_PORT_TYPE_COM, 0x14, 0x48, COM_FREQ * 8 },
+   { PUC_PORT_TYPE_COM, 0x14, 0x50, COM_FREQ * 8 },
+   { PUC_PORT_TYPE_COM, 0x14, 0x58, COM_FREQ * 8 },
+#endif /* PUC_MAX_PORTS >= 16 */
+   },
+   },
+
+   {   /* SUNIX 5027 1S */
+   {   PCI_VENDOR_SUNIX2, PCI_PRODUCT_SUNIX2_50XX, 0x1fd4, 0x0001 },
+   {   0x, 0x, 0x, 0x },
+   {
+   { PUC_PORT_TYPE_COM, 0x10, 0x00, COM_FREQ * 8 },
+   },
+   },
+
+   {   /* SUNIX 5037 2S */
+   {   PCI_VENDOR_SUNIX2, PCI_PRODUCT_SUNIX2_50XX, 0x1fd4, 0x0002 },
+   {   0x, 0x, 0x, 0x },
+   {
+   { PUC_PORT_TYPE_COM, 0x10, 0x00, COM_FREQ * 8 },
+   { PUC_PORT_TYPE_COM, 0x10, 0x08, COM_FREQ * 8 },
+   },
+   },
+
+   {   /* SUNIX 5056 4S */
+   {   PCI_VENDOR_SUNIX2, PCI_PRODUCT_SUNIX2_50XX, 0x1fd4, 0x0004 },
+   {   0x, 0x, 0x, 0x },
+   {
+   { PUC_PORT_TYPE_COM, 0x10, 0x00, COM_FREQ * 8 },
+   { PUC_PORT_TYPE_COM, 0x10, 0x08, COM_FREQ * 8 },
+   { PUC_PORT_TYPE_COM, 0x10, 0x10, COM_FREQ * 8 },
+   { PUC_PORT_TYPE_COM, 0x10, 0x18, COM_FREQ * 8 },
+   },
+   },
+
+   {   /* SUNIX 5066 8S */
+   {   PCI_VENDOR_SUNIX2, PCI_PRODUCT_SUNIX2_50XX, 0x1fd4, 0x0008 },
+   {   0x, 0x, 0x, 0x },
+   {
+   { PUC_PORT_TYPE_COM, 0x10, 0x00, COM_FREQ * 8 },
+   { PUC_PORT_TYPE_COM, 0x10, 0x08, COM_FREQ * 8 },
+   { PUC_PORT_TYPE_COM, 0x10, 0x10, COM_FREQ * 8 },
+   { PUC_PORT_TYPE_COM, 0x10, 0x18, COM_FREQ * 8 },
+   { PUC_PORT_TYPE_COM, 0x14, 0x00, COM_FREQ * 8 },
+   { PUC_PORT_TYPE_COM, 0x14, 0x08, COM_FREQ * 8 },
+   { PUC_PORT_TYPE_COM, 0x14, 0x10, COM_FREQ * 8 },
+   { PUC_PORT_TYPE_COM, 0x14, 0x18, COM_FREQ * 8 },
+   },
+   },
+
+   {   /* SUNIX 5069 1S / 1P */
+   {   P

carp destroy

2011-10-21 Thread Camiel Dobbelaar
Destroying a carp interface does not restore the demote count of the 
carp group.

The first reason is that the interface is removed from the carp group by
if_clone_destroy() before carp_clone_destroy() is run.  The second reason
is a simple bug introduced in ip_carp.c, rev 1.175.

The diff removes if_delgroup() from if_clone_destroy().  This is possible 
because if_detach() that is run later on removes the interface from all 
the groups as well.  This seems to work fine.  I cannot deduce from the 
CVS history why it was added to if_clone_destroy...  can anyone 
remember?  (Henning?)

After the diff (fxp1 has no carrier on purpose):

camield@rifraf $ ifconfig fxp1
fxp1: flags=8843 mtu 1500
lladdr 00:d0:b7:47:3c:07
priority: 0
media: Ethernet autoselect (none)
status: no carrier
inet 10.38.38.10 netmask 0xff00 broadcast 10.38.38.255
inet6 fe80::2d0:b7ff:fe47:3c07%fxp1 prefixlen 64 scopeid 0x3
camield@rifraf $ sudo ifconfig carp11 vhid 11 carpdev fxp1
camield@rifraf $ ifconfig carp11
carp11: flags=8803 mtu 1500
lladdr 00:00:5e:00:01:0b
priority: 0
carp: INIT carpdev fxp1 vhid 11 advbase 1 advskew 0
groups: carp
inet6 fe80::200:5eff:fe00:10b%carp11 prefixlen 64 scopeid 0x6
camield@rifraf $ ifconfig -g carp
carp: carp demote count 1
camield@rifraf $ sudo ifconfig carp11 destroy
camield@rifraf $ ifconfig -g carp 
carp: carp demote count 0
camield@rifraf $ tail -2 /var/log/messages
Oct 21 13:48:25 rifraf /bsd: carp: carp11 demoted group carp by 1 to 1 (carpdev)
Oct 21 13:48:33 rifraf /bsd: carp: carp11 demoted group carp by -1 to 0 (detach)


Index: net/if.c
===
RCS file: /cvs/src/sys/net/if.c,v
retrieving revision 1.239
diff -u -p -r1.239 if.c
--- net/if.c9 Jul 2011 00:47:18 -   1.239
+++ net/if.c21 Oct 2011 08:23:04 -
@@ -712,7 +712,7 @@ if_clone_destroy(const char *name)
 {
struct if_clone *ifc;
struct ifnet *ifp;
-   int s, ret;
+   int s;
 
ifc = if_clone_lookup(name, NULL);
if (ifc == NULL)
@@ -731,12 +731,7 @@ if_clone_destroy(const char *name)
splx(s);
}
 
-   if_delgroup(ifp, ifc->ifc_name);
-
-   if ((ret = (*ifc->ifc_destroy)(ifp)) != 0)
-   if_addgroup(ifp, ifc->ifc_name);
-
-   return (ret);
+   return ((*ifc->ifc_destroy)(ifp));
 }
 
 /*
Index: netinet/ip_carp.c
===
RCS file: /cvs/src/sys/netinet/ip_carp.c,v
retrieving revision 1.191
diff -u -p -r1.191 ip_carp.c
--- netinet/ip_carp.c   16 Oct 2011 21:07:19 -  1.191
+++ netinet/ip_carp.c   21 Oct 2011 08:23:04 -
@@ -980,7 +980,7 @@ carpdetach(struct carp_softc *sc)
carp_del_all_timeouts(sc);
 
if (sc->sc_demote_cnt)
-   carp_group_demote_adj(&sc->sc_if, sc->sc_demote_cnt, "detach");
+   carp_group_demote_adj(&sc->sc_if, -sc->sc_demote_cnt, "detach");
sc->sc_suppress = 0;
sc->sc_sendad_errors = 0;



Re: rdr-to ::1

2011-08-02 Thread Camiel Dobbelaar
On 1-8-2011 23:59, Alexander Bluhm wrote:
> On Wed, Jul 27, 2011 at 12:44:21AM +0200, Alexander Bluhm wrote:
>> On Fri, May 20, 2011 at 11:54:09AM +0200, Camiel Dobbelaar wrote:
>>> I'll spend some more time on this, but maybe there's an IPv6 guru that
>>> can lend a hand?  :-)
>>
>> Just removing the check seems wrong to me.  This would allow ::1
>> addresses from the wire.  Also the goto hbhcheck would get lost.
> 
> I have reconsidered the existing loopback check in ip6_input().  It
> is wrong.  The check that ::1 is not allowed from the wire must be
> before pf_test().  Otherwise pf could reroute or redirect such a
> packet.
> 
> KAME moved the check in rev 1.189 of their ip6_input.c.  They also
> removed the special goto ours logic for ::1.  I do not change that
> now before release so leave the goto where it is.
> 
> Redirect or nat to ::1 should work with this diff.  But I still
> believe that divert-to is more suitable for that.
> 
> ok?


Fixes the problem for me.

And looks correct according to that KAME rev.

(and I agree with the remark about the divert-to, I'll prepare a manpage
ipv6 example for ftp-proxy)

--
Cam



Re: svlan taghash

2011-07-04 Thread Camiel Dobbelaar
On Mon, 4 Jul 2011, Camiel Dobbelaar wrote:
> When svlan(4) was introduced, it got its own taghash in if_vlan.c.
> This wasn't necessary as "etype" was already checked in the old hash
> lookup.

Hmmm, etype wasn't checked in rev 1.83 so it looks like my memory is 
shady...  I also see a comment about "ID conflicts" so my diff is probably 
wrong.  Disregard please.



svlan taghash

2011-07-04 Thread Camiel Dobbelaar
When svlan(4) was introduced, it got its own taghash in if_vlan.c.
This wasn't necessary as "etype" was already checked in the old hash
lookup.

So simplify the code again, and use the savings for some extra hash 
buckets.  :-)

Survives this stacked test between i386 and sparc64 (use .1 on one system,
and .2 on the other and ping all IPs):

#!/bin/sh
VLANDEV="fxp0"
IP4="1"

ifconfig svlan100 vlan 100 vlanprio 1 vlandev $VLANDEV \
10.10.101.$IP4 netmask 255.255.255.0
ifconfig svlan200 vlan 200 vlanprio 2 vlandev svlan100 \
10.10.102.$IP4 netmask 255.255.255.0
ifconfig vlan300  vlan 300 vlanprio 3 vlandev svlan200 \
10.10.103.$IP4 netmask 255.255.255.0
ifconfig svlan300 vlan 300 vlanprio 4 vlandev vlan300 \
10.10.104.$IP4 netmask 255.255.255.0
ifconfig vlan200  vlan 200 vlanprio 5 vlandev svlan300 \
10.10.105.$IP4 netmask 255.255.255.0
ifconfig vlan100  vlan 100 vlanprio 6 vlandev vlan200 \
10.10.106.$IP4 netmask 255.255.255.0

tcpdump:

19:53:17.566551 QinQ svid 100 pri 1 QinQ svid 200 pri 2 802.1Q vid 300 pri 
3 QinQ svid 300 pri 4 802.1Q vid 200 pri 5 802.1Q vid 100 pri 6 
10.10.106.1 > 10.10.106.2: icmp: echo request

19:53:17.566636 QinQ svid 100 pri 1 QinQ svid 200 pri 2 802.1Q vid 300 pri 
3 QinQ svid 300 pri 4 802.1Q vid 200 pri 5 802.1Q vid 100 pri 6 
10.10.106.2 > 10.10.106.1: icmp: echo reply


--
Cam


Index: if_vlan.c
===
RCS file: /cvs/src/sys/net/if_vlan.c,v
retrieving revision 1.87
diff -u -r1.87 if_vlan.c
--- if_vlan.c   18 Feb 2011 17:06:45 -  1.87
+++ if_vlan.c   4 Jul 2011 18:33:47 -
@@ -77,11 +77,11 @@
 #include 
 
 extern struct  ifaddr  **ifnet_addrs;
-u_long vlan_tagmask, svlan_tagmask;
+u_long vlan_tagmask;
 
-#define TAG_HASH_SIZE  32
+#define TAG_HASH_SIZE  64  
 #define TAG_HASH(tag)  (tag & vlan_tagmask)
-LIST_HEAD(vlan_taghash, ifvlan)*vlan_tagh, *svlan_tagh;
+LIST_HEAD(vlan_taghash, ifvlan)*vlan_tagh;
 
 void   vlan_start(struct ifnet *ifp);
 intvlan_ioctl(struct ifnet *ifp, u_long cmd, caddr_t addr);
@@ -107,18 +107,12 @@
 void
 vlanattach(int count)
 {
-   /* Normal VLAN */
vlan_tagh = hashinit(TAG_HASH_SIZE, M_DEVBUF, M_NOWAIT,
&vlan_tagmask);
if (vlan_tagh == NULL)
panic("vlanattach: hashinit");
-   if_clone_attach(&vlan_cloner);
 
-   /* Service-VLAN for QinQ/802.1ad provider bridges */
-   svlan_tagh = hashinit(TAG_HASH_SIZE, M_DEVBUF, M_NOWAIT,
-   &svlan_tagmask);
-   if (svlan_tagh == NULL)
-   panic("vlanattach: hashinit");
+   if_clone_attach(&vlan_cloner);
if_clone_attach(&svlan_cloner);
 }
 
@@ -277,13 +271,11 @@
 {
struct ifvlan *ifv;
struct ifnet *ifp = m->m_pkthdr.rcvif;
-   struct vlan_taghash *tagh;
u_int tag;
u_int16_t etype;
 
if (m->m_flags & M_VLANTAG) {
etype = ETHERTYPE_VLAN;
-   tagh = vlan_tagh;
tag = EVL_VLANOFTAG(m->m_pkthdr.ether_vtag);
} else {
if (m->m_len < EVL_ENCAPLEN &&
@@ -293,11 +285,10 @@
}
 
etype = ntohs(eh->ether_type);
-   tagh = etype == ETHERTYPE_QINQ ? svlan_tagh : vlan_tagh;
tag = EVL_VLANOFTAG(ntohs(*mtod(m, u_int16_t *)));
}
 
-   LIST_FOREACH(ifv, &tagh[TAG_HASH(tag)], ifv_list) {
+   LIST_FOREACH(ifv, &vlan_tagh[TAG_HASH(tag)], ifv_list) {
if (m->m_pkthdr.rcvif == ifv->ifv_p && tag == ifv->ifv_tag &&
etype == ifv->ifv_type)
break;
@@ -359,7 +350,6 @@
 {
struct ifaddr *ifa1, *ifa2;
struct sockaddr_dl *sdl1, *sdl2;
-   struct vlan_taghash *tagh;
u_int flags;
int s;
 
@@ -449,8 +439,7 @@
 
ifv->ifv_tag = tag;
s = splnet();
-   tagh = ifv->ifv_type == ETHERTYPE_QINQ ? svlan_tagh : vlan_tagh;
-   LIST_INSERT_HEAD(&tagh[TAG_HASH(tag)], ifv, ifv_list);
+   LIST_INSERT_HEAD(&vlan_tagh[TAG_HASH(tag)], ifv, ifv_list);
 
/* Register callback for physical link state changes */
ifv->lh_cookie = hook_establish(p->if_linkstatehooks, 1,



bridge interface search

2011-06-28 Thread Camiel Dobbelaar
This diff changes the if_bridge pointer of an interface (struct ifnet) to 
not point to "the bridge" but to its own "bridge interface" configuration.
Should be safe because an interface can only be part of one bridge.

This way all the LIST_FOREACH linear searches in the bridge code can be
replaced.  There are also two of those in the forwarding path so this diff 
should make the bridge faster, especially with lots of interfaces.

I've renamed it to "if_bridge_port" to smoke out all users and because 
it's clearer.  (my fingers itch to rename "bridge_iflist" too as noted in 
the diff :-) )

Most of the diff is mechanical.  The if_ether.c change got pretty hairy
though and could use some more eyes.

And the whole thing could use some substantial testing...



Index: sys/dev/isa/if_ie.c
===
RCS file: /cvs/src/sys/dev/isa/if_ie.c,v
retrieving revision 1.35
diff -u -r1.35 if_ie.c
--- sys/dev/isa/if_ie.c 28 Nov 2008 02:44:17 -  1.35
+++ sys/dev/isa/if_ie.c 24 Jun 2011 18:29:18 -
@@ -1054,16 +1054,16 @@
 */
 #if NBPFILTER > 0
*to_bpf = (sc->sc_arpcom.ac_if.if_bpf != 0) ||
-   (sc->sc_arpcom.ac_if.if_bridge != NULL);
+   (sc->sc_arpcom.ac_if.if_bridge_port != NULL);
 #else
-   *to_bpf = (sc->sc_arpcom.ac_if.if_bridge != NULL);
+   *to_bpf = (sc->sc_arpcom.ac_if.if_bridge_port != NULL);
 #endif
/* If for us, accept and hand up to BPF */
if (ether_equal(eh->ether_dhost, sc->sc_arpcom.ac_enaddr))
return 1;
 
 #if NBPFILTER > 0
-   if (*to_bpf && sc->sc_arpcom.ac_if.if_bridge == NULL)
+   if (*to_bpf && sc->sc_arpcom.ac_if.if_bridge_port == NULL)
*to_bpf = 2; /* we don't need to see it */
 #endif
 
@@ -1095,9 +1095,9 @@
 */
 #if NBPFILTER > 0
*to_bpf = (sc->sc_arpcom.ac_if.if_bpf != 0) ||
-   (sc->sc_arpcom.ac_if.if_bridge != NULL);
+   (sc->sc_arpcom.ac_if.if_bridge_port != NULL);
 #else
-   *to_bpf = (sc->sc_arpcom.ac_if.if_bridge != NULL);
+   *to_bpf = (sc->sc_arpcom.ac_if.if_bridge_port != NULL);
 #endif
/* We want to see multicasts. */
if (eh->ether_dhost[0] & 1)
@@ -1109,7 +1109,7 @@
 
/* Anything else goes to BPF but nothing else. */
 #if NBPFILTER > 0
-   if (*to_bpf && sc->sc_arpcom.ac_if.if_bridge == NULL)
+   if (*to_bpf && sc->sc_arpcom.ac_if.if_bridge_port == NULL)
*to_bpf = 2;
 #endif
return 1;
Index: sys/net/bridgestp.c
===
RCS file: /cvs/src/sys/net/bridgestp.c,v
retrieving revision 1.39
diff -u -r1.39 bridgestp.c
--- sys/net/bridgestp.c 20 Nov 2010 14:23:09 -  1.39
+++ sys/net/bridgestp.c 24 Jun 2011 18:29:18 -
@@ -1644,7 +1644,7 @@
 
if (ifp->if_type == IFT_BRIDGE)
return;
-   sc = (struct bridge_softc *)ifp->if_bridge;
+   sc = ((struct bridge_iflist *)ifp->if_bridge_port)->bridge_sc;
 
s = splnet();
LIST_FOREACH(p, &sc->sc_iflist, next) {
@@ -2133,15 +2133,8 @@
err = ENOENT;
break;
}
-   if ((caddr_t)sc != ifs->if_bridge) {
-   err = ESRCH;
-   break;
-   }
-   LIST_FOREACH(p, &sc->sc_iflist, next) {
-   if (p->ifp == ifs)
-   break;
-   }
-   if (p == LIST_END(&sc->sc_iflist)) {
+   p = (struct bridge_iflist *)ifs->if_bridge_port;
+   if (p == NULL || p->bridge_sc != sc) {
err = ESRCH;
break;
}
Index: sys/net/if.c
===
RCS file: /cvs/src/sys/net/if.c,v
retrieving revision 1.234
diff -u -r1.234 if.c
--- sys/net/if.c13 Mar 2011 15:31:41 -  1.234
+++ sys/net/if.c24 Jun 2011 18:29:18 -
@@ -531,7 +531,7 @@
 
 #if NBRIDGE > 0
/* Remove the interface from any bridge it is part of.  */
-   if (ifp->if_bridge)
+   if (ifp->if_bridge_port)
bridge_ifdetach(ifp);
 #endif
 
@@ -1101,7 +1101,7 @@
carp_carpdev_state(ifp);
 #endif
 #if NBRIDGE > 0
-   if (ifp->if_bridge)
+   if (ifp->if_bridge_port)
bstp_ifstate(ifp);
 #endif
rt_ifmsg(ifp);
@@ -1137,7 +1137,7 @@
carp_carpdev_state(ifp);
 #endif
 #if NBRIDGE > 0
-   if (ifp->if_bridge)
+   if (ifp->if_bridge_port)
bstp_ifstate(ifp);
 #endif
rt_ifmsg(ifp);
Index: sys/net/if.h
===
RCS file: /cvs/src/sys/net/if.h,

ifconfig vlan diff

2011-06-26 Thread Camiel Dobbelaar
vlandev (parent) does not need to be physical, and can be changed on the 
fly now.


Index: ifconfig.8
===
RCS file: /cvs/src/sbin/ifconfig/ifconfig.8,v
retrieving revision 1.216
diff -u -r1.216 ifconfig.8
--- ifconfig.8  13 Mar 2011 21:24:20 -  1.216
+++ ifconfig.8  21 Jun 2011 13:02:03 -
@@ -1487,11 +1487,11 @@
 vlan header for packets sent from the vlan interface.
 This value cannot be changed once it is set for an interface.
 .It Cm vlandev Ar parent-interface
-Associate with physical interface
-.Ar iface .
+Associate with interface
+.Ar parent-interface .
 Packets transmitted through the vlan interface will be
-diverted to the specified physical interface
-.Ar iface
+diverted to the specified interface
+.Ar parent-interface
 with 802.1Q vlan encapsulation.
 Packets with 802.1Q encapsulation received
 by the parent interface with the correct vlan tag will be diverted to
@@ -1506,12 +1506,8 @@
 the interface name, for instance
 .Cm vlan5
 will be assigned 802.1Q tag 5.
-If the vlan interface already has
-a physical interface associated with it, this command will fail.
-To change the association to another physical interface, the existing
-association must be cleared first.
 .It Fl vlandev
-Disassociate from the physical interface.
+Disassociate from the parent interface.
 This breaks the link between the vlan interface and its parent,
 clears its vlan tag, flags, and link address, and shuts the interface down.
 .It Cm vlanprio Ar vlan-priority



mark arp broadcasts in the mbuf

2011-06-26 Thread Camiel Dobbelaar
Mark ARP request broadcasts as such in the mbuf flags.  FreeBSD and NetBSD 
both have this.

Without this, bridge_output() drops ARP request broadcasts on interfaces 
without the discover flag:

if ((p->bif_flags & IFBIF_DISCOVER) == 0 &&
(m->m_flags & (M_BCAST | M_MCAST)) == 0)
continue;


Index: if_ether.c
===
RCS file: /cvs/src/sys/netinet/if_ether.c,v
retrieving revision 1.88
diff -u -r1.88 if_ether.c
--- if_ether.c  22 Jul 2010 00:41:55 -  1.88
+++ if_ether.c  26 Jun 2011 17:30:29 -
@@ -359,6 +359,7 @@
bcopy((caddr_t)tip, (caddr_t)ea->arp_tpa, sizeof(ea->arp_tpa));
sa.sa_family = pseudo_AF_HDRCMPLT;
sa.sa_len = sizeof(sa);
+   m->m_flags |= M_BCAST;
(*ifp->if_output)(ifp, m, &sa, (struct rtentry *)0);
 }
 
@@ -994,6 +995,7 @@
   sizeof(ea->arp_tha));
sa.sa_family = pseudo_AF_HDRCMPLT;
sa.sa_len = sizeof(sa);
+   m->m_flags |= M_BCAST;
ifp->if_output(ifp, m, &sa, (struct rtentry *)0);
 }



rdr-to ::1

2011-05-20 Thread Camiel Dobbelaar
inet6 pf rules that "rdr-to ::1" do not work currently.  Matching
packets just disappear and the counter "packets that violated scope
rules" from a "netstat -s -p ip6" gets incremented.

It came up before on misc@:
http://marc.info/?t=12680425912&r=1&w=2

The attached diff removes the check (let's call it check #1) that drops
the packet.  This is just to point out where the problem is because the
BSDs have diverged here:

FreeBSD _replaced_ check #1 with another check (#2) in this diff from
2004 (from kame):
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet6/ip6_input.c.diff?r1=1.68;r2=1.69

NetBSD _replaced_ check #1 with something totally different
(attributed to jinmei@kame)
http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/netinet6/ip6_input.c?rev=1.81&content-type=text/x-cvsweb-markup

Check #2 was _added_ to OpenBSD in 2006 (attributed to jinmei@kame):
http://www.openbsd.org/cgi-bin/cvsweb/src/sys/netinet6/ip6_input.c.diff?r1=1.72;r2=1.73

Basically, check #1 is gone in FreeBSD and NetBSD and the diff syncs us
closer to FreeBSD.

I'm unsure if it's the right thing to do though, for a few reasons.  The
FreeBSD diff has a second part that I cannot yet tell is related or not.
Or maybe the NetBSD diff could be better.  And OpenBSD also seems to
have other checks in this area.

I'll spend some more time on this, but maybe there's an IPv6 guru that
can lend a hand?  :-)

--
Cam




Index: ip6_input.c
===
RCS file: /cvs/src/sys/netinet6/ip6_input.c,v
retrieving revision 1.99
diff -u -r1.99 ip6_input.c
--- ip6_input.c 3 Apr 2011 13:56:05 -   1.99
+++ ip6_input.c 20 May 2011 09:30:14 -
@@ -270,7 +270,6 @@
in6_ifstat_inc(m->m_pkthdr.rcvif, ifs6_in_addrerr);
goto bad;
}
-
if (IN6_IS_ADDR_MC_INTFACELOCAL(&ip6->ip6_dst) &&
!(m->m_flags & M_LOOP)) {
/*
@@ -340,19 +339,6 @@
ip6 = mtod(m, struct ip6_hdr *);
srcrt = !IN6_ARE_ADDR_EQUAL(&odst, &ip6->ip6_dst);
 #endif
-
-   if (IN6_IS_ADDR_LOOPBACK(&ip6->ip6_src) ||
-   IN6_IS_ADDR_LOOPBACK(&ip6->ip6_dst)) {
-   if (m->m_pkthdr.rcvif->if_flags & IFF_LOOPBACK) {
-   ours = 1;
-   deliverifp = m->m_pkthdr.rcvif;
-   goto hbhcheck;
-   } else {
-   ip6stat.ip6s_badscope++;
-   in6_ifstat_inc(m->m_pkthdr.rcvif, ifs6_in_addrerr);
-   goto bad;
-   }
-   }

/* drop packets if interface ID portion is already filled */
if ((m->m_pkthdr.rcvif->if_flags & IFF_LOOPBACK) == 0) {



Re: vlan vlandev fix

2011-02-17 Thread Camiel Dobbelaar
On 16-2-2011 14:27, Reyk Floeter wrote:
> My previous change to vlan(4) allows to change the vlandev and vlan id
> on-the-fly without re-creating the vlan interface.

I hesitated to ask this simple question, because I might be overlooking
something, but what exactly is the advantage over just using
/etc/netstart?  Fewer dropped packets?


--
Cam



ifconfig vlan tag range

2011-02-09 Thread Camiel Dobbelaar
The valid range for vlan tags in OpenBSD is 0-4095 (inclusive).  Fix
both checks.

Makes vlan0 autoconfig work (obj/ifconfig has the diff):

# ifconfig vlan0 vlandev fxp0
ifconfig: invalid vlan tag and device specification

# obj/ifconfig vlan0 vlandev fxp0

and gives a better error message on tags > 4095:

# ifconfig vlan1 vlan 5000 vlandev fxp0
ifconfig: SIOCSETVLAN: Invalid argument

# obj/ifconfig vlan1 vlan 5000 vlandev fxp0
ifconfig: vlan tag 5000: too large

--
Cam
Index: ifconfig.c
===
RCS file: /cvs/src/sbin/ifconfig/ifconfig.c,v
retrieving revision 1.242
diff -u -r1.242 ifconfig.c
--- ifconfig.c  9 Nov 2010 21:14:47 -   1.242
+++ ifconfig.c  5 Feb 2011 14:28:56 -
@@ -3351,7 +3351,7 @@
struct vlanreq vreq;
const char *errmsg = NULL;
 
-   __tag = tag = strtonum(val, 0, 65535, &errmsg);
+   __tag = tag = strtonum(val, 0, 4095, &errmsg);
if (errmsg)
errx(1, "vlan tag %s: %s", val, errmsg);
__have_tag = 1;
@@ -3411,7 +3411,7 @@
 
if (!__have_tag && vreq.vlr_tag == 0) {
skip = strcspn(ifr.ifr_name, "0123456789");
-   tag = strtonum(ifr.ifr_name + skip, 1, 4095, &estr);
+   tag = strtonum(ifr.ifr_name + skip, 0, 4095, &estr);
if (estr != NULL)
errx(1, "invalid vlan tag and device specification");
vreq.vlr_tag = tag;



Re: carp shutdown in /etc/rc

2011-02-05 Thread Camiel Dobbelaar
On 5-2-2011 11:02, Henning Brauer wrote:
> on the other side, fixing "ifconfig very slow with lots of interfaces"
> deserves to be fixed anyway. looking at the code - either getifaddrs is
> slow (which in turn wouldn't be ifconfig only), or the ioctls ifconfig
> does in getinfo(). that's "just" 5 tho. wonder whether making one big
> ioctl that returns everything those 5 would help - wouldn't win a beauty
> prize for sure. not that ifconfig would ever qualify.

If ifconfig.c is instrumented with a little perl script:

# cat ioctl_debug.pl
#!/usr/bin/perl -pi.orig

if (m{ioctl\(\w+, (\w+),}) {
my $sig = $1;
s{ioctl\(}{printf("ioctl $sig\\n") != -1 && ioctl(};
}

and then built like this:
# perl ioctl_debug.pl ifconfig.c
# make
# make install

you can see this:

# ifconfig carp80
ioctl SIOCGIFFLAGS
ioctl SIOCGIFXFLAGS
ioctl SIOCGIFMETRIC
ioctl SIOCGIFMTU
ioctl SIOCGIFRDOMAIN
ioctl SIOCGIFFLAGS
ioctl SIOCGIFXFLAGS
ioctl SIOCGIFMETRIC
ioctl SIOCGIFMTU
ioctl SIOCGIFRDOMAIN
carp80: flags=8843 mtu 1500
lladdr 00:00:5e:00:01:50
ioctl SIOCGIFDESCR
ioctl SIOCGIFPRIORITY
priority: 0
ioctl SIOCGETKALIVE
ioctl SIOCGETVLAN
ioctl SIOCGVH
carp: BACKUP carpdev vlan80 vhid 80 advbase 1 advskew 0
ioctl SIOCGETPFSYNC
ioctl PPPOEGETPARMS
ioctl SIOCGIFTIMESLOT
ioctl SIOCGIFGENERIC
ioctl SIOCGTRUNKPORT
ioctl SIOCGTRUNK
ioctl SIOCGETPFLOW
ioctl SIOCGIFGROUP
ioctl SIOCGIFGROUP
groups: carp
ioctl SIOCGIFMEDIA
status: backup
ioctl SIOCGLIFPHYADDR
inet6 fe80::200:5eff:fe00:150%carp80ioctl SIOCGIFNETMASK_IN6
 prefixlen 64ioctl SIOCGIFAFLAG_IN6
 scopeid 0x7ioctl SIOCGIFALIFETIME_IN6

inet 10.10.80.1ioctl SIOCGIFNETMASK
 netmask 0xff00ioctl SIOCGIFBRDADDR
 broadcast 10.10.80.255

# ifconfig vlan
ioctl SIOCGIFGMEMB
ioctl SIOCGIFGMEMB
ioctl SIOCGIFFLAGS
ioctl SIOCGIFXFLAGS
ioctl SIOCGIFMETRIC
ioctl SIOCGIFMTU
ioctl SIOCGIFRDOMAIN
vlan80: flags=8943 mtu 1500
lladdr 00:d0:59:b6:f4:27
ioctl SIOCGIFDESCR
ioctl SIOCGIFPRIORITY
priority: 0
ioctl SIOCGETKALIVE
ioctl SIOCGETVLAN
ioctl SIOCGETVLANPRIO
vlan: 80 priority: 0 parent interface: fxp0
ioctl SIOCGVH
ioctl SIOCGETPFSYNC
ioctl PPPOEGETPARMS
ioctl SIOCGIFTIMESLOT
ioctl SIOCGIFGENERIC
ioctl SIOCGTRUNKPORT
ioctl SIOCGTRUNK
ioctl SIOCGETPFLOW
ioctl SIOCGIFGROUP
ioctl SIOCGIFGROUP
groups: vlan
ioctl SIOCGIFMEDIA
status: active
ioctl SIOCGLIFPHYADDR
inet6 fe80::2d0:59ff:feb6:f427%vlan80ioctl SIOCGIFNETMASK_IN6
 prefixlen 64ioctl SIOCGIFAFLAG_IN6
 scopeid 0x6ioctl SIOCGIFALIFETIME_IN6

inet 10.10.80.2ioctl SIOCGIFNETMASK
 netmask 0xff00ioctl SIOCGIFBRDADDR
 broadcast 10.10.80.255

# ifconfig fxp
ioctl SIOCGIFGMEMB
ioctl SIOCGIFFLAGS
ioctl SIOCGIFXFLAGS
ioctl SIOCGIFMETRIC
ioctl SIOCGIFMTU
ioctl SIOCGIFRDOMAIN
fxp0: flags=8b43 mtu 1500
lladdr 00:d0:59:b6:f4:27
ioctl SIOCGIFDESCR
ioctl SIOCGIFPRIORITY
priority: 0
ioctl SIOCGETKALIVE
ioctl SIOCGETVLAN
ioctl SIOCGVH
ioctl SIOCGETPFSYNC
ioctl PPPOEGETPARMS
ioctl SIOCGIFTIMESLOT
ioctl SIOCGIFGENERIC
ioctl SIOCGTRUNKPORT
ioctl SIOCGTRUNK
ioctl SIOCGETPFLOW
ioctl SIOCGIFGROUP
ioctl SIOCGIFGROUP
groups: egress
ioctl SIOCGIFMEDIA
ioctl SIOCGIFMEDIA
media: Ethernet autoselect (100baseTX full-duplex)
status: active
ioctl SIOCG80211NWID
ioctl SIOCG80211NWKEY
ioctl SIOCG80211WPAPSK
ioctl SIOCG80211POWER
ioctl SIOCG80211CHANNEL
ioctl SIOCG80211BSSID
ioctl SIOCG80211TXPOWER
ioctl SIOCG80211WPAPARMS
ioctl SIOCGLIFPHYADDR
inet6 fe80::2d0:59ff:feb6:f427%fxp0ioctl SIOCGIFNETMASK_IN6
 prefixlen 64ioctl SIOCGIFAFLAG_IN6
 scopeid 0x2ioctl SIOCGIFALIFETIME_IN6

inet 192.168.28.129ioctl SIOCGIFNETMASK
 netmask 0xff00ioctl SIOCGIFBRDADDR
 broadcast 192.168.28.255

So yeah, it looks like ifconfig can be made a little smarter.



Re: carp shutdown in /etc/rc

2011-02-05 Thread Camiel Dobbelaar
On 5-2-2011 2:15, Ted Unangst wrote:
> On Fri, Feb 4, 2011 at 7:21 AM, Camiel Dobbelaar  wrote:
>> With hundreds of (vlan) interfaces, a shutdown takes quite a while.
>># bring carp interfaces down gracefully
>> -   ifconfig | while read a b; do
>> +   ifconfig carp | while read a b; do
> 
> going back to the original issue, does "ifconfig | grep carp | while
> read a b" make things faster?

No, it's ifconfig itself that takes long.

With 2000 vlan interfaces and 1 carp interface:
# time ifconfig | grep ^carp
carp80: flags=8843 mtu 1500
1m11.29s real 0m12.07s user 0m59.03s system
# time ifconfig carp | grep ^carp
carp80: flags=8843 mtu 1500
0m0.06s real 0m0.01s user 0m0.05s system

1000
# time ifconfig | grep ^carp
carp80: flags=8843 mtu 1500
0m16.66s real 0m2.88s user 0m13.72s system
# time ifconfig carp | grep ^carp
carp80: flags=8843 mtu 1500
0m0.03s real 0m0.00s user 0m0.02s system

500
# time ifconfig | grep ^carp
carp80: flags=8843 mtu 1500
0m3.18s real 0m0.67s user 0m2.49s system
# time ifconfig carp | grep ^carp
carp80: flags=8843 mtu 1500
0m0.02s real 0m0.00s user 0m0.01s system

200
# time ifconfig | grep ^carp
carp80: flags=8843 mtu 1500
0m0.35s real 0m0.07s user 0m0.27s system
# time ifconfig carp | grep ^carp
carp80: flags=8843 mtu 1500
0m0.01s real 0m0.00s user 0m0.00s system


It does not scale linearly, but the real world usage (200) is fine.  I
think we can drop the diff, since it turned out not to be so obvious and
clean...

--
Cam



Re: carp shutdown in /etc/rc

2011-02-04 Thread Camiel Dobbelaar
On 4-2-2011 15:06, Stuart Henderson wrote:
> On 2011/02/04 14:37, Camiel Dobbelaar wrote:
>> On 4-2-2011 13:32, Henning Brauer wrote:
>>> * Camiel Dobbelaar  [2011-02-04 13:21]:
>>>> With hundreds of (vlan) interfaces, a shutdown takes quite a while.
>>>> Fix below.
>>>
>>> hmm. this relies on all carp interfaces being in the carp interface
>>> group. while that is the default, it is not necessarily so.
>>
>> I didn't know that a groupname takes precedence, neither did the
>> manpage.  :-)
> 
>> +If an interface group with that name exists, all interfaces in the group
>> +will be shown.
> 
> seems it's more complicated than that - if you remove all interfaces
> from group carp, 'ifconfig carp' lists nothing:
> 
> $ ifconfig |grep ^carp
> carp1: flags=8802 mtu 1500
> carp2: flags=8802 mtu 1500
> $ ifconfig carp  
> $
> 
> but with another type, this doesn't apply:
> 
> $ sudo ifconfig bge0 group em
> $ ifconfig em | egrep '(0: |groups:)'
> bge0: flags=28843 mtu 1500
> groups: em
> $ sudo ifconfig bge0 -group em
> $ ifconfig em | egrep '(0: |groups:)'
> em0: flags=8b43 mtu 1500
> groups: egress

It looks like group "carp" is never removed from the system, even if it
becomes empty.  That makes sense since some daemons use it as a default.

But the manpage diff is still correct, isn't it?

--
Cam



Re: carp shutdown in /etc/rc

2011-02-04 Thread Camiel Dobbelaar
On 4-2-2011 13:32, Henning Brauer wrote:
> * Camiel Dobbelaar  [2011-02-04 13:21]:
>> With hundreds of (vlan) interfaces, a shutdown takes quite a while.
>> Fix below.
> 
> hmm. this relies on all carp interfaces being in the carp interface
> group. while that is the default, it is not necessarily so.

I didn't know that a groupname takes precedence, neither did the
manpage.  :-)

--
Cam
Index: ifconfig.8
===
RCS file: /cvs/src/sbin/ifconfig/ifconfig.8,v
retrieving revision 1.212
diff -u -r1.212 ifconfig.8
--- ifconfig.8  23 Dec 2010 08:54:59 -  1.212
+++ ifconfig.8  4 Feb 2011 13:32:21 -
@@ -97,11 +97,15 @@
 .Dq en0 .
 If no optional parameters are supplied, this string can instead be just
 .Dq name .
-In this case, all interfaces of that type will be displayed.
+If an interface group with that name exists, all interfaces in the group
+will be shown.
+Otherwise, 
+.Dq name
+is treated as a type and all interfaces of that type will be displayed.
 For example,
-.Dq carp
+.Dq fxp
 will display the current configuration of all
-.Xr carp 4
+.Xr fxp 4
 interfaces.
 .It Ar address_family
 Specifies the address family



carp shutdown in /etc/rc

2011-02-04 Thread Camiel Dobbelaar
With hundreds of (vlan) interfaces, a shutdown takes quite a while.

Fix below.

--
Cam


Index: rc
===
RCS file: /cvs/src/etc/rc,v
retrieving revision 1.348
diff -u -r1.348 rc
--- rc  14 Jan 2011 00:05:42 -  1.348
+++ rc  3 Feb 2011 15:59:25 -
@@ -167,7 +167,7 @@
echo /etc/rc.shutdown complete.

# bring carp interfaces down gracefully
-   ifconfig | while read a b; do
+   ifconfig carp | while read a b; do
case $a in
carp+([0-9]):) ifconfig ${a%:} down ;;
esac



carp configuration race condition

2011-02-03 Thread Camiel Dobbelaar
I had a hard time this week getting carp to work reliably between i386
and sparc64.

On i386 I used:
ifconfig carp80 vhid 80 carpdev fxp0 pass tachtig
ifconfig carp80 192.168.28.200 netmask 255.255.255.0

and on sparc64:
ifconfig carp80 vhid 80 carpdev fxp0 advskew 100 pass tachtig
ifconfig carp80 192.168.28.200 netmask 255.255.255.0

Sometimes this would work correctly.  Sometimes they would both become
master.  Running the ifconfig lines again could sometimes mess it up.
Or even just issuing a "ifconfig carp80" on sparc64 could mess it up.

When bad (both master) the "discarded for bad authentication" counter
from "netstat -s -p carp" would increase on both systems and "incorrect
hash" would appear in the logs.

It turned out that if the IP configuration was done first, things would
work ok, so for example on i386:
ifconfig carp80 192.168.28.200 netmask 255.255.255.0
ifconfig carp80 vhid 80 carpdev fxp0 pass tachtig

So I think the hmac calculation is run too early when the IP address is
not yet added (this works via a callback it looks like).  Attached diff
fixes my problem.

There's also a carp_hmac_prepare() at the bottom of the carp_ioctl()
function, that looks a little out of place.  Even a simple "ifconfig
carp80" makes it run 30 times.

I don't know why this problem doesn't bite more people, maybe because
carped machines are usually identical?

--
Cam

I hope the attachments survive the demiming, otherwise I'll resend later.
Index: ip_carp.c
===
RCS file: /cvs/src/sys/netinet/ip_carp.c,v
retrieving revision 1.180
diff -p -u -r1.180 ip_carp.c
--- ip_carp.c   21 Dec 2010 14:59:14 -  1.180
+++ ip_carp.c   3 Feb 2011 13:35:37 -
@@ -1974,10 +1974,10 @@ carp_addr_updated(void *v)
if (sc->sc_naddrs == 0 && sc->sc_naddrs6 == 0) {
sc->sc_if.if_flags &= ~IFF_UP;
carp_set_state_all(sc, INIT);
-   } else
-   carp_hmac_prepare(sc);
+   }
}
 
+   carp_hmac_prepare(sc);
carp_setrun_all(sc, 0);
 }
OpenBSD 4.9-beta (GENERIC) #5: Thu Feb  3 14:23:36 CET 2011
r...@xmts.sentia.nl:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: Intel(R) Pentium(R) III Mobile CPU 866MHz ("GenuineIntel" 686-class) 864 MHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PSE36,MMX,FXSR,SSE
real mem  = 401633280 (383MB)
avail mem = 384933888 (367MB)
mainbus0 at root
bios0 at mainbus0: AT/286+ BIOS, date 06/10/03, BIOS32 rev. 0 @ 0xfd7e0, SMBIOS rev. 2.31 @ 0xe0010 (48 entries)
bios0: vendor IBM version "1DET70WW (1.32 )" date 06/10/2003
bios0: IBM 2662EGG
apm0 at bios0: Power Management spec V1.2
apm0: battery life expectancy 96%
apm0: AC on, battery charge high
acpi at bios0 function 0x0 not configured
pcibios0 at bios0: rev 2.1 @ 0xfd770/0x890
pcibios0: PCI IRQ Routing Table rev 1.0 @ 0xfdeb0/256 (14 entries)
pcibios0: PCI Interrupt Router at 000:31:0 ("Intel 82371FB ISA" rev 0x00)
pcibios0: PCI bus #6 is the last bus
bios0: ROM list: 0xc/0xc000 0xcc000/0x1000 0xcd000/0x1000 0xdc000/0x4000! 0xe/0x1
cpu0 at mainbus0: (uniprocessor)
pci0 at mainbus0 bus 0: configuration mode 1 (bios)
pchb0 at pci0 dev 0 function 0 "Intel 82830M Host" rev 0x02
intelagp0 at pchb0
agp0 at intelagp0: aperture at 0xd000, size 0xe40
ppb0 at pci0 dev 1 function 0 "Intel 82830M AGP" rev 0x02
pci1 at ppb0 bus 1
vga1 at pci1 dev 0 function 0 "ATI Radeon Mobility M6" rev 0x00
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
radeondrm0 at vga1: irq 11
drm0 at radeondrm0
uhci0 at pci0 dev 29 function 0 "Intel 82801CA/CAM USB" rev 0x01: irq 11
uhci1 at pci0 dev 29 function 1 "Intel 82801CA/CAM USB" rev 0x01: irq 11
uhci2 at pci0 dev 29 function 2 "Intel 82801CA/CAM USB" rev 0x01: irq 11
ppb1 at pci0 dev 30 function 0 "Intel 82801BAM Hub-to-PCI" rev 0x41
pci2 at ppb1 bus 2
mem address conflict 0x5000/0x1000
mem address conflict 0x5010/0x1000
cbb0 at pci2 dev 3 function 0 "Ricoh 5C476 CardBus" rev 0x80: irq 11
cbb1 at pci2 dev 3 function 1 "Ricoh 5C476 CardBus" rev 0x80: irq 11
wi0 at pci2 dev 5 function 0 "Intersil PRISM2.5" rev 0x01: irq 11
wi0: PRISM2.5 ISL3874A(Mini-PCI) (0x8013), Firmware 1.1.0 (primary), 1.4.2 (station), address 00:20:e0:8b:69:46
fxp0 at pci2 dev 8 function 0 "Intel PRO/100 VE" rev 0x41, i82562: irq 11, address 00:d0:59:b6:f4:27
inphy0 at fxp0 phy 1: i82562ET 10/100 PHY, rev. 0
cardslot0 at cbb0 slot 0 flags 0
cardbus0 at cardslot0: bus 3 device 0 cacheline 0x0, lattimer 0xb0
pcmcia0 at cardslot0
cardslot1 at cbb1 slot 1 flags 0
cardbus1 at cardslot1: bus 6 device 0 cacheline 0x0, lattimer 0xb0
pcmcia1 at cardslot1
ichpcib0 at pci0 dev 31 function 0 "Intel 82801CAM LPC" rev 0x01: 24-bit timer at 3579545Hz: SpeedStep
pciide0 at pci0 dev 31 function 1 "Intel 82801CAM IDE" rev 0x01: DMA, channel 0 
configured to compatibility, channel 1 co