On Thu, Jan 11, 2018 at 02:51:40PM +0100, Martin Pieuchot wrote:
> On 11/01/18(Thu) 21:59, David Gwynne wrote:
> > [...] 
> > when you say i break carp balancing, are you talking about the removal of 
> > the PACKET_TAG_CARP_BAL_IP tagging? PACKET_TAG_CARP_BAL_IP is only used in 
> > carp_lsdrop to clear the M_MCAST flag on the mbuf. M_MCAST wont be set on 
> > packets destined for the carp interface because we check ac_enaddr before 
> > checking if the packet is multicast or broadcast.
> I might be mistaken.  I just know that this code is fragile and I'd
> prefer to see such change tested isolated because carp(4) itself has
> multiple configurations.  CARP balancing has been fixed for 6.2 by
> friehm@ after being broken for multiple releases.


i had a look the commit related to this at

that change happened because ethernet_input checks and sets the
multicast bits before doing the ac_enaddr comparison. that's reversed
in the big diff i sent, but i pulled it out and attached it below.
it doesn't take away the mbuf tag carp uses, but it would make it

> > are you sure? carp and vlan on top of trunk should still function. however, 
> > trunk or bridge/switch on vlan is broken though :(
> I'm not sure, but I believe mixing input handler and hardcoding some in
> ether_input() won't fly :)


> > > If if_input() is called on a pseudo-interface we know we're already in
> > > a softnet process.
> > 
> > or a syscall.
> > 
> > the code above has pseudo interfaces recurse, where they'd loop either on 
> > ifih or at the task level. probably not a huge concern though.
> We can unroll the loop afterward.  What we need now is get rid of the
> queues.


> > > We could also think of doing something similar for if_enqueue() and call
> > > if_start() directly for pseudo-interface.
> > 
> > ill think about that. ive had some other experiments in that area we could 
> > look at too.
> I'd be glad to look at your experiments :)

but im shy :(

> > >> note that trunk and bridge/switch are still implemented using
> > >> interface input handlers at the moment.
> > > 
> > > If you want to get rid of the input handlers, I'd suggest doing it in
> > > the beginning of a release cycle and for all pseudo drivers at once.
> > 
> > considering the trunk and bridge/switch issue, it probably is best to do 
> > them all at once.
> Yes.  I'm aware that the SRP/input handler loop might be considered.
> We can probably gain some percents if we remove it.  However this is
> a micro optimization compared to other improvements that can be done.
> Plus it has the advantage of not having fragile #ifdef maze in the
> rest of the code.

im not expecting a performance difference with this stuff, it's
more about correctness. right now the behaviour of the stack is
arguably incorrect depending on when you configure pseudo-interfaces.
here are two examples:

if you configure a carp interface on myx0 after configuring a vlan
interface, carp_input input handler will be in the SRPL before
vlan_input. because of that we have this chunk in carp_input:

#if NVLAN > 0
         * If the underlying interface removed the VLAN header itself,
         * it's not for us.
        if (ISSET(m->m_flags, M_VLANTAG))
                return (0);

however, myx does not do hw vlan tagging and we dont check for
ETHERTYPE_VLAN or _QINQ there. carp may take a packet that vlan
should have taken.

secondly, if you configure switch(4) on an interface after configuring
it as a trunk port, switch will happily take all the packets on it.

it could be argued that these examples are a bit contrived, which
i will accept, and they could be fixed with some stricter code
checks, but they do demonstrate a problem with the stack.

in my mind, the order of processing by pseudo interfaces attached
to an ethernet interface would be:

1. let trunk(4) look at it

if trunk is configured on the port...

while we're talking about this, i would like to implement "independent"
ports on trunks. independent ports mean that if lacp isnt negotiated
on the ethernet interface, it can be used as a normal interface.
cisco do this by default on port channels, but other vendors require
explicit config. eg, on a force10^Wdell i need to add something
like "lacp ungroup member-independent port-channel 12" to allow
members of po12 to function as normal ports when lacp isnt negotiated.
this is useful if you want to pxe boot boxes that are usually
connected using lacp.

if we supported independent trunk ports, then we should allow
otherwise normal ethernet interface configuration.

2. let bridge(4) or switch(4) look at it

not both.

it only makes sense for a bridge/switch to get the packet if trunk
doesnt want it.

3. if it is vlan or qinq tagged, give it to vlan or drop it.

4. check ac_enaddr to see if it is for the current interface.

5. otherwise, give it to carp to try and match on

im arguing the system should enforce this order. right now people
tend to configure it in orders that make sense, but the exceptions
are ugly.

i did have a super big diff that did all this, but i didnt like the
bridge/switch handling in it. i can tart it up again though.

anyway, here's the ether_input shuffle:

Index: if_ethersubr.c
RCS file: /cvs/src/sys/net/if_ethersubr.c,v
retrieving revision 1.250
diff -u -p -r1.250 if_ethersubr.c
--- if_ethersubr.c      10 Jan 2018 00:14:38 -0000      1.250
+++ if_ethersubr.c      12 Jan 2018 07:07:32 -0000
@@ -326,17 +326,21 @@ ether_input(struct ifnet *ifp, struct mb
        ac = (struct arpcom *)ifp;
        eh = mtod(m, struct ether_header *);
-       if (ETHER_IS_MULTICAST(eh->ether_dhost)) {
+       /* Is the packet for us? */
+       if (memcmp(ac->ac_enaddr, eh->ether_dhost, ETHER_ADDR_LEN) != 0) {
+               /* If not, it must be multicast or broadcast to go further */
+               if (!ETHER_IS_MULTICAST(eh->ether_dhost))
+                       goto dropanyway;
                 * If this is not a simplex interface, drop the packet
                 * if it came from us.
                if ((ifp->if_flags & IFF_SIMPLEX) == 0) {
                        if (memcmp(ac->ac_enaddr, eh->ether_shost,
-                           ETHER_ADDR_LEN) == 0) {
-                               m_freem(m);
-                               return (1);
-                       }
+                           ETHER_ADDR_LEN) == 0)
+                               goto dropanyway;
                if (memcmp(etherbroadcastaddr, eh->ether_dhost,
@@ -354,18 +358,6 @@ ether_input(struct ifnet *ifp, struct mb
        if (m->m_flags & M_VLANTAG) {
                return (1);
-       }
-       /*
-        * If packet is unicast, make sure it is for us.  Drop otherwise.
-        * This check is required in promiscous mode, and for some hypervisors
-        * where the MAC filter is 'best effort' only.
-        */
-       if ((m->m_flags & (M_BCAST|M_MCAST)) == 0) {
-               if (memcmp(ac->ac_enaddr, eh->ether_dhost, ETHER_ADDR_LEN)) {
-                       m_freem(m);
-                       return (1);
-               }
        etype = ntohs(eh->ether_type);

Reply via email to