Re: merge vlan and carp input back into ether_input
On 01/11/18 14:51, Martin Pieuchot wrote: On 11/01/18(Thu) 21:59, David Gwynne wrote: [...] when you say i break carp balancing, are you talking about the removal of the PACKET_TAG_CARP_BAL_IP tagging? PACKET_TAG_CARP_BAL_IP is only used in carp_lsdrop to clear the M_MCAST flag on the mbuf. M_MCAST wont be set on packets destined for the carp interface because we check ac_enaddr before checking if the packet is multicast or broadcast. I might be mistaken. I just know that this code is fragile and I'd prefer to see such change tested isolated because carp(4) itself has multiple configurations. CARP balancing has been fixed for 6.2 by friehm@ after being broken for multiple releases. Hi David, let me know if you all you carp changes are commited, then I will test if carp balancing is still working. I tested this commit already and it seems to be fine. Thanks & Regards, friehm
Re: merge vlan and carp input back into ether_input
On Thu, Jan 11, 2018 at 02:51:40PM +0100, Martin Pieuchot wrote: > On 11/01/18(Thu) 21:59, David Gwynne wrote: > > [...] > > when you say i break carp balancing, are you talking about the removal of > > the PACKET_TAG_CARP_BAL_IP tagging? PACKET_TAG_CARP_BAL_IP is only used in > > carp_lsdrop to clear the M_MCAST flag on the mbuf. M_MCAST wont be set on > > packets destined for the carp interface because we check ac_enaddr before > > checking if the packet is multicast or broadcast. > > I might be mistaken. I just know that this code is fragile and I'd > prefer to see such change tested isolated because carp(4) itself has > multiple configurations. CARP balancing has been fixed for 6.2 by > friehm@ after being broken for multiple releases. ok. i had a look the commit related to this at https://github.com/openbsd/src/commit/76dda2b0279f3c37adf1c059c3bab4d74bc96602 that change happened because ethernet_input checks and sets the multicast bits before doing the ac_enaddr comparison. that's reversed in the big diff i sent, but i pulled it out and attached it below. it doesn't take away the mbuf tag carp uses, but it would make it unnecessary. > > are you sure? carp and vlan on top of trunk should still function. however, > > trunk or bridge/switch on vlan is broken though :( > > I'm not sure, but I believe mixing input handler and hardcoding some in > ether_input() won't fly :) agreed. > > > > If if_input() is called on a pseudo-interface we know we're already in > > > a softnet process. > > > > or a syscall. > > > > the code above has pseudo interfaces recurse, where they'd loop either on > > ifih or at the task level. probably not a huge concern though. > > We can unroll the loop afterward. What we need now is get rid of the > queues. ok. > > > > We could also think of doing something similar for if_enqueue() and call > > > if_start() directly for pseudo-interface. > > > > ill think about that. ive had some other experiments in that area we could > > look at too. > > I'd be glad to look at your experiments :) but im shy :( > > >> note that trunk and bridge/switch are still implemented using > > >> interface input handlers at the moment. > > > > > > If you want to get rid of the input handlers, I'd suggest doing it in > > > the beginning of a release cycle and for all pseudo drivers at once. > > > > considering the trunk and bridge/switch issue, it probably is best to do > > them all at once. > > Yes. I'm aware that the SRP/input handler loop might be considered. > We can probably gain some percents if we remove it. However this is > a micro optimization compared to other improvements that can be done. > Plus it has the advantage of not having fragile #ifdef maze in the > rest of the code. im not expecting a performance difference with this stuff, it's more about correctness. right now the behaviour of the stack is arguably incorrect depending on when you configure pseudo-interfaces. here are two examples: if you configure a carp interface on myx0 after configuring a vlan interface, carp_input input handler will be in the SRPL before vlan_input. because of that we have this chunk in carp_input: #if NVLAN > 0 /* * If the underlying interface removed the VLAN header itself, * it's not for us. */ if (ISSET(m->m_flags, M_VLANTAG)) return (0); #endif however, myx does not do hw vlan tagging and we dont check for ETHERTYPE_VLAN or _QINQ there. carp may take a packet that vlan should have taken. secondly, if you configure switch(4) on an interface after configuring it as a trunk port, switch will happily take all the packets on it. it could be argued that these examples are a bit contrived, which i will accept, and they could be fixed with some stricter code checks, but they do demonstrate a problem with the stack. in my mind, the order of processing by pseudo interfaces attached to an ethernet interface would be: 1. let trunk(4) look at it if trunk is configured on the port... while we're talking about this, i would like to implement "independent" ports on trunks. independent ports mean that if lacp isnt negotiated on the ethernet interface, it can be used as a normal interface. cisco do this by default on port channels, but other vendors require explicit config. eg, on a force10^Wdell i need to add something like "lacp ungroup member-independent port-channel 12" to allow members of po12 to function as normal ports when lacp isnt negotiated. this is useful if you want to pxe boot boxes that are usually connected using lacp. if we supported independent trunk ports, then we should allow otherwise normal ethernet interface configuration. 2. let bridge(4) or switch(4) look at it not both. it only makes sense for a bridge/switch to get the packet if trunk doesnt want it. 3. if it is vlan or qinq tagged, give it to vlan or drop it. 4. check ac_enaddr to see if it is for the current interface. 5. otherwise, give
Re: merge vlan and carp input back into ether_input
On 11/01/18(Thu) 21:59, David Gwynne wrote: > [...] > when you say i break carp balancing, are you talking about the removal of the > PACKET_TAG_CARP_BAL_IP tagging? PACKET_TAG_CARP_BAL_IP is only used in > carp_lsdrop to clear the M_MCAST flag on the mbuf. M_MCAST wont be set on > packets destined for the carp interface because we check ac_enaddr before > checking if the packet is multicast or broadcast. I might be mistaken. I just know that this code is fragile and I'd prefer to see such change tested isolated because carp(4) itself has multiple configurations. CARP balancing has been fixed for 6.2 by friehm@ after being broken for multiple releases. > are you sure? carp and vlan on top of trunk should still function. however, > trunk or bridge/switch on vlan is broken though :( I'm not sure, but I believe mixing input handler and hardcoding some in ether_input() won't fly :) > > If if_input() is called on a pseudo-interface we know we're already in > > a softnet process. > > or a syscall. > > the code above has pseudo interfaces recurse, where they'd loop either on > ifih or at the task level. probably not a huge concern though. We can unroll the loop afterward. What we need now is get rid of the queues. > > We could also think of doing something similar for if_enqueue() and call > > if_start() directly for pseudo-interface. > > ill think about that. ive had some other experiments in that area we could > look at too. I'd be glad to look at your experiments :) > >> note that trunk and bridge/switch are still implemented using > >> interface input handlers at the moment. > > > > If you want to get rid of the input handlers, I'd suggest doing it in > > the beginning of a release cycle and for all pseudo drivers at once. > > considering the trunk and bridge/switch issue, it probably is best to do them > all at once. Yes. I'm aware that the SRP/input handler loop might be considered. We can probably gain some percents if we remove it. However this is a micro optimization compared to other improvements that can be done. Plus it has the advantage of not having fragile #ifdef maze in the rest of the code.
Re: merge vlan and carp input back into ether_input
> On 11 Jan 2018, at 8:58 pm, Martin Pieuchot wrote: > > On 11/01/18(Thu) 11:50, David Gwynne wrote: >> while we were working on making the various pseudo interfaces you >> stack on top of ethernet mpsafe, we split their input processing >> off so they could be attacked one by one. they're all mpsafe now, >> so this separation is not strictly necessary anymore. > > Well at that time we weren't sure how to split the work between > CPUs. The current design allow to have multiple tasks doing some > part of the work and, like we are pushing now, process all the > incoming packets in the same context. yes. now though we assign interfaces to softnettqs, no matter what their relationship to each other is. a physical interface could be on tq 0, and it's child vlan on 1, which goes against the goal of processing all incoming packets in the same context. > >> this moves carp and vlan input back into ether_input. [...] > > It does much more than that. It's also doing a conversion from mbuf to > mbuf_list, it contains some IFF_RUNNING "fixes", it breaks carp(4) > balancing, it breaks vlan/carp on top of trunk(4)... it doesnt replace mbufs with mbuf_lists, it gets rid of the cookies for vlan and carp input and gives them the list to requeue their packets on for input. wrt to IFF_RUNNING, if we agree that we should be looking at it to know if an interface is functional i can make a bunch of those fixes without oks. when you say i break carp balancing, are you talking about the removal of the PACKET_TAG_CARP_BAL_IP tagging? PACKET_TAG_CARP_BAL_IP is only used in carp_lsdrop to clear the M_MCAST flag on the mbuf. M_MCAST wont be set on packets destined for the carp interface because we check ac_enaddr before checking if the packet is multicast or broadcast. are you sure? carp and vlan on top of trunk should still function. however, trunk or bridge/switch on vlan is broken though :( > >> this diff also gets rid of the use of the pseudo interfaces input >> queues, it processes their packets off an mbuf list for each real >> ethernet packet. if we can tie all the work done on behalf of a >> physical ring to a single task it makes rx ring moderation for >> physical interfaces a lot easier to implement. > > This needs to be done. But please don't mix that with a rewrite of > the input handlers. I believe that's the easiest way to do that is > to modify if_input() do process the packets directly if `ifp' is a > pseudo-interface: > > if (ISSET(ifp->if_xflags, IFXF_CLONED) { > struct ifih *ifih; > struct srp_ref sr; > > NET_ASSERT_LOCKED(); > >SRPL_FOREACH(ifih, &sr, &ifp->if_inputs, ifih_next) { >if ((*ifih->ifih_input)(ifp, m, ifih->ifih_cookie)) >break; >} >SRPL_LEAVE(&sr) > } else > ifiq_input(&ifp->if_rcv, ml, 2048); > > If if_input() is called on a pseudo-interface we know we're already in > a softnet process. or a syscall. the code above has pseudo interfaces recurse, where they'd loop either on ifih or at the task level. probably not a huge concern though. > > We could also think of doing something similar for if_enqueue() and call > if_start() directly for pseudo-interface. ill think about that. ive had some other experiments in that area we could look at too. > >> note that trunk and bridge/switch are still implemented using >> interface input handlers at the moment. > > If you want to get rid of the input handlers, I'd suggest doing it in > the beginning of a release cycle and for all pseudo drivers at once. considering the trunk and bridge/switch issue, it probably is best to do them all at once.
Re: merge vlan and carp input back into ether_input
On 11/01/18(Thu) 11:50, David Gwynne wrote: > while we were working on making the various pseudo interfaces you > stack on top of ethernet mpsafe, we split their input processing > off so they could be attacked one by one. they're all mpsafe now, > so this separation is not strictly necessary anymore. Well at that time we weren't sure how to split the work between CPUs. The current design allow to have multiple tasks doing some part of the work and, like we are pushing now, process all the incoming packets in the same context. > this moves carp and vlan input back into ether_input. [...] It does much more than that. It's also doing a conversion from mbuf to mbuf_list, it contains some IFF_RUNNING "fixes", it breaks carp(4) balancing, it breaks vlan/carp on top of trunk(4)... > this diff also gets rid of the use of the pseudo interfaces input > queues, it processes their packets off an mbuf list for each real > ethernet packet. if we can tie all the work done on behalf of a > physical ring to a single task it makes rx ring moderation for > physical interfaces a lot easier to implement. This needs to be done. But please don't mix that with a rewrite of the input handlers. I believe that's the easiest way to do that is to modify if_input() do process the packets directly if `ifp' is a pseudo-interface: if (ISSET(ifp->if_xflags, IFXF_CLONED) { struct ifih *ifih; struct srp_ref sr; NET_ASSERT_LOCKED(); SRPL_FOREACH(ifih, &sr, &ifp->if_inputs, ifih_next) { if ((*ifih->ifih_input)(ifp, m, ifih->ifih_cookie)) break; } SRPL_LEAVE(&sr) } else ifiq_input(&ifp->if_rcv, ml, 2048); If if_input() is called on a pseudo-interface we know we're already in a softnet process. We could also think of doing something similar for if_enqueue() and call if_start() directly for pseudo-interface. > note that trunk and bridge/switch are still implemented using > interface input handlers at the moment. If you want to get rid of the input handlers, I'd suggest doing it in the beginning of a release cycle and for all pseudo drivers at once.
merge vlan and carp input back into ether_input
while we were working on making the various pseudo interfaces you stack on top of ethernet mpsafe, we split their input processing off so they could be attacked one by one. they're all mpsafe now, so this separation is not strictly necessary anymore. this moves carp and vlan input back into ether_input. a lot of care is taken to correctly order when we give the packets to the sub interfaces. basically, any ethernet packet with a vlan tag is unconditionally given to vlan_input. carp input is only attempted if the packet is not for the parent interface, but before the multicast handling is done. by checking the interfaces mac address first, carp interfaces can get their packets immediately, which means we can stop messing around with the M_BCAST and M_MCAST flags on carp mbufs. the relevant chunk of code is: #if NVLAN > 0 if (ISSET(m->m_flags, M_VLANTAG) || etype == ETHERTYPE_VLAN || etype == ETHERTYPE_QINQ) { vlan_input(ifp, ml, m); return; } #endif if (memcmp(ac->ac_enaddr, eh->ether_dhost, ETHER_ADDR_LEN) != 0) { /* The packet doesn't match the ether addr on this iface */ #if NCARP > 0 /* It may be addressed to a child carp iface */ if (ifp->if_type != IFT_CARP && !SRPL_EMPTY_LOCKED(&ifp->if_carp) && carp_input(ifp, ml, m)) { /* carp_input has consumed the packet */ return; } #endif /* It must be multicast if it isn't for us or a child carp */ if (!ETHER_IS_MULTICAST(eh->ether_dhost)) goto dropanyway; /* Drop it if it came from us in the first place */ if (!ISSET(ifp->if_flags, IFF_SIMPLEX) && memcmp(ac->ac_enaddr, eh->ether_shost, ETHER_ADDR_LEN) == 0) goto dropanyway; SET(m->m_flags, (memcmp(etherbroadcastaddr, eh->ether_dhost, ETHER_ADDR_LEN) == 0) ? M_BCAST : M_MCAST); ifp->if_imcasts++; /* XXX lock? */ } doing vlan and carp input here let's us remove special handling for VLAN packets in carp code. this diff also gets rid of the use of the pseudo interfaces input queues, it processes their packets off an mbuf list for each real ethernet packet. if we can tie all the work done on behalf of a physical ring to a single task it makes rx ring moderation for physical interfaces a lot easier to implement. note that trunk and bridge/switch are still implemented using interface input handlers at the moment. ok? Index: net/if_ethersubr.c === RCS file: /cvs/src/sys/net/if_ethersubr.c,v retrieving revision 1.250 diff -u -p -r1.250 if_ethersubr.c --- net/if_ethersubr.c 10 Jan 2018 00:14:38 - 1.250 +++ net/if_ethersubr.c 11 Jan 2018 01:20:54 - @@ -103,6 +103,16 @@ didn't get a copy, you may request one f #include #endif +#include "vlan.h" +#if NVLAN > 0 +#include +#endif + +#include "carp.h" +#if NCARP > 0 +#include +#endif + #include "pppoe.h" #if NPPPOE > 0 #include @@ -121,6 +131,8 @@ didn't get a copy, you may request one f #include #endif /* MPLS */ +void ether_input_m(struct ifnet *, struct mbuf_list *, struct mbuf *); + u_int8_t etherbroadcastaddr[ETHER_ADDR_LEN] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff }; u_int8_t etheranyaddr[ETHER_ADDR_LEN] = @@ -306,69 +318,108 @@ bad: return (error); } -/* - * Process a received Ethernet packet; - * the packet is in the mbuf chain m without - * the ether header, which is provided separately. - */ +void +ether_enqueue(struct ifnet *ifp, struct mbuf_list *ml, struct mbuf *m) +{ +#if NBPFILTER > 0 + caddr_t if_bpf; +#endif + + m->m_pkthdr.ph_ifidx = ifp->if_index; + m->m_pkthdr.ph_rtableid = ifp->if_rdomain; + + /* XXX lock? */ + ifp->if_ipackets++; + ifp->if_ibytes += m->m_pkthdr.len; + +#if NBPFILTER > 0 + if_bpf = ifp->if_bpf; + if (if_bpf) { + if (bpf_mtap_ether(if_bpf, m, BPF_DIRECTION_IN)) { + m_freem(m); + return; + } + } +#endif + + ml_enqueue(ml, m); +} + int ether_input(struct ifnet *ifp, struct mbuf *m, void *cookie) { + struct mbuf_list ml = MBUF_LIST_INITIALIZER(); + + /* Drop short frames */ + if (m->m_len < ETHER_HDR_LEN) { + m_freem(m); + return (1); + } + + /* We have a reference to this ifp already */ + ether_input_m(ifp, &ml, m); + + /* Run the packet through any child interfaces */ + while ((m = ml_dequeue(&ml)) != NULL) { + ifp = if_get(m->m_pkthdr.ph_ifidx); + if (ifp != NULL) + ether_input_m(ifp, &ml, m); + else +