Re: bypass interface input queues for vlan(4)
On 23.2.2019. 10:35, David Gwynne wrote: > On Fri, Feb 22, 2019 at 09:56:58AM -0300, Martin Pieuchot wrote: >> On 22/02/19(Fri) 15:01, David Gwynne wrote: >>> On Thu, Feb 21, 2019 at 04:29:27PM -0300, Martin Pieuchot wrote: On 21/02/19(Thu) 14:19, David Gwynne wrote: > right now we add vlan_input as a possible input handler on the parent > interface, and if the packet is for a vlan we take it and pretend we > received it on the vlan interface by calling if_input against that mbuf. > > as mpi notes, the if input queue stuff looks like a lot of work, > especially for a single packet. my opinion is that we got away with > the if input stuff we've done to try and encourage an mpsafe network > stack because we amortised the cost of it over many packets off the > hardware ring. vlan does it a packet at a time. > > this moves the handling of vlan packets back into ether_input by > calling vlan_input directly on packets that are either marked as vlan > tagged or have a vlan ethertype. note that we have to do that anyway, > this just makes it explicit. > > vlan_input is then tweaked to implement all the important bits of if > input. part of what if input does is count the packets. because vlan > already has per cpu counters for bypassing queues on output, we can use > them again for input from any cpu. if i ever get round to making a > driver handle multiple rx rings this means we can rx vlan packets > concurrently, they don't get serialised to a single if input q. > > finally, hrvoje popovski has tested this diff and get's a significant > bump with it. on a machine that can forward 1100Kpps without vlan, it > goes from 790Kpps with vlan to 870Kpps. On a box that can do 730Kpps > without vlans, it goes from 550Kpps with vlan to 840Kpps. We're > still trying to figure that last one out, but it does appear to be > faster. > > thoughts? ok? Why do we need to move stuff to ether_input() if all we want is to bypass ifiq_input()? Isn't a 3 line diff enough^^ ? >>> Fair point. It turns out it's not quite three lines, but it's still >>> smaller. >> I'm unhappy to see the bpf & packet magic reappear in pseudo-drivers. >> >> This is going to spread in every pseudo-driver, no? So why not keeping >> it in the new API? Should we document if_input() vs if_input_one()? >> Should we assert that if_input_one() is only called from a network >> thread? If yes, should we pick a better name? > Maybe. How's if_vinput? And as you suggest, it can do some more of > the magic? Let me try converting a few more drivers before we > burden it with constraints. Hi all, this diff really nicely increase forwarding performance over vlan. i tested it with this loop vlan300 destroy && sh netstart && sleep 5 && ifconfig vlan400 destroy && sh netstart && sleep 5 && ifconfig ix3 down && sh netstart and box seems stable ..
Re: bypass interface input queues for vlan(4)
On Fri, Feb 22, 2019 at 09:56:58AM -0300, Martin Pieuchot wrote: > On 22/02/19(Fri) 15:01, David Gwynne wrote: > > On Thu, Feb 21, 2019 at 04:29:27PM -0300, Martin Pieuchot wrote: > > > On 21/02/19(Thu) 14:19, David Gwynne wrote: > > > > right now we add vlan_input as a possible input handler on the parent > > > > interface, and if the packet is for a vlan we take it and pretend we > > > > received it on the vlan interface by calling if_input against that mbuf. > > > > > > > > as mpi notes, the if input queue stuff looks like a lot of work, > > > > especially for a single packet. my opinion is that we got away with > > > > the if input stuff we've done to try and encourage an mpsafe network > > > > stack because we amortised the cost of it over many packets off the > > > > hardware ring. vlan does it a packet at a time. > > > > > > > > this moves the handling of vlan packets back into ether_input by > > > > calling vlan_input directly on packets that are either marked as vlan > > > > tagged or have a vlan ethertype. note that we have to do that anyway, > > > > this just makes it explicit. > > > > > > > > vlan_input is then tweaked to implement all the important bits of if > > > > input. part of what if input does is count the packets. because vlan > > > > already has per cpu counters for bypassing queues on output, we can use > > > > them again for input from any cpu. if i ever get round to making a > > > > driver handle multiple rx rings this means we can rx vlan packets > > > > concurrently, they don't get serialised to a single if input q. > > > > > > > > finally, hrvoje popovski has tested this diff and get's a significant > > > > bump with it. on a machine that can forward 1100Kpps without vlan, it > > > > goes from 790Kpps with vlan to 870Kpps. On a box that can do 730Kpps > > > > without vlans, it goes from 550Kpps with vlan to 840Kpps. We're > > > > still trying to figure that last one out, but it does appear to be > > > > faster. > > > > > > > > thoughts? ok? > > > > > > Why do we need to move stuff to ether_input() if all we want is to > > > bypass ifiq_input()? Isn't a 3 line diff enough^^ ? > > > > Fair point. It turns out it's not quite three lines, but it's still > > smaller. > > I'm unhappy to see the bpf & packet magic reappear in pseudo-drivers. > > This is going to spread in every pseudo-driver, no? So why not keeping > it in the new API? Should we document if_input() vs if_input_one()? > Should we assert that if_input_one() is only called from a network > thread? If yes, should we pick a better name? Maybe. How's if_vinput? And as you suggest, it can do some more of the magic? Let me try converting a few more drivers before we burden it with constraints. Index: if.c === RCS file: /cvs/src/sys/net/if.c,v retrieving revision 1.571 diff -u -p -r1.571 if.c --- if.c9 Jan 2019 01:14:21 - 1.571 +++ if.c23 Feb 2019 09:06:27 - @@ -895,11 +895,29 @@ if_ih_remove(struct ifnet *ifp, int (*in } void -if_input_process(struct ifnet *ifp, struct mbuf_list *ml) +if_input_ih(struct ifnet *ifp, struct mbuf *m) { - struct mbuf *m; struct ifih *ifih; struct srp_ref sr; + + /* +* Pass this mbuf to all input handlers of its +* interface until it is consumed. +*/ + SRPL_FOREACH(ifih, , >if_inputs, ifih_next) { + if ((*ifih->ifih_input)(ifp, m, ifih->ifih_cookie)) + break; + } + SRPL_LEAVE(); + + if (ifih == NULL) + m_freem(m); +} + +void +if_input_process(struct ifnet *ifp, struct mbuf_list *ml) +{ + struct mbuf *m; int s; if (ml_empty(ml)) @@ -922,22 +940,32 @@ if_input_process(struct ifnet *ifp, stru */ NET_RLOCK(); s = splnet(); - while ((m = ml_dequeue(ml)) != NULL) { - /* -* Pass this mbuf to all input handlers of its -* interface until it is consumed. -*/ - SRPL_FOREACH(ifih, , >if_inputs, ifih_next) { - if ((*ifih->ifih_input)(ifp, m, ifih->ifih_cookie)) - break; - } - SRPL_LEAVE(); + while ((m = ml_dequeue(ml)) != NULL) + if_input_ih(ifp, m); + splx(s); + NET_RUNLOCK(); +} + +void +if_vinput(struct ifnet *ifp, struct mbuf *m) +{ +#if NBPFILTER > 0 + caddr_t if_bpf = ifp->if_bpf; +#endif + + m->m_pkthdr.ph_ifidx = ifp->if_index; + m->m_pkthdr.ph_rtableid = ifp->if_rdomain; - if (ifih == NULL) +#if NBPFILTER > 0 + if (if_bpf) { + if (bpf_mtap_ether(if_bpf, m, BPF_DIRECTION_OUT)) { m_freem(m); + return; + } } - splx(s); - NET_RUNLOCK(); +#endif + + if_input_ih(ifp, m); } void Index:
Re: bypass interface input queues for vlan(4)
On 22/02/19(Fri) 15:01, David Gwynne wrote: > On Thu, Feb 21, 2019 at 04:29:27PM -0300, Martin Pieuchot wrote: > > On 21/02/19(Thu) 14:19, David Gwynne wrote: > > > right now we add vlan_input as a possible input handler on the parent > > > interface, and if the packet is for a vlan we take it and pretend we > > > received it on the vlan interface by calling if_input against that mbuf. > > > > > > as mpi notes, the if input queue stuff looks like a lot of work, > > > especially for a single packet. my opinion is that we got away with > > > the if input stuff we've done to try and encourage an mpsafe network > > > stack because we amortised the cost of it over many packets off the > > > hardware ring. vlan does it a packet at a time. > > > > > > this moves the handling of vlan packets back into ether_input by > > > calling vlan_input directly on packets that are either marked as vlan > > > tagged or have a vlan ethertype. note that we have to do that anyway, > > > this just makes it explicit. > > > > > > vlan_input is then tweaked to implement all the important bits of if > > > input. part of what if input does is count the packets. because vlan > > > already has per cpu counters for bypassing queues on output, we can use > > > them again for input from any cpu. if i ever get round to making a > > > driver handle multiple rx rings this means we can rx vlan packets > > > concurrently, they don't get serialised to a single if input q. > > > > > > finally, hrvoje popovski has tested this diff and get's a significant > > > bump with it. on a machine that can forward 1100Kpps without vlan, it > > > goes from 790Kpps with vlan to 870Kpps. On a box that can do 730Kpps > > > without vlans, it goes from 550Kpps with vlan to 840Kpps. We're > > > still trying to figure that last one out, but it does appear to be > > > faster. > > > > > > thoughts? ok? > > > > Why do we need to move stuff to ether_input() if all we want is to > > bypass ifiq_input()? Isn't a 3 line diff enough^^ ? > > Fair point. It turns out it's not quite three lines, but it's still > smaller. I'm unhappy to see the bpf & packet magic reappear in pseudo-drivers. This is going to spread in every pseudo-driver, no? So why not keeping it in the new API? Should we document if_input() vs if_input_one()? Should we assert that if_input_one() is only called from a network thread? If yes, should we pick a better name?
Re: bypass interface input queues for vlan(4)
On Thu, Feb 21, 2019 at 04:29:27PM -0300, Martin Pieuchot wrote: > On 21/02/19(Thu) 14:19, David Gwynne wrote: > > right now we add vlan_input as a possible input handler on the parent > > interface, and if the packet is for a vlan we take it and pretend we > > received it on the vlan interface by calling if_input against that mbuf. > > > > as mpi notes, the if input queue stuff looks like a lot of work, > > especially for a single packet. my opinion is that we got away with > > the if input stuff we've done to try and encourage an mpsafe network > > stack because we amortised the cost of it over many packets off the > > hardware ring. vlan does it a packet at a time. > > > > this moves the handling of vlan packets back into ether_input by > > calling vlan_input directly on packets that are either marked as vlan > > tagged or have a vlan ethertype. note that we have to do that anyway, > > this just makes it explicit. > > > > vlan_input is then tweaked to implement all the important bits of if > > input. part of what if input does is count the packets. because vlan > > already has per cpu counters for bypassing queues on output, we can use > > them again for input from any cpu. if i ever get round to making a > > driver handle multiple rx rings this means we can rx vlan packets > > concurrently, they don't get serialised to a single if input q. > > > > finally, hrvoje popovski has tested this diff and get's a significant > > bump with it. on a machine that can forward 1100Kpps without vlan, it > > goes from 790Kpps with vlan to 870Kpps. On a box that can do 730Kpps > > without vlans, it goes from 550Kpps with vlan to 840Kpps. We're > > still trying to figure that last one out, but it does appear to be > > faster. > > > > thoughts? ok? > > Why do we need to move stuff to ether_input() if all we want is to > bypass ifiq_input()? Isn't a 3 line diff enough^^ ? Fair point. It turns out it's not quite three lines, but it's still smaller. > - ml_enqueue(, m); > - if_input(>ifv_if, ); > + if_input_one(>ifv_if, m); > > I'm saying that because I'm afraid of the breakage that will happen if > we remove the input handlers. So I'm not opposed to get rid of the > handlers, but as you said we should consider all drivers to not break > trunk on top of bridge on top of vlan or whatever crazy configuration > people do. I want to fold them up because there's semantic issues around which handlers were added in which order, and I'd bet there's a small performance win too. But it's mostly about the semantics. > Another point to keep in mind if you're going to remove the handlers is: > do we want to keep passing a single mbuf or was it a plan to pass the > whole list? My current thinking is you want to bundle mbufs on and off the hardware as much as possible via lists of mbufs, but between layers of pseudo interfaces it's better to dispatch each mbuf directly. If the call stack becomes too deep (high?) then lists might be useful for squashing that back down. Index: if.c === RCS file: /cvs/src/sys/net/if.c,v retrieving revision 1.571 diff -u -p -r1.571 if.c --- if.c9 Jan 2019 01:14:21 - 1.571 +++ if.c22 Feb 2019 02:13:03 - @@ -895,11 +895,29 @@ if_ih_remove(struct ifnet *ifp, int (*in } void -if_input_process(struct ifnet *ifp, struct mbuf_list *ml) +if_input_one(struct ifnet *ifp, struct mbuf *m) { - struct mbuf *m; struct ifih *ifih; struct srp_ref sr; + + /* +* Pass this mbuf to all input handlers of its +* interface until it is consumed. +*/ + SRPL_FOREACH(ifih, , >if_inputs, ifih_next) { + if ((*ifih->ifih_input)(ifp, m, ifih->ifih_cookie)) + break; + } + SRPL_LEAVE(); + + if (ifih == NULL) + m_freem(m); +} + +void +if_input_process(struct ifnet *ifp, struct mbuf_list *ml) +{ + struct mbuf *m; int s; if (ml_empty(ml)) @@ -922,20 +940,8 @@ if_input_process(struct ifnet *ifp, stru */ NET_RLOCK(); s = splnet(); - while ((m = ml_dequeue(ml)) != NULL) { - /* -* Pass this mbuf to all input handlers of its -* interface until it is consumed. -*/ - SRPL_FOREACH(ifih, , >if_inputs, ifih_next) { - if ((*ifih->ifih_input)(ifp, m, ifih->ifih_cookie)) - break; - } - SRPL_LEAVE(); - - if (ifih == NULL) - m_freem(m); - } + while ((m = ml_dequeue(ml)) != NULL) + if_input_one(ifp, m); splx(s); NET_RUNLOCK(); } Index: if_var.h === RCS file: /cvs/src/sys/net/if_var.h,v retrieving revision 1.94 diff -u -p -r1.94 if_var.h --- if_var.h9 Jan 2019
Re: bypass interface input queues for vlan(4)
On 21/02/19(Thu) 14:19, David Gwynne wrote: > right now we add vlan_input as a possible input handler on the parent > interface, and if the packet is for a vlan we take it and pretend we > received it on the vlan interface by calling if_input against that mbuf. > > as mpi notes, the if input queue stuff looks like a lot of work, > especially for a single packet. my opinion is that we got away with > the if input stuff we've done to try and encourage an mpsafe network > stack because we amortised the cost of it over many packets off the > hardware ring. vlan does it a packet at a time. > > this moves the handling of vlan packets back into ether_input by > calling vlan_input directly on packets that are either marked as vlan > tagged or have a vlan ethertype. note that we have to do that anyway, > this just makes it explicit. > > vlan_input is then tweaked to implement all the important bits of if > input. part of what if input does is count the packets. because vlan > already has per cpu counters for bypassing queues on output, we can use > them again for input from any cpu. if i ever get round to making a > driver handle multiple rx rings this means we can rx vlan packets > concurrently, they don't get serialised to a single if input q. > > finally, hrvoje popovski has tested this diff and get's a significant > bump with it. on a machine that can forward 1100Kpps without vlan, it > goes from 790Kpps with vlan to 870Kpps. On a box that can do 730Kpps > without vlans, it goes from 550Kpps with vlan to 840Kpps. We're > still trying to figure that last one out, but it does appear to be > faster. > > thoughts? ok? Why do we need to move stuff to ether_input() if all we want is to bypass ifiq_input()? Isn't a 3 line diff enough^^ ? - ml_enqueue(, m); - if_input(>ifv_if, ); + if_input_one(>ifv_if, m); I'm saying that because I'm afraid of the breakage that will happen if we remove the input handlers. So I'm not opposed to get rid of the handlers, but as you said we should consider all drivers to not break trunk on top of bridge on top of vlan or whatever crazy configuration people do. Another point to keep in mind if you're going to remove the handlers is: do we want to keep passing a single mbuf or was it a plan to pass the whole list?
bypass interface input queues for vlan(4)
right now we add vlan_input as a possible input handler on the parent interface, and if the packet is for a vlan we take it and pretend we received it on the vlan interface by calling if_input against that mbuf. as mpi notes, the if input queue stuff looks like a lot of work, especially for a single packet. my opinion is that we got away with the if input stuff we've done to try and encourage an mpsafe network stack because we amortised the cost of it over many packets off the hardware ring. vlan does it a packet at a time. this moves the handling of vlan packets back into ether_input by calling vlan_input directly on packets that are either marked as vlan tagged or have a vlan ethertype. note that we have to do that anyway, this just makes it explicit. vlan_input is then tweaked to implement all the important bits of if input. part of what if input does is count the packets. because vlan already has per cpu counters for bypassing queues on output, we can use them again for input from any cpu. if i ever get round to making a driver handle multiple rx rings this means we can rx vlan packets concurrently, they don't get serialised to a single if input q. finally, hrvoje popovski has tested this diff and get's a significant bump with it. on a machine that can forward 1100Kpps without vlan, it goes from 790Kpps with vlan to 870Kpps. On a box that can do 730Kpps without vlans, it goes from 550Kpps with vlan to 840Kpps. We're still trying to figure that last one out, but it does appear to be faster. thoughts? ok? i would like to apply this style of tweak to the other ethernet magic pseudo interfaces like trunk, bridge, switch and bpe after this with the intention of making if_input a single function pointer again. Index: if.c === RCS file: /cvs/src/sys/net/if.c,v retrieving revision 1.571 diff -u -p -r1.571 if.c --- if.c9 Jan 2019 01:14:21 - 1.571 +++ if.c21 Feb 2019 04:03:44 - @@ -895,11 +895,29 @@ if_ih_remove(struct ifnet *ifp, int (*in } void -if_input_process(struct ifnet *ifp, struct mbuf_list *ml) +if_input_one(struct ifnet *ifp, struct mbuf *m) { - struct mbuf *m; struct ifih *ifih; struct srp_ref sr; + + /* +* Pass this mbuf to all input handlers of its +* interface until it is consumed. +*/ + SRPL_FOREACH(ifih, , >if_inputs, ifih_next) { + if ((*ifih->ifih_input)(ifp, m, ifih->ifih_cookie)) + break; + } + SRPL_LEAVE(); + + if (ifih == NULL) + m_freem(m); +} + +void +if_input_process(struct ifnet *ifp, struct mbuf_list *ml) +{ + struct mbuf *m; int s; if (ml_empty(ml)) @@ -922,20 +940,8 @@ if_input_process(struct ifnet *ifp, stru */ NET_RLOCK(); s = splnet(); - while ((m = ml_dequeue(ml)) != NULL) { - /* -* Pass this mbuf to all input handlers of its -* interface until it is consumed. -*/ - SRPL_FOREACH(ifih, , >if_inputs, ifih_next) { - if ((*ifih->ifih_input)(ifp, m, ifih->ifih_cookie)) - break; - } - SRPL_LEAVE(); - - if (ifih == NULL) - m_freem(m); - } + while ((m = ml_dequeue(ml)) != NULL) + if_input_one(ifp, m); splx(s); NET_RUNLOCK(); } Index: if_ethersubr.c === RCS file: /cvs/src/sys/net/if_ethersubr.c,v retrieving revision 1.258 diff -u -p -r1.258 if_ethersubr.c --- if_ethersubr.c 18 Feb 2019 03:41:21 - 1.258 +++ if_ethersubr.c 21 Feb 2019 04:03:44 - @@ -74,6 +74,7 @@ didn't get a copy, you may request one f */ #include "bpfilter.h" +#include "vlan.h" #include #include @@ -103,6 +104,10 @@ didn't get a copy, you may request one f #include #endif +#if NVLAN > 0 +#include +#endif + #include "pppoe.h" #if NPPPOE > 0 #include @@ -362,6 +367,17 @@ ether_input(struct ifnet *ifp, struct mb ac = (struct arpcom *)ifp; eh = mtod(m, struct ether_header *); + etype = ntohs(eh->ether_type); + + if (ISSET(m->m_flags, M_VLANTAG) || + etype == ETHERTYPE_VLAN || etype == ETHERTYPE_QINQ) { +#if NVLAN > 0 + m = vlan_input(ifp, m); + if (m == NULL) + return (1); +#endif + goto dropanyway; + } /* Is the packet for us? */ if (memcmp(ac->ac_enaddr, eh->ether_dhost, ETHER_ADDR_LEN) != 0) { @@ -387,15 +403,6 @@ ether_input(struct ifnet *ifp, struct mb m->m_flags |= M_MCAST; ifp->if_imcasts++; } - - /* -* HW vlan tagged packets that were not collected by vlan(4) must -* be dropped now. -*/ -