Re: bypass interface input queues for vlan(4)

2019-03-22 Thread Hrvoje Popovski
On 23.2.2019. 10:35, David Gwynne wrote:
> On Fri, Feb 22, 2019 at 09:56:58AM -0300, Martin Pieuchot wrote:
>> On 22/02/19(Fri) 15:01, David Gwynne wrote:
>>> On Thu, Feb 21, 2019 at 04:29:27PM -0300, Martin Pieuchot wrote:
 On 21/02/19(Thu) 14:19, David Gwynne wrote:
> right now we add vlan_input as a possible input handler on the parent
> interface, and if the packet is for a vlan we take it and pretend we
> received it on the vlan interface by calling if_input against that mbuf.
>
> as mpi notes, the if input queue stuff looks like a lot of work,
> especially for a single packet. my opinion is that we got away with
> the if input stuff we've done to try and encourage an mpsafe network
> stack because we amortised the cost of it over many packets off the
> hardware ring. vlan does it a packet at a time.
>
> this moves the handling of vlan packets back into ether_input by
> calling vlan_input directly on packets that are either marked as vlan
> tagged or have a vlan ethertype. note that we have to do that anyway,
> this just makes it explicit.
>
> vlan_input is then tweaked to implement all the important bits of if
> input. part of what if input does is count the packets. because vlan
> already has per cpu counters for bypassing queues on output, we can use
> them again for input from any cpu. if i ever get round to making a
> driver handle multiple rx rings this means we can rx vlan packets
> concurrently, they don't get serialised to a single if input q.
>
> finally, hrvoje popovski has tested this diff and get's a significant
> bump with it. on a machine that can forward 1100Kpps without vlan, it
> goes from 790Kpps with vlan to 870Kpps. On a box that can do 730Kpps
> without vlans, it goes from 550Kpps with vlan to 840Kpps. We're
> still trying to figure that last one out, but it does appear to be
> faster.
>
> thoughts? ok?
 Why do we need to move stuff to ether_input() if all we want is to
 bypass ifiq_input()?  Isn't a 3 line diff enough^^ ?
>>> Fair point. It turns out it's not quite three lines, but it's still
>>> smaller.
>> I'm unhappy to see the bpf & packet magic reappear in pseudo-drivers.
>>
>> This is going to spread in every pseudo-driver, no?  So why not keeping
>> it in the new API?   Should we document if_input() vs if_input_one()?
>> Should we assert that if_input_one() is only called from a network
>> thread?  If yes, should we pick a better name?
> Maybe. How's if_vinput? And as you suggest, it can do some more of
> the magic? Let me try converting a few more drivers before we
> burden it with constraints.


Hi all,

this diff really nicely increase forwarding performance over vlan. i
tested it with this loop
vlan300 destroy && sh netstart && sleep 5 && ifconfig vlan400 destroy &&
sh netstart && sleep 5 && ifconfig ix3 down && sh netstart

and box seems stable ..




Re: bypass interface input queues for vlan(4)

2019-02-23 Thread David Gwynne
On Fri, Feb 22, 2019 at 09:56:58AM -0300, Martin Pieuchot wrote:
> On 22/02/19(Fri) 15:01, David Gwynne wrote:
> > On Thu, Feb 21, 2019 at 04:29:27PM -0300, Martin Pieuchot wrote:
> > > On 21/02/19(Thu) 14:19, David Gwynne wrote:
> > > > right now we add vlan_input as a possible input handler on the parent
> > > > interface, and if the packet is for a vlan we take it and pretend we
> > > > received it on the vlan interface by calling if_input against that mbuf.
> > > > 
> > > > as mpi notes, the if input queue stuff looks like a lot of work,
> > > > especially for a single packet. my opinion is that we got away with
> > > > the if input stuff we've done to try and encourage an mpsafe network
> > > > stack because we amortised the cost of it over many packets off the
> > > > hardware ring. vlan does it a packet at a time.
> > > > 
> > > > this moves the handling of vlan packets back into ether_input by
> > > > calling vlan_input directly on packets that are either marked as vlan
> > > > tagged or have a vlan ethertype. note that we have to do that anyway,
> > > > this just makes it explicit.
> > > > 
> > > > vlan_input is then tweaked to implement all the important bits of if
> > > > input. part of what if input does is count the packets. because vlan
> > > > already has per cpu counters for bypassing queues on output, we can use
> > > > them again for input from any cpu. if i ever get round to making a
> > > > driver handle multiple rx rings this means we can rx vlan packets
> > > > concurrently, they don't get serialised to a single if input q.
> > > > 
> > > > finally, hrvoje popovski has tested this diff and get's a significant
> > > > bump with it. on a machine that can forward 1100Kpps without vlan, it
> > > > goes from 790Kpps with vlan to 870Kpps. On a box that can do 730Kpps
> > > > without vlans, it goes from 550Kpps with vlan to 840Kpps. We're
> > > > still trying to figure that last one out, but it does appear to be
> > > > faster.
> > > > 
> > > > thoughts? ok?
> > > 
> > > Why do we need to move stuff to ether_input() if all we want is to
> > > bypass ifiq_input()?  Isn't a 3 line diff enough^^ ?
> > 
> > Fair point. It turns out it's not quite three lines, but it's still
> > smaller.
> 
> I'm unhappy to see the bpf & packet magic reappear in pseudo-drivers.
> 
> This is going to spread in every pseudo-driver, no?  So why not keeping
> it in the new API?   Should we document if_input() vs if_input_one()?
> Should we assert that if_input_one() is only called from a network
> thread?  If yes, should we pick a better name?

Maybe. How's if_vinput? And as you suggest, it can do some more of
the magic? Let me try converting a few more drivers before we
burden it with constraints.

Index: if.c
===
RCS file: /cvs/src/sys/net/if.c,v
retrieving revision 1.571
diff -u -p -r1.571 if.c
--- if.c9 Jan 2019 01:14:21 -   1.571
+++ if.c23 Feb 2019 09:06:27 -
@@ -895,11 +895,29 @@ if_ih_remove(struct ifnet *ifp, int (*in
 }
 
 void
-if_input_process(struct ifnet *ifp, struct mbuf_list *ml)
+if_input_ih(struct ifnet *ifp, struct mbuf *m)
 {
-   struct mbuf *m;
struct ifih *ifih;
struct srp_ref sr;
+
+   /*
+* Pass this mbuf to all input handlers of its
+* interface until it is consumed.
+*/
+   SRPL_FOREACH(ifih, , >if_inputs, ifih_next) {
+   if ((*ifih->ifih_input)(ifp, m, ifih->ifih_cookie))
+   break;
+   }
+   SRPL_LEAVE();
+
+   if (ifih == NULL)
+   m_freem(m);
+}
+
+void
+if_input_process(struct ifnet *ifp, struct mbuf_list *ml)
+{
+   struct mbuf *m;
int s;
 
if (ml_empty(ml))
@@ -922,22 +940,32 @@ if_input_process(struct ifnet *ifp, stru
 */
NET_RLOCK();
s = splnet();
-   while ((m = ml_dequeue(ml)) != NULL) {
-   /*
-* Pass this mbuf to all input handlers of its
-* interface until it is consumed.
-*/
-   SRPL_FOREACH(ifih, , >if_inputs, ifih_next) {
-   if ((*ifih->ifih_input)(ifp, m, ifih->ifih_cookie))
-   break;
-   }
-   SRPL_LEAVE();
+   while ((m = ml_dequeue(ml)) != NULL)
+   if_input_ih(ifp, m);
+   splx(s);
+   NET_RUNLOCK();
+}
+
+void
+if_vinput(struct ifnet *ifp, struct mbuf *m)
+{
+#if NBPFILTER > 0
+   caddr_t if_bpf = ifp->if_bpf;
+#endif
+
+   m->m_pkthdr.ph_ifidx = ifp->if_index;
+   m->m_pkthdr.ph_rtableid = ifp->if_rdomain;
 
-   if (ifih == NULL)
+#if NBPFILTER > 0
+   if (if_bpf) {
+   if (bpf_mtap_ether(if_bpf, m, BPF_DIRECTION_OUT)) {
m_freem(m);
+   return;
+   }
}
-   splx(s);
-   NET_RUNLOCK();
+#endif
+
+   if_input_ih(ifp, m);
 }
 
 void
Index: 

Re: bypass interface input queues for vlan(4)

2019-02-22 Thread Martin Pieuchot
On 22/02/19(Fri) 15:01, David Gwynne wrote:
> On Thu, Feb 21, 2019 at 04:29:27PM -0300, Martin Pieuchot wrote:
> > On 21/02/19(Thu) 14:19, David Gwynne wrote:
> > > right now we add vlan_input as a possible input handler on the parent
> > > interface, and if the packet is for a vlan we take it and pretend we
> > > received it on the vlan interface by calling if_input against that mbuf.
> > > 
> > > as mpi notes, the if input queue stuff looks like a lot of work,
> > > especially for a single packet. my opinion is that we got away with
> > > the if input stuff we've done to try and encourage an mpsafe network
> > > stack because we amortised the cost of it over many packets off the
> > > hardware ring. vlan does it a packet at a time.
> > > 
> > > this moves the handling of vlan packets back into ether_input by
> > > calling vlan_input directly on packets that are either marked as vlan
> > > tagged or have a vlan ethertype. note that we have to do that anyway,
> > > this just makes it explicit.
> > > 
> > > vlan_input is then tweaked to implement all the important bits of if
> > > input. part of what if input does is count the packets. because vlan
> > > already has per cpu counters for bypassing queues on output, we can use
> > > them again for input from any cpu. if i ever get round to making a
> > > driver handle multiple rx rings this means we can rx vlan packets
> > > concurrently, they don't get serialised to a single if input q.
> > > 
> > > finally, hrvoje popovski has tested this diff and get's a significant
> > > bump with it. on a machine that can forward 1100Kpps without vlan, it
> > > goes from 790Kpps with vlan to 870Kpps. On a box that can do 730Kpps
> > > without vlans, it goes from 550Kpps with vlan to 840Kpps. We're
> > > still trying to figure that last one out, but it does appear to be
> > > faster.
> > > 
> > > thoughts? ok?
> > 
> > Why do we need to move stuff to ether_input() if all we want is to
> > bypass ifiq_input()?  Isn't a 3 line diff enough^^ ?
> 
> Fair point. It turns out it's not quite three lines, but it's still
> smaller.

I'm unhappy to see the bpf & packet magic reappear in pseudo-drivers.

This is going to spread in every pseudo-driver, no?  So why not keeping
it in the new API?   Should we document if_input() vs if_input_one()?
Should we assert that if_input_one() is only called from a network
thread?  If yes, should we pick a better name?



Re: bypass interface input queues for vlan(4)

2019-02-21 Thread David Gwynne
On Thu, Feb 21, 2019 at 04:29:27PM -0300, Martin Pieuchot wrote:
> On 21/02/19(Thu) 14:19, David Gwynne wrote:
> > right now we add vlan_input as a possible input handler on the parent
> > interface, and if the packet is for a vlan we take it and pretend we
> > received it on the vlan interface by calling if_input against that mbuf.
> > 
> > as mpi notes, the if input queue stuff looks like a lot of work,
> > especially for a single packet. my opinion is that we got away with
> > the if input stuff we've done to try and encourage an mpsafe network
> > stack because we amortised the cost of it over many packets off the
> > hardware ring. vlan does it a packet at a time.
> > 
> > this moves the handling of vlan packets back into ether_input by
> > calling vlan_input directly on packets that are either marked as vlan
> > tagged or have a vlan ethertype. note that we have to do that anyway,
> > this just makes it explicit.
> > 
> > vlan_input is then tweaked to implement all the important bits of if
> > input. part of what if input does is count the packets. because vlan
> > already has per cpu counters for bypassing queues on output, we can use
> > them again for input from any cpu. if i ever get round to making a
> > driver handle multiple rx rings this means we can rx vlan packets
> > concurrently, they don't get serialised to a single if input q.
> > 
> > finally, hrvoje popovski has tested this diff and get's a significant
> > bump with it. on a machine that can forward 1100Kpps without vlan, it
> > goes from 790Kpps with vlan to 870Kpps. On a box that can do 730Kpps
> > without vlans, it goes from 550Kpps with vlan to 840Kpps. We're
> > still trying to figure that last one out, but it does appear to be
> > faster.
> > 
> > thoughts? ok?
> 
> Why do we need to move stuff to ether_input() if all we want is to
> bypass ifiq_input()?  Isn't a 3 line diff enough^^ ?

Fair point. It turns out it's not quite three lines, but it's still
smaller.

> - ml_enqueue(, m);
> - if_input(>ifv_if, );
> + if_input_one(>ifv_if, m);
>  
> I'm saying that because I'm afraid of the breakage that will happen if
> we remove the input handlers.  So I'm not opposed to get rid of the
> handlers, but as you said we should consider all drivers to not break
> trunk on top of bridge on top of vlan or whatever crazy configuration
> people do.

I want to fold them up because there's semantic issues around which
handlers were added in which order, and I'd bet there's a small
performance win too. But it's mostly about the semantics.

> Another point to keep in mind if you're going to remove the handlers is:
> do we want to keep passing a single mbuf or was it a plan to pass the
> whole list?

My current thinking is you want to bundle mbufs on and off the
hardware as much as possible via lists of mbufs, but between layers
of pseudo interfaces it's better to dispatch each mbuf directly.

If the call stack becomes too deep (high?) then lists might be useful
for squashing that back down.

Index: if.c
===
RCS file: /cvs/src/sys/net/if.c,v
retrieving revision 1.571
diff -u -p -r1.571 if.c
--- if.c9 Jan 2019 01:14:21 -   1.571
+++ if.c22 Feb 2019 02:13:03 -
@@ -895,11 +895,29 @@ if_ih_remove(struct ifnet *ifp, int (*in
 }
 
 void
-if_input_process(struct ifnet *ifp, struct mbuf_list *ml)
+if_input_one(struct ifnet *ifp, struct mbuf *m)
 {
-   struct mbuf *m;
struct ifih *ifih;
struct srp_ref sr;
+
+   /*
+* Pass this mbuf to all input handlers of its
+* interface until it is consumed.
+*/
+   SRPL_FOREACH(ifih, , >if_inputs, ifih_next) {
+   if ((*ifih->ifih_input)(ifp, m, ifih->ifih_cookie))
+   break;
+   }
+   SRPL_LEAVE();
+
+   if (ifih == NULL)
+   m_freem(m);
+}
+
+void
+if_input_process(struct ifnet *ifp, struct mbuf_list *ml)
+{
+   struct mbuf *m;
int s;
 
if (ml_empty(ml))
@@ -922,20 +940,8 @@ if_input_process(struct ifnet *ifp, stru
 */
NET_RLOCK();
s = splnet();
-   while ((m = ml_dequeue(ml)) != NULL) {
-   /*
-* Pass this mbuf to all input handlers of its
-* interface until it is consumed.
-*/
-   SRPL_FOREACH(ifih, , >if_inputs, ifih_next) {
-   if ((*ifih->ifih_input)(ifp, m, ifih->ifih_cookie))
-   break;
-   }
-   SRPL_LEAVE();
-
-   if (ifih == NULL)
-   m_freem(m);
-   }
+   while ((m = ml_dequeue(ml)) != NULL)
+   if_input_one(ifp, m);
splx(s);
NET_RUNLOCK();
 }
Index: if_var.h
===
RCS file: /cvs/src/sys/net/if_var.h,v
retrieving revision 1.94
diff -u -p -r1.94 if_var.h
--- if_var.h9 Jan 2019 

Re: bypass interface input queues for vlan(4)

2019-02-21 Thread Martin Pieuchot
On 21/02/19(Thu) 14:19, David Gwynne wrote:
> right now we add vlan_input as a possible input handler on the parent
> interface, and if the packet is for a vlan we take it and pretend we
> received it on the vlan interface by calling if_input against that mbuf.
> 
> as mpi notes, the if input queue stuff looks like a lot of work,
> especially for a single packet. my opinion is that we got away with
> the if input stuff we've done to try and encourage an mpsafe network
> stack because we amortised the cost of it over many packets off the
> hardware ring. vlan does it a packet at a time.
> 
> this moves the handling of vlan packets back into ether_input by
> calling vlan_input directly on packets that are either marked as vlan
> tagged or have a vlan ethertype. note that we have to do that anyway,
> this just makes it explicit.
> 
> vlan_input is then tweaked to implement all the important bits of if
> input. part of what if input does is count the packets. because vlan
> already has per cpu counters for bypassing queues on output, we can use
> them again for input from any cpu. if i ever get round to making a
> driver handle multiple rx rings this means we can rx vlan packets
> concurrently, they don't get serialised to a single if input q.
> 
> finally, hrvoje popovski has tested this diff and get's a significant
> bump with it. on a machine that can forward 1100Kpps without vlan, it
> goes from 790Kpps with vlan to 870Kpps. On a box that can do 730Kpps
> without vlans, it goes from 550Kpps with vlan to 840Kpps. We're
> still trying to figure that last one out, but it does appear to be
> faster.
> 
> thoughts? ok?

Why do we need to move stuff to ether_input() if all we want is to
bypass ifiq_input()?  Isn't a 3 line diff enough^^ ?


-   ml_enqueue(, m);
-   if_input(>ifv_if, );
+   if_input_one(>ifv_if, m);
 
I'm saying that because I'm afraid of the breakage that will happen if
we remove the input handlers.  So I'm not opposed to get rid of the
handlers, but as you said we should consider all drivers to not break
trunk on top of bridge on top of vlan or whatever crazy configuration
people do.

Another point to keep in mind if you're going to remove the handlers is:
do we want to keep passing a single mbuf or was it a plan to pass the
whole list? 



bypass interface input queues for vlan(4)

2019-02-20 Thread David Gwynne
right now we add vlan_input as a possible input handler on the parent
interface, and if the packet is for a vlan we take it and pretend we
received it on the vlan interface by calling if_input against that mbuf.

as mpi notes, the if input queue stuff looks like a lot of work,
especially for a single packet. my opinion is that we got away with
the if input stuff we've done to try and encourage an mpsafe network
stack because we amortised the cost of it over many packets off the
hardware ring. vlan does it a packet at a time.

this moves the handling of vlan packets back into ether_input by
calling vlan_input directly on packets that are either marked as vlan
tagged or have a vlan ethertype. note that we have to do that anyway,
this just makes it explicit.

vlan_input is then tweaked to implement all the important bits of if
input. part of what if input does is count the packets. because vlan
already has per cpu counters for bypassing queues on output, we can use
them again for input from any cpu. if i ever get round to making a
driver handle multiple rx rings this means we can rx vlan packets
concurrently, they don't get serialised to a single if input q.

finally, hrvoje popovski has tested this diff and get's a significant
bump with it. on a machine that can forward 1100Kpps without vlan, it
goes from 790Kpps with vlan to 870Kpps. On a box that can do 730Kpps
without vlans, it goes from 550Kpps with vlan to 840Kpps. We're
still trying to figure that last one out, but it does appear to be
faster.

thoughts? ok?

i would like to apply this style of tweak to the other ethernet magic
pseudo interfaces like trunk, bridge, switch and bpe  after this
with the intention of making if_input a single function pointer again.

Index: if.c
===
RCS file: /cvs/src/sys/net/if.c,v
retrieving revision 1.571
diff -u -p -r1.571 if.c
--- if.c9 Jan 2019 01:14:21 -   1.571
+++ if.c21 Feb 2019 04:03:44 -
@@ -895,11 +895,29 @@ if_ih_remove(struct ifnet *ifp, int (*in
 }
 
 void
-if_input_process(struct ifnet *ifp, struct mbuf_list *ml)
+if_input_one(struct ifnet *ifp, struct mbuf *m)
 {
-   struct mbuf *m;
struct ifih *ifih;
struct srp_ref sr;
+
+   /*
+* Pass this mbuf to all input handlers of its
+* interface until it is consumed.
+*/
+   SRPL_FOREACH(ifih, , >if_inputs, ifih_next) {
+   if ((*ifih->ifih_input)(ifp, m, ifih->ifih_cookie))
+   break;
+   }
+   SRPL_LEAVE();
+
+   if (ifih == NULL)
+   m_freem(m);
+}
+
+void
+if_input_process(struct ifnet *ifp, struct mbuf_list *ml)
+{
+   struct mbuf *m;
int s;
 
if (ml_empty(ml))
@@ -922,20 +940,8 @@ if_input_process(struct ifnet *ifp, stru
 */
NET_RLOCK();
s = splnet();
-   while ((m = ml_dequeue(ml)) != NULL) {
-   /*
-* Pass this mbuf to all input handlers of its
-* interface until it is consumed.
-*/
-   SRPL_FOREACH(ifih, , >if_inputs, ifih_next) {
-   if ((*ifih->ifih_input)(ifp, m, ifih->ifih_cookie))
-   break;
-   }
-   SRPL_LEAVE();
-
-   if (ifih == NULL)
-   m_freem(m);
-   }
+   while ((m = ml_dequeue(ml)) != NULL)
+   if_input_one(ifp, m);
splx(s);
NET_RUNLOCK();
 }
Index: if_ethersubr.c
===
RCS file: /cvs/src/sys/net/if_ethersubr.c,v
retrieving revision 1.258
diff -u -p -r1.258 if_ethersubr.c
--- if_ethersubr.c  18 Feb 2019 03:41:21 -  1.258
+++ if_ethersubr.c  21 Feb 2019 04:03:44 -
@@ -74,6 +74,7 @@ didn't get a copy, you may request one f
 */
 
 #include "bpfilter.h"
+#include "vlan.h"
 
 #include 
 #include 
@@ -103,6 +104,10 @@ didn't get a copy, you may request one f
 #include 
 #endif
 
+#if NVLAN > 0
+#include 
+#endif
+
 #include "pppoe.h"
 #if NPPPOE > 0
 #include 
@@ -362,6 +367,17 @@ ether_input(struct ifnet *ifp, struct mb
 
ac = (struct arpcom *)ifp;
eh = mtod(m, struct ether_header *);
+   etype = ntohs(eh->ether_type);
+
+   if (ISSET(m->m_flags, M_VLANTAG) ||
+   etype == ETHERTYPE_VLAN || etype == ETHERTYPE_QINQ) {
+#if NVLAN > 0
+   m = vlan_input(ifp, m);
+   if (m == NULL)
+   return (1);
+#endif
+   goto dropanyway;
+   }
 
/* Is the packet for us? */
if (memcmp(ac->ac_enaddr, eh->ether_dhost, ETHER_ADDR_LEN) != 0) {
@@ -387,15 +403,6 @@ ether_input(struct ifnet *ifp, struct mb
m->m_flags |= M_MCAST;
ifp->if_imcasts++;
}
-
-   /*
-* HW vlan tagged packets that were not collected by vlan(4) must
-* be dropped now.
-*/
-