Re: Blogpost evaluation of this [PATCH v4 net-next RFC] net: Generic XDP

2017-04-25 Thread David Miller
From: Andy Gospodarek 
Date: Tue, 25 Apr 2017 13:25:49 -0400

> On Mon, Apr 24, 2017 at 06:26:43PM -0400, David Miller wrote:
>> Andy, how goes it? :)
> 
> Sorry for the delayed response.  I was AFK yesterday, but based on
> testing from Friday and what I wrapped up today all looks good to me.
> 
> On my system (i7-6700 CPU @ 3.40GHz) both the reported and actual TX
> throughput for xdp_tx_iptunnel are 4.6Mpps with the optimized XDP.
> 
> For generic XDP the reported throughput of xdp_tx_iptunnel is 4.6Mpps
> but only ~880kpps actually on the wire.  It seems to me that can be
> fixed with a follow-up for the offending drivers or the stack, if it is
> deemed that there is a real error there.

Ok, I'll commit the latest version with tested-by tags for you, Jesper,
and David Ahern added.

Thanks everyone.


Re: Blogpost evaluation of this [PATCH v4 net-next RFC] net: Generic XDP

2017-04-25 Thread Andy Gospodarek
On Mon, Apr 24, 2017 at 06:26:43PM -0400, David Miller wrote:
> From: Jesper Dangaard Brouer 
> Date: Mon, 24 Apr 2017 16:24:05 +0200
> 
> > I've done a very detailed evaluation of this patch, and I've created a
> > blogpost like report here:
> > 
> > https://prototype-kernel.readthedocs.io/en/latest/blogposts/xdp25_eval_generic_xdp_tx.html
> 
> Thanks for doing this Jesper.

Yes, this is excellent.  I'm not all the way through it yet, but I looked
at the data and can corroborate the results you are seeing.

My results for both optimized and generic XDP with
xdp_bench01_mem_access_cost --action XDP_DROP --readmem are quite
similar to yours (11.7Mpps and 7.8Mpps for me, respectively; 11.7Mpps
and 8.4Mpps for you).

I also noted (as you did) that there is no discernible difference
running xdp_bench01_mem_access_cost with or without the --readmem
option, since the packet data is already being accessed that late in
the stack.

> 
> > I didn't evaluate the adjust_head part, so I hope Andy is still
> > planning to validate that part?
> 
> I was hoping he would post some results today as well.
> 
> Andy, how goes it? :)

Sorry for the delayed response.  I was AFK yesterday, but based on
testing from Friday and what I wrapped up today all looks good to me.

On my system (i7-6700 CPU @ 3.40GHz) both the reported and actual TX
throughput for xdp_tx_iptunnel are 4.6Mpps with the optimized XDP.

For generic XDP the reported throughput of xdp_tx_iptunnel is 4.6Mpps
but only ~880kpps actually on the wire.  It seems to me that can be
fixed with a follow-up for the offending drivers or the stack, if it is
deemed that there is a real error there.

> Once the basic patch is ready and integrated in we can try to do
> xmit_more in generic XDP and see what that does for XDP_TX
> performance.

Agreed.



Re: Blogpost evaluation of this [PATCH v4 net-next RFC] net: Generic XDP

2017-04-25 Thread Jesper Dangaard Brouer
On Mon, 24 Apr 2017 18:26:43 -0400 (EDT)
David Miller  wrote:

> From: Jesper Dangaard Brouer 
> Date: Mon, 24 Apr 2017 16:24:05 +0200
> 
> > I've done a very detailed evaluation of this patch, and I've created a
> > blogpost like report here:
> > 
> > https://prototype-kernel.readthedocs.io/en/latest/blogposts/xdp25_eval_generic_xdp_tx.html
> 
> Thanks for doing this Jesper.
> 
> > I didn't evaluate the adjust_head part, so I hope Andy is still
> > planning to validate that part?  
> 
> I was hoping he would post some results today as well.
> 
> Andy, how goes it? :)
> 
> Once the basic patch is ready and integrated in we can try to do
> xmit_more in generic XDP and see what that does for XDP_TX
> performance.

I agree, we can do xmit_more for generic-XDP later, and it should not
be that hard... basically replacing netdev_start_xmit() with
dev_hard_start_xmit() in generic_xdp_tx(), and finding some place to
store an XDP-skb-pointer (functioning as the skb-list) that will be
"flushed" like __kfree_skb_flush().

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: Blogpost evaluation of this [PATCH v4 net-next RFC] net: Generic XDP

2017-04-24 Thread David Miller
From: Jesper Dangaard Brouer 
Date: Mon, 24 Apr 2017 16:24:05 +0200

> I've done a very detailed evaluation of this patch, and I've created a
> blogpost like report here:
> 
> https://prototype-kernel.readthedocs.io/en/latest/blogposts/xdp25_eval_generic_xdp_tx.html

Thanks for doing this Jesper.

> I didn't evaluate the adjust_head part, so I hope Andy is still
> planning to validate that part?

I was hoping he would post some results today as well.

Andy, how goes it? :)

Once the basic patch is ready and integrated in we can try to do
xmit_more in generic XDP and see what that does for XDP_TX
performance.


Blogpost evaluation of this [PATCH v4 net-next RFC] net: Generic XDP

2017-04-24 Thread Jesper Dangaard Brouer

I've done a very detailed evaluation of this patch, and I've created a
blogpost like report here:

 
https://prototype-kernel.readthedocs.io/en/latest/blogposts/xdp25_eval_generic_xdp_tx.html

I didn't evaluate the adjust_head part, so I hope Andy is still
planning to validate that part?

--Jesper



On Thu, 13 Apr 2017 12:09:25 -0400 (EDT)
David Miller  wrote:

> This provides a generic SKB based non-optimized XDP path which is used
> if either the driver lacks a specific XDP implementation, or the user
> requests it via a new IFLA_XDP_FLAGS value named XDP_FLAGS_SKB_MODE.
> 
> It is arguable that perhaps I should have required something like
> this as part of the initial XDP feature merge.
> 
> I believe this is critical for two reasons:
> 
> 1) Accessibility.  More people can play with XDP with less
>    dependencies.  Yes I know we have XDP support in virtio_net, but
>    that just creates another dependency for learning how to use this
>    facility.
> 
>    I wrote this to make life easier for the XDP newbies.
> 
> 2) As a model for what the expected semantics are.  If there is a pure
>    generic core implementation, it serves as a semantic example for
>    driver folks adding XDP support.
> 
> This is just a rough draft and is untested.



-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [PATCH v4 net-next RFC] net: Generic XDP

2017-04-24 Thread Jesper Dangaard Brouer

On Thu, 20 Apr 2017 16:30:34 +0200 Jesper Dangaard Brouer wrote:

> On Wed, 19 Apr 2017 10:29:03 -0400
> Andy Gospodarek  wrote:
> 
> > I ran this on top of a card that uses the bnxt_en driver on a desktop
> > class system with an i7-6700 CPU @ 3.40GHz, sending a single stream of
> > UDP traffic with flow control disabled and saw the following (all stats
> > in Million PPS).
> > 
> >                 xdp1    xdp2                xdp_tx_tunnel
> > Generic XDP      7.8     5.5 (1.3 actual)    4.6 (1.1 actual)
> > Optimized XDP   11.7     9.7                 4.6
> > 
> > One thing to note is that the Generic XDP case shows different results
> > as reported by the application vs actual (seen on the wire).  I
> > did not debug where the drops are happening and what counter needs to be
> > incremented to note this -- I'll add that to my TODO list.  The
> > Optimized XDP case does not have a difference in reported vs actual
> > frames on the wire.  
> 
> The reported application vs actual (seen on the wire) numbers sound scary.
> How do you evaluate/measure "seen on the wire"?
> 
> Perhaps you could use ethtool -S stats to see if anything is fishy?
> I recommend using my tool[1] like:
> 
>  ~/git/network-testing/bin/ethtool_stats.pl --dev mlx5p2 --sec 2
> 
> [1] 
> https://github.com/netoptimizer/network-testing/blob/master/bin/ethtool_stats.pl
> 
> I'm evaluating this patch on a mlx5 NIC, and something is not right...
> I'm seeing:
> 
>  Ethtool(mlx5p2) stat:  349599 (    349,599) <= tx_multicast_phy /sec
>  Ethtool(mlx5p2) stat: 4940185 (  4,940,185) <= tx_packets /sec
>  Ethtool(mlx5p2) stat:  349596 (    349,596) <= tx_packets_phy /sec
>  [...]
>  Ethtool(mlx5p2) stat:   36898 (     36,898) <= rx_cache_busy /sec
>  Ethtool(mlx5p2) stat:   36898 (     36,898) <= rx_cache_full /sec
>  Ethtool(mlx5p2) stat: 4903287 (  4,903,287) <= rx_cache_reuse /sec
>  Ethtool(mlx5p2) stat: 4940185 (  4,940,185) <= rx_csum_complete /sec
>  Ethtool(mlx5p2) stat: 4940185 (  4,940,185) <= rx_packets /sec
> 
> Something is wrong... when I tcpdump on the generator machine, I see
> garbled packets with IPv6 multicast addresses.
> 
> And it looks like I'm only sending 349,596 tx_packets_phy/sec on the "wire".
> 

Not seeing packets on the TX wire was caused by the NIC HW dropping the
packets, because the Ethernet MAC addresses were not changed/swapped.

Fixed this XDP_TX bug in my test program xdp_bench01_mem_access_cost.
https://github.com/netoptimizer/prototype-kernel/commit/85f7ba2f0ea2

I also added a new option, --swapmac, to provide another way of testing
packet modification.
https://github.com/netoptimizer/prototype-kernel/commit/fe080e6f3ccf
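
For reference, the core of the fix is the usual MAC swap before XDP_TX
-- essentially the same helper the samples/bpf xdp2 program uses:

static __always_inline void swap_src_dst_mac(void *data)
{
	/* Swap Ethernet source and destination MAC addresses in place,
	 * treating the 12 address bytes as six 16-bit words; without
	 * this the NIC HW silently dropped my XDP_TX frames.
	 */
	unsigned short *p = data;
	unsigned short dst[3];

	dst[0] = p[0]; dst[1] = p[1]; dst[2] = p[2];
	p[0] = p[3];   p[1] = p[4];   p[2] = p[5];
	p[3] = dst[0]; p[4] = dst[1]; p[5] = dst[2];
}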

I will shortly publish a full report of testing this patch.
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [PATCH v4 net-next RFC] net: Generic XDP

2017-04-20 Thread Andy Gospodarek
On Wed, Apr 19, 2017 at 09:40:49PM -0400, David Miller wrote:
> From: Andy Gospodarek 
> Date: Wed, 19 Apr 2017 10:29:03 -0400
> 
> > So I tried a variety of things, and the simplest change on top of yours
> > that works well for xdp1, xdp2, and xdp_tx_iptunnel is the following:
> > 
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index b3d3a6e..1bab3dc 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -4316,11 +4316,11 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
> >  
> > off = xdp.data - orig_data;
> > if (off)
> > -   __skb_push(skb, off);
> > +   __skb_push(skb, -off);
> 
> We have to handle both pushing and popping headers, so could you
> please test the snippet I asked you to try?
> 

I will tomorrow or by Monday of next week.

I'm also going to hack^W write a quick test app to exercise it as well.


> > if (off > 0)
> > __skb_pull(skb, off);
> > else if (off < 0)
> > __skb_push(skb, -off);
> 
> Thanks.


Re: [PATCH v4 net-next RFC] net: Generic XDP

2017-04-20 Thread Jesper Dangaard Brouer
On Wed, 19 Apr 2017 10:29:03 -0400
Andy Gospodarek  wrote:

> I ran this on top of a card that uses the bnxt_en driver on a desktop
> class system with an i7-6700 CPU @ 3.40GHz, sending a single stream of
> UDP traffic with flow control disabled and saw the following (all stats
> in Million PPS).
> 
>                 xdp1    xdp2                xdp_tx_tunnel
> Generic XDP      7.8     5.5 (1.3 actual)    4.6 (1.1 actual)
> Optimized XDP   11.7     9.7                 4.6
> 
> One thing to note is that the Generic XDP case shows different results
> as reported by the application vs actual (seen on the wire).  I
> did not debug where the drops are happening and what counter needs to be
> incremented to note this -- I'll add that to my TODO list.  The
> Optimized XDP case does not have a difference in reported vs actual
> frames on the wire.

The reported application vs actual (seen on the wire) numbers sound scary.
How do you evaluate/measure "seen on the wire"?

Perhaps you could use ethtool -S stats to see if anything is fishy?
I recommend using my tool[1] like:

 ~/git/network-testing/bin/ethtool_stats.pl --dev mlx5p2 --sec 2

[1] 
https://github.com/netoptimizer/network-testing/blob/master/bin/ethtool_stats.pl

I'm evaluating this patch on a mlx5 NIC, and something is not right...
I'm seeing:

 Ethtool(mlx5p2) stat:  349599 (    349,599) <= tx_multicast_phy /sec
 Ethtool(mlx5p2) stat: 4940185 (  4,940,185) <= tx_packets /sec
 Ethtool(mlx5p2) stat:  349596 (    349,596) <= tx_packets_phy /sec
 [...]
 Ethtool(mlx5p2) stat:   36898 (     36,898) <= rx_cache_busy /sec
 Ethtool(mlx5p2) stat:   36898 (     36,898) <= rx_cache_full /sec
 Ethtool(mlx5p2) stat: 4903287 (  4,903,287) <= rx_cache_reuse /sec
 Ethtool(mlx5p2) stat: 4940185 (  4,940,185) <= rx_csum_complete /sec
 Ethtool(mlx5p2) stat: 4940185 (  4,940,185) <= rx_packets /sec

Something is wrong... when I tcpdump on the generator machine, I see
garbled packets with IPv6 multicast addresses.

And it looks like I'm only sending 349,596 tx_packets_phy/sec on the "wire".

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [PATCH v4 net-next RFC] net: Generic XDP

2017-04-19 Thread David Miller
From: Andy Gospodarek 
Date: Wed, 19 Apr 2017 10:29:03 -0400

> So I tried a variety of things, and the simplest change on top of yours
> that works well for xdp1, xdp2, and xdp_tx_iptunnel is the following:
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index b3d3a6e..1bab3dc 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4316,11 +4316,11 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
>  
>   off = xdp.data - orig_data;
>   if (off)
> - __skb_push(skb, off);
> + __skb_push(skb, -off);

We have to handle both pushing and popping headers, so could you
please test the snippet I asked you to try?

>   if (off > 0)
>   __skb_pull(skb, off);
>   else if (off < 0)
>   __skb_push(skb, -off);

Thanks.


Re: [PATCH v4 net-next RFC] net: Generic XDP

2017-04-19 Thread Alexei Starovoitov
On Wed, Apr 19, 2017 at 04:25:43PM -0400, Andy Gospodarek wrote:
> On Wed, Apr 19, 2017 at 10:44:59AM -0700, John Fastabend wrote:
> > On 17-04-19 10:17 AM, Alexei Starovoitov wrote:
> > > On Wed, Apr 19, 2017 at 10:29:03AM -0400, Andy Gospodarek wrote:
> > >>
> > >> I ran this on top of a card that uses the bnxt_en driver on a desktop
> > >> class system with an i7-6700 CPU @ 3.40GHz, sending a single stream of
> > >> UDP traffic with flow control disabled and saw the following (all stats
> > >> in Million PPS).
> > >>
> > >>                 xdp1    xdp2                xdp_tx_tunnel
> > >> Generic XDP      7.8     5.5 (1.3 actual)    4.6 (1.1 actual)
> > >> Optimized XDP   11.7     9.7                 4.6
> > > 
> > > Nice! Thanks for testing.
> > > 
> > >> One thing to note is that the Generic XDP case shows different results
> > >> as reported by the application vs actual (seen on the wire).  I
> > >> did not debug where the drops are happening and what counter needs to be
> > >> incremented to note this -- I'll add that to my TODO list.  The
> > >> Optimized XDP case does not have a difference in reported vs actual
> > >> frames on the wire.
> > > 
> > > The missed packets are probably due to xmit queue being full.
> > > We need 'xdp_tx_full' counter in:
> > > +   if (free_skb) {
> > > +   trace_xdp_exception(dev, xdp_prog, XDP_TX);
> > > +   kfree_skb(skb);
> > > +   }
> > > like in-driver xdp does.
> > > It's surprising that tx becomes full so often.  Maybe bnxt-specific
> > > behavior?
> > 
> > Hmm, as a data point I get better numbers than 1.3Mpps running through
> > the qdisc layer with pktgen, so perhaps something is wrong with the
> > driver? If
> 
> I get ~6.5Mpps on a single core with pktgen, so this is inconclusive for now.

Maybe your tx queue is simply smaller than your rx queue?
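
That is easy to check with standard ethtool (generic commands, the
device name is just an example):

 $ ethtool -g eth0            # show current and maximum RX/TX ring sizes
 $ ethtool -G eth0 tx 2048    # grow the TX ring, if the hardware allows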



Re: [PATCH v4 net-next RFC] net: Generic XDP

2017-04-19 Thread Andy Gospodarek
On Wed, Apr 19, 2017 at 10:44:59AM -0700, John Fastabend wrote:
> On 17-04-19 10:17 AM, Alexei Starovoitov wrote:
> > On Wed, Apr 19, 2017 at 10:29:03AM -0400, Andy Gospodarek wrote:
> >>
> >> I ran this on top of a card that uses the bnxt_en driver on a desktop
> >> class system with an i7-6700 CPU @ 3.40GHz, sending a single stream of
> >> UDP traffic with flow control disabled and saw the following (all stats
> >> in Million PPS).
> >>
> >>                 xdp1    xdp2                xdp_tx_tunnel
> >> Generic XDP      7.8     5.5 (1.3 actual)    4.6 (1.1 actual)
> >> Optimized XDP   11.7     9.7                 4.6
> > 
> > Nice! Thanks for testing.
> > 
> >> One thing to note is that the Generic XDP case shows different results
> >> as reported by the application vs actual (seen on the wire).  I
> >> did not debug where the drops are happening and what counter needs to be
> >> incremented to note this -- I'll add that to my TODO list.  The
> >> Optimized XDP case does not have a difference in reported vs actual
> >> frames on the wire.
> > 
> > The missed packets are probably due to xmit queue being full.
> > We need 'xdp_tx_full' counter in:
> > +   if (free_skb) {
> > +   trace_xdp_exception(dev, xdp_prog, XDP_TX);
> > +   kfree_skb(skb);
> > +   }
> > like in-driver xdp does.
> > It's surprising that tx becomes full so often.  Maybe bnxt-specific
> > behavior?
> 
> Hmm, as a data point I get better numbers than 1.3Mpps running through
> the qdisc layer with pktgen, so perhaps something is wrong with the
> driver? If

I get ~6.5Mpps on a single core with pktgen, so this is inconclusive for now.

> I get a chance I'll take a look with my setup here, although it likely won't
> be until the weekend. I don't think it needs to slow down dropping the RFC
> tag and getting the patch applied, though.
> 
> > 
> >> I agree with all those who have asserted that this is a great tool for
> >> those who want to get started with XDP but do not have hardware, so I'd
> >> say it's ready to have the 'RFC' tag dropped.  Thanks for pushing this
> >> forward, Dave!  :-)
> > 
> > +1
> 


Re: [PATCH v4 net-next RFC] net: Generic XDP

2017-04-19 Thread John Fastabend
On 17-04-19 10:17 AM, Alexei Starovoitov wrote:
> On Wed, Apr 19, 2017 at 10:29:03AM -0400, Andy Gospodarek wrote:
>>
>> I ran this on top of a card that uses the bnxt_en driver on a desktop
>> class system with an i7-6700 CPU @ 3.40GHz, sending a single stream of
>> UDP traffic with flow control disabled and saw the following (all stats
>> in Million PPS).
>>
>>                 xdp1    xdp2                xdp_tx_tunnel
>> Generic XDP      7.8     5.5 (1.3 actual)    4.6 (1.1 actual)
>> Optimized XDP   11.7     9.7                 4.6
> 
> Nice! Thanks for testing.
> 
>> One thing to note is that the Generic XDP case shows different results
>> as reported by the application vs actual (seen on the wire).  I
>> did not debug where the drops are happening and what counter needs to be
>> incremented to note this -- I'll add that to my TODO list.  The
>> Optimized XDP case does not have a difference in reported vs actual
>> frames on the wire.
> 
> The missed packets are probably due to xmit queue being full.
> We need 'xdp_tx_full' counter in:
> +   if (free_skb) {
> +   trace_xdp_exception(dev, xdp_prog, XDP_TX);
> +   kfree_skb(skb);
> +   }
> like in-driver xdp does.
> It's surprising that tx becomes full so often.  Maybe bnxt-specific behavior?

Hmm, as a data point I get better numbers than 1.3Mpps running through the
qdisc layer with pktgen, so perhaps something is wrong with the driver? If
I get a chance I'll take a look with my setup here, although it likely won't
be until the weekend. I don't think it needs to slow down dropping the RFC
tag and getting the patch applied, though.

> 
>> I agree with all those who have asserted that this is a great tool for
>> those who want to get started with XDP but do not have hardware, so I'd
>> say it's ready to have the 'RFC' tag dropped.  Thanks for pushing this
>> forward, Dave!  :-)
> 
> +1



Re: [PATCH v4 net-next RFC] net: Generic XDP

2017-04-19 Thread Alexei Starovoitov
On Wed, Apr 19, 2017 at 10:29:03AM -0400, Andy Gospodarek wrote:
> 
> I ran this on top of a card that uses the bnxt_en driver on a desktop
> class system with an i7-6700 CPU @ 3.40GHz, sending a single stream of
> UDP traffic with flow control disabled and saw the following (all stats
> in Million PPS).
> 
>                 xdp1    xdp2                xdp_tx_tunnel
> Generic XDP      7.8     5.5 (1.3 actual)    4.6 (1.1 actual)
> Optimized XDP   11.7     9.7                 4.6

Nice! Thanks for testing.

> One thing to note is that the Generic XDP case shows different results
> as reported by the application vs actual (seen on the wire).  I
> did not debug where the drops are happening and what counter needs to be
> incremented to note this -- I'll add that to my TODO list.  The
> Optimized XDP case does not have a difference in reported vs actual
> frames on the wire.

The missed packets are probably due to xmit queue being full.
We need 'xdp_tx_full' counter in:
+   if (free_skb) {
+   trace_xdp_exception(dev, xdp_prog, XDP_TX);
+   kfree_skb(skb);
+   }
like in-driver xdp does.
It's surprising that tx becomes full so often.  Maybe bnxt-specific behavior?
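
Something like this, say (a sketch only -- the xdp_tx_full name and
where the counter should live are open questions, the field here is
hypothetical):

+	if (free_skb) {
+		/* hypothetical counter so the reported-vs-wire gap
+		 * shows up in statistics instead of vanishing silently */
+		atomic_long_inc(&dev->xdp_tx_full);
+		trace_xdp_exception(dev, xdp_prog, XDP_TX);
+		kfree_skb(skb);
+	}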

> I agree with all those who have asserted that this is a great tool for
> those who want to get started with XDP but do not have hardware, so I'd
> say it's ready to have the 'RFC' tag dropped.  Thanks for pushing this
> forward, Dave!  :-)

+1
 


Re: [PATCH v4 net-next RFC] net: Generic XDP

2017-04-19 Thread Andy Gospodarek
On Tue, Apr 18, 2017 at 03:29:16PM -0400, David Miller wrote:
> From: David Miller 
> Date: Tue, 18 Apr 2017 15:07:08 -0400 (EDT)
> 
> > From: Andy Gospodarek 
> > Date: Tue, 18 Apr 2017 15:05:35 -0400
> > 
> >> On Fri, Apr 14, 2017 at 05:59:51PM -0700, Alexei Starovoitov wrote:
> >>> On Thu, Apr 13, 2017 at 04:23:15PM -0400, David Miller wrote:
> >>> > +
> >>> > +   switch (act) {
> >>> > +   case XDP_TX:
> >>> > +   __skb_push(skb, skb->mac_len);
> >>> 
> >>> s/skb->mac_len/mac_len/
> >>> 
> >> 
> >> I was away from my keyboard for a few days, but was able to get some
> >> time to test this today.
> >> 
> >> When using this change above suggested by Alexei, XDP_DROP and XDP_TX
> >> actions appear to work well with xdp1 and xdp2.
> >> 
> >> I'm seeing some rather odd behavior with xdp_tx_tunnel so it might be
> >> good to hold off on committing this just yet.
> >> 
> >> At first glance, it looks like there is enough headroom for the new
> >> frame, but the resulting packet data do not look right and I'm actually
> >> seeing some data that may be left on the stack from a previous caller.
> > 
> > Thanks for testing Andy, I'll take a look and see if I can figure it out.
> 
> Andy, I think we might be getting burnt by signedness issues in the
> offset handling when the XDP program adjusts the packet data pointer.
> 
> In netif_receive_generic_xdp(), try changing the offset handling code to
> read something like:
> 
>   off = xdp.data - orig_data;
>   if (off > 0)
>   __skb_pull(skb, off);
>   else if (off < 0)
>   __skb_push(skb, -off);
> 
> If that doesn't work try adding:
> 
>   __skb_cow(skb, XDP_PACKET_HEADROOM, 0);
> 
> right after the skb_linearize() call in that same function.

So I tried a variety of things, and the simplest change on top of yours
that works well for xdp1, xdp2, and xdp_tx_iptunnel is the following:

diff --git a/net/core/dev.c b/net/core/dev.c
index b3d3a6e..1bab3dc 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4316,11 +4316,11 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
 
off = xdp.data - orig_data;
if (off)
-   __skb_push(skb, off);
+   __skb_push(skb, -off);
 
switch (act) {
case XDP_TX:
-   __skb_push(skb, skb->mac_len);
+   __skb_push(skb, mac_len);
/* fall through */
case XDP_PASS:
break;

I ran this on top of a card that uses the bnxt_en driver on a desktop
class system with an i7-6700 CPU @ 3.40GHz, sending a single stream of
UDP traffic with flow control disabled and saw the following (all stats
in Million PPS).

                xdp1    xdp2                xdp_tx_tunnel
Generic XDP      7.8     5.5 (1.3 actual)    4.6 (1.1 actual)
Optimized XDP   11.7     9.7                 4.6

One thing to note is that the Generic XDP case shows different results
as reported by the application vs actual (seen on the wire).  I
did not debug where the drops are happening and what counter needs to be
incremented to note this -- I'll add that to my TODO list.  The
Optimized XDP case does not have a difference in reported vs actual
frames on the wire.

I agree with all those who have asserted that this is a great tool for
those who want to get started with XDP but do not have hardware, so I'd
say it's ready to have the 'RFC' tag dropped.  Thanks for pushing this
forward, Dave!  :-)



Re: [PATCH v4 net-next RFC] net: Generic XDP

2017-04-18 Thread Jesper Dangaard Brouer

On Tue, 18 Apr 2017 15:05:35 -0400 Andy Gospodarek  wrote:

> On Fri, Apr 14, 2017 at 05:59:51PM -0700, Alexei Starovoitov wrote:
> > On Thu, Apr 13, 2017 at 04:23:15PM -0400, David Miller wrote:  
> > > +
> > > + switch (act) {
> > > + case XDP_TX:
> > > + __skb_push(skb, skb->mac_len);  
> > 
> > s/skb->mac_len/mac_len/
> >   
> 
[...]
> 
> When using this change above suggested by Alexei, XDP_DROP and XDP_TX
> actions appear to work well with xdp1 and xdp2.

I also adjusted the patch accordingly.

I ran a few quick tests today, but only against a really old e1000 NIC
attached to a 32-bit PCI bus.  There was no difference between DROP
and PASS, but this is likely due to a NIC HW limit.

The sender was sending 951,146 pps (<= tx_queue_0_packets /sec).

$ sudo ./xdp_bench01_mem_access_cost --readmem --action XDP_DROP --dev e1000
XDP_action   pps        pps-human-readable  mem
XDP_DROP     671975     671,975             read
XDP_DROP     671997     671,997             read
XDP_DROP     672061     672,061             read
XDP_DROP     671861     671,861             read
^CInterrupted: Removing XDP program on ifindex:2 device:e1000

$ sudo ./xdp_bench01_mem_access_cost --readmem --action XDP_PASS --dev e1000
XDP_action   pps        pps-human-readable  mem
XDP_PASS     672032     672,032             read
XDP_PASS     671910     671,910             read
XDP_PASS     671926     671,926             read
XDP_PASS     671947     671,947             read
^CInterrupted: Removing XDP program on ifindex:2 device:e1000

Program xdp_bench01_mem_access_cost is available here:
 https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/samples/bpf

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [PATCH v4 net-next RFC] net: Generic XDP

2017-04-18 Thread Andy Gospodarek
On Tue, Apr 18, 2017 at 03:29:16PM -0400, David Miller wrote:
> From: David Miller 
> Date: Tue, 18 Apr 2017 15:07:08 -0400 (EDT)
> 
> > From: Andy Gospodarek 
> > Date: Tue, 18 Apr 2017 15:05:35 -0400
> > 
> >> On Fri, Apr 14, 2017 at 05:59:51PM -0700, Alexei Starovoitov wrote:
> >>> On Thu, Apr 13, 2017 at 04:23:15PM -0400, David Miller wrote:
> >>> > +
> >>> > +   switch (act) {
> >>> > +   case XDP_TX:
> >>> > +   __skb_push(skb, skb->mac_len);
> >>> 
> >>> s/skb->mac_len/mac_len/
> >>> 
> >> 
> >> I was away from my keyboard for a few days, but was able to get some
> >> time to test this today.
> >> 
> >> When using this change above suggested by Alexei, XDP_DROP and XDP_TX
> >> actions appear to work well with xdp1 and xdp2.
> >> 
> >> I'm seeing some rather odd behavior with xdp_tx_tunnel so it might be
> >> good to hold off on committing this just yet.
> >> 
> >> At first glance, it looks like there is enough headroom for the new
> >> frame, but the resulting packet data do not look right and I'm actually
> >> seeing some data that may be left on the stack from a previous caller.
> > 
> > Thanks for testing Andy, I'll take a look and see if I can figure it out.
> 
> Andy, I think we might be getting burnt by signedness issues in the
> offset handling when the XDP program adjusts the packet data pointer.

I completely agree -- I just noted that the offset would be -20 in the
tx_tunnel case.  It's easy to get confused, since a positive int for
the second arg of skb_pull() does go 'back' with a positive value.  I
was just rebuilding and testing this case with:

-   off = xdp.data - orig_data;
+   /* note that offset is negative */
+   off = orig_data - xdp.data;


> In netif_receive_generic_xdp(), try changing the offset handling code to
> read something like:
> 
>   off = xdp.data - orig_data;
>   if (off > 0)
>   __skb_pull(skb, off);
>   else if (off < 0)
>   __skb_push(skb, -off);

This should do it.  I'll give that a run, too.

> If that doesn't work try adding:
> 
>   __skb_cow(skb, XDP_PACKET_HEADROOM, 0);
> 
> right after the skb_linearize() call in that same function.




Re: [PATCH v4 net-next RFC] net: Generic XDP

2017-04-18 Thread David Miller
From: David Miller 
Date: Tue, 18 Apr 2017 15:07:08 -0400 (EDT)

> From: Andy Gospodarek 
> Date: Tue, 18 Apr 2017 15:05:35 -0400
> 
>> On Fri, Apr 14, 2017 at 05:59:51PM -0700, Alexei Starovoitov wrote:
>>> On Thu, Apr 13, 2017 at 04:23:15PM -0400, David Miller wrote:
>>> > +
>>> > + switch (act) {
>>> > + case XDP_TX:
>>> > + __skb_push(skb, skb->mac_len);
>>> 
>>> s/skb->mac_len/mac_len/
>>> 
>> 
>> I was away from my keyboard for a few days, but was able to get some
>> time to test this today.
>> 
>> When using this change above suggested by Alexei, XDP_DROP and XDP_TX
>> actions appear to work well with xdp1 and xdp2.
>> 
>> I'm seeing some rather odd behavior with xdp_tx_tunnel so it might be
>> good to hold off on committing this just yet.
>> 
>> At first glance, it looks like there is enough headroom for the new
>> frame, but the resulting packet data do not look right and I'm actually
>> seeing some data that may be left on the stack from a previous caller.
> 
> Thanks for testing Andy, I'll take a look and see if I can figure it out.

Andy, I think we might be getting burnt by signedness issues in the
offset handling when the XDP program adjusts the packet data pointer.

In netif_receive_generic_xdp(), try changing the offset handling code to
read something like:

off = xdp.data - orig_data;
if (off > 0)
__skb_pull(skb, off);
else if (off < 0)
__skb_push(skb, -off);

If that doesn't work try adding:

__skb_cow(skb, XDP_PACKET_HEADROOM, 0);

right after the skb_linearize() call in that same function.


Re: [PATCH v4 net-next RFC] net: Generic XDP

2017-04-18 Thread David Miller
From: Andy Gospodarek 
Date: Tue, 18 Apr 2017 15:05:35 -0400

> On Fri, Apr 14, 2017 at 05:59:51PM -0700, Alexei Starovoitov wrote:
>> On Thu, Apr 13, 2017 at 04:23:15PM -0400, David Miller wrote:
>> > +
>> > +  switch (act) {
>> > +  case XDP_TX:
>> > +  __skb_push(skb, skb->mac_len);
>> 
>> s/skb->mac_len/mac_len/
>> 
> 
> I was away from my keyboard for a few days, but was able to get some
> time to test this today.
> 
> When using this change above suggested by Alexei, XDP_DROP and XDP_TX
> actions appear to work well with xdp1 and xdp2.
> 
> I'm seeing some rather odd behavior with xdp_tx_tunnel so it might be
> good to hold off on committing this just yet.
> 
> At first glance, it looks like there is enough headroom for the new
> frame, but the resulting packet data do not look right and I'm actually
> seeing some data that may be left on the stack from a previous caller.

Thanks for testing Andy, I'll take a look and see if I can figure it out.


Re: [PATCH v4 net-next RFC] net: Generic XDP

2017-04-18 Thread Andy Gospodarek
On Fri, Apr 14, 2017 at 05:59:51PM -0700, Alexei Starovoitov wrote:
> On Thu, Apr 13, 2017 at 04:23:15PM -0400, David Miller wrote:
> > +
> > +   switch (act) {
> > +   case XDP_TX:
> > +   __skb_push(skb, skb->mac_len);
> 
> s/skb->mac_len/mac_len/
> 

I was away from my keyboard for a few days, but was able to get some
time to test this today.

When using this change above suggested by Alexei, XDP_DROP and XDP_TX
actions appear to work well with xdp1 and xdp2.

I'm seeing some rather odd behavior with xdp_tx_tunnel so it might be
good to hold off on committing this just yet.

At first glance, it looks like there is enough headroom for the new
frame, but the resulting packet data do not look right and I'm actually
seeing some data that may be left on the stack from a previous caller.

> > +   HARD_TX_UNLOCK(dev, txq);
> > +   if (free_skb) {
> > +   trace_xdp_exception(dev, xdp_prog, XDP_TX);
> > +   kfree_skb(skb);
> 
> nice that you didn't forget to add trace_xdp_exception in this path :)
> 
> Overall looks good to me, and other than the minor nit in tx, I think it
> should work for programs already used with in-driver xdp.
> I'll test it next week unless people beat me to it.
> 


Re: [PATCH v4 net-next RFC] net: Generic XDP

2017-04-15 Thread David Ahern
On 4/14/17 6:59 PM, Alexei Starovoitov wrote:
> I'll test it next week unless people beat me to it.
> 

I have run the xdp1 example from samples/bpf.

I have a patch that modifies set_link_xdp_fd to take a flags argument
which it adds to the request as the IFLA_XDP_FLAGS attribute. That's
used by callers to set XDP_FLAGS_SKB_MODE.
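
In outline, the attribute handling looks something like this (a sketch
modeled on the samples/bpf netlink code; socket setup and ack parsing
are elided):

#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>
#include <string.h>
#include <sys/socket.h>

static int set_link_xdp_fd(int sock, int ifindex, int fd, __u32 flags)
{
	struct {
		struct nlmsghdr  nh;
		struct ifinfomsg ifinfo;
		char             attrbuf[64];
	} req;
	struct nlattr *nla, *nla_xdp;

	memset(&req, 0, sizeof(req));
	req.nh.nlmsg_len   = NLMSG_LENGTH(sizeof(struct ifinfomsg));
	req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
	req.nh.nlmsg_type  = RTM_SETLINK;
	req.ifinfo.ifi_family = AF_UNSPEC;
	req.ifinfo.ifi_index  = ifindex;

	/* nested IFLA_XDP attribute */
	nla = (struct nlattr *)(((char *)&req) + NLMSG_ALIGN(req.nh.nlmsg_len));
	nla->nla_type = NLA_F_NESTED | IFLA_XDP;
	nla->nla_len  = NLA_HDRLEN;

	/* IFLA_XDP_FD: the BPF program file descriptor */
	nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
	nla_xdp->nla_type = IFLA_XDP_FD;
	nla_xdp->nla_len  = NLA_HDRLEN + sizeof(fd);
	memcpy((char *)nla_xdp + NLA_HDRLEN, &fd, sizeof(fd));
	nla->nla_len += nla_xdp->nla_len;

	/* IFLA_XDP_FLAGS: e.g. XDP_FLAGS_SKB_MODE to force generic XDP */
	if (flags) {
		nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
		nla_xdp->nla_type = IFLA_XDP_FLAGS;
		nla_xdp->nla_len  = NLA_HDRLEN + sizeof(flags);
		memcpy((char *)nla_xdp + NLA_HDRLEN, &flags, sizeof(flags));
		nla->nla_len += nla_xdp->nla_len;
	}

	req.nh.nlmsg_len += NLA_ALIGN(nla->nla_len);
	return send(sock, &req, req.nh.nlmsg_len, 0) < 0 ? -1 : 0;
}

(With later iproute2 releases the same thing can be requested from the
command line, e.g. "ip link set dev eth0 xdpgeneric obj prog.o" to
force SKB mode.)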


Re: [PATCH v4 net-next RFC] net: Generic XDP

2017-04-14 Thread Alexei Starovoitov
On Thu, Apr 13, 2017 at 04:23:15PM -0400, David Miller wrote:
> +
> + switch (act) {
> + case XDP_TX:
> + __skb_push(skb, skb->mac_len);

s/skb->mac_len/mac_len/

> + HARD_TX_UNLOCK(dev, txq);
> + if (free_skb) {
> + trace_xdp_exception(dev, xdp_prog, XDP_TX);
> + kfree_skb(skb);

nice that you didn't forget to add trace_xdp_exception in this path :)

Overall looks good to me, and other than the minor nit in tx, I think it
should work for programs already used with in-driver xdp.
I'll test it next week unless people beat me to it.



Re: [PATCH v4 net-next RFC] net: Generic XDP

2017-04-14 Thread William Tu
On Thu, Apr 13, 2017 at 9:09 AM, David Miller  wrote:
>
[snip]


> +static u32 netif_receive_generic_xdp(struct sk_buff *skb,
> +struct bpf_prog *xdp_prog)
> +{
> +   struct xdp_buff xdp;
> +   u32 act = XDP_DROP;
> +   void *orig_data;
> +   int hlen, off;
> +
> +   if (skb_linearize(skb))
> +   goto do_drop;
> +
> +   /* The XDP program wants to see the packet starting at the MAC
> +    * header.
> +    */
> +   hlen = skb_headlen(skb) + skb->mac_len;
> +   xdp.data = skb->data - skb->mac_len;
> +   xdp.data_end = xdp.data + hlen;
> +   xdp.data_hard_start = xdp.data - skb_headroom(skb);
> +   orig_data = xdp.data;
> +
> +   act = bpf_prog_run_xdp(xdp_prog, &xdp);
> +
> +   off = xdp.data - orig_data;

should we do "orig_data - xdp.data" instead?
The 'off' might be < 0 when pushing a new header.

> +   if (off)
> +   __skb_push(skb, off);

When doing encapsulation by calling xdp_adjust_head(skb, offset), the
offset is < 0 in order to make extra room in the front; so xdp.data <
orig_data and off < 0.

But if we are decapsulating a protocol, then off > 0, and maybe we
should call __skb_pull()?
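
For concreteness, a minimal encap-style program illustrating the sign
convention (bounds checks and the actual header writing are elided):

#include <linux/bpf.h>
#include <linux/ip.h>
#include "bpf_helpers.h"	/* SEC() etc., as in samples/bpf */

SEC("xdp")
int xdp_encap(struct xdp_md *ctx)
{
	/* a negative delta grows headroom: xdp.data moves down, so the
	 * stack later sees off = xdp.data - orig_data < 0 and must
	 * __skb_push(skb, -off) */
	if (bpf_xdp_adjust_head(ctx, -(int)sizeof(struct iphdr)))
		return XDP_DROP;
	/* ... bounds-check and write the outer header at ctx->data ... */
	return XDP_TX;
}

char _license[] SEC("license") = "GPL";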

Thanks,
William


Re: [PATCH v4 net-next RFC] net: Generic XDP

2017-04-13 Thread David Miller
From: Michael Chan 
Date: Thu, 13 Apr 2017 13:16:43 -0700

> On Thu, Apr 13, 2017 at 9:09 AM, David Miller  wrote:
>>
>> ---
>>
>> v4:
>>  - Fix MAC header adjustment before calling prog (David Ahern)
>>  - Disable LRO when generic XDP is installed (Michael Chan)
> 
> I don't see where you are disabling LRO in the patch.

Ugh, I posted the wrong patch, here is the correct one.  Sorry about that:


Subject: [PATCH] net: Generic XDP

This provides a generic SKB based non-optimized XDP path which is used
if either the driver lacks a specific XDP implementation, or the user
requests it via a new IFLA_XDP_FLAGS value named XDP_FLAGS_SKB_MODE.

It is arguable that perhaps I should have required something like
this as part of the initial XDP feature merge.

I believe this is critical for two reasons:

1) Accessibility.  More people can play with XDP with less
   dependencies.  Yes I know we have XDP support in virtio_net, but
   that just creates another dependency for learning how to use this
   facility.

   I wrote this to make life easier for the XDP newbies.

2) As a model for what the expected semantics are.  If there is a pure
   generic core implementation, it serves as a semantic example for
   driver folks adding XDP support.

This is just a rough draft and is untested.

One thing I have not tried to address here is the issue of
XDP_PACKET_HEADROOM, thanks to Daniel for spotting that.  It seems
incredibly expensive to do a skb_cow(skb, XDP_PACKET_HEADROOM) or
whatever even if the XDP program doesn't try to push headers at all.
I think we really need the verifier to somehow propagate whether
certain XDP helpers are used or not.

Signed-off-by: David S. Miller 
---

v4:
 - Fix MAC header adjustment before calling prog (David Ahern)
 - Disable LRO when generic XDP is installed (Michael Chan)
 - Bypass qdisc et al. on XDP_TX and record the event (Alexei)
 - Do not perform generic XDP on reinjected packets (DaveM)

v3:
 - Make sure XDP program sees packet at MAC header, push back MAC
   header if we do XDP_TX.  (Alexei)
 - Elide GRO when generic XDP is in use.  (Alexei)
 - Add XDP_FLAG_SKB_MODE flag which the user can use to request generic
   XDP even if the driver has an XDP implementation.  (Alexei)
 - Report whether SKB mode is in use in rtnl_xdp_fill() via XDP_FLAGS
   attribute.  (Daniel)

v2:
 - Add some "fall through" comments in switch statements based
   upon feedback from Andrew Lunn
 - Use RCU for generic xdp_prog, thanks to Johannes Berg.
---
 include/linux/netdevice.h|   8 +++
 include/uapi/linux/if_link.h |   4 +-
 net/core/dev.c   | 153 +--
 net/core/gro_cells.c |   2 +-
 net/core/rtnetlink.c |  40 ++-
 5 files changed, 185 insertions(+), 22 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index b0aa089..071a58b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1891,9 +1891,17 @@ struct net_device {
struct lock_class_key   *qdisc_tx_busylock;
struct lock_class_key   *qdisc_running_key;
boolproto_down;
+   struct bpf_prog __rcu   *xdp_prog;
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
+static inline bool netif_elide_gro(const struct net_device *dev)
+{
+   if (!(dev->features & NETIF_F_GRO) || dev->xdp_prog)
+   return true;
+   return false;
+}
+
 #defineNETDEV_ALIGN32
 
 static inline
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 8b405af..633aa02 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -887,7 +887,9 @@ enum {
 /* XDP section */
 
 #define XDP_FLAGS_UPDATE_IF_NOEXIST(1U << 0)
-#define XDP_FLAGS_MASK (XDP_FLAGS_UPDATE_IF_NOEXIST)
+#define XDP_FLAGS_SKB_MODE (2U << 0)
+#define XDP_FLAGS_MASK (XDP_FLAGS_UPDATE_IF_NOEXIST | \
+                        XDP_FLAGS_SKB_MODE)
 
 enum {
IFLA_XDP_UNSPEC,
diff --git a/net/core/dev.c b/net/core/dev.c
index ef9fe60e..b3d3a6e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -95,6 +95,7 @@
 #include 
 #include 
 #include 
+#include <linux/bpf.h>
 #include 
 #include 
 #include 
@@ -4247,6 +4248,123 @@ static int __netif_receive_skb(struct sk_buff *skb)
return ret;
 }
 
+static struct static_key generic_xdp_needed __read_mostly;
+
+static int generic_xdp_install(struct net_device *dev, struct netdev_xdp *xdp)
+{
+   struct bpf_prog *new = xdp->prog;
+   int ret = 0;
+
+   switch (xdp->command) {
+   case XDP_SETUP_PROG: {
+   struct bpf_prog *old = rtnl_dereference(dev->xdp_prog);
+
+   rcu_assign_pointer(dev->xdp_prog, new);
+   if (old)
+   bpf_prog_put(old);
+
+   if (old && !new) {
+

Re: [PATCH v4 net-next RFC] net: Generic XDP

2017-04-13 Thread Michael Chan
On Thu, Apr 13, 2017 at 9:09 AM, David Miller  wrote:
>
> ---
>
> v4:
>  - Fix MAC header adjustment before calling prog (David Ahern)
>  - Disable LRO when generic XDP is installed (Michael Chan)

I don't see where you are disabling LRO in the patch.

>  - Bypass qdisc et al. on XDP_TX and record the event (Alexei)
>  - Do not perform generic XDP on reinjected packets (DaveM)
>


[PATCH v4 net-next RFC] net: Generic XDP

2017-04-13 Thread David Miller

This provides a generic SKB based non-optimized XDP path which is used
if either the driver lacks a specific XDP implementation, or the user
requests it via a new IFLA_XDP_FLAGS value named XDP_FLAGS_SKB_MODE.

It is arguable that perhaps I should have required something like
this as part of the initial XDP feature merge.

I believe this is critical for two reasons:

1) Accessibility.  More people can play with XDP with less
   dependencies.  Yes I know we have XDP support in virtio_net, but
   that just creates another dependency for learning how to use this
   facility.

   I wrote this to make life easier for the XDP newbies.

2) As a model for what the expected semantics are.  If there is a pure
   generic core implementation, it serves as a semantic example for
   driver folks adding XDP support.

This is just a rough draft and is untested.

One thing I have not tried to address here is the issue of
XDP_PACKET_HEADROOM, thanks to Daniel for spotting that.  It seems
incredibly expensive to do a skb_cow(skb, XDP_PACKET_HEADROOM) or
whatever even if the XDP program doesn't try to push headers at all.
I think we really need the verifier to somehow propagate whether
certain XDP helpers are used or not.

Signed-off-by: David S. Miller 
---

v4:
 - Fix MAC header adjustment before calling prog (David Ahern)
 - Disable LRO when generic XDP is installed (Michael Chan)
 - Bypass qdisc et al. on XDP_TX and record the event (Alexei)
 - Do not perform generic XDP on reinjected packets (DaveM)

v3:
 - Make sure XDP program sees packet at MAC header, push back MAC
   header if we do XDP_TX.  (Alexei)
 - Elide GRO when generic XDP is in use.  (Alexei)
 - Add XDP_FLAG_SKB_MODE flag which the user can use to request generic
   XDP even if the driver has an XDP implementation.  (Alexei)
 - Report whether SKB mode is in use in rtnl_xdp_fill() via XDP_FLAGS
   attribute.  (Daniel)

v2:
 - Add some "fall through" comments in switch statements based
   upon feedback from Andrew Lunn
 - Use RCU for generic xdp_prog, thanks to Johannes Berg.

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index b0aa089..071a58b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1891,9 +1891,17 @@ struct net_device {
struct lock_class_key   *qdisc_tx_busylock;
struct lock_class_key   *qdisc_running_key;
boolproto_down;
+   struct bpf_prog __rcu   *xdp_prog;
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
+static inline bool netif_elide_gro(const struct net_device *dev)
+{
+   if (!(dev->features & NETIF_F_GRO) || dev->xdp_prog)
+   return true;
+   return false;
+}
+
 #defineNETDEV_ALIGN32
 
 static inline
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 8b405af..633aa02 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -887,7 +887,9 @@ enum {
 /* XDP section */
 
 #define XDP_FLAGS_UPDATE_IF_NOEXIST(1U << 0)
-#define XDP_FLAGS_MASK (XDP_FLAGS_UPDATE_IF_NOEXIST)
+#define XDP_FLAGS_SKB_MODE (2U << 0)
+#define XDP_FLAGS_MASK (XDP_FLAGS_UPDATE_IF_NOEXIST | \
+                        XDP_FLAGS_SKB_MODE)
 
 enum {
IFLA_XDP_UNSPEC,
diff --git a/net/core/dev.c b/net/core/dev.c
index ef9fe60e..9ed4569 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -95,6 +95,7 @@
 #include 
 #include 
 #include 
+#include <linux/bpf.h>
 #include 
 #include 
 #include 
@@ -4247,6 +4248,88 @@ static int __netif_receive_skb(struct sk_buff *skb)
return ret;
 }
 
+static struct static_key generic_xdp_needed __read_mostly;
+
+static int generic_xdp_install(struct net_device *dev, struct netdev_xdp *xdp)
+{
+   struct bpf_prog *new = xdp->prog;
+   int ret = 0;
+
+   switch (xdp->command) {
+   case XDP_SETUP_PROG: {
+   struct bpf_prog *old = rtnl_dereference(dev->xdp_prog);
+
+   rcu_assign_pointer(dev->xdp_prog, new);
+   if (old)
+   bpf_prog_put(old);
+
+   if (old && !new)
+   static_key_slow_dec(&generic_xdp_needed);
+   else if (new && !old)
+   static_key_slow_inc(&generic_xdp_needed);
+   break;
+   }
+
+   case XDP_QUERY_PROG:
+   xdp->prog_attached = !!rcu_access_pointer(dev->xdp_prog);
+   break;
+
+   default:
+   ret = -EINVAL;
+   break;
+   }
+
+   return ret;
+}
+
+static u32 netif_receive_generic_xdp(struct sk_buff *skb,
+struct bpf_prog *xdp_prog)
+{
+   struct xdp_buff xdp;
+   u32 act = XDP_DROP;
+   void *orig_data;
+   int hlen, off;
+
+   if (skb_linearize(skb))
+   goto do_drop;
+
+   /* The XDP program wants to see the packet starting at the MAC
+    * header.
+    */
+