Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-25 Thread Jamal Hadi Salim

On 16-09-23 10:14 AM, Tom Herbert wrote:

On Fri, Sep 23, 2016 at 4:13 AM, Jamal Hadi Salim  wrote:

On 16-09-20 06:00 PM, Tom Herbert wrote:



Tom,
perused the thread and it seems you are serious ;->
Are we heading towards Frankenstein Avenue?


I'm surprised. I thought you'd like democratizing an interface. :-)



I do;-> challenge here is the worry about the classical camel's nose
metaphor[1]. The original intent was for simplicity. Do one very
simple thing. Do it fast.
Note: this is code that is sitting in a critical driver path

[Democratizing the interfaces is a bigger scope discussion we can have
not just for XDP but tc, netfilter etc.]


The whole point behind letting in XDP is so that _small programs_
can be written to do some quick thing. eBPF 4K program limit was
touted as the check and bound. eBPF sounded fine.
This sounds like a huge contradiction.


Jamal, thanks for the feedback.

IMO, XDP != BPF. XDP is an interface in drivers that allows raw packet
processing in the receive path. This is akin to DPDK in the kernel.


Understood.

Most of these DPDK apps do simple things. Ex: show ILA forwarding with
high throughput on a single core - which can be achieved with XDP as is
right now. With DPDK you start doing more generic things and shit starts
slowing down. We can have the best of both worlds in Linux (with 2-3
other modular interfaces above XDP).
Getting both {performance, flexibility} is hard. Add a third dimension
of usability and everything starts spiralling on the z-axis.
i.e. pick one and a quarter of those 3 features; anything else is
an art project. Example: if you start adding flexibility we will no doubt
need a northbound control interface, etc.

So I see XDP as this "fast and specific blob" and things above are more
generic.
Our biggest issue, as the netmap and other folks have shown,
is the obesity of the skb. On XDP, I would bet things
get faster because there is no skb - the danger is turning it into
"move your stack to XDP to avoid skbs"

> BPF is a critical use case of the interface.

As for the 4K limit, that still allows for a lot of damage and abuse
by a user-loaded program and, yes, code in the
kernel can do significant damage as well.


But we have at least a handwavy insurance limit.
A determined person could circumvent the limit - but it is more a
deterrent than enforcement.



However, unlike BPF programs
being set from userspace, kernel code has a well-established process
for acceptance: any suggested code gets reviewed, and if it's going down
"Frankenstein Avenue" I'd expect it to be rejected. I get a little
worried that we are going to start artificially limiting what we can
do in the kernel on the basis that we don't trust kernel programmers
to be competent enough not to abuse kernel interfaces-- the fact that
we are enabling userspace to arbitrarily program the kernel but
purposely limiting what the kernel itself can do would be a huge irony
to me.


My concern is not incompetence. We already allow people to shoot
off their little finger if they want to. And just to be clear, I am
not concerned with allowing other people to save the world their
way ;->

So let's say we said eBPF was one use case:
if you wrote an XDP "app" in plain C, could you still enforce this
limit so bigcorp doesn't start building islands around Linux?
Again, the camel metaphor applies.

Note, comparison with kernel modules at tc or netfilter hooks doesn't
apply because:
a) it is a lot harder to avoid kernel interfaces (skbs etc) in those
places. XDP for example totally avoids the skb. So contributing to the
Linux base is a given with kernel modules.
b) Programs that are usable as kernel modules tend to be geared towards
inclusion into the kernel because of the complexity. It is more
maintenance to keep them off tree.
XDP is such a simple interface that the opportunity to move a whole
DPDK-like industry into the kernel is huge (to the detriment of Linux
kernel networking).

cheers,
jamal


Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-25 Thread Jamal Hadi Salim

On 16-09-23 09:00 AM, Jesper Dangaard Brouer wrote:

On Fri, 23 Sep 2016 07:13:30 -0400
Jamal Hadi Salim  wrote:



[..]


Tom,
perused the thread and it seems you are serious ;->
Are we heading towards Frankenstein Avenue?
The whole point behind letting in XDP is so that _small programs_
can be written to do some quick thing. eBPF 4K program limit was
touted as the check and bound. eBPF sounded fine.
This sounds like a huge contradiction.

cheers,
jamal


Hi Jamal,

I don't understand why you think this is so controversial. The way I
see it (after reading the thread): This is about allowing kernel
components to _also_ use the XDP hook.



The initial push was for XDP to support very small programs that did
very simple things fast (to be extreme: for example, running a whole
network stack is a no-no). eBPF with the 4K program limit was pointed
to as the limit.
What Tom is presenting implies this constraint is now being
removed. That's the controversy.

cheers,
jamal



Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-23 Thread Alexei Starovoitov

On 9/23/16 6:00 AM, Jesper Dangaard Brouer wrote:

To support Tom's case, with eBPF, I think we would have to implement
specific eBPF helper functions that can do the ILA table lookups.  And
then need a userspace component to load the eBPF program.  Why add all
this, when we could contain all this in the kernel, and simply call
this as native C-code via the XDP hook?


well, the ILA router was written in BPF already :) Once for tc+clsact
and once for XDP. In both cases no extra bpf helpers were needed.
Due to this use case we added an optimization to be able to do
a map lookup with packet data directly (instead of copying packet
bytes to the stack and pointing the map_lookup helper at the stack).
Looks like those ila router patches never made it to the list.
I'll dig them out and forward.
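The optimization Alexei refers to (keying a map lookup directly on packet
bytes) looks roughly like the following in an XDP BPF program. This is a
hedged sketch in the samples/bpf style, not code from the ILA patches; the
map name ila_map and its value layout are illustrative assumptions.

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ipv6.h>
#include "bpf_helpers.h"	/* SEC(), bpf_map_lookup_elem(), struct bpf_map_def */

struct bpf_map_def SEC("maps") ila_map = {
	.type = BPF_MAP_TYPE_HASH,
	.key_size = sizeof(struct in6_addr),
	.value_size = sizeof(__u64),
	.max_entries = 1024,
};

SEC("xdp")
int ila_xdp_lookup(struct xdp_md *ctx)
{
	void *data = (void *)(long)ctx->data;
	void *data_end = (void *)(long)ctx->data_end;
	struct ethhdr *eth = data;
	struct ipv6hdr *ip6h = (struct ipv6hdr *)(eth + 1);
	__u64 *new_locator;

	/* bounds check; a real program would also check eth->h_proto */
	if ((void *)(ip6h + 1) > data_end)
		return XDP_PASS;

	/* the key points straight at the in-packet destination address;
	 * no copy of the 16 address bytes onto the BPF stack is needed
	 */
	new_locator = bpf_map_lookup_elem(&ila_map, &ip6h->daddr);
	if (!new_locator)
		return XDP_PASS;

	/* ... rewrite the locator (upper 64 bits of daddr), adjust csum ... */
	return XDP_TX;
}

char _license[] SEC("license") = "GPL";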



Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-23 Thread Tom Herbert
On Fri, Sep 23, 2016 at 4:13 AM, Jamal Hadi Salim  wrote:
> On 16-09-20 06:00 PM, Tom Herbert wrote:
>>
>> This patch creates an infrastructure for registering and running code at
>> XDP hooks in drivers. This is based on the original XDP/BPF and borrows
>> heavily from the techniques used by netfilter to make generic nfhooks.
>>
>> An XDP hook is defined by struct xdp_hook_ops. This structure contains the
>> ops of an XDP hook. A pointer to this structure is passed into the XDP
>> register function to set up a hook. The XDP register function mallocs
>> its own xdp_hook_ops structure and copies the values from the
>> xdp_hook_ops passed in. The register function also stores the pointer
>> value of the xdp_hook_ops argument; this pointer is used in subsequent
>> calls to XDP to identify the registered hook.
>>
>> The interface is defined in net/xdp.h. This includes the definition of
>> xdp_hook_ops, functions to register and unregister hook ops on a device
>> or individual instances of napi, and xdp_hook_run that is called by
>> drivers to run the hooks.
>>
>
> Tom,
> perused the thread and it seems you are serious ;->
> Are we heading towards Frankenstein Avenue?

I'm surprised. I thought you'd like democratizing an interface. :-)

> The whole point behind letting in XDP is so that _small programs_
> can be written to do some quick thing. eBPF 4K program limit was
> touted as the check and bound. eBPF sounded fine.
> This sounds like a huge contradiction.

Jamal, thanks for the feedback.

IMO, XDP != BPF. XDP is an interface in drivers that allows raw packet
processing in the receive path. This is akin to DPDK in the kernel.
BPF is a critical use case of the interface.

As for the 4K limit, that still allows for a lot of damage and abuse
by a user-loaded program and, yes, code in the
kernel can do significant damage as well. However, unlike BPF programs
being set from userspace, kernel code has a well-established process
for acceptance: any suggested code gets reviewed, and if it's going down
"Frankenstein Avenue" I'd expect it to be rejected. I get a little
worried that we are going to start artificially limiting what we can
do in the kernel on the basis that we don't trust kernel programmers
to be competent enough not to abuse kernel interfaces-- the fact that
we are enabling userspace to arbitrarily program the kernel but
purposely limiting what the kernel itself can do would be a huge irony
to me.

Thanks,
Tom




>
> cheers,
> jamal
>
>


Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-23 Thread Jesper Dangaard Brouer
On Fri, 23 Sep 2016 07:13:30 -0400
Jamal Hadi Salim  wrote:

> On 16-09-20 06:00 PM, Tom Herbert wrote:
> > This patch creates an infrastructure for registering and running code at
> > XDP hooks in drivers. This is based on the original XDP/BPF and borrows
> > heavily from the techniques used by netfilter to make generic nfhooks.
> >
> > An XDP hook is defined by struct xdp_hook_ops. This structure contains the
> > ops of an XDP hook. A pointer to this structure is passed into the XDP
> > register function to set up a hook. The XDP register function mallocs
> > its own xdp_hook_ops structure and copies the values from the
> > xdp_hook_ops passed in. The register function also stores the pointer
> > value of the xdp_hook_ops argument; this pointer is used in subsequent
> > calls to XDP to identify the registered hook.
> >
> > The interface is defined in net/xdp.h. This includes the definition of
> > xdp_hook_ops, functions to register and unregister hook ops on a device
> > or individual instances of napi, and xdp_hook_run that is called by
> > drivers to run the hooks.
> >  
> 
> Tom,
> perused the thread and it seems you are serious ;->
> Are we heading towards Frankenstein Avenue?
> The whole point behind letting in XDP is so that _small programs_
> can be written to do some quick thing. eBPF 4K program limit was
> touted as the check and bound. eBPF sounded fine.
> This sounds like a huge contradiction.
> 
> cheers,
> jamal

Hi Jamal,

I don't understand why you think this is so controversial. The way I
see it (after reading the thread): This is about allowing kernel
components to _also_ use the XDP hook.

I believe Tom has a valid use-case in ILA. The NVE (Network
Virtualization Edge) component very naturally fits in the XDP hook, as
it only needs to look at the IPv6 address and do a table lookup (in an
ILA-specific data structure) to see if this packet is for local stack
delivery or forwarding.  For forwarding it does not need to take the
performance hit of allocating SKBs etc.

You can see it as a way to accelerate the NVE component. I can imagine
it could be done in approx 10-20 lines of code, as it would use the
existing ILA lookup function calls.
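To make the "approx 10-20 lines" concrete, here is a hedged sketch of what
such a native-C hook could look like. The hook signature follows the
hook(priv, xdp) call used by the RFC's xdp_hook_run(), but ila_lookup(),
ila_translate() and struct ila_entry are illustrative stand-ins for the
existing in-kernel ILA code, not real symbol names.

/* Hedged sketch only; ila_lookup()/ila_translate()/struct ila_entry are
 * illustrative stand-ins for the existing kernel ILA lookup/translation code.
 */
static int ila_xdp_hook(void *priv, struct xdp_buff *xdp)
{
	struct ethhdr *eth = xdp->data;
	struct ipv6hdr *ip6h = (struct ipv6hdr *)(eth + 1);
	struct ila_entry *e;

	if ((void *)(ip6h + 1) > xdp->data_end ||
	    eth->h_proto != htons(ETH_P_IPV6))
		return XDP_PASS;

	e = ila_lookup(&ip6h->daddr);	/* existing ILA table lookup */
	if (!e)
		return XDP_PASS;	/* not ours: local delivery via the stack */

	ila_translate(ip6h, e);		/* rewrite the locator, fix the csum */
	return XDP_TX;			/* forward without allocating an skb */
}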

AFAIK Thomas Graf also sees XDP as an acceleration for cls_bpf, but he
is lucky because his code is already eBPF based.

To support Tom's case, with eBPF, I think we would have to implement
specific eBPF helper functions that can do the ILA table lookups.  And
then need a userspace component to load the eBPF program.  Why add all
this, when we could contain all this in the kernel, and simply call
this as native C-code via the XDP hook?

Notice, this is not about throwing eBPF out.  Using eBPF is _very_
essential for XDP.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-23 Thread Jamal Hadi Salim

On 16-09-20 06:00 PM, Tom Herbert wrote:

This patch creates an infrastructure for registering and running code at
XDP hooks in drivers. This is based on the original XDP/BPF and borrows
heavily from the techniques used by netfilter to make generic nfhooks.

An XDP hook is defined by struct xdp_hook_ops. This structure contains the
ops of an XDP hook. A pointer to this structure is passed into the XDP
register function to set up a hook. The XDP register function mallocs
its own xdp_hook_ops structure and copies the values from the
xdp_hook_ops passed in. The register function also stores the pointer
value of the xdp_hook_ops argument; this pointer is used in subsequent
calls to XDP to identify the registered hook.

The interface is defined in net/xdp.h. This includes the definition of
xdp_hook_ops, functions to register and unregister hook ops on a device
or individual instances of napi, and xdp_hook_run that is called by
drivers to run the hooks.



Tom,
perused the thread and it seems you are serious ;->
Are we heading towards Frankenstein Avenue?
The whole point behind letting in XDP is so that _small programs_
can be written to do some quick thing. eBPF 4K program limit was
touted as the check and bound. eBPF sounded fine.
This sounds like a huge contradiction.

cheers,
jamal




Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-22 Thread Eric Dumazet
On Thu, 2016-09-22 at 15:14 +0200, Jesper Dangaard Brouer wrote:
> On Wed, 21 Sep 2016 21:56:58 +0200
> Jesper Dangaard Brouer  wrote:
> 
> > > > I'm not opposed to running non-BPF code at XDP. I'm against adding
> > > > a linked list of hook consumers.  
> > 
> > I also worry about the performance impact of a linked list.  We should
> > simply benchmark it instead of discussing it! ;-)
> 
> (Note, there are some stability issues with this RFC patchset, when
> removing the xdp program, that I had to work around/patch)
> 
> 
> I've started benchmarking this and I only see an added cost of 2.89 ns from
> these patches; at these crazy speeds that corresponds to -485 Kpps.

I claim the methodology is too biased.

At full speed, all the extra code is hot in caches, and your core has
full access to memory bus anyway. Even branch predictor has fresh
information.

Now, in a mixed workload, where all cores compete to access L2/L3 and
RAM, things might be very different.

Testing icache/dcache pressure is not a matter of measuring how many
Kpps you add or remove on a hot path.

A latency test, when other cpus are busy reading/writing all over
memory, and your caches are cold, would be useful.





Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-22 Thread Jesper Dangaard Brouer
On Wed, 21 Sep 2016 21:56:58 +0200
Jesper Dangaard Brouer  wrote:

> > > I'm not opposed to running non-BPF code at XDP. I'm against adding
> > > a linked list of hook consumers.  
> 
> I also worry about the performance impact of a linked list.  We should
> simply benchmark it instead of discussing it! ;-)

(Note, there are some stability issues with this RFC patchset, when
removing the xdp program, that I had to work around/patch)


I've started benchmarking this and I only see an added cost of 2.89 ns from
these patches; at these crazy speeds that corresponds to -485 Kpps.
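As a quick sanity check on the ns-to-pps conversion (the ~13 Mpps single-core
baseline below is an assumption for illustration, not a number taken from
this mail):

/* Illustrative only: with an assumed ~13 Mpps baseline, 2.89 ns extra per
 * packet costs roughly half a Mpps, in the same ballpark as -485 Kpps.
 */
#include <stdio.h>

int main(void)
{
	double base_pps = 13.0e6;		/* assumed baseline rate */
	double ns_per_pkt = 1e9 / base_pps;	/* ~76.9 ns per packet */
	double new_pps = 1e9 / (ns_per_pkt + 2.89);

	printf("delta: %.0f pps\n", base_pps - new_pps);	/* ~470 Kpps */
	return 0;
}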

I was really expecting to see a higher cost from this approach.

I tested this on two different machines. One was supposed to work with
DDIO, but I could not get DDIO working on that machine (resulting in a
max 12.7 Mpps drop rate), even though the mlx5 card does work with DDIO.
I even removed the mlx5 and used the same slot, but no luck.  (A side
note: I also measured a 16 ns performance difference depending on which
PCIe slot I'm using).

The reason I wanted to benchmark this on a DDIO machine is that I
suspect the added cost could be hiding behind the cache miss.

Well, I'm running out of time benchmarking this stuff; I must prepare
for my Network Performance Workshop ;-)


(A side-note: my skylake motherboard also had a PCI slot, so I found an
old e1000 NIC in my garage, and it worked!)
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-21 Thread Jesper Dangaard Brouer

On Wed, 21 Sep 2016 08:08:34 -0700 Tom Herbert  wrote:

> On Wed, Sep 21, 2016 at 7:48 AM, Thomas Graf  wrote:
> > On 09/21/16 at 07:19am, Tom Herbert wrote:  
> >> certain design because of constraints on one kernel interface. As
> >> a kernel developer I want flexibility on how we design and implement
> >> things!  
> >
> > Perfectly valid argument. I reviewed your ILA changes and did not
> > object to them.
> >
> >  
> >> I think there are two questions that this patch set poses for the
> >> community wrt XDP:
> >>
> >> #1: Should we allow alternate code to run in XDP other than BPF?
> >> #2: If #1 is true what is the best way to implement that?
> >>
> >> If the answer to #1 is "no" then the answer to #2 is irrelevant. So
> >> with this RFC I'm hoping we can come to agreement on question #1.

I vote yes to #1.

> > I'm not opposed to running non-BPF code at XDP. I'm against adding
> > a linked list of hook consumers.

I also worry about the performance impact of a linked list.  We should
simply benchmark it instead of discussing it! ;-)


> > Would anyone need to run XDP-BPF in combination with ILA? Or XDP-BPF
> > in combination with a potential XDP-nftables? We don't know yet I
> > guess.
> >  
> Right. Admittedly, I feel like we owe a bit of reciprocity to
> nftables. For ILA we are using the NF_INET_PRE_ROUTING hook with our
> own code (it looks like ipvlan sets nfhooks as well). This works really
> well and saves the value of early demux in ILA. Had we not had the
> ability to use nfhooks in this fashion it's likely we would have had
> to create another hook (we did try putting translation in nftables
> rules but that was too inefficient for ILA).

Thinking about it, I actually think Tom is proposing a very valid user
of the XDP hook, which is the kernel itself.  And Tom even has a real
first user, ILA.  The way I read the ILA-RFC-draft[1], the XDP hook
would benefit the NVE (Network Virtualization Edge) component, which
can run separately or run on the Tenant System, where the latter case
could use XDP_PASS.

[1] https://www.ietf.org/archive/id/draft-herbert-nvo3-ila-02.txt
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-21 Thread Thomas Graf
On 09/21/16 at 11:50am, Tom Herbert wrote:
> 50 lines in one driver is not a big deal, 50 lines in a hundred
> drivers is! I learned this lesson in BQL which was well abstracted out
> to be minimally invasive but we still saw many issues because of the
> peculiarities of different drivers.

You want to enable XDP in a hundred drivers? Are you planning to
deploy ISA NIC based ILA routers? ;-)


Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-21 Thread Jakub Kicinski
On Wed, 21 Sep 2016 11:50:06 -0700, Tom Herbert wrote:
> On Wed, Sep 21, 2016 at 11:45 AM, Jakub Kicinski  wrote:
> > On Wed, 21 Sep 2016 10:39:40 -0700, Tom Herbert wrote:  
> >> On Wed, Sep 21, 2016 at 10:26 AM, Jakub Kicinski  wrote:  
> >> > On Tue, 20 Sep 2016 17:01:39 -0700, Alexei Starovoitov wrote:  
> >> >>  >  - Reduces the amount of code and complexity needed in drivers to
> >> >>  >manage XDP  
> >> >>
> >> >> hmm:
> >> >> 534 insertions(+), 144 deletions(-)
> >> >> looks like increase in complexity instead.  
> >> >
> >> > and more to come to tie this with HW offloads.  
> >>
> >> The amount of driver code did decrease with these patches:
> >>
> >> drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 64 --
> >> drivers/net/ethernet/mellanox/mlx4/en_rx.c | 25 --
> >> drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |  1 -
> >>
> >> Minimizing complexity being added to drivers for XDP is critical since
> >> we are basically asking every driver to replicate the function. This
> >> property should also apply to HW offloads; the more complexity we
> >> can abstract out of drivers into a common backend infrastructure, the
> >> better for support across different drivers.
> >
> > I'm in the middle of writing/testing XDP support for Netronome's
> > driver and generic infra is very much appreciated ;)  In my experience
> > the 50 lines of code which are required for assigning the programs and
> > freeing them aren't really a big deal, though.
> >  
> 
> 50 lines in one driver is not a big deal, 50 lines in a hundred
> drivers is! I learned this lesson in BQL which was well abstracted out
> to be minimally invasive but we still saw many issues because of the
> peculiarities of different drivers.

Agreed, I just meant to say that splitting rings and rewriting the RX path
to behave differently for the XDP vs non-XDP case is way more brain
consuming than a bit of boilerplate code, so if anyone could solve those
two it would be much appreciated :)  My main point was what I wrote
below, though.

> > Let's also separate putting xdp_prog in netdevice/napi_struct from the
> > generic hook infra.  All the simplifications to the driver AFAICS come
> > from the former.  If everyone is fine with growing napi_struct we can do
> > that but IMHO this is not an argument for the generic infra :)  



Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-21 Thread Tom Herbert
On Wed, Sep 21, 2016 at 11:45 AM, Jakub Kicinski  wrote:
> On Wed, 21 Sep 2016 10:39:40 -0700, Tom Herbert wrote:
>> On Wed, Sep 21, 2016 at 10:26 AM, Jakub Kicinski  wrote:
>> > On Tue, 20 Sep 2016 17:01:39 -0700, Alexei Starovoitov wrote:
>> >>  >  - Reduces the amount of code and complexity needed in drivers to
>> >>  >manage XDP
>> >>
>> >> hmm:
>> >> 534 insertions(+), 144 deletions(-)
>> >> looks like increase in complexity instead.
>> >
>> > and more to come to tie this with HW offloads.
>>
>> The amount of driver code did decrease with these patches:
>>
>> drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 64 --
>> drivers/net/ethernet/mellanox/mlx4/en_rx.c | 25 --
>> drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |  1 -
>>
>> Minimizing complexity being added to drivers for XDP is critical since
>> we are basically asking every driver to replicate the function. This
>> property should also apply to HW offloads; the more complexity we
>> can abstract out of drivers into a common backend infrastructure, the
>> better for support across different drivers.
>
> I'm in the middle of writing/testing XDP support for Netronome's
> driver and generic infra is very much appreciated ;)  In my experience
> the 50 lines of code which are required for assigning the programs and
> freeing them aren't really a big deal, though.
>

50 lines in one driver is not a big deal, 50 lines in a hundred
drivers is! I learned this lesson in BQL which was well abstracted out
to be minimally invasive but we still saw many issues because of the
peculiarities of different drivers.

> Let's also separate putting xdp_prog in netdevice/napi_struct from the
> generic hook infra.  All the simplifications to the driver AFAICS come
> from the former.  If everyone is fine with growing napi_struct we can do
> that but IMHO this is not an argument for the generic infra :)


Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-21 Thread Jakub Kicinski
On Wed, 21 Sep 2016 10:39:40 -0700, Tom Herbert wrote:
> On Wed, Sep 21, 2016 at 10:26 AM, Jakub Kicinski  wrote:
> > On Tue, 20 Sep 2016 17:01:39 -0700, Alexei Starovoitov wrote:  
> >>  >  - Reduces the amount of code and complexity needed in drivers to
> >>  >manage XDP  
> >>
> >> hmm:
> >> 534 insertions(+), 144 deletions(-)
> >> looks like increase in complexity instead.  
> >
> > and more to come to tie this with HW offloads.  
> 
> The amount of driver code did decrease with these patches:
> 
> drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 64 --
> drivers/net/ethernet/mellanox/mlx4/en_rx.c | 25 --
> drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |  1 -
> 
> Minimizing complexity being added to drivers for XDP is critical since
> we are basically asking every driver to replicate the function. This
> property should also apply to HW offloads; the more complexity we
> can abstract out of drivers into a common backend infrastructure, the
> better for support across different drivers.

I'm in the middle of writing/testing XDP support for Netronome's
driver and generic infra is very much appreciated ;)  In my experience
the 50 lines of code which are required for assigning the programs and
freeing them aren't really a big deal, though.

Let's also separate putting xdp_prog in netdevice/napi_struct from the
generic hook infra.  All the simplifications to the driver AFAICS come
from the former.  If everyone is fine with growing napi_struct we can do
that but IMHO this is not an argument for the generic infra :)


Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-21 Thread Tom Herbert
On Wed, Sep 21, 2016 at 10:26 AM, Jakub Kicinski  wrote:
> On Tue, 20 Sep 2016 17:01:39 -0700, Alexei Starovoitov wrote:
>>  >  - Reduces the amount of code and complexity needed in drivers to
>>  >manage XDP
>>
>> hmm:
>> 534 insertions(+), 144 deletions(-)
>> looks like increase in complexity instead.
>
> and more to come to tie this with HW offloads.

The amount of driver code did decrease with these patches:

drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 64 --
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 25 --
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |  1 -

Minimizing complexity being added to drivers for XDP is critical since
we are basically asking every driver to replicate the function. This
property should also apply to HW offloads; the more complexity we
can abstract out of drivers into a common backend infrastructure, the
better for support across different drivers.

Tom


Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-21 Thread Jakub Kicinski
On Tue, 20 Sep 2016 17:01:39 -0700, Alexei Starovoitov wrote:
>  >  - Reduces the amount of code and complexity needed in drivers to
>  >manage XDP  
> 
> hmm:
> 534 insertions(+), 144 deletions(-)
> looks like increase in complexity instead.

and more to come to tie this with HW offloads.


Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-21 Thread Jakub Kicinski
On Wed, 21 Sep 2016 13:55:45 +0200, Thomas Graf wrote:
> > I am looking at using this for ILA router. The problem I am hitting is
> > that not all packets that we need to translate go through the XDP
> > path. Some would go through the kernel path, some through XDP path but  
> 
> When you say kernel path, what do you mean specifically? One aspect of
> XDP I love is that XDP can act as an acceleration option for existing
> BPF programs attached to cls_bpf. Support for direct packet read and
> write at clsact level have made it straight forward to write programs
> which are compatible or at minimum share a lot of common code. They
> can share data structures, lookup functionality, etc.

My very humble dream was that XDP would be transparently offloaded from
cls_bpf if the program was simple enough.  That ship has most likely
sailed because XDP has different abort behaviour.  When possible though,
trying to offload higher-level hooks when the rules don't require access
to the full skb would be really cool.


Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-21 Thread Alexei Starovoitov

On 9/20/16 11:39 PM, Jesper Dangaard Brouer wrote:

We are at the early stages of XDP development. Users cannot consider XDP a
stable UAPI yet.  I added a big fat warning to the docs here[1].

If you already consider this a stable API, then I will suggest that we
disable XDP or rip the whole thing out again!!!


the doc that you wrote is a great description of your understanding
of what XDP is about. It's not an official spec or design document.
Until it is reviewed and lands in kernel.org, please do not
make any assumptions about the present or future XDP API based on it.



Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-21 Thread Alexei Starovoitov

On 9/21/16 7:19 AM, Tom Herbert wrote:

#1: Should we allow alternate code to run in XDP other than BPF?


separate nft hook - yes
generic hook - no
since it's one step away from kernel modules abusing this hook.
pass/drop/tx of a raw buffer at the driver level is a perfect
interface to bypass everything in the stack.
The tighter we make it the better.

If nft and bpf are both not flexible enough to express
dataplane functionality we should extend them instead of
writing C code or kernel modules.

On bpf side we're trying very hard to kill any dream of
interoperability with kernel modules.
The map and prog type registration is done in a way to make
it impossible for kernel modules to register their own
map and program types or provide their own helper functions.

The nfhooks approach is very lax at that, and imo it was a mistake,
since there are plenty of out-of-tree modules using nf hooks
and plenty of in-tree modules that are barely maintained.


#2: If #1 is true what is the best way to implement that?


Add separate nft hook that doesn't interfere in any way
with bpf hook at xdp level.
The order nft-first or bpf-first or exclusive attach
doesn't matter to me. These are details to be discussed.



Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-21 Thread Tom Herbert
On Wed, Sep 21, 2016 at 7:48 AM, Thomas Graf  wrote:
> On 09/21/16 at 07:19am, Tom Herbert wrote:
>> certain design because of constraints on one kernel interface. As
>> a kernel developer I want flexibility on how we design and implement
>> things!
>
> Perfectly valid argument. I reviewed your ILA changes and did not
> object to them.
>
>
>> I think there are two questions that this patch set poses for the
>> community wrt XDP:
>>
>> #1: Should we allow alternate code to run in XDP other than BPF?
>> #2: If #1 is true what is the best way to implement that?
>>
>> If the answer to #1 is "no" then the answer to #2 is irrelevant. So
>> with this RFC I'm hoping we can come to agreement on question #1.
>
> I'm not opposed to running non-BPF code at XDP. I'm against adding
> a linked list of hook consumers.
>
> Would anyone need to run XDP-BPF in combination with ILA? Or XDP-BPF
> in combination with a potential XDP-nftables? We don't know yet I
> guess.
>
Right. Admittedly, I feel like we owe a bit of reciprocity to
nftables. For ILA we are using the NF_INET_PRE_ROUTING hook with our
own code (it looks like ipvlan sets nfhooks as well). This works really
well and saves the value of early demux in ILA. Had we not had the
ability to use nfhooks in this fashion it's likely we would have had
to create another hook (we did try putting translation in nftables
rules but that was too inefficient for ILA).
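For context, the NF_INET_PRE_ROUTING usage described above boils down to
registering an nf_hook_ops from kernel code, roughly as sketched below; the
names are illustrative, not the actual ILA implementation in net/ipv6/ila/.

/* Hedged sketch of hooking NF_INET_PRE_ROUTING from kernel code. */
static unsigned int ila_nf_input(void *priv, struct sk_buff *skb,
				 const struct nf_hook_state *state)
{
	/* translate identifier -> locator here, then let the packet continue */
	return NF_ACCEPT;
}

static struct nf_hook_ops ila_nf_ops = {
	.hook		= ila_nf_input,
	.pf		= NFPROTO_IPV6,
	.hooknum	= NF_INET_PRE_ROUTING,
	.priority	= -1,	/* run early, before most other consumers */
};

/* registered per namespace, e.g.: nf_register_net_hook(net, &ila_nf_ops); */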

> Maybe exclusive access to the hook for one consumer as selected by
> the user is good enough.
>
> If that is not good enough: BPF (and potentially nftables in the
> future) could provide means to perform a selection process where a
> helper call can run another XDP prog or return a verdict to trigger
> another XDP prog. Definitely more flexible and faster than a linear
> list doing  if, else if, else if, else if, ...

It seems reasonable that the output of one program may be an
indication to invoke another program. We've already talked about something
like that in regard to splitting BPF programs into a device-independent
program and a device-dependent program.

Tom


Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-21 Thread Thomas Graf
On 09/21/16 at 07:19am, Tom Herbert wrote:
> certain design because of constraints on one kernel interface. As
> a kernel developer I want flexibility on how we design and implement
> things!

Perfectly valid argument. I reviewed your ILA changes and did not
object to them.


> I think there are two questions that this patch set poses for the
> community wrt XDP:
> 
> #1: Should we allow alternate code to run in XDP other than BPF?
> #2: If #1 is true what is the best way to implement that?
> 
> If the answer to #1 is "no" then the answer to #2 is irrelevant. So
> with this RFC I'm hoping we can come to agreement on question #1.

I'm not opposed to running non-BPF code at XDP. I'm against adding
a linked list of hook consumers.

Would anyone need to run XDP-BPF in combination with ILA? Or XDP-BPF
in combination with a potential XDP-nftables? We don't know yet I
guess.

Maybe exclusive access to the hook for one consumer as selected by
the user is good enough.

If that is not good enough: BPF (and potentially nftables in the
future) could provide means to perform a selection process where a
helper call can run another XDP prog or return a verdict to trigger
another XDP prog. Definitely more flexible and faster than a linear
list doing  if, else if, else if, else if, ...


Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-21 Thread Tom Herbert
On Wed, Sep 21, 2016 at 4:55 AM, Thomas Graf  wrote:
> On 09/20/16 at 04:59pm, Tom Herbert wrote:
>> Well, need to measure to ascertain the cost. As for complexity, this
>> actually reduces complexity needed for XDP in the drivers which is a
>> good thing because that's where most of the support and development
>> pain will be.
>
> I'm not objecting to anything that simplifies the process of adding
> XDP capability to drivers. You have my full support here.
>
>> I am looking at using this for ILA router. The problem I am hitting is
>> that not all packets that we need to translate go through the XDP
>> path. Some would go through the kernel path, some through the XDP path, but
>
> When you say kernel path, what do you mean specifically? One aspect of
> XDP I love is that XDP can act as an acceleration option for existing
> BPF programs attached to cls_bpf. Support for direct packet read and
> write at clsact level have made it straight forward to write programs
> which are compatible or at minimum share a lot of common code. They
> can share data structures, lookup functionality, etc.
>
>> We can optimize for allowing only one hook, or maybe limit to only
>> allowing one hook to be set. In any case this obviously requires a lot
> of performance evaluation; I am hoping to get feedback on the design
>> first. My question about using a linear list for this was real, do you
>> know a better method off hand to implement a call list?
>
> My main concern is that we overload the XDP hook. Instead of making use
> of the programmable glue, we put a linked list in front where everybody
> can attach a program to.
>
> A possible alternative:
>  1. The XDP hook always has a single consumer controlled by the user
> through Netlink, BPF is one of them. If a user wants to hardcode
> the ILA router to that hook, he can do that.
>
>  2. BPF for XDP is extended to allow returning a verdict which results
> in something else to be invoked. If user wants to invoke the ILA
> router for just some packets, he can do that.
>
> That said, I see so much value in a BPF implementation of ILA at the XDP
> level, where all of the parsing logic and exact semantics remain
> flexible, without the cost of translating some configuration into a set
> of actions.

Thomas,

As I mentioned though, not all packets we need to translate are going
to go through XDP. ILA already integrates with the routing table via
LWT, and that solution works incredibly well, especially when we are
sourcing connections (ILA is by far the fastest, most efficient
technique of network virtualization). We already have a lookup table
and the translation process coded in the kernel, and configuration is
already implemented by netlink and iproute. So I'm not yet sure I want
to redesign all of this around BPF; maybe that is the right answer,
maybe it isn't. But my point is I don't want to be _forced_ into a
certain design because of constraints on one kernel interface. As
a kernel developer I want flexibility on how we design and implement
things!

I think there are two questions that this patch set poses for the
community wrt XDP:

#1: Should we allow alternate code to run in XDP other than BPF?
#2: If #1 is true what is the best way to implement that?

If the answer to #1 is "no" then the answer to #2 is irrelevant. So
with this RFC I'm hoping we can come to agreement on question #1.

IMO, there are at least three factors that would motivate the answer
to #1 being yes:

1) There is precedent with nfhooks for creating generic hooks in the
critical data path that can have other uses than those originally
intended by the authors. I would take it as a general principle that
hooks in the data path should be as generic as possible to get the
most utility.
2) The changes to make XDP work in drivers are quite invasive and as
we've seen in at least one attempt to modify a driver to support XDP
it is easy to break other mechanisms. If we're going to be modifying a
lot of drivers we really want to get the most value for that work.
3) The XDP interface is well designed and really has no dependencies
on BPF. Writing code to directly parse packets is not rocket science.
So it seems reasonable to me that a subsystem may want to use the XDP
interface to accelerate existing functionality without introducing
additional dependencies on BPF or other non-kernel mechanisms.

In any case, I ask everyone to be open-minded. If there is agreement
that BPF is sufficient for all XDP use cases then I won't pursue these
patches. But given the ramifications of XDP, especially considering
the changes required across drivers to support it, I really would like
to get input from the community on this.

Thanks,
Tom


Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-21 Thread Thomas Graf
On 09/20/16 at 04:59pm, Tom Herbert wrote:
> Well, need to measure to ascertain the cost. As for complexity, this
> actually reduces complexity needed for XDP in the drivers which is a
> good thing because that's where most of the support and development
> pain will be.

I'm not objecting to anything that simplifies the process of adding
XDP capability to drivers. You have my full support here.

> I am looking at using this for ILA router. The problem I am hitting is
> that not all packets that we need to translate go through the XDP
> path. Some would go through the kernel path, some through the XDP path, but

When you say kernel path, what do you mean specifically? One aspect of
XDP I love is that XDP can act as an acceleration option for existing
BPF programs attached to cls_bpf. Support for direct packet read and
write at clsact level have made it straight forward to write programs
which are compatible or at minimum share a lot of common code. They
can share data structures, lookup functionality, etc.

> We can optimize for allowing only one hook, or maybe limit to only
> allowing one hook to be set. In any case this obviously requires a lot
> of performance evaluation; I am hoping to get feedback on the design
> first. My question about using a linear list for this was real, do you
> know a better method off hand to implement a call list?

My main concern is that we overload the XDP hook. Instead of making use
of the programmable glue, we put a linked list in front where everybody
can attach a program to.

A possible alternative:
 1. The XDP hook always has a single consumer controlled by the user
through Netlink, BPF is one of them. If a user wants to hardcode
the ILA router to that hook, he can do that.

 2. BPF for XDP is extended to allow returning a verdict which results
in something else to be invoked. If user wants to invoke the ILA
router for just some packets, he can do that.

That said, I see so much value in a BPF implementation of ILA at the XDP
level, where all of the parsing logic and exact semantics remain
flexible, without the cost of translating some configuration into a set
of actions.


Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-21 Thread Daniel Borkmann

On 09/21/2016 08:39 AM, Jesper Dangaard Brouer wrote:

On Tue, 20 Sep 2016 17:01:39 -0700 Alexei Starovoitov  wrote:


  >  - Provides a more structured environment that is extensible to new
  >features while being mostly transparent to the drivers

don't see that in these patches either.
Things like packet size change (that we're working on) still
have to be implemented for every driver.
Existing XDP_TX, XDP_DROP have to be implemented per driver as well.

Also introduction of xdp.h breaks existing UAPI.
That's not acceptable either.


This response pisses me off!@@#$

We are at the early stages of XDP development. Users cannot consider XDP a
stable UAPI yet.  I added a big fat warning to the docs here[1].

If you already consider this a stable API, then I will suggest that we
disable XDP or rip the whole thing out again!!! Create a separate tree
where we can cooperate on getting this right, before forcing this
upstream.  I have raised concerns about this several times upstream, but
the patches got applied anyway, because you Alexei, promised this was
super extendable and we could still change the APIs later. Maybe you
tricked me?! I've started to look at the details, and I'm not happy with
the extensibility.


I don't quite follow you here, Jesper. If you're talking about uapi-related
extensibility wrt struct xdp_md, then it's done the same way as is done for tc
with the 'shadow' struct __sk_buff. The concept of having this internal BPF insn
rewriter is working quite well for this, and it is extensible with new metadata.
Wrt return codes we're flexible to add new ones once agreed upon. The
whole XDP config is done via netlink, nested in the IFLA_XDP container, so it can
be extended in the future with other attrs, flags, etc, for setup and dumping.
The 'breakage' that was mentioned here was related to moving things into xdp.h.
It can be done and we did it similarly in the past with bpf_common.h, but
then really also bpf.h needs to include the new xdp.h header.
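For reference, the 'shadow struct' pairing Daniel describes is the uapi
struct xdp_md (as it stands at this point) versus the kernel-internal
struct xdp_buff that this very patch moves into net/xdp.h:

/* include/uapi/linux/bpf.h -- what an XDP program sees; accesses to these
 * fields are rewritten by the BPF instruction rewriter, so new metadata can
 * be appended later without breaking existing programs.
 */
struct xdp_md {
	__u32 data;
	__u32 data_end;
};

/* kernel-internal structure the drivers actually pass around */
struct xdp_buff {
	void *data;
	void *data_end;
};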


Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-20 Thread Jesper Dangaard Brouer
On Tue, 20 Sep 2016 17:01:39 -0700 Alexei Starovoitov  wrote:

>  >  - Provides a more structured environment that is extensible to new
>  >features while being mostly transparent to the drivers  
> 
> don't see that in these patches either.
> Things like packet size change (that we're working on) still
> have to be implemented for every driver.
> Existing XDP_TX, XDP_DROP have to be implemented per driver as well.
> 
> Also introduction of xdp.h breaks existing UAPI.
> That's not acceptable either.

This response pisses me off!@@#$

We are at the early stages of XDP development. Users cannot consider XDP a
stable UAPI yet.  I added a big fat warning to the docs here[1].

If you already consider this a stable API, then I will suggest that we
disable XDP or rip the whole thing out again!!! Create a separate tree
where we can cooperate on getting this right, before forcing this
upstream.  I have raised concerns about this several times upstream, but
the patches got applied anyway, because you Alexei, promised this was
super extendable and we could still change the APIs later. Maybe you
tricked me?! I've started to look at the details, and I'm not happy with
the extensibility.  And I'm also not happy with Brenden's response to
my API concerns that this is just a "fire-and-forget" API.

Most importantly, the XDP interface for feature or capabilities
negotiation is missing.  Documented here[2].

I strongly believe the two actions XDP_DROP and XDP_TX should have been
expressed as two different capabilities, because XDP_TX requires more
changes to the device driver than a simple drop like XDP_DROP.

One can easily imagine (after the e1000 discussion) that an older
driver only wants to implement the XDP_DROP facility. The reason is that
XDP_TX would require changing too much driver code, which is a concern
for an old stable and time-proven driver.  Thus, the need for
negotiating features is already a practical problem!
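To make the capability argument concrete, the kind of per-action split
being asked for could look roughly like the hypothetical sketch below;
nothing like this exists in the RFC or upstream, it is purely illustrative.

/* Hypothetical per-action capability bits; purely illustrative, not part of
 * any posted patch.
 */
enum xdp_driver_caps {
	XDP_CAP_DROP	= 1 << 0,	/* driver can honour XDP_DROP */
	XDP_CAP_PASS	= 1 << 1,	/* driver can honour XDP_PASS */
	XDP_CAP_TX	= 1 << 2,	/* driver can honour XDP_TX (needs TX ring changes) */
};

/* An old driver like e1000 could then advertise only DROP|PASS, and program
 * setup could be rejected when a program needs XDP_TX.
 */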


[1] 
https://prototype-kernel.readthedocs.io/en/latest/networking/XDP/implementation/userspace_api.html

Quote:
  "Warning: The userspace API specification should have to be defined
   properly before code was accepted upstream. Concerns have been raised
   about the current API upstream. Users should expect this first API
   attempt will need adjustments. This cannot be considered a stable API yet."

[2] 
https://prototype-kernel.readthedocs.io/en/latest/networking/XDP/design/design.html#capabilities-negotiation

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-20 Thread Alexei Starovoitov

On 9/20/16 4:59 PM, Tom Herbert wrote:

I am looking at using this for ILA router. The problem I am hitting is
that not all packets that we need to translate go through the XDP
path. Some would go through the kernel path, some through the XDP path, but
that would mean I need parallel lookup tables to be maintained for the
two paths which won't scale. ILA translation is so trivial and not
really something that we need to be user programmable, the fast path
is really for accelerating an existing kernel capability. If I can
reuse the kernel code already written and the existing kernel data
structures to make a fast path in XDP there is a lot of value in that
for me.


sounds like you want to add a hard-coded ILA rewriter to the driver
instead of doing it as a BPF program?!
That is a 180-degree turn vs the whole protocol ossification tune
that I thought you strongly believed in.

What kernel data structures do you want to reuse?
The ILA rewriter needs a single hash lookup. Several different
types of hash maps exist on the bpf side already and
even more are coming that will be usable by both the tc and xdp sides.
csum adjustment? We have it for tc. Not for xdp yet,
but it's trivial to allow it on the xdp side too.
Maybe we should talk about the real motivation for the patches
and see what the best solution is.



Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-20 Thread Alexei Starovoitov

On 9/20/16 3:00 PM, Tom Herbert wrote:

+static inline int __xdp_hook_run(struct list_head *list_head,
+struct xdp_buff *xdp)
+{
+   struct xdp_hook_ops *elem;
+   int ret = XDP_PASS;
+
+   list_for_each_entry(elem, list_head, list) {
+   ret = elem->hook(elem->priv, xdp);
+   if (ret != XDP_PASS)
+   break;
+   }
+
+   return ret;
+}
+
+/* Run the XDP hooks for a napi device. Called from a driver's receive
+ * routine
+ */
+static inline int xdp_hook_run(struct napi_struct *napi, struct xdp_buff *xdp)
+{
+   struct net_device *dev = napi->dev;
+   int ret = XDP_PASS;
+
+   if (static_branch_unlikely(&xdp_hooks_needed)) {
+   /* Run hooks in napi first */
+   ret = __xdp_hook_run(&napi->xdp_hook_list, xdp);
+   if (ret != XDP_PASS)
+   return ret;
+
+   /* Now run device hooks */
+   ret = __xdp_hook_run(&dev->xdp_hook_list, xdp);
+   if (ret != XDP_PASS)
+   return ret;
+   }
+
+   return ret;
+}


it's an interesting idea to move the prog pointer into the napi struct,
but certainly not at such a huge cost.
Right now it's 1 load + 1 cmp + 1 indirect jump per packet
to invoke the program; with the above approach it becomes
6 loads + 3 cmp (just to get through the run_needed_check() check)
+ 6 loads + 3 cmp + 2 indirect jumps.
(I may be a little bit off, +- a few loads.)
That is a non-starter.
When we were optimizing the receive path of the tc clsact ingress hook
we saw 1 Mpps saved for every load+cmp+indirect jump removed.

We're working on inlining of bpf_map_lookup to save one
indirect call per lookup; we cannot just waste them here.

We need to save cycles instead, especially when it doesn't
really solve your goals. It seems the goals are:

>- Allows alternative users of the XDP hooks other than the original
>BPF

this should be achieved by their own hooks, while reusing the
return codes XDP_TX and XDP_PASS to keep the driver side the same.
I'm not against other packet processing engines, but not
at the cost of lower performance.

>  - Allows a means to pipeline XDP programs together

this can be done already via bpf_tail_call. No changes needed.

>  - Reduces the amount of code and complexity needed in drivers to
>manage XDP

hmm:
534 insertions(+), 144 deletions(-)
looks like increase in complexity instead.

>  - Provides a more structured environment that is extensible to new
>features while being mostly transparent to the drivers

don't see that in these patches either.
Things like packet size change (that we're working on) still
have to be implemented for every driver.
Existing XDP_TX, XDP_DROP have to be implemented per driver as well.

Also introduction of xdp.h breaks existing UAPI.
That's not acceptable either.



Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-20 Thread Tom Herbert
On Tue, Sep 20, 2016 at 4:43 PM, Thomas Graf  wrote:
> On 09/20/16 at 04:18pm, Tom Herbert wrote:
>> This allows other use cases than BPF inserting code into the data
>> path. This gives XDP potentially more utility and more users so that we
>> can motivate more driver implementations. For instance, I think it's
>> totally reasonable if the nftables guys want to insert some of their
>> rules to perform early DDOS drop to get the same performance that we
>> see in XDP.
>
> Reasonable point with nftables but are any of these users on the table
> already and ready to consume non-skbs? It would be a pity to add this
> complexity and cost if it is never used.
>
Well, need to measure to ascertain the cost. As for complexity, this
actually reduces complexity needed for XDP in the drivers which is a
good thing because that's where most of the support and development
pain will be.

I am looking at using this for ILA router. The problem I am hitting is
that not all packets that we need to translate go through the XDP
path. Some would go through the kernel path, some through the XDP path, but
that would mean I need parallel lookup tables to be maintained for the
two paths which won't scale. ILA translation is so trivial and not
really something that we need to be user programmable, the fast path
is really for accelerating an existing kernel capability. If I can
reuse the kernel code already written and the existing kernel data
structures to make a fast path in XDP there is a lot of value in that
for me.

> I don't see how we can ensure performance if we have multiple
> subsystems register for the hook each adding their own parsers which
> need to be passed through sequentially. Maybe I'm missing something.

We can optimize for allowing only one hook, or maybe limit to only
allowing one hook to be set. In any case this obviously requires a lot
of performance evaluation; I am hoping to get feedback on the design
first. My question about using a linear list for this was real, do you
know a better method off hand to implement a call list?

Thanks,
Tom


Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-20 Thread Thomas Graf
On 09/20/16 at 04:18pm, Tom Herbert wrote:
> This allows other use cases than BPF inserting code into the data
> path. This gives XDP potentially more utility and more users so that we
> can motivate more driver implementations. For instance, I think it's
> totally reasonable if the nftables guys want to insert some of their
> rules to perform early DDOS drop to get the same performance that we
> see in XDP.

Reasonable point with nftables but are any of these users on the table
already and ready to consume non-skbs? It would be a pity to add this
complexity and cost if it is never used.

I don't see how we can ensure performance if we have multiple
subsystems register for the hook each adding their own parsers which
need to be passed through sequentially. Maybe I'm missing something.


Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-20 Thread Daniel Borkmann

On 09/21/2016 01:09 AM, Thomas Graf wrote:

On 09/20/16 at 03:49pm, Tom Herbert wrote:

On Tue, Sep 20, 2016 at 3:44 PM, Thomas Graf  wrote:

On 09/20/16 at 03:00pm, Tom Herbert wrote:

+static inline int __xdp_hook_run(struct list_head *list_head,
+  struct xdp_buff *xdp)
+{
+ struct xdp_hook_ops *elem;
+ int ret = XDP_PASS;
+
+ list_for_each_entry(elem, list_head, list) {
+ ret = elem->hook(elem->priv, xdp);
+ if (ret != XDP_PASS)
+ break;
+ }


Walking over a linear list? Really? :-) I thought this was supposed
to be fast, no compromises made.


Can you suggest an alternative?


Single BPF program that encodes whatever logic is required. This is
what BPF is for. If it absolutely has to run two programs in sequence
then it can still do that even though I really don't see much of a
point of doing that in a high performance environment.


Agreed, if there's one thing I would change in cls_bpf, then it's getting
rid of the (uapi, unfortunately) list of classifiers and just making it a
single one, because that's all that is needed; chaining/pipelining
can be done via tail calls, for example. This whole list + callback likely
also makes things slower. Why not let more drivers come first to support
the current xdp model we have, and then we can move common parts into
driver-independent core code?


I'm not even sure I understand the full purpose of this yet ;-)
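The chaining/pipelining via tail calls mentioned above looks roughly like
this in a BPF C program; a hedged sketch in the samples/bpf style, with
illustrative map and program names.

/* Hedged sketch of pipelining XDP programs via bpf_tail_call(); names and
 * map layout are illustrative.
 */
#include <linux/bpf.h>
#include "bpf_helpers.h"	/* SEC(), bpf_tail_call(), struct bpf_map_def */

struct bpf_map_def SEC("maps") xdp_progs = {
	.type = BPF_MAP_TYPE_PROG_ARRAY,
	.key_size = sizeof(__u32),
	.value_size = sizeof(__u32),
	.max_entries = 4,
};

SEC("xdp")
int xdp_stage0(struct xdp_md *ctx)
{
	/* stage 0 logic (e.g. early DDoS drop) would go here ... */

	/* hand the packet to the program installed in slot 1, if any */
	bpf_tail_call(ctx, &xdp_progs, 1);

	/* fallthrough when slot 1 is empty */
	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";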


Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-20 Thread Tom Herbert
On Tue, Sep 20, 2016 at 4:09 PM, Thomas Graf  wrote:
> On 09/20/16 at 03:49pm, Tom Herbert wrote:
>> On Tue, Sep 20, 2016 at 3:44 PM, Thomas Graf  wrote:
>> > On 09/20/16 at 03:00pm, Tom Herbert wrote:
>> >> +static inline int __xdp_hook_run(struct list_head *list_head,
>> >> +  struct xdp_buff *xdp)
>> >> +{
>> >> + struct xdp_hook_ops *elem;
>> >> + int ret = XDP_PASS;
>> >> +
>> >> + list_for_each_entry(elem, list_head, list) {
>> >> + ret = elem->hook(elem->priv, xdp);
>> >> + if (ret != XDP_PASS)
>> >> + break;
>> >> + }
>> >
>> > Walking over a linear list? Really? :-) I thought this was supposed
>> > to be fast, no compromises made.
>>
>> Can you suggest an alternative?
>
> Single BPF program that encodes whatever logic is required. This is
> what BPF is for. If it absolutely has to run two programs in sequence
> then it can still do that even though I really don't see much of a
> point of doing that in a high performance environment.
>
> I'm not even sure I understand the full purpose of this yet ;-)

This allows other use cases than BPF inserting code into the data
path. This gives XDP potentially more utility and more users so that we
can motivate more driver implementations. For instance, I think it's
totally reasonable if the nftables guys want to insert some of their
rules to perform early DDOS drop to get the same performance that we
see in XDP.

Tom


Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-20 Thread Thomas Graf
On 09/20/16 at 03:49pm, Tom Herbert wrote:
> On Tue, Sep 20, 2016 at 3:44 PM, Thomas Graf  wrote:
> > On 09/20/16 at 03:00pm, Tom Herbert wrote:
> >> +static inline int __xdp_hook_run(struct list_head *list_head,
> >> +  struct xdp_buff *xdp)
> >> +{
> >> + struct xdp_hook_ops *elem;
> >> + int ret = XDP_PASS;
> >> +
> >> + list_for_each_entry(elem, list_head, list) {
> >> + ret = elem->hook(elem->priv, xdp);
> >> + if (ret != XDP_PASS)
> >> + break;
> >> + }
> >
> > Walking over a linear list? Really? :-) I thought this was supposed
> > to be fast, no compromises made.
> 
> Can you suggest an alternative?

Single BPF program that encodes whatever logic is required. This is
what BPF is for. If it absolutely has to run two programs in sequence
then it can still do that even though I really don't see much of a
point of doing that in a high performance environment.

I'm not even sure I understand the full purpose of this yet ;-)


Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-20 Thread Tom Herbert
On Tue, Sep 20, 2016 at 3:44 PM, Thomas Graf  wrote:
> On 09/20/16 at 03:00pm, Tom Herbert wrote:
>> +static inline int __xdp_hook_run(struct list_head *list_head,
>> +  struct xdp_buff *xdp)
>> +{
>> + struct xdp_hook_ops *elem;
>> + int ret = XDP_PASS;
>> +
>> + list_for_each_entry(elem, list_head, list) {
>> + ret = elem->hook(elem->priv, xdp);
>> + if (ret != XDP_PASS)
>> + break;
>> + }
>
> Walking over a linear list? Really? :-) I thought this was supposed
> to be fast, no compromises made.

Can you suggest an alternative?


Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-20 Thread Thomas Graf
On 09/20/16 at 03:00pm, Tom Herbert wrote:
> +static inline int __xdp_hook_run(struct list_head *list_head,
> +  struct xdp_buff *xdp)
> +{
> + struct xdp_hook_ops *elem;
> + int ret = XDP_PASS;
> +
> + list_for_each_entry(elem, list_head, list) {
> + ret = elem->hook(elem->priv, xdp);
> + if (ret != XDP_PASS)
> + break;
> + }

Walking over a linear list? Really? :-) I thought this was supposed
to be fast, no compromises made.


Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-20 Thread Tom Herbert
On Tue, Sep 20, 2016 at 3:37 PM, Eric Dumazet  wrote:
> On Tue, 2016-09-20 at 15:00 -0700, Tom Herbert wrote:
>
>> diff --git a/net/core/xdp.c b/net/core/xdp.c
>> new file mode 100644
>> index 000..815ead8
>> --- /dev/null
>> +++ b/net/core/xdp.c
>> @@ -0,0 +1,211 @@
>> +/*
>> + * Kernel Connection Multiplexor
>> + *
>> + * Copyright (c) 2016 Tom Herbert 
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2
>> + * as published by the Free Software Foundation.
>> + */
>
>
> Too much copy/paste Tom :)
>
If this is your only complaint, Eric, I'll be quite impressed ;-)

>


Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-20 Thread Eric Dumazet
On Tue, 2016-09-20 at 15:00 -0700, Tom Herbert wrote:

> diff --git a/net/core/xdp.c b/net/core/xdp.c
> new file mode 100644
> index 000..815ead8
> --- /dev/null
> +++ b/net/core/xdp.c
> @@ -0,0 +1,211 @@
> +/*
> + * Kernel Connection Multiplexor
> + *
> + * Copyright (c) 2016 Tom Herbert 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2
> + * as published by the Free Software Foundation.
> + */


Too much copy/paste Tom :)




[PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-20 Thread Tom Herbert
This patch creates an infrastructure for registering and running code at
XDP hooks in drivers. This is based on the original XDP/BPF and borrows
heavily from the techniques used by netfilter to make generic nfhooks.

An XDP hook is defined by struct xdp_hook_ops. This structure contains the
ops of an XDP hook. A pointer to this structure is passed into the XDP
register function to set up a hook. The XDP register function mallocs
its own xdp_hook_ops structure and copies the values from the
xdp_hook_ops passed in. The register function also stores the pointer
value of the xdp_hook_ops argument; this pointer is used in subsequent
calls to XDP to identify the registered hook.

The interface is defined in net/xdp.h. This includes the definition of
xdp_hook_ops, functions to register and unregister hook ops on a device
or individual instances of napi, and xdp_hook_run that is called by
drivers to run the hooks.

Signed-off-by: Tom Herbert 
---
 include/linux/filter.h  |   6 +-
 include/linux/netdev_features.h |   3 +-
 include/linux/netdevice.h   |  11 ++
 include/net/xdp.h   | 218 
 include/uapi/linux/bpf.h|  20 
 include/uapi/linux/xdp.h|  24 +
 net/core/Makefile   |   2 +-
 net/core/dev.c  |   4 +
 net/core/xdp.c  | 211 ++
 9 files changed, 472 insertions(+), 27 deletions(-)
 create mode 100644 include/net/xdp.h
 create mode 100644 include/uapi/linux/xdp.h
 create mode 100644 net/core/xdp.c
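
(A hedged usage sketch of the interface described in the commit message: the
.hook/.priv fields follow the elem->hook(elem->priv, xdp) invocation discussed
in the review replies, while the registration function name
xdp_register_dev_hook() is only a guess at the API, not necessarily the name
used in net/xdp.h.)

/* Hedged usage sketch; xdp_register_dev_hook() is an assumed name. */
static int my_xdp_hook(void *priv, struct xdp_buff *xdp)
{
	/* inspect xdp->data .. xdp->data_end and return an XDP_* verdict */
	return XDP_PASS;
}

static struct xdp_hook_ops my_ops = {
	.hook = my_xdp_hook,
	.priv = NULL,
};

/* the register call copies my_ops internally; the &my_ops pointer is what
 * identifies this hook in a later unregister call, e.g.:
 *
 *	err = xdp_register_dev_hook(dev, &my_ops);
 */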

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 1f09c52..2a26133 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -16,6 +16,7 @@
 #include 
 
 #include 
+#include 
 
 #include 
 
@@ -432,11 +433,6 @@ struct bpf_skb_data_end {
void *data_end;
 };
 
-struct xdp_buff {
-   void *data;
-   void *data_end;
-};
-
 /* compute the linear packet data range [data, data_end) which
  * will be accessed by cls_bpf and act_bpf programs
  */
diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index 9c6c8ef..697fdea 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -72,8 +72,8 @@ enum {
NETIF_F_HW_VLAN_STAG_FILTER_BIT,/* Receive filtering on VLAN STAGs */
NETIF_F_HW_L2FW_DOFFLOAD_BIT,   /* Allow L2 Forwarding in Hardware */
NETIF_F_BUSY_POLL_BIT,  /* Busy poll */
-
NETIF_F_HW_TC_BIT,  /* Offload TC infrastructure */
+   NETIF_F_XDP_BIT,/* Support XDP interface */
 
/*
 * Add your fresh new feature above and remember to update
@@ -136,6 +136,7 @@ enum {
 #define NETIF_F_HW_L2FW_DOFFLOAD   __NETIF_F(HW_L2FW_DOFFLOAD)
 #define NETIF_F_BUSY_POLL  __NETIF_F(BUSY_POLL)
 #define NETIF_F_HW_TC  __NETIF_F(HW_TC)
+#define NETIF_F_XDP__NETIF_F(XDP)
 
 #define for_each_netdev_feature(mask_addr, bit)\
for_each_set_bit(bit, (unsigned long *)mask_addr, NETDEV_FEATURE_COUNT)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index a10d8d1..f2b7d1b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -324,6 +324,7 @@ struct napi_struct {
struct sk_buff  *skb;
struct hrtimer  timer;
struct list_headdev_list;
+   struct list_headxdp_hook_list;
struct hlist_node   napi_hash_node;
unsigned intnapi_id;
 };
@@ -819,6 +820,14 @@ enum xdp_netdev_command {
 * return true if a program is currently attached and running.
 */
XDP_QUERY_PROG,
+   /* Initialize XDP in the device. Called the first time an XDP hook
+* is being set on the device.
+*/
+   XDP_DEV_INIT,
+   /* XDP is finished on the device. Called after the last XDP hook
+* has been removed from a device.
+*/
+   XDP_DEV_FINISH,
 };
 
 struct netdev_xdp {
@@ -1663,6 +1672,8 @@ struct net_device {
struct list_headclose_list;
struct list_headptype_all;
struct list_headptype_specific;
+   struct list_headxdp_hook_list;
+   unsigned intxdp_hook_cnt;
 
struct {
struct list_head upper;
diff --git a/include/net/xdp.h b/include/net/xdp.h
new file mode 100644
index 000..c01a44e
--- /dev/null
+++ b/include/net/xdp.h
@@ -0,0 +1,218 @@
+/*
+ * eXpress Data Path (XDP)
+ *
+ * Copyright (c) 2016 Tom Herbert 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation.
+ */
+
+#ifndef __NET_XDP_H_
+#define __NET_XDP_H_
+
+#include 
+#include 
+#include 
+
+/* XDP data structure.
+ *
+ * Fields:
+ *   data - pointer to first byte of data
+ *   data_end - pointer to last byte
+ *