Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
On 16-09-23 10:14 AM, Tom Herbert wrote: On Fri, Sep 23, 2016 at 4:13 AM, Jamal Hadi Salim wrote: On 16-09-20 06:00 PM, Tom Herbert wrote: Tom, perused the thread and it seems you are serious ;-> Are we heading towards Frankenstein Avenue? I'm surprised. I thought you'd like democratizing an interface. :-) I do ;-> The challenge here is the worry about the classical camel's nose metaphor[1]. The original intent was for simplicity. Do one very simple thing. Do it fast. Note: this is code that is sitting in a critical driver path [Democratizing the interfaces is a bigger scope discussion we can have not just for XDP but tc, netfilter etc.] The whole point behind letting in XDP is so that _small programs_ can be written to do some quick thing. The eBPF 4K program limit was touted as the check and bound. eBPF sounded fine. This sounds like a huge contradiction. Jamal, thanks for the feedback. IMO, XDP != BPF. XDP is an interface in drivers that allows raw packet processing in the receive path. This is akin to DPDK in the kernel. Understood. Most of these DPDK apps do simple things. Ex: showing ILA forwarding with high throughput on a single core - which can be achieved with XDP as is right now. With DPDK you start doing more generic things and shit starts slowing down. We can have the best of both worlds in Linux (with 2-3 other modular interfaces above XDP). Getting both {performance, flexibility} is hard. Add a third dimension of usability and everything starts spiralling on the z-axis. i.e. pick one and a quarter of those 3 features; anything else is an art project. Example: once you start adding flexibility we will no doubt need a northbound control interface, etc. So I see XDP as this "fast and specific blob" and things above are more generic. Our biggest issue, as the netmap and other folks have shown, is the obesity of the skb.
On XDP, I would bet things get faster because there is no skb - the danger is turning it into "move your stack to XDP to avoid skbs" > BPF is a critical use case of the interface. As for the 4K limit that still allows for a lot of damage and abuse that the user can do with a loadable program and, yes, code in the kernel can do significant damage as well. But we have at least a handwavy insurance limit. A determined person could circumvent the limit - but it is more a deterrent than enforcement. However, unlike BPF programs being set from userspace, kernel code has a well-established process for acceptance, any suggested code gets review, and if it's going "Frankenstein Avenue" I'd expect it to be rejected. I get a little worried that we are going to start artificially limiting what we can do in the kernel on the basis that we don't trust kernel programmers to be competent enough not to abuse kernel interfaces-- the fact that we are enabling userspace to arbitrarily program the kernel but purposely limiting what the kernel itself can do would be a huge irony to me. My concern is not incompetence. We already allow people to shoot their little finger if they want to. And just to be clear I am not concerned with allowing other people to save the world their way ;-> So let's say we said eBPF was one use case: If you wrote an XDP "app" in plain C, could you still enforce this limit so bigcorp doesn't start building islands around Linux? Again, the camel metaphor applies. Note, comparison with kernel modules at tc or netfilter hooks doesn't apply because: a) it is a lot harder to avoid kernel interfaces (skbs etc) in those places. XDP for example totally avoids the skb. So contributing to the Linux base is a given with kernel modules. b) Programs that are usable as kernel modules tend to be geared towards inclusion into the kernel because of the complexity. It is more maintenance to keep them off-tree.
XDP is such a simple interface that the opportunity to move a whole DPDK-like industry into the kernel is huge (to the detriment of Linux kernel networking). cheers, jamal
Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
On 16-09-23 09:00 AM, Jesper Dangaard Brouer wrote: On Fri, 23 Sep 2016 07:13:30 -0400 Jamal Hadi Salim wrote: [..] Tom, perused the thread and it seems you are serious ;-> Are we heading towards Frankenstein Avenue? The whole point behind letting in XDP is so that _small programs_ can be written to do some quick thing. The eBPF 4K program limit was touted as the check and bound. eBPF sounded fine. This sounds like a huge contradiction. cheers, jamal Hi Jamal, I don't understand why you think this is so controversial. The way I see it (after reading the thread): This is about allowing kernel components to _also_ use the XDP hook. The initial push was for XDP to support very small programs that did very simple things fast (to be extreme: for example running a whole network stack is a no-no). eBPF with the 4K program limit was pointed to as the limit. What Tom is presenting implies this constraint is now being removed. That's the controversy. cheers, jamal
Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
On 9/23/16 6:00 AM, Jesper Dangaard Brouer wrote: To support Tom's case, with eBPF, I think we would have to implement specific eBPF helper functions that can do the ILA table lookups. And then need a userspace component to load the eBPF program. Why add all this, when we could contain all this in the kernel, and simply call this as native C-code via the XDP hook? Well, the ILA router was written in BPF already :) Once for tc+clsact and once for XDP. In both cases no extra bpf helpers were needed. Due to this use case we added an optimization to be able to do a map lookup with packet data directly (instead of copying packet bytes to the stack and pointing the map_lookup helper to the stack). Looks like those ila router patches never made it to the list. I'll dig them out and forward.
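The optimization Alexei describes (letting a map lookup take a pointer straight into the packet instead of a stack copy of the key) can be illustrated with a userspace toy; the map, its lookup helper, and both function names below are invented stand-ins, not the BPF API - the real change was in the BPF verifier, which began accepting packet pointers as map-key arguments.

```c
#include <stdint.h>
#include <string.h>

/* Toy map: 8-byte keys compared bytewise, like a BPF hash map key.
 * All names here are illustrative, not real kernel interfaces. */
struct map_entry { uint8_t key[8]; int value; };

static struct map_entry toy_map[] = {
    { { 1, 2, 3, 4, 5, 6, 7, 8 }, 42 },
};

static int *map_lookup(const void *key)
{
    for (size_t i = 0; i < sizeof(toy_map) / sizeof(toy_map[0]); i++)
        if (memcmp(toy_map[i].key, key, 8) == 0)
            return &toy_map[i].value;
    return 0;
}

/* Pre-optimization pattern: copy packet bytes to the stack and
 * point the lookup at the stack buffer. */
int lookup_via_stack_copy(const uint8_t *pkt, size_t off)
{
    uint8_t key[8];
    memcpy(key, pkt + off, 8);
    int *v = map_lookup(key);
    return v ? *v : -1;
}

/* Post-optimization pattern: hand the lookup a pointer straight
 * into the packet buffer, skipping the copy entirely. */
int lookup_direct(const uint8_t *pkt, size_t off)
{
    int *v = map_lookup(pkt + off);
    return v ? *v : -1;
}
```

Both paths return the same value; the second simply avoids the per-packet memcpy, which matters at the packet rates XDP targets.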
Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
On Fri, Sep 23, 2016 at 4:13 AM, Jamal Hadi Salim wrote: > On 16-09-20 06:00 PM, Tom Herbert wrote: >> >> This patch creates an infrastructure for registering and running code at >> XDP hooks in drivers. This is based on the original XDP/BPF and borrows >> heavily from the techniques used by netfilter to make generic nfhooks. >> >> An XDP hook is defined by the xdp_hook_ops. This structure contains the >> ops of an XDP hook. A pointer to this structure is passed into the XDP >> register function to set up a hook. The XDP register function mallocs >> its own xdp_hook_ops structure and copies the values from the >> xdp_hook_ops passed in. The register function also stores the pointer >> value of the xdp_hook_ops argument; this pointer is used in subsequent >> calls to XDP to identify the registered hook. >> >> The interface is defined in net/xdp.h. This includes the definition of >> xdp_hook_ops, functions to register and unregister hook ops on a device >> or individual instances of napi, and xdp_hook_run that is called by >> drivers to run the hooks. >> > > Tom, > perused the thread and it seems you are serious ;-> > Are we heading towards Frankenstein Avenue? I'm surprised. I thought you'd like democratizing an interface. :-) > The whole point behind letting in XDP is so that _small programs_ > can be written to do some quick thing. eBPF 4K program limit was > touted as the check and bound. eBPF sounded fine. > This sounds like a huge contradiction. Jamal, thanks for the feedback. IMO, XDP != BPF. XDP is an interface in drivers that allows raw packet processing in the receive path. This is akin to DPDK in the kernel. BPF is a critical use case of the interface. As for the 4K limit that still allows for a lot of damage and abuse that the user can do with a loadable program and, yes, code in the kernel can do significant damage as well.
However, unlike BPF programs being set from userspace, kernel code has a well established process for acceptance, any suggested code gets review, and if it's going "Frankenstein Avenue" I'd expect it to be rejected. I get a little worried that we are going to start artificially limiting what we can do in the kernel on the basis that we don't trust kernel programmers to be competent enough not to abuse kernel interfaces-- the fact that we are enabling userspace to arbitrarily program the kernel but purposely limiting what the kernel itself can do would be a huge irony to me. Thanks, Tom > > cheers, > jamal > >
Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
On Fri, 23 Sep 2016 07:13:30 -0400 Jamal Hadi Salim wrote: > On 16-09-20 06:00 PM, Tom Herbert wrote: > > This patch creates an infrastructure for registering and running code at > > XDP hooks in drivers. This is based on the original XDP/BPF and borrows > > heavily from the techniques used by netfilter to make generic nfhooks. > > > > An XDP hook is defined by the xdp_hook_ops. This structure contains the > > ops of an XDP hook. A pointer to this structure is passed into the XDP > > register function to set up a hook. The XDP register function mallocs > > its own xdp_hook_ops structure and copies the values from the > > xdp_hook_ops passed in. The register function also stores the pointer > > value of the xdp_hook_ops argument; this pointer is used in subsequent > > calls to XDP to identify the registered hook. > > > > The interface is defined in net/xdp.h. This includes the definition of > > xdp_hook_ops, functions to register and unregister hook ops on a device > > or individual instances of napi, and xdp_hook_run that is called by > > drivers to run the hooks. > > > > Tom, > perused the thread and it seems you are serious ;-> > Are we heading towards Frankenstein Avenue? > The whole point behind letting in XDP is so that _small programs_ > can be written to do some quick thing. eBPF 4K program limit was > touted as the check and bound. eBPF sounded fine. > This sounds like a huge contradiction. > > cheers, > jamal Hi Jamal, I don't understand why you think this is so controversial. The way I see it (after reading the thread): This is about allowing kernel components to _also_ use the XDP hook. I believe Tom has a valid use-case in ILA. The NVE (Network Virtualization Edge) component very naturally fits in the XDP hook, as it only needs to look at the IPv6 address and do a table lookup (in an ILA-specific data structure) to see if this packet is for local stack delivery or forwarding. For forwarding it does not need to take the performance hit of allocating SKBs etc.
You can see it as a way to accelerate the NVE component. I can imagine it could be done in approx 10-20 lines of code, as it would use the existing ILA lookup function calls. AFAIK Thomas Graf also sees XDP as an acceleration for cls_bpf, but he is lucky because his code is already eBPF based. To support Tom's case, with eBPF, I think we would have to implement specific eBPF helper functions that can do the ILA table lookups. And then need a userspace component to load the eBPF program. Why add all this, when we could contain all this in the kernel, and simply call this as native C-code via the XDP hook? Notice, this is not about throwing eBPF out. Using eBPF is _very_ essential for XDP. -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat Author of http://www.iptv-analyzer.org LinkedIn: http://www.linkedin.com/in/brouer
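The 10-20 line estimate above is plausible; a rough userspace sketch of what such a native hook body could look like follows. The table layout, the lookup helper, and every name here are hypothetical stand-ins for the in-kernel ILA code, which Jesper says already exists - this only illustrates the shape of the translation step.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical ILA mapping: identifier (low 64 bits of the IPv6
 * destination) -> locator (replacement high 64 bits). */
struct ila_entry { uint64_t identifier; uint64_t locator; };

static struct ila_entry ila_table[] = {
    { 0x0000000000000001ULL, 0x20010db800010000ULL },
};

static const struct ila_entry *ila_lookup(uint64_t identifier)
{
    for (size_t i = 0; i < sizeof(ila_table) / sizeof(ila_table[0]); i++)
        if (ila_table[i].identifier == identifier)
            return &ila_table[i];
    return 0;
}

/* Hook body: read the low 64 bits of the IPv6 destination, look the
 * identifier up, and on a hit rewrite the high 64 bits to the locator.
 * Returns 1 if translated (forward), 0 if not ours (pass to stack). */
int ila_xlat_hook(uint8_t *ipv6_dst /* 16 bytes */)
{
    uint64_t id;
    memcpy(&id, ipv6_dst + 8, 8);   /* identifier half */
    const struct ila_entry *e = ila_lookup(id);
    if (!e)
        return 0;
    memcpy(ipv6_dst, &e->locator, 8);   /* locator half */
    return 1;
}
```

The point of doing this at the XDP hook, per the message above, is that the miss/forward decision happens before any skb is allocated.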
Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
On 16-09-20 06:00 PM, Tom Herbert wrote: This patch creates an infrastructure for registering and running code at XDP hooks in drivers. This is based on the original XDP/BPF and borrows heavily from the techniques used by netfilter to make generic nfhooks. An XDP hook is defined by the xdp_hook_ops. This structure contains the ops of an XDP hook. A pointer to this structure is passed into the XDP register function to set up a hook. The XDP register function mallocs its own xdp_hook_ops structure and copies the values from the xdp_hook_ops passed in. The register function also stores the pointer value of the xdp_hook_ops argument; this pointer is used in subsequent calls to XDP to identify the registered hook. The interface is defined in net/xdp.h. This includes the definition of xdp_hook_ops, functions to register and unregister hook ops on a device or individual instances of napi, and xdp_hook_run that is called by drivers to run the hooks. Tom, perused the thread and it seems you are serious ;-> Are we heading towards Frankenstein Avenue? The whole point behind letting in XDP is so that _small programs_ can be written to do some quick thing. The eBPF 4K program limit was touted as the check and bound. eBPF sounded fine. This sounds like a huge contradiction. cheers, jamal
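The registration scheme the patch summary describes (register mallocs a private copy of the caller's xdp_hook_ops, and the caller's original pointer names the hook in later calls) can be sketched as a userspace simulation. The struct layout, verdict names, and function bodies below are guesses from the summary alone, not the actual RFC code in net/xdp.h.

```c
#include <stdlib.h>

/* Illustrative verdicts, loosely mirroring XDP actions. */
enum xdp_verdict { XDP_HOOK_PASS, XDP_HOOK_DROP };

struct xdp_buff { void *data; size_t len; };

/* Hypothetical shape of the hook ops described in the summary. */
struct xdp_hook_ops {
    int (*hook)(void *priv, struct xdp_buff *xdp);
    void *priv;
    /* internal: list linkage plus the caller's original pointer,
     * which later calls use to identify this hook */
    struct xdp_hook_ops *next;
    const struct xdp_hook_ops *user_ops;
};

static struct xdp_hook_ops *hook_list;

/* Register: malloc a private copy and remember the caller's pointer. */
int xdp_register_hook(const struct xdp_hook_ops *ops)
{
    struct xdp_hook_ops *copy = malloc(sizeof(*copy));
    if (!copy)
        return -1;
    *copy = *ops;
    copy->user_ops = ops;
    copy->next = hook_list;
    hook_list = copy;
    return 0;
}

/* Unregister: find the copy by the caller's original pointer. */
void xdp_unregister_hook(const struct xdp_hook_ops *ops)
{
    for (struct xdp_hook_ops **p = &hook_list; *p; p = &(*p)->next) {
        if ((*p)->user_ops == ops) {
            struct xdp_hook_ops *dead = *p;
            *p = dead->next;
            free(dead);
            return;
        }
    }
}

/* Run all hooks, netfilter-style: first DROP verdict wins. */
int xdp_hook_run(struct xdp_buff *xdp)
{
    for (struct xdp_hook_ops *h = hook_list; h; h = h->next)
        if (h->hook(h->priv, xdp) == XDP_HOOK_DROP)
            return XDP_HOOK_DROP;
    return XDP_HOOK_PASS;
}

/* Example consumer used below: drop runt frames. */
static int drop_runts(void *priv, struct xdp_buff *xdp)
{
    (void)priv;
    return xdp->len < 64 ? XDP_HOOK_DROP : XDP_HOOK_PASS;
}
```

The linked-list walk in xdp_hook_run is exactly the per-packet cost debated later in this thread.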
Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
On Thu, 2016-09-22 at 15:14 +0200, Jesper Dangaard Brouer wrote: > On Wed, 21 Sep 2016 21:56:58 +0200 > Jesper Dangaard Brouer wrote: > > > > > I'm not opposed to running non-BPF code at XDP. I'm against adding > > > > a linked list of hook consumers. > > > > I also worry about the performance impact of a linked list. We should > > simply benchmark it instead of discussing it! ;-) > > (Note, there are some stability issues with this RFC patchset, when > removing the xdp program, that I had to workaround/patch) > > > I've started benchmarking this and I only see added cost of 2.89ns from > these patches, at these crazy speeds it does correspond to -485Kpps. I claim the methodology is too biased. At full speed, all the extra code is hot in caches, and your core has full access to the memory bus anyway. Even the branch predictor has fresh information. Now, in a mixed workload, where all cores compete to access L2/L3 and RAM, things might be very different. Testing icache/dcache pressure is not a matter of measuring how many Kpps you add or remove on a hot path. A latency test, when other cpus are busy reading/writing all over memory, and your caches are cold, would be useful.
Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
On Wed, 21 Sep 2016 21:56:58 +0200 Jesper Dangaard Brouer wrote: > > > I'm not opposed to running non-BPF code at XDP. I'm against adding > > > a linked list of hook consumers. > > I also worry about the performance impact of a linked list. We should > simply benchmark it instead of discussing it! ;-) (Note, there are some stability issues with this RFC patchset, when removing the xdp program, that I had to workaround/patch) I've started benchmarking this and I only see an added cost of 2.89ns from these patches, at these crazy speeds it does correspond to -485Kpps. I was really expecting to see a higher cost of this approach. I tested this on two different machines. One was supposed to work with DDIO, but I could not get DDIO working on that machine (resulting in a max 12.7Mpps drop), even though the mlx5 card does work with DDIO. I even removed the mlx5 and used the same slot, but no luck. (A side-note: I also measured a 16ns performance difference depending on which PCIe slot I'm using). The reason I wanted to benchmark this on a DDIO machine is that I suspect the added cost could be hiding behind the cache miss. Well, I'm running out of time benchmarking this stuff, I must prepare for my Network Performance Workshop ;-) (A side-note: my skylake motherboard also had a PCI slot, so I found an old e1000 NIC in my garage, and it worked!) -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat Author of http://www.iptv-analyzer.org LinkedIn: http://www.linkedin.com/in/brouer
Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
On Wed, 21 Sep 2016 08:08:34 -0700 Tom Herbert wrote: > On Wed, Sep 21, 2016 at 7:48 AM, Thomas Graf wrote: > > On 09/21/16 at 07:19am, Tom Herbert wrote: > >> certain design because of constraints on one kernel interface. As > >> a kernel developer I want flexibility on how we design and implement > >> things! > > > > Perfectly valid argument. I reviewed your ILA changes and did not > > object to them. > > > > > >> I think there are two questions that this patch set poses for the > >> community wrt XDP: > >> > >> #1: Should we allow alternate code to run in XDP other than BPF? > >> #2: If #1 is true what is the best way to implement that? > >> > >> If the answer to #1 is "no" then the answer to #2 is irrelevant. So > >> with this RFC I'm hoping we can come to agreement on question #1. I vote yes to #1. > > I'm not opposed to running non-BPF code at XDP. I'm against adding > > a linked list of hook consumers. I also worry about the performance impact of a linked list. We should simply benchmark it instead of discussing it! ;-) > > Would anyone require to run XDP-BPF in combination with ILA? Or XDP-BPF > > in combination with a potential XDP-nftables? We don't know yet I > > guess. > > > Right. Admittedly, I feel like we owe a bit of reciprocity to > nftables. For ILA we are using the NF_INET_PRE_ROUTING hook with our > own code (looks like ipvlan set nfhooks as well). This works really > well and saves the value of early demux in ILA. Had we not had the > ability to use nfhooks in this fashion it's likely we would have had > to create another hook (we did try putting translation in nftables > rules but that was too inefficient for ILA). Thinking about it, I actually think Tom is proposing a very valid user of the XDP hook, which is the kernel itself. And Tom even has a real first user, ILA.
The way I read the ILA-RFC-draft[1], the XDP hook would benefit the NVE (Network Virtualization Edge) component, which can run separately or run on the Tenant System, where the latter case could use XDP_PASS. [1] https://www.ietf.org/archive/id/draft-herbert-nvo3-ila-02.txt -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat Author of http://www.iptv-analyzer.org LinkedIn: http://www.linkedin.com/in/brouer
Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
On 09/21/16 at 11:50am, Tom Herbert wrote: > 50 lines in one driver is not a big deal, 50 lines in a hundred > drivers is! I learned this lesson in BQL which was well abstracted out > to be minimally invasive but we still saw many issues because of the > pecularities of different drivers. You want to enable XDP in a hundred drivers? Are you planning to deploy ISA NIC based ILA routers? ;-)
Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
On Wed, 21 Sep 2016 11:50:06 -0700, Tom Herbert wrote: > On Wed, Sep 21, 2016 at 11:45 AM, Jakub Kicinski wrote: > > On Wed, 21 Sep 2016 10:39:40 -0700, Tom Herbert wrote: > >> On Wed, Sep 21, 2016 at 10:26 AM, Jakub Kicinski wrote: > >> > On Tue, 20 Sep 2016 17:01:39 -0700, Alexei Starovoitov wrote: > >> >> > - Reduces the amount of code and complexity needed in drivers to > >> >> >manage XDP > >> >> > >> >> hmm: > >> >> 534 insertions(+), 144 deletions(-) > >> >> looks like increase in complexity instead. > >> > > >> > and more to come to tie this with HW offloads. > >> > >> The amount of driver code did decrease with these patches: > >> > >> drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 64 > >> -- > >> drivers/net/ethernet/mellanox/mlx4/en_rx.c | 25 -- > >> drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 1 - > >> > >> Minimizing complexity being added to drivers for XDP is critical since > >> we are basically asking every driver to replicate the function. This > >> property should also apply to HW offloads, the more complexity we > >> can abstract out drivers into a common backend infrastructure the > >> better for supporting across different drivers. > > > > I'm in the middle of writing/testing XDP support for the Netronome's > > driver and generic infra is very much appreciated ;) In my experience > > the 50 lines of code which are required for assigning the programs and > > freeing them aren't really a big deal, though. > > > > 50 lines in one driver is not a big deal, 50 lines in a hundred > drivers is! I learned this lesson in BQL which was well abstracted out > to be minimally invasive but we still saw many issues because of the > peculiarities of different drivers. Agreed, I just meant to say that splitting rings and rewriting the RX path to behave differently for the XDP vs non-XDP case is way more brain consuming than a bit of boilerplate code so if anyone could solve those two it would be much appreciated :) My main point was what I wrote below, though.
> > Let's also separate putting xdp_prog in netdevice/napi_struct from the > > generic hook infra. All the simplifications to the driver AFAICS come > > from the former. If everyone is fine with growing napi_struct we can do > > that but IMHO this is not an argument for the generic infra :)
Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
On Wed, Sep 21, 2016 at 11:45 AM, Jakub Kicinski wrote: > On Wed, 21 Sep 2016 10:39:40 -0700, Tom Herbert wrote: >> On Wed, Sep 21, 2016 at 10:26 AM, Jakub Kicinski wrote: >> > On Tue, 20 Sep 2016 17:01:39 -0700, Alexei Starovoitov wrote: >> >> > - Reduces the amount of code and complexity needed in drivers to >> >> >manage XDP >> >> >> >> hmm: >> >> 534 insertions(+), 144 deletions(-) >> >> looks like increase in complexity instead. >> > >> > and more to come to tie this with HW offloads. >> >> The amount of driver code did decrease with these patches: >> >> drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 64 >> -- >> drivers/net/ethernet/mellanox/mlx4/en_rx.c | 25 -- >> drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 1 - >> >> Minimizing complexity being added to drivers for XDP is critical since >> we are basically asking every driver to replicate the function. This >> property should also apply to HW offloads, the more complexity we >> can abstract out drivers into a common backend infrastructure the >> better for supporting across different drivers. > > I'm in the middle of writing/testing XDP support for the Netronome's > driver and generic infra is very much appreciated ;) In my experience > the 50 lines of code which are required for assigning the programs and > freeing them aren't really a big deal, though. > 50 lines in one driver is not a big deal, 50 lines in a hundred drivers is! I learned this lesson in BQL which was well abstracted out to be minimally invasive but we still saw many issues because of the peculiarities of different drivers. > Let's also separate putting xdp_prog in netdevice/napi_struct from the > generic hook infra. All the simplifications to the driver AFAICS come > from the former. If everyone is fine with growing napi_struct we can do > that but IMHO this is not an argument for the generic infra :)
Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
On Wed, 21 Sep 2016 10:39:40 -0700, Tom Herbert wrote: > On Wed, Sep 21, 2016 at 10:26 AM, Jakub Kicinski wrote: > > On Tue, 20 Sep 2016 17:01:39 -0700, Alexei Starovoitov wrote: > >> > - Reduces the amount of code and complexity needed in drivers to > >> >manage XDP > >> > >> hmm: > >> 534 insertions(+), 144 deletions(-) > >> looks like increase in complexity instead. > > > > and more to come to tie this with HW offloads. > > The amount of driver code did decrease with these patches: > > drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 64 -- > drivers/net/ethernet/mellanox/mlx4/en_rx.c | 25 -- > drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 1 - > > Minimizing complexity being added to drivers for XDP is critical since > we are basically asking every driver to replicate the function. This > property should also apply to HW offloads, the more complexity we > can abstract out drivers into a common backend infrastructure the > better for supporting across different drivers. I'm in the middle of writing/testing XDP support for the Netronome's driver and generic infra is very much appreciated ;) In my experience the 50 lines of code which are required for assigning the programs and freeing them aren't really a big deal, though. Let's also separate putting xdp_prog in netdevice/napi_struct from the generic hook infra. All the simplifications to the driver AFAICS come from the former. If everyone is fine with growing napi_struct we can do that but IMHO this is not an argument for the generic infra :)
Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
On Wed, Sep 21, 2016 at 10:26 AM, Jakub Kicinski wrote: > On Tue, 20 Sep 2016 17:01:39 -0700, Alexei Starovoitov wrote: >> > - Reduces the amount of code and complexity needed in drivers to >> >manage XDP >> >> hmm: >> 534 insertions(+), 144 deletions(-) >> looks like increase in complexity instead. > > and more to come to tie this with HW offloads. The amount of driver code did decrease with these patches: drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 64 -- drivers/net/ethernet/mellanox/mlx4/en_rx.c | 25 -- drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 1 - Minimizing complexity being added to drivers for XDP is critical since we are basically asking every driver to replicate the function. This property should also apply to HW offloads, the more complexity we can abstract out drivers into a common backend infrastructure the better for supporting across different drivers. Tom
Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
On Tue, 20 Sep 2016 17:01:39 -0700, Alexei Starovoitov wrote: > > - Reduces the amount of code and complexity needed in drivers to > >manage XDP > > hmm: > 534 insertions(+), 144 deletions(-) > looks like increase in complexity instead. and more to come to tie this with HW offloads.
Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
On Wed, 21 Sep 2016 13:55:45 +0200, Thomas Graf wrote: > > I am looking at using this for the ILA router. The problem I am hitting is > > that not all packets that we need to translate go through the XDP > > path. Some would go through the kernel path, some through XDP path but > When you say kernel path, what do you mean specifically? One aspect of > XDP I love is that XDP can act as an acceleration option for existing > BPF programs attached to cls_bpf. Support for direct packet read and > write at clsact level has made it straightforward to write programs > which are compatible or at minimum share a lot of common code. They > can share data structures, lookup functionality, etc. My very humble dream was that XDP would be transparently offloaded from cls_bpf if the program was simple enough. That ship has most likely sailed because XDP has different abort behaviour. When possible though, trying to offload higher-level hooks when the rules don't require access to the full skb would be really cool.
Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
On 9/20/16 11:39 PM, Jesper Dangaard Brouer wrote: We are in the early stages of XDP development. Users cannot consider XDP a stable UAPI yet. I added a big fat warning to the docs here[1]. If you already consider this a stable API, then I will suggest that we disable XDP or rip the whole thing out again!!! the doc that you wrote is a great description of your understanding of what XDP is about. It's not an official spec or design document. Until it is reviewed and lands in kernel.org, please do not make any assumption about present or future XDP API based on it.
Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
On 9/21/16 7:19 AM, Tom Herbert wrote: #1: Should we allow alternate code to run in XDP other than BPF? separate nft hook - yes generic hook - no since it's one step away from kernel modules abusing this hook. pass/drop/tx of raw buffer at the driver level is a perfect interface to bypass everything in the stack. The tighter we make it the better. If nft and bpf are both not flexible enough to express dataplane functionality we should extend them instead of writing C code or kernel modules. On the bpf side we're trying very hard to kill any dream of interoperability with kernel modules. The map and prog type registration is done in a way to make it impossible for kernel modules to register their own map and program types or provide their own helper functions. The nfhooks approach is very lax at that and imo it was a mistake, since there are plenty of out-of-tree modules using nf hooks and plenty of in-tree modules that are barely maintained. #2: If #1 is true what is the best way to implement that? Add a separate nft hook that doesn't interfere in any way with the bpf hook at the xdp level. The order nft-first or bpf-first or exclusive attach doesn't matter to me. These are details to be discussed.
Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
On Wed, Sep 21, 2016 at 7:48 AM, Thomas Graf wrote: > On 09/21/16 at 07:19am, Tom Herbert wrote: >> certain design because of constraints on one kernel interface. As >> a kernel developer I want flexibility on how we design and implement >> things! > > Perfectly valid argument. I reviewed your ILA changes and did not > object to them. > > >> I think there are two questions that this patch set poses for the >> community wrt XDP: >> >> #1: Should we allow alternate code to run in XDP other than BPF? >> #2: If #1 is true what is the best way to implement that? >> >> If the answer to #1 is "no" then the answer to #2 is irrelevant. So >> with this RFC I'm hoping we can come to agreement on question #1. > > I'm not opposed to running non-BPF code at XDP. I'm against adding > a linked list of hook consumers. > > Would anyone require to run XDP-BPF in combination with ILA? Or XDP-BPF > in combination with a potential XDP-nftables? We don't know yet I > guess. > Right. Admittedly, I feel like we owe a bit of reciprocity to nftables. For ILA we are using the NF_INET_PRE_ROUTING hook with our own code (looks like ipvlan set nfhooks as well). This works really well and saves the value of early demux in ILA. Had we not had the ability to use nfhooks in this fashion it's likely we would have had to create another hook (we did try putting translation in nftables rules but that was too inefficient for ILA). > Maybe exclusive access to the hook for one consumer as selected by > the user is good enough. > > If that is not good enough: BPF (and potentially nftables in the > future) could provide means to perform a selection process where a > helper call can run another XDP prog or return a verdict to trigger > another XDP prog. Definitely more flexible and faster than a linear > list doing if, else if, else if, else if, ... It seems reasonable that the output of one program may be an indication to invoke another program.
We've already talked about something like that in regards to splitting BPF programs into device independent program and device dependent program. Tom
Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
On 09/21/16 at 07:19am, Tom Herbert wrote: > certain design because of constraints on one kernel interface. As > a kernel developer I want flexibility on how we design and implement > things! Perfectly valid argument. I reviewed your ILA changes and did not object to them. > I think there are two questions that this patch set poses for the > community wrt XDP: > > #1: Should we allow alternate code to run in XDP other than BPF? > #2: If #1 is true what is the best way to implement that? > > If the answer to #1 is "no" then the answer to #2 is irrelevant. So > with this RFC I'm hoping we can come to agreement on question #1. I'm not opposed to running non-BPF code at XDP. I'm against adding a linked list of hook consumers. Would anyone need to run XDP-BPF in combination with ILA? Or XDP-BPF in combination with a potential XDP-nftables? We don't know yet I guess. Maybe exclusive access to the hook for one consumer as selected by the user is good enough. If that is not good enough: BPF (and potentially nftables in the future) could provide means to perform a selection process where a helper call can run another XDP prog or return a verdict to trigger another XDP prog. Definitely more flexible and faster than a linear list doing if, else if, else if, else if, ...
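Thomas's alternative above - a single hook consumer whose verdict can select a follow-on program, rather than a linear list of if/else-if checks - can be sketched in userspace C. The verdict encoding, table, and every name here are invented for illustration; in real BPF the closest analogue would be a tail call into a program array.

```c
#include <stddef.h>

enum { VERDICT_PASS = 0, VERDICT_DROP = 1, VERDICT_CHAIN_BASE = 100 };

typedef int (*xdp_prog_t)(void *pkt, size_t len);

/* A verdict >= VERDICT_CHAIN_BASE means "run program number
 * (verdict - VERDICT_CHAIN_BASE) next", akin to a BPF tail call. */
#define MAX_PROGS 8
static xdp_prog_t prog_table[MAX_PROGS];

int xdp_dispatch(void *pkt, size_t len)
{
    int verdict = prog_table[0] ? prog_table[0](pkt, len) : VERDICT_PASS;
    while (verdict >= VERDICT_CHAIN_BASE) {
        int idx = verdict - VERDICT_CHAIN_BASE;
        if (idx >= MAX_PROGS || !prog_table[idx])
            return VERDICT_PASS;   /* broken chain: fail open */
        verdict = prog_table[idx](pkt, len);
    }
    return verdict;
}

/* Demo: prog 0 classifies, handing IPv6-looking packets to slot 1
 * (imagine an ILA translator lives there); everything else passes. */
static int demo_classifier(void *pkt, size_t len)
{
    (void)len;
    return (((unsigned char *)pkt)[0] >> 4) == 6
           ? VERDICT_CHAIN_BASE + 1 : VERDICT_PASS;
}

static int demo_ila(void *pkt, size_t len)
{
    (void)pkt; (void)len;
    return VERDICT_DROP;   /* stand-in for translate-and-forward */
}
```

Compared with a linked list, only the programs a packet actually needs are run, and the first consumer stays in full control of the chain.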
Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
On Wed, Sep 21, 2016 at 4:55 AM, Thomas Graf wrote: > On 09/20/16 at 04:59pm, Tom Herbert wrote: >> Well, need to measure to ascertain the cost. As for complexity, this >> actually reduces complexity needed for XDP in the drivers which is a >> good thing because that's where most of the support and development >> pain will be. > > I'm not objecting to anything that simplifies the process of adding > XDP capability to drivers. You have my full support here. > >> I am looking at using this for the ILA router. The problem I am hitting is >> that not all packets that we need to translate go through the XDP >> path. Some would go through the kernel path, some through XDP path but > > When you say kernel path, what do you mean specifically? One aspect of > XDP I love is that XDP can act as an acceleration option for existing > BPF programs attached to cls_bpf. Support for direct packet read and > write at clsact level has made it straightforward to write programs > which are compatible or at minimum share a lot of common code. They > can share data structures, lookup functionality, etc. > >> We can optimize for allowing only one hook, or maybe limit to only >> allowing one hook to be set. In any case this obviously requires a lot >> of performance evaluation, I am hoping for feedback on the design >> first. My question about using a linear list for this was real, do you >> know a better method off hand to implement a call list? > > My main concern is that we overload the XDP hook. Instead of making use > of the programmable glue, we put a linked list in front where everybody > can attach a program to. > > A possible alternative: > 1. The XDP hook always has a single consumer controlled by the user > through Netlink, BPF is one of them. If a user wants to hardcode > the ILA router to that hook, he can do that. > > 2. BPF for XDP is extended to allow returning a verdict which results > in something else to be invoked.
If user wants to invoke the ILA > router for just some packets, he can do that. > > That said, I see so much value in a BPF implementation of ILA at XDP > level with all of the parsing logic and exact semantics remain > flexible without the cost of translating some configuration to a set > of actions. Thomas, As I mentioned though, not all packets we need to translate are going to go through XDP. ILA already integrates with the routing table via LWT and that solution works incredibly well especially when we are sourcing connections (ILA is by far the fastest most efficient technique of network virtualization). We already have a lookup table and the translation process coded in the kernel and configuration is already implemented by netlink and iproute. So I'm not yet sure I want to redesign all of this around BPF, maybe that is the right answer maybe it isn't; but, my point is I don't want to be _forced_ into a certain design that because of constraints on one kernel interface. As a kernel developer I want flexibility on how we design and implement things! I think there are two questions that this patch set poses for the community wrt XDP: #1: Should we allow alternate code to run in XDP other than BPF? #2: If #1 is true what is the best way to implement that? If the answer to #1 is "no" then the answer to #2 is irrelevant. So with this RFC I'm hoping we can come the agreement on questions #1. IMO, there are at least three factors that would motivate the answer to #1 be yes 1) There is precedence in nfhooks in creating generic hooks in the critical data path that can have other uses than those originally intended by the authors. I would take it as general principle that hooks in the data path should be as generic as possible to get the most utility. 2) The changes to make XDP work in drivers are quite invasive and as we've seen in at least one attempt to modify a driver to support XDP it is easy to break other mechanisms. 
If we're going to be modifying a lot of drivers we really want to get the most value for that work. 3) The XDP interface is well designed and really has no dependencies on BPF. Writing code to directly parse packets is not rocket science. So it seems reasonable to me that a subsystem may want to use XDP interface to accelerate existing functionality without introducing additional dependencies on BPF or other non-kernel mechanisms In any case, I ask everyone to be open minded. If there is agreement that BPF is sufficient for all XDP use cases then I wont' pursue these patches. But given the ramifications of XDP, especially the considering the changes required across drivers to support it, I really would like to get input from the community on this. Thanks, Tom
Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
On 09/20/16 at 04:59pm, Tom Herbert wrote:
> Well, need to measure to ascertain the cost. As for complexity, this
> actually reduces complexity needed for XDP in the drivers which is a
> good thing because that's where most of the support and development
> pain will be.

I'm not objecting to anything that simplifies the process of adding
XDP capability to drivers. You have my full support here.

> I am looking at using this for ILA router. The problem I am hitting is
> that not all packets that we need to translate go through the XDP
> path. Some would go through the kernel path, some through XDP path but

When you say kernel path, what do you mean specifically? One aspect of
XDP I love is that XDP can act as an acceleration option for existing
BPF programs attached to cls_bpf. Support for direct packet read and
write at clsact level has made it straightforward to write programs
which are compatible or at minimum share a lot of common code. They
can share data structures, lookup functionality, etc.

> We can optimize for allowing only one hook, or maybe limit to only
> allowing one hook to be set. In any case this obviously requires a lot
> of performance evaluation, I am hoping to get feedback on the design
> first. My question about using a linear list for this was real, do you
> know a better method off hand to implement a call list?

My main concern is that we overload the XDP hook. Instead of making use
of the programmable glue, we put a linked list in front where everybody
can attach a program to.

A possible alternative:

1. The XDP hook always has a single consumer controlled by the user
   through Netlink, BPF is one of them. If a user wants to hardcode
   the ILA router to that hook, he can do that.

2. BPF for XDP is extended to allow returning a verdict which results
   in something else to be invoked. If a user wants to invoke the ILA
   router for just some packets, he can do that.

That said, I see so much value in a BPF implementation of ILA at the
XDP level, with all of the parsing logic and exact semantics remaining
flexible without the cost of translating some configuration to a set
of actions.
Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
On 09/21/2016 08:39 AM, Jesper Dangaard Brouer wrote:
> On Tue, 20 Sep 2016 17:01:39 -0700 Alexei Starovoitov wrote:
>>> - Provides a more structured environment that is extensible to new
>>> features while being mostly transparent to the drivers
>>
>> don't see that in these patches either.
>> Things like packet size change (that we're working on) still
>> has to be implemented for every driver.
>> Existing XDP_TX, XDP_DROP have to be implemented per driver as well.
>>
>> Also introduction of xdp.h breaks existing UAPI.
>> That's not acceptable either.
>
> This response piss me off!@@#$ We are the early stages of XDP
> development. Users cannot consider XDP a stable UAPI yet. I added a
> big fat warning to the docs here[1]. If you already consider this a
> stable API, then I will suggest that we disable XDP or rip the hole
> thing out again!!! Create a separate tree where we can cooperate on
> getting this right, before forcing this upstream. I have raise
> concern about this several times upstream, but the patches got
> applied anyway, because you Alexei, promised this was super
> extendable and we could still change the APIs later. Maybe you
> tricked me?! I've started to look at the details, and I'm not happy
> with the extensibility.

I don't quite follow you here, Jesper. If you're talking about
uapi-related extensibility wrt struct xdp_md, then it's done the same
way as done for tc with the 'shadow' struct __sk_buff. The concept of
having this internal BPF insn rewriter is working quite well for this,
and it is extendable with new meta data. Wrt return codes we're
flexible to add new ones once agreed upon. The whole XDP config is
done via netlink, nested in an IFLA_XDP container, so it can be
extended in future with other attrs, flags, etc, for setup and
dumping. The 'breakage' that was mentioned here was related to moving
things into xdp.h. It can be done and we did it similarly in the past
with bpf_common.h, but then really also bpf.h needs to include the new
xdp.h header.
Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
On Tue, 20 Sep 2016 17:01:39 -0700 Alexei Starovoitov wrote:
>> - Provides a more structured environment that is extensible to new
>> features while being mostly transparent to the drivers
>
> don't see that in these patches either.
> Things like packet size change (that we're working on) still
> has to be implemented for every driver.
> Existing XDP_TX, XDP_DROP have to be implemented per driver as well.
>
> Also introduction of xdp.h breaks existing UAPI.
> That's not acceptable either.

This response pisses me off!@@#$ We are in the early stages of XDP
development. Users cannot consider XDP a stable UAPI yet. I added a big
fat warning to the docs here[1]. If you already consider this a stable
API, then I will suggest that we disable XDP or rip the whole thing out
again!!! Create a separate tree where we can cooperate on getting this
right, before forcing this upstream.

I have raised concerns about this several times upstream, but the
patches got applied anyway, because you, Alexei, promised this was
super extendable and we could still change the APIs later. Maybe you
tricked me?! I've started to look at the details, and I'm not happy
with the extensibility. And I'm also not happy with Brenden's response
to my API concerns that this is just a "fire-and-forget" API.

Most importantly, the XDP interface for feature or capabilities
negotiation is missing. Documented here[2]. I strongly believe the two
actions XDP_DROP and XDP_TX should have been expressed as two different
capabilities, because XDP_TX requires more changes to the device driver
than a simple drop like XDP_DROP. One can easily imagine (after the
e1000 discussion) that an older driver only wants to implement the
XDP_DROP facility, because XDP_TX would require changing too much
driver code, which is a concern for an old, stable and time-proven
driver. Thus, the need for negotiating features is already a practical
problem!
[1] https://prototype-kernel.readthedocs.io/en/latest/networking/XDP/implementation/userspace_api.html Quote: "Warning: The userspace API specification should have to be defined properly before code was accepted upstream. Concerns have been raise about the current API upstream. Users should expect this first API attempt will need adjustments. This cannot be considered a stable API yet." [2] https://prototype-kernel.readthedocs.io/en/latest/networking/XDP/design/design.html#capabilities-negotiation -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat Author of http://www.iptv-analyzer.org LinkedIn: http://www.linkedin.com/in/brouer
Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
On 9/20/16 4:59 PM, Tom Herbert wrote:
> I am looking at using this for ILA router. The problem I am hitting is
> that not all packets that we need to translate go through the XDP
> path. Some would go through the kernel path, some through XDP path but
> that would mean I need parallel lookup tables to be maintained for the
> two paths which won't scale. ILA translation is so trivial and not
> really something that we need to be user programmable, the fast path
> is really for accelerating an existing kernel capability. If I can
> reuse the kernel code already written and the existing kernel data
> structures to make a fast path in XDP there is a lot of value in that
> for me.

sounds like you want to add a hardcoded ILA rewriter to the driver
instead of doing it as a BPF program?! That is a 180-degree turn vs.
the whole protocol ossification tune that I thought you strongly
believed in.

What kernel data structures do you want to reuse? The ILA rewriter
needs a single hash lookup. Several different types of hash maps exist
on the bpf side already and even more are coming that will be usable by
both the tc and xdp side. csum adjustment? we have them for tc. Not for
xdp yet, but it's trivial to allow them on the xdp side too.

Maybe we should talk about the real motivation for the patches and see
what is the best solution.
Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
On 9/20/16 3:00 PM, Tom Herbert wrote:
> +static inline int __xdp_hook_run(struct list_head *list_head,
> +				 struct xdp_buff *xdp)
> +{
> +	struct xdp_hook_ops *elem;
> +	int ret = XDP_PASS;
> +
> +	list_for_each_entry(elem, list_head, list) {
> +		ret = elem->hook(elem->priv, xdp);
> +		if (ret != XDP_PASS)
> +			break;
> +	}
> +
> +	return ret;
> +}
> +
> +/* Run the XDP hooks for a napi device. Called from a driver's receive
> + * routine
> + */
> +static inline int xdp_hook_run(struct napi_struct *napi, struct xdp_buff *xdp)
> +{
> +	struct net_device *dev = napi->dev;
> +	int ret = XDP_PASS;
> +
> +	if (static_branch_unlikely(&xdp_hooks_needed)) {
> +		/* Run hooks in napi first */
> +		ret = __xdp_hook_run(&napi->xdp_hook_list, xdp);
> +		if (ret != XDP_PASS)
> +			return ret;
> +
> +		/* Now run device hooks */
> +		ret = __xdp_hook_run(&dev->xdp_hook_list, xdp);
> +		if (ret != XDP_PASS)
> +			return ret;
> +	}
> +
> +	return ret;
> +}

it's an interesting idea to move the prog pointer into the napi struct,
but certainly not at such a huge cost. Right now it's 1 load + 1 cmp +
1 indirect jump per packet to invoke the program; with the above
approach it becomes 6 loads + 3 cmps (just to get through the
run_needed_check() check) + 6 loads + 3 cmps + 2 indirect jumps. (I may
be a little bit off by a few loads.) That is a non-starter. When we
were optimizing the receive path of the tc clsact ingress hook we saw
1 Mpps saved for every load+cmp+indirect jump removed. We're working on
inlining of bpf_map_lookup to save one indirect call per lookup; we
cannot just waste them here. We need to save cycles instead, especially
when it doesn't really solve your goals.

It seems the goals are:

> - Allows alternative users of the XDP hooks other than the original
> BPF

this should be achieved by their own hooks while reusing return codes
XDP_TX, XDP_PASS to keep the driver side the same. I'm not against
other packet processing engines, but not at the cost of lower
performance.

> - Allows a means to pipeline XDP programs together

this can be done already via bpf_tail_call. No changes needed.

> - Reduces the amount of code and complexity needed in drivers to
> manage XDP

hmm: 534 insertions(+), 144 deletions(-) looks like an increase in
complexity instead.

> - Provides a more structured environment that is extensible to new
> features while being mostly transparent to the drivers

don't see that in these patches either. Things like packet size change
(that we're working on) still have to be implemented for every driver.
Existing XDP_TX, XDP_DROP have to be implemented per driver as well.

Also, introduction of xdp.h breaks existing UAPI. That's not
acceptable either.
Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
On Tue, Sep 20, 2016 at 4:43 PM, Thomas Graf wrote:
> On 09/20/16 at 04:18pm, Tom Herbert wrote:
>> This allows other use cases than BPF inserting code into the data
>> path. This gives XDP potentially more utility and more users so that
>> we can motivate more driver implementations. For instance, I think
>> it's totally reasonable if the nftables guys want to insert some of
>> their rules to perform early DDoS drop to get the same performance
>> that we see in XDP.
>
> Reasonable point with nftables but are any of these users on the table
> already and ready to consume non-skbs? It would be a pity to add this
> complexity and cost if it is never used.

Well, need to measure to ascertain the cost. As for complexity, this
actually reduces the complexity needed for XDP in the drivers, which is
a good thing because that's where most of the support and development
pain will be.

I am looking at using this for the ILA router. The problem I am hitting
is that not all packets that we need to translate go through the XDP
path. Some would go through the kernel path, some through the XDP path,
but that would mean I need parallel lookup tables to be maintained for
the two paths, which won't scale. ILA translation is so trivial and not
really something that we need to be user programmable; the fast path is
really for accelerating an existing kernel capability. If I can reuse
the kernel code already written and the existing kernel data structures
to make a fast path in XDP, there is a lot of value in that for me.

> I don't see how we can ensure performance if we have multiple
> subsystems register for the hook each adding their own parsers which
> need to be passed through sequentially. Maybe I'm missing something.

We can optimize for allowing only one hook, or maybe limit to only
allowing one hook to be set. In any case this obviously requires a lot
of performance evaluation; I am hoping to get feedback on the design
first. My question about using a linear list for this was real, do you
know a better method off hand to implement a call list?

Thanks,
Tom
Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
On 09/20/16 at 04:18pm, Tom Herbert wrote:
> This allows other use cases than BPF inserting code into the data
> path. This gives XDP potential more utility and more users so that we
> can motivate more driver implementations. For instance, I thinks it's
> totally reasonable if the nftables guys want to insert some of their
> rules to perform early DDOS drop to get the same performance that we
> see in XDP.

Reasonable point with nftables but are any of these users on the table
already and ready to consume non-skbs? It would be a pity to add this
complexity and cost if it is never used.

I don't see how we can ensure performance if we have multiple
subsystems register for the hook each adding their own parsers which
need to be passed through sequentially. Maybe I'm missing something.
Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
On 09/21/2016 01:09 AM, Thomas Graf wrote:
> On 09/20/16 at 03:49pm, Tom Herbert wrote:
>> On Tue, Sep 20, 2016 at 3:44 PM, Thomas Graf wrote:
>>> On 09/20/16 at 03:00pm, Tom Herbert wrote:
>>>> +static inline int __xdp_hook_run(struct list_head *list_head,
>>>> +				 struct xdp_buff *xdp)
>>>> +{
>>>> +	struct xdp_hook_ops *elem;
>>>> +	int ret = XDP_PASS;
>>>> +
>>>> +	list_for_each_entry(elem, list_head, list) {
>>>> +		ret = elem->hook(elem->priv, xdp);
>>>> +		if (ret != XDP_PASS)
>>>> +			break;
>>>> +	}
>>>
>>> Walking over a linear list? Really? :-) I thought this was supposed
>>> to be fast, no compromises made.
>>
>> Can you suggest an alternative?
>
> Single BPF program that encodes whatever logic is required. This is
> what BPF is for. If it absolutely has to run two programs in sequence
> then it can still do that even though I really don't see much of a
> point of doing that in a high performance environment.
>
> I'm not even sure yet I understand full purpose of this yet ;-)

Agreed, if there's one thing I would change in cls_bpf, then it's
getting rid of the (uapi unfortunately) list of classifiers and just
make it a single one, because that's all that is needed, and
chaining/pipelining can be done via tail calls, for example. This whole
list + callback likely also makes things slower. Why not let more
drivers come first to support the current xdp model we have, and then
we can move common parts to the driver-independent core code?
Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
On Tue, Sep 20, 2016 at 4:09 PM, Thomas Graf wrote:
> On 09/20/16 at 03:49pm, Tom Herbert wrote:
>> On Tue, Sep 20, 2016 at 3:44 PM, Thomas Graf wrote:
>>> On 09/20/16 at 03:00pm, Tom Herbert wrote:
>>>> +static inline int __xdp_hook_run(struct list_head *list_head,
>>>> +				 struct xdp_buff *xdp)
>>>> +{
>>>> +	struct xdp_hook_ops *elem;
>>>> +	int ret = XDP_PASS;
>>>> +
>>>> +	list_for_each_entry(elem, list_head, list) {
>>>> +		ret = elem->hook(elem->priv, xdp);
>>>> +		if (ret != XDP_PASS)
>>>> +			break;
>>>> +	}
>>>
>>> Walking over a linear list? Really? :-) I thought this was supposed
>>> to be fast, no compromises made.
>>
>> Can you suggest an alternative?
>
> Single BPF program that encodes whatever logic is required. This is
> what BPF is for. If it absolutely has to run two programs in sequence
> then it can still do that even though I really don't see much of a
> point of doing that in a high performance environment.
>
> I'm not even sure yet I understand full purpose of this yet ;-)

This allows other use cases than BPF inserting code into the data path.
This gives XDP potentially more utility and more users so that we can
motivate more driver implementations. For instance, I think it's
totally reasonable if the nftables guys want to insert some of their
rules to perform early DDoS drop to get the same performance that we
see in XDP.

Tom
Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
On 09/20/16 at 03:49pm, Tom Herbert wrote:
> On Tue, Sep 20, 2016 at 3:44 PM, Thomas Graf wrote:
>> On 09/20/16 at 03:00pm, Tom Herbert wrote:
>>> +static inline int __xdp_hook_run(struct list_head *list_head,
>>> +				 struct xdp_buff *xdp)
>>> +{
>>> +	struct xdp_hook_ops *elem;
>>> +	int ret = XDP_PASS;
>>> +
>>> +	list_for_each_entry(elem, list_head, list) {
>>> +		ret = elem->hook(elem->priv, xdp);
>>> +		if (ret != XDP_PASS)
>>> +			break;
>>> +	}
>>
>> Walking over a linear list? Really? :-) I thought this was supposed
>> to be fast, no compromises made.
>
> Can you suggest an alternative?

Single BPF program that encodes whatever logic is required. This is
what BPF is for. If it absolutely has to run two programs in sequence
then it can still do that even though I really don't see much of a
point of doing that in a high performance environment.

I'm not even sure yet I understand full purpose of this yet ;-)
Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
On Tue, Sep 20, 2016 at 3:44 PM, Thomas Graf wrote:
> On 09/20/16 at 03:00pm, Tom Herbert wrote:
>> +static inline int __xdp_hook_run(struct list_head *list_head,
>> +				 struct xdp_buff *xdp)
>> +{
>> +	struct xdp_hook_ops *elem;
>> +	int ret = XDP_PASS;
>> +
>> +	list_for_each_entry(elem, list_head, list) {
>> +		ret = elem->hook(elem->priv, xdp);
>> +		if (ret != XDP_PASS)
>> +			break;
>> +	}
>
> Walking over a linear list? Really? :-) I thought this was supposed
> to be fast, no compromises made.

Can you suggest an alternative?
Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
On 09/20/16 at 03:00pm, Tom Herbert wrote:
> +static inline int __xdp_hook_run(struct list_head *list_head,
> +				 struct xdp_buff *xdp)
> +{
> +	struct xdp_hook_ops *elem;
> +	int ret = XDP_PASS;
> +
> +	list_for_each_entry(elem, list_head, list) {
> +		ret = elem->hook(elem->priv, xdp);
> +		if (ret != XDP_PASS)
> +			break;
> +	}

Walking over a linear list? Really? :-) I thought this was supposed
to be fast, no compromises made.
Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
On Tue, Sep 20, 2016 at 3:37 PM, Eric Dumazet wrote:
> On Tue, 2016-09-20 at 15:00 -0700, Tom Herbert wrote:
>
>> diff --git a/net/core/xdp.c b/net/core/xdp.c
>> new file mode 100644
>> index 000..815ead8
>> --- /dev/null
>> +++ b/net/core/xdp.c
>> @@ -0,0 +1,211 @@
>> +/*
>> + * Kernel Connection Multiplexor
>> + *
>> + * Copyright (c) 2016 Tom Herbert
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2
>> + * as published by the Free Software Foundation.
>> + */
>
> Too much copy/paste Tom :)

If this is your only complaint, Eric, I'll be quite impressed ;-)
Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
On Tue, 2016-09-20 at 15:00 -0700, Tom Herbert wrote:
> diff --git a/net/core/xdp.c b/net/core/xdp.c
> new file mode 100644
> index 000..815ead8
> --- /dev/null
> +++ b/net/core/xdp.c
> @@ -0,0 +1,211 @@
> +/*
> + * Kernel Connection Multiplexor
> + *
> + * Copyright (c) 2016 Tom Herbert
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2
> + * as published by the Free Software Foundation.
> + */

Too much copy/paste Tom :)
[PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
This patch creates an infrastructure for registering and running code
at XDP hooks in drivers. This is based on the original XDP/BPF and
borrows heavily from the techniques used by netfilter to make generic
nfhooks.

An XDP hook is defined by struct xdp_hook_ops, which contains the ops
of an XDP hook. A pointer to this structure is passed into the XDP
register function to set up a hook. The XDP register function mallocs
its own xdp_hook_ops structure and copies the values from the
xdp_hook_ops passed in. The register function also stores the pointer
value of the xdp_hook_ops argument; this pointer is used in subsequent
calls to XDP to identify the registered hook.

The interface is defined in net/xdp.h. This includes the definition of
xdp_hook_ops, functions to register and unregister hook ops on a
device or individual instances of napi, and xdp_hook_run that is
called by drivers to run the hooks.

Signed-off-by: Tom Herbert
---
 include/linux/filter.h          |   6 +-
 include/linux/netdev_features.h |   3 +-
 include/linux/netdevice.h       |  11 ++
 include/net/xdp.h               | 218
 include/uapi/linux/bpf.h        |  20
 include/uapi/linux/xdp.h        |  24 +
 net/core/Makefile               |   2 +-
 net/core/dev.c                  |   4 +
 net/core/xdp.c                  | 211 ++
 9 files changed, 472 insertions(+), 27 deletions(-)
 create mode 100644 include/net/xdp.h
 create mode 100644 include/uapi/linux/xdp.h
 create mode 100644 net/core/xdp.c

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 1f09c52..2a26133 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -16,6 +16,7 @@
 #include
 #include
+#include
 #include
@@ -432,11 +433,6 @@ struct bpf_skb_data_end {
 	void *data_end;
 };

-struct xdp_buff {
-	void *data;
-	void *data_end;
-};
-
 /* compute the linear packet data range [data, data_end) which
  * will be accessed by cls_bpf and act_bpf programs
  */
diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index 9c6c8ef..697fdea 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -72,8 +72,8 @@ enum {
 	NETIF_F_HW_VLAN_STAG_FILTER_BIT, /* Receive filtering on VLAN STAGs */
 	NETIF_F_HW_L2FW_DOFFLOAD_BIT,	/* Allow L2 Forwarding in Hardware */
 	NETIF_F_BUSY_POLL_BIT,		/* Busy poll */
-	NETIF_F_HW_TC_BIT,		/* Offload TC infrastructure */
+	NETIF_F_XDP_BIT,		/* Support XDP interface */

 	/*
 	 * Add your fresh new feature above and remember to update
@@ -136,6 +136,7 @@ enum {
 #define NETIF_F_HW_L2FW_DOFFLOAD	__NETIF_F(HW_L2FW_DOFFLOAD)
 #define NETIF_F_BUSY_POLL	__NETIF_F(BUSY_POLL)
 #define NETIF_F_HW_TC		__NETIF_F(HW_TC)
+#define NETIF_F_XDP		__NETIF_F(XDP)

 #define for_each_netdev_feature(mask_addr, bit)	\
 	for_each_set_bit(bit, (unsigned long *)mask_addr, NETDEV_FEATURE_COUNT)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index a10d8d1..f2b7d1b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -324,6 +324,7 @@ struct napi_struct {
 	struct sk_buff *skb;
 	struct hrtimer timer;
 	struct list_head	dev_list;
+	struct list_head	xdp_hook_list;
 	struct hlist_node napi_hash_node;
 	unsigned int	napi_id;
 };
@@ -819,6 +820,14 @@ enum xdp_netdev_command {
 	 * return true if a program is currently attached and running.
 	 */
 	XDP_QUERY_PROG,
+	/* Initialize XDP in the device. Called the first time an XDP hook
+	 * is being set on the device.
+	 */
+	XDP_DEV_INIT,
+	/* XDP is finished on the device. Called after the last XDP hook
+	 * has been removed from a device.
+	 */
+	XDP_DEV_FINISH,
 };

 struct netdev_xdp {
@@ -1663,6 +1672,8 @@ struct net_device {
 	struct list_head	close_list;
 	struct list_head	ptype_all;
 	struct list_head	ptype_specific;
+	struct list_head	xdp_hook_list;
+	unsigned int	xdp_hook_cnt;

 	struct {
 		struct list_head upper;
diff --git a/include/net/xdp.h b/include/net/xdp.h
new file mode 100644
index 000..c01a44e
--- /dev/null
+++ b/include/net/xdp.h
@@ -0,0 +1,218 @@
+/*
+ * eXpress Data Path (XDP)
+ *
+ * Copyright (c) 2016 Tom Herbert
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation.
+ */
+
+#ifndef __NET_XDP_H_
+#define __NET_XDP_H_
+
+#include
+#include
+#include
+
+/* XDP data structure.
+ *
+ * Fields:
+ * data - pointer to first byte of data
+ * data_end - pointer to last byte
+ *