Re: [iovisor-dev] Best userspace programming API for XDP features query to kernel?

2018-04-05 Thread Jakub Kicinski via iovisor-dev
On Thu, 5 Apr 2018 22:51:33 +0200, Jesper Dangaard Brouer wrote:
> > What about nfp in terms of XDP
> > offload capabilities, should they be included as well or is probing to load
> > the program and see if it loads/JITs as we do today just fine (e.g. you'd
> > otherwise end up with extra flags on a per BPF helper basis)?  
> 
> No, not flags on a per-BPF-helper basis. As I've described above,
> helpers belong to the BPF core, not the driver.  Here I want to know
> what the specific driver supports.

I think Daniel meant for nfp offload.  The offload restrictions are
quite involved; are we going to be able to express those?

This is a bit simpler but reminds me of the TC flower capability
discussion.  Expressing features and capabilities gets messy quickly.

I have a gut feeling that a good starting point would be defining and
building a test suite or a set of probing tests to check that things
work at the system level (incl. redirects to different ports, etc.)  I
think having a concrete set of litmus tests that confirm the meaning of
a given feature/capability would go a long way in making people more
comfortable with accepting any form of BPF driver capability reporting.
And serious BPF projects already do probing, so this would just
centralize it in the kernel.
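
For illustration, a minimal sketch of the kind of probing userspace
does today (loosely modeled on what libbpf/bpftool do; the exact
verifier error strings matched below are an assumption, and they are
precisely the fragile part a kernel-side API would replace):

  #include <linux/bpf.h>
  #include <string.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  /* Try to load a two-insn program that just calls the helper; if the
   * verifier does not know the helper, the load fails with an
   * "unknown/invalid func" message in the log. */
  static int helper_is_supported(__u32 helper_id)
  {
      struct bpf_insn insns[2] = {
          { .code = 0x85, .imm = (__s32)helper_id }, /* BPF_JMP | BPF_CALL */
          { .code = 0x95 },                          /* BPF_JMP | BPF_EXIT */
      };
      char log[4096] = { 0 };
      union bpf_attr attr = {
          .prog_type = BPF_PROG_TYPE_XDP,
          .insn_cnt  = 2,
          .insns     = (__u64)(unsigned long)insns,
          .license   = (__u64)(unsigned long)"GPL",
          .log_level = 1,
          .log_size  = sizeof(log),
          .log_buf   = (__u64)(unsigned long)log,
      };
      int fd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));

      if (fd >= 0) {
          close(fd);
          return 1;
      }
      /* Rejections for other reasons (e.g. bad arguments) still mean
       * the verifier knows the helper. */
      return !strstr(log, "invalid func") && !strstr(log, "unknown func");
  }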

That's my two cents.
___
iovisor-dev mailing list
iovisor-dev@lists.iovisor.org
https://lists.iovisor.org/mailman/listinfo/iovisor-dev


Re: [iovisor-dev] [PATCH RFC 0/4] Initial 32-bit eBPF encoding support

2017-09-23 Thread Jakub Kicinski via iovisor-dev
On Fri, 22 Sep 2017 22:03:47 -0700, Yonghong Song wrote:
> On 9/22/17 9:24 AM, Jakub Kicinski wrote:
> > On Thu, 21 Sep 2017 11:56:55 -0700, Alexei Starovoitov wrote:  
> >> On Wed, Sep 20, 2017 at 12:20:40AM +0100, Jiong Wang via iovisor-dev 
> >> wrote:  
> >>> On 18/09/2017 22:29, Daniel Borkmann wrote:  
>  On 09/18/2017 10:47 PM, Jiong Wang wrote:  
> > Hi,
> >
> >     Currently, LLVM eBPF backend always generate code in 64-bit mode,
> > this may
> > cause troubles when JITing to 32-bit targets.
> >
> >     For example, it is quite common for XDP eBPF program to access
> > some packet
> > fields through base + offset that the default eBPF will generate
> > BPF_ALU64 for
> > the address formation, later when JITing to 32-bit hardware,
> > BPF_ALU64 needs
> > to be expanded into 32 bit ALU sequences even though the address
> > space is
> > 32-bit that the high bits is not significant.
> >
> >     While a complete 32-bit mode implemention may need an new ABI
> > (something like
> > -target-abi=ilp32), this patch set first add some initial code so we
> > could
> > construct 32-bit eBPF tests through hand-written assembly.
> >
> >     A new 32-bit register set is introduced, its name is with "w"
> > prefix and LLVM
> > assembler will encode statements like "w1 += w2" into the following
> > 8-bit code
> > field:
> >
> >   BPF_ADD | BPF_X | BPF_ALU
> >
> > BPF_ALU will be used instead of BPF_ALU64.
> >
> >     NOTE, currently you can only use "w" register with ALU
> > statements, not with
> > others like branches etc as they don't have different encoding for
> > 32-bit
> > target.  
> 
>  Great to see work in this direction! Can we also enable to use / emit
>  all the 32bit BPF_ALU instructions whenever possible for the currently
>  available bpf targets while at it (which only use BPF_ALU64 right now)?  
> >>>
> >>> Hi Daniel,
> >>>
> >>>     Thanks for the feedback.
> >>>
> >>>     I think we could also enable the use of all the 32bit BPF_ALU under
> >>> currently
> >>> available bpf targets.  As we now have 32bit register set support, we 
> >>> could
> >>> make
> >>> i32 type as legal type to prevent it be promoted into i64, then hook it up
> >>> with i32
> >>> ALU patterns, will look into this.  
> >>
> >> I don't think we need to gate 32bit alu generation with a flag.
> >> Though interpreter and JITs support 32-bit since day one, the verifier
> >> never seen such programs before, so some valid programs may get
> >> rejected. After some time passes and we're sure that all progs
> >> still work fine when they're optimized with 32-bit alu, we can flip
> >> the switch in llvm and make it default.  
> > 
> > Thinking about next steps - do we expect the 32b operations to clear the
> > upper halves of the registers?  The interpreter does it, and so does
> > x86.  I don't think we can load 32bit-only programs on 64bit hosts, so
> > we would need some form of data flow analysis in the kernel to prune
> > the zeroing for 32bit offload targets.  Is that correct?  
> 
> Could you contrive an example to show the problem? If I understand 
> correctly, you most worried that some natural sign extension is gone
> with "clearing the upper 32-bit register" and such clearing may make
> some operation, esp. memory operation not correct in 64-bit machine?

Hm.  Perhaps it's a blunder on my side, but let's take:

  r1 = ~0ULL
  w1 = 0
  # use r1

on x86 and in the interpreter, the w1 = 0 will clear the upper 32 bits,
so r1 ends up as 0.  32-bit arches may translate this to something like:

  # r1 = ~0ULL
  r1.lo = ~0
  r1.hi = ~0
  # w1 = 0
  r1.lo = 0
  # r1.hi not touched

which will obviously result in r1 == 0xffffffff00000000.  LLVM should
not assume r1.hi is cleared, but I'm not sure this is a strong enough
argument.
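
For reference, the zeroing is free on 64-bit hosts because the
interpreter computes 32-bit ALU ops in 32-bit precision and the
assignment back into the 64-bit register zero-extends.  A minimal
sketch of those semantics (modeled on, not copied from, the kernel's
interpreter):

  #include <stdint.h>

  /* 32-bit BPF_ALU add on a 64-bit host: the 32-bit result is
   * zero-extended by the assignment, clearing bits 63:32 of the
   * register.  This is the behavior a 32-bit JIT would have to emulate
   * with an extra instruction per op. */
  static void alu32_add(uint64_t *regs, int dst, int src)
  {
      regs[dst] = (uint32_t)((uint32_t)regs[dst] + (uint32_t)regs[src]);
  }

x86 gets the same effect in hardware: writing a 32-bit subregister
architecturally zeroes the upper 32 bits of the full register.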
___
iovisor-dev mailing list
iovisor-dev@lists.iovisor.org
https://lists.iovisor.org/mailman/listinfo/iovisor-dev


Re: [iovisor-dev] [oss-drivers] Re: [PATCH RFC 0/4] Initial 32-bit eBPF encoding support

2017-09-22 Thread Jakub Kicinski via iovisor-dev
On Thu, 21 Sep 2017 11:56:55 -0700, Alexei Starovoitov wrote:
> On Wed, Sep 20, 2017 at 12:20:40AM +0100, Jiong Wang via iovisor-dev wrote:
> > On 18/09/2017 22:29, Daniel Borkmann wrote:  
> > > On 09/18/2017 10:47 PM, Jiong Wang wrote:  
> > > > Hi,
> > > >
> > > >    Currently, the LLVM eBPF backend always generates code in
> > > > 64-bit mode, which may cause trouble when JITing to 32-bit
> > > > targets.
> > > >
> > > >    For example, it is quite common for an XDP eBPF program to
> > > > access packet fields through base + offset, and the default eBPF
> > > > will generate BPF_ALU64 for the address formation; later, when
> > > > JITing to 32-bit hardware, BPF_ALU64 needs to be expanded into
> > > > 32-bit ALU sequences even though the address space is 32-bit and
> > > > the high bits are not significant.
> > > >
> > > >    While a complete 32-bit mode implementation may need a new ABI
> > > > (something like -target-abi=ilp32), this patch set first adds
> > > > some initial code so we can construct 32-bit eBPF tests through
> > > > hand-written assembly.
> > > >
> > > >    A new 32-bit register set is introduced.  Its names carry a
> > > > "w" prefix, and the LLVM assembler will encode statements like
> > > > "w1 += w2" with the following 8-bit opcode field:
> > > >
> > > >  BPF_ADD | BPF_X | BPF_ALU
> > > >
> > > > BPF_ALU will be used instead of BPF_ALU64.
> > > >
> > > >    NOTE: currently you can only use "w" registers with ALU
> > > > statements, not with others like branches, as those don't have a
> > > > separate encoding for the 32-bit target.
> > > 
> > > Great to see work in this direction! Can we also enable the use /
> > > emission of all the 32-bit BPF_ALU instructions whenever possible
> > > for the currently available bpf targets while at it (which only use
> > > BPF_ALU64 right now)?
> > 
> > Hi Daniel,
> > 
> >    Thanks for the feedback.
> > 
> >    I think we could also enable the use of all the 32-bit BPF_ALU
> > instructions for the currently available bpf targets.  As we now have
> > 32-bit register set support, we could make i32 a legal type to
> > prevent it from being promoted to i64, then hook it up with the i32
> > ALU patterns.  Will look into this.
> 
> I don't think we need to gate 32-bit ALU generation behind a flag.
> Though the interpreter and JITs have supported 32-bit since day one,
> the verifier has never seen such programs before, so some valid
> programs may get rejected.  After some time passes and we're sure that
> all progs still work fine when they're optimized with 32-bit ALU, we
> can flip the switch in llvm and make it the default.

Thinking about next steps - do we expect the 32-bit operations to clear
the upper halves of the registers?  The interpreter does it, and so
does x86.  I don't think we can load 32-bit-only programs on 64-bit
hosts, so we would need some form of data flow analysis in the kernel
to prune the zeroing for 32-bit offload targets.  Is that correct?
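
To make the pruning idea concrete, a toy sketch; it handles
straight-line code only (a real verifier pass would need proper
data-flow analysis across branches), and the three decode helpers are
hypothetical stand-ins for inspecting a struct bpf_insn:

  #include <stdbool.h>
  #include <linux/bpf.h>

  /* Hypothetical decode helpers, assumed to exist for this sketch. */
  bool insn_is_alu32_write(const struct bpf_insn *in);
  bool insn_reads_reg64(const struct bpf_insn *in, int reg);
  bool insn_writes_reg(const struct bpf_insn *in, int reg);

  /* A 32-bit ALU write needs explicit zero-extension on a 32-bit
   * offload target only if a later insn can observe the full 64 bits
   * of the destination register before the register is written again. */
  static void mark_needed_zext(const struct bpf_insn *prog, int len,
                               bool *needs_zext)
  {
      for (int i = 0; i < len; i++) {
          if (!insn_is_alu32_write(&prog[i]))
              continue;

          int reg = prog[i].dst_reg;

          needs_zext[i] = false;
          for (int j = i + 1; j < len; j++) {
              if (insn_reads_reg64(&prog[j], reg)) {
                  needs_zext[i] = true;
                  break;
              }
              if (insn_writes_reg(&prog[j], reg))
                  break;  /* redefined before any 64-bit read */
          }
      }
  }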
___
iovisor-dev mailing list
iovisor-dev@lists.iovisor.org
https://lists.iovisor.org/mailman/listinfo/iovisor-dev


Re: [iovisor-dev] [PATCH v3 net-next 03/12] nfp: change bpf verifier hooks to match new verifier data structures

2017-06-28 Thread Jakub Kicinski via iovisor-dev
On Tue, 27 Jun 2017 13:57:34 +0100, Edward Cree wrote:
> Signed-off-by: Edward Cree 

Acked-by: Jakub Kicinski 

Sorry about the delay.
___
iovisor-dev mailing list
iovisor-dev@lists.iovisor.org
https://lists.iovisor.org/mailman/listinfo/iovisor-dev


Re: [iovisor-dev] XDP seeking input from NIC hardware vendors

2016-07-12 Thread Jakub Kicinski via iovisor-dev
On Tue, 12 Jul 2016 12:13:01 -0700, John Fastabend wrote:
> On 16-07-11 07:24 PM, Alexei Starovoitov wrote:
> > On Sat, Jul 09, 2016 at 01:27:26PM +0200, Jesper Dangaard Brouer wrote:
> > > On Fri, 8 Jul 2016 18:51:07 +0100 Jakub Kicinski wrote:
> > > > On Fri, 8 Jul 2016 09:45:25 -0700, John Fastabend wrote:
> > > > > The only distinction between VFs and queue groupings on my side
> > > > > is VFs provide RSS whereas queue groupings have to be selected
> > > > > explicitly.  In a programmable NIC world the distinction might
> > > > > be lost if an "RSS" program can be loaded into the NIC to select
> > > > > queues, but for existing hardware the distinction is there.
> > > >
> > > > To do BPF RSS we need a way to select the queue, which I think is
> > > > all Jesper wanted.  So we will have to tackle queue selection at
> > > > some point.  The main obstacle with it for me is to define what
> > > > queue selection means when the program is not offloaded to HW...
> > > > Implementing queue selection on the HW side is trivial.
> > >
> > > Yes, I do see the problem of fallback, when the program's "filter"
> > > demux cannot be offloaded to hardware.
> > >
> > > First I thought it was a good idea to keep the "demux-filter" part
> > > of the eBPF program, as a software fallback can still apply this
> > > filter in SW and just mark the packets as not-zero-copy-safe.  But
> > > when HW offloading is not possible, packets can be delivered to
> > > every RX queue, and SW would need to handle that, which is hard to
> > > keep transparent.
> > >
> > > > > If you demux using an eBPF program or via a filter model like
> > > > > flow_director or cls_{u32|flower} I think we can support both.
> > > > > And this just depends on the programmability of the hardware.
> > > > > Note flow_director and cls_{u32|flower} steering to VFs is
> > > > > already in place.
> > >
> > > Maybe we should keep HW demuxing as a separate setup step.
> > >
> > > Today I can almost do what I want: by setting up ntuple filters,
> > > and (if Alexei allows it) assigning an application-specific XDP
> > > eBPF program to a specific RX queue.
> > >
> > >  ethtool -K eth2 ntuple on
> > >  ethtool -N eth2 flow-type udp4 dst-ip 192.168.254.1 dst-port 53 action 42
> > >
> > > Then the XDP program can be attached to RX queue 42 and
> > > promise/guarantee that it will consume all packets.  And then the
> > > backing page-pool can allow zero-copy RX (and enable scrubbing when
> > > refilling the pool).
> >
> > So such an ntuple rule will send udp4 traffic for a specific ip and
> > port into a queue, and then it will somehow get zero-copied to a vm?
> > It looks like a lot of other pieces about zero-copy and qemu need to
> > be implemented (or at least architected) for this scheme to be
> > conceivable.  And when all that happens, what is the vm going to do
> > with this very specific traffic?  The vm won't have any tcp or even
> > ping?
>
> I have perhaps a different motivation to have queue steering in 'tc
> cls-u32' and eventually xdp.  The general idea is I have thousands of
> queues and I can bind applications to the queues.  When I know an
> application is bound to a queue I can enable per-queue busy polling
> (to be implemented), set specific interrupt rates on the queue
> (implementation will be posted soon), bind the queue to the correct
> cpu, etc.
>
> ntuple works OK for this now, but xdp provides more flexibility and
> also lets us add additional policy on the queue other than simply
> queue steering.
>
> I'm not convinced though that the demux queue selection should be part
> of the XDP program itself, because it has no software analog; to me it
> sits in front of the set of XDP programs.

Yes, although if we expect XDP to be a target of offloading efforts,
putting the demux here doesn't seem like an entirely bad idea.  We
could say demux is just an API that more capable drivers/HW can
implement.

> But I think I could perhaps be convinced it does, if there is some
> reasonable way to do it.  I guess the single-program method would
> result in an XDP program that reads like
>
>   if (rx_queue == x)
>     do_foo
>   if (rx_queue == y)
>     do_bar
>
> A hardware jit may be able to sort that out.

+1  
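
As an aside, the kernel did later expose the RX queue index to XDP
programs as xdp_md's rx_queue_index, so the single-program method can
be sketched directly.  The queue number and per-queue policies below
are made up:

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  SEC("xdp")
  int xdp_per_queue(struct xdp_md *ctx)
  {
      switch (ctx->rx_queue_index) {
      case 42:                   /* hypothetical dedicated app queue */
          return XDP_DROP;       /* "do_foo" */
      default:
          return XDP_PASS;       /* "do_bar" */
      }
  }

  char _license[] SEC("license") = "GPL";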
___
iovisor-dev mailing list
iovisor-dev@lists.iovisor.org
https://lists.iovisor.org/mailman/listinfo/iovisor-dev


Re: [iovisor-dev] XDP seeking input from NIC hardware vendors

2016-07-08 Thread Jakub Kicinski via iovisor-dev
On Thu, 7 Jul 2016 19:22:12 -0700, Alexei Starovoitov wrote:
> > If the goal is just to separate XDP traffic from non-XDP traffic you
> > could accomplish this with a combination of SR-IOV/macvlan to
> > separate the device queues into multiple netdevs and then run XDP on
> > just one of the netdevs.  Then use flow director (ethtool) or 'tc
> > cls_u32/flower' to steer traffic to the netdev.  This is how we
> > support multiple networking stacks on one device, by the way; it is
> > called the bifurcated driver.  It's not too far of a stretch to think
> > we could offload some simple XDP programs to program the splitting of
> > traffic instead of cls_u32/flower/flow_director, and then you would
> > have a stack of XDP programs: one running in hardware and a set
> > running on the queues in software.
>
> The above sounds like a much better approach than Jesper's/my
> prog_per_ring stuff.  If we can split the nic via sriov and have a
> dedicated netdev via a VF just for XDP, that's a way cleaner approach.
> I guess we won't need to do xdp_rxqmask after all.

+1

I was thinking about using eBPF to direct to NIC queues, but concluded
that doing a redirect to a VF is cleaner.  Especially if the PF driver
supports VF representatives, we could potentially just use
bpf_redirect(VFR netdev), and then the VF doesn't even have to be
handled by the same stack.
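
A minimal sketch of that redirect; the representor ifindex below is a
placeholder, as a real loader would look it up at attach time:

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  #define VFR_IFINDEX 5  /* hypothetical ifindex of the VF representor */

  SEC("xdp")
  int xdp_to_vfr(struct xdp_md *ctx)
  {
      /* On success the packet is forwarded to the representor netdev
       * and never touches this stack. */
      return bpf_redirect(VFR_IFINDEX, 0);
  }

  char _license[] SEC("license") = "GPL";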
___
iovisor-dev mailing list
iovisor-dev@lists.iovisor.org
https://lists.iovisor.org/mailman/listinfo/iovisor-dev


Re: [iovisor-dev] XDP seeking input from NIC hardware vendors

2016-07-07 Thread Jakub Kicinski via iovisor-dev
On Thu, 7 Jul 2016 15:18:11 +, Fastabend, John R wrote:
> The other interesting thing would be to do more than just packet
> steering and actually run a more complete XDP program.  Netronome
> supports this, right?  The question I have though is whether this is a
> stack of XDP programs, one or more designated for hardware and some
> running in software, perhaps with some annotation in the program so
> the hardware JIT knows where to place programs, or whether we expect
> the JIT itself to try and decide what is best to offload.  I think the
> easiest to start with is to annotate the programs.
>
> Also, as far as I know, a lot of hardware can stick extra data on the
> front or end of a packet, so you could push metadata calculated by the
> program there in a generic way without having to extend XDP-defined
> metadata structures.  Another option is to DMA the metadata to a
> specified address.  With this metadata the consumer/producer XDP
> programs have to agree on the format, but no one else does.

Yes!

At the XDP summit we were discussing pipelining XDP programs in
general, with different stages of the pipeline potentially using
specific hardware capabilities or even being directly mappable onto
fixed HW functions.

Designating parsing as one of the specialized blocks makes sense in the
long run, probably at the first stage, with recirculation possible.  We
also have some parsing HW we could utilize at some point.  However, I'm
worried that it's too early to impose constraints and APIs.  I agree
that we should first settle on a standard way to pass metadata across
tail calls to facilitate any form of pipelining, regardless of which
parts of the pipeline HW is able to offload.
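
Later kernels did grow a mechanism along these lines: a data_meta area
in front of the packet, resized with bpf_xdp_adjust_meta().  A sketch
of a producer stage, with a hypothetical struct layout that only the
producer and consumer stages need to agree on:

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  struct stage_meta {        /* hypothetical producer/consumer contract */
      __u32 flow_hash;
      __u16 l4_offset;
      __u16 flags;
  };

  SEC("xdp")
  int xdp_stage_parse(struct xdp_md *ctx)
  {
      struct stage_meta *meta;
      void *data;

      /* Grow the metadata area in front of the packet data. */
      if (bpf_xdp_adjust_meta(ctx, -(int)sizeof(*meta)))
          return XDP_ABORTED;

      data = (void *)(long)ctx->data;
      meta = (void *)(long)ctx->data_meta;
      if ((void *)(meta + 1) > data)  /* bounds check for the verifier */
          return XDP_ABORTED;

      meta->flow_hash = 0;            /* a real stage would compute these */
      meta->l4_offset = 14;
      meta->flags = 0;

      /* A tail call into the next stage of the pipeline would go here. */
      return XDP_PASS;
  }

  char _license[] SEC("license") = "GPL";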
___
iovisor-dev mailing list
iovisor-dev@lists.iovisor.org
https://lists.iovisor.org/mailman/listinfo/iovisor-dev