Re: [PATCH RFC PoC 0/3] nftables meets bpf

2018-02-21 Thread Jakub Kicinski
On Wed, 21 Feb 2018 16:30:07 -0800, Florian Fainelli wrote:
> On 02/21/2018 03:46 PM, Jakub Kicinski wrote:
> > On Tue, 20 Feb 2018 11:58:22 +0100, Pablo Neira Ayuso wrote:  
> >> We also have a large range of TCAM based hardware offload out there
> >> that will _not_ work with your BPF HW offload infrastructure. What
> >> this bpf infrastructure pushes into the kernel is just a blob
> >> expressing things in a very low-level instruction-set: trying to find
> >> a mapping of that to typical HW intermediate representations in the
> >> TCAM based HW offload world will be simply crazy.  
> > 
> > I'm not sure where the TCAM talk is coming from.  Think much smaller -
> > cellular modems/phone SoCs, 32bit ARM/MIPS router box CPUs.  The
> > information the verifier is gathering will be crucial for optimizing
> > those.  Please don't discount the value of being able to use
> > heterogeneous processing units by the networking stack.
> 
> The only use case that we have a good answer for is when there is no HW
> offload capability available, because there, we know that eBPF is our
> best possible solution for a software fast path, in large part because
> of all the efforts that went into making it both safe and fast.

I was trying to point out that JITing eBPF for the host on 32-bit
systems is already a pain. Jiong Wang is leading an effort to improve
this from both the LLVM and verifier angles; IOW, running through the
verifier may become useful even for host JITs :)

> When there is offloading HW available, there does not appear to be a
> perfect answer to this problem of, given a standard Linux utility that
> can express any sort of match + action, be it ethtool::rxnfc,
> tc/cls_{u32,flower}, nftables, how do I transform that into what makes
> most sense to my HW? You could:
> 
> - have hardware that understands BPF bytecode directly, great, then you
> don't have to do anything, just pass it up the driver baby, oh wait,
> it's not that simple, the NFP driver is not small

True, it's not the largest but fair point, IMHO we should be trying to
push for sharing as much code between drivers as possible, and on all
fronts, but that's a topic for another time...

> - transform BPF back into something that your hardware understands, does
> that belong in the kernel? Maybe, maybe not

Personally, I think there is a non-zero probability of AMP CPUs/systems
becoming more common.  NFP is very powerful and fast, but a less advanced
solution may just use an off-the-shelf MIPS/ARM/Andes core.  Taking it
slightly further from home, to the cellular/WiFi wake-up problem which
was mentioned by Android folks at one of the netdevs: if we have a
MIPS/ARM/Andes *host* JIT in the kernel, and the NIC processor is built
on one of those, all the driver needs to provide is some glue and we can
offload filtering to the MCU on the NIC/modem!

> - use a completely different intermediate representation like P4,
> brainfuck, I don't know
>
> Maybe first things first, we have at least 3 different programming
> interfaces, if not more: ethtool::rxnfc, tc/cls_{u32,flower}, nftables
> that are all capable of programming TCAMs and hardware capable of match
> + action, how about we start with having some sort of common library
> code that:
> 
> - validates input parameters against HW capabilities

This one may be quite hard.

> - does the adequate transformation from any of these interfaces into a
> generic set of input parameters
> - define what the appropriate behavior is when programming through all
> of these 3 interfaces that ultimately access the same shared piece of
> HW, and therefore need to manage resource allocation?

That would be great! :)  Flower stands out today as the most feature
rich and a go-to for TCAM offloads.

> 
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC PoC 0/3] nftables meets bpf

2018-02-21 Thread Florian Fainelli
On 02/21/2018 03:46 PM, Jakub Kicinski wrote:
> On Tue, 20 Feb 2018 11:58:22 +0100, Pablo Neira Ayuso wrote:
>> We also have a large range of TCAM based hardware offload out there
>> that will _not_ work with your BPF HW offload infrastructure. What
>> this bpf infrastructure pushes into the kernel is just a blob
>> expressing things in a very low-level instruction-set: trying to find
>> a mapping of that to typical HW intermediate representations in the
>> TCAM based HW offload world will be simply crazy.
> 
> I'm not sure where the TCAM talk is coming from.  Think much smaller -
> cellular modems/phone SoCs, 32bit ARM/MIPS router box CPUs.  The
> information the verifier is gathering will be crucial for optimizing
> those.  Please don't discount the value of being able to use
> heterogeneous processing units by the networking stack.
> 

The only use case that we have a good answer for is when there is no HW
offload capability available, because there, we know that eBPF is our
best possible solution for a software fast path, in large part because
of all the efforts that went into making it both safe and fast.

When there is offloading HW available, there does not appear to be a
perfect answer to this problem of, given a standard Linux utility that
can express any sort of match + action, be it ethtool::rxnfc,
tc/cls_{u32,flower}, nftables, how do I transform that into what makes
most sense to my HW? You could:

- have hardware that understands BPF bytecode directly, great, then you
don't have to do anything, just pass it up the driver baby, oh wait,
it's not that simple, the NFP driver is not small

- transform BPF back into something that your hardware understands, does
that belong in the kernel? Maybe, maybe not

- use a completely different intermediate representation like P4,
brainfuck, I don't know

Maybe first things first, we have at least 3 different programming
interfaces, if not more: ethtool::rxnfc, tc/cls_{u32,flower}, nftables
that are all capable of programming TCAMs and hardware capable of match
+ action, how about we start with having some sort of common library
code that:

- validates input parameters against HW capabilities
- does the adequate transformation from any of these interfaces into a
generic set of input parameters
- define what the appropriate behavior is when programming through all
of these 3 interfaces that ultimately access the same shared piece of
HW, and therefore need to manage resource allocation?
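
The three points above can be made a bit more concrete with a minimal C
sketch. It is purely illustrative and not an existing kernel API: struct
generic_rule, struct hw_caps, validate_rule() and install_rule() are all
hypothetical names, and the match fields and actions are a made-up subset.
The idea is that each frontend (ethtool::rxnfc, tc, nftables) would first
translate its own request into a generic_rule, and a single shared capability
and resource check would then decide whether the HW can take it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical match fields that any frontend could ask for. */
enum match_field {
	MATCH_ETH_TYPE = 1u << 0,
	MATCH_IP_PROTO = 1u << 1,
	MATCH_IPV4_SRC = 1u << 2,
	MATCH_IPV4_DST = 1u << 3,
	MATCH_L4_DPORT = 1u << 4,
};

/* Hypothetical actions; ACTION_MAX is only a bound, not an action. */
enum rule_action { ACTION_DROP, ACTION_ACCEPT, ACTION_REDIRECT, ACTION_MAX };

/* Generic form that every frontend would translate its rules into. */
struct generic_rule {
	uint32_t match_fields;		/* bitmask of enum match_field */
	enum rule_action action;
};

/* What one piece of HW can do, plus the rule budget shared by all frontends. */
struct hw_caps {
	uint32_t supported_fields;	/* bitmask of enum match_field */
	uint32_t supported_actions;	/* bit n set => action n is supported */
	unsigned int max_rules;
	unsigned int used_rules;
};

enum rule_status { RULE_OK, RULE_ERR_FIELD, RULE_ERR_ACTION, RULE_ERR_NOSPACE };

/* Check a generic rule against the HW capabilities without installing it. */
enum rule_status validate_rule(const struct hw_caps *caps,
			       const struct generic_rule *rule)
{
	assert(caps && rule);

	if (rule->match_fields & ~caps->supported_fields)
		return RULE_ERR_FIELD;
	if ((unsigned int)rule->action >= ACTION_MAX ||
	    !(caps->supported_actions & (1u << rule->action)))
		return RULE_ERR_ACTION;
	if (caps->used_rules >= caps->max_rules)
		return RULE_ERR_NOSPACE;
	return RULE_OK;
}

/* Validate, then account for the rule in the shared budget. */
enum rule_status install_rule(struct hw_caps *caps,
			      const struct generic_rule *rule)
{
	enum rule_status st = validate_rule(caps, rule);

	if (st == RULE_OK)
		caps->used_rules++;
	return st;
}
```

Keeping the used_rules counter in the shared structure rather than in each
frontend is what would let the three interfaces compete for the same table
without stepping on each other; a real implementation would also need
locking and rule removal, which this sketch leaves out.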


-- 
Florian


Re: [PATCH RFC PoC 0/3] nftables meets bpf

2018-02-21 Thread Jakub Kicinski
On Tue, 20 Feb 2018 11:58:22 +0100, Pablo Neira Ayuso wrote:
> We also have a large range of TCAM based hardware offload out there
> that will _not_ work with your BPF HW offload infrastructure. What
> this bpf infrastructure pushes into the kernel is just a blob
> expressing things in a very low-level instruction-set: trying to find
> a mapping of that to typical HW intermediate representations in the
> TCAM based HW offload world will be simply crazy.

I'm not sure where the TCAM talk is coming from.  Think much smaller -
cellular modems/phone SoCs, 32bit ARM/MIPS router box CPUs.  The
information the verifier is gathering will be crucial for optimizing
those.  Please don't discount the value of being able to use
heterogeneous processing units by the networking stack.


Re: [PATCH RFC PoC 0/3] nftables meets bpf

2018-02-20 Thread Daniel Borkmann
Hi Pablo,

On 02/20/2018 11:58 AM, Pablo Neira Ayuso wrote:
> On Mon, Feb 19, 2018 at 08:57:39PM +0100, Daniel Borkmann wrote:
>> On 02/19/2018 05:37 PM, Pablo Neira Ayuso wrote:
>> [...]
>>> * Simplified infrastructure: We don't need the ebpf verifier complexity
>>>   either given we trust the code we generate from the kernel. We don't
>>>   need any complex userspace tooling either, just libnftnl and nft
>>>   userspace binaries.
>>>
>>> * Hardware offload: We can use this to offload rulesets to the only
>>>   smartnic driver that we have in the tree that already implements bpf
>>>   offload, hence, we can reuse this work already in place.
>>
>> In addition to Dave's points, regarding the above two, this will also only
>> work behind the verifier since NIC offloading piggy-backs on the verifier's
>> program analysis to prepare and generate a dev specific JITed BPF
>> prog, so it's not the same as normal host JITs (and there, the cBPF ->
>> eBPF in kernel migration adds a lot of headaches already due to
>> different underlying assumptions coming from the two flavors, even
>> if both are eBPF insns in the end), and given this, offloading will
>> also only work for eBPF and not cBPF.
> 
> We also have a large range of TCAM based hardware offload out there
> that will _not_ work with your BPF HW offload infrastructure. What
> this bpf infrastructure pushes into the kernel is just a blob
> expressing things in a very low-level instruction-set: trying to find
> a mapping of that to typical HW intermediate representations in the
> TCAM based HW offload world will be simply crazy.

Sure, and I think that's fine; at the last netdev conference, ways were
proposed to address this, one option being to add hints [0] in a
programmable way as metadata in front of the packet to accelerate
processing. Other than that, for fully pushing into hardware, people will
get a SmartNIC, and there are multiple big vendors in that area working
on them. Potentially, a few years from now, they will increasingly become
a commodity in DCs; let's see. Maybe we'll be programming them similarly
to how graphics cards are programmed today. :-)

  [0] https://www.netdevconf.org/2.2/session.html?waskiewicz-xdpacceleration-talk

>> There's a lot more the verifier is doing internally, like performing
>> various different program rewrites from the context, for helpers
>> (e.g. inlining), and for internal insn mappings that are not exposed
>> (e.g. in calls), so we definitely need to go through it.
> 
> If we need to call the verifier from the kernel for the code that we
> generate there for this initial stage, that should not be an issue.
> 
> The BPF interface is lacking many of the features and flexibility we
> have in netlink these days, and it is only allowing for monolithic
> ruleset replacement. This approach also loses internal rule stateful

That only depends on how you partition your program; partial reconfiguration
is definitely possible and done today, for example in the LB use case that
was talked about, where packet processing is staged, e.g. into sampling,
DDoS mitigation, and encap + redirect phases, and each of the components
can be replaced atomically at runtime. So there is definitely flexibility
available.
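
As a rough illustration of the staging idea, here is a small userspace C
analogy. It is not BPF code and not how any particular deployment does it:
the pipeline[] table, stage_replace(), pipeline_run() and the two example
DDoS stages are hypothetical names invented for this sketch. Each stage sits
in its own slot, and a slot can be swapped with a single atomic store while
packets keep flowing through the other stages.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

enum verdict { VERDICT_PASS, VERDICT_DROP };

/* Stage names follow the example in the text above. */
enum stage { STAGE_SAMPLING, STAGE_DDOS, STAGE_ENCAP, STAGE_COUNT };

struct packet {
	unsigned int len;
	unsigned int src;
};

typedef enum verdict (*stage_fn)(struct packet *pkt);

/* One atomically replaceable slot per stage; NULL means "stage not loaded". */
static _Atomic(stage_fn) pipeline[STAGE_COUNT];

/* Swap a single stage; the other stages are left untouched. */
void stage_replace(enum stage s, stage_fn fn)
{
	assert((unsigned int)s < STAGE_COUNT);
	atomic_store(&pipeline[s], fn);
}

/* Run the packet through all loaded stages, stopping at the first drop. */
enum verdict pipeline_run(struct packet *pkt)
{
	for (size_t i = 0; i < STAGE_COUNT; i++) {
		stage_fn fn = atomic_load(&pipeline[i]);

		if (fn && fn(pkt) == VERDICT_DROP)
			return VERDICT_DROP;
	}
	return VERDICT_PASS;
}

/* Two interchangeable DDoS stages, used only to demonstrate the swap. */
enum verdict ddos_allow_all(struct packet *pkt)
{
	(void)pkt;
	return VERDICT_PASS;
}

enum verdict ddos_drop_small(struct packet *pkt)
{
	return pkt->len < 64 ? VERDICT_DROP : VERDICT_PASS;
}
```

Calling stage_replace(STAGE_DDOS, ddos_drop_small) changes only the DDoS
stage; a packet that is mid-pipeline sees either the old or the new stage,
never a half-updated one.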

Thanks,
Daniel

> information that we're doing in the packet path when updating the
> ruleset. So it's taking us back to exactly the same mistakes we made
> in iptables back in the 90s as it's been mentioned already.
> 
> So I just wish I can count on your help in this process, we can get
> the best of the two worlds by providing a subsystem that allows users
> to configure packet classification through one single interface, no
> matter if the policy representation ends up being in software or HW
> offloads, either TCAM or smartnic.
> 
> Thanks.
> 



Re: [PATCH RFC PoC 0/3] nftables meets bpf

2018-02-20 Thread Pablo Neira Ayuso
Hi Daniel,

On Mon, Feb 19, 2018 at 08:57:39PM +0100, Daniel Borkmann wrote:
> On 02/19/2018 05:37 PM, Pablo Neira Ayuso wrote:
> [...]
> > * Simplified infrastructure: We don't need the ebpf verifier complexity
> >   either given we trust the code we generate from the kernel. We don't
> >   need any complex userspace tooling either, just libnftnl and nft
> >   userspace binaries.
> > 
> > * Hardware offload: We can use this to offload rulesets to the only
> >   smartnic driver that we have in the tree that already implements bpf
> >   offload, hence, we can reuse this work already in place.
> 
> In addition to Dave's points, regarding the above two, this will also only
> work behind the verifier since NIC offloading piggy-backs on the verifier's
> program analysis to prepare and generate a dev specific JITed BPF
> prog, so it's not the same as normal host JITs (and there, the cBPF ->
> eBPF in kernel migration adds a lot of headaches already due to
> different underlying assumptions coming from the two flavors, even
> if both are eBPF insns in the end), and given this, offloading will
> also only work for eBPF and not cBPF.

We also have a large range of TCAM based hardware offload out there
that will _not_ work with your BPF HW offload infrastructure. What
this bpf infrastructure pushes into the kernel is just a blob
expressing things in a very low-level instruction-set: trying to find
a mapping of that to typical HW intermediate representations in the
TCAM based HW offload world will be simply crazy.

> There's a lot more the verifier is doing internally, like performing
> various different program rewrites from the context, for helpers
> (e.g. inlining), and for internal insn mappings that are not exposed
> (e.g. in calls), so we definitely need to go through it.

If we need to call the verifier from the kernel for the code that we
generate there for this initial stage, that should not be an issue.

The BPF interface is lacking many of the features and flexibility we
have in netlink these days, and it is only allowing for monolithic
ruleset replacement. This approach also loses internal rule stateful
information that we're doing in the packet path when updating the
ruleset. So it's taking us back to exactly the same mistakes we made
in iptables back in the 90s as it's been mentioned already.

So I just wish I can count on your help in this process, we can get
the best of the two worlds by providing a subsystem that allows users
to configure packet classification through one single interface, no
matter if the policy representation ends up being in software or HW
offloads, either TCAM or smartnic.

Thanks.


Re: [PATCH RFC PoC 0/3] nftables meets bpf

2018-02-19 Thread Daniel Borkmann
On 02/19/2018 05:37 PM, Pablo Neira Ayuso wrote:
[...]
> * Simplified infrastructure: We don't need the ebpf verifier complexity
>   either given we trust the code we generate from the kernel. We don't
>   need any complex userspace tooling either, just libnftnl and nft
>   userspace binaries.
> 
> * Hardware offload: We can use this to offload rulesets to the only
>   smartnic driver that we have in the tree that already implements bpf
>   offload, hence, we can reuse this work already in place.

In addition to Dave's points, regarding the above two, this will also only
work behind the verifier since NIC offloading piggy-backs on the verifier's
program analysis to prepare and generate a dev specific JITed BPF prog, so
it's not the same as normal host JITs (and there, the cBPF -> eBPF in kernel
migration adds a lot of headaches already due to different underlying
assumptions coming from the two flavors, even if both are eBPF insns in the
end), and given this, offloading will also only work for eBPF and not cBPF.
There's a lot more the verifier is doing internally, like performing various
different program rewrites from the context, for helpers (e.g. inlining),
and for internal insn mappings that are not exposed (e.g. in calls), so we
definitely need to go through it.

Thanks,
Daniel