Re: [RFC PATCH v2 1/7] bpf: Introduce BPF_PROG_TYPE_VNET_HASH

2023-10-17 Thread Alexei Starovoitov
On Mon, Oct 16, 2023 at 7:38 PM Jason Wang wrote:
>
> On Tue, Oct 17, 2023 at 7:53 AM Alexei Starovoitov wrote:
> >
> > On Sun, Oct 15, 2023 at 10:10 AM Akihiko Odaki wrote:
> > >
> > > On 2023/10/16 1:07, Alexei Starovoitov wrote:
> > > > On Sun, Oct 15, 2023 at 7:17 AM Akihiko Odaki wrote:
> > > >>
> > > >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > >> index 0448700890f7..298634556fab 100644
> > > >> --- a/include/uapi/linux/bpf.h
> > > >> +++ b/include/uapi/linux/bpf.h
> > > >> @@ -988,6 +988,7 @@ enum bpf_prog_type {
> > > >>  BPF_PROG_TYPE_SK_LOOKUP,
> > > >>  BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls 
> > > >> */
> > > >>  BPF_PROG_TYPE_NETFILTER,
> > > >> +   BPF_PROG_TYPE_VNET_HASH,
> > > >
> > > > Sorry, we do not add new stable program types anymore.
> > > >
> > > >> @@ -6111,6 +6112,10 @@ struct __sk_buff {
> > > >>  __u8  tstamp_type;
> > > >>  __u32 :24;  /* Padding, future use. */
> > > >>  __u64 hwtstamp;
> > > >> +
> > > >> +   __u32 vnet_hash_value;
> > > >> +   __u16 vnet_hash_report;
> > > >> +   __u16 vnet_rss_queue;
> > > >>   };
> > > >
> > > > we also do not add anything to uapi __sk_buff.
> > > >
> > > >> +const struct bpf_verifier_ops vnet_hash_verifier_ops = {
> > > >> +   .get_func_proto = sk_filter_func_proto,
> > > >> +   .is_valid_access= sk_filter_is_valid_access,
> > > >> +   .convert_ctx_access = bpf_convert_ctx_access,
> > > >> +   .gen_ld_abs = bpf_gen_ld_abs,
> > > >> +};
> > > >
> > > > and we don't do ctx rewrites like this either.
> > > >
> > > > > Please see how hid-bpf and cgroup rstat are hooking up bpf
> > > > > in an _unstable_ way.
> > >
> > > Can you describe what "stable" and "unstable" mean here? I'm new to BPF
> > > and I'm worried that it may refer to interface stability.
> > >
> > > Let me describe the context. QEMU bundles an eBPF program that is used
> > > for the "eBPF steering program" feature of tun. Now I'm proposing to
> > > extend the feature to allow returning some values to userspace and
> > > vhost_net. As such, the extension needs to be done in a way that ensures
> > > interface stability.
> >
> > bpf is not an option then.
> > we do not add stable bpf program types or hooks any more.
>
> Does this mean eBPF could not be used for any new use cases other than
> the existing ones?

It means that any new use of bpf has to be unstable for the time being.

> > If a kernel subsystem wants to use bpf it needs to accept the fact
> > that such bpf extensibility will be unstable and subsystem maintainers
> > can decide to remove such bpf support in the future.
>
> I don't see how it is different from the existing ones.

Can we remove the BPF_CGROUP_RUN_PROG_INET_INGRESS hook along
with the BPF_PROG_TYPE_CGROUP_SKB program type?
Obviously not.
We can refactor it. We can move it around, but we cannot remove it.
That's the difference between stable and unstable.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [RFC PATCH v2 1/7] bpf: Introduce BPF_PROG_TYPE_VNET_HASH

2023-10-16 Thread Alexei Starovoitov
On Sun, Oct 15, 2023 at 10:10 AM Akihiko Odaki  wrote:
>
> On 2023/10/16 1:07, Alexei Starovoitov wrote:
> > On Sun, Oct 15, 2023 at 7:17 AM Akihiko Odaki wrote:
> >>
> >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> >> index 0448700890f7..298634556fab 100644
> >> --- a/include/uapi/linux/bpf.h
> >> +++ b/include/uapi/linux/bpf.h
> >> @@ -988,6 +988,7 @@ enum bpf_prog_type {
> >>  BPF_PROG_TYPE_SK_LOOKUP,
> >>  BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */
> >>  BPF_PROG_TYPE_NETFILTER,
> >> +   BPF_PROG_TYPE_VNET_HASH,
> >
> > Sorry, we do not add new stable program types anymore.
> >
> >> @@ -6111,6 +6112,10 @@ struct __sk_buff {
> >>  __u8  tstamp_type;
> >>  __u32 :24;  /* Padding, future use. */
> >>  __u64 hwtstamp;
> >> +
> >> +   __u32 vnet_hash_value;
> >> +   __u16 vnet_hash_report;
> >> +   __u16 vnet_rss_queue;
> >>   };
> >
> > we also do not add anything to uapi __sk_buff.
> >
> >> +const struct bpf_verifier_ops vnet_hash_verifier_ops = {
> >> +   .get_func_proto = sk_filter_func_proto,
> >> +   .is_valid_access= sk_filter_is_valid_access,
> >> +   .convert_ctx_access = bpf_convert_ctx_access,
> >> +   .gen_ld_abs = bpf_gen_ld_abs,
> >> +};
> >
> > and we don't do ctx rewrites like this either.
> >
> > Please see how hid-bpf and cgroup rstat are hooking up bpf
> > in an _unstable_ way.
>
> Can you describe what "stable" and "unstable" mean here? I'm new to BPF
> and I'm worried that it may refer to interface stability.
>
> Let me describe the context. QEMU bundles an eBPF program that is used
> for the "eBPF steering program" feature of tun. Now I'm proposing to
> extend the feature to allow returning some values to userspace and
> vhost_net. As such, the extension needs to be done in a way that ensures
> interface stability.

bpf is not an option then.
we do not add stable bpf program types or hooks any more.
If a kernel subsystem wants to use bpf it needs to accept the fact
that such bpf extensibility will be unstable and subsystem maintainers
can decide to remove such bpf support in the future.

Re: [RFC PATCH v2 1/7] bpf: Introduce BPF_PROG_TYPE_VNET_HASH

2023-10-15 Thread Alexei Starovoitov
On Sun, Oct 15, 2023 at 7:17 AM Akihiko Odaki  wrote:
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 0448700890f7..298634556fab 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -988,6 +988,7 @@ enum bpf_prog_type {
> BPF_PROG_TYPE_SK_LOOKUP,
> BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */
> BPF_PROG_TYPE_NETFILTER,
> +   BPF_PROG_TYPE_VNET_HASH,

Sorry, we do not add new stable program types anymore.

> @@ -6111,6 +6112,10 @@ struct __sk_buff {
> __u8  tstamp_type;
> __u32 :24;  /* Padding, future use. */
> __u64 hwtstamp;
> +
> +   __u32 vnet_hash_value;
> +   __u16 vnet_hash_report;
> +   __u16 vnet_rss_queue;
>  };

we also do not add anything to uapi __sk_buff.

> +const struct bpf_verifier_ops vnet_hash_verifier_ops = {
> +   .get_func_proto = sk_filter_func_proto,
> +   .is_valid_access= sk_filter_is_valid_access,
> +   .convert_ctx_access = bpf_convert_ctx_access,
> +   .gen_ld_abs = bpf_gen_ld_abs,
> +};

and we don't do ctx rewrites like this either.

Please see how hid-bpf and cgroup rstat are hooking up bpf
in an _unstable_ way.

Re: [PATCH bpf-next v2] net: don't include filter.h from net/sock.h

2021-12-29 Thread Alexei Starovoitov
On Tue, Dec 28, 2021 at 4:49 PM Jakub Kicinski  wrote:
>
> sock.h is pretty heavily used (5k objects rebuilt on x86 after
> it's touched). We can drop the include of filter.h from it and
> add a forward declaration of struct sk_filter instead.
> This decreases the number of rebuilt objects when bpf.h
> is touched from ~5k to ~1k.
>
> There's a lot of missing includes this was masking. Primarily
> in networking tho, this time.
>
> Acked-by: Marc Kleine-Budde 
> Signed-off-by: Jakub Kicinski 
> ---
> v2: https://lore.kernel.org/all/20211228192519.386913-1-k...@kernel.org/
>  - fix build in bond on ia64
>  - fix build in ip6_fib with randconfig

Nice! Applied. Thanks
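
The forward-declaration technique the patch description relies on can be shown in a minimal, self-contained sketch. This is plain userspace C, not the kernel code: the names sk_filter_demo, sock_demo, sock_demo_attach and sock_demo_filter_len are hypothetical stand-ins, and the single file only imitates the split between a widely included header and the few files that need the full definition.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Forward declaration: the type is incomplete here, which is enough
 * for any code that only stores or passes pointers to it.
 */
struct sk_filter_demo;

/*
 * Stand-in for a heavily included header. It holds only a pointer, so
 * it does not need the size or layout of struct sk_filter_demo and
 * therefore does not need to include the header that defines it.
 */
struct sock_demo {
	struct sk_filter_demo *filter;
};

/* Storing the pointer works with the incomplete type. */
static void sock_demo_attach(struct sock_demo *sk, struct sk_filter_demo *f)
{
	assert(sk != NULL);
	sk->filter = f;
}

/*
 * Full definition. Only code that dereferences the pointer needs this,
 * which corresponds to the files that must include the defining header
 * directly once the indirect include is gone.
 */
struct sk_filter_demo {
	int len;
};

/* Dereferencing requires the complete type, so this comes after it. */
static int sock_demo_filter_len(const struct sock_demo *sk)
{
	return sk->filter ? sk->filter->len : 0;
}
```

Code that only calls sock_demo_attach() compiles against the forward declaration alone; code like sock_demo_filter_len() is what surfaces as a "missing include" when the indirect include is dropped.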


Re: [RFC PATCH 3/7] tun: allow use of BPF_PROG_TYPE_SCHED_CLS program type

2021-01-20 Thread Alexei Starovoitov
On Tue, Jan 12, 2021 at 12:55 PM Yuri Benditovich wrote:
>
> On Tue, Jan 12, 2021 at 10:40 PM Yuri Benditovich wrote:
> >
> > On Tue, Jan 12, 2021 at 9:42 PM Yuri Benditovich wrote:
> > >
> > > > This program type can set the skb hash value. It will be useful
> > > > when tun supports the hash reporting feature of virtio-net.
> > >
> > > Signed-off-by: Yuri Benditovich 
> > > ---
> > >  drivers/net/tun.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > > index 7959b5c2d11f..455f7afc1f36 100644
> > > --- a/drivers/net/tun.c
> > > +++ b/drivers/net/tun.c
> > > @@ -2981,6 +2981,8 @@ static int tun_set_ebpf(struct tun_struct *tun, 
> > > struct tun_prog __rcu **prog_p,
> > > prog = NULL;
> > > } else {
> > > prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SOCKET_FILTER);
> > > +   if (IS_ERR(prog))
> > > +       prog = bpf_prog_get_type(fd, 
> > > BPF_PROG_TYPE_SCHED_CLS);
> > > if (IS_ERR(prog))
> > > return PTR_ERR(prog);
> > > }
> >
> > Comment from Alexei Starovoitov:
> > Patches 1 and 2 are missing for me, so I couldn't review properly,
> > but this diff looks odd.
> > It allows sched_cls prog type to attach to tun.
> > > That means everything that sched_cls progs can do will be done from the tun hook?
>
> We do not have an intention to modify the packet in this steering eBPF.

The intent is irrelevant. Using SCHED_CLS here will let users modify the packet
and some users will do so. Hence the tun code has to support it.

> There is just one function that is unavailable for BPF_PROG_TYPE_SOCKET_FILTER
> and that the eBPF program needs in order to deliver the hash to the guest
> VM: 'bpf_set_hash'.
>
> Does it mean that we need to define a new eBPF type for socket filter
> operations + set_hash?
>
> Our problem is that the eBPF program calculates a 32-bit hash, a 16-bit
> queue index and an 8-bit hash type.
> But it is able to return only a 32-bit integer, so in this set of
> patches the eBPF program returns the queue index and hash type and saves
> the hash in skb->hash using bpf_set_hash().

A bpf prog can only return a 32-bit integer. That's true.
But the prog can use helpers to set any number of bits and variables.
bpf_set_hash_v2() with hash, queue and index arguments could fit this purpose,
but if you allow it for the SCHED_CLS type, the tc side of the code should be
ready to deal with that too, and this extended helper should be meaningful
for both tc and tun.

In general, if the purpose of the prog is to compute three values, they had
better be grouped together. Returning two of them via an ORed 32-bit integer
and the 32-bit hash via bpf_set_hash is an awkward API.
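
The contrast between the two interfaces can be sketched in plain userspace C. This is only an illustration under stated assumptions, not kernel or BPF code: the struct vnet_hash_result, the helper names, and in particular the bit layout (hash type in bits 16-23, queue index in bits 0-15) are hypothetical, since the thread does not spell out how the two values are ORed together.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record that keeps all three computed values together. */
struct vnet_hash_result {
	uint32_t hash;      /* 32-bit hash value */
	uint16_t queue;     /* 16-bit queue index */
	uint8_t  hash_type; /* 8-bit hash type */
};

/*
 * Split scheme: two of the three values are ORed into the 32-bit
 * return value, and the hash has to travel through a separate channel.
 * Assumed layout: bits 0-15 = queue index, bits 16-23 = hash type.
 */
static uint32_t pack_queue_and_type(uint16_t queue, uint8_t hash_type)
{
	return ((uint32_t)hash_type << 16) | queue;
}

/* The consumer must know the same layout to take the value apart. */
static void unpack_queue_and_type(uint32_t ret, uint16_t *queue,
				  uint8_t *hash_type)
{
	assert(queue && hash_type);
	*queue = (uint16_t)(ret & 0xFFFF);
	*hash_type = (uint8_t)((ret >> 16) & 0xFF);
}

/*
 * Grouped scheme: one call sets all three values, so producer and
 * consumer share a single, named description of the result.
 */
static void set_hash_result(struct vnet_hash_result *out, uint32_t hash,
			    uint16_t queue, uint8_t hash_type)
{
	assert(out);
	out->hash = hash;
	out->queue = queue;
	out->hash_type = hash_type;
}
```

With the split scheme, the bit layout becomes an implicit contract on top of the separate hash channel; the grouped call keeps the three values and their widths visible in one place.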


Re: [RFC PATCH 3/7] tun: allow use of BPF_PROG_TYPE_SCHED_CLS program type

2021-01-12 Thread Alexei Starovoitov
On Tue, Jan 12, 2021 at 11:42 AM Yuri Benditovich wrote:
>
> This program type can set the skb hash value. It will be useful
> when tun supports the hash reporting feature of virtio-net.
>
> Signed-off-by: Yuri Benditovich 
> ---
>  drivers/net/tun.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 7959b5c2d11f..455f7afc1f36 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -2981,6 +2981,8 @@ static int tun_set_ebpf(struct tun_struct *tun, struct tun_prog __rcu **prog_p,
> prog = NULL;
> } else {
> prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SOCKET_FILTER);
> +   if (IS_ERR(prog))
> +   prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SCHED_CLS);

You've ignored the feedback and just resent? What for?


Re: [PATCH 01/18] tools: bpf: Use local copy of headers including uapi/linux/filter.h

2020-07-01 Thread Alexei Starovoitov
On Tue, Jun 30, 2020 at 10:37 AM Will Deacon  wrote:
>
> Pulling header files directly out of the kernel sources for inclusion in
> userspace programs is highly error prone, not least because it bypasses
> the kbuild infrastructure entirely and so may end up referencing other
> header files that have not been generated.
>
> Subsequent patches will cause compiler.h to pull in the ungenerated
> asm/rwonce.h file via filter.h, breaking the build for tools/bpf:
>
>   | $ make -C tools/bpf
>   | make: Entering directory '/linux/tools/bpf'
>   |   CC   bpf_jit_disasm.o
>   |   LINK bpf_jit_disasm
>   |   CC   bpf_dbg.o
>   | In file included from /linux/include/uapi/linux/filter.h:9,
>   |  from /linux/tools/bpf/bpf_dbg.c:41:
>   | /linux/include/linux/compiler.h:247:10: fatal error: asm/rwonce.h: No such file or directory
>   |  #include <asm/rwonce.h>
>   |   ^~
>   | compilation terminated.
>   | make: *** [Makefile:61: bpf_dbg.o] Error 1
>   | make: Leaving directory '/linux/tools/bpf'
>
> Take a copy of the installed version of linux/filter.h (i.e. the one
> created by the 'headers_install' target) into tools/include/uapi/linux/
> and adjust the BPF tool Makefile to reference the local include
> directories instead of those in the main source tree.
>
> Cc: Alexei Starovoitov 
> Cc: Masahiro Yamada 
> Suggested-by: Daniel Borkmann 
> Reported-by: Xiao Yang 
> Signed-off-by: Will Deacon 

Acked-by: Alexei Starovoitov 


Re: [PATCH v2] virtio_net: fix PAGE_SIZE > 64k

2017-01-28 Thread Alexei Starovoitov
On Tue, Jan 24, 2017 at 7:48 PM, John Fastabend wrote:
>
> It is a concern on my side. I want XDP and Linux stack to work
> reasonably well together.

btw, the micro benchmarks showed that the page-per-packet approach
that xdp took in mlx4 should be 10% slower vs normal operation
for the tcp/ip stack. We thought that for our LB use case
it would be an acceptable slowdown, but it turned out that overall we
got a performance boost, since the xdp model simplified user space
and made the data path faster, so we magically got extra free cpu
that is used for other apps on the same host, and an overall
perf win despite the extra overhead in tcp/ip.
Not all use cases are the same and not everyone will be as lucky,
so I'd like to see the performance of xdp_pass improve too, though
it turned out to be not as high a priority as I initially estimated.