Re: selftests/bpf :get_cgroup_id_user: File not found: /sys/kernel/debug/tracing/events/syscalls/sys_enter_nanosleep/id

2018-11-21 Thread Y Song
On Wed, Nov 21, 2018 at 3:44 AM Naresh Kamboju wrote:
>
> Kselftest bpf get_cgroup_id_user is failed on all devices.
>
> selftests: bpf: get_cgroup_id_user
> main:PASS:setup_cgroup_environment
> main:PASS:create_and_get_cgroup
> main:PASS:join_cgroup
> main:PASS:bpf_prog_load
> main:PASS:bpf_find_map
> main:PASS:bpf_find_map
> main:FAIL:open err -1 errno 2
> not ok 1..15 selftests: bpf: get_cgroup_id_user [FAIL]
>
> The strace output shows that this expected file was not found:
> "/sys/kernel/debug/tracing/events/syscalls/sys_enter_nanosleep/id"
>
> bpf(BPF_MAP_UPDATE_ELEM, {map_fd=5, key=0x7fff0c68c138,
> value=0x7fff0c68c13c, flags=BPF_ANY}, 72) = 0
> open(\"/sys/kernel/debug/tracing/events/syscalls/sys_enter_nanosleep/id\",
> O_RDONLY) = -1 ENOENT (No such file or directory)
> write(1, \"main:FAIL:open err -1 errno 2\n\", 30main:FAIL:open err -1 errno 2
>
> Am I missing any pre-requirement ?

You are probably missing the kernel config CONFIG_FTRACE_SYSCALLS.
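To verify, checks along the following lines can help (the paths below are the common defaults and may differ per distro or debugfs mount point):

```shell
# 1) Is CONFIG_FTRACE_SYSCALLS enabled in the running kernel?
grep FTRACE_SYSCALLS /boot/config-"$(uname -r)" 2>/dev/null
zgrep FTRACE_SYSCALLS /proc/config.gz 2>/dev/null

# 2) Does the syscall event the test opens actually exist?
ls /sys/kernel/debug/tracing/events/syscalls/sys_enter_nanosleep/id 2>/dev/null \
    || echo "syscall events missing - rebuild with CONFIG_FTRACE_SYSCALLS=y"
```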

>
> Best regards
> Naresh Kamboju


Re: [PATCH bpf-next v2] libbpf: make sure bpf headers are c++ include-able

2018-11-20 Thread Y Song
On Tue, Nov 20, 2018 at 1:37 PM Stanislav Fomichev  wrote:
>
> Wrap headers in extern "C", to turn off C++ mangling.
> This simplifies including libbpf in c++ and linking against it.
>
> v2 changes:
> * do the same for btf.h
>
> Signed-off-by: Stanislav Fomichev 

Acked-by: Yonghong Song 

> ---
>  tools/lib/bpf/bpf.h| 9 +
>  tools/lib/bpf/btf.h| 8 
>  tools/lib/bpf/libbpf.h | 9 +
>  3 files changed, 26 insertions(+)


Re: [PATCH bpf-next] libbpf: make sure bpf headers are c++ include-able

2018-11-20 Thread Y Song
On Tue, Nov 20, 2018 at 10:19 AM Stanislav Fomichev  wrote:
>
> Wrap headers in extern "C", to turn off C++ mangling.
> This simplifies including libbpf in c++ and linking against it.
>
> Signed-off-by: Stanislav Fomichev 
> ---
>  tools/lib/bpf/bpf.h| 9 +
>  tools/lib/bpf/libbpf.h | 9 +

Do you want to add tools/lib/bpf/btf.h as well? It has some functions
which could also be used outside libbpf.

>  2 files changed, 18 insertions(+)
>
> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
> index 26a51538213c..9ea3aec82d8a 100644
> --- a/tools/lib/bpf/bpf.h
> +++ b/tools/lib/bpf/bpf.h
> @@ -27,6 +27,10 @@
>  #include 
>  #include 
>
> +#ifdef __cplusplus
> +extern "C" {
> +#endif

Some (but not all) __cplusplus extern "C" wrappers wrap the standard
includes as well, like "#include ", "#include ". It probably does not
matter here since those headers may not have function prototypes, but I
want to point it out so you are aware of this and can double check.

> +
>  #ifndef LIBBPF_API
>  #define LIBBPF_API __attribute__((visibility("default")))
>  #endif
> @@ -128,4 +132,9 @@ LIBBPF_API int bpf_load_btf(void *btf, __u32 btf_size, char *log_buf,
>  LIBBPF_API int bpf_task_fd_query(int pid, int fd, __u32 flags, char *buf,
>  __u32 *buf_len, __u32 *prog_id, __u32 *fd_type,
>  __u64 *probe_offset, __u64 *probe_addr);
> +
> +#ifdef __cplusplus
> +} /* extern "C" */
> +#endif
> +
>  #endif /* __LIBBPF_BPF_H */
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index b1686a787102..74e57e041705 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -16,6 +16,10 @@
>  #include   // for size_t
>  #include 
>
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
>  #ifndef LIBBPF_API
>  #define LIBBPF_API __attribute__((visibility("default")))
>  #endif
> @@ -335,4 +339,9 @@ int libbpf_nl_get_qdisc(int sock, unsigned int nl_pid, int ifindex,
> libbpf_dump_nlmsg_t dump_qdisc_nlmsg, void *cookie);
>  int libbpf_nl_get_filter(int sock, unsigned int nl_pid, int ifindex, int handle,
>  libbpf_dump_nlmsg_t dump_filter_nlmsg, void *cookie);
> +
> +#ifdef __cplusplus
> +} /* extern "C" */
> +#endif
> +
>  #endif /* __LIBBPF_LIBBPF_H */
> --
> 2.19.1.1215.g8438c0b245-goog
>


Re: [PATCH v2 0/4] Fix unsafe BPF_PROG_TEST_RUN interface

2018-11-20 Thread Y Song
On Tue, Nov 20, 2018 at 7:43 AM Lorenz Bauer  wrote:
>
> Right now, there is no safe way to use BPF_PROG_TEST_RUN with data_out.
> This is because bpf_test_finish copies the output buffer to user space
> without checking its size. This can lead to the kernel overwriting
> data in user space after the buffer if xdp_adjust_head and friends are
> in play.
>
> Changes in v2:
> * Make the syscall return ENOSPC if data_size_out is too small
> * Make bpf_prog_test_run return EINVAL if size_out is missing
> * Document the new behaviour of data_size_out
>
> Lorenz Bauer (4):
>   bpf: respect size hint to BPF_PROG_TEST_RUN if present
>   tools: sync uapi/linux/bpf.h
>   libbpf: require size hint in bpf_prog_test_run
>   selftests: add a test for bpf_prog_test_run output size

For the series: if we decide to take this approach rather than adding
another field to the uapi, as described in
https://www.spinics.net/lists/netdev/msg534277.html, then
  Acked-by: Yonghong Song 


Re: [PATCH 3/3] selftests: add a test for bpf_prog_test_run output size

2018-11-20 Thread Y Song
On Tue, Nov 20, 2018 at 3:35 AM Lorenz Bauer  wrote:
>
> On Sun, 18 Nov 2018 at 05:59, Y Song  wrote:
> >
> > On Fri, Nov 16, 2018 at 12:55 PM Lorenz Bauer  wrote:
> > >
> > > Make sure that bpf_prog_test_run returns the correct length
> > > in the size_out argument and that the kernel respects the
> > > output size hint.
> > >
> > > Signed-off-by: Lorenz Bauer 
> > > ---
> > >  tools/testing/selftests/bpf/test_progs.c | 34 
> > >  1 file changed, 34 insertions(+)
> > >
> > > diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
> > > index 560d7527b86b..6ab98e10e86f 100644
> > > --- a/tools/testing/selftests/bpf/test_progs.c
> > > +++ b/tools/testing/selftests/bpf/test_progs.c
> > > @@ -124,6 +124,39 @@ static void test_pkt_access(void)
> > > bpf_object__close(obj);
> > >  }
> > >
> > > +static void test_output_size_hint(void)
> > > +{
> > > +   const char *file = "./test_pkt_access.o";
> > > +   struct bpf_object *obj;
> > > +   __u32 retval, size, duration;
> > > +   int err, prog_fd;
> > > +   char buf[10];
> > > +
> > > +   err = bpf_prog_load(file, BPF_PROG_TYPE_SCHED_CLS, &obj, &prog_fd);
> > > +   if (err) {
> > > +   error_cnt++;
> > > +   return;
> > > +   }
> > CHECK can also be used here.
> > if (CHECK(...)) {
> >    goto done;
> > }
> > where label "done" is right before bpf_object__close.
>
> I just copied this part from test_pkt_access, happy to change it though.
> However, I think "goto done" would lead to freeing an unallocated
> object in this case?

Right, you can just return here.

>
> --
> Lorenz Bauer  |  Systems Engineer
> 25 Lavington St., London SE1 0NZ
>
> www.cloudflare.com


Re: [PATCH v2 bpf-next 0/2] bpf: adding support for mapinmap in libbpf

2018-11-19 Thread Y Song
On Mon, Nov 19, 2018 at 9:40 PM Nikita V. Shirokov  wrote:
>
> In this patch series I'm adding a helper for libbpf which would allow
> it to load map-in-map (BPF_MAP_TYPE_ARRAY_OF_MAPS and
> BPF_MAP_TYPE_HASH_OF_MAPS).
> The first patch contains the new helper and explains the proposed workflow;
> the second patch contains tests which could also be used as a usage
> example.
>
> v1->v2:
>  - addressing nits
>  - removing const identifier from fd in new helper
>  - starting to check return val for bpf_map_update_elem
>
> Nikita V. Shirokov (2):
>   bpf: adding support for map in map in libbpf
>   bpf: adding tests for mapinmap helper in libbpf

For the whole series.
Acked-by: Yonghong Song 

>
>  tools/lib/bpf/libbpf.c  |  7 +++
>  tools/lib/bpf/libbpf.h  |  2 +
>  tools/testing/selftests/bpf/Makefile|  3 +-
>  tools/testing/selftests/bpf/test_mapinmap.c | 49 +
>  tools/testing/selftests/bpf/test_maps.c | 82 
> +
>  5 files changed, 142 insertions(+), 1 deletion(-)
>  create mode 100644 tools/testing/selftests/bpf/test_mapinmap.c
>
> --
> 2.15.1
>


Re: [PATCH iproute2] bpf: initialise map symbol before retrieving and comparing its type

2018-11-19 Thread Y Song
On Mon, Nov 19, 2018 at 5:28 PM Quentin Monnet wrote:
>
> In order to compare BPF map symbol type correctly in regard to the
> latest LLVM, commit 7a04dd84a7f9 ("bpf: check map symbol type properly
> with newer llvm compiler") compares map symbol type to both NOTYPE and
> OBJECT. To do so, it first retrieves the type from "sym.st_info" and
> stores it into a temporary variable.
>
> However, the type is collected from the symbol "sym" before this latter
> symbol is actually updated. gelf_getsym() is called after that and
> updates "sym", and when comparison with OBJECT or NOTYPE happens it is
> done on the type of the symbol collected in the previous passage of the
> loop (or on an uninitialised symbol on the first passage). This may
> eventually break map collection from the ELF file.
>
> Fix this by assigning the type to the temporary variable only after the
> call to gelf_getsym().
>
> Fixes: 7a04dd84a7f9 ("bpf: check map symbol type properly with newer llvm compiler")
> Reported-by: Ron Philip 
> Signed-off-by: Quentin Monnet 
> Reviewed-by: Jiong Wang 

Thanks for the fix!
Acked-by: Yonghong Song 

> ---
>  lib/bpf.c | 10 +++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/lib/bpf.c b/lib/bpf.c
> index 45f279fa4a41..6aff8f7bad7f 100644
> --- a/lib/bpf.c
> +++ b/lib/bpf.c
> @@ -1758,11 +1758,12 @@ static const char *bpf_map_fetch_name(struct bpf_elf_ctx *ctx, int which)
> int i;
>
> for (i = 0; i < ctx->sym_num; i++) {
> -   int type = GELF_ST_TYPE(sym.st_info);
> +   int type;
>
> if (gelf_getsym(ctx->sym_tab, i, &sym) != &sym)
> continue;
>
> +   type = GELF_ST_TYPE(sym.st_info);
> if (GELF_ST_BIND(sym.st_info) != STB_GLOBAL ||
> (type != STT_NOTYPE && type != STT_OBJECT) ||
> sym.st_shndx != ctx->sec_maps ||
> @@ -1851,11 +1852,12 @@ static int bpf_map_num_sym(struct bpf_elf_ctx *ctx)
> GElf_Sym sym;
>
> for (i = 0; i < ctx->sym_num; i++) {
> -   int type = GELF_ST_TYPE(sym.st_info);
> +   int type;
>
> if (gelf_getsym(ctx->sym_tab, i, &sym) != &sym)
> continue;
>
> +   type = GELF_ST_TYPE(sym.st_info);
> if (GELF_ST_BIND(sym.st_info) != STB_GLOBAL ||
> (type != STT_NOTYPE && type != STT_OBJECT) ||
> sym.st_shndx != ctx->sec_maps)
> @@ -1931,10 +1933,12 @@ static int bpf_map_verify_all_offs(struct bpf_elf_ctx *ctx, int end)
>  * the table again.
>  */
> for (i = 0; i < ctx->sym_num; i++) {
> -   int type = GELF_ST_TYPE(sym.st_info);
> +   int type;
>
> if (gelf_getsym(ctx->sym_tab, i, &sym) != &sym)
> continue;
> +
> +   type = GELF_ST_TYPE(sym.st_info);
> if (GELF_ST_BIND(sym.st_info) != STB_GLOBAL ||
> (type != STT_NOTYPE && type != STT_OBJECT) ||
> sym.st_shndx != ctx->sec_maps)
> --
> 2.7.4
>


Re: [PATCH bpf-next] bpf: libbpf: retry program creation without the name

2018-11-19 Thread Y Song
On Mon, Nov 19, 2018 at 4:52 PM Stanislav Fomichev  wrote:
>
> [Recent commit 23499442c319 ("bpf: libbpf: retry map creation without
> the name") fixed this issue for maps, let's do the same for programs.]
>
> Since commit 88cda1c9da02 ("bpf: libbpf: Provide basic API support
> to specify BPF obj name"), libbpf unconditionally sets bpf_attr->name
> for programs. Pre v4.14 kernels don't know about programs names and
> return an error about unexpected non-zero data. Retry sys_bpf without
> a program name to cover older kernels.
>
> Signed-off-by: Stanislav Fomichev 

Acked-by: Yonghong Song 


Re: [PATCH bpf-next 2/2] bpf: adding tests for mapinmap helper in libbpf

2018-11-19 Thread Y Song
On Mon, Nov 19, 2018 at 4:13 PM Nikita V. Shirokov  wrote:
>
> adding test/example of bpf_map__add_inner_map_fd usage
>
> Signed-off-by: Nikita V. Shirokov 
> ---
>  tools/testing/selftests/bpf/Makefile|  3 +-
>  tools/testing/selftests/bpf/test_mapinmap.c | 53 
>  tools/testing/selftests/bpf/test_maps.c | 76 
> +
>  3 files changed, 131 insertions(+), 1 deletion(-)
>  create mode 100644 tools/testing/selftests/bpf/test_mapinmap.c
>
> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> index 57b4712a6276..a3ea69dc9bdf 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -38,7 +38,8 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test
> test_lwt_seg6local.o sendmsg4_prog.o sendmsg6_prog.o test_lirc_mode2_kern.o \
> get_cgroup_id_kern.o socket_cookie_prog.o test_select_reuseport_kern.o \
> test_skb_cgroup_id_kern.o bpf_flow.o netcnt_prog.o \
> -   test_sk_lookup_kern.o test_xdp_vlan.o test_queue_map.o test_stack_map.o
> +   test_sk_lookup_kern.o test_xdp_vlan.o test_queue_map.o test_stack_map.o \
> +   test_mapinmap.o
>
>  # Order correspond to 'make run_tests' order
>  TEST_PROGS := test_kmod.sh \
> diff --git a/tools/testing/selftests/bpf/test_mapinmap.c b/tools/testing/selftests/bpf/test_mapinmap.c
> new file mode 100644
> index ..8aef6c652c9c
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/test_mapinmap.c
> @@ -0,0 +1,53 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2018 Facebook */
> +#include 
> +#include 
> +#include 
> +#include "bpf_helpers.h"
> +
> +
nit: extra new line

> +struct bpf_map_def SEC("maps") mim_array = {
> +   .type = BPF_MAP_TYPE_ARRAY_OF_MAPS,
> +   .key_size = sizeof(int),
> +   /* must be sizeof(__u32) for map in map */
> +   .value_size = sizeof(__u32),
> +   .max_entries = 1,
> +   .map_flags = 0,
> +};
> +
> +struct bpf_map_def SEC("maps") mim_hash = {
> +   .type = BPF_MAP_TYPE_HASH_OF_MAPS,
> +   .key_size = sizeof(int),
> +   /* must be sizeof(__u32) for map in map */
> +   .value_size = sizeof(__u32),
> +   .max_entries = 1,
> +   .map_flags = 0,
> +};
> +
> +
> +
nit: extra new lines.
> +SEC("xdp_mimtest")
> +int xdp_mimtest0(struct xdp_md *ctx)
> +{
> +   int value = 123;
> +   int key = 0;
> +   void *map;
> +
> +   map = bpf_map_lookup_elem(&mim_array, &key);
> +   if (!map)
> +   return XDP_DROP;
> +
> +   bpf_map_update_elem(map, &key, &value, 0);
> +
> +   map = bpf_map_lookup_elem(&mim_hash, &key);
> +   if (!map)
> +   return XDP_DROP;
> +
> +   bpf_map_update_elem(map, &key, &value, 0);
> +
> +   return XDP_PASS;
> +}
> +
> +
nit: extra new line
> +int _version SEC("version") = 1;
> +char _license[] SEC("license") = "GPL";
> diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
> index 4db2116e52be..a49ab294971d 100644
> --- a/tools/testing/selftests/bpf/test_maps.c
> +++ b/tools/testing/selftests/bpf/test_maps.c
> @@ -1080,6 +1080,80 @@ static void test_sockmap(int tasks, void *data)
> exit(1);
>  }
>
> +#define MAPINMAP_PROG "./test_mapinmap.o"
> +static void test_mapinmap(void)
> +{
> +   struct bpf_program *prog;
> +   struct bpf_object *obj;
> +   struct bpf_map *map;
> +   int mim_fd, fd;
> +   int pos = 0;
> +
> +   obj = bpf_object__open(MAPINMAP_PROG);
> +
> +   fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(int), sizeof(int),
> +   2, 0);
> +   if (fd < 0) {
> +   printf("Failed to create hashmap '%s'!\n", strerror(errno));
> +   exit(1);
> +   }
> +   printf("fd is %d\n", fd);
Not sure whether you need to print fd to the console. Any particular
reason?
> +
> +   map = bpf_object__find_map_by_name(obj, "mim_array");
> +   if (IS_ERR(map)) {
> +   printf("Failed to load array of maps from test prog\n");
> +   goto out_mapinmap;
> +   }
> +   bpf_map__add_inner_map_fd(map, fd);
> +
> +   map = bpf_object__find_map_by_name(obj, "mim_hash");
> +   if (IS_ERR(map)) {
> +   printf("Failed to load hash of maps from test prog\n");
> +   goto out_mapinmap;
> +   }
> +   bpf_map__add_inner_map_fd(map, fd);
> +
> +
nit: extra new line
> +   bpf_object__for_each_program(prog, obj) {
> +   bpf_program__set_xdp(prog);
> +   }
> +   bpf_object__load(obj);
> +
> +   map = bpf_object__find_map_by_name(obj, "mim_array");
> +   if (IS_ERR(map)) {
> +   printf("Failed to load array of maps from test prog\n");
> +   goto out_mapinmap;
> +   }
> +   mim_fd = bpf_map__fd(map);
> +   if (mim_fd < 0) {
> +   printf("Failed to get descriptor 

Re: [PATCH bpf-next 1/2] bpf: adding support for map in map in libbpf

2018-11-19 Thread Y Song
On Mon, Nov 19, 2018 at 4:13 PM Nikita V. Shirokov  wrote:
>
> The idea is pretty simple: for a specified map (pointed to by struct
> bpf_map) we provide a descriptor of an already loaded map, which is
> going to be used as a prototype for the inner map. Proposed workflow:
> 1) open bpf object (bpf_object__open)
> 2) create a bpf map which is going to be used as a prototype
> 3) find (by name) the map-in-map which you want to load and update it
> w/ the descriptor of the inner map w/ a new helper from this patch
> 4) load the bpf program w/ bpf_object__load
>
> inner_map_fd is ignored by any map types aside from (hash|array) of
> maps
>
> Signed-off-by: Nikita V. Shirokov 
> ---
>  tools/lib/bpf/libbpf.c | 7 +++
>  tools/lib/bpf/libbpf.h | 2 ++
>  2 files changed, 9 insertions(+)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index a01eb9584e52..a2ee1b1a93b6 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -163,6 +163,7 @@ struct bpf_map {
> char *name;
> size_t offset;
> int map_ifindex;
> +   int inner_map_fd;
> struct bpf_map_def def;
> __u32 btf_key_type_id;
> __u32 btf_value_type_id;
> @@ -1146,6 +1147,7 @@ bpf_object__create_maps(struct bpf_object *obj)
> create_attr.btf_fd = 0;
> create_attr.btf_key_type_id = 0;
> create_attr.btf_value_type_id = 0;
> +   create_attr.inner_map_fd = map->inner_map_fd;
>
> if (obj->btf && !bpf_map_find_btf_info(map, obj->btf)) {
> create_attr.btf_fd = btf__fd(obj->btf);
> @@ -2562,6 +2564,11 @@ void bpf_map__set_ifindex(struct bpf_map *map, __u32 ifindex)
> map->map_ifindex = ifindex;
>  }
>
> +void bpf_map__add_inner_map_fd(struct bpf_map *map, const int fd)

Do we need "const" attribute here?

> +{
> +   map->inner_map_fd = fd;
> +}
> +
>  static struct bpf_map *
>  __bpf_map__iter(struct bpf_map *m, struct bpf_object *obj, int i)
>  {
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index b1686a787102..7cb00cd41789 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -293,6 +293,8 @@ LIBBPF_API void bpf_map__set_ifindex(struct bpf_map *map, __u32 ifindex);
>  LIBBPF_API int bpf_map__pin(struct bpf_map *map, const char *path);
>  LIBBPF_API int bpf_map__unpin(struct bpf_map *map, const char *path);
>
> +LIBBPF_API void bpf_map__add_inner_map_fd(struct bpf_map *map, const int fd);
> +
>  LIBBPF_API long libbpf_get_error(const void *ptr);
>
>  struct bpf_prog_load_attr {
> --
> 2.15.1
>


Re: [PATCH 0/3] Fix unsafe BPF_PROG_TEST_RUN interface

2018-11-17 Thread Y Song
On Fri, Nov 16, 2018 at 12:54 PM Lorenz Bauer  wrote:
>
> Right now, there is no safe way to use BPF_PROG_TEST_RUN with data_out.
> This is because bpf_test_finish copies the output buffer to user space
> without checking its size. This can lead to the kernel overwriting
> data in user space after the buffer if xdp_adjust_head and friends are
> in play.
>
> Fix this by using bpf_attr.test.data_size_out as a size hint. The old
> behaviour is retained if size_hint is zero.

There is a slight change of user space behavior with this patch.
Without this patch, the value bpf_attr.test.data_size_out is output only.
For example:
   output buffer: out_buf (user allocated size 10)
   data_size_out is a random (uninitialized) value, e.g., 1
   the actual data to copy is 5 bytes

In today's implementation, the kernel will copy 5 bytes and set
data_size_out to 5.

With this patch, the kernel will copy 1 byte and set data_size_out to 5.

I am not 100% sure at this time whether we CAN overload data_size_out,
since it MAY break existing applications.

Alternatively, we can append another field to bpf_attr.test:
  __u32 data_out_size;
which provides the data_out buffer size.
Inside the kernel, if the user-supplied attr is smaller than the
kernel's (and hence does not contain data_out_size), the current
behavior is used. Otherwise, data_out_size gives the data_out buffer size.

Daniel and Alexei, which option do you think is reasonable?

>
> Interestingly, do_test_single() in test_verifier.c already assumes
> that this is the intended use of data_size_out, and sets it to the
> output buffer size.
>
> Lorenz Bauer (3):
>   bpf: respect size hint to BPF_PROG_TEST_RUN if present
>   libbpf: require size hint in bpf_prog_test_run
>   selftests: add a test for bpf_prog_test_run output size
>
>  net/bpf/test_run.c   |  9 -
>  tools/lib/bpf/bpf.c  |  4 ++-
>  tools/testing/selftests/bpf/test_progs.c | 44 
>  3 files changed, 55 insertions(+), 2 deletions(-)
>
> --
> 2.17.1
>


Re: [PATCH 3/3] selftests: add a test for bpf_prog_test_run output size

2018-11-17 Thread Y Song
On Fri, Nov 16, 2018 at 12:55 PM Lorenz Bauer  wrote:
>
> Make sure that bpf_prog_test_run returns the correct length
> in the size_out argument and that the kernel respects the
> output size hint.
>
> Signed-off-by: Lorenz Bauer 
> ---
>  tools/testing/selftests/bpf/test_progs.c | 34 
>  1 file changed, 34 insertions(+)
>
> diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
> index 560d7527b86b..6ab98e10e86f 100644
> --- a/tools/testing/selftests/bpf/test_progs.c
> +++ b/tools/testing/selftests/bpf/test_progs.c
> @@ -124,6 +124,39 @@ static void test_pkt_access(void)
> bpf_object__close(obj);
>  }
>
> +static void test_output_size_hint(void)
> +{
> +   const char *file = "./test_pkt_access.o";
> +   struct bpf_object *obj;
> +   __u32 retval, size, duration;
> +   int err, prog_fd;
> +   char buf[10];
> +
> +   err = bpf_prog_load(file, BPF_PROG_TYPE_SCHED_CLS, &obj, &prog_fd);
> +   if (err) {
> +   error_cnt++;
> +   return;
> +   }
CHECK can also be used here.
if (CHECK(...)) {
   goto done;
}
where label "done" is right before bpf_object__close.
> +
> +   memset(buf, 0, sizeof(buf));
> +
> +   size = 5;
> +   err = bpf_prog_test_run(prog_fd, 1, &pkt_v4, sizeof(pkt_v4),
> +   buf, &size, &retval, &duration);
> +   CHECK(err || retval, "run",
> + "err %d errno %d retval %d\n",
> + err, errno, retval);
> +
> +   CHECK(size != sizeof(pkt_v4), "out_size",
> + "incorrect output size, want %lu have %u\n",
> + sizeof(pkt_v4), size);
> +
> +   CHECK(buf[5] != 0, "overflow",
> + "prog_test_run ignored size hint\n");
> +
> +   bpf_object__close(obj);
> +}
> +
>  static void test_xdp(void)
>  {
> struct vip key4 = {.protocol = 6, .family = AF_INET};
> @@ -1847,6 +1880,7 @@ int main(void)
> jit_enabled = is_jit_enabled();
>
> test_pkt_access();
> +   test_output_size_hint();
> test_xdp();
> test_xdp_adjust_tail();
> test_l4lb_all();
> --
> 2.17.1
>


Re: [PATCH 2/3] libbpf: require size hint in bpf_prog_test_run

2018-11-17 Thread Y Song
On Fri, Nov 16, 2018 at 12:54 PM Lorenz Bauer  wrote:
>
> Require size_out to be non-NULL if data_out is given. This prevents
> accidental overwriting of process memory after the output buffer.
>
> Adjust callers of bpf_prog_test_run to this behaviour.
>
> Signed-off-by: Lorenz Bauer 
> ---
>  tools/lib/bpf/bpf.c  |  4 +++-
>  tools/testing/selftests/bpf/test_progs.c | 10 ++
>  2 files changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> index 03f9bcc4ef50..127a9aa6170e 100644
> --- a/tools/lib/bpf/bpf.c
> +++ b/tools/lib/bpf/bpf.c
> @@ -403,10 +403,12 @@ int bpf_prog_test_run(int prog_fd, int repeat, void *data, __u32 size,
> attr.test.data_in = ptr_to_u64(data);
> attr.test.data_out = ptr_to_u64(data_out);
> attr.test.data_size_in = size;
> +   if (data_out)
> +   attr.test.data_size_out = *size_out;

Maybe it is better to return an error (-EINVAL) instead of segfaulting
if size_out is NULL? We should try to avoid segfaults inside the
library. This will change the original API behavior, but I think it is
okay since it is in user space.

> attr.test.repeat = repeat;
>
> ret = sys_bpf(BPF_PROG_TEST_RUN, &attr, sizeof(attr));
> -   if (size_out)
> +   if (data_out)
> *size_out = attr.test.data_size_out;
> if (retval)
> *retval = attr.test.retval;
> diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
> index 2d3c04f45530..560d7527b86b 100644
> --- a/tools/testing/selftests/bpf/test_progs.c
> +++ b/tools/testing/selftests/bpf/test_progs.c
> @@ -150,6 +150,7 @@ static void test_xdp(void)
> bpf_map_update_elem(map_fd, &key4, &value4, 0);
> bpf_map_update_elem(map_fd, &key6, &value6, 0);
>
> +   size = sizeof(buf);
> err = bpf_prog_test_run(prog_fd, 1, &pkt_v4, sizeof(pkt_v4),
> buf, &size, &retval, &duration);
>
> @@ -158,6 +159,7 @@ static void test_xdp(void)
>   "err %d errno %d retval %d size %d\n",
>   err, errno, retval, size);
>
> +   size = sizeof(buf);
> err = bpf_prog_test_run(prog_fd, 1, &pkt_v6, sizeof(pkt_v6),
> buf, &size, &retval, &duration);
> CHECK(err || retval != XDP_TX || size != 114 ||
> @@ -182,6 +184,7 @@ static void test_xdp_adjust_tail(void)
> return;
> }
>
> +   size = sizeof(buf);
> err = bpf_prog_test_run(prog_fd, 1, &pkt_v4, sizeof(pkt_v4),
> buf, &size, &retval, &duration);
>
> @@ -189,6 +192,7 @@ static void test_xdp_adjust_tail(void)
>   "ipv4", "err %d errno %d retval %d size %d\n",
>   err, errno, retval, size);
>
> +   size = sizeof(buf);
> err = bpf_prog_test_run(prog_fd, 1, &pkt_v6, sizeof(pkt_v6),
> buf, &size, &retval, &duration);
> CHECK(err || retval != XDP_TX || size != 54,
> @@ -252,6 +256,7 @@ static void test_l4lb(const char *file)
> goto out;
> bpf_map_update_elem(map_fd, &real_num, &real_def, 0);
>
> +   size = sizeof(buf);
> err = bpf_prog_test_run(prog_fd, NUM_ITER, &pkt_v4, sizeof(pkt_v4),
> buf, &size, &retval, &duration);
> CHECK(err || retval != 7/*TC_ACT_REDIRECT*/ || size != 54 ||
> @@ -259,6 +264,7 @@ static void test_l4lb(const char *file)
>   "err %d errno %d retval %d size %d magic %x\n",
>   err, errno, retval, size, *magic);
>
> +   size = sizeof(buf);
> err = bpf_prog_test_run(prog_fd, NUM_ITER, &pkt_v6, sizeof(pkt_v6),
> buf, &size, &retval, &duration);
> CHECK(err || retval != 7/*TC_ACT_REDIRECT*/ || size != 74 ||
> @@ -341,6 +347,7 @@ static void test_xdp_noinline(void)
> goto out;
> bpf_map_update_elem(map_fd, &real_num, &real_def, 0);
>
> +   size = sizeof(buf);
> err = bpf_prog_test_run(prog_fd, NUM_ITER, &pkt_v4, sizeof(pkt_v4),
> buf, &size, &retval, &duration);
> CHECK(err || retval != 1 || size != 54 ||
> @@ -348,6 +355,7 @@ static void test_xdp_noinline(void)
>   "err %d errno %d retval %d size %d magic %x\n",
>   err, errno, retval, size, *magic);
>
> +   size = sizeof(buf);
> err = bpf_prog_test_run(prog_fd, NUM_ITER, &pkt_v6, sizeof(pkt_v6),
> buf, &size, &retval, &duration);
> CHECK(err || retval != 1 || size != 74 ||
> @@ -1795,6 +1803,7 @@ static void test_queue_stack_map(int type)
> pkt_v4.iph.saddr = vals[MAP_SIZE - 1 - i] * 5;
> }
>
> +   size = sizeof(buf);
> err = bpf_prog_test_run(prog_fd, 1, &pkt_v4, sizeof(pkt_v4),
> buf, &size, &retval, &duration);
> if (err || retval || size != sizeof(pkt_v4) ||
> @@ -1808,6 +1817,7 @@ static void test_queue_stack_map(int type)
>   err, errno, retval, size, iph->daddr);
>
> /* Queue is empty, program should return 

Re: [PATCH 1/3] bpf: respect size hint to BPF_PROG_TEST_RUN if present

2018-11-17 Thread Y Song
On Fri, Nov 16, 2018 at 12:54 PM Lorenz Bauer  wrote:
>
> Use data_size_out as a size hint when copying test output to user space.
> A program using BPF_PERF_OUTPUT can compare its own buffer length with
> data_size_out after the syscall to detect whether truncation has taken
> place. Callers which so far did not set data_size_in are not affected.
>
> Signed-off-by: Lorenz Bauer 
> ---
>  net/bpf/test_run.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> index c89c22c49015..30c57b7f4ba4 100644
> --- a/net/bpf/test_run.c
> +++ b/net/bpf/test_run.c
> @@ -74,8 +74,15 @@ static int bpf_test_finish(const union bpf_attr *kattr,
>  {
> void __user *data_out = u64_to_user_ptr(kattr->test.data_out);
> int err = -EFAULT;
> +   u32 copy_size = size;
>
> -   if (data_out && copy_to_user(data_out, data, size))
> +   /* Clamp copy if the user has provided a size hint, but copy the full
> +* buffer if not to retain old behaviour.
> +*/
> +   if (kattr->test.data_size_out && copy_size > kattr->test.data_size_out)
> +   copy_size = kattr->test.data_size_out;
> +
> +   if (data_out && copy_to_user(data_out, data, copy_size))
> goto out;
> if (copy_to_user(&uattr->test.data_size_out, &size, sizeof(size)))
> goto out;

If copy_size < size, maybe we should return -ENOSPC so user space is
aware of the insufficient size and can take proper action? This
behavior would then be consistent with the BPF_PROG_QUERY subcommand,
where prog_cnt is an in/out parameter and -ENOSPC is returned if the
kernel does not have enough space to copy out the whole data.

Also, since the data_size_out field now has more complex semantics,
could you add some comments in uapi/linux/bpf.h so it will be
relatively clear from the uapi header how this field will be used?

> --
> 2.17.1
>


Re: [PATCH bpf-next 2/2] samples: bpf: get ifindex from ifname

2018-10-18 Thread Y Song
On Thu, Oct 18, 2018 at 1:48 PM Matteo Croce  wrote:
>
> Find the ifindex via ioctl(SIOCGIFINDEX) instead of requiring the
> numeric ifindex.

Maybe use if_nametoindex which is simpler?

>
> Signed-off-by: Matteo Croce 
> ---
>  samples/bpf/xdp1_user.c | 26 --
>  1 file changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/samples/bpf/xdp1_user.c b/samples/bpf/xdp1_user.c
> index 4f3d824fc044..a1d0c5dcee9c 100644
> --- a/samples/bpf/xdp1_user.c
> +++ b/samples/bpf/xdp1_user.c
> @@ -15,6 +15,9 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> +#include 
>
>  #include "bpf_util.h"
>  #include "bpf/bpf.h"
> @@ -59,7 +62,7 @@ static void poll_stats(int map_fd, int interval)
>  static void usage(const char *prog)
>  {
> fprintf(stderr,
> -   "usage: %s [OPTS] IFINDEX\n\n"
> +   "usage: %s [OPTS] IFACE\n\n"
> "OPTS:\n"
> "-Suse skb-mode\n"
> "-Nenforce native mode\n",
> @@ -74,9 +77,11 @@ int main(int argc, char **argv)
> };
> const char *optstr = "SN";
> int prog_fd, map_fd, opt;
> +   struct ifreq ifr = { 0 };
> struct bpf_object *obj;
> struct bpf_map *map;
> char filename[256];
> +   int sock;
>
> while ((opt = getopt(argc, argv, optstr)) != -1) {
> switch (opt) {
> @@ -102,7 +107,24 @@ int main(int argc, char **argv)
> return 1;
> }
>
> -   ifindex = strtoul(argv[optind], NULL, 0);
> +   sock = socket(AF_UNIX, SOCK_DGRAM, 0);
> +   if (sock == -1) {
> +   perror("socket");
> +   return 1;
> +   }
> +
> +   if (strlen(argv[optind]) >= IFNAMSIZ) {
> +   printf("invalid ifname '%s'\n", argv[optind]);
> +   return 1;
> +   }
> +
> +   strcpy(ifr.ifr_name, argv[optind]);
> +   if (ioctl(sock, SIOCGIFINDEX, &ifr) < 0) {
> +   perror("SIOCGIFINDEX");
> +   return 1;
> +   }
> +   close(sock);
> +   ifindex = ifr.ifr_ifindex;
>
> snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
> prog_load_attr.file = filename;
> --
> 2.19.1
>


Re: [PATCH bpf-next 1/2] samples: bpf: improve xdp1 example

2018-10-18 Thread Y Song
On Thu, Oct 18, 2018 at 1:48 PM Matteo Croce  wrote:
>
> Store only the total packet count for every protocol, instead of the
> whole per-cpu array.
> Use bpf_map_get_next_key() to iterate the map, instead of looking up
> all the protocols.
>
> Signed-off-by: Matteo Croce 

Acked-by: Yonghong Song 


Re: [PATCH bpf-next 2/2] xsk: fix bug when trying to use both copy and zero-copy on one queue id

2018-09-19 Thread Y Song
On Wed, Sep 19, 2018 at 12:15 AM Magnus Karlsson wrote:
>
> On Tue, Sep 18, 2018 at 7:23 PM Y Song  wrote:
> >
> > On Tue, Sep 18, 2018 at 3:13 AM Magnus Karlsson wrote:
> > >
> > > Previously, the xsk code did not record which umem was bound to a
> > > specific queue id. This was not required if all drivers were zero-copy
> > > enabled as this had to be recorded in the driver anyway. So if a user
> > > tried to bind two umems to the same queue, the driver would say
> > > no. But if copy-mode was first enabled and then zero-copy mode (or the
> > > reverse order), we mistakenly enabled both of them on the same umem
> > > leading to buggy behavior. The main culprit for this is that we did
> > > not store the association of umem to queue id in the copy case and
> > > only relied on the driver reporting this. As this relation was not
> > > stored in the driver for copy mode (it does not rely on the AF_XDP
> > > NDOs), this obviously could not work.
> > >
> > > This patch fixes the problem by always recording the umem to queue id
> > > relationship in the netdev_queue and netdev_rx_queue structs. This way
> > > we always know what kind of umem has been bound to a queue id and can
> > > act appropriately at bind time.
> > >
> > > Signed-off-by: Magnus Karlsson 
> > > ---
> > >  net/xdp/xdp_umem.c | 87 
> > > +++---
> > >  net/xdp/xdp_umem.h |  2 +-
> > >  2 files changed, 71 insertions(+), 18 deletions(-)
> > >
> > > diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> > > index b3b632c..12300b5 100644
> > > --- a/net/xdp/xdp_umem.c
> > > +++ b/net/xdp/xdp_umem.c
> > > @@ -42,6 +42,41 @@ void xdp_del_sk_umem(struct xdp_umem *umem, struct 
> > > xdp_sock *xs)
> > > }
> > >  }
> > >
> > > +/* The umem is stored both in the _rx struct and the _tx struct as we do
> > > + * not know if the device has more tx queues than rx, or the opposite.
> > > + * This might also change during run time.
> > > + */
> > > +static void xdp_reg_umem_at_qid(struct net_device *dev, struct xdp_umem 
> > > *umem,
> > > +   u16 queue_id)
> > > +{
> > > +   if (queue_id < dev->real_num_rx_queues)
> > > +   dev->_rx[queue_id].umem = umem;
> > > +   if (queue_id < dev->real_num_tx_queues)
> > > +   dev->_tx[queue_id].umem = umem;
> > > +}
> > > +
> > > +static struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev,
> > > + u16 queue_id)
> > > +{
> > > +   if (queue_id < dev->real_num_rx_queues)
> > > +   return dev->_rx[queue_id].umem;
> > > +   if (queue_id < dev->real_num_tx_queues)
> > > +   return dev->_tx[queue_id].umem;
> > > +
> > > +   return NULL;
> > > +}
> > > +
> > > +static void xdp_clear_umem_at_qid(struct net_device *dev, u16 queue_id)
> > > +{
> > > +   /* Zero out the entry independent on how many queues are 
> > > configured
> > > +* at this point in time, as it might be used in the future.
> > > +*/
> > > +   if (queue_id < dev->num_rx_queues)
> > > +   dev->_rx[queue_id].umem = NULL;
> > > +   if (queue_id < dev->num_tx_queues)
> > > +   dev->_tx[queue_id].umem = NULL;
> > > +}
> > > +
> >
> > I am not sure whether the following scenario can happen or not.
> > Could you clarify?
> >1. suppose initially we have num_rx_queues = num_tx_queues = 10
> >xdp_reg_umem_at_qid() set umem1 to queue_id = 8
> >2. num_tx_queues is changed to 5
>
> You probably mean real_num_tx_queues here. This is the current number
> of queues configured via e.g. ethtool. num_tx_queues will not change
> unless you change device (or device driver).

I am actually not really familiar with this area of
code. Your explanation definitely helps my understanding.

>
> >3. xdp_clear_umem_at_qid() is called for queue_id = 8,
> >    and dev->_rx[8].umem = 0.
>
> At this point both _rx[8].umem and _tx[8].umem are set to NULL as the
> test is against num_[rx|tx]_queues which is the max allowed for the
> device, not the current allocated one which is real_num_tx_queues.
> With this in mind, the scenario 

Re: [PATCH bpf-next] flow_dissector: fix build failure without CONFIG_NET

2018-09-18 Thread Y Song
On Tue, Sep 18, 2018 at 1:20 PM Willem de Bruijn
 wrote:
>
> From: Willem de Bruijn 
>
> If boolean CONFIG_BPF_SYSCALL is enabled, kernel/bpf/syscall.c will
> call flow_dissector functions from net/core/flow_dissector.c.
>
> This causes this build failure if CONFIG_NET is disabled:
>
> kernel/bpf/syscall.o: In function `__x64_sys_bpf':
> syscall.c:(.text+0x3278): undefined reference to
> `skb_flow_dissector_bpf_prog_attach'
> syscall.c:(.text+0x3310): undefined reference to
> `skb_flow_dissector_bpf_prog_detach'
> kernel/bpf/syscall.o:(.rodata+0x3f0): undefined reference to
> `flow_dissector_prog_ops'
> kernel/bpf/verifier.o:(.rodata+0x250): undefined reference to
> `flow_dissector_verifier_ops'
>
> Analogous to other optional BPF program types in syscall.c, add stubs
> if the relevant functions are not compiled and move the BPF_PROG_TYPE
> definition in the #ifdef CONFIG_NET block.
>
> Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook")
> Reported-by: Randy Dunlap 
> Signed-off-by: Willem de Bruijn 

Acked-by: Yonghong Song 


Re: [PATCH bpf] tools: bpf: fix license for a compat header file

2018-09-18 Thread Y Song
On Tue, Sep 18, 2018 at 10:15 AM Jakub Kicinski
 wrote:
>
> libc_compat.h is used by libbpf so make sure it's licensed under
> LGPL or BSD license.  The license change should be OK, I'm the only
> author of the file.
>
> Signed-off-by: Jakub Kicinski 
> Reviewed-by: Quentin Monnet 

Acked-by: Yonghong Song 


Re: [PATCH bpf-next 2/2] xsk: fix bug when trying to use both copy and zero-copy on one queue id

2018-09-18 Thread Y Song
On Tue, Sep 18, 2018 at 3:13 AM Magnus Karlsson
 wrote:
>
> Previously, the xsk code did not record which umem was bound to a
> specific queue id. This was not required if all drivers were zero-copy
> enabled as this had to be recorded in the driver anyway. So if a user
> tried to bind two umems to the same queue, the driver would say
> no. But if copy-mode was first enabled and then zero-copy mode (or the
> reverse order), we mistakenly enabled both of them on the same umem
> leading to buggy behavior. The main culprit for this is that we did
> not store the association of umem to queue id in the copy case and
> only relied on the driver reporting this. As this relation was not
> stored in the driver for copy mode (it does not rely on the AF_XDP
> NDOs), this obviously could not work.
>
> This patch fixes the problem by always recording the umem to queue id
> relationship in the netdev_queue and netdev_rx_queue structs. This way
> we always know what kind of umem has been bound to a queue id and can
> act appropriately at bind time.
>
> Signed-off-by: Magnus Karlsson 
> ---
>  net/xdp/xdp_umem.c | 87 
> +++---
>  net/xdp/xdp_umem.h |  2 +-
>  2 files changed, 71 insertions(+), 18 deletions(-)
>
> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> index b3b632c..12300b5 100644
> --- a/net/xdp/xdp_umem.c
> +++ b/net/xdp/xdp_umem.c
> @@ -42,6 +42,41 @@ void xdp_del_sk_umem(struct xdp_umem *umem, struct 
> xdp_sock *xs)
> }
>  }
>
> +/* The umem is stored both in the _rx struct and the _tx struct as we do
> + * not know if the device has more tx queues than rx, or the opposite.
> + * This might also change during run time.
> + */
> +static void xdp_reg_umem_at_qid(struct net_device *dev, struct xdp_umem 
> *umem,
> +   u16 queue_id)
> +{
> +   if (queue_id < dev->real_num_rx_queues)
> +   dev->_rx[queue_id].umem = umem;
> +   if (queue_id < dev->real_num_tx_queues)
> +   dev->_tx[queue_id].umem = umem;
> +}
> +
> +static struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev,
> + u16 queue_id)
> +{
> +   if (queue_id < dev->real_num_rx_queues)
> +   return dev->_rx[queue_id].umem;
> +   if (queue_id < dev->real_num_tx_queues)
> +   return dev->_tx[queue_id].umem;
> +
> +   return NULL;
> +}
> +
> +static void xdp_clear_umem_at_qid(struct net_device *dev, u16 queue_id)
> +{
> +   /* Zero out the entry independent on how many queues are configured
> +* at this point in time, as it might be used in the future.
> +*/
> +   if (queue_id < dev->num_rx_queues)
> +   dev->_rx[queue_id].umem = NULL;
> +   if (queue_id < dev->num_tx_queues)
> +   dev->_tx[queue_id].umem = NULL;
> +}
> +

I am not sure whether the following scenario can happen or not.
Could you clarify?
   1. suppose initially we have num_rx_queues = num_tx_queues = 10
   xdp_reg_umem_at_qid() set umem1 to queue_id = 8
   2. num_tx_queues is changed to 5
   3. xdp_clear_umem_at_qid() is called for queue_id = 8,
   and dev->_rx[8].umem = 0.
   4. xdp_reg_umem_at_qid() is called again to set for queue_id = 8
   dev->_rx[8].umem = umem2
   5. num_tx_queues is changed to 10
  Now dev->_rx[8].umem != dev->_tx[8].umem, is this possible and is it
a problem?

>  int xdp_umem_query(struct net_device *dev, u16 queue_id)
>  {
> struct netdev_bpf bpf;
> @@ -58,11 +93,11 @@ int xdp_umem_query(struct net_device *dev, u16 queue_id)
>  }
>
>  int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
> -   u32 queue_id, u16 flags)
> +   u16 queue_id, u16 flags)
>  {
> bool force_zc, force_copy;
> struct netdev_bpf bpf;
> -   int err;
> +   int err = 0;
>
> force_zc = flags & XDP_ZEROCOPY;
> force_copy = flags & XDP_COPY;
> @@ -70,18 +105,28 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct 
> net_device *dev,
> if (force_zc && force_copy)
> return -EINVAL;
>
> +   rtnl_lock();
> +   if (xdp_get_umem_from_qid(dev, queue_id)) {
> +   err = -EBUSY;
> +   goto rtnl_unlock;
> +   }
> +
> +   xdp_reg_umem_at_qid(dev, umem, queue_id);
> +   umem->dev = dev;
> +   umem->queue_id = queue_id;
> if (force_copy)
> -   return 0;
> +   /* For copy-mode, we are done. */
> +   goto rtnl_unlock;
>
> -   if (!dev->netdev_ops->ndo_bpf || !dev->netdev_ops->ndo_xsk_async_xmit)
> -   return force_zc ? -EOPNOTSUPP : 0; /* fail or fallback */
> +   if (!dev->netdev_ops->ndo_bpf ||
> +   !dev->netdev_ops->ndo_xsk_async_xmit) {
> +   err = -EOPNOTSUPP;
> +   goto err_unreg_umem;
> +   }
>
> -   rtnl_lock();
> err = xdp_umem_query(dev, queue_id);
> 

Re: [bpf PATCH v3 3/3] bpf: test_maps, only support ESTABLISHED socks

2018-09-17 Thread Y Song
On Mon, Sep 17, 2018 at 9:38 PM John Fastabend  wrote:
>
> Ensure that sockets added to a sock{map|hash} that is not in the
> ESTABLISHED state is rejected.
>
> Fixes: 1aa12bdf1bfb ("bpf: sockmap, add sock close() hook to remove socks")
> Signed-off-by: John Fastabend 

Acked-by: Yonghong Song 


Re: [bpf PATCH v3 2/3] bpf: sockmap, fix transition through disconnect without close

2018-09-17 Thread Y Song
On Mon, Sep 17, 2018 at 9:39 PM John Fastabend  wrote:
>
> It is possible (via shutdown()) for TCP socks to go trough TCP_CLOSE
> state via tcp_disconnect() without actually calling tcp_close which
> would then call our bpf_tcp_close() callback. Because of this a user
> could disconnect a socket then put it in a LISTEN state which would
> break our assumptions about sockets always being ESTABLISHED state.
>
> To resolve this rely on the unhash hook, which is called in the
> disconnect case, to remove the sock from the sockmap.
>
> Reported-by: Eric Dumazet 
> Fixes: 1aa12bdf1bfb ("bpf: sockmap, add sock close() hook to remove socks")
> Signed-off-by: John Fastabend 
> ---
>  0 files changed
>
> diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
> index 1f97b55..5680d65 100644
> --- a/kernel/bpf/sockmap.c
> +++ b/kernel/bpf/sockmap.c
> @@ -132,6 +132,7 @@ struct smap_psock {
> struct work_struct gc_work;
>
> struct proto *sk_proto;
> +   void (*save_unhash)(struct sock *sk);
> void (*save_close)(struct sock *sk, long timeout);
> void (*save_data_ready)(struct sock *sk);
> void (*save_write_space)(struct sock *sk);
> @@ -143,6 +144,7 @@ static int bpf_tcp_recvmsg(struct sock *sk, struct msghdr 
> *msg, size_t len,
>  static int bpf_tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
>  static int bpf_tcp_sendpage(struct sock *sk, struct page *page,
> int offset, size_t size, int flags);
> +static void bpf_tcp_unhash(struct sock *sk);
>  static void bpf_tcp_close(struct sock *sk, long timeout);
>
>  static inline struct smap_psock *smap_psock_sk(const struct sock *sk)
> @@ -184,6 +186,7 @@ static void build_protos(struct proto 
> prot[SOCKMAP_NUM_CONFIGS],
>  struct proto *base)
>  {
> prot[SOCKMAP_BASE]  = *base;
> +   prot[SOCKMAP_BASE].unhash   = bpf_tcp_unhash;
> prot[SOCKMAP_BASE].close= bpf_tcp_close;
> prot[SOCKMAP_BASE].recvmsg  = bpf_tcp_recvmsg;
> prot[SOCKMAP_BASE].stream_memory_read   = bpf_tcp_stream_read;
> @@ -217,6 +220,7 @@ static int bpf_tcp_init(struct sock *sk)
> return -EBUSY;
> }
>
> +   psock->save_unhash = sk->sk_prot->unhash;
> psock->save_close = sk->sk_prot->close;
> psock->sk_proto = sk->sk_prot;
>
> @@ -305,30 +309,12 @@ static struct smap_psock_map_entry 
> *psock_map_pop(struct sock *sk,
> return e;
>  }
>
> -static void bpf_tcp_close(struct sock *sk, long timeout)
> +static void bpf_tcp_remove(struct sock *sk, struct smap_psock *psock)
>  {
> -   void (*close_fun)(struct sock *sk, long timeout);
> struct smap_psock_map_entry *e;
> struct sk_msg_buff *md, *mtmp;
> -   struct smap_psock *psock;
> struct sock *osk;
>
> -   lock_sock(sk);
> -   rcu_read_lock();
> -   psock = smap_psock_sk(sk);
> -   if (unlikely(!psock)) {
> -   rcu_read_unlock();
> -   release_sock(sk);
> -   return sk->sk_prot->close(sk, timeout);
> -   }
> -
> -   /* The psock may be destroyed anytime after exiting the RCU critial
> -* section so by the time we use close_fun the psock may no longer
> -* be valid. However, bpf_tcp_close is called with the sock lock
> -* held so the close hook and sk are still valid.
> -*/
> -   close_fun = psock->save_close;
> -
> if (psock->cork) {
> free_start_sg(psock->sock, psock->cork, true);
> kfree(psock->cork);
> @@ -379,6 +365,42 @@ static void bpf_tcp_close(struct sock *sk, long timeout)
> kfree(e);
> e = psock_map_pop(sk, psock);
> }
> +}
> +
> +static void bpf_tcp_unhash(struct sock *sk)
> +{
> +   void (*unhash_fun)(struct sock *sk);
> +   struct smap_psock *psock;
> +
> +   rcu_read_lock();
> +   psock = smap_psock_sk(sk);
> +   if (unlikely(!psock)) {
> +   rcu_read_unlock();
> +   if (sk->sk_prot->unhash)
> +   return sk->sk_prot->unhash(sk);
> +   return;
Nit: the above code can be

	if (sk->sk_prot->unhash)
		sk->sk_prot->unhash(sk);
	return;

Other than this,
Acked-by: Yonghong Song 

> +   }
> +   unhash_fun = psock->save_unhash;
> +   bpf_tcp_remove(sk, psock);
> +   rcu_read_unlock();
> +   unhash_fun(sk);
> +}
> +
> +static void bpf_tcp_close(struct sock *sk, long timeout)
> +{
> +   void (*close_fun)(struct sock *sk, long timeout);
> +   struct smap_psock *psock;
> +
> +   lock_sock(sk);
> +   rcu_read_lock();
> +   psock = smap_psock_sk(sk);
> +   if (unlikely(!psock)) {
> +   rcu_read_unlock();
> +   release_sock(sk);
> +   return sk->sk_prot->close(sk, timeout);
> +   }
> +   close_fun = psock->save_close;
> +   

Re: [bpf PATCH 3/3] bpf: test_maps, only support ESTABLISHED socks

2018-09-17 Thread Y Song
On Mon, Sep 17, 2018 at 10:33 AM John Fastabend
 wrote:
>
> Ensure that sockets added to a sock{map|hash} that is not in the
> ESTABLISHED state is rejected.
>
> Fixes: 1aa12bdf1bfb ("bpf: sockmap, add sock close() hook to remove socks")
> Signed-off-by: John Fastabend 
> ---
>  tools/testing/selftests/bpf/test_maps.c |   10 +++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/test_maps.c 
> b/tools/testing/selftests/bpf/test_maps.c
> index 6f54f84..0f2090f 100644
> --- a/tools/testing/selftests/bpf/test_maps.c
> +++ b/tools/testing/selftests/bpf/test_maps.c
> @@ -580,7 +580,11 @@ static void test_sockmap(int tasks, void *data)
> /* Test update without programs */
> for (i = 0; i < 6; i++) {
> err = bpf_map_update_elem(fd, &i, &sfd[i], BPF_ANY);
> -   if (err) {
> +   if (i < 2 && !err) {
> +   printf("Allowed update sockmap '%i:%i' not in 
> ESTABLISHED\n",
> +  i, sfd[i]);
> +   goto out_sockmap;
> +   } else if (i > 1 && err) {

Just a nit: maybe "i >= 2", since it is clearer as the opposite of "i < 2"?

> printf("Failed noprog update sockmap '%i:%i'\n",
>i, sfd[i]);
> goto out_sockmap;
> @@ -741,7 +745,7 @@ static void test_sockmap(int tasks, void *data)
> }
>
> /* Test map update elem afterwards fd lives in fd and map_fd */
> -   for (i = 0; i < 6; i++) {
> +   for (i = 2; i < 6; i++) {
> err = bpf_map_update_elem(map_fd_rx, &i, &sfd[i], BPF_ANY);
> if (err) {
> printf("Failed map_fd_rx update sockmap %i '%i:%i'\n",
> @@ -845,7 +849,7 @@ static void test_sockmap(int tasks, void *data)
> }
>
> /* Delete the elems without programs */
> -   for (i = 0; i < 6; i++) {
> +   for (i = 2; i < 6; i++) {
> err = bpf_map_delete_elem(fd, &i);
> if (err) {
> printf("Failed delete sockmap %i '%i:%i'\n",
>


Re: [bpf PATCH 2/3] bpf: sockmap, fix transition through disconnect without close

2018-09-17 Thread Y Song
On Mon, Sep 17, 2018 at 10:32 AM John Fastabend
 wrote:
>
> It is possible (via shutdown()) for TCP socks to go trough TCP_CLOSE
> state via tcp_disconnect() without actually calling tcp_close which
> would then call our bpf_tcp_close() callback. Because of this a user
> could disconnect a socket then put it in a LISTEN state which would
> break our assumptions about sockets always being ESTABLISHED state.
>
> To resolve this rely on the unhash hook, which is called in the
> disconnect case, to remove the sock from the sockmap.
>
> Reported-by: Eric Dumazet 
> Fixes: 1aa12bdf1bfb ("bpf: sockmap, add sock close() hook to remove socks")
> Signed-off-by: John Fastabend 
> ---
>  kernel/bpf/sockmap.c |   71 
> +-
>  1 file changed, 52 insertions(+), 19 deletions(-)
>
> diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
> index 998b7bd..f6ab7f3 100644
> --- a/kernel/bpf/sockmap.c
> +++ b/kernel/bpf/sockmap.c
> @@ -132,6 +132,7 @@ struct smap_psock {
> struct work_struct gc_work;
>
> struct proto *sk_proto;
> +   void (*save_unhash)(struct sock *sk);
> void (*save_close)(struct sock *sk, long timeout);
> void (*save_data_ready)(struct sock *sk);
> void (*save_write_space)(struct sock *sk);
> @@ -143,6 +144,7 @@ static int bpf_tcp_recvmsg(struct sock *sk, struct msghdr 
> *msg, size_t len,
>  static int bpf_tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
>  static int bpf_tcp_sendpage(struct sock *sk, struct page *page,
> int offset, size_t size, int flags);
> +static void bpf_tcp_unhash(struct sock *sk);
>  static void bpf_tcp_close(struct sock *sk, long timeout);
>
>  static inline struct smap_psock *smap_psock_sk(const struct sock *sk)
> @@ -184,6 +186,7 @@ static void build_protos(struct proto 
> prot[SOCKMAP_NUM_CONFIGS],
>  struct proto *base)
>  {
> prot[SOCKMAP_BASE]  = *base;
> +   prot[SOCKMAP_BASE].unhash   = bpf_tcp_unhash;
> prot[SOCKMAP_BASE].close= bpf_tcp_close;
> prot[SOCKMAP_BASE].recvmsg  = bpf_tcp_recvmsg;
> prot[SOCKMAP_BASE].stream_memory_read   = bpf_tcp_stream_read;
> @@ -217,6 +220,7 @@ static int bpf_tcp_init(struct sock *sk)
> return -EBUSY;
> }
>
> +   psock->save_unhash = sk->sk_prot->unhash;
> psock->save_close = sk->sk_prot->close;
> psock->sk_proto = sk->sk_prot;
>
> @@ -305,30 +309,12 @@ static struct smap_psock_map_entry 
> *psock_map_pop(struct sock *sk,
> return e;
>  }
>
> -static void bpf_tcp_close(struct sock *sk, long timeout)
> +static void bpf_tcp_remove(struct sock *sk, struct smap_psock *psock)
>  {
> -   void (*close_fun)(struct sock *sk, long timeout);
> struct smap_psock_map_entry *e;
> struct sk_msg_buff *md, *mtmp;
> -   struct smap_psock *psock;
> struct sock *osk;
>
> -   lock_sock(sk);
> -   rcu_read_lock();
> -   psock = smap_psock_sk(sk);
> -   if (unlikely(!psock)) {
> -   rcu_read_unlock();
> -   release_sock(sk);
> -   return sk->sk_prot->close(sk, timeout);
> -   }
> -
> -   /* The psock may be destroyed anytime after exiting the RCU critial
> -* section so by the time we use close_fun the psock may no longer
> -* be valid. However, bpf_tcp_close is called with the sock lock
> -* held so the close hook and sk are still valid.
> -*/
> -   close_fun = psock->save_close;
> -
> if (psock->cork) {
> free_start_sg(psock->sock, psock->cork, true);
> kfree(psock->cork);
> @@ -379,6 +365,53 @@ static void bpf_tcp_close(struct sock *sk, long timeout)
> kfree(e);
> e = psock_map_pop(sk, psock);
> }
> +}
> +
> +static void bpf_tcp_unhash(struct sock *sk)
> +{
> +   void (*unhash_fun)(struct sock *sk);
> +   struct smap_psock *psock;
> +
> +   rcu_read_lock();
> +   psock = smap_psock_sk(sk);
> +   if (unlikely(!psock)) {
> +   rcu_read_unlock();
> +   release_sock(sk);

Can socket be released here?

> +   return sk->sk_prot->unhash(sk);
> +   }
> +
> +   /* The psock may be destroyed anytime after exiting the RCU critial
> +* section so by the time we use close_fun the psock may no longer
> +* be valid. However, bpf_tcp_close is called with the sock lock
> +* held so the close hook and sk are still valid.
> +*/

the comments above are not correct. A copy-paste mistake?

> +   unhash_fun = psock->save_unhash;
> +   bpf_tcp_remove(sk, psock);
> +   rcu_read_unlock();
> +   unhash_fun(sk);
> +}
> +
> +static void bpf_tcp_close(struct sock *sk, long timeout)
> +{
> +   void (*close_fun)(struct sock *sk, long timeout);
> +   struct smap_psock 

Re: [bpf PATCH 1/3] bpf: sockmap only allow ESTABLISHED sock state

2018-09-17 Thread Y Song
On Mon, Sep 17, 2018 at 10:32 AM John Fastabend
 wrote:
>
> After this patch we only allow socks that are in ESTABLISHED state or
> are being added via a sock_ops event that is transitioning into an
> ESTABLISHED state. By allowing sock_ops events we allow users to
> manage sockmaps directly from sock ops programs. The two supported
> sock_ops ops are BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB and
> BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB.
>
> Similar to TLS ULP this ensures sk_user_data is correct.
>
> Reported-by: Eric Dumazet 
> Fixes: 1aa12bdf1bfb ("bpf: sockmap, add sock close() hook to remove socks")
> Signed-off-by: John Fastabend 

Acked-by: Yonghong Song 


Re: [bpf-next, v4 0/5] Introduce eBPF flow dissector

2018-09-14 Thread Y Song
On Fri, Sep 14, 2018 at 12:24 PM Alexei Starovoitov
 wrote:
>
> On Fri, Sep 14, 2018 at 07:46:17AM -0700, Petar Penkov wrote:
> > From: Petar Penkov 
> >
> > This patch series hardens the RX stack by allowing flow dissection in BPF,
> > as previously discussed [1]. Because of the rigorous checks of the BPF
> > verifier, this provides significant security guarantees. In particular, the
> > BPF flow dissector cannot get inside of an infinite loop, as with
> > CVE-2013-4348, because BPF programs are guaranteed to terminate. It cannot
> > read outside of packet bounds, because all memory accesses are checked.
> > Also, with BPF the administrator can decide which protocols to support,
> > reducing potential attack surface. Rarely encountered protocols can be
> > excluded from dissection and the program can be updated without kernel
> > recompile or reboot if a bug is discovered.
> >
> > Patch 1 adds infrastructure to execute a BPF program in __skb_flow_dissect.
> > This includes a new BPF program and attach type.
> >
> > Patch 2 adds the new BPF flow dissector definitions to tools/uapi.
> >
> > Patch 3 adds support for the new BPF program type to libbpf and bpftool.
> >
> > Patch 4 adds a flow dissector program in BPF. This parses most protocols in
> > __skb_flow_dissect in BPF for a subset of flow keys (basic, control, ports,
> > and address types).
> >
> > Patch 5 adds a selftest that attaches the BPF program to the flow dissector
> > and sends traffic with different levels of encapsulation.
> >
> > Performance Evaluation:
> > The in-kernel implementation was compared against the demo program from
> > patch 4 using the test in patch 5 with IPv4/UDP traffic over 10 seconds.
> >   $perf record -a -C 4 taskset -c 4 ./test_flow_dissector -i 4 -f 8 \
> >   -t 10
>
> Looks great. Applied to bpf-next with one extra patch:
>  SEC("dissect")
> -int dissect(struct __sk_buff *skb)
> +int _dissect(struct __sk_buff *skb)
>
> otherwise the test doesn't build.
> I'm not sure how it builds for you. Which llvm did you use?

This is a known issue. IIRC, llvm <= 4 should be okay and llvm >= 5 would fail.

>
> Also above command works and ipv4 test in ./test_flow_dissector.sh
> is passing as well, but it still fails at the end for me:
> ./test_flow_dissector.sh
> bpffs not mounted. Mounting...
> 0: IP
> 1: IPV6
> 2: IPV6OP
> 3: IPV6FR
> 4: MPLS
> 5: VLAN
> Testing IPv4...
> inner.dest4: 127.0.0.1
> inner.source4: 127.0.0.3
> pkts: tx=10 rx=10
> inner.dest4: 127.0.0.1
> inner.source4: 127.0.0.3
> pkts: tx=10 rx=0
> inner.dest4: 127.0.0.1
> inner.source4: 127.0.0.3
> pkts: tx=10 rx=10
> Testing IPIP...
> tunnels before test:
> tunl0: any/ip remote any local any ttl inherit nopmtudisc
> sit_test_LV5N: any/ip remote 127.0.0.2 local 127.0.0.1 dev lo ttl inherit
> ipip_test_LV5N: any/ip remote 127.0.0.2 local 127.0.0.1 dev lo ttl inherit
> sit0: ipv6/ip remote any local any ttl 64 nopmtudisc
> gre_test_LV5N: gre/ip remote 127.0.0.2 local 127.0.0.1 dev lo ttl inherit
> gre0: gre/ip remote any local any ttl inherit nopmtudisc
> inner.dest4: 192.168.0.1
> inner.source4: 1.1.1.1
> encap proto:   4
> outer.dest4: 127.0.0.1
> outer.source4: 127.0.0.2
> pkts: tx=10 rx=0
> tunnels after test:
> tunl0: any/ip remote any local any ttl inherit nopmtudisc
> sit0: ipv6/ip remote any local any ttl 64 nopmtudisc
> gre0: gre/ip remote any local any ttl inherit nopmtudisc
> selftests: test_flow_dissector [FAILED]
>
> is it something in my setup or test is broken?
>


Re: [bpf PATCH] samples/bpf: all XDP samples should unload xdp/bpf prog on SIGTERM

2018-08-15 Thread Y Song
On Wed, Aug 15, 2018 at 7:57 AM, Jesper Dangaard Brouer
 wrote:
> It is common XDP practice to unload/deattach the XDP bpf program,
> when the XDP sample program is Ctrl-C interrupted (SIGINT) or
> killed (SIGTERM).
>
> The samples/bpf programs xdp_redirect_cpu and xdp_rxq_info,
> forgot to trap signal SIGTERM (which is the default signal used
> by the kill command).
>
> This was discovered by Red Hat QA, which automated scripts depend
> on killing the XDP sample program after a timeout period.
>
> Fixes: fad3917e361b ("samples/bpf: add cpumap sample program 
> xdp_redirect_cpu")
> Fixes: 0fca931a6f21 ("samples/bpf: program demonstrating access to 
> xdp_rxq_info")
> Reported-by: Jean-Tsung Hsiao 
> Signed-off-by: Jesper Dangaard Brouer 

Acked-by: Yonghong Song 

> ---
>  samples/bpf/xdp_redirect_cpu_user.c |3 ++-
>  samples/bpf/xdp_rxq_info_user.c |3 ++-
>  2 files changed, 4 insertions(+), 2 deletions(-)


Re: [PATCH bpf-next] bpf: decouple btf from seq bpf fs dump and enable more maps

2018-08-12 Thread Y Song
On Sat, Aug 11, 2018 at 4:59 PM, Daniel Borkmann  wrote:
> Commit a26ca7c982cb ("bpf: btf: Add pretty print support to
> the basic arraymap") and 699c86d6ec21 ("bpf: btf: add pretty
> print for hash/lru_hash maps") enabled support for BTF and
> dumping via BPF fs for array and hash/lru map. However, both
> can be decoupled from each other such that regular BPF maps
> can be supported for attaching BTF key/value information,
> while not all maps necessarily need to dump via map_seq_show_elem()
> callback.
>
> The basic sanity check which is a prerequisite for all maps
> is that key/value size has to match in any case, and some maps
> can have extra checks via map_check_btf() callback, e.g.
> probing certain types or indicating no support in general. With
> that we can also enable retrieving BTF info for per-cpu map
> types and lpm.
>
> Signed-off-by: Daniel Borkmann 
> Acked-by: Alexei Starovoitov 

Thanks for the fix. Looks good to me.
For bpftool, BTF pretty print support is missing
for per-cpu maps. bpffs print for per-cpu hash/array maps
needs to be added as well. Will add them later.

Acked-by: Yonghong Song 


Re: [PATCH bpf] selftests/bpf: update test_lwt_seg6local.sh according to iproute2

2018-08-01 Thread Y Song
On Wed, Aug 1, 2018 at 8:34 AM, Mathieu Xhonneux  wrote:
> The shell file for test_lwt_seg6local contains an early iproute2 syntax
> for installing a seg6local End.BPF route. iproute2 support for this
> feature has recently been upstreamed, but with an additional keyword
> required. This patch updates test_lwt_seg6local.sh to the definitive
> iproute2 syntax
>
> Signed-off-by: Mathieu Xhonneux 

Acked-by: Yonghong Song 


Re: [PATCH bpf-next] lwt_bpf: remove unnecessary rcu_read_lock in run_lwt_bpf

2018-07-30 Thread Y Song
On Mon, Jul 30, 2018 at 6:22 AM, Taehee Yoo  wrote:
> run_lwt_bpf is called by bpf_{input/output/xmit}.
> These functions are already protected by rcu_read_lock.
> because lwtunnel_{input/output/xmit} holds rcu_read_lock
> and then calls bpf_{input/output/xmit}.
> So that rcu_read_lock in the run_lwt_bpf is unnecessary.
>
> Signed-off-by: Taehee Yoo 

Acked-by: Yonghong Song 


Re: [PATCH bpf-next 0/4] samples: bpf: convert two more samples to libbpf

2018-07-26 Thread Y Song
On Thu, Jul 26, 2018 at 2:32 PM, Jakub Kicinski
 wrote:
> Hi!
>
> This set converts xdpsock_user.c and xdp_fwd_user.c to use libbpf instead
> of bpf_load.o.  First two patches are minor improvements to libbpf to make
> the conversion (and use of libbpf in general) nicer.
>
> Jakub Kicinski (4):
>   tools: libbpf: handle NULL program gracefully in bpf_program__nth_fd()
>   tools: libbpf: add bpf_object__find_program_by_title()
>   samples: bpf: convert xdp_fwd_user.c to libbpf
>   samples: bpf: convert xdpsock_user.c to libbpf

LGTM. Ack for the whole series.
Acked-by: Yonghong Song 

>
>  samples/bpf/Makefile   |  4 ++--
>  samples/bpf/xdp_fwd_user.c | 34 +++---
>  samples/bpf/xdpsock_user.c | 38 +-
>  tools/lib/bpf/libbpf.c | 15 +++
>  tools/lib/bpf/libbpf.h |  3 +++
>  5 files changed, 72 insertions(+), 22 deletions(-)
>
> --
> 2.17.1
>


Re: [PATCH] bpf: verifier: BPF_MOV don't mark dst reg if src == dst

2018-07-26 Thread Y Song
On Thu, Jul 26, 2018 at 9:52 AM, Arthur Fabre  wrote:
> Oops, gmail seems to have mangled everything. Will resend using git
> send-email.
>
> I've added the test cases for mov64, but I'm not sure of the expected mov32
> behavior.

The interpreter has the following:

ALU_MOV_X:
DST = (u32) SRC;
CONT;
...
ALU64_MOV_X:
DST = SRC;
CONT;

The later verifier code does seem to mark dst_reg properly for both
ALU and ALU64.

> Currently coerce_reg_to_size() is called after mark_reg_unknown(),
> which sets the bounds to 64bits. coerce_reg_to_size() resets the bounds
> again,
> as they're too "wide" to fit the new size. It sets SMIN = UMIN = 0,
> which seems weird. Shouldn't SMIN be 1 << (size * 8 - 1)? Same applies for
> SMAX.

SMIN/UMIN should still be 0, since the zero-extended smaller-width
value can never be negative?

> Should mov32 always mark the dst as unbounded?

We can do better than unbounded for the dst register of mov32, which
is what the code is already doing?

>
>
> On Thu, Jul 26, 2018 at 1:42 AM, Daniel Borkmann 
> wrote:
>>
>> On 07/26/2018 12:08 AM, Arthur Fabre wrote:
>> > When check_alu_op() handles a BPF_MOV between two registers,
>> > it calls check_reg_arg() on the dst register, marking it as unbounded.
>> > If the src and dst register are the same, this marks the src as
>> > unbounded, which can lead to unexpected errors for further checks that
>> > rely on bounds info.

Could you explain (and eventually add to the commit message) what
these unexpected errors are?

>> >
>> > check_alu_op() now only marks the dst register as unbounded if it
>> > different from the src register.
>> >
>> > Signed-off-by: Arthur Fabre 
>> > ---
>> >  kernel/bpf/verifier.c | 5 +++--
>> >  1 file changed, 3 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> > index 63aaac52a265..ddfe3c544a80 100644
>> > --- a/kernel/bpf/verifier.c
>> > +++ b/kernel/bpf/verifier.c
>> > @@ -3238,8 +3238,9 @@ static int check_alu_op(struct bpf_verifier_env
>> > *env, struct bpf_insn *insn)
>> > }
>> > }
>> >
>> > -   /* check dest operand */
>> > -   err = check_reg_arg(env, insn->dst_reg, DST_OP);
>> > +   /* check dest operand, only mark if dest != src */
>> > +   err = check_reg_arg(env, insn->dst_reg,
>> > +   insn->dst_reg == insn->src_reg ?
>> > DST_OP_NO_MARK : DST_OP);
>> > if (err)
>> > return err;
>> >
>>
>> Thanks a lot for the patch! Looks like it's corrupted wrt newline.
>>
>> Please also add test cases to tools/testing/selftests/bpf/test_verifier.c
>> for the cases of mov64 and mov32 where in each src==dst and src!=dst;
>> mov32
>> should mark it as unbounded but not former, so would be good to keep
>> tracking
>> that in selftests.
>
>


Re: [PATCH bpf v2] xdp: Fix handling of devmap in generic XDP

2018-06-13 Thread Y Song
On Wed, Jun 13, 2018 at 8:40 PM, Toshiaki Makita
 wrote:
> On 2018/06/14 11:56, Y Song wrote:
> ...
>>> @@ -586,6 +589,15 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, 
>>> struct xdp_buff *xdp,
>>> return 0;
>>>  }
>>>
>>> +struct sk_buff;
>>> +
>>> +static inline int dev_map_generic_redirect(struct bpf_dtab_netdev *dst,
>>> +  struct sk_buff *skb,
>>> +  struct bpf_prog *xdp_prog)
>>> +{
>>> +   return 0;
>>
>> should you return an error code here?
>
> My understanding is that this function never be called if
> CONFIG_BPF_SYSCALL is not set so any value is OK.
> I aligned it with other functions of devmap, specifically
> dev_map_enqueue() which returns 0.
>
>>
>>> +}
>>> +
>>>  static inline
>>>  struct bpf_cpu_map_entry *__cpu_map_lookup_elem(struct bpf_map *map, u32 
>>> key)
>>>  {
>>> diff --git a/include/linux/filter.h b/include/linux/filter.h
>>> index 45fc0f5..8ddff1f 100644
>>> --- a/include/linux/filter.h
>>> +++ b/include/linux/filter.h
>>> @@ -19,6 +19,7 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#include 
>>>
>>>  #include 
>>>
>>> @@ -786,6 +787,21 @@ static inline bool bpf_dump_raw_ok(void)
>>>  struct bpf_prog *bpf_patch_insn_single(struct bpf_prog *prog, u32 off,
>>>const struct bpf_insn *patch, u32 
>>> len);
>>>
>>> +static inline int __xdp_generic_ok_fwd_dev(struct sk_buff *skb,
>>> +  struct net_device *fwd)
>>
>> Previously this function was only used in filter.c and now it is only
>> used in devmap.c. Maybe this function should be in devmap.c
>> until it is used across different files?
>
> This function is also called from xdp_do_generic_redirect() in
> net/core/filter.c, so I can't move it to devmap.c.

Thanks for the explanation. Make sense.
Acked-by: Yonghong Song 


Re: [PATCH bpf v2] xdp: Fix handling of devmap in generic XDP

2018-06-13 Thread Y Song
On Wed, Jun 13, 2018 at 7:07 PM, Toshiaki Makita
 wrote:
> Commit 67f29e07e131 ("bpf: devmap introduce dev_map_enqueue") changed
> the return value type of __devmap_lookup_elem() from struct net_device *
> to struct bpf_dtab_netdev * but forgot to modify generic XDP code
> accordingly.
> Thus generic XDP incorrectly used struct bpf_dtab_netdev where struct
> net_device is expected, then skb->dev was set to invalid value.
>
> v2:
> - Fix compiler warning without CONFIG_BPF_SYSCALL.
>
> Fixes: 67f29e07e131 ("bpf: devmap introduce dev_map_enqueue")
> Signed-off-by: Toshiaki Makita 
> ---
>  include/linux/bpf.h| 12 
>  include/linux/filter.h | 16 
>  kernel/bpf/devmap.c| 14 ++
>  net/core/filter.c  | 21 -
>  4 files changed, 46 insertions(+), 17 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 995c3b1..7df32a3 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -488,12 +488,15 @@ static inline void bpf_long_memcpy(void *dst, const 
> void *src, u32 size)
>
>  /* Map specifics */
>  struct xdp_buff;
> +struct sk_buff;
>
>  struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key);
>  void __dev_map_insert_ctx(struct bpf_map *map, u32 index);
>  void __dev_map_flush(struct bpf_map *map);
>  int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
> struct net_device *dev_rx);
> +int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff 
> *skb,
> +struct bpf_prog *xdp_prog);
>
>  struct bpf_cpu_map_entry *__cpu_map_lookup_elem(struct bpf_map *map, u32 
> key);
>  void __cpu_map_insert_ctx(struct bpf_map *map, u32 index);
> @@ -586,6 +589,15 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct 
> xdp_buff *xdp,
> return 0;
>  }
>
> +struct sk_buff;
> +
> +static inline int dev_map_generic_redirect(struct bpf_dtab_netdev *dst,
> +  struct sk_buff *skb,
> +  struct bpf_prog *xdp_prog)
> +{
> +   return 0;

should you return an error code here?

> +}
> +
>  static inline
>  struct bpf_cpu_map_entry *__cpu_map_lookup_elem(struct bpf_map *map, u32 key)
>  {
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index 45fc0f5..8ddff1f 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -19,6 +19,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #include 
>
> @@ -786,6 +787,21 @@ static inline bool bpf_dump_raw_ok(void)
>  struct bpf_prog *bpf_patch_insn_single(struct bpf_prog *prog, u32 off,
>const struct bpf_insn *patch, u32 len);
>
> +static inline int __xdp_generic_ok_fwd_dev(struct sk_buff *skb,
> +  struct net_device *fwd)

Previously this function was only used in filter.c and now it is only
used in devmap.c. Maybe this function should be in devmap.c
until it is used across different files?

> +{
> +   unsigned int len;
> +
> +   if (unlikely(!(fwd->flags & IFF_UP)))
> +   return -ENETDOWN;
> +
> +   len = fwd->mtu + fwd->hard_header_len + VLAN_HLEN;
> +   if (skb->len > len)
> +   return -EMSGSIZE;
> +
> +   return 0;
> +}
> +
>  /* The pair of xdp_do_redirect and xdp_do_flush_map MUST be called in the
>   * same cpu context. Further for best results no more than a single map
>   * for the do_redirect/do_flush pair should be used. This limitation is
> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> index a7cc7b3..642c97f 100644
> --- a/kernel/bpf/devmap.c
> +++ b/kernel/bpf/devmap.c
> @@ -345,6 +345,20 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct 
> xdp_buff *xdp,
> return bq_enqueue(dst, xdpf, dev_rx);
>  }
>
> +int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff 
> *skb,
> +struct bpf_prog *xdp_prog)
> +{
> +   int err;
> +
> +   err = __xdp_generic_ok_fwd_dev(skb, dst->dev);
> +   if (unlikely(err))
> +   return err;
> +   skb->dev = dst->dev;
> +   generic_xdp_tx(skb, xdp_prog);
> +
> +   return 0;
> +}
> +
>  static void *dev_map_lookup_elem(struct bpf_map *map, void *key)
>  {
> struct bpf_dtab_netdev *obj = __dev_map_lookup_elem(map, *(u32 *)key);
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 3d9ba7e..e7f12e9 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -3214,20 +3214,6 @@ int xdp_do_redirect(struct net_device *dev, struct 
> xdp_buff *xdp,
>  }
>  EXPORT_SYMBOL_GPL(xdp_do_redirect);
>
> -static int __xdp_generic_ok_fwd_dev(struct sk_buff *skb, struct net_device 
> *fwd)
> -{
> -   unsigned int len;
> -
> -   if (unlikely(!(fwd->flags & IFF_UP)))
> -   return -ENETDOWN;
> -
> -   len = fwd->mtu + fwd->hard_header_len + VLAN_HLEN;
> -   if (skb->len 

Re: [PATCH 1/1] selftest: check tunnel type more accurately

2018-06-13 Thread Y Song
On Wed, Jun 13, 2018 at 5:03 AM, Wang Jian  wrote:
> Grep tunnel type directly to make sure 'ip' command supports it.
>
> Signed-off-by: Jian Wang 

Acked-by: Yonghong Song 


Re: [bpf PATCH] bpf: selftest fix for sockmap

2018-06-11 Thread Y Song
On Mon, Jun 11, 2018 at 11:47 AM, John Fastabend
 wrote:
> In selftest test_maps the sockmap test case attempts to add a socket
> in listening state to the sockmap. This is no longer a valid operation
> so it fails as expected. However, the test wrongly reports this as an
> error now. Fix the test to avoid adding sockets in listening state.
>
> Fixes: 945ae430aa44 ("bpf: sockmap only allow ESTABLISHED sock state")

I cannot reproduce the failure and I cannot find the above "Fixes" commit.
I checked latest bpf/bpf-next/net-next trees. Maybe I missed something?

> Signed-off-by: John Fastabend 
> ---
>  tools/testing/selftests/bpf/test_maps.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/test_maps.c 
> b/tools/testing/selftests/bpf/test_maps.c
> index 6c25334..9fed5f0 100644
> --- a/tools/testing/selftests/bpf/test_maps.c
> +++ b/tools/testing/selftests/bpf/test_maps.c
> @@ -564,7 +564,7 @@ static void test_sockmap(int tasks, void *data)
> }
>
> /* Test update without programs */
> -   for (i = 0; i < 6; i++) {
> +   for (i = 2; i < 6; i++) {
> err = bpf_map_update_elem(fd, &i, &sfd[i], BPF_ANY);
> if (err) {
> printf("Failed noprog update sockmap '%i:%i'\n",
> @@ -727,7 +727,7 @@ static void test_sockmap(int tasks, void *data)
> }
>
> /* Test map update elem afterwards fd lives in fd and map_fd */
> -   for (i = 0; i < 6; i++) {
> +   for (i = 2; i < 6; i++) {
> err = bpf_map_update_elem(map_fd_rx, &i, &sfd[i], BPF_ANY);
> if (err) {
> printf("Failed map_fd_rx update sockmap %i '%i:%i'\n",
>


Re: [PATCH bpf] xsk: silence warning on memory allocation failure

2018-06-11 Thread Y Song
On Mon, Jun 11, 2018 at 4:57 AM, Björn Töpel  wrote:
> From: Björn Töpel 
>
> syzkaller reported a warning from xdp_umem_pin_pages():
>
>   WARNING: CPU: 1 PID: 4537 at mm/slab_common.c:996 kmalloc_slab+0x56/0x70 
> mm/slab_common.c:996
>   ...
>   __do_kmalloc mm/slab.c:3713 [inline]
>   __kmalloc+0x25/0x760 mm/slab.c:3727
>   kmalloc_array include/linux/slab.h:634 [inline]
>   kcalloc include/linux/slab.h:645 [inline]
>   xdp_umem_pin_pages net/xdp/xdp_umem.c:205 [inline]
>   xdp_umem_reg net/xdp/xdp_umem.c:318 [inline]
>   xdp_umem_create+0x5c9/0x10f0 net/xdp/xdp_umem.c:349
>   xsk_setsockopt+0x443/0x550 net/xdp/xsk.c:531
>   __sys_setsockopt+0x1bd/0x390 net/socket.c:1935
>   __do_sys_setsockopt net/socket.c:1946 [inline]
>   __se_sys_setsockopt net/socket.c:1943 [inline]
>   __x64_sys_setsockopt+0xbe/0x150 net/socket.c:1943
>   do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> This is a warning about attempting to allocate more than
> KMALLOC_MAX_SIZE memory. The request originates from userspace, and if
> the request is too big, the kernel is free to deny its allocation. In
> this patch, the failed allocation attempt is silenced with
> __GFP_NOWARN.
>
> Fixes: c0c77d8fb787 ("xsk: add user memory registration support sockopt")
> Reported-by: syzbot+4abadc5d69117b346...@syzkaller.appspotmail.com
> Signed-off-by: Björn Töpel 

Acked-by: Yonghong Song 


Re: [PATCH bpf] bpf: reject passing modified ctx to helper functions

2018-06-07 Thread Y Song
On Thu, Jun 7, 2018 at 8:40 AM, Daniel Borkmann  wrote:
> As commit 28e33f9d78ee ("bpf: disallow arithmetic operations on
> context pointer") already describes, f1174f77b50c ("bpf/verifier:
> rework value tracking") removed the specific white-listed cases
> we had previously where we would allow for pointer arithmetic in
> order to further generalize it, and allow e.g. context access via
> modified registers. While the dereferencing of modified context
> pointers had been forbidden through 28e33f9d78ee, syzkaller did
> recently manage to trigger several KASAN splats for slab out of
> bounds access and use after frees by simply passing a modified
> context pointer to a helper function which would then do the bad
> access since verifier allowed it in adjust_ptr_min_max_vals().
>
> Rejecting arithmetic on ctx pointer in adjust_ptr_min_max_vals()
> generally could break existing programs as there's a valid use
> case in tracing in combination with passing the ctx to helpers as
> bpf_probe_read(), where the register then becomes unknown at
> verification time due to adding a non-constant offset to it. An
> access sequence may look like the following:
>
>   offset = args->filename;  /* field __data_loc filename */
>   bpf_probe_read(&dst, len, (char *)args + offset); // args is ctx
>
> There are two options: i) we could special case the ctx and as
> soon as we add a constant or bounded offset to it (hence ctx type
> wouldn't change) we could turn the ctx into an unknown scalar, or
> ii) we generalize the sanity test for ctx member access into a
> small helper and assert it on the ctx register that was passed
> as a function argument. Fwiw, latter is more obvious and less
> complex at the same time, and one case that may potentially be
> legitimate in future for ctx member access at least would be for
> ctx to carry a const offset. Therefore, fix follows approach
> from ii) and adds test cases to BPF kselftests.
>
> Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
> Reported-by: syzbot+3d0b2441dbb717516...@syzkaller.appspotmail.com
> Reported-by: syzbot+c8504affd4fdd0c1b...@syzkaller.appspotmail.com
> Reported-by: syzbot+e5190cb881d8660fb...@syzkaller.appspotmail.com
> Reported-by: syzbot+efae31b384d5badbd...@syzkaller.appspotmail.com
> Signed-off-by: Daniel Borkmann 
> Acked-by: Alexei Starovoitov 

Tested with bcc for the above ctx + unknown_offset usage for bpf_probe_read.
No breakage.

Acked-by: Yonghong Song 

> ---
>  kernel/bpf/verifier.c   | 48 +++-
>  tools/testing/selftests/bpf/test_verifier.c | 58 
> -
>  2 files changed, 88 insertions(+), 18 deletions(-)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index d6403b5..cced0c1 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -1617,6 +1617,30 @@ static int get_callee_stack_depth(struct 
> bpf_verifier_env *env,
>  }
>  #endif
>
> +static int check_ctx_reg(struct bpf_verifier_env *env,
> +const struct bpf_reg_state *reg, int regno)
> +{
> +   /* Access to ctx or passing it to a helper is only allowed in
> +* its original, unmodified form.
> +*/
> +
> +   if (reg->off) {
> +   verbose(env, "dereference of modified ctx ptr R%d off=%d 
> disallowed\n",
> +   regno, reg->off);
> +   return -EACCES;
> +   }
> +
> +   if (!tnum_is_const(reg->var_off) || reg->var_off.value) {
> +   char tn_buf[48];
> +
> +   tnum_strn(tn_buf, sizeof(tn_buf), reg->var_off);
> +   verbose(env, "variable ctx access var_off=%s disallowed\n", 
> tn_buf);
> +   return -EACCES;
> +   }
> +
> +   return 0;
> +}
> +
>  /* truncate register to smaller size (in bytes)
>   * must be called with size < BPF_REG_SIZE
>   */
> @@ -1686,24 +1710,11 @@ static int check_mem_access(struct bpf_verifier_env 
> *env, int insn_idx, u32 regn
> verbose(env, "R%d leaks addr into ctx\n", 
> value_regno);
> return -EACCES;
> }
> -   /* ctx accesses must be at a fixed offset, so that we can
> -* determine what type of data were returned.
> -*/
> -   if (reg->off) {
> -   verbose(env,
> -   "dereference of modified ctx ptr R%d 
> off=%d+%d, ctx+const is allowed, ctx+const+const is not\n",
> -   regno, reg->off, off - reg->off);
> -   return -EACCES;
> -   }
> -   if (!tnum_is_const(reg->var_off) || reg->var_off.value) {
> -   char tn_buf[48];
>
> -   tnum_strn(tn_buf, sizeof(tn_buf), reg->var_off);
> -   verbose(env,
> -   "variable ctx access var_off=%s off=%d 
> size=%d",
> -   tn_buf, off, 

Re: [PATCH] [RFC] bpf: tracing: new helper bpf_get_current_cgroup_ino

2018-05-25 Thread Y Song
On Fri, May 25, 2018 at 8:21 AM, Alban Crequy <al...@kinvolk.io> wrote:
> On Wed, May 23, 2018 at 4:34 AM Y Song <ys114...@gmail.com> wrote:
>
>> I did some quick prototyping and the above interface seems to work fine.
>
> Thanks! I gave your kernel patch & userspace program a try and it works for
> me on cgroup-v2.
>
> Also, I found out how to get my containers to use both cgroup-v1 and
> cgroup-v2 (by enabling systemd's hybrid cgroup mode and docker's
> '--exec-opt native.cgroupdriver=systemd' option). So I should be able to
> use the BPF helper function without having to add support for all the
> cgroup-v1 hierarchies.

Great. Will submit a formal patch soon.

>
>> The kernel change:
>> ===
>
>> [yhs@localhost bpf-next]$ git diff
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 97446bbe2ca5..669b7383fddb 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -1976,7 +1976,8 @@ union bpf_attr {
>>  FN(fib_lookup), \
>>  FN(sock_hash_update),   \
>>  FN(msg_redirect_hash),  \
>> -   FN(sk_redirect_hash),
>> +   FN(sk_redirect_hash),   \
>> +   FN(get_current_cgroup_id),
>
>>   /* integer value in 'imm' field of BPF_CALL instruction selects which
> helper
>>* function eBPF program intends to call
>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>> index ce2cbbff27e4..e11e3298f911 100644
>> --- a/kernel/trace/bpf_trace.c
>> +++ b/kernel/trace/bpf_trace.c
>> @@ -493,6 +493,21 @@ static const struct bpf_func_proto
>> bpf_current_task_under_cgroup_proto = {
>>  .arg2_type  = ARG_ANYTHING,
>>   };
>
>> +BPF_CALL_0(bpf_get_current_cgroup_id)
>> +{
>> +   struct cgroup *cgrp = task_dfl_cgroup(current);
>> +   if (!cgrp)
>> +   return -EINVAL;
>> +
>> +   return cgrp->kn->id.id;
>> +}
>> +
>> +static const struct bpf_func_proto bpf_get_current_cgroup_id_proto = {
>> +   .func   = bpf_get_current_cgroup_id,
>> +   .gpl_only   = false,
>> +   .ret_type   = RET_INTEGER,
>> +};
>> +
>>   BPF_CALL_3(bpf_probe_read_str, void *, dst, u32, size,
>> const void *, unsafe_ptr)
>>   {
>> @@ -563,6 +578,8 @@ tracing_func_proto(enum bpf_func_id func_id, const
>> struct bpf_prog *prog)
>>  return &bpf_get_prandom_u32_proto;
>>  case BPF_FUNC_probe_read_str:
>>  return &bpf_probe_read_str_proto;
>> +   case BPF_FUNC_get_current_cgroup_id:
>> +   return &bpf_get_current_cgroup_id_proto;
>>  default:
>>  return NULL;
>>  }
>
>> The following program can be used to print out a cgroup id given a cgroup
> path.
>> [yhs@localhost cg]$ cat get_cgroup_id.c
>> #define _GNU_SOURCE
>> #include 
>> #include 
>> #include 
>> #include 
>> #include 
>
>> int main(int argc, char **argv)
>> {
>>  int dirfd, err, flags, mount_id, fhsize;
>>  struct file_handle *fhp;
>>  char *pathname;
>
>>  if (argc != 2) {
>>  printf("usage: %s <pathname>\n", argv[0]);
>>  return 1;
>>  }
>
>>  pathname = argv[1];
>>  dirfd = AT_FDCWD;
>>  flags = 0;
>
>>  fhsize = sizeof(*fhp);
>>  fhp = malloc(fhsize);
>>  if (!fhp)
>>  return 1;
>
>>  err = name_to_handle_at(dirfd, pathname, fhp, &mount_id, flags);
>>  if (err >= 0) {
>>  printf("error\n");
>>  return 1;
>>  }
>
>>  fhsize = sizeof(struct file_handle) + fhp->handle_bytes;
>>  fhp = realloc(fhp, fhsize);
>>  if (!fhp)
>>  return 1;
>
>>  err = name_to_handle_at(dirfd, pathname, fhp, &mount_id, flags);
>>  if (err < 0)
>>  perror("name_to_handle_at");
>>  else {
>>  int i;
>
>>  printf("dir = %s, mount_id = %d\n", pathname, mount_id);
>>  printf("handle_bytes = %d, handle_type = %d\n", fhp->handle_bytes,
>>  fhp->handle_type);
>>  if (fhp->handle_bytes != 8)
>>  return 1;
>
>>  printf("cgroup_id = 0x%llx\n", *(unsigned long long
> *)fhp->f_handle);
>>  }
>
>>  return 0;
>> }
>> [yhs@localhost cg]$
>
>>

Re: [PATCH bpf-next] selftests/bpf: missing headers test_lwt_seg6local

2018-05-25 Thread Y Song
On Fri, May 25, 2018 at 9:16 AM, Y Song <ys114...@gmail.com> wrote:
> On Fri, May 25, 2018 at 4:20 AM, Mathieu Xhonneux <m.xhonn...@gmail.com> 
> wrote:
>> Previous patch "selftests/bpf: test for seg6local End.BPF action" lacks
>> some UAPI headers in tools/.
>>
>> clang -I. -I./include/uapi -I../../../include/uapi -idirafter
>> /usr/local/include -idirafter
>> /data/users/yhs/work/llvm/build/install/lib/clang/7.0.0/include
>> -idirafter /usr/include -Wno-compare-distinct-pointer-types \
>>  -O2 -target bpf -emit-llvm -c test_lwt_seg6local.c -o - |  \
>> llc -march=bpf -mcpu=generic  -filetype=obj -o
>> [...]/net-next/tools/testing/selftests/bpf/test_lwt_seg6local.o
>> test_lwt_seg6local.c:4:10: fatal error: 'linux/seg6_local.h' file not found
>>  ^~~~
>> 1 error generated.
>> make: Leaving directory
>> `/data/users/yhs/work/net-next/tools/testing/selftests/bpf'
>>
>> Reported-by: Y Song <ys114...@gmail.com>
>> Signed-off-by: Mathieu Xhonneux <m.xhonn...@gmail.com>
>> ---
>>  .../selftests/bpf/include/uapi/linux/seg6.h| 55 +++
>>  .../selftests/bpf/include/uapi/linux/seg6_local.h  | 80 
>> ++
>>  2 files changed, 135 insertions(+)
>>  create mode 100644 tools/testing/selftests/bpf/include/uapi/linux/seg6.h
>>  create mode 100644 
>> tools/testing/selftests/bpf/include/uapi/linux/seg6_local.h
>
> Thanks for fixing the issue.
>
> Acked-by: Y Song <ys114...@gmail.com>

Although it fixed the issue, the file is placed in the
tools/testing/selftests/bpf/include/uapi/linux
directory. Considering the file really comes from the kernel's
include/uapi/linux directory, should it
be placed in the tools/include/uapi/linux directory instead?


Re: [PATCH bpf-next] selftests/bpf: missing headers test_lwt_seg6local

2018-05-25 Thread Y Song
On Fri, May 25, 2018 at 4:20 AM, Mathieu Xhonneux <m.xhonn...@gmail.com> wrote:
> Previous patch "selftests/bpf: test for seg6local End.BPF action" lacks
> some UAPI headers in tools/.
>
> clang -I. -I./include/uapi -I../../../include/uapi -idirafter
> /usr/local/include -idirafter
> /data/users/yhs/work/llvm/build/install/lib/clang/7.0.0/include
> -idirafter /usr/include -Wno-compare-distinct-pointer-types \
>  -O2 -target bpf -emit-llvm -c test_lwt_seg6local.c -o - |  \
> llc -march=bpf -mcpu=generic  -filetype=obj -o
> [...]/net-next/tools/testing/selftests/bpf/test_lwt_seg6local.o
> test_lwt_seg6local.c:4:10: fatal error: 'linux/seg6_local.h' file not found
>  ^~~~
> 1 error generated.
> make: Leaving directory
> `/data/users/yhs/work/net-next/tools/testing/selftests/bpf'
>
> Reported-by: Y Song <ys114...@gmail.com>
> Signed-off-by: Mathieu Xhonneux <m.xhonn...@gmail.com>
> ---
>  .../selftests/bpf/include/uapi/linux/seg6.h| 55 +++
>  .../selftests/bpf/include/uapi/linux/seg6_local.h  | 80 
> ++
>  2 files changed, 135 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/include/uapi/linux/seg6.h
>  create mode 100644 
> tools/testing/selftests/bpf/include/uapi/linux/seg6_local.h

Thanks for fixing the issue.

Acked-by: Y Song <ys114...@gmail.com>


Re: [PATCH bpf-next v7 6/6] selftests/bpf: test for seg6local End.BPF action

2018-05-24 Thread Y Song
When compiling latest bpf-next, I hit the following compilation error:

clang -I. -I./include/uapi -I../../../include/uapi -idirafter
/usr/local/include -idirafter
/data/users/yhs/work/llvm/build/install/lib/clang/7.0.0/include
-idirafter /usr/include -Wno-compare-distinct-pointer-types \
 -O2 -target bpf -emit-llvm -c test_lwt_seg6local.c -o - |  \
llc -march=bpf -mcpu=generic  -filetype=obj -o
/data/users/yhs/work/net-next/tools/testing/selftests/bpf/test_lwt_seg6local.o
test_lwt_seg6local.c:4:10: fatal error: 'linux/seg6_local.h' file not found
#include <linux/seg6_local.h>
 ^~~~
1 error generated.
make: Leaving directory
`/data/users/yhs/work/net-next/tools/testing/selftests/bpf'

Should seg6_local.h be copied to the tools/ directory?

On Sun, May 20, 2018 at 6:58 AM, Mathieu Xhonneux  wrote:
> Add a new test for the seg6local End.BPF action. The following helpers
> are also tested:
>
> - bpf_lwt_push_encap within the LWT BPF IN hook
> - bpf_lwt_seg6_action
> - bpf_lwt_seg6_adjust_srh
> - bpf_lwt_seg6_store_bytes
>
> A chain of End.BPF actions is built. The SRH is injected through a LWT
> BPF IN hook before entering this chain. Each End.BPF action validates
> the previous one, otherwise the packet is dropped. The test succeeds
> if the last node in the chain receives the packet and the UDP datagram
> contained can be retrieved from userspace.
>
> Signed-off-by: Mathieu Xhonneux 
> ---
>  tools/include/uapi/linux/bpf.h|  97 -
>  tools/testing/selftests/bpf/Makefile  |   6 +-
>  tools/testing/selftests/bpf/bpf_helpers.h |  12 +
>  tools/testing/selftests/bpf/test_lwt_seg6local.c  | 437 
> ++
>  tools/testing/selftests/bpf/test_lwt_seg6local.sh | 140 +++
>  5 files changed, 689 insertions(+), 3 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/test_lwt_seg6local.c
>  create mode 100755 tools/testing/selftests/bpf/test_lwt_seg6local.sh
>
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 97446bbe2ca5..b217a33d80a4 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -141,6 +141,7 @@ enum bpf_prog_type {
> BPF_PROG_TYPE_SK_MSG,
> BPF_PROG_TYPE_RAW_TRACEPOINT,
> BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
> +   BPF_PROG_TYPE_LWT_SEG6LOCAL,
>  };
>
>  enum bpf_attach_type {
> @@ -1902,6 +1903,90 @@ union bpf_attr {
>   * egress otherwise). This is the only flag supported for now.
>   * Return
>   * **SK_PASS** on success, or **SK_DROP** on error.
> + *
> + * int bpf_lwt_push_encap(struct sk_buff *skb, u32 type, void *hdr, u32 len)
> + * Description
> + * Encapsulate the packet associated to *skb* within a Layer 3
> + * protocol header. This header is provided in the buffer at
> + * address *hdr*, with *len* its size in bytes. *type* indicates
> + * the protocol of the header and can be one of:
> + *
> + * **BPF_LWT_ENCAP_SEG6**
> + * IPv6 encapsulation with Segment Routing Header
> + * (**struct ipv6_sr_hdr**). *hdr* only contains the SRH,
> + * the IPv6 header is computed by the kernel.
> + * **BPF_LWT_ENCAP_SEG6_INLINE**
> + * Only works if *skb* contains an IPv6 packet. Insert a
> + * Segment Routing Header (**struct ipv6_sr_hdr**) inside
> + * the IPv6 header.
> + *
> + * A call to this helper is susceptible to change the underlaying
> + * packet buffer. Therefore, at load time, all checks on pointers
> + * previously done by the verifier are invalidated and must be
> + * performed again, if the helper is used in combination with
> + * direct packet access.
> + * Return
> + * 0 on success, or a negative error in case of failure.
> + *
> + * int bpf_lwt_seg6_store_bytes(struct sk_buff *skb, u32 offset, const void 
> *from, u32 len)
> + * Description
> + * Store *len* bytes from address *from* into the packet
> + * associated to *skb*, at *offset*. Only the flags, tag and TLVs
> + * inside the outermost IPv6 Segment Routing Header can be
> + * modified through this helper.
> + *
> + * A call to this helper is susceptible to change the underlaying
> + * packet buffer. Therefore, at load time, all checks on pointers
> + * previously done by the verifier are invalidated and must be
> + * performed again, if the helper is used in combination with
> + * direct packet access.
> + * Return
> + * 0 on success, or a negative error in case of failure.
> + *
> + * int bpf_lwt_seg6_adjust_srh(struct sk_buff *skb, u32 offset, s32 delta)
> + * Description
> + *   

Re: [PATCH] [RFC] bpf: tracing: new helper bpf_get_current_cgroup_ino

2018-05-22 Thread Y Song
On Tue, May 22, 2018 at 8:35 PM, Alexei Starovoitov
<alexei.starovoi...@gmail.com> wrote:
> On Tue, May 22, 2018 at 08:33:24PM -0700, Y Song wrote:
>> +   struct cgroup *cgrp = task_dfl_cgroup(current);
>> +   if (!cgrp)
>> +   return -EINVAL;
>
> why this check is needed?

No reason :-) Originally I was concerned that cgrp could be NULL.
Looking at the code, it SEEMS to me that it cannot be NULL, but I am not
100% sure (as I am not a cgroup expert). Since you are asking, it
probably cannot be NULL, so I will remove the check in the formal upstream patch.


Re: [PATCH] [RFC] bpf: tracing: new helper bpf_get_current_cgroup_ino

2018-05-22 Thread Y Song
I did some quick prototyping and the above interface seems to work fine.

The kernel change:
===

[yhs@localhost bpf-next]$ git diff
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 97446bbe2ca5..669b7383fddb 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1976,7 +1976,8 @@ union bpf_attr {
FN(fib_lookup), \
FN(sock_hash_update),   \
FN(msg_redirect_hash),  \
-   FN(sk_redirect_hash),
+   FN(sk_redirect_hash),   \
+   FN(get_current_cgroup_id),

 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index ce2cbbff27e4..e11e3298f911 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -493,6 +493,21 @@ static const struct bpf_func_proto
bpf_current_task_under_cgroup_proto = {
.arg2_type  = ARG_ANYTHING,
 };

+BPF_CALL_0(bpf_get_current_cgroup_id)
+{
+   struct cgroup *cgrp = task_dfl_cgroup(current);
+   if (!cgrp)
+   return -EINVAL;
+
+   return cgrp->kn->id.id;
+}
+
+static const struct bpf_func_proto bpf_get_current_cgroup_id_proto = {
+   .func   = bpf_get_current_cgroup_id,
+   .gpl_only   = false,
+   .ret_type   = RET_INTEGER,
+};
+
 BPF_CALL_3(bpf_probe_read_str, void *, dst, u32, size,
   const void *, unsafe_ptr)
 {
@@ -563,6 +578,8 @@ tracing_func_proto(enum bpf_func_id func_id, const
struct bpf_prog *prog)
return &bpf_get_prandom_u32_proto;
case BPF_FUNC_probe_read_str:
return &bpf_probe_read_str_proto;
+   case BPF_FUNC_get_current_cgroup_id:
+   return &bpf_get_current_cgroup_id_proto;
default:
return NULL;
}

The following program can be used to print out a cgroup id given a cgroup path.
[yhs@localhost cg]$ cat get_cgroup_id.c
#define _GNU_SOURCE
#include 
#include 
#include 
#include 
#include 

int main(int argc, char **argv)
{
int dirfd, err, flags, mount_id, fhsize;
struct file_handle *fhp;
char *pathname;

if (argc != 2) {
printf("usage: %s <pathname>\n", argv[0]);
return 1;
}

pathname = argv[1];
dirfd = AT_FDCWD;
flags = 0;

fhsize = sizeof(*fhp);
fhp = malloc(fhsize);
if (!fhp)
return 1;

err = name_to_handle_at(dirfd, pathname, fhp, &mount_id, flags);
if (err >= 0) {
printf("error\n");
return 1;
}

fhsize = sizeof(struct file_handle) + fhp->handle_bytes;
fhp = realloc(fhp, fhsize);
if (!fhp)
return 1;

err = name_to_handle_at(dirfd, pathname, fhp, &mount_id, flags);
if (err < 0)
perror("name_to_handle_at");
else {
int i;

printf("dir = %s, mount_id = %d\n", pathname, mount_id);
printf("handle_bytes = %d, handle_type = %d\n", fhp->handle_bytes,
fhp->handle_type);
if (fhp->handle_bytes != 8)
return 1;

printf("cgroup_id = 0x%llx\n", *(unsigned long long *)fhp->f_handle);
}

return 0;
}
[yhs@localhost cg]$

Given a cgroup path, the user can get the cgroup_id and use it in their bpf
program for filtering purposes.

I run a simple program t.c
   int main() { while(1) sleep(1); return 0; }
in the cgroup v2 directory /home/yhs/tmp/yhs
   none on /home/yhs/tmp type cgroup2 (rw,relatime,seclabel)

$ ./get_cgroup_id /home/yhs/tmp/yhs
dir = /home/yhs/tmp/yhs, mount_id = 124
handle_bytes = 8, handle_type = 1
cgroup_id = 0x106b2

// the below command to get cgroup_id from the kernel for the
// process compiled with t.c and ran under /home/yhs/tmp/yhs:
$ sudo ./trace.py -p 4067 '__x64_sys_nanosleep "cgid = %llx", $cgid'
PID  TID  COMM   FUNC                -
4067 4067 a.out  __x64_sys_nanosleep cgid = 106b2
4067 4067 a.out  __x64_sys_nanosleep cgid = 106b2
4067 4067 a.out  __x64_sys_nanosleep cgid = 106b2
^C[yhs@localhost tools]$

The kernel and user space cgids match. Will provide a
formal patch later.




On Mon, May 21, 2018 at 5:24 PM, Y Song <ys114...@gmail.com> wrote:
> On Mon, May 21, 2018 at 9:26 AM, Alexei Starovoitov
> <alexei.starovoi...@gmail.com> wrote:
>> On Sun, May 13, 2018 at 07:33:18PM +0200, Alban Crequy wrote:
>>>
>>> +BPF_CALL_2(bpf_get_current_cgroup_ino, u32, hierarchy, u64, flags)
>>> +{
>>> + // TODO: pick the correct hierarchy instead of the mem controller
>>> + struct cgroup *cgrp = task_cgroup(current, memory_cgrp_id);
>>> +
>>> + if (unlikely(!cgrp))
>>> + return -EINVAL;
>>> + if (unlikely(hierarchy))
>>> + return -EINVAL;
>>> + if (u

Re: [PATCH] [RFC] bpf: tracing: new helper bpf_get_current_cgroup_ino

2018-05-21 Thread Y Song
On Mon, May 21, 2018 at 9:26 AM, Alexei Starovoitov
 wrote:
> On Sun, May 13, 2018 at 07:33:18PM +0200, Alban Crequy wrote:
>>
>> +BPF_CALL_2(bpf_get_current_cgroup_ino, u32, hierarchy, u64, flags)
>> +{
>> + // TODO: pick the correct hierarchy instead of the mem controller
>> + struct cgroup *cgrp = task_cgroup(current, memory_cgrp_id);
>> +
>> + if (unlikely(!cgrp))
>> + return -EINVAL;
>> + if (unlikely(hierarchy))
>> + return -EINVAL;
>> + if (unlikely(flags))
>> + return -EINVAL;
>> +
>> + return cgrp->kn->id.ino;
>
> ino only is not enough to identify cgroup. It needs generation number too.
> I don't quite see how hierarchy and flags can be used in the future.
> Also why limit it to memcg?
>
> How about something like this instead:
>
> BPF_CALL_2(bpf_get_current_cgroup_id)
> {
> struct cgroup *cgrp = task_dfl_cgroup(current);
>
> return cgrp->kn->id.id;
> }
> The user space can use fhandle api to get the same 64-bit id.

I think this should work. This will also be useful to bcc, as user
space can encode the desired id in the bpf program and compare that id to
the current cgroup id, so we can have cgroup-level tracing (esp. stat
collection) support. To cope with the cgroup hierarchy, users can use a
cgroup-array based approach or explicitly compare against multiple cgroup ids.

Re: [PATCH bpf-next v2 7/7] tools/bpftool: add perf subcommand

2018-05-18 Thread Y Song
On Fri, May 18, 2018 at 1:51 PM, Jakub Kicinski
 wrote:
> On Thu, 17 May 2018 22:03:10 -0700, Yonghong Song wrote:
>> The new command "bpftool perf [show | list]" will traverse
>> all processes under /proc, and if any fd is associated
>> with a perf event, it will print out related perf event
>> information. Documentation is also added.
>
> Thanks for the changes, it looks good with some minor nits which can be
> addressed as follow up if there is no other need to respin.  Please
> consider it:
>
> Reviewed-by: Jakub Kicinski 

Most likely this will need a respin; I will make the suggested changes then.

>
>> Below is an example to show the results using bcc commands.
>> Running the following 4 bcc commands:
>>   kprobe: trace.py '__x64_sys_nanosleep'
>>   kretprobe:  trace.py 'r::__x64_sys_nanosleep'
>>   tracepoint: trace.py 't:syscalls:sys_enter_nanosleep'
>>   uprobe: trace.py 'p:/home/yhs/a.out:main'
>>
>> The bpftool command line and result:
>>
>>   $ bpftool perf
>>   pid 21711  fd 5: prog_id 5  kprobe  func __x64_sys_write  offset 0
>>   pid 21765  fd 5: prog_id 7  kretprobe  func __x64_sys_nanosleep  offset 0
>>   pid 21767  fd 5: prog_id 8  tracepoint  sys_enter_nanosleep
>>   pid 21800  fd 5: prog_id 9  uprobe  filename /home/yhs/a.out  offset 1159
>>
>>   $ bpftool -j perf
>>   
>> {"pid":21711,"fd":5,"prog_id":5,"attach_info":"kprobe","func":"__x64_sys_write","offset":0},
>>  \
>>   
>> {"pid":21765,"fd":5,"prog_id":7,"attach_info":"kretprobe","func":"__x64_sys_nanosleep","offset":0},
>>  \
>>   
>> {"pid":21767,"fd":5,"prog_id":8,"attach_info":"tracepoint","tracepoint":"sys_enter_nanosleep"},
>>  \
>>   
>> {"pid":21800,"fd":5,"prog_id":9,"attach_info":"uprobe","filename":"/home/yhs/a.out","offset":1159}
>
> nit: this is now an array

Sorry, this was probably updated in the middle of work. I will make the
change in the next revision.

>
>>   $ bpftool prog
>>   5: kprobe  name probe___x64_sys  tag e495a0c82f2c7a8d  gpl
>> loaded_at 2018-05-15T04:46:37-0700  uid 0
>> xlated 200B  not jited  memlock 4096B  map_ids 4
>>   7: kprobe  name probe___x64_sys  tag f2fdee479a503abf  gpl
>> loaded_at 2018-05-15T04:48:32-0700  uid 0
>> xlated 200B  not jited  memlock 4096B  map_ids 7
>>   8: tracepoint  name tracepoint__sys  tag 5390badef2395fcf  gpl
>> loaded_at 2018-05-15T04:48:48-0700  uid 0
>> xlated 200B  not jited  memlock 4096B  map_ids 8
>>   9: kprobe  name probe_main_1  tag 0a87bdc2e2953b6d  gpl
>> loaded_at 2018-05-15T04:49:52-0700  uid 0
>> xlated 200B  not jited  memlock 4096B  map_ids 9
>>
>>   $ ps ax | grep "python ./trace.py"
>>   21711 pts/0T  0:03 python ./trace.py __x64_sys_write
>>   21765 pts/0S+ 0:00 python ./trace.py r::__x64_sys_nanosleep
>>   21767 pts/2S+ 0:00 python ./trace.py t:syscalls:sys_enter_nanosleep
>>   21800 pts/3S+ 0:00 python ./trace.py p:/home/yhs/a.out:main
>>   22374 pts/1S+ 0:00 grep --color=auto python ./trace.py
>>
>> Signed-off-by: Yonghong Song 
>
>> diff --git a/tools/bpf/bpftool/bash-completion/bpftool 
>> b/tools/bpf/bpftool/bash-completion/bpftool
>> index b301c9b..3680ad4 100644
>> --- a/tools/bpf/bpftool/bash-completion/bpftool
>> +++ b/tools/bpf/bpftool/bash-completion/bpftool
>> @@ -448,6 +448,15 @@ _bpftool()
>>  ;;
>>  esac
>>  ;;
>> +cgroup)
>
> s/cgroup/perf/ :)

A mistake on my side while consolidating different versions of the code.
I did have "perf" in one of my versions and tested it properly.

>
>> +case $command in
>> +*)
>> +[[ $prev == $object ]] && \
>> +COMPREPLY=( $( compgen -W 'help \
>> +show list' -- "$cur" ) )
>> +;;
>> +esac
>> +;;
>>  esac
>>  } &&
>>  complete -F _bpftool bpftool
>
>> +static int show_proc(const char *fpath, const struct stat *sb,
>> +  int tflag, struct FTW *ftwbuf)
>> +{
>> + __u64 probe_offset, probe_addr;
>> + __u32 prog_id, attach_info;
>> + int err, pid = 0, fd = 0;
>> + const char *pch;
>> + char buf[4096];
>> +
>> + /* prefix always /proc */
>> + pch = fpath + 5;
>> + if (*pch == '\0')
>> + return 0;
>> +
>> + /* pid should be all numbers */
>> + pch++;
>> + while (isdigit(*pch)) {
>> + pid = pid * 10 + *pch - '0';
>> + pch++;
>> + }
>> + if (*pch == '\0')
>> + return 0;
>> + if (*pch != '/')
>> + return FTW_SKIP_SUBTREE;
>> +
>> + /* check /proc/<pid>/fd directory */
>> + pch++;
>> + if (strncmp(pch, "fd", 2))
>> + return FTW_SKIP_SUBTREE;
>> + pch += 2;
>> + if (*pch == '\0')
>> + return 0;
>> + if (*pch != '/')
>> + return FTW_SKIP_SUBTREE;
>> +
>> + /* check 

Re: [PATCH v4 3/3] bpf: add selftest for lirc_mode2 type program

2018-05-18 Thread Y Song
On Fri, May 18, 2018 at 1:17 PM, Y Song <ys114...@gmail.com> wrote:
> On Fri, May 18, 2018 at 7:07 AM, Sean Young <s...@mess.org> wrote:
>> This is simple test over rc-loopback.
>>
>> Signed-off-by: Sean Young <s...@mess.org>
>
> Acked-by: Yonghong Song <y...@fb.com>

Just one minor thing: you need to add "test_lirc_mode2_user" to
tools/testing/selftests/bpf/.gitignore so it does not show up when you
run "git status".

If the patch needs a respin, you can add this in the new revision.
Otherwise, a follow-up patch to fix this should be fine.

>
>> ---
>>  tools/bpf/bpftool/prog.c  |   1 +
>>  tools/include/uapi/linux/bpf.h|  53 -
>>  tools/include/uapi/linux/lirc.h   | 217 ++
>>  tools/lib/bpf/libbpf.c|   1 +
>>  tools/testing/selftests/bpf/Makefile  |   8 +-
>>  tools/testing/selftests/bpf/bpf_helpers.h |   6 +
>>  .../testing/selftests/bpf/test_lirc_mode2.sh  |  28 +++
>>  .../selftests/bpf/test_lirc_mode2_kern.c  |  23 ++
>>  .../selftests/bpf/test_lirc_mode2_user.c  | 154 +
>>  9 files changed, 487 insertions(+), 4 deletions(-)
>>  create mode 100644 tools/include/uapi/linux/lirc.h
>>  create mode 100755 tools/testing/selftests/bpf/test_lirc_mode2.sh
>>  create mode 100644 tools/testing/selftests/bpf/test_lirc_mode2_kern.c
>>  create mode 100644 tools/testing/selftests/bpf/test_lirc_mode2_user.c
>>
>> diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
>> index 9bdfdf2d3fbe..07f1ace39a46 100644
>> --- a/tools/bpf/bpftool/prog.c
>> +++ b/tools/bpf/bpftool/prog.c
>> @@ -71,6 +71,7 @@ static const char * const prog_type_name[] = {
>> [BPF_PROG_TYPE_SK_MSG]  = "sk_msg",
>> [BPF_PROG_TYPE_RAW_TRACEPOINT]  = "raw_tracepoint",
>> [BPF_PROG_TYPE_CGROUP_SOCK_ADDR] = "cgroup_sock_addr",
>> +   [BPF_PROG_TYPE_LIRC_MODE2]  = "lirc_mode2",
>>  };
>>
>>  static void print_boot_time(__u64 nsecs, char *buf, unsigned int size)
>> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
>> index d94d333a8225..8227832b713e 100644
>> --- a/tools/include/uapi/linux/bpf.h
>> +++ b/tools/include/uapi/linux/bpf.h
>> @@ -141,6 +141,7 @@ enum bpf_prog_type {
>> BPF_PROG_TYPE_SK_MSG,
>> BPF_PROG_TYPE_RAW_TRACEPOINT,
>> BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
>> +   BPF_PROG_TYPE_LIRC_MODE2,
>>  };
>>
>>  enum bpf_attach_type {
>> @@ -158,6 +159,7 @@ enum bpf_attach_type {
>> BPF_CGROUP_INET6_CONNECT,
>> BPF_CGROUP_INET4_POST_BIND,
>> BPF_CGROUP_INET6_POST_BIND,
>> +   BPF_LIRC_MODE2,
>> __MAX_BPF_ATTACH_TYPE
>>  };
>>
>> @@ -1902,6 +1904,53 @@ union bpf_attr {
>>   * egress otherwise). This is the only flag supported for now.
>>   * Return
>>   * **SK_PASS** on success, or **SK_DROP** on error.
>> + *
>> + * int bpf_rc_keydown(void *ctx, u32 protocol, u64 scancode, u32 toggle)
>> + * Description
>> + * This helper is used in programs implementing IR decoding, to
>> + * report a successfully decoded key press with *scancode*,
>> + * *toggle* value in the given *protocol*. The scancode will be
>> + * translated to a keycode using the rc keymap, and reported as
>> + * an input key down event. After a period a key up event is
>> + * generated. This period can be extended by calling either
>> + * **bpf_rc_keydown** () with the same values, or calling
>> + * **bpf_rc_repeat** ().
>> + *
>> + * Some protocols include a toggle bit, in case the button
>> + * was released and pressed again between consecutive scancodes
>> + *
>> + * The *ctx* should point to the lirc sample as passed into
>> + * the program.
>> + *
>> + * The *protocol* is the decoded protocol number (see
>> + * **enum rc_proto** for some predefined values).
>> + *
>> + * This helper is only available if the kernel was compiled with
>> + * the **CONFIG_BPF_LIRC_MODE2** configuration option set to
>> + * "**y**".
>> + *
>> + * Return
>> + * 0
>> + *
>> + * int bpf_rc_repeat(void *ctx)
>> + * Description
>> + * This helper is used in programs implem

Re: [PATCH v4 3/3] bpf: add selftest for lirc_mode2 type program

2018-05-18 Thread Y Song
On Fri, May 18, 2018 at 7:07 AM, Sean Young  wrote:
> This is simple test over rc-loopback.
>
> Signed-off-by: Sean Young 

Acked-by: Yonghong Song 

> ---
>  tools/bpf/bpftool/prog.c  |   1 +
>  tools/include/uapi/linux/bpf.h|  53 -
>  tools/include/uapi/linux/lirc.h   | 217 ++
>  tools/lib/bpf/libbpf.c|   1 +
>  tools/testing/selftests/bpf/Makefile  |   8 +-
>  tools/testing/selftests/bpf/bpf_helpers.h |   6 +
>  .../testing/selftests/bpf/test_lirc_mode2.sh  |  28 +++
>  .../selftests/bpf/test_lirc_mode2_kern.c  |  23 ++
>  .../selftests/bpf/test_lirc_mode2_user.c  | 154 +
>  9 files changed, 487 insertions(+), 4 deletions(-)
>  create mode 100644 tools/include/uapi/linux/lirc.h
>  create mode 100755 tools/testing/selftests/bpf/test_lirc_mode2.sh
>  create mode 100644 tools/testing/selftests/bpf/test_lirc_mode2_kern.c
>  create mode 100644 tools/testing/selftests/bpf/test_lirc_mode2_user.c
>
> diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
> index 9bdfdf2d3fbe..07f1ace39a46 100644
> --- a/tools/bpf/bpftool/prog.c
> +++ b/tools/bpf/bpftool/prog.c
> @@ -71,6 +71,7 @@ static const char * const prog_type_name[] = {
> [BPF_PROG_TYPE_SK_MSG]  = "sk_msg",
> [BPF_PROG_TYPE_RAW_TRACEPOINT]  = "raw_tracepoint",
> [BPF_PROG_TYPE_CGROUP_SOCK_ADDR] = "cgroup_sock_addr",
> +   [BPF_PROG_TYPE_LIRC_MODE2]  = "lirc_mode2",
>  };
>
>  static void print_boot_time(__u64 nsecs, char *buf, unsigned int size)
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index d94d333a8225..8227832b713e 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -141,6 +141,7 @@ enum bpf_prog_type {
> BPF_PROG_TYPE_SK_MSG,
> BPF_PROG_TYPE_RAW_TRACEPOINT,
> BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
> +   BPF_PROG_TYPE_LIRC_MODE2,
>  };
>
>  enum bpf_attach_type {
> @@ -158,6 +159,7 @@ enum bpf_attach_type {
> BPF_CGROUP_INET6_CONNECT,
> BPF_CGROUP_INET4_POST_BIND,
> BPF_CGROUP_INET6_POST_BIND,
> +   BPF_LIRC_MODE2,
> __MAX_BPF_ATTACH_TYPE
>  };
>
> @@ -1902,6 +1904,53 @@ union bpf_attr {
>   * egress otherwise). This is the only flag supported for now.
>   * Return
>   * **SK_PASS** on success, or **SK_DROP** on error.
> + *
> + * int bpf_rc_keydown(void *ctx, u32 protocol, u64 scancode, u32 toggle)
> + * Description
> + * This helper is used in programs implementing IR decoding, to
> + * report a successfully decoded key press with *scancode*,
> + * *toggle* value in the given *protocol*. The scancode will be
> + * translated to a keycode using the rc keymap, and reported as
> + * an input key down event. After a period a key up event is
> + * generated. This period can be extended by calling either
> + * **bpf_rc_keydown** () with the same values, or calling
> + * **bpf_rc_repeat** ().
> + *
> + * Some protocols include a toggle bit, in case the button
> + * was released and pressed again between consecutive scancodes
> + *
> + * The *ctx* should point to the lirc sample as passed into
> + * the program.
> + *
> + * The *protocol* is the decoded protocol number (see
> + * **enum rc_proto** for some predefined values).
> + *
> + * This helper is only available if the kernel was compiled with
> + * the **CONFIG_BPF_LIRC_MODE2** configuration option set to
> + * "**y**".
> + *
> + * Return
> + * 0
> + *
> + * int bpf_rc_repeat(void *ctx)
> + * Description
> + * This helper is used in programs implementing IR decoding, to
> + * report a successfully decoded repeat key message. This delays
> + * the generation of a key up event for previously generated
> + * key down event.
> + *
> + * Some IR protocols like NEC have a special IR message for
> + * repeating last button, for when a button is held down.
> + *
> + * The *ctx* should point to the lirc sample as passed into
> + * the program.
> + *
> + * This helper is only available if the kernel was compiled with
> + * the **CONFIG_BPF_LIRC_MODE2** configuration option set to
> + * "**y**".
> + *
> + * Return
> + * 0
>   */
>  #define __BPF_FUNC_MAPPER(FN)  \
> FN(unspec), \
> @@ -1976,7 +2025,9 @@ union bpf_attr {
> FN(fib_lookup), \
> FN(sock_hash_update),   \
> FN(msg_redirect_hash),  \
> -   FN(sk_redirect_hash),
> +   FN(sk_redirect_hash),   \
> +   

Re: [PATCH v4 2/3] media: rc: introduce BPF_PROG_LIRC_MODE2

2018-05-18 Thread Y Song
On Fri, May 18, 2018 at 7:07 AM, Sean Young  wrote:
> Add support for BPF_PROG_LIRC_MODE2. This type of BPF program can call
> rc_keydown() to reported decoded IR scancodes, or rc_repeat() to report
> that the last key should be repeated.
>
> The bpf program can be attached to using the bpf(BPF_PROG_ATTACH) syscall;
> the target_fd must be the /dev/lircN device.
>
> Signed-off-by: Sean Young 

Acked-by: Yonghong Song 

> ---
>  drivers/media/rc/Kconfig|  13 ++
>  drivers/media/rc/Makefile   |   1 +
>  drivers/media/rc/bpf-lirc.c | 308 
>  drivers/media/rc/lirc_dev.c |  30 
>  drivers/media/rc/rc-core-priv.h |  22 +++
>  drivers/media/rc/rc-ir-raw.c|  12 +-
>  include/linux/bpf_rcdev.h   |  30 
>  include/linux/bpf_types.h   |   3 +
>  include/uapi/linux/bpf.h|  53 +-
>  kernel/bpf/syscall.c|   7 +
>  10 files changed, 476 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/media/rc/bpf-lirc.c
>  create mode 100644 include/linux/bpf_rcdev.h
>
> diff --git a/drivers/media/rc/Kconfig b/drivers/media/rc/Kconfig
> index eb2c3b6eca7f..d5b35a6ba899 100644
> --- a/drivers/media/rc/Kconfig
> +++ b/drivers/media/rc/Kconfig
> @@ -25,6 +25,19 @@ config LIRC
>passes raw IR to and from userspace, which is needed for
>IR transmitting (aka "blasting") and for the lirc daemon.
>
> +config BPF_LIRC_MODE2
> +   bool "Support for eBPF programs attached to lirc devices"
> +   depends on BPF_SYSCALL
> +   depends on RC_CORE=y
> +   depends on LIRC
> +   help
> +  Allow attaching eBPF programs to a lirc device using the bpf(2)
> +  syscall command BPF_PROG_ATTACH. This is supported for raw IR
> +  receivers.
> +
> +  These eBPF programs can be used to decode IR into scancodes, for
> +  IR protocols not supported by the kernel decoders.
> +
>  menuconfig RC_DECODERS
> bool "Remote controller decoders"
> depends on RC_CORE
> diff --git a/drivers/media/rc/Makefile b/drivers/media/rc/Makefile
> index 2e1c87066f6c..e0340d043fe8 100644
> --- a/drivers/media/rc/Makefile
> +++ b/drivers/media/rc/Makefile
> @@ -5,6 +5,7 @@ obj-y += keymaps/
>  obj-$(CONFIG_RC_CORE) += rc-core.o
>  rc-core-y := rc-main.o rc-ir-raw.o
>  rc-core-$(CONFIG_LIRC) += lirc_dev.o
> +rc-core-$(CONFIG_BPF_LIRC_MODE2) += bpf-lirc.o
>  obj-$(CONFIG_IR_NEC_DECODER) += ir-nec-decoder.o
>  obj-$(CONFIG_IR_RC5_DECODER) += ir-rc5-decoder.o
>  obj-$(CONFIG_IR_RC6_DECODER) += ir-rc6-decoder.o
> diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c
> new file mode 100644
> index ..c9673df2d9cd
> --- /dev/null
> +++ b/drivers/media/rc/bpf-lirc.c
> @@ -0,0 +1,308 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// bpf-lirc.c - handles bpf
> +//
> +// Copyright (C) 2018 Sean Young 
> +
> +#include 
> +#include 
> +#include 
> +#include "rc-core-priv.h"
> +
> +/*
> + * BPF interface for raw IR
> + */
> +const struct bpf_prog_ops lirc_mode2_prog_ops = {
> +};
> +
> +BPF_CALL_1(bpf_rc_repeat, u32*, sample)
> +{
> +   struct ir_raw_event_ctrl *ctrl;
> +
> +   ctrl = container_of(sample, struct ir_raw_event_ctrl, bpf_sample);
> +
> +   rc_repeat(ctrl->dev);
> +
> +   return 0;
> +}
> +
> +static const struct bpf_func_proto rc_repeat_proto = {
> +   .func  = bpf_rc_repeat,
> +   .gpl_only  = true, /* rc_repeat is EXPORT_SYMBOL_GPL */
> +   .ret_type  = RET_INTEGER,
> +   .arg1_type = ARG_PTR_TO_CTX,
> +};
> +
> +/*
> + * Currently rc-core does not support 64-bit scancodes, but there are many
> + * known protocols with more than 32 bits. So, define the interface as u64
> + * as a future-proof.
> + */
> +BPF_CALL_4(bpf_rc_keydown, u32*, sample, u32, protocol, u64, scancode,
> +  u32, toggle)
> +{
> +   struct ir_raw_event_ctrl *ctrl;
> +
> +   ctrl = container_of(sample, struct ir_raw_event_ctrl, bpf_sample);
> +
> +   rc_keydown(ctrl->dev, protocol, scancode, toggle != 0);
> +
> +   return 0;
> +}
> +
> +static const struct bpf_func_proto rc_keydown_proto = {
> +   .func  = bpf_rc_keydown,
> +   .gpl_only  = true, /* rc_keydown is EXPORT_SYMBOL_GPL */
> +   .ret_type  = RET_INTEGER,
> +   .arg1_type = ARG_PTR_TO_CTX,
> +   .arg2_type = ARG_ANYTHING,
> +   .arg3_type = ARG_ANYTHING,
> +   .arg4_type = ARG_ANYTHING,
> +};
> +
> +static const struct bpf_func_proto *
> +lirc_mode2_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> +{
> +   switch (func_id) {
> +   case BPF_FUNC_rc_repeat:
> +   return &rc_repeat_proto;
> +   case BPF_FUNC_rc_keydown:
> +   return &rc_keydown_proto;
> +   case BPF_FUNC_map_lookup_elem:
> +   return &bpf_map_lookup_elem_proto;
> +   case BPF_FUNC_map_update_elem:
> +   return &bpf_map_update_elem_proto;
> +   

Re: [PATCH v4 1/3] bpf: bpf_prog_array_copy() should return -ENOENT if exclude_prog not found

2018-05-18 Thread Y Song
On Fri, May 18, 2018 at 7:07 AM, Sean Young  wrote:
> This makes is it possible for bpf prog detach to return -ENOENT.
>
> Signed-off-by: Sean Young 

Acked-by: Yonghong Song 


Re: [PATCH v3 1/2] media: rc: introduce BPF_PROG_RAWIR_EVENT

2018-05-17 Thread Y Song
On Thu, May 17, 2018 at 2:45 PM, Sean Young <s...@mess.org> wrote:
> Hi,
>
> Again, thanks for a thoughtful review. This will definitely improve
> the code.
>
> On Thu, May 17, 2018 at 10:02:52AM -0700, Y Song wrote:
>> On Wed, May 16, 2018 at 2:04 PM, Sean Young <s...@mess.org> wrote:
>> > Add support for BPF_PROG_RAWIR_EVENT. This type of BPF program can call
>> > rc_keydown() to reported decoded IR scancodes, or rc_repeat() to report
>> > that the last key should be repeated.
>> >
>> > The bpf program can be attached to using the bpf(BPF_PROG_ATTACH) syscall;
>> > the target_fd must be the /dev/lircN device.
>> >
>> > Signed-off-by: Sean Young <s...@mess.org>
>> > ---
>> >  drivers/media/rc/Kconfig   |  13 ++
>> >  drivers/media/rc/Makefile  |   1 +
>> >  drivers/media/rc/bpf-rawir-event.c | 363 +
>> >  drivers/media/rc/lirc_dev.c|  24 ++
>> >  drivers/media/rc/rc-core-priv.h|  24 ++
>> >  drivers/media/rc/rc-ir-raw.c   |  14 +-
>> >  include/linux/bpf_rcdev.h  |  30 +++
>> >  include/linux/bpf_types.h  |   3 +
>> >  include/uapi/linux/bpf.h   |  55 -
>> >  kernel/bpf/syscall.c   |   7 +
>> >  10 files changed, 531 insertions(+), 3 deletions(-)
>> >  create mode 100644 drivers/media/rc/bpf-rawir-event.c
>> >  create mode 100644 include/linux/bpf_rcdev.h
>> >
>> > diff --git a/drivers/media/rc/Kconfig b/drivers/media/rc/Kconfig
>> > index eb2c3b6eca7f..2172d65b0213 100644
>> > --- a/drivers/media/rc/Kconfig
>> > +++ b/drivers/media/rc/Kconfig
>> > @@ -25,6 +25,19 @@ config LIRC
>> >passes raw IR to and from userspace, which is needed for
>> >IR transmitting (aka "blasting") and for the lirc daemon.
>> >
>> > +config BPF_RAWIR_EVENT
>> > +   bool "Support for eBPF programs attached to lirc devices"
>> > +   depends on BPF_SYSCALL
>> > +   depends on RC_CORE=y
>> > +   depends on LIRC
>> > +   help
>> > +  Allow attaching eBPF programs to a lirc device using the bpf(2)
>> > +  syscall command BPF_PROG_ATTACH. This is supported for raw IR
>> > +  receivers.
>> > +
>> > +  These eBPF programs can be used to decode IR into scancodes, for
>> > +  IR protocols not supported by the kernel decoders.
>> > +
>> >  menuconfig RC_DECODERS
>> > bool "Remote controller decoders"
>> > depends on RC_CORE
>> > diff --git a/drivers/media/rc/Makefile b/drivers/media/rc/Makefile
>> > index 2e1c87066f6c..74907823bef8 100644
>> > --- a/drivers/media/rc/Makefile
>> > +++ b/drivers/media/rc/Makefile
>> > @@ -5,6 +5,7 @@ obj-y += keymaps/
>> >  obj-$(CONFIG_RC_CORE) += rc-core.o
>> >  rc-core-y := rc-main.o rc-ir-raw.o
>> >  rc-core-$(CONFIG_LIRC) += lirc_dev.o
>> > +rc-core-$(CONFIG_BPF_RAWIR_EVENT) += bpf-rawir-event.o
>> >  obj-$(CONFIG_IR_NEC_DECODER) += ir-nec-decoder.o
>> >  obj-$(CONFIG_IR_RC5_DECODER) += ir-rc5-decoder.o
>> >  obj-$(CONFIG_IR_RC6_DECODER) += ir-rc6-decoder.o
>> > diff --git a/drivers/media/rc/bpf-rawir-event.c 
>> > b/drivers/media/rc/bpf-rawir-event.c
>> > new file mode 100644
>> > index ..7cb48b8d87b5
>> > --- /dev/null
>> > +++ b/drivers/media/rc/bpf-rawir-event.c
>> > @@ -0,0 +1,363 @@
>> > +// SPDX-License-Identifier: GPL-2.0
>> > +// bpf-rawir-event.c - handles bpf
>> > +//
>> > +// Copyright (C) 2018 Sean Young <s...@mess.org>
>> > +
>> > +#include 
>> > +#include 
>> > +#include 
>> > +#include "rc-core-priv.h"
>> > +
>> > +/*
>> > + * BPF interface for raw IR
>> > + */
>> > +const struct bpf_prog_ops rawir_event_prog_ops = {
>> > +};
>> > +
>> > +BPF_CALL_1(bpf_rc_repeat, struct bpf_rawir_event*, event)
>> > +{
>> > +   struct ir_raw_event_ctrl *ctrl;
>> > +
>> > +   ctrl = container_of(event, struct ir_raw_event_ctrl, 
>> > bpf_rawir_event);
>> > +
>> > +   rc_repeat(ctrl->dev);
>> > +
>> > +   return 0;
>> > +}
>> > +
>> > +static const struct bpf_func_proto rc_repeat_proto = {
>> > +   .func  = bpf_rc_repeat,
>> 

Re: [PATCH v3 2/2] bpf: add selftest for rawir_event type program

2018-05-17 Thread Y Song
On Wed, May 16, 2018 at 2:04 PM, Sean Young  wrote:
> This is simple test over rc-loopback.
>
> Signed-off-by: Sean Young 
> ---
>  tools/bpf/bpftool/prog.c  |   1 +
>  tools/include/uapi/linux/bpf.h|  57 +++-
>  tools/lib/bpf/libbpf.c|   1 +
>  tools/testing/selftests/bpf/Makefile  |   8 +-
>  tools/testing/selftests/bpf/bpf_helpers.h |   6 +
>  tools/testing/selftests/bpf/test_rawir.sh |  37 +
>  .../selftests/bpf/test_rawir_event_kern.c |  26 
>  .../selftests/bpf/test_rawir_event_user.c | 130 ++
>  8 files changed, 261 insertions(+), 5 deletions(-)
>  create mode 100755 tools/testing/selftests/bpf/test_rawir.sh
>  create mode 100644 tools/testing/selftests/bpf/test_rawir_event_kern.c
>  create mode 100644 tools/testing/selftests/bpf/test_rawir_event_user.c

Could you copy the include/uapi/linux/lirc.h file to the tools directory
as well? Otherwise, I get the following compilation error:

gcc -Wall -O2 -I../../../include/uapi -I../../../lib
-I../../../lib/bpf -I../../../../include/generated  -I../../../include
   test_rawir_event_user.c
/home/yhs/work/bpf-next/tools/testing/selftests/bpf/libbpf.a -lcap
-lelf -lrt -lpthread -o
/home/yhs/work/bpf-next/tools/testing/selftests/bpf/test_rawir_event_user
test_rawir_event_user.c: In function ‘main’:
test_rawir_event_user.c:60:15: error: ‘LIRC_MODE_SCANCODE’ undeclared
(first use in this function); did you mean ‘LIRC_MODE_LIRCCODE’?
mode = LIRC_MODE_SCANCODE;
   ^~
   LIRC_MODE_LIRCCODE
test_rawir_event_user.c:60:15: note: each undeclared identifier is
reported only once for each function it appears in
test_rawir_event_user.c:93:29: error: storage size of ‘lsc’ isn’t known
struct lirc_scancode lsc;
 ^~~
test_rawir_event_user.c:93:29: warning: unused variable ‘lsc’
[-Wunused-variable]

>
> diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
> index 9bdfdf2d3fbe..8889a4ee8577 100644
> --- a/tools/bpf/bpftool/prog.c
> +++ b/tools/bpf/bpftool/prog.c
> @@ -71,6 +71,7 @@ static const char * const prog_type_name[] = {
> [BPF_PROG_TYPE_SK_MSG]  = "sk_msg",
> [BPF_PROG_TYPE_RAW_TRACEPOINT]  = "raw_tracepoint",
> [BPF_PROG_TYPE_CGROUP_SOCK_ADDR] = "cgroup_sock_addr",
> +   [BPF_PROG_TYPE_RAWIR_EVENT] = "rawir_event",
>  };
>
>  static void print_boot_time(__u64 nsecs, char *buf, unsigned int size)
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 1205d86a7a29..243e141e8a5b 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -141,6 +141,7 @@ enum bpf_prog_type {
> BPF_PROG_TYPE_SK_MSG,
> BPF_PROG_TYPE_RAW_TRACEPOINT,
> BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
> +   BPF_PROG_TYPE_RAWIR_EVENT,
>  };
>
>  enum bpf_attach_type {
> @@ -158,6 +159,7 @@ enum bpf_attach_type {
> BPF_CGROUP_INET6_CONNECT,
> BPF_CGROUP_INET4_POST_BIND,
> BPF_CGROUP_INET6_POST_BIND,
> +   BPF_RAWIR_EVENT,
> __MAX_BPF_ATTACH_TYPE
>  };
>
> @@ -1829,7 +1831,6 @@ union bpf_attr {
>   * Return
>   * 0 on success, or a negative error in case of failure.
>   *
> - *
>   * int bpf_fib_lookup(void *ctx, struct bpf_fib_lookup *params, int plen, 
> u32 flags)
>   * Description
>   * Do FIB lookup in kernel tables using parameters in *params*.
> @@ -1856,6 +1857,7 @@ union bpf_attr {
>   * Egress device index on success, 0 if packet needs to continue
>   * up the stack for further processing or a negative error in 
> case
>   * of failure.
> + *
>   * int bpf_sock_hash_update(struct bpf_sock_ops_kern *skops, struct bpf_map 
> *map, void *key, u64 flags)
>   * Description
>   * Add an entry to, or update a sockhash *map* referencing 
> sockets.
> @@ -1902,6 +1904,35 @@ union bpf_attr {
>   * egress otherwise). This is the only flag supported for now.
>   * Return
>   * **SK_PASS** on success, or **SK_DROP** on error.
> + *
> + * int bpf_rc_keydown(void *ctx, u32 protocol, u32 scancode, u32 toggle)
> + * Description
> + * Report decoded scancode with toggle value. For use in
> + * BPF_PROG_TYPE_RAWIR_EVENT, to report a successfully
> + * decoded scancode. This is will generate a keydown event,
> + * and a keyup event once the scancode is no longer repeated.
> + *
> + * *ctx* pointer to bpf_rawir_event, *protocol* is decoded
> + * protocol (see RC_PROTO_* enum).
> + *
> + * Some protocols include a toggle bit, in case the button
> + * was released and pressed again between consecutive scancodes,
> + * copy this bit into *toggle* if it exists, else set to 0.
> + *
> + * Return
> + *  

Re: [PATCH v3 1/2] media: rc: introduce BPF_PROG_RAWIR_EVENT

2018-05-17 Thread Y Song
On Wed, May 16, 2018 at 2:04 PM, Sean Young  wrote:
> Add support for BPF_PROG_RAWIR_EVENT. This type of BPF program can call
> rc_keydown() to reported decoded IR scancodes, or rc_repeat() to report
> that the last key should be repeated.
>
> The bpf program can be attached to using the bpf(BPF_PROG_ATTACH) syscall;
> the target_fd must be the /dev/lircN device.
>
> Signed-off-by: Sean Young 
> ---
>  drivers/media/rc/Kconfig   |  13 ++
>  drivers/media/rc/Makefile  |   1 +
>  drivers/media/rc/bpf-rawir-event.c | 363 +
>  drivers/media/rc/lirc_dev.c|  24 ++
>  drivers/media/rc/rc-core-priv.h|  24 ++
>  drivers/media/rc/rc-ir-raw.c   |  14 +-
>  include/linux/bpf_rcdev.h  |  30 +++
>  include/linux/bpf_types.h  |   3 +
>  include/uapi/linux/bpf.h   |  55 -
>  kernel/bpf/syscall.c   |   7 +
>  10 files changed, 531 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/media/rc/bpf-rawir-event.c
>  create mode 100644 include/linux/bpf_rcdev.h
>
> diff --git a/drivers/media/rc/Kconfig b/drivers/media/rc/Kconfig
> index eb2c3b6eca7f..2172d65b0213 100644
> --- a/drivers/media/rc/Kconfig
> +++ b/drivers/media/rc/Kconfig
> @@ -25,6 +25,19 @@ config LIRC
>passes raw IR to and from userspace, which is needed for
>IR transmitting (aka "blasting") and for the lirc daemon.
>
> +config BPF_RAWIR_EVENT
> +   bool "Support for eBPF programs attached to lirc devices"
> +   depends on BPF_SYSCALL
> +   depends on RC_CORE=y
> +   depends on LIRC
> +   help
> +  Allow attaching eBPF programs to a lirc device using the bpf(2)
> +  syscall command BPF_PROG_ATTACH. This is supported for raw IR
> +  receivers.
> +
> +  These eBPF programs can be used to decode IR into scancodes, for
> +  IR protocols not supported by the kernel decoders.
> +
>  menuconfig RC_DECODERS
> bool "Remote controller decoders"
> depends on RC_CORE
> diff --git a/drivers/media/rc/Makefile b/drivers/media/rc/Makefile
> index 2e1c87066f6c..74907823bef8 100644
> --- a/drivers/media/rc/Makefile
> +++ b/drivers/media/rc/Makefile
> @@ -5,6 +5,7 @@ obj-y += keymaps/
>  obj-$(CONFIG_RC_CORE) += rc-core.o
>  rc-core-y := rc-main.o rc-ir-raw.o
>  rc-core-$(CONFIG_LIRC) += lirc_dev.o
> +rc-core-$(CONFIG_BPF_RAWIR_EVENT) += bpf-rawir-event.o
>  obj-$(CONFIG_IR_NEC_DECODER) += ir-nec-decoder.o
>  obj-$(CONFIG_IR_RC5_DECODER) += ir-rc5-decoder.o
>  obj-$(CONFIG_IR_RC6_DECODER) += ir-rc6-decoder.o
> diff --git a/drivers/media/rc/bpf-rawir-event.c 
> b/drivers/media/rc/bpf-rawir-event.c
> new file mode 100644
> index ..7cb48b8d87b5
> --- /dev/null
> +++ b/drivers/media/rc/bpf-rawir-event.c
> @@ -0,0 +1,363 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// bpf-rawir-event.c - handles bpf
> +//
> +// Copyright (C) 2018 Sean Young 
> +
> +#include 
> +#include 
> +#include 
> +#include "rc-core-priv.h"
> +
> +/*
> + * BPF interface for raw IR
> + */
> +const struct bpf_prog_ops rawir_event_prog_ops = {
> +};
> +
> +BPF_CALL_1(bpf_rc_repeat, struct bpf_rawir_event*, event)
> +{
> +   struct ir_raw_event_ctrl *ctrl;
> +
> +   ctrl = container_of(event, struct ir_raw_event_ctrl, bpf_rawir_event);
> +
> +   rc_repeat(ctrl->dev);
> +
> +   return 0;
> +}
> +
> +static const struct bpf_func_proto rc_repeat_proto = {
> +   .func  = bpf_rc_repeat,
> +   .gpl_only  = true, /* rc_repeat is EXPORT_SYMBOL_GPL */
> +   .ret_type  = RET_INTEGER,
> +   .arg1_type = ARG_PTR_TO_CTX,
> +};
> +
> +BPF_CALL_4(bpf_rc_keydown, struct bpf_rawir_event*, event, u32, protocol,
> +  u32, scancode, u32, toggle)
> +{
> +   struct ir_raw_event_ctrl *ctrl;
> +
> +   ctrl = container_of(event, struct ir_raw_event_ctrl, bpf_rawir_event);
> +
> +   rc_keydown(ctrl->dev, protocol, scancode, toggle != 0);
> +
> +   return 0;
> +}
> +
> +static const struct bpf_func_proto rc_keydown_proto = {
> +   .func  = bpf_rc_keydown,
> +   .gpl_only  = true, /* rc_keydown is EXPORT_SYMBOL_GPL */
> +   .ret_type  = RET_INTEGER,
> +   .arg1_type = ARG_PTR_TO_CTX,
> +   .arg2_type = ARG_ANYTHING,
> +   .arg3_type = ARG_ANYTHING,
> +   .arg4_type = ARG_ANYTHING,
> +};
> +
> +static const struct bpf_func_proto *
> +rawir_event_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> +{
> +   switch (func_id) {
> +   case BPF_FUNC_rc_repeat:
> +   return &rc_repeat_proto;
> +   case BPF_FUNC_rc_keydown:
> +   return &rc_keydown_proto;
> +   case BPF_FUNC_map_lookup_elem:
> +   return &bpf_map_lookup_elem_proto;
> +   case BPF_FUNC_map_update_elem:
> +   return &bpf_map_update_elem_proto;
> +   case BPF_FUNC_map_delete_elem:
> +   return &bpf_map_delete_elem_proto;
> +   case BPF_FUNC_ktime_get_ns:
> 

Re: [PATCH bpf-next] samples/bpf: Decrement ttl in fib forwarding example

2018-05-16 Thread Y Song
On Tue, May 15, 2018 at 4:20 PM, David Ahern  wrote:
> Only consider forwarding packets if ttl in received packet is > 1 and
> decrement ttl before handing off to bpf_redirect_map.
>
> Signed-off-by: David Ahern 

I did not test this patch, but it looks good to me on visual inspection.
Acked-by: Yonghong Song 


Re: [PATCH bpf-next v6 2/4] bpf: sockmap, add hash map support

2018-05-15 Thread Y Song
On Tue, May 15, 2018 at 12:01 PM, Daniel Borkmann  wrote:
> On 05/14/2018 07:00 PM, John Fastabend wrote:
>> Sockmap is currently backed by an array and enforces keys to be
>> four bytes. This works well for many use cases and was originally
>> modeled after devmap which also uses four bytes keys. However,
>> this has become limiting in larger use cases where a hash would
>> be more appropriate. For example users may want to use the 5-tuple
>> of the socket as the lookup key.
>>
>> To support this add hash support.
>>
>> Signed-off-by: John Fastabend 
>> Acked-by: David S. Miller 
>> ---
>>  include/linux/bpf.h   |   8 +
>>  include/linux/bpf_types.h |   1 +
>>  include/uapi/linux/bpf.h  |  52 -
>>  kernel/bpf/core.c |   1 +
>>  kernel/bpf/sockmap.c  | 494 
>> --
>>  kernel/bpf/verifier.c |  14 +-
>>  net/core/filter.c |  58 ++
>>  7 files changed, 610 insertions(+), 18 deletions(-)
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index a38e474..ed0122b 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -668,6 +668,7 @@ static inline void bpf_map_offload_map_free(struct 
>> bpf_map *map)
>>
>>  #if defined(CONFIG_STREAM_PARSER) && defined(CONFIG_BPF_SYSCALL) && 
>> defined(CONFIG_INET)
>>  struct sock  *__sock_map_lookup_elem(struct bpf_map *map, u32 key);
>> +struct sock  *__sock_hash_lookup_elem(struct bpf_map *map, void *key);
>>  int sock_map_prog(struct bpf_map *map, struct bpf_prog *prog, u32 type);
>>  #else
>>  static inline struct sock  *__sock_map_lookup_elem(struct bpf_map *map, u32 
>> key)
>> @@ -675,6 +676,12 @@ static inline struct sock  
>> *__sock_map_lookup_elem(struct bpf_map *map, u32 key)
>>   return NULL;
>>  }
>>
>> +static inline struct sock  *__sock_hash_lookup_elem(struct bpf_map *map,
>> + void *key)
>> +{
>> + return NULL;
>> +}
>> +
>>  static inline int sock_map_prog(struct bpf_map *map,
>>   struct bpf_prog *prog,
>>   u32 type)
>> @@ -724,6 +731,7 @@ static inline void __xsk_map_flush(struct bpf_map *map)
>>  extern const struct bpf_func_proto bpf_get_stackid_proto;
>>  extern const struct bpf_func_proto bpf_get_stack_proto;
>>  extern const struct bpf_func_proto bpf_sock_map_update_proto;
>> +extern const struct bpf_func_proto bpf_sock_hash_update_proto;
>>
>>  /* Shared helpers among cBPF and eBPF. */
>>  void bpf_user_rnd_init_once(void);
>> diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
>> index d7df1b32..b67f879 100644
>> --- a/include/linux/bpf_types.h
>> +++ b/include/linux/bpf_types.h
>> @@ -47,6 +47,7 @@
>>  BPF_MAP_TYPE(BPF_MAP_TYPE_DEVMAP, dev_map_ops)
>>  #if defined(CONFIG_STREAM_PARSER) && defined(CONFIG_INET)
>>  BPF_MAP_TYPE(BPF_MAP_TYPE_SOCKMAP, sock_map_ops)
>> +BPF_MAP_TYPE(BPF_MAP_TYPE_SOCKHASH, sock_hash_ops)
>>  #endif
>>  BPF_MAP_TYPE(BPF_MAP_TYPE_CPUMAP, cpu_map_ops)
>>  #if defined(CONFIG_XDP_SOCKETS)
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 02e4112..1205d86 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -118,6 +118,7 @@ enum bpf_map_type {
>>   BPF_MAP_TYPE_SOCKMAP,
>>   BPF_MAP_TYPE_CPUMAP,
>>   BPF_MAP_TYPE_XSKMAP,
>> + BPF_MAP_TYPE_SOCKHASH,
>>  };
>>
>>  enum bpf_prog_type {
>> @@ -1855,6 +1856,52 @@ struct bpf_stack_build_id {
>>   * Egress device index on success, 0 if packet needs to continue
>>   * up the stack for further processing or a negative error in 
>> case
>>   * of failure.
>> + * int bpf_sock_hash_update(struct bpf_sock_ops_kern *skops, struct bpf_map 
>> *map, void *key, u64 flags)
>
> When you rebase please fix this up properly next time and add a newline in 
> between
> the helpers. I fixed this up while applying.

I guess the tools/include/uapi/linux/bpf.h may also need fixup to be
in sync with main bpf.h.


Re: [bpf-next PATCH 0/5] bpf, doc: convert Documentation/bpf to RST-formatting

2018-05-14 Thread Y Song
On Mon, May 14, 2018 at 6:42 AM, Jesper Dangaard Brouer
 wrote:
> The kernel is moving files under Documentation to use the RST
> (reStructuredText) format and Sphinx [1].  This patchset converts the
> files under Documentation/bpf/ into RST format.  The Sphinx
> integration is left as followup work.
>
> [1] https://www.kernel.org/doc/html/latest/doc-guide/sphinx.html
>
> This patchset have been uploaded as branch bpf_doc10 on github[2], so
> reviewers can see how GitHub renders this.
>
> [2] https://github.com/netoptimizer/linux/tree/bpf_doc10/Documentation/bpf
>
> ---
>
> Jesper Dangaard Brouer (5):
>   bpf, doc: add basic README.rst file
>   bpf, doc: rename txt files to rst files
>   bpf, doc: convert bpf_design_QA.rst to use RST formatting
>   bpf, doc: convert bpf_devel_QA.rst to use RST formatting
>   bpf, doc: howto use/run the BPF selftests

This initial conversion from .txt to .rst files looks good to me. Ack
for the whole series.
Acked-by: Yonghong Song 

>
>
>  Documentation/bpf/README.rst|   36 ++
>  Documentation/bpf/bpf_design_QA.rst |  221 
>  Documentation/bpf/bpf_design_QA.txt |  156 -
>  Documentation/bpf/bpf_devel_QA.rst  |  640 
> +++
>  Documentation/bpf/bpf_devel_QA.txt  |  570 ---
>  5 files changed, 897 insertions(+), 726 deletions(-)
>  create mode 100644 Documentation/bpf/README.rst
>  create mode 100644 Documentation/bpf/bpf_design_QA.rst
>  delete mode 100644 Documentation/bpf/bpf_design_QA.txt
>  create mode 100644 Documentation/bpf/bpf_devel_QA.rst
>  delete mode 100644 Documentation/bpf/bpf_devel_QA.txt
>
> --


Re: [PATCH v1 4/4] samples/bpf: an example of a raw IR decoder

2018-05-14 Thread Y Song
On Mon, May 14, 2018 at 2:11 PM, Sean Young  wrote:
> This implements the grundig-16 IR protocol.
>
> Signed-off-by: Sean Young 
> ---
>  samples/bpf/Makefile  |   4 +
>  samples/bpf/bpf_load.c|   9 +-
>  samples/bpf/grundig_decoder_kern.c| 112 ++
>  samples/bpf/grundig_decoder_user.c|  54 +++
>  tools/bpf/bpftool/prog.c  |   1 +
>  tools/include/uapi/linux/bpf.h|  17 +++-
>  tools/testing/selftests/bpf/bpf_helpers.h |   6 ++
>  7 files changed, 200 insertions(+), 3 deletions(-)
>  create mode 100644 samples/bpf/grundig_decoder_kern.c
>  create mode 100644 samples/bpf/grundig_decoder_user.c
>
> diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
> index 4d6a6edd4bf6..c6fa111f103a 100644
> --- a/samples/bpf/Makefile
> +++ b/samples/bpf/Makefile
> @@ -44,6 +44,7 @@ hostprogs-y += xdp_monitor
>  hostprogs-y += xdp_rxq_info
>  hostprogs-y += syscall_tp
>  hostprogs-y += cpustat
> +hostprogs-y += grundig_decoder
>
>  # Libbpf dependencies
>  LIBBPF := ../../tools/lib/bpf/bpf.o ../../tools/lib/bpf/nlattr.o
> @@ -95,6 +96,7 @@ xdp_monitor-objs := bpf_load.o $(LIBBPF) xdp_monitor_user.o
>  xdp_rxq_info-objs := bpf_load.o $(LIBBPF) xdp_rxq_info_user.o
>  syscall_tp-objs := bpf_load.o $(LIBBPF) syscall_tp_user.o
>  cpustat-objs := bpf_load.o $(LIBBPF) cpustat_user.o
> +grundig_decoder-objs := bpf_load.o $(LIBBPF) grundig_decoder_user.o
>
>  # Tell kbuild to always build the programs
>  always := $(hostprogs-y)
> @@ -148,6 +150,7 @@ always += xdp_rxq_info_kern.o
>  always += xdp2skb_meta_kern.o
>  always += syscall_tp_kern.o
>  always += cpustat_kern.o
> +always += grundig_decoder_kern.o
>
>  HOSTCFLAGS += -I$(objtree)/usr/include
>  HOSTCFLAGS += -I$(srctree)/tools/lib/
> @@ -193,6 +196,7 @@ HOSTLOADLIBES_xdp_monitor += -lelf
>  HOSTLOADLIBES_xdp_rxq_info += -lelf
>  HOSTLOADLIBES_syscall_tp += -lelf
>  HOSTLOADLIBES_cpustat += -lelf
> +HOSTLOADLIBES_grundig_decoder += -lelf
>
>  # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on 
> cmdline:
>  #  make samples/bpf/ LLC=~/git/llvm/build/bin/llc 
> CLANG=~/git/llvm/build/bin/clang
> diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
> index bebe4188b4b3..0fd389e95bb9 100644
> --- a/samples/bpf/bpf_load.c
> +++ b/samples/bpf/bpf_load.c
> @@ -69,6 +69,7 @@ static int load_and_attach(const char *event, struct 
> bpf_insn *prog, int size)
> bool is_sockops = strncmp(event, "sockops", 7) == 0;
> bool is_sk_skb = strncmp(event, "sk_skb", 6) == 0;
> bool is_sk_msg = strncmp(event, "sk_msg", 6) == 0;
> +   bool is_ir_decoder = strncmp(event, "ir_decoder", 10) == 0;
> size_t insns_cnt = size / sizeof(struct bpf_insn);
> enum bpf_prog_type prog_type;
> char buf[256];
> @@ -102,6 +103,8 @@ static int load_and_attach(const char *event, struct 
> bpf_insn *prog, int size)
> prog_type = BPF_PROG_TYPE_SK_SKB;
> } else if (is_sk_msg) {
> prog_type = BPF_PROG_TYPE_SK_MSG;
> +   } else if (is_ir_decoder) {
> +   prog_type = BPF_PROG_TYPE_RAWIR_DECODER;
> } else {
> printf("Unknown event '%s'\n", event);
> return -1;
> @@ -116,7 +119,8 @@ static int load_and_attach(const char *event, struct 
> bpf_insn *prog, int size)
>
> prog_fd[prog_cnt++] = fd;
>
> -   if (is_xdp || is_perf_event || is_cgroup_skb || is_cgroup_sk)
> +   if (is_xdp || is_perf_event || is_cgroup_skb || is_cgroup_sk ||
> +   is_ir_decoder)
> return 0;
>
> if (is_socket || is_sockops || is_sk_skb || is_sk_msg) {
> @@ -607,7 +611,8 @@ static int do_load_bpf_file(const char *path, 
> fixup_map_cb fixup_map)
> memcmp(shname, "cgroup/", 7) == 0 ||
> memcmp(shname, "sockops", 7) == 0 ||
> memcmp(shname, "sk_skb", 6) == 0 ||
> -   memcmp(shname, "sk_msg", 6) == 0) {
> +   memcmp(shname, "sk_msg", 6) == 0 ||
> +   memcmp(shname, "ir_decoder", 10) == 0) {
> ret = load_and_attach(shname, data->d_buf,
>   data->d_size);
> if (ret != 0)
> diff --git a/samples/bpf/grundig_decoder_kern.c 
> b/samples/bpf/grundig_decoder_kern.c
> new file mode 100644
> index ..c80f2c9cc69a
> --- /dev/null
> +++ b/samples/bpf/grundig_decoder_kern.c
> @@ -0,0 +1,112 @@
> +
> +#include 
> +#include 
> +#include "bpf_helpers.h"
> +#include 
> +
> +enum grundig_state {
> +   STATE_INACTIVE,
> +   STATE_HEADER_SPACE,
> +   STATE_LEADING_PULSE,
> +   STATE_BITS_SPACE,
> +   STATE_BITS_PULSE,
> +};
> +
> +struct decoder_state {
> +   u32 bits;
> +   enum grundig_state state;
> +   u32 count;
> +   u32 last_space;
> +};
> +
> +struct bpf_map_def 

Re: [PATCH v1 3/4] media: rc bpf: move ir_raw_event to uapi

2018-05-14 Thread Y Song
On Mon, May 14, 2018 at 2:11 PM, Sean Young  wrote:
> The context provided to a BPF_PROG_RAWIR_DECODER is a struct ir_raw_event;
> ensure user space has a definition.
>
> Signed-off-by: Sean Young 
> ---
>  include/media/rc-core.h| 19 +--
>  include/uapi/linux/bpf_rcdev.h | 24 

Patch #2 already references this file, so if only Patches #1 and #2 are
applied, there will be a compilation error. Could you re-arrange your
patchset so that after applying each patch sequentially, there is no
compilation error?

>  2 files changed, 25 insertions(+), 18 deletions(-)
>  create mode 100644 include/uapi/linux/bpf_rcdev.h
>
> diff --git a/include/media/rc-core.h b/include/media/rc-core.h
> index 6742fd86ff65..5d31e31d8ade 100644
> --- a/include/media/rc-core.h
> +++ b/include/media/rc-core.h
> @@ -21,6 +21,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>
>  /**
> @@ -299,24 +300,6 @@ void rc_keydown_notimeout(struct rc_dev *dev, enum 
> rc_proto protocol,
>  void rc_keyup(struct rc_dev *dev);
>  u32 rc_g_keycode_from_table(struct rc_dev *dev, u32 scancode);
>
> -/*
> - * From rc-raw.c
> - * The Raw interface is specific to InfraRed. It may be a good idea to
> - * split it later into a separate header.
> - */
> -struct ir_raw_event {
> -   union {
> -   u32 duration;
> -   u32 carrier;
> -   };
> -   u8  duty_cycle;
> -
> -   unsignedpulse:1;
> -   unsignedreset:1;
> -   unsignedtimeout:1;
> -   unsignedcarrier_report:1;
> -};
> -
>  #define DEFINE_IR_RAW_EVENT(event) struct ir_raw_event event = {}
>
>  static inline void init_ir_raw_event(struct ir_raw_event *ev)
> diff --git a/include/uapi/linux/bpf_rcdev.h b/include/uapi/linux/bpf_rcdev.h
> new file mode 100644
> index ..d8629ff2b960
> --- /dev/null
> +++ b/include/uapi/linux/bpf_rcdev.h
> @@ -0,0 +1,24 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +/* Copyright (c) 2018 Sean Young 
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of version 2 of the GNU General Public
> + * License as published by the Free Software Foundation.
> + */
> +#ifndef _UAPI__LINUX_BPF_RCDEV_H__
> +#define _UAPI__LINUX_BPF_RCDEV_H__
> +
> +struct ir_raw_event {
> +   union {
> +   __u32   duration;
> +   __u32   carrier;
> +   };
> +   __u8duty_cycle;
> +
> +   unsignedpulse:1;
> +   unsignedreset:1;
> +   unsignedtimeout:1;
> +   unsignedcarrier_report:1;
> +};
> +
> +#endif /* _UAPI__LINUX_BPF_RCDEV_H__ */
> --
> 2.17.0
>


Re: [PATCH v1 2/4] media: bpf: allow raw IR decoder bpf programs to be used

2018-05-14 Thread Y Song
On Mon, May 14, 2018 at 2:10 PM, Sean Young  wrote:
> This implements attaching, detaching, querying and execution. The target
> fd has to be the /dev/lircN device.
>
> Signed-off-by: Sean Young 
> ---
>  drivers/media/rc/ir-bpf-decoder.c | 191 ++
>  drivers/media/rc/lirc_dev.c   |  30 +
>  drivers/media/rc/rc-core-priv.h   |  15 +++
>  drivers/media/rc/rc-ir-raw.c  |   5 +
>  include/uapi/linux/bpf.h  |   1 +
>  kernel/bpf/syscall.c  |   7 ++
>  6 files changed, 249 insertions(+)
>
> diff --git a/drivers/media/rc/ir-bpf-decoder.c 
> b/drivers/media/rc/ir-bpf-decoder.c
> index aaa5e208b1a5..651590a14772 100644
> --- a/drivers/media/rc/ir-bpf-decoder.c
> +++ b/drivers/media/rc/ir-bpf-decoder.c
> @@ -91,3 +91,194 @@ const struct bpf_verifier_ops ir_decoder_verifier_ops = {
> .get_func_proto  = ir_decoder_func_proto,
> .is_valid_access = ir_decoder_is_valid_access
>  };
> +
> +#define BPF_MAX_PROGS 64
> +
> +int rc_dev_bpf_attach(struct rc_dev *rcdev, struct bpf_prog *prog, u32 flags)

flags is not used in this function.

> +{
> +   struct ir_raw_event_ctrl *raw;
> +   struct bpf_prog_array __rcu *old_array;
> +   struct bpf_prog_array *new_array;
> +   int ret;
> +
> +   if (rcdev->driver_type != RC_DRIVER_IR_RAW)
> +   return -EINVAL;
> +
> +   ret = mutex_lock_interruptible(&rcdev->lock);
> +   if (ret)
> +   return ret;
> +
> +   raw = rcdev->raw;
> +
> +   if (raw->progs && bpf_prog_array_length(raw->progs) >= BPF_MAX_PROGS) 
> {
> +   ret = -E2BIG;
> +   goto out;
> +   }
> +
> +   old_array = raw->progs;
> +   ret = bpf_prog_array_copy(old_array, NULL, prog, &new_array);
> +   if (ret < 0)
> +   goto out;
> +
> +   rcu_assign_pointer(raw->progs, new_array);
> +   bpf_prog_array_free(old_array);
> +out:
> +   mutex_unlock(&rcdev->lock);
> +   return ret;
> +}
> +
> +int rc_dev_bpf_detach(struct rc_dev *rcdev, struct bpf_prog *prog, u32 flags)

flags is not used in this function.

> +{
> +   struct ir_raw_event_ctrl *raw;
> +   struct bpf_prog_array __rcu *old_array;
> +   struct bpf_prog_array *new_array;
> +   int ret;
> +
> +   if (rcdev->driver_type != RC_DRIVER_IR_RAW)
> +   return -EINVAL;
> +
> +   ret = mutex_lock_interruptible(&rcdev->lock);
> +   if (ret)
> +   return ret;
> +
> +   raw = rcdev->raw;
> +
> +   old_array = raw->progs;
> +   ret = bpf_prog_array_copy(old_array, prog, NULL, &new_array);
> +   if (ret < 0) {
> +   bpf_prog_array_delete_safe(old_array, prog);
> +   } else {
> +   rcu_assign_pointer(raw->progs, new_array);
> +   bpf_prog_array_free(old_array);
> +   }
> +
> +   bpf_prog_put(prog);
> +   mutex_unlock(&rcdev->lock);
> +   return 0;
> +}
> +
> +void rc_dev_bpf_run(struct rc_dev *rcdev)
> +{
> +   struct ir_raw_event_ctrl *raw = rcdev->raw;
> +
> +   if (raw->progs)
> +   BPF_PROG_RUN_ARRAY(raw->progs, &raw->prev_ev, BPF_PROG_RUN);
> +}
> +
> +void rc_dev_bpf_put(struct rc_dev *rcdev)
> +{
> +   struct bpf_prog_array *progs = rcdev->raw->progs;
> +   int i, size;
> +
> +   if (!progs)
> +   return;
> +
> +   size = bpf_prog_array_length(progs);
> +   for (i = 0; i < size; i++)
> +   bpf_prog_put(progs->progs[i]);
> +
> +   bpf_prog_array_free(rcdev->raw->progs);
> +}
> +
> +int rc_dev_prog_attach(const union bpf_attr *attr)
> +{
> +   struct bpf_prog *prog;
> +   struct rc_dev *rcdev;
> +   int ret;
> +
> +   if (attr->attach_flags & BPF_F_ALLOW_OVERRIDE)
> +   return -EINVAL;

It looks like you do not actually use flags anywhere except here.
BPF_F_ALLOW_OVERRIDE was originally introduced for cgroup-type
attachments, and its comment explicitly says so.

In the query below, the flags value "0" is copied to userspace.

In your case, I think you can just disallow any value, i.e.,
attr->attach_flags must be 0, and then further down check whether the
prog is already in the array and, if so, return an error.

> +
> +   prog = bpf_prog_get_type(attr->attach_bpf_fd,
> +BPF_PROG_TYPE_RAWIR_DECODER);
> +   if (IS_ERR(prog))
> +   return PTR_ERR(prog);
> +
> +   rcdev = rc_dev_get_from_fd(attr->target_fd);
> +   if (IS_ERR(rcdev)) {
> +   bpf_prog_put(prog);
> +   return PTR_ERR(rcdev);
> +   }
> +
> +   ret = rc_dev_bpf_attach(rcdev, prog, attr->attach_flags);
> +   if (ret)
> +   bpf_prog_put(prog);
> +
> +   put_device(&rcdev->dev);
> +
> +   return ret;
> +}
> +
> +int rc_dev_prog_detach(const union bpf_attr *attr)
> +{
> +   struct bpf_prog *prog;
> +   struct rc_dev *rcdev;
> +   int ret;
> +
> +   if (attr->attach_flags & BPF_F_ALLOW_OVERRIDE)
> +   

Re: [PATCH v1 1/4] media: rc: introduce BPF_PROG_IR_DECODER

2018-05-14 Thread Y Song
On Mon, May 14, 2018 at 2:10 PM, Sean Young  wrote:
> Add support for BPF_PROG_IR_DECODER. This type of BPF program can call
> rc_keydown() to reported decoded IR scancodes, or rc_repeat() to report
> that the last key should be repeated.
>
> Signed-off-by: Sean Young 
> ---
>  drivers/media/rc/Kconfig  |  8 +++
>  drivers/media/rc/Makefile |  1 +
>  drivers/media/rc/ir-bpf-decoder.c | 93 +++
>  include/linux/bpf_types.h |  3 +
>  include/uapi/linux/bpf.h  | 16 +-
>  5 files changed, 120 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/media/rc/ir-bpf-decoder.c
>
> diff --git a/drivers/media/rc/Kconfig b/drivers/media/rc/Kconfig
> index eb2c3b6eca7f..10ad6167d87c 100644
> --- a/drivers/media/rc/Kconfig
> +++ b/drivers/media/rc/Kconfig
> @@ -120,6 +120,14 @@ config IR_IMON_DECODER
>remote control and you would like to use it with a raw IR
>receiver, or if you wish to use an encoder to transmit this IR.
>
> +config IR_BPF_DECODER
> +   bool "Enable IR raw decoder using BPF"
> +   depends on BPF_SYSCALL
> +   depends on RC_CORE=y
> +   help
> +  Enable this option to make it possible to load custom IR
> +  decoders written in BPF.
> +
>  endif #RC_DECODERS
>
>  menuconfig RC_DEVICES
> diff --git a/drivers/media/rc/Makefile b/drivers/media/rc/Makefile
> index 2e1c87066f6c..12e1118430d0 100644
> --- a/drivers/media/rc/Makefile
> +++ b/drivers/media/rc/Makefile
> @@ -5,6 +5,7 @@ obj-y += keymaps/
>  obj-$(CONFIG_RC_CORE) += rc-core.o
>  rc-core-y := rc-main.o rc-ir-raw.o
>  rc-core-$(CONFIG_LIRC) += lirc_dev.o
> +rc-core-$(CONFIG_IR_BPF_DECODER) += ir-bpf-decoder.o
>  obj-$(CONFIG_IR_NEC_DECODER) += ir-nec-decoder.o
>  obj-$(CONFIG_IR_RC5_DECODER) += ir-rc5-decoder.o
>  obj-$(CONFIG_IR_RC6_DECODER) += ir-rc6-decoder.o
> diff --git a/drivers/media/rc/ir-bpf-decoder.c 
> b/drivers/media/rc/ir-bpf-decoder.c
> new file mode 100644
> index ..aaa5e208b1a5
> --- /dev/null
> +++ b/drivers/media/rc/ir-bpf-decoder.c
> @@ -0,0 +1,93 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// ir-bpf-decoder.c - handles bpf decoders
> +//
> +// Copyright (C) 2018 Sean Young 
> +
> +#include 
> +#include 
> +#include "rc-core-priv.h"
> +
> +/*
> + * BPF interface for raw IR decoder
> + */
> +const struct bpf_prog_ops ir_decoder_prog_ops = {
> +};
> +
> +BPF_CALL_1(bpf_rc_repeat, struct ir_raw_event*, event)
> +{
> +   struct ir_raw_event_ctrl *ctrl;
> +
> +   ctrl = container_of(event, struct ir_raw_event_ctrl, prev_ev);
> +
> +   rc_repeat(ctrl->dev);
> +   return 0;
> +}
> +
> +static const struct bpf_func_proto rc_repeat_proto = {
> +   .func  = bpf_rc_repeat,
> +   .gpl_only  = true, // rc_repeat is EXPORT_SYMBOL_GPL
> +   .ret_type  = RET_VOID,

I suggest using RET_INTEGER here since we do return an integer 0.
RET_INTEGER will also make it easy to extend if you want to return
a non-zero value for error code or other reasons.

> +   .arg1_type = ARG_PTR_TO_CTX,
> +};
> +
> +BPF_CALL_4(bpf_rc_keydown, struct ir_raw_event*, event, u32, protocol,
> +  u32, scancode, u32, toggle)
> +{
> +   struct ir_raw_event_ctrl *ctrl;
> +
> +   ctrl = container_of(event, struct ir_raw_event_ctrl, prev_ev);
> +   rc_keydown(ctrl->dev, protocol, scancode, toggle != 0);
> +   return 0;
> +}
> +
> +static const struct bpf_func_proto rc_keydown_proto = {
> +   .func  = bpf_rc_keydown,
> +   .gpl_only  = true, // rc_keydown is EXPORT_SYMBOL_GPL
> +   .ret_type  = RET_VOID,

ditto. RET_INTEGER is preferable.

> +   .arg1_type = ARG_PTR_TO_CTX,
> +   .arg2_type = ARG_ANYTHING,
> +   .arg3_type = ARG_ANYTHING,
> +   .arg4_type = ARG_ANYTHING,
> +};
> +
> +static const struct bpf_func_proto *ir_decoder_func_proto(enum bpf_func_id 
> func_id, const struct bpf_prog *prog)
> +{
> +   switch (func_id) {
> +   case BPF_FUNC_rc_repeat:
> +   return &rc_repeat_proto;
> +   case BPF_FUNC_rc_keydown:
> +   return &rc_keydown_proto;
> +   case BPF_FUNC_map_lookup_elem:
> +   return &bpf_map_lookup_elem_proto;
> +   case BPF_FUNC_map_update_elem:
> +   return &bpf_map_update_elem_proto;
> +   case BPF_FUNC_map_delete_elem:
> +   return &bpf_map_delete_elem_proto;
> +   case BPF_FUNC_ktime_get_ns:
> +   return &bpf_ktime_get_ns_proto;
> +   case BPF_FUNC_tail_call:
> +   return &bpf_tail_call_proto;
> +   case BPF_FUNC_get_prandom_u32:
> +   return &bpf_get_prandom_u32_proto;
> +   default:
> +   return NULL;
> +   }
> +}
> +
> +static bool ir_decoder_is_valid_access(int off, int size,
> +  enum bpf_access_type type,
> +  const struct bpf_prog *prog,
> +  struct 

Re: [PATCH bpf-next v2 8/8] bpf: add ld64 imm test cases

2018-05-14 Thread Y Song
On Mon, May 14, 2018 at 2:22 PM, Daniel Borkmann  wrote:
> Add test cases where we combine semi-random imm values, mainly for testing
> JITs when they have different encoding options for 64 bit immediates in
> order to reduce resulting image size.
>
> Signed-off-by: Daniel Borkmann 

Acked-by: Yonghong Song 


Re: [PATCH bpf-next] selftests/bpf: make sure build-id is on

2018-05-14 Thread Y Song
On Mon, May 14, 2018 at 5:11 PM, Alexei Starovoitov  wrote:
> --build-id may not be a default linker config.
> Make sure it's used when linking urandom_read test program.
> Otherwise test_stacktrace_build_id[_nmi] tests will be failling.
>
> Signed-off-by: Alexei Starovoitov 

Tested and the change looks good.
Acked-by: Yonghong Song 

> ---
>  tools/testing/selftests/bpf/Makefile | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/bpf/Makefile 
> b/tools/testing/selftests/bpf/Makefile
> index 438d4f93875b..133ebc68cbe4 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -19,7 +19,7 @@ all: $(TEST_CUSTOM_PROGS)
>  $(TEST_CUSTOM_PROGS): urandom_read
>
>  urandom_read: urandom_read.c
> -   $(CC) -o $(TEST_CUSTOM_PROGS) -static $<
> +   $(CC) -o $(TEST_CUSTOM_PROGS) -static $< -Wl,--build-id
>
>  # Order correspond to 'make run_tests' order
>  TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map 
> test_progs \
> --
> 2.9.5
>


Re: [PATCH] [RFC] bpf: tracing: new helper bpf_get_current_cgroup_ino

2018-05-14 Thread Y Song
On Sun, May 13, 2018 at 10:33 AM, Alban Crequy  wrote:
> From: Alban Crequy 
>
> bpf_get_current_cgroup_ino() allows BPF trace programs to get the inode
> of the cgroup where the current process resides.
>
> My use case is to get statistics about syscalls done by a specific
> Kubernetes container. I have a tracepoint on raw_syscalls/sys_enter and
> a BPF map containing the cgroup inode that I want to trace. I use
> bpf_get_current_cgroup_ino() and I quickly return from the tracepoint if
> the inode is not in the BPF hash map.

Alternatively, the kernel already has the bpf_current_task_under_cgroup
helper, which uses a cgroup array to store cgroup fds. If the current
task is in the hierarchy of a particular cgroup, the helper will return
true.

One difference between your helper and bpf_current_task_under_cgroup() is
that your helper tests against one particular cgroup, not including its
children, whereas bpf_current_task_under_cgroup() will return true even
if the task is in a nested child cgroup.

Maybe this will work for you?

>
> Without this BPF helper, I would need to keep track of all pids in the
> container. The Netlink proc connector can be used to follow process
> creation and destruction but it is racy.
>
> This patch only looks at the memory cgroup, which was enough for me
> since each Kubernetes container is placed in a different mem cgroup.
> For a generic implementation, I'm not sure how to proceed: it seems I
> would need to use 'for_each_root(root)' (see example in
> proc_cgroup_show() from kernel/cgroup/cgroup.c) but I don't know if
> taking the cgroup mutex is possible in the BPF helper function. It might
> be ok in the tracepoint raw_syscalls/sys_enter but could the mutex
> already be taken in some other tracepoints?

A mutex is not allowed in a helper since it can block.

>
> Signed-off-by: Alban Crequy 
> ---
>  include/uapi/linux/bpf.h | 11 ++-
>  kernel/trace/bpf_trace.c | 25 +
>  2 files changed, 35 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index c5ec89732a8d..38ac3959cdf3 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -755,6 +755,14 @@ union bpf_attr {
>   * @addr: pointer to struct sockaddr to bind socket to
>   * @addr_len: length of sockaddr structure
>   * Return: 0 on success or negative error code
> + *
> + * u64 bpf_get_current_cgroup_ino(hierarchy, flags)
> + * Get the cgroup{1,2} inode of current task under the specified 
> hierarchy.
> + * @hierarchy: cgroup hierarchy

Not sure what value should be used to specify the hierarchy here.
A cgroup directory fd?

> + * @flags: reserved for future use
> + * Return:
> + *   == 0 error

looks like < 0 means error.

> + *> 0 inode of the cgroup
   >= 0 means good?
>   */
>  #define __BPF_FUNC_MAPPER(FN)  \
> FN(unspec), \
> @@ -821,7 +829,8 @@ union bpf_attr {
> FN(msg_apply_bytes),\
> FN(msg_cork_bytes), \
> FN(msg_pull_data),  \
> -   FN(bind),
> +   FN(bind),   \
> +   FN(get_current_cgroup_ino),
>
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>   * function eBPF program intends to call
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 56ba0f2a01db..9bf92a786639 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -524,6 +524,29 @@ static const struct bpf_func_proto 
> bpf_probe_read_str_proto = {
> .arg3_type  = ARG_ANYTHING,
>  };
>
> +BPF_CALL_2(bpf_get_current_cgroup_ino, u32, hierarchy, u64, flags)
> +{
> +   // TODO: pick the correct hierarchy instead of the mem controller
> +   struct cgroup *cgrp = task_cgroup(current, memory_cgrp_id);
> +
> +   if (unlikely(!cgrp))
> +   return -EINVAL;
> +   if (unlikely(hierarchy))
> +   return -EINVAL;
> +   if (unlikely(flags))
> +   return -EINVAL;
> +
> +   return cgrp->kn->id.ino;
> +}
> +
> +static const struct bpf_func_proto bpf_get_current_cgroup_ino_proto = {
> +   .func   = bpf_get_current_cgroup_ino,
> +   .gpl_only   = false,
> +   .ret_type   = RET_INTEGER,
> +   .arg1_type  = ARG_DONTCARE,
> +   .arg2_type  = ARG_DONTCARE,
> +};
> +
>  static const struct bpf_func_proto *
>  tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>  {
> @@ -564,6 +587,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct 
> bpf_prog *prog)
> return &bpf_get_prandom_u32_proto;
> case BPF_FUNC_probe_read_str:
> return &bpf_probe_read_str_proto;
> +   case BPF_FUNC_get_current_cgroup_ino:
> +   return &bpf_get_current_cgroup_ino_proto;
> default:
> return NULL;
> }
> --

Re: [PATCH bpf-next 3/4] samples: bpf: fix build after move to compiling full libbpf.a

2018-05-14 Thread Y Song
On Fri, May 11, 2018 at 5:17 PM, Jakub Kicinski
 wrote:
> There are many ways users may compile samples, some of them got
> broken by commit 5f9380572b4b ("samples: bpf: compile and link
> against full libbpf").  Improve path resolution and make libbpf
> building a dependency of source files to force its build.
>
> Samples should now again build with any of:
>  cd samples/bpf; make
>  make samples/bpf
>  make -C samples/bpf
>  cd samples/bpf; make O=builddir
>  make samples/bpf O=builddir
>  make -C samples/bpf O=builddir

I typically built samples/bpf/ this way:
export KBUILD_OUTPUT=/home/yhs/linux-bld
at linux source directory:
make defconfig
in /home/yhs/linux-bld,
make -j100 && make headers_install && make samples/bpf/

With this patch, the build for samples/bpf/ still failed.

-bash-4.2$ make samples/bpf/
  CHK include/config/kernel.release
  Using /data/users/yhs/work/net-next as source for kernel
  GEN ./Makefile
  CHK include/generated/uapi/linux/version.h
  CHK include/generated/utsrelease.h
  CHK include/generated/bounds.h
  CHK include/generated/timeconst.h
  CHK include/generated/asm-offsets.h
  CALL/data/users/yhs/work/net-next/scripts/checksyscalls.sh
  DESCEND  objtool
  CHK scripts/mod/devicetable-offsets.h
scripts/Makefile.host:106: target
`samples/bpf//data/users/yhs/work/net-next/samples/bpf/../../tools/lib/bpf/libbpf.a'
doesn't match the target pattern
make -C /data/users/yhs/work/net-next/samples/bpf/../../tools/lib/bpf/
O= srctree=/data/users/yhs/work/net-next/samples/bpf/../../

Auto-detecting system features:
...libelf: [ OFF ]
...   bpf: [ OFF ]

No libelf found
make[4]: *** [elfdep] Error 1
make[3]: *** 
[/data/users/yhs/work/net-next/samples/bpf/../../tools/lib/bpf/libbpf.a]
Error 2
make[2]: *** [samples/bpf/] Error 2
make[1]: *** [sub-make] Error 2
make: *** [__sub-make] Error 2
-bash-4.2$

I noticed that I might use the BPF_SAMPLES_PATH variable to tell the
build where the samples/bpf source is.
Below is what I did:
-bash-4.2$ BPF_SAMPLES_PATH=../../../net-next/samples/bpf make
samples/bpf/
  CHK include/config/kernel.release
  Using /data/users/yhs/work/net-next as source for kernel
  GEN ./Makefile
  CHK include/generated/uapi/linux/version.h
  CHK include/generated/utsrelease.h
  CHK include/generated/bounds.h
  CHK include/generated/timeconst.h
  CHK include/generated/asm-offsets.h
  CALL/data/users/yhs/work/net-next/scripts/checksyscalls.sh
  DESCEND  objtool
  CHK scripts/mod/devicetable-offsets.h
scripts/Makefile.host:106: target
`samples/bpf/../../../net-next/samples/bpf/../../tools/lib/bpf/libbpf.a'
doesn't match the target pattern
  CHK samples/bpf/syscall_nrs.h
  HOSTCC  samples/bpf/../../../net-next/samples/bpf/../../tools/lib/bpf/libbpf.a
gcc: fatal error: no input files
compilation terminated.
make[3]: *** 
[samples/bpf/../../../net-next/samples/bpf/../../tools/lib/bpf/libbpf.a]
Error 4
make[2]: *** [samples/bpf/] Error 2
make[1]: *** [sub-make] Error 2
make: *** [__sub-make] Error 2
-bash-4.2$

-bash-4.2$ pwd
/home/yhs/work
-bash-4.2$ ls
linux-bld  net-next
-bash-4.2$

>
> Fixes: 5f9380572b4b ("samples: bpf: compile and link against full libbpf")
> Reported-by: Björn Töpel 
> Signed-off-by: Jakub Kicinski 
> ---
>  samples/bpf/Makefile | 16 +++-
>  1 file changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
> index 9e255ca4059a..bed205ab1f81 100644
> --- a/samples/bpf/Makefile
> +++ b/samples/bpf/Makefile
> @@ -1,4 +1,8 @@
>  # SPDX-License-Identifier: GPL-2.0
> +
> +BPF_SAMPLES_PATH ?= $(abspath $(srctree)/$(src))
> +TOOLS_PATH := $(BPF_SAMPLES_PATH)/../../tools
> +
>  # List of programs to build
>  hostprogs-y := test_lru_dist
>  hostprogs-y += sock_example
> @@ -49,7 +53,8 @@ hostprogs-y += xdpsock
>  hostprogs-y += xdp_fwd
>
>  # Libbpf dependencies
> -LIBBPF := ../../tools/lib/bpf/libbpf.a
> +LIBBPF = $(TOOLS_PATH)/lib/bpf/libbpf.a
> +
>  CGROUP_HELPERS := ../../tools/testing/selftests/bpf/cgroup_helpers.o
>  TRACE_HELPERS := ../../tools/testing/selftests/bpf/trace_helpers.o
>
> @@ -233,15 +238,15 @@ CLANG_ARCH_ARGS = -target $(ARCH)
>  endif
>
>  # Trick to allow make to be run from this directory
> -all: $(LIBBPF)
> -   $(MAKE) -C ../../ $(CURDIR)/
> +all:
> +   $(MAKE) -C ../../ $(CURDIR)/ BPF_SAMPLES_PATH=$(CURDIR)
>
>  clean:
> $(MAKE) -C ../../ M=$(CURDIR) clean
> @rm -f *~
>
>  $(LIBBPF): FORCE
> -   $(MAKE) -C $(dir $@)
> +   $(MAKE) -C $(dir $@) O= srctree=$(BPF_SAMPLES_PATH)/../../
>
>  $(obj)/syscall_nrs.s:  $(src)/syscall_nrs.c
> $(call if_changed_dep,cc_s_c)
> @@ -272,7 +277,8 @@ verify_target_bpf: verify_cmds
> exit 2; \
> else true; fi
>
> -$(src)/*.c: verify_target_bpf
> +$(BPF_SAMPLES_PATH)/*.c: verify_target_bpf 

Re: [PATCH bpf-next 3/6] bpf, x64: clean up retpoline emission slightly

2018-05-14 Thread Y Song
On Sun, May 13, 2018 at 5:26 PM, Daniel Borkmann  wrote:
> Make the RETPOLINE_{RA,ED}X_BPF_JIT() a bit more readable by
> cleaning up the macro, aligning comments and spacing.
>
> Signed-off-by: Daniel Borkmann 
> ---
>  arch/x86/include/asm/nospec-branch.h | 29 ++---
>  1 file changed, 14 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> index 2cd344d..2f700a1d 100644
> --- a/arch/x86/include/asm/nospec-branch.h
> +++ b/arch/x86/include/asm/nospec-branch.h
> @@ -301,9 +301,9 @@ do {  
>   \
>   *jmp *%edx for x86_32
>   */
>  #ifdef CONFIG_RETPOLINE
> -#ifdef CONFIG_X86_64
> -# define RETPOLINE_RAX_BPF_JIT_SIZE17
> -# define RETPOLINE_RAX_BPF_JIT()   \
> +# ifdef CONFIG_X86_64
> +#  define RETPOLINE_RAX_BPF_JIT_SIZE   17
> +#  define RETPOLINE_RAX_BPF_JIT()  \
>  do {   \
> EMIT1_off32(0xE8, 7);/* callq do_rop */ \
> /* spec_trap: */\
> @@ -314,8 +314,8 @@ do {  
>   \
> EMIT4(0x48, 0x89, 0x04, 0x24); /* mov %rax,(%rsp) */\
> EMIT1(0xC3); /* retq */ \
>  } while (0)
> -#else
> -# define RETPOLINE_EDX_BPF_JIT()   \
> +# else /* !CONFIG_X86_64 */
> +#  define RETPOLINE_EDX_BPF_JIT()  \
>  do {   \
> EMIT1_off32(0xE8, 7);/* call do_rop */  \
> /* spec_trap: */\
> @@ -326,17 +326,16 @@ do {
>   \
> EMIT3(0x89, 0x14, 0x24); /* mov %edx,(%esp) */  \
> EMIT1(0xC3); /* ret */  \
>  } while (0)
> -#endif
> +# endif
>  #else /* !CONFIG_RETPOLINE */
> -
> -#ifdef CONFIG_X86_64
> -# define RETPOLINE_RAX_BPF_JIT_SIZE2
> -# define RETPOLINE_RAX_BPF_JIT()   \
> -   EMIT2(0xFF, 0xE0);   /* jmp *%rax */
> -#else
> -# define RETPOLINE_EDX_BPF_JIT()   \
> -   EMIT2(0xFF, 0xE2) /* jmp *%edx */
> -#endif
> +# ifdef CONFIG_X86_64
> +#  define RETPOLINE_RAX_BPF_JIT_SIZE   2
> +#  define RETPOLINE_RAX_BPF_JIT()  \
> +   EMIT2(0xFF, 0xE0);   /* jmp *%rax */
> +# else /* !CONFIG_X86_64 */
> +#  define RETPOLINE_EDX_BPF_JIT()  \
> +   EMIT2(0xFF, 0xE2)/* jmp *%edx */
> +# endif
>  #endif
>
>  #endif /* _ASM_X86_NOSPEC_BRANCH_H_ */
> --
> 2.9.5
>

Acked-by: Yonghong Song 


Re: [PATCH bpf-next] tools include uapi: Grab a copy of linux/erspan.h

2018-04-30 Thread Y Song
On Mon, Apr 30, 2018 at 7:33 AM, Daniel Borkmann  wrote:
> On 04/30/2018 04:26 PM, William Tu wrote:
>> Bring the erspan uapi header file so BPF tunnel helpers can use it.
>>
>> Fixes: 933a741e3b82 ("selftests/bpf: bpf tunnel test.")
>> Reported-by: Yonghong Song 
>> Signed-off-by: William Tu 
>
> Thanks for the patch, William! I also Cc'ed Yonghong here, so he has a
> chance to try it out.

Just tried it out. It works. Thanks for fixing!
Acked-by: Yonghong Song 


Re: [PATCH bpf-next] selftests/bpf: bpf tunnel test.

2018-04-30 Thread Y Song
Hi, William,

When compiling selftests/bpf on my CentOS 7 based system, I get
the following failure:

clang -I. -I./include/uapi -I../../../include/uapi
-Wno-compare-distinct-pointer-types \
 -O2 -target bpf -emit-llvm -c test_tunnel_kern.c -o - |  \
llc -march=bpf -mcpu=generic  -filetype=obj -o
/data/users/yhs/work/net-next/tools/testing/selftests/bpf/test_tunnel_kern.o
test_tunnel_kern.c:21:10: fatal error: 'linux/erspan.h' file not found
#include <linux/erspan.h>
 ^~~~
1 error generated.

Maybe I missed some packages to install?

Thanks!


On Thu, Apr 26, 2018 at 6:59 AM, William Tu  wrote:
> On Wed, Apr 25, 2018 at 8:01 AM, William Tu  wrote:
>> The patch migrates the original tests at samples/bpf/tcbpf2_kern.c
>> and samples/bpf/test_tunnel_bpf.sh to selftests.  There are a couple
>> changes from the original:
>> 1) add ipv6 vxlan, ipv6 geneve, ipv6 ipip tests
>> 2) simplify the original ipip tests (remove iperf tests)
>> 3) improve documentation
>> 4) use bpf_ntoh* and bpf_hton* api
>>
>> In summary, 'test_tunnel_kern.o' contains the following bpf program:
>>   GRE: gre_set_tunnel, gre_get_tunnel
>>   IP6GRE: ip6gretap_set_tunnel, ip6gretap_get_tunnel
>>   ERSPAN: erspan_set_tunnel, erspan_get_tunnel
>>   IP6ERSPAN: ip4ip6erspan_set_tunnel, ip4ip6erspan_get_tunnel
>>   VXLAN: vxlan_set_tunnel, vxlan_get_tunnel
>>   IP6VXLAN: ip6vxlan_set_tunnel, ip6vxlan_get_tunnel
>>   GENEVE: geneve_set_tunnel, geneve_get_tunnel
>>   IP6GENEVE: ip6geneve_set_tunnel, ip6geneve_get_tunnel
>>   IPIP: ipip_set_tunnel, ipip_get_tunnel
>>   IP6IP: ipip6_set_tunnel, ipip6_get_tunnel,
>>  ip6ip6_set_tunnel, ip6ip6_get_tunnel
>>
>> Signed-off-by: William Tu 
>> ---
>
> I made a mistake by removing the recent XFRM helper test cases.
> I will send v2.
>
> William


Re: [PATCH bpf-next v8 09/10] tools/bpf: add a test for bpf_get_stack with raw tracepoint prog

2018-04-28 Thread Y Song
On Sat, Apr 28, 2018 at 9:56 AM, Alexei Starovoitov
 wrote:
> On Sat, Apr 28, 2018 at 12:02:04AM -0700, Yonghong Song wrote:
>> The test attached a raw_tracepoint program to sched/sched_switch.
>> It tested to get stack for user space, kernel space and user
>> space with build_id request. It also tested to get user
>> and kernel stack into the same buffer with back-to-back
>> bpf_get_stack helper calls.
>>
>> Whenever the kernel stack is available, the user space
>> application will check to ensure that the kernel function
>> for raw_tracepoint ___bpf_prog_run is part of the stack.
>>
>> Signed-off-by: Yonghong Song 
> ...
>> +static int get_stack_print_output(void *data, int size)
>> +{
>> + bool good_kern_stack = false, good_user_stack = false;
>> + const char *expected_func = "___bpf_prog_run";
>
> so the test works with interpreter only?
> I guess that's ok for now, but needs to fixed for
> configs with CONFIG_BPF_JIT_ALWAYS_ON=y

I did not test with CONFIG_BPF_JIT_ALWAYS_ON=y.
I can send a followup patch for this if the patch set does not need a respin.


Re: possible deadlock in perf_event_detach_bpf_prog

2018-04-08 Thread Y Song
On Thu, Mar 29, 2018 at 2:18 PM, Daniel Borkmann  wrote:
> On 03/29/2018 11:04 PM, syzbot wrote:
>> Hello,
>>
>> syzbot hit the following crash on upstream commit
>> 3eb2ce825ea1ad89d20f7a3b5780df850e4be274 (Sun Mar 25 22:44:30 2018 +)
>> Linux 4.16-rc7
>> syzbot dashboard link: 
>> https://syzkaller.appspot.com/bug?extid=dc5ca0e4c9bfafaf2bae
>>
>> Unfortunately, I don't have any reproducer for this crash yet.
>> Raw console output: 
>> https://syzkaller.appspot.com/x/log.txt?id=4742532743299072
>> Kernel config: 
>> https://syzkaller.appspot.com/x/.config?id=-8440362230543204781
>> compiler: gcc (GCC) 7.1.1 20170620
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+dc5ca0e4c9bfafaf2...@syzkaller.appspotmail.com
>> It will help syzbot understand when the bug is fixed. See footer for details.
>> If you forward the report, please keep this part and the footer.
>>
>>
>> ==
>> WARNING: possible circular locking dependency detected
>> 4.16.0-rc7+ #3 Not tainted
>> --
>> syz-executor7/24531 is trying to acquire lock:
>>  (bpf_event_mutex){+.+.}, at: [<8a849b07>] 
>> perf_event_detach_bpf_prog+0x92/0x3d0 kernel/trace/bpf_trace.c:854
>>
>> but task is already holding lock:
>>  (&mm->mmap_sem){}, at: [<38768f87>] vm_mmap_pgoff+0x198/0x280 
>> mm/util.c:353
>>
>> which lock already depends on the new lock.
>>
>>
>> the existing dependency chain (in reverse order) is:
>>
>> -> #1 (&mm->mmap_sem){}:
>>__might_fault+0x13a/0x1d0 mm/memory.c:4571
>>_copy_to_user+0x2c/0xc0 lib/usercopy.c:25
>>copy_to_user include/linux/uaccess.h:155 [inline]
>>bpf_prog_array_copy_info+0xf2/0x1c0 kernel/bpf/core.c:1694
>>perf_event_query_prog_array+0x1c7/0x2c0 kernel/trace/bpf_trace.c:891
>
> Looks like we should move the two copy_to_user() outside of
> bpf_event_mutex section to avoid the deadlock.

This was introduced by one of my previous patches. The suggested fix above
makes sense. I will craft a patch and send it to the mailing list for the bpf
branch soon.
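The restructuring suggested above can be sketched roughly as follows (kernel-style pseudocode only, not the actual patch; `query_prog_ids()` is a hypothetical helper): snapshot the needed data into a kernel buffer while holding bpf_event_mutex, drop the mutex, and only then call copy_to_user(), which may fault and take mm->mmap_sem.

```c
/* Pseudocode sketch -- hypothetical helper names, not the real fix.
 * copy_to_user() can fault and acquire current->mm->mmap_sem, so it
 * must not run while bpf_event_mutex is held; otherwise we recreate
 * the mmap_sem -> bpf_event_mutex -> mmap_sem cycle lockdep reported.
 */
mutex_lock(&bpf_event_mutex);
cnt = query_prog_ids(event, ids, ids_len);   /* fill kernel buffer */
mutex_unlock(&bpf_event_mutex);

if (copy_to_user(uquery->ids, ids, cnt * sizeof(u32)))
	return -EFAULT;                      /* user copy, lock dropped */
```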

>
>>_perf_ioctl kernel/events/core.c:4750 [inline]
>>perf_ioctl+0x3e1/0x1480 kernel/events/core.c:4770
>>vfs_ioctl fs/ioctl.c:46 [inline]
>>do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686
>>SYSC_ioctl fs/ioctl.c:701 [inline]
>>SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
>>do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
>>entry_SYSCALL_64_after_hwframe+0x42/0xb7
>>
>> -> #0 (bpf_event_mutex){+.+.}:
>>lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3920
>>__mutex_lock_common kernel/locking/mutex.c:756 [inline]
>>__mutex_lock+0x16f/0x1a80 kernel/locking/mutex.c:893
>>mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
>>perf_event_detach_bpf_prog+0x92/0x3d0 kernel/trace/bpf_trace.c:854
>>perf_event_free_bpf_prog kernel/events/core.c:8147 [inline]
>>_free_event+0xbdb/0x10f0 kernel/events/core.c:4116
>>put_event+0x24/0x30 kernel/events/core.c:4204
>>perf_mmap_close+0x60d/0x1010 kernel/events/core.c:5172
>>remove_vma+0xb4/0x1b0 mm/mmap.c:172
>>remove_vma_list mm/mmap.c:2490 [inline]
>>do_munmap+0x82a/0xdf0 mm/mmap.c:2731
>>mmap_region+0x59e/0x15a0 mm/mmap.c:1646
>>do_mmap+0x6c0/0xe00 mm/mmap.c:1483
>>do_mmap_pgoff include/linux/mm.h:2223 [inline]
>>vm_mmap_pgoff+0x1de/0x280 mm/util.c:355
>>SYSC_mmap_pgoff mm/mmap.c:1533 [inline]
>>SyS_mmap_pgoff+0x462/0x5f0 mm/mmap.c:1491
>>SYSC_mmap arch/x86/kernel/sys_x86_64.c:100 [inline]
>>SyS_mmap+0x16/0x20 arch/x86/kernel/sys_x86_64.c:91
>>do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
>>entry_SYSCALL_64_after_hwframe+0x42/0xb7
>>
>> other info that might help us debug this:
>>
>>  Possible unsafe locking scenario:
>>
>>CPU0CPU1
>>
>>   lock(&mm->mmap_sem);
>>lock(bpf_event_mutex);
>>lock(&mm->mmap_sem);
>>   lock(bpf_event_mutex);
>>
>>  *** DEADLOCK ***
>>
>> 1 lock held by syz-executor7/24531:
>>  #0:  (&mm->mmap_sem){}, at: [<38768f87>] 
>> vm_mmap_pgoff+0x198/0x280 mm/util.c:353
>>
>> stack backtrace:
>> CPU: 0 PID: 24531 Comm: syz-executor7 Not tainted 4.16.0-rc7+ #3
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
>> Google 01/01/2011
>> Call Trace:
>>  __dump_stack lib/dump_stack.c:17 [inline]
>>  dump_stack+0x194/0x24d lib/dump_stack.c:53
>>  print_circular_bug.isra.38+0x2cd/0x2dc kernel/locking/lockdep.c:1223
>>  check_prev_add kernel/locking/lockdep.c:1863 [inline]
>>  check_prevs_add kernel/locking/lockdep.c:1976 [inline]
>>  validate_chain kernel/locking/lockdep.c:2417 [inline]
>>  __lock_acquire+0x30a8/0x3e00 

Re: [PATCH bpf] bpf: do not modify min/max bounds on scalars with constant values

2018-01-15 Thread Y Song
On Mon, Jan 15, 2018 at 2:40 AM, Daniel Borkmann <dan...@iogearbox.net> wrote:
> On 01/15/2018 07:38 AM, Y Song wrote:
>> On Fri, Jan 12, 2018 at 11:23 AM, Daniel Borkmann <dan...@iogearbox.net> 
>> wrote:
> [...]
>>>
>>> I've been thinking to additionally reject arithmetic on ctx
>>> pointer in adjust_ptr_min_max_vals() right upfront as well
>>> since we reject actual access in such case later on anyway,
>>> but there's a use case in tracing (in bcc) in combination
>>> with passing such ctx to bpf_probe_read(), so we cannot do
>>> that part.
>>
>> There is a reason why bcc does this. For example, suppose that we want to
>> trace kernel tracepoint, sched_process_exec,
>>
>> TRACE_EVENT(sched_process_exec,
>>
>> TP_PROTO(struct task_struct *p, pid_t old_pid,
>>  struct linux_binprm *bprm),
>>
>> TP_ARGS(p, old_pid, bprm),
>>
>> TP_STRUCT__entry(
>> __string(   filename,   bprm->filename  )
>> __field(pid_t,  pid )
>> __field(pid_t,  old_pid )
>> ),
>>
>> TP_fast_assign(
>> __assign_str(filename, bprm->filename);
>> __entry->pid= p->pid;
>> __entry->old_pid= old_pid;
>> ),
>>
>> TP_printk("filename=%s pid=%d old_pid=%d", __get_str(filename),
>>   __entry->pid, __entry->old_pid)
>> );
>>
>> Eventually structure at
>> /sys/kernel/debug/tracing/event/sched/sched_process_exec/format:
>> ..
>> field:__data_loc char[] filename;   offset:8;
>> size:4; signed:1;
>> field:pid_t pid;offset:12;  size:4; signed:1;
>> field:pid_t old_pid;offset:16;  size:4; signed:1;
>>
>> and "data_loc filename" is the offset in the structure where
>> "filename" is stored.
>>
>> Therefore, in bcc, the access sequence is:
>> offset = args->filename;  /* field __data_loc filename */
>> bpf_probe_read(&filename, len, (char *)args + offset);
>>
>> For this kind of dynamic array in the tracepoint, the offset to access
>> certain field in ctx will be unknown at verification time.
>
> Right, that is exactly what I said in above paragraph.
>
>> So I suggest to remove the above paragraph regarding to potential ctx+offset
>> rejection.
>
> I'm confused, I mentioned we cannot reject exactly because of this
> use-case, I thought it's worth having it in the log for future
> reference so we don't forget about it since it's not too obvious.

I thought it was irrelevant to the main subject of the patch (e.g., dealing with
constant min/max, etc.). But I agree that having it in the log for future
reference is better since it is not too obvious.

Could you then describe the "use case in bcc" part in a little more detail?
Otherwise, it may not be really clear to some people.

Sorry for causing your confusion!

Thanks,

>
> Cheers,
> Daniel


Re: [PATCH bpf] bpf: do not modify min/max bounds on scalars with constant values

2018-01-14 Thread Y Song
On Fri, Jan 12, 2018 at 11:23 AM, Daniel Borkmann  wrote:
> syzkaller generated a BPF proglet and triggered a warning with
> the following:
>
>   0: (b7) r0 = 0
>   1: (d5) if r0 s<= 0x0 goto pc+0
>R0=inv0 R1=ctx(id=0,off=0,imm=0) R10=fp0
>   2: (1f) r0 -= r1
>R0=inv0 R1=ctx(id=0,off=0,imm=0) R10=fp0
>   verifier internal error: known but bad sbounds
>
> What happens is that in the first insn, r0's min/max value are
> both 0 due to the immediate assignment, later in the jsle test
> the bounds are updated for the min value in the false path,
> meaning, they yield smin_val = 1 smax_val = 0, and when ctx
> pointer is subtracted from r0, verifier bails out with the
> internal error and throwing a WARN since smin_val != smax_val
> for the known constant.
>
> Fix is to not update any bounds of the register holding the
> constant, thus in reg_set_min_max() and reg_set_min_max_inv()
> we return right away, similarly as in the case when the dst
> register holds a pointer value. Reason for doing so is rather
> straight forward: when we have a register holding a constant
> as dst, then {s,u}min_val == {s,u}max_val, thus it cannot get
> any more specific in terms of upper/lower bounds than this.
>
> In reg_set_min_max() and reg_set_min_max_inv() it's fine to
> only check false_reg similarly as with __is_pointer_value()
> check since at this point in time, false_reg and true_reg
> both hold the same state, so we only need to pick one. This
> fixes the bug and properly rejects the program with an error
> of 'R0 tried to subtract pointer from scalar'.
>
> I've been thinking to additionally reject arithmetic on ctx
> pointer in adjust_ptr_min_max_vals() right upfront as well
> since we reject actual access in such case later on anyway,
> but there's a use case in tracing (in bcc) in combination
> with passing such ctx to bpf_probe_read(), so we cannot do
> that part.

There is a reason why bcc does this. For example, suppose that we want to
trace kernel tracepoint, sched_process_exec,

TRACE_EVENT(sched_process_exec,

TP_PROTO(struct task_struct *p, pid_t old_pid,
 struct linux_binprm *bprm),

TP_ARGS(p, old_pid, bprm),

TP_STRUCT__entry(
__string(   filename,   bprm->filename  )
__field(pid_t,  pid )
__field(pid_t,  old_pid )
),

TP_fast_assign(
__assign_str(filename, bprm->filename);
__entry->pid= p->pid;
__entry->old_pid= old_pid;
),

TP_printk("filename=%s pid=%d old_pid=%d", __get_str(filename),
  __entry->pid, __entry->old_pid)
);

Eventually structure at
/sys/kernel/debug/tracing/event/sched/sched_process_exec/format:
..
field:__data_loc char[] filename;   offset:8;
size:4; signed:1;
field:pid_t pid;offset:12;  size:4; signed:1;
field:pid_t old_pid;offset:16;  size:4; signed:1;

and "data_loc filename" is the offset in the structure where
"filename" is stored.

Therefore, in bcc, the access sequence is:
offset = args->filename;  /* field __data_loc filename */
bpf_probe_read(&filename, len, (char *)args + offset);

For this kind of dynamic array in the tracepoint, the offset used to access
a certain field in ctx will be unknown at verification time.

So I suggest removing the above paragraph regarding the potential ctx+offset
rejection.

>
> Reported-by: syzbot+6d362cadd45dc0a12...@syzkaller.appspotmail.com
> Signed-off-by: Daniel Borkmann 
> ---
>  kernel/bpf/verifier.c   | 11 
>  tools/testing/selftests/bpf/test_verifier.c | 95 
> +
>  2 files changed, 106 insertions(+)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index b414d6b..6bf1275 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -2617,6 +2617,14 @@ static void reg_set_min_max(struct bpf_reg_state 
> *true_reg,
>  */
> if (__is_pointer_value(false, false_reg))
> return;
> +   /* Same goes for constant values. They have {s,u}min == {s,u}max,
> +* it cannot get narrower than this. Same here, at the branch
> +* point false_reg and true_reg have the same type, so we only
> +* check false_reg here as well.
> +*/
> +   if (false_reg->type == SCALAR_VALUE &&
> +   tnum_is_const(false_reg->var_off))
> +   return;
>
> switch (opcode) {
> case BPF_JEQ:
> @@ -2689,6 +2697,9 @@ static void reg_set_min_max_inv(struct bpf_reg_state 
> *true_reg,
>  {
> if (__is_pointer_value(false, false_reg))
> return;
> +   if (false_reg->type == SCALAR_VALUE &&
> +   tnum_is_const(false_reg->var_off))
> +   return;
>
> switch (opcode) {
> case BPF_JEQ:
> diff --git 

Re: [RFC PATCH] bpf: Add helpers to read useful task_struct members

2017-11-07 Thread Y Song
On Tue, Nov 7, 2017 at 4:29 PM, Atish Patra <atish.pa...@oracle.com> wrote:
> On 11/07/2017 04:42 PM, Y Song wrote:
>>
>> On Tue, Nov 7, 2017 at 2:04 PM, Alexei Starovoitov <a...@fb.com> wrote:
>>>
>>> On 11/8/17 6:47 AM, Y Song wrote:
>>>>
>>>>
>>>> On Tue, Nov 7, 2017 at 1:39 PM, Alexei Starovoitov <a...@fb.com> wrote:
>>>>>
>>>>>
>>>>> On 11/8/17 6:14 AM, Y Song wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Nov 7, 2017 at 12:37 AM, Naveen N. Rao
>>>>>> <naveen.n@linux.vnet.ibm.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Alexei Starovoitov wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 11/7/17 12:55 AM, Naveen N. Rao wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I thought such struct shouldn't change layout.
>>>>>>>>>> If it is we need to fix include/linux/compiler-clang.h to do that
>>>>>>>>>> anon struct as well.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> We considered that, but it looked to be very dependent on the
>>>>>>>>> version
>>>>>>>>> of
>>>>>>>>> gcc used to build the kernel. But, this may be a simpler approach
>>>>>>>>> for
>>>>>>>>> the shorter term.
>>>>>>>>>
>>>>>>>>
>>>>>>>> why it would depend on version of gcc?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> From what I can see, randomized_struct_fields_start is defined only
>>>>>>> for
>>>>>>> gcc
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> = 4.6. For older versions, it does not get mapped to an anonymous
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> structure. We may not care for older gcc versions, but..
>>>>>>>
>>>>>>> The other issue was that __randomize_layout maps to __designated_init
>>>>>>> when
>>>>>>> randstruct plugin is not enabled, which is in turn an attribute on
>>>>>>> gcc
>>>>>>>>
>>>>>>>> =
>>>>>>>
>>>>>>> v5.1, but not otherwise.
>>>>>>>
>>>>>>>> We just need this, no?
>>>>>>>>
>>>>>>>> diff --git a/include/linux/compiler-clang.h
>>>>>>>> b/include/linux/compiler-clang.h
>>>>>>>> index de179993e039..4e29ab6187cb 100644
>>>>>>>> --- a/include/linux/compiler-clang.h
>>>>>>>> +++ b/include/linux/compiler-clang.h
>>>>>>>> @@ -15,3 +15,6 @@
>>>>>>>>* with any version that can compile the kernel
>>>>>>>>*/
>>>>>>>>   #define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix),
>>>>>>>> __COUNTER__)
>>>>>>>> +
>>>>>>>> +#define randomized_struct_fields_start struct {
>>>>>>>> +#define randomized_struct_fields_end   };
>>>>>>>>
>>>>>>>> since offsets are mandated by C standard.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Yes, this is what we're testing with and is probably sufficient for
>>>>>>> our
>>>>>>> purposes.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Just tested this with bcc. bcc actually complains. the rewriter
>>>>>> is not able to rewrite prev->pid where prev is "struct task_struct
>>>>>> *prev".
>>>>>> I will change bcc rewriter to see whether the field value is correct
>>>>>> or
>>>>>> not.
>>>>>>
>>>>>> Not sure my understanding is correct or not, but I am afraid that
>>>>>> the above approach for clang compiler change may not work.
>>>>>> If clang calculates the field offset based on header file, the offset
>>>>>> may not be the same as kernel one
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> why is that?
>>>>> When randomization is off both gcc and clang must generate the same
>>>>> offsets, since it's C standard.
>>>>
>>>>
>>>>
>>>> The patch changed compiler-clang.h, so gcc still does randomization.
>>>
>>>
>>>
>>> gcc_plugins are off by default and randomization will not be
>>> turned on for any sane distro or datacenter that cares about
>>> performance and stability.
>>> So imo above compiler-clang.h patch together with bcc fix would
>>> be enough.
>>
>>
>> Agree that in the short term the suggested fix should be enough.
>> Long term, a distro could become "insane" someday :-)
>>
>
> Yup it works with clang compiler & bcc hack. Thanks :).
> I verified it with task_switch.py & runqlat.py.
>
> Are you going to push the bcc hacks to github repo ?

I will need a proper implementation rather than a hack. Yes, once done, I will
push it into the bcc repo. Should be done soon.

>
> Regards,
> Atish


Re: [RFC PATCH] bpf: Add helpers to read useful task_struct members

2017-11-07 Thread Y Song
On Tue, Nov 7, 2017 at 2:04 PM, Alexei Starovoitov <a...@fb.com> wrote:
> On 11/8/17 6:47 AM, Y Song wrote:
>>
>> On Tue, Nov 7, 2017 at 1:39 PM, Alexei Starovoitov <a...@fb.com> wrote:
>>>
>>> On 11/8/17 6:14 AM, Y Song wrote:
>>>>
>>>>
>>>> On Tue, Nov 7, 2017 at 12:37 AM, Naveen N. Rao
>>>> <naveen.n@linux.vnet.ibm.com> wrote:
>>>>>
>>>>>
>>>>> Alexei Starovoitov wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 11/7/17 12:55 AM, Naveen N. Rao wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I thought such struct shouldn't change layout.
>>>>>>>> If it is we need to fix include/linux/compiler-clang.h to do that
>>>>>>>> anon struct as well.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> We considered that, but it looked to be very dependent on the version
>>>>>>> of
>>>>>>> gcc used to build the kernel. But, this may be a simpler approach for
>>>>>>> the shorter term.
>>>>>>>
>>>>>>
>>>>>> why it would depend on version of gcc?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> From what I can see, randomized_struct_fields_start is defined only for
>>>>> gcc
>>>>>>
>>>>>>
>>>>>> = 4.6. For older versions, it does not get mapped to an anonymous
>>>>>
>>>>>
>>>>> structure. We may not care for older gcc versions, but..
>>>>>
>>>>> The other issue was that __randomize_layout maps to __designated_init
>>>>> when
>>>>> randstruct plugin is not enabled, which is in turn an attribute on gcc
>>>>> >=
>>>>> v5.1, but not otherwise.
>>>>>
>>>>>> We just need this, no?
>>>>>>
>>>>>> diff --git a/include/linux/compiler-clang.h
>>>>>> b/include/linux/compiler-clang.h
>>>>>> index de179993e039..4e29ab6187cb 100644
>>>>>> --- a/include/linux/compiler-clang.h
>>>>>> +++ b/include/linux/compiler-clang.h
>>>>>> @@ -15,3 +15,6 @@
>>>>>>* with any version that can compile the kernel
>>>>>>*/
>>>>>>   #define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix),
>>>>>> __COUNTER__)
>>>>>> +
>>>>>> +#define randomized_struct_fields_start struct {
>>>>>> +#define randomized_struct_fields_end   };
>>>>>>
>>>>>> since offsets are mandated by C standard.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Yes, this is what we're testing with and is probably sufficient for our
>>>>> purposes.
>>>>
>>>>
>>>>
>>>> Just tested this with bcc. bcc actually complains. the rewriter
>>>> is not able to rewrite prev->pid where prev is "struct task_struct
>>>> *prev".
>>>> I will change bcc rewriter to see whether the field value is correct or
>>>> not.
>>>>
>>>> Not sure my understanding is correct or not, but I am afraid that
>>>> the above approach for clang compiler change may not work.
>>>> If clang calculates the field offset based on header file, the offset
>>>> may not be the same as kernel one
>>>
>>>
>>>
>>> why is that?
>>> When randomization is off both gcc and clang must generate the same
>>> offsets, since it's C standard.
>>
>>
>> The patch changed compiler-clang.h, so gcc still does randomization.
>
>
> gcc_plugins are off by default and randomization will not be
> turned on for any sane distro or datacenter that cares about
> performance and stability.
> So imo above compiler-clang.h patch together with bcc fix would
> be enough.

Agree that in the short term the suggested fix should be enough.
Long term, a distro could become "insane" someday :-)


Re: [RFC PATCH] bpf: Add helpers to read useful task_struct members

2017-11-07 Thread Y Song
On Tue, Nov 7, 2017 at 1:39 PM, Alexei Starovoitov <a...@fb.com> wrote:
> On 11/8/17 6:14 AM, Y Song wrote:
>>
>> On Tue, Nov 7, 2017 at 12:37 AM, Naveen N. Rao
>> <naveen.n@linux.vnet.ibm.com> wrote:
>>>
>>> Alexei Starovoitov wrote:
>>>>
>>>>
>>>> On 11/7/17 12:55 AM, Naveen N. Rao wrote:
>>>>>>
>>>>>>
>>>>>> I thought such struct shouldn't change layout.
>>>>>> If it is we need to fix include/linux/compiler-clang.h to do that
>>>>>> anon struct as well.
>>>>>
>>>>>
>>>>>
>>>>> We considered that, but it looked to be very dependent on the version
>>>>> of
>>>>> gcc used to build the kernel. But, this may be a simpler approach for
>>>>> the shorter term.
>>>>>
>>>>
>>>> why it would depend on version of gcc?
>>>
>>>
>>>
>>> From what I can see, randomized_struct_fields_start is defined only for
>>> gcc
>>>>
>>>> = 4.6. For older versions, it does not get mapped to an anonymous
>>>
>>> structure. We may not care for older gcc versions, but..
>>>
>>> The other issue was that __randomize_layout maps to __designated_init
>>> when
>>> randstruct plugin is not enabled, which is in turn an attribute on gcc >=
>>> v5.1, but not otherwise.
>>>
>>>> We just need this, no?
>>>>
>>>> diff --git a/include/linux/compiler-clang.h
>>>> b/include/linux/compiler-clang.h
>>>> index de179993e039..4e29ab6187cb 100644
>>>> --- a/include/linux/compiler-clang.h
>>>> +++ b/include/linux/compiler-clang.h
>>>> @@ -15,3 +15,6 @@
>>>>* with any version that can compile the kernel
>>>>*/
>>>>   #define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix),
>>>> __COUNTER__)
>>>> +
>>>> +#define randomized_struct_fields_start struct {
>>>> +#define randomized_struct_fields_end   };
>>>>
>>>> since offsets are mandated by C standard.
>>>
>>>
>>>
>>> Yes, this is what we're testing with and is probably sufficient for our
>>> purposes.
>>
>>
>> Just tested this with bcc. bcc actually complains. the rewriter
>> is not able to rewrite prev->pid where prev is "struct task_struct *prev".
>> I will change bcc rewriter to see whether the field value is correct or
>> not.
>>
>> Not sure my understanding is correct or not, but I am afraid that
>> the above approach for clang compiler change may not work.
>> If clang calculates the field offset based on header file, the offset
>> may not be the same as kernel one
>
>
> why is that?
> When randomization is off both gcc and clang must generate the same
> offsets, since it's C standard.

The patch changed compiler-clang.h, so gcc still does randomization.

>
> bcc rewriter issue is odd. I suspect it was broken from day one.
> Meaning that bcc didn't support poking into anonymous union and structs.

This seems right.

>
>> I verified that the dwarf info with randomized structure config does not
>> match randomized structure member offset. Specifically, I tried
>> linux/proc_ns.h struct proc_ns_operations,
>> dwarf says:
>>   field name: offset 0
>>   field real_ns_name: offset 8
>> But if you print out the real offset at runtime, you get 40 and 16
>> respectively.
>
>
> thanks for confirming. It means that gcc randomization plugin is broken
> and has to be fixed with regard to adjusting debug info while
> randomizing the fields.
>


Re: [RFC PATCH] bpf: Add helpers to read useful task_struct members

2017-11-07 Thread Y Song
On Tue, Nov 7, 2017 at 1:31 PM, Atish Patra <atish.pa...@oracle.com> wrote:
> On 11/07/2017 03:14 PM, Y Song wrote:
>>
>> On Tue, Nov 7, 2017 at 12:37 AM, Naveen N. Rao
>> <naveen.n@linux.vnet.ibm.com> wrote:
>>>
>>> Alexei Starovoitov wrote:
>>>>
>>>>
>>>> On 11/7/17 12:55 AM, Naveen N. Rao wrote:
>>>>>>
>>>>>>
>>>>>> I thought such struct shouldn't change layout.
>>>>>> If it is we need to fix include/linux/compiler-clang.h to do that
>>>>>> anon struct as well.
>>>>>
>>>>>
>>>>>
>>>>> We considered that, but it looked to be very dependent on the version
>>>>> of
>>>>> gcc used to build the kernel. But, this may be a simpler approach for
>>>>> the shorter term.
>>>>>
>>>>
>>>> why it would depend on version of gcc?
>>>
>>>
>>>
>>> From what I can see, randomized_struct_fields_start is defined only for
>>> gcc
>>>>
>>>> = 4.6. For older versions, it does not get mapped to an anonymous
>>>
>>> structure. We may not care for older gcc versions, but..
>>>
>>> The other issue was that __randomize_layout maps to __designated_init
>>> when
>>> randstruct plugin is not enabled, which is in turn an attribute on gcc >=
>>> v5.1, but not otherwise.
>>>
>>>> We just need this, no?
>>>>
>>>> diff --git a/include/linux/compiler-clang.h
>>>> b/include/linux/compiler-clang.h
>>>> index de179993e039..4e29ab6187cb 100644
>>>> --- a/include/linux/compiler-clang.h
>>>> +++ b/include/linux/compiler-clang.h
>>>> @@ -15,3 +15,6 @@
>>>>* with any version that can compile the kernel
>>>>*/
>>>>   #define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix),
>>>> __COUNTER__)
>>>> +
>>>> +#define randomized_struct_fields_start struct {
>>>> +#define randomized_struct_fields_end   };
>>>>
>>>> since offsets are mandated by C standard.
>>>
>>>
>>>
>>> Yes, this is what we're testing with and is probably sufficient for our
>>> purposes.
>>
>>
>> Just tested this with bcc. bcc actually complains. the rewriter
>> is not able to rewrite prev->pid where prev is "struct task_struct *prev".
>> I will change bcc rewriter to see whether the field value is correct or
>> not.

Just verified in bcc with the following hack:
--- a/examples/tracing/task_switch.py
+++ b/examples/tracing/task_switch.py
@@ -20,7 +20,8 @@ int count_sched(struct pt_regs *ctx, struct task_struct *prev) {
   u64 zero = 0, *val;

   key.curr_pid = bpf_get_current_pid_tgid();
-  key.prev_pid = prev->pid;
+  bpf_probe_read(&key.prev_pid, 4, &prev->pid);
+  // key.prev_pid = prev->pid;

   val = stats.lookup_or_init(&key, &zero);
   (*val)++;
diff --git a/src/cc/frontends/clang/b_frontend_action.cc b/src/cc/frontends/clang/b_frontend_action.cc
index c11d89e..df48115 100644
--- a/src/cc/frontends/clang/b_frontend_action.cc
+++ b/src/cc/frontends/clang/b_frontend_action.cc
@@ -827,6 +827,7 @@ void BTypeConsumer::HandleTranslationUnit(ASTContext &Context) {
*/
   for (it = DC->decls_begin(); it != DC->decls_end(); it++) {
 Decl *D = *it;
+#if 0
 if (FunctionDecl *F = dyn_cast<FunctionDecl>(D)) {
   if (fe_.is_rewritable_ext_func(F)) {
 for (auto arg : F->parameters()) {
@@ -836,6 +837,7 @@ void BTypeConsumer::HandleTranslationUnit(ASTContext &Context) {
 probe_visitor_.TraverseDecl(D);
   }
 }
+#endif

Basically, this explicitly uses bpf_probe_read instead of letting the rewriter
change the access, and disables the probe-rewriting part in the frontend.

I confirmed that the value of prev->pid read this way is always 0 (wrong).

The linux patch I have is:
diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
index 54dfef70a072..5d9609ea8e30 100644
--- a/include/linux/compiler-clang.h
+++ b/include/linux/compiler-clang.h
@@ -16,3 +16,6 @@
  * with any version that can compile the kernel
  */
 #define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __COUNTER__)
+
+#define randomized_struct_fields_start struct {
+#define randomized_struct_fields_end   };

Therefore, having the bpf program access randomized structures
through either BTF or fixed dwarf info may be the best option.
I know dwarf info is too big, so the smaller BTF is more desirable.

>>
>> Not sure my understanding is correct or not, but I am afraid that
>> the above approach for clang compiler change may not work.
>> If clang calculates the field offset based on header file, the offset
>> may not be the same as the kernel one.

Re: [RFC PATCH] bpf: Add helpers to read useful task_struct members

2017-11-07 Thread Y Song
On Tue, Nov 7, 2017 at 12:37 AM, Naveen N. Rao wrote:
> Alexei Starovoitov wrote:
>>
>> On 11/7/17 12:55 AM, Naveen N. Rao wrote:

 I thought such struct shouldn't change layout.
 If it is we need to fix include/linux/compiler-clang.h to do that
 anon struct as well.
>>>
>>>
>>> We considered that, but it looked to be very dependent on the version of
>>> gcc used to build the kernel. But, this may be a simpler approach for
>>> the shorter term.
>>>
>>
>> why it would depend on version of gcc?
>
>
> From what I can see, randomized_struct_fields_start is defined only for gcc
>>= 4.6. For older versions, it does not get mapped to an anonymous
> structure. We may not care for older gcc versions, but..
>
> The other issue was that __randomize_layout maps to __designated_init when
> randstruct plugin is not enabled, which is in turn an attribute on gcc >=
> v5.1, but not otherwise.
>
>> We just need this, no?
>>
>> diff --git a/include/linux/compiler-clang.h
>> b/include/linux/compiler-clang.h
>> index de179993e039..4e29ab6187cb 100644
>> --- a/include/linux/compiler-clang.h
>> +++ b/include/linux/compiler-clang.h
>> @@ -15,3 +15,6 @@
>>* with any version that can compile the kernel
>>*/
>>   #define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix),
>> __COUNTER__)
>> +
>> +#define randomized_struct_fields_start struct {
>> +#define randomized_struct_fields_end   };
>>
>> since offsets are mandated by C standard.
>
>
> Yes, this is what we're testing with and is probably sufficient for our
> purposes.

Just tested this with bcc. bcc actually complains: the rewriter
is not able to rewrite prev->pid where prev is "struct task_struct *prev".
I will change the bcc rewriter to see whether the field value is correct or not.

Not sure my understanding is correct or not, but I am afraid that
the above approach for clang compiler change may not work.
If clang calculates the field offset based on header file, the offset
may not be the same as the kernel one.

I verified that the dwarf info with randomized structure config does not
match randomized structure member offset. Specifically, I tried
linux/proc_ns.h struct proc_ns_operations,
dwarf says:
  field name: offset 0
  field real_ns_name: offset 8
But if you print out the real offset at runtime, you get 40 and 16 respectively.

>
> - Naveen
>
>


Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions

2017-09-22 Thread Y Song
On Fri, Sep 22, 2017 at 9:23 AM, Edward Cree  wrote:
> On 22/09/17 16:16, Alexei Starovoitov wrote:
>> looks like we're converging on
>> "be16/be32/be64/le16/le32/le64 #register" for BPF_END.
>> I guess it can live with that. I would prefer more C like syntax
>> to match the rest, but llvm parsing point is a strong one.
> Yep, agreed.  I'll post a v2 once we've settled BPF_NEG.
>> For BPG_NEG I prefer to do it in C syntax like interpreter does:
>> ALU_NEG:
>> DST = (u32) -DST;
>> ALU64_NEG:
>> DST = -DST;
>> Yonghong, does it mean that asmparser will equally suffer?
> Correction to my earlier statements: verifier will currently disassemble
>  neg as:
> (87) r0 neg 0
> (84) (u32) r0 neg (u32) 0
>  because it pretends 'neg' is a compound-assignment operator like +=.
> The analogy with be16 and friends would be to use
> neg64 r0
> neg32 r0
>  whereas the analogy with everything else would be
> r0 = -r0
> r0 = (u32) -r0
>  as Alexei says.
> I'm happy to go with Alexei's version if it doesn't cause problems for llvm.

I got some time to do some prototyping in llvm and it looks like
I am able to resolve the issue, and we are able to use more C-like
syntax. That is:
for bswap:
 r1 = (be16) (u16) r1
 or
 r1 = (be16) r1
 or
 r1 = be16 r1
for neg:
 r0 = -r0
 (for 32bit support, llvm may output "w0 = -w0" in the future. But
since it is not
  enabled yet, you can continue to output "r0 = (u32) -r0".)

Not sure which syntax is best for bswap. The "r1 = (be16) (u16) r1" is most
explicit in its intention.

Attaching my llvm patch as well and cc'ing Jiong and Jakub so they can
see my implementation and the related discussion here. (In this patch,
I did not implement bswap for little endian yet.) Maybe they can provide
additional comments.


0001-bpf-add-support-for-neg-insn-and-change-format-of-bs.patch
Description: Binary data


Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions

2017-09-22 Thread Y Song
On Fri, Sep 22, 2017 at 7:11 AM, Y Song <ys114...@gmail.com> wrote:
> On Fri, Sep 22, 2017 at 6:46 AM, Edward Cree <ec...@solarflare.com> wrote:
>> On 22/09/17 00:11, Y Song wrote:
>>> On Thu, Sep 21, 2017 at 12:58 PM, Edward Cree <ec...@solarflare.com> wrote:
>>>> On 21/09/17 20:44, Alexei Starovoitov wrote:
>>>>> On Thu, Sep 21, 2017 at 09:29:33PM +0200, Daniel Borkmann wrote:
>>>>>> More intuitive, but agree on the from_be/le. Maybe we should
>>>>>> just drop the "to_" prefix altogether, and leave the rest as is since
>>>>>> it's not surrounded by braces, it's also not a cast but rather an op.
>>>> That works for me.
>>>>> 'be16 r4' is ambiguous regarding upper bits.
>>>>>
>>>>> what about my earlier suggestion:
>>>>> r4 = (be16) (u16) r4
>>>>> r4 = (le64) (u64) r4
>>>>>
>>>>> It will be pretty clear what instruction is doing (that upper bits become 
>>>>> zero).
>>>> Trouble with that is that's very *not* what C will do with those casts
>>>>  and it doesn't really capture the bidirectional/symmetry thing.  The
>>>>  closest I could see with that is something like `r4 = (be16/u16) r4`,
>>>>  but that's quite an ugly mongrel.
>>>> I think Daniel's idea of `be16`, `le32` etc one-arg opcodes is the
>>>>  cleanest and clearest.  Should it be
>>>> r4 = be16 r4
>>>>  or just
>>>> be16 r4
>>>> ?  Personally I incline towards the latter, but admit it doesn't really
>>>>  match the syntax of other opcodes.
>>> I did some quick prototyping in llvm to make sure we have a syntax
>>> llvm is happy. Apparently, llvm does not like the syntax
>>>    r4 = be16 r4   or   r4 = (be16) (u16) r4.
>>>
>>> In llvm:utils/TableGen/AsmMatcherEmitter.cpp:
>>>
>>> // Verify that any operand is only mentioned once.
>> Wait, how do you deal with (totally legal) r4 += r4?
>> Or r4 = *(r4 +0)?
>> Even jumps can have src_reg == dst_reg, though it doesn't seem useful.
>
> We are talking about dag node here. The above "r4", although using the same
> register, will be different dag nodes. So it will be okay.
>
> The "r4 = be16 r4" tries to use the *same* dag node as both source and
> destination
> in the asm output which is prohibited.

On second thought, we may allow "r4 = be16 r4" by using different dag
nodes. (I need to experiment with this.) But we do have the constraint
that the two "r4" must be the same register. "r5 = be16 r4" is not
allowed. So from that perspective, referencing "r4" only once is a good
idea and less confusing.


Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions

2017-09-22 Thread Y Song
On Fri, Sep 22, 2017 at 6:46 AM, Edward Cree <ec...@solarflare.com> wrote:
> On 22/09/17 00:11, Y Song wrote:
>> On Thu, Sep 21, 2017 at 12:58 PM, Edward Cree <ec...@solarflare.com> wrote:
>>> On 21/09/17 20:44, Alexei Starovoitov wrote:
>>>> On Thu, Sep 21, 2017 at 09:29:33PM +0200, Daniel Borkmann wrote:
>>>>> More intuitive, but agree on the from_be/le. Maybe we should
>>>>> just drop the "to_" prefix altogether, and leave the rest as is since
>>>>> it's not surrounded by braces, it's also not a cast but rather an op.
>>> That works for me.
>>>> 'be16 r4' is ambiguous regarding upper bits.
>>>>
>>>> what about my earlier suggestion:
>>>> r4 = (be16) (u16) r4
>>>> r4 = (le64) (u64) r4
>>>>
>>>> It will be pretty clear what instruction is doing (that upper bits become 
>>>> zero).
>>> Trouble with that is that's very *not* what C will do with those casts
>>>  and it doesn't really capture the bidirectional/symmetry thing.  The
>>>  closest I could see with that is something like `r4 = (be16/u16) r4`,
>>>  but that's quite an ugly mongrel.
>>> I think Daniel's idea of `be16`, `le32` etc one-arg opcodes is the
>>>  cleanest and clearest.  Should it be
>>> r4 = be16 r4
>>>  or just
>>> be16 r4
>>> ?  Personally I incline towards the latter, but admit it doesn't really
>>>  match the syntax of other opcodes.
>> I did some quick prototyping in llvm to make sure we have a syntax
>> llvm is happy. Apparently, llvm does not like the syntax
>>    r4 = be16 r4   or   r4 = (be16) (u16) r4.
>>
>> In llvm:utils/TableGen/AsmMatcherEmitter.cpp:
>>
>> // Verify that any operand is only mentioned once.
> Wait, how do you deal with (totally legal) r4 += r4?
> Or r4 = *(r4 +0)?
> Even jumps can have src_reg == dst_reg, though it doesn't seem useful.

We are talking about dag node here. The above "r4", although using the same
register, will be different dag nodes. So it will be okay.

The "r4 = be16 r4" tries to use the *same* dag node as both source and
destination in the asm output, which is prohibited.


Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions

2017-09-21 Thread Y Song
On Thu, Sep 21, 2017 at 12:58 PM, Edward Cree  wrote:
> On 21/09/17 20:44, Alexei Starovoitov wrote:
>> On Thu, Sep 21, 2017 at 09:29:33PM +0200, Daniel Borkmann wrote:
>>> More intuitive, but agree on the from_be/le. Maybe we should
>>> just drop the "to_" prefix altogether, and leave the rest as is since
>>> it's not surrounded by braces, it's also not a cast but rather an op.
> That works for me.
>> 'be16 r4' is ambiguous regarding upper bits.
>>
>> what about my earlier suggestion:
>> r4 = (be16) (u16) r4
>> r4 = (le64) (u64) r4
>>
>> It will be pretty clear what instruction is doing (that upper bits become 
>> zero).
> Trouble with that is that's very *not* what C will do with those casts
>  and it doesn't really capture the bidirectional/symmetry thing.  The
>  closest I could see with that is something like `r4 = (be16/u16) r4`,
>  but that's quite an ugly mongrel.
> I think Daniel's idea of `be16`, `le32` etc one-arg opcodes is the
>  cleanest and clearest.  Should it be
> r4 = be16 r4
>  or just
> be16 r4
> ?  Personally I incline towards the latter, but admit it doesn't really
>  match the syntax of other opcodes.

I did some quick prototyping in llvm to make sure we have a syntax
llvm is happy with. Apparently, llvm does not like the syntax
   r4 = be16 r4   or   r4 = (be16) (u16) r4.

In llvm:utils/TableGen/AsmMatcherEmitter.cpp:

// Verify that any operand is only mentioned once.
// We reject aliases and ignore instructions for now.
if (Tok[0] == '$' && !OperandNames.insert(Tok).second) {
  if (!Hack)
PrintFatalError(TheDef->getLoc(),
"ERROR: matchable with tied operand '" + Tok +
"' can never be matched!");
  // FIXME: Should reject these.  The ARM backend hits this with $lane in a
  // bunch of instructions.  It is unclear what the right answer is.
  DEBUG({
errs() << "warning: '" << TheDef->getName() << "': "
   << "ignoring instruction with tied operand '"
   << Tok << "'\n";
  });
  return false;
}

Later on, such an insn will be ignored in table matching and the
assembler will not work any more.

Note that here bswap16/32/64 require source and destination register
must be the same.

So it looks like "be16/be32/be64/le16/le32/le64 #register" is a good idea.
We could use "be16 (u16)#register", but I am not sure whether the extra
u16 conversion adds value or rather adds more confusion.

>
> To shed a few more bikes, I did also wonder about the BPF_NEG opcode,
>  which (if I'm reading the code correctly) currently renders as
> r4 = neg r4 0
> (u32) r4 = neg (u32) r4 0
> That printing of the insn->imm, while harmless, is needless and
>  potentially confusing.  Should we get rid of it?

Currently, llvm does not issue "neg" insn yet. Maybe you can issue
 r3 = -r4  // 64bit
 r3 = (u32) -r4 // 32bit

This matches what interpreter does. This will be similar to other ALU
operations.


Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions

2017-09-21 Thread Y Song
On Thu, Sep 21, 2017 at 9:24 AM, Edward Cree  wrote:
> On 21/09/17 16:52, Alexei Starovoitov wrote:
>> On Thu, Sep 21, 2017 at 04:09:34PM +0100, Edward Cree wrote:
>>> print_bpf_insn() was treating all BPF_ALU[64] the same, but BPF_END has a
>>>  different structure: it has a size in insn->imm (even if it's BPF_X) and
>>>  uses the BPF_SRC (X or K) to indicate which endianness to use.  So it
>>>  needs different code to print it.
>>>
>>> Signed-off-by: Edward Cree 
>>> ---
>>> It's not the same format as the new LLVM asm uses, does that matter?
>>> AFAIK the LLVM format doesn't comprehend BPF_TO_LE, just assumes that all
>>>  endian ops are necessarily swaps (rather than sometimes nops).
>> that is being fixed and we will fix asm format too.
>> Let's pick good format first.
> Agreed.
>>>  kernel/bpf/verifier.c | 13 +++--
>>>  1 file changed, 11 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>> index 799b245..e7657a4 100644
>>> --- a/kernel/bpf/verifier.c
>>> +++ b/kernel/bpf/verifier.c
>>> @@ -331,20 +331,29 @@ static void print_bpf_insn(const struct bpf_verifier_env *env,
>>>  u8 class = BPF_CLASS(insn->code);
>>>
>>>  if (class == BPF_ALU || class == BPF_ALU64) {
>>> -if (BPF_SRC(insn->code) == BPF_X)
>>> +if (BPF_OP(insn->code) == BPF_END) {
>>> +if (class == BPF_ALU64)
>>> +verbose("BUG_alu64_%02x\n", insn->code);
>>> +else
>>> +verbose("(%02x) (u%d) r%d %s %s\n",
>>> +insn->code, insn->imm, insn->dst_reg,
>>> +bpf_alu_string[BPF_END >> 4],
>>> BPF_SRC(insn->code) == BPF_X ? "be" : "le");
>> yes the bit the same, but please use BPF_SRC(insn->code) == BPF_TO_BE.
> Good point.
>> imo
>> (u16) r4 endian be
>> isn't intuitive.
>> Can we come up with some better syntax?
>> Like
>> bswap16be r4
>> bswap32le r4
> Hmm, I don't like these, since bswapbe is a swap on *le* and a nop on be.
>> or
>>
>> to_be16 r4
>> to_le32 r4
> And the problem here is that it's not just to_be, it's also from_be.

Could you explain what "from_be" means here? I do not quite understand.

>  Otherwise we could write `(be16) r4 = endian (u16) r4` and be much more
>  explicit about what's happening.
> Really the operation is something like `cpu_tofrom_be16 r4`, but that also
>  seems a bit clumsy and longwinded.  Also it's inconsistent with the other
>  ops that all indicate sizes with these (u16) etc casts.
> `endian (be16) r4`, perhaps?


Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions

2017-09-21 Thread Y Song
On Thu, Sep 21, 2017 at 8:52 AM, Alexei Starovoitov wrote:
> On Thu, Sep 21, 2017 at 04:09:34PM +0100, Edward Cree wrote:
>> print_bpf_insn() was treating all BPF_ALU[64] the same, but BPF_END has a
>>  different structure: it has a size in insn->imm (even if it's BPF_X) and
>>  uses the BPF_SRC (X or K) to indicate which endianness to use.  So it
>>  needs different code to print it.
>>
>> Signed-off-by: Edward Cree 
>> ---
>> It's not the same format as the new LLVM asm uses, does that matter?
>> AFAIK the LLVM format doesn't comprehend BPF_TO_LE, just assumes that all
>>  endian ops are necessarily swaps (rather than sometimes nops).
>
> that is being fixed and we will fix asm format too.
> Let's pick good format first.
>
>>  kernel/bpf/verifier.c | 13 +++--
>>  1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 799b245..e7657a4 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -331,20 +331,29 @@ static void print_bpf_insn(const struct bpf_verifier_env *env,
>>   u8 class = BPF_CLASS(insn->code);
>>
>>   if (class == BPF_ALU || class == BPF_ALU64) {
>> - if (BPF_SRC(insn->code) == BPF_X)
>> + if (BPF_OP(insn->code) == BPF_END) {
>> + if (class == BPF_ALU64)
>> + verbose("BUG_alu64_%02x\n", insn->code);
>> + else
>> + verbose("(%02x) (u%d) r%d %s %s\n",
>> + insn->code, insn->imm, insn->dst_reg,
>> + bpf_alu_string[BPF_END >> 4],
>> BPF_SRC(insn->code) == BPF_X ? "be" : "le");
>
> yes the bit the same, but please use BPF_SRC(insn->code) == BPF_TO_BE.
> imo
> (u16) r4 endian be
> isn't intuitive.
> Can we come up with some better syntax?
> Like
> bswap16be r4
> bswap32le r4
>
> or
>
> to_be16 r4
> to_le32 r4

Currently, llvm bpf backend uses "bswap16 r4" "bswap32 r2" "bswap64 r2" syntax.

I prefer something like "to_be16 r4" "to_le32 r4", or "bswap2be16"
"bswap2le32" or something similar.
This captures what the operation really is.

Right, support for bswap to le will be added to LLVM soon.

>
> It will be more obvious what's happening.
>


Re: Bug with BPF_ALU64 | BPF_END?

2017-09-14 Thread Y Song
On Thu, Sep 14, 2017 at 11:14 AM, David Miller  wrote:
> From: Edward Cree 
> Date: Thu, 14 Sep 2017 18:53:17 +0100
>
>> Is BPF_END supposed to only be used with BPF_ALU, never with BPF_ALU64?

Yes, only BPF_ALU. The below is LLVM bpf swap insn encoding:

...
// bswap16, bswap32, bswap64
class BSWAP ...
...
  let op = 0xd; // BPF_END
  let BPFSrc = 1;   // BPF_TO_BE (TODO: use BPF_TO_LE for big-endian target)
  let BPFClass = 4; // BPF_ALU
...

>> In kernel/bpf/core.c:___bpf_prog_run(), there are only jump table targets
>>  for the BPF_ALU case, not for the BPF_ALU64 case (opcodes 0xd7 and 0xdf).
>> But the verifier doesn't enforce this; by crafting a program that uses
>>  these opcodes I can get a WARN when they're run (without JIT; it looks
>>  like the x86 JIT, at least, won't like it either).
>> Proposed patch below the cut; build-tested only.
>
> Good catch.
>
> A really neat test would be a program that uploads random BPF programs
> into the kernel, in a syzkaller'ish way.  It might have triggered this
> (eventually).
>


Re: [PATCH net-next] samples/bpf: Fix compilation issue in redirect dummy program

2017-08-31 Thread Y Song
On Thu, Aug 31, 2017 at 4:43 AM, Daniel Borkmann  wrote:
> On 08/31/2017 01:27 PM, Jesper Dangaard Brouer wrote:
>>
>> On Thu, 31 Aug 2017 14:16:39 +0300
>> Tariq Toukan  wrote:
>>
>>> Fix compilation error below:
>>>
>>> $ make samples/bpf/
>>>
>>> LLVM ERROR: 'xdp_redirect_dummy' label emitted multiple times to
>>> assembly file
>>> make[1]: *** [samples/bpf/xdp_redirect_kern.o] Error 1
>>> make: *** [samples/bpf/] Error 2
>>>
>>> Fixes: 306da4e685b4 ("samples/bpf: xdp_redirect load XDP dummy prog on TX
>>> device")
>>> Signed-off-by: Tariq Toukan 
>>> ---
>>
>>
>> Acked-by: Jesper Dangaard Brouer 
>
>
> Acked-by: Daniel Borkmann 
>
>> What LLVM/clang version do you use?
>>
>> I don't see this compile error, and I have:
>>   $ clang --version
>>   clang version 3.9.1 (tags/RELEA
>
>
> I'm seeing the error as well with a fairly recent LLVM from git
> tree (6.0.0git-2d810c2).
>
> Looks like the llvm error is triggered when section name and
> the function name for XDP prog is the same. Changing either the
> function or the section name right above resolves the issue. If
> such error didn't trigger on older versions, people could be
> using such naming scheme as done here, so seems to me like a
> regression on LLVM side we might need to look at ...

Martin fixed a similar bug earlier:
=
commit a2e8bbd2ef5457485f00b6b947bbbfa2778e5b1e
Author: Martin KaFai Lau 
Date:   Thu Jun 8 22:30:17 2017 -0700

bpf: Fix test_obj_id.c for llvm 5.0

llvm 5.0 does not like the section name and the function name
to be the same:
...
=

gcc also has this behavior. Section name is treated as global
and hence cannot collide with a function name...


Re: [PATCH net-next v3 1/2] bpf: add support for sys_enter_* and sys_exit_* tracepoints

2017-08-03 Thread Y Song
On Thu, Aug 3, 2017 at 7:08 PM, Alexei Starovoitov  wrote:
> On 8/3/17 6:29 AM, Yonghong Song wrote:
>>
>> @@ -578,8 +596,9 @@ static void perf_syscall_enter(void *ignore, struct
>> pt_regs *regs, long id)
>> if (!sys_data)
>> return;
>>
>> +   prog = READ_ONCE(sys_data->enter_event->prog);
>> head = this_cpu_ptr(sys_data->enter_event->perf_events);
>> -   if (hlist_empty(head))
>> +   if (!prog && hlist_empty(head))
>> return;
>>
>> /* get the size after alignment with the u32 buffer size field */
>> @@ -594,6 +613,13 @@ static void perf_syscall_enter(void *ignore, struct
>> pt_regs *regs, long id)
>> rec->nr = syscall_nr;
>> syscall_get_arguments(current, regs, 0, sys_data->nb_args,
>>(unsigned long *)>args);
>> +
>> +   if ((prog && !perf_call_bpf_enter(prog, regs, sys_data, rec)) ||
>> +   hlist_empty(head)) {
>> +   perf_swevent_put_recursion_context(rctx);
>> +   return;
>> +   }
>
>
> hmm. if I read the patch correctly that makes it different from
> kprobe/uprobe/tracepoints+bpf behavior. Why make it different and
> force user space to perf_event_open() on every cpu?
> In other cases it's the job of the bpf program to filter by cpu
> if necessary and that is well understood by bcc scripts.

The patch actually does allow the bpf program to track all cpus.
The test:
>> +   if (!prog && hlist_empty(head))
>> return;
ensures that if prog is not empty, it will not return even if the
event in the current cpu is empty. Later on, perf_call_bpf_enter will
be called if prog is not empty. This ensures that
the bpf program will execute regardless of the current cpu.

Did I miss anything here?


Re: [iovisor-dev] [PATCH v3 net-next 00/12] bpf: rewrite value tracking in verifier

2017-07-14 Thread Y Song
I did an experiment with one of our internal bpf programs.
The program has 1563 insns.

Without Edward's patch:
processed 13634 insns, stack depth 160

With Edward's patch:
processed 15807 insns, stack depth 160

So the number of processed insns regressed by roughly 16%.
Did anybody do any similar experiments to quantify the patch's
impact in verification performance (e.g., in terms of processed insns)?

On Tue, Jun 27, 2017 at 5:53 AM, Edward Cree via iovisor-dev wrote:
> This series simplifies alignment tracking, generalises bounds tracking and
>  fixes some bounds-tracking bugs in the BPF verifier.  Pointer arithmetic on
>  packet pointers, stack pointers, map value pointers and context pointers has
>  been unified, and bounds on these pointers are only checked when the pointer
>  is dereferenced.
> Operations on pointers which destroy all relation to the original pointer
>  (such as multiplies and shifts) are disallowed if !env->allow_ptr_leaks,
>  otherwise they convert the pointer to an unknown scalar and feed it to the
>  normal scalar arithmetic handling.
> Pointer types have been unified with the corresponding adjusted-pointer types
>  where those existed (e.g. PTR_TO_MAP_VALUE[_ADJ] or FRAME_PTR vs
>  PTR_TO_STACK); similarly, CONST_IMM and UNKNOWN_VALUE have been unified into
>  SCALAR_VALUE.
> Pointer types (except CONST_PTR_TO_MAP, PTR_TO_MAP_VALUE_OR_NULL and
>  PTR_TO_PACKET_END, which do not allow arithmetic) have a 'fixed offset' and
>  a 'variable offset'; the former is used when e.g. adding an immediate or a
>  known-constant register, as long as it does not overflow.  Otherwise the
>  latter is used, and any operation creating a new variable offset creates a
>  new 'id' (and, for PTR_TO_PACKET, clears the 'range').
> SCALAR_VALUEs use the 'variable offset' fields to track the range of possible
>  values; the 'fixed offset' should never be set on a scalar.
>
> As of patch 12/12, all tests of tools/testing/selftests/bpf/test_verifier
>  and tools/testing/selftests/bpf/test_align pass.
>
> v3: added a few more tests; removed RFC tags.
>
> v2: fixed nfp build, made test_align pass again and extended it with a few
>  new tests (though still need to add more).
>
> Edward Cree (12):
>   selftests/bpf: add test for mixed signed and unsigned bounds checks
>   bpf/verifier: rework value tracking
>   nfp: change bpf verifier hooks to match new verifier data structures
>   bpf/verifier: track signed and unsigned min/max values
>   bpf/verifier: more concise register state logs for constant var_off
>   selftests/bpf: change test_verifier expectations
>   selftests/bpf: rewrite test_align
>   selftests/bpf: add a test to test_align
>   selftests/bpf: add test for bogus operations on pointers
>   selftests/bpf: don't try to access past MAX_PACKET_OFF in
> test_verifier
>   selftests/bpf: add tests for subtraction & negative numbers
>   selftests/bpf: variable offset negative tests
>
>  drivers/net/ethernet/netronome/nfp/bpf/verifier.c |   24 +-
>  include/linux/bpf.h   |   34 +-
>  include/linux/bpf_verifier.h  |   56 +-
>  include/linux/tnum.h  |   81 +
>  kernel/bpf/Makefile   |2 +-
>  kernel/bpf/tnum.c |  180 ++
>  kernel/bpf/verifier.c | 1943 
> -
>  tools/testing/selftests/bpf/test_align.c  |  462 -
>  tools/testing/selftests/bpf/test_verifier.c   |  293 ++--
>  9 files changed, 2034 insertions(+), 1041 deletions(-)
>  create mode 100644 include/linux/tnum.h
>  create mode 100644 kernel/bpf/tnum.c
>
> ___
> iovisor-dev mailing list
> iovisor-...@lists.iovisor.org
> https://lists.iovisor.org/mailman/listinfo/iovisor-dev


Re: More BPF verifier questions

2017-06-05 Thread Y Song
On Fri, Jun 2, 2017 at 7:42 AM, Edward Cree  wrote:
> A couple of the tests in tools/testing/selftests/bpf/test_verifier.c seem to 
> be bogus: Test "multiple registers share map_lookup_elem bad reg type" is 
> supposed to
>  error with "R3 invalid mem access 'inv'", but from my reading of it, R3 gets
>  loaded with a map_value_or_null, that later gets null-checked (both directly
>  and - through R0 - indirectly), and finally stored through.  I don't see
>  what's supposed to make R3 become a bad pointer.

You are right. In this case,
r0 = bpf_map_lookup
r2 = r0
r3 = r0
r4 = r0
r5 = r0
if (r0 != 0)  <=== condition 1
  r1 = 1
if (r0 != 0)
  r1 = 2
if (r3 != 0)
  *r3 = 0
...

If (r0 != 0) is false, the current verifier marks r2/r3/r4/r5 as unknown values.
I guess here what you did to have the precise value 0 helps and makes the
verifier complaint go away correctly.

> Test "helper access to variable memory: stack, bitwise AND + JMP, correct
>  bounds" is listed as expected to pass, but it passes zero in the 'size'
>  argument, an ARG_CONST_SIZE, to bpf_probe_read; I believe this should fail
>  (and with my WIP patch it does).

Probably a typo or mis-statement. "size" is not passed in with "zero", but
with an unknown value. Hence, it probably should fail.

  BPF_MOV64_IMM(BPF_REG_2, 16),
  BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_2, -128),
  BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1, -128),
  BPF_ALU64_IMM(BPF_AND, BPF_REG_2, 64),
  BPF_MOV64_IMM(BPF_REG_4, 0),
  BPF_JMP_REG(BPF_JGE, BPF_REG_4, BPF_REG_2, 2),
  BPF_MOV64_IMM(BPF_REG_3, 0),
  BPF_EMIT_CALL(BPF_FUNC_probe_read),


In kernel/bpf/verifier.c,

  } else if (arg_type == ARG_CONST_SIZE ||
   arg_type == ARG_CONST_SIZE_OR_ZERO) {
expected_type = CONST_IMM;
/* One exception. Allow UNKNOWN_VALUE registers when the
 * boundaries are known and don't cause unsafe memory accesses
 */
if (type != UNKNOWN_VALUE && type != expected_type)
  goto err_type;

Maybe somebody can provide some historical context for this relaxation.


Re: running an eBPF program

2017-05-28 Thread Y Song
On Sun, May 28, 2017 at 12:38 AM, Adel Fuchs  wrote:
> Hi,
> Is there any way to run this eBPF program without that patch?
> Alternatively, is there any other eBPF sample that does run properly? I need
> to run a program that filters packets according to IP address or port.

The following is temporary workaround you can use:
int poff, nh_off = BPF_LL_OFF + ETH_HLEN;
-   __be16 proto = skb->protocol;
+   __be16 proto = (__be16)*(volatile __u32 *)&skb->protocol;
__u8 ip_proto;


Re: running an eBPF program

2017-05-27 Thread Y Song
On Sat, May 27, 2017 at 5:11 PM, David Miller <da...@davemloft.net> wrote:
> From: Y Song <ys114...@gmail.com>
> Date: Sat, 27 May 2017 13:52:27 -0700
>
>> On Sat, May 27, 2017 at 1:23 PM, Y Song <ys114...@gmail.com> wrote:
>>>
>>> From verifier error message:
>>> ==
>>> 0: (bf) r6 = r1
>>>
>>> 1: (18) r9 = 0xffee
>>>
>>> 3: (69) r0 = *(u16 *)(r6 +16)
>>>
>>> invalid bpf_context access off=16 size=2
>>> ==
>>>
>>> The offset 16 of struct __sk_buff is hash.
>>> What instruction #3 tries to do is to access 2 bytes of the hash value
>>> instead of full 4 bytes.
>>> This is explicitly not allowed in verifier due to endianness issue.
>>
>>
>> I can reproduce the issue now. My previous statement saying to access
>> "hash" field is not correct. It is accessing the protocol field.
>>
>> static __inline__ bool flow_dissector(struct __sk_buff *skb,
>>   struct flow_keys *flow)
>> {
>> int poff, nh_off = BPF_LL_OFF + ETH_HLEN;
>> __be16 proto = skb->protocol;
>> __u8 ip_proto;
>>
>> The plan so far is to see whether we can fix the issue in LLVM side.
>
> If the compiler properly asks for "__sk_buff + 16" on little-endian
> and "__sk_buff + 20" on big-endian, the verifier should instead be
> fixed to allow the access to pass.
>
> I can't see any reason why LLVM won't set the offset properly like
> that, and it's a completely legitimate optimization that we shouldn't
> try to stop LLVM from performing.

I do agree that such optimization in LLVM is perfectly fine and
actually beneficial.
The only reason I was thinking of it was to avoid introducing endianness
into the verifier. Maybe not too much work there. Let me do some
experiments and come up with a patch for that.

Thanks!

Yonghong

>
> It also makes it so that we don't have to fix having absurdly defined
> __sk_buff's protocol field as a u32.
>
> Thanks.


Re: running an eBPF program

2017-05-27 Thread Y Song
On Sat, May 27, 2017 at 1:23 PM, Y Song <ys114...@gmail.com> wrote:
>
> From verifier error message:
> ==
> 0: (bf) r6 = r1
>
> 1: (18) r9 = 0xffee
>
> 3: (69) r0 = *(u16 *)(r6 +16)
>
> invalid bpf_context access off=16 size=2
> ==
>
> The offset 16 of struct __sk_buff is hash.
> What instruction #3 tries to do is to access 2 bytes of the hash value
> instead of full 4 bytes.
> This is explicitly not allowed in verifier due to endianness issue.


I can reproduce the issue now. My previous statement saying it accesses the
"hash" field is not correct. It is accessing the protocol field.

static __inline__ bool flow_dissector(struct __sk_buff *skb,
  struct flow_keys *flow)
{
int poff, nh_off = BPF_LL_OFF + ETH_HLEN;
__be16 proto = skb->protocol;
__u8 ip_proto;

The plan so far is to see whether we can fix the issue in LLVM side.

Yonghong

>
>
> Look at iproute2 example code, it looks like the following may be responsible:
>
> bpf_tailcall.c:#define MAX_JMP_SIZE 2
> bpf_tailcall.c:tail_call(skb, &jmp_tc, skb->hash & (MAX_JMP_SIZE - 1));
>
> I am thinking of implementing something in LLVM to prevent
> optimization from LD4=>LD2/LD1 for context access like this.
>
>
> On Fri, May 26, 2017 at 4:00 AM, Adel Fuchs <adelfu...@gmail.com> wrote:
> > Hi
> >
> > I'm trying to run this eBPF program:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git/tree/examples/bpf
> >
> >
> > and I get this error:
> >
> >
> > :~/iproute2/examples/bpf$sudo tc filter add dev enx00e11100329b parent
> > 1: bpf obj bpf.o exp /tmp/bpf-uds flowid 1:1 action bpf obj bpf.o sec
> > action-mark action bpf obj bpf.o sec action-rand ok
> >
> > [sudo] password for adel:
> >
> >
> >
> > Prog section 'classifier' rejected: Permission denied (13)!
> >
> > - Type: 3
> >
> > - Instructions: 218 (0 over limit)
> >
> > - License:  GPL
> >
> >
> >
> > Verifier analysis:
> >
> >
> >
> > 0: (bf) r6 = r1
> >
> > 1: (18) r9 = 0xffee
> >
> > 3: (69) r0 = *(u16 *)(r6 +16)
> >
> > invalid bpf_context access off=16 size=2
> >
> >
> >
> > Error fetching program/map!
> >
> > Failed to retrieve (e)BPF data!
> >
> >
> > Any suggestions?
> >
> > Thanks,
> >
> > Adel


Re: running an eBPF program

2017-05-27 Thread Y Song
From the verifier error message:
==
0: (bf) r6 = r1

1: (18) r9 = 0xffee

3: (69) r0 = *(u16 *)(r6 +16)

invalid bpf_context access off=16 size=2
==

The offset 16 of struct __sk_buff is hash.
What instruction #3 tries to do is to access 2 bytes of the hash value
instead of full 4 bytes.
This is explicitly not allowed in verifier due to endianness issue.

Look at iproute2 example code, it looks like the following may be responsible:

bpf_tailcall.c:#define MAX_JMP_SIZE 2
bpf_tailcall.c:tail_call(skb, &jmp_tc, skb->hash & (MAX_JMP_SIZE - 1));

I am thinking of implementing something in LLVM to prevent
optimization from LD4=>LD2/LD1 for context access like this.


On Fri, May 26, 2017 at 4:00 AM, Adel Fuchs  wrote:
> Hi
>
> I'm trying to run this eBPF program:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git/tree/examples/bpf
>
>
> and I get this error:
>
>
> :~/iproute2/examples/bpf$sudo tc filter add dev enx00e11100329b parent
> 1: bpf obj bpf.o exp /tmp/bpf-uds flowid 1:1 action bpf obj bpf.o sec
> action-mark action bpf obj bpf.o sec action-rand ok
>
> [sudo] password for adel:
>
>
>
> Prog section 'classifier' rejected: Permission denied (13)!
>
> - Type: 3
>
> - Instructions: 218 (0 over limit)
>
> - License:  GPL
>
>
>
> Verifier analysis:
>
>
>
> 0: (bf) r6 = r1
>
> 1: (18) r9 = 0xffee
>
> 3: (69) r0 = *(u16 *)(r6 +16)
>
> invalid bpf_context access off=16 size=2
>
>
>
> Error fetching program/map!
>
> Failed to retrieve (e)BPF data!
>
>
> Any suggestions?
>
> Thanks,
>
> Adel