[PATCH v2 bpf-next 1/1] libbpf: make bpf_object__open default to UNSPEC

2018-11-23 Thread Nikita V. Shirokov
Currently, libbpf's bpf_object__open requires a BPF program to specify a
version in its code by default, for two reasons:
1) the default prog type is set to KPROBE
2) KPROBE requires (in kernel/bpf/syscall.c) a version to be specified

This patch changes the default prog type to UNSPEC and relaxes the
requirement for a version section to be present in the object file, so
that libbpf reflects what the kernel does today (only the KPROBE prog
type requires the version to be set explicitly).

v1 -> v2:
 - RFC tag has been dropped

Signed-off-by: Nikita V. Shirokov 
---
 tools/lib/bpf/libbpf.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 0f14f7c074c2..ed4212a4c5f9 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -333,7 +333,7 @@ bpf_program__init(void *data, size_t size, char *section_name, int idx,
prog->idx = idx;
prog->instances.fds = NULL;
prog->instances.nr = -1;
-   prog->type = BPF_PROG_TYPE_KPROBE;
+   prog->type = BPF_PROG_TYPE_UNSPEC;
prog->btf_fd = -1;
 
return 0;
@@ -1649,12 +1649,12 @@ static bool bpf_prog_type__needs_kver(enum bpf_prog_type type)
case BPF_PROG_TYPE_LIRC_MODE2:
case BPF_PROG_TYPE_SK_REUSEPORT:
case BPF_PROG_TYPE_FLOW_DISSECTOR:
-   return false;
case BPF_PROG_TYPE_UNSPEC:
-   case BPF_PROG_TYPE_KPROBE:
case BPF_PROG_TYPE_TRACEPOINT:
-   case BPF_PROG_TYPE_PERF_EVENT:
case BPF_PROG_TYPE_RAW_TRACEPOINT:
+   case BPF_PROG_TYPE_PERF_EVENT:
+   return false;
+   case BPF_PROG_TYPE_KPROBE:
default:
return true;
}
-- 
2.15.1
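
The rule that the second hunk leaves in place can be sketched as a
standalone function. This is only a model for illustration, not libbpf
code: the enum prog_type and the function needs_kver are stand-ins
introduced here so the snippet builds without kernel headers, and only a
handful of the program types named in the hunk are listed.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a few values of enum bpf_prog_type. */
enum prog_type {
	PROG_TYPE_UNSPEC,
	PROG_TYPE_KPROBE,
	PROG_TYPE_TRACEPOINT,
	PROG_TYPE_PERF_EVENT,
	PROG_TYPE_RAW_TRACEPOINT,
	PROG_TYPE_FLOW_DISSECTOR,
};

/*
 * Model of the post-patch rule: KPROBE and any type not listed
 * explicitly need a version; the listed types do not.
 */
static bool needs_kver(enum prog_type type)
{
	switch (type) {
	case PROG_TYPE_FLOW_DISSECTOR:
	case PROG_TYPE_UNSPEC:
	case PROG_TYPE_TRACEPOINT:
	case PROG_TYPE_RAW_TRACEPOINT:
	case PROG_TYPE_PERF_EVENT:
		return false;
	case PROG_TYPE_KPROBE:
	default:
		return true;
	}
}
```

Under this model a program left at the new UNSPEC default is not asked
for a version, while a program explicitly marked as KPROBE still is.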



[RFC PATCH bpf-next] libbpf: make bpf_object__open default to UNSPEC

2018-11-21 Thread Nikita V. Shirokov
Currently, libbpf's bpf_object__open requires a BPF program to specify a
version in its code by default, for two reasons:
1) the default prog type is set to KPROBE
2) KPROBE requires (in kernel/bpf/syscall.c) a version to be specified

In this RFC I'm proposing to change the default to UNSPEC and to change
the logic of libbpf so that it reflects what the kernel does today
(i.e. only the KPROBE type requires the version to be set explicitly).

Reason for the change:
Currently only libbpf requires the version to be set explicitly by
default. That makes it really hard for maintainers of other custom BPF
loaders to migrate to libbpf, as they don't control their users' code
and a migration to the new loader (libbpf) won't be transparent for end
users.

What is going to break after this change:
Anyone relying on KPROBE being the default for bpf_object__open will see
their code stop working. However, I really doubt that anyone is using
this for kprobe programs (instead of, say, bcc or other tracing
frameworks).

Other possible solutions (for discussion, would require more machinery):
Add another function like bpf_object__open with a default of UNSPEC.

Signed-off-by: Nikita V. Shirokov 
---
 tools/lib/bpf/libbpf.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 0f14f7c074c2..ed4212a4c5f9 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -333,7 +333,7 @@ bpf_program__init(void *data, size_t size, char *section_name, int idx,
prog->idx = idx;
prog->instances.fds = NULL;
prog->instances.nr = -1;
-   prog->type = BPF_PROG_TYPE_KPROBE;
+   prog->type = BPF_PROG_TYPE_UNSPEC;
prog->btf_fd = -1;
 
return 0;
@@ -1649,12 +1649,12 @@ static bool bpf_prog_type__needs_kver(enum bpf_prog_type type)
case BPF_PROG_TYPE_LIRC_MODE2:
case BPF_PROG_TYPE_SK_REUSEPORT:
case BPF_PROG_TYPE_FLOW_DISSECTOR:
-   return false;
case BPF_PROG_TYPE_UNSPEC:
-   case BPF_PROG_TYPE_KPROBE:
case BPF_PROG_TYPE_TRACEPOINT:
-   case BPF_PROG_TYPE_PERF_EVENT:
case BPF_PROG_TYPE_RAW_TRACEPOINT:
+   case BPF_PROG_TYPE_PERF_EVENT:
+   return false;
+   case BPF_PROG_TYPE_KPROBE:
default:
return true;
}
-- 
2.15.1



[PATCH v5 bpf-next 0/2] bpf: adding support for mapinmap in libbpf

2018-11-20 Thread Nikita V. Shirokov
This patch series adds a helper to libbpf that allows it to load
map-in-map types (BPF_MAP_TYPE_ARRAY_OF_MAPS and
BPF_MAP_TYPE_HASH_OF_MAPS).
The first patch contains the new helper and explains the proposed workflow.
The second patch contains tests, which can also serve as a usage example.

v4->v5:
 - naming: renamed everything to map_in_map instead of mapinmap
 - return a nonzero value if set_inner_map_fd fails

v3->v4:
 - renamed helper to set_inner_map_fd
 - now we set this value only if it hasn't
   been set before and only for (array|hash) of maps

v2->v3:
 - fixing typo in patch description
 - initializing inner_map_fd to -1 by default

v1->v2:
 - addressing nits
 - removing const qualifier from fd in new helper
 - starting to check return value of bpf_map_update_elem

Nikita V. Shirokov (2):
  bpf: adding support for map in map in libbpf
  bpf: adding tests for mapinmap helper in libbpf

 tools/lib/bpf/libbpf.c| 40 ++--
 tools/lib/bpf/libbpf.h|  2 +
 tools/testing/selftests/bpf/Makefile  |  3 +-
 tools/testing/selftests/bpf/test_map_in_map.c | 49 +++
 tools/testing/selftests/bpf/test_maps.c   | 90 +++
 5 files changed, 177 insertions(+), 7 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/test_map_in_map.c

-- 
2.15.1



[PATCH v5 bpf-next 2/2] bpf: adding tests for map_in_map helper in libbpf

2018-11-20 Thread Nikita V. Shirokov
adding test/example of bpf_map__set_inner_map_fd usage

Signed-off-by: Nikita V. Shirokov 
Acked-by: Yonghong Song 
---
 tools/testing/selftests/bpf/Makefile  |  3 +-
 tools/testing/selftests/bpf/test_map_in_map.c | 49 +++
 tools/testing/selftests/bpf/test_maps.c   | 90 +++
 3 files changed, 141 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/test_map_in_map.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 1dde03ea1484..43157bd89165 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -38,7 +38,8 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test
test_lwt_seg6local.o sendmsg4_prog.o sendmsg6_prog.o test_lirc_mode2_kern.o \
get_cgroup_id_kern.o socket_cookie_prog.o test_select_reuseport_kern.o \
test_skb_cgroup_id_kern.o bpf_flow.o netcnt_prog.o \
-   test_sk_lookup_kern.o test_xdp_vlan.o test_queue_map.o test_stack_map.o
+   test_sk_lookup_kern.o test_xdp_vlan.o test_queue_map.o test_stack_map.o \
+   test_map_in_map.o
 
 # Order correspond to 'make run_tests' order
 TEST_PROGS := test_kmod.sh \
diff --git a/tools/testing/selftests/bpf/test_map_in_map.c b/tools/testing/selftests/bpf/test_map_in_map.c
new file mode 100644
index ..ce923e67e08e
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_map_in_map.c
@@ -0,0 +1,49 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2018 Facebook */
+#include 
+#include 
+#include 
+#include "bpf_helpers.h"
+
+struct bpf_map_def SEC("maps") mim_array = {
+   .type = BPF_MAP_TYPE_ARRAY_OF_MAPS,
+   .key_size = sizeof(int),
+   /* must be sizeof(__u32) for map in map */
+   .value_size = sizeof(__u32),
+   .max_entries = 1,
+   .map_flags = 0,
+};
+
+struct bpf_map_def SEC("maps") mim_hash = {
+   .type = BPF_MAP_TYPE_HASH_OF_MAPS,
+   .key_size = sizeof(int),
+   /* must be sizeof(__u32) for map in map */
+   .value_size = sizeof(__u32),
+   .max_entries = 1,
+   .map_flags = 0,
+};
+
+SEC("xdp_mimtest")
+int xdp_mimtest0(struct xdp_md *ctx)
+{
+   int value = 123;
+   int key = 0;
+   void *map;
+
+   map = bpf_map_lookup_elem(&mim_array, &key);
+   if (!map)
+       return XDP_DROP;
+
+   bpf_map_update_elem(map, &key, &value, 0);
+
+   map = bpf_map_lookup_elem(&mim_hash, &key);
+   if (!map)
+       return XDP_DROP;
+
+   bpf_map_update_elem(map, &key, &value, 0);
+
+   return XDP_PASS;
+}
+
+int _version SEC("version") = 1;
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
index 9f0a5b16a246..9c79ee017df3 100644
--- a/tools/testing/selftests/bpf/test_maps.c
+++ b/tools/testing/selftests/bpf/test_maps.c
@@ -1125,6 +1125,94 @@ static void test_sockmap(int tasks, void *data)
exit(1);
 }
 
+#define MAPINMAP_PROG "./test_map_in_map.o"
+static void test_map_in_map(void)
+{
+   struct bpf_program *prog;
+   struct bpf_object *obj;
+   struct bpf_map *map;
+   int mim_fd, fd, err;
+   int pos = 0;
+
+   obj = bpf_object__open(MAPINMAP_PROG);
+
+   fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(int), sizeof(int),
+   2, 0);
+   if (fd < 0) {
+   printf("Failed to create hashmap '%s'!\n", strerror(errno));
+   exit(1);
+   }
+
+   map = bpf_object__find_map_by_name(obj, "mim_array");
+   if (IS_ERR(map)) {
+   printf("Failed to load array of maps from test prog\n");
+   goto out_map_in_map;
+   }
+   err = bpf_map__set_inner_map_fd(map, fd);
+   if (err) {
+   printf("Failed to set inner_map_fd for array of maps\n");
+   goto out_map_in_map;
+   }
+
+   map = bpf_object__find_map_by_name(obj, "mim_hash");
+   if (IS_ERR(map)) {
+   printf("Failed to load hash of maps from test prog\n");
+   goto out_map_in_map;
+   }
+   err = bpf_map__set_inner_map_fd(map, fd);
+   if (err) {
+   printf("Failed to set inner_map_fd for hash of maps\n");
+   goto out_map_in_map;
+   }
+
+   bpf_object__for_each_program(prog, obj) {
+   bpf_program__set_xdp(prog);
+   }
+   bpf_object__load(obj);
+
+   map = bpf_object__find_map_by_name(obj, "mim_array");
+   if (IS_ERR(map)) {
+   printf("Failed to load array of maps from test prog\n");
+   goto out_map_in_map;
+   }
+   mim_fd = bpf_map__fd(map);
+   if (mim_fd < 0) {
+   printf("Failed to get descriptor for array of maps\n");
+   

[PATCH v5 bpf-next 1/2] bpf: adding support for map in map in libbpf

2018-11-20 Thread Nikita V. Shirokov
The idea is pretty simple: for a specified map (pointed to by struct bpf_map),
we provide the descriptor of an already loaded map, which is used as a
prototype for the inner map. Proposed workflow:
1) open the BPF object (bpf_object__open)
2) create the BPF map that is going to be used as a prototype
3) find (by name) the map-in-map you want to load and set the
descriptor of the inner map on it with the new helper from this patch
4) load the BPF program with bpf_object__load

Signed-off-by: Nikita V. Shirokov 
Acked-by: Yonghong Song 
---
 tools/lib/bpf/libbpf.c | 40 ++--
 tools/lib/bpf/libbpf.h |  2 ++
 2 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index cb6565d79603..ba12e070f182 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -167,6 +167,7 @@ struct bpf_map {
char *name;
size_t offset;
int map_ifindex;
+   int inner_map_fd;
struct bpf_map_def def;
__u32 btf_key_type_id;
__u32 btf_value_type_id;
@@ -594,6 +595,14 @@ static int compare_bpf_map(const void *_a, const void *_b)
return a->offset - b->offset;
 }
 
+static bool bpf_map_type__is_map_in_map(enum bpf_map_type type)
+{
+   if (type == BPF_MAP_TYPE_ARRAY_OF_MAPS ||
+   type == BPF_MAP_TYPE_HASH_OF_MAPS)
+   return true;
+   return false;
+}
+
 static int
 bpf_object__init_maps(struct bpf_object *obj, int flags)
 {
@@ -657,13 +666,15 @@ bpf_object__init_maps(struct bpf_object *obj, int flags)
}
obj->nr_maps = nr_maps;
 
-   /*
-* fill all fd with -1 so won't close incorrect
-* fd (fd=0 is stdin) when failure (zclose won't close
-* negative fd)).
-*/
-   for (i = 0; i < nr_maps; i++)
+   for (i = 0; i < nr_maps; i++) {
+   /*
+* fill all fd with -1 so won't close incorrect
+* fd (fd=0 is stdin) when failure (zclose won't close
+* negative fd)).
+*/
obj->maps[i].fd = -1;
+   obj->maps[i].inner_map_fd = -1;
+   }
 
/*
 * Fill obj->maps using data in "maps" section.
@@ -1164,6 +1175,9 @@ bpf_object__create_maps(struct bpf_object *obj)
create_attr.btf_fd = 0;
create_attr.btf_key_type_id = 0;
create_attr.btf_value_type_id = 0;
+   if (bpf_map_type__is_map_in_map(def->type) &&
+   map->inner_map_fd >= 0)
+   create_attr.inner_map_fd = map->inner_map_fd;
 
if (obj->btf && !bpf_map_find_btf_info(map, obj->btf)) {
create_attr.btf_fd = btf__fd(obj->btf);
@@ -2621,6 +2635,20 @@ void bpf_map__set_ifindex(struct bpf_map *map, __u32 ifindex)
map->map_ifindex = ifindex;
 }
 
+int bpf_map__set_inner_map_fd(struct bpf_map *map, int fd)
+{
+   if (!bpf_map_type__is_map_in_map(map->def.type)) {
+   pr_warning("error: unsupported map type\n");
+   return -EINVAL;
+   }
+   if (map->inner_map_fd != -1) {
+   pr_warning("error: inner_map_fd already specified\n");
+   return -EINVAL;
+   }
+   map->inner_map_fd = fd;
+   return 0;
+}
+
 static struct bpf_map *
 __bpf_map__iter(struct bpf_map *m, struct bpf_object *obj, int i)
 {
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index b1686a787102..16158b6b213f 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -293,6 +293,8 @@ LIBBPF_API void bpf_map__set_ifindex(struct bpf_map *map, __u32 ifindex);
 LIBBPF_API int bpf_map__pin(struct bpf_map *map, const char *path);
 LIBBPF_API int bpf_map__unpin(struct bpf_map *map, const char *path);
 
+LIBBPF_API int bpf_map__set_inner_map_fd(struct bpf_map *map, int fd);
+
 LIBBPF_API long libbpf_get_error(const void *ptr);
 
 struct bpf_prog_load_attr {
-- 
2.15.1
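
The checks performed by the v5 helper can be modelled in isolation, which
may make the return-value change easier to follow. This is a minimal
sketch under stated assumptions: struct map_model, enum map_type and the
unprefixed function names are stand-ins invented for this example (the
real types live in libbpf and linux/bpf.h), and the warnings printed by
the real helper are left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for enum bpf_map_type and struct bpf_map. */
enum map_type {
	MAP_TYPE_HASH,
	MAP_TYPE_ARRAY_OF_MAPS,
	MAP_TYPE_HASH_OF_MAPS,
};

struct map_model {
	enum map_type type;
	int inner_map_fd;	/* -1 means "not set", as in the patch */
};

static bool is_map_in_map(enum map_type type)
{
	return type == MAP_TYPE_ARRAY_OF_MAPS ||
	       type == MAP_TYPE_HASH_OF_MAPS;
}

/* Mirrors the two checks in bpf_map__set_inner_map_fd() from this patch. */
static int set_inner_map_fd(struct map_model *map, int fd)
{
	if (!is_map_in_map(map->type))
		return -EINVAL;		/* unsupported map type */
	if (map->inner_map_fd != -1)
		return -EINVAL;		/* inner_map_fd already specified */
	map->inner_map_fd = fd;
	return 0;
}
```

In this model a second call on the same outer map, or a call on a plain
hash map, fails with -EINVAL and leaves the stored descriptor unchanged,
so the caller can tell that its descriptor was not applied.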



[PATCH v4 bpf-next 2/2] bpf: adding tests for mapinmap helper in libbpf

2018-11-20 Thread Nikita V. Shirokov
adding test/example of bpf_map__set_inner_map_fd usage

Signed-off-by: Nikita V. Shirokov 
Acked-by: Yonghong Song 
---
 tools/testing/selftests/bpf/Makefile|  3 +-
 tools/testing/selftests/bpf/test_mapinmap.c | 49 +
 tools/testing/selftests/bpf/test_maps.c | 82 +
 3 files changed, 133 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/test_mapinmap.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 57b4712a6276..a3ea69dc9bdf 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -38,7 +38,8 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test
test_lwt_seg6local.o sendmsg4_prog.o sendmsg6_prog.o test_lirc_mode2_kern.o \
get_cgroup_id_kern.o socket_cookie_prog.o test_select_reuseport_kern.o \
test_skb_cgroup_id_kern.o bpf_flow.o netcnt_prog.o \
-   test_sk_lookup_kern.o test_xdp_vlan.o test_queue_map.o test_stack_map.o
+   test_sk_lookup_kern.o test_xdp_vlan.o test_queue_map.o test_stack_map.o \
+   test_mapinmap.o
 
 # Order correspond to 'make run_tests' order
 TEST_PROGS := test_kmod.sh \
diff --git a/tools/testing/selftests/bpf/test_mapinmap.c b/tools/testing/selftests/bpf/test_mapinmap.c
new file mode 100644
index ..ce923e67e08e
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_mapinmap.c
@@ -0,0 +1,49 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2018 Facebook */
+#include 
+#include 
+#include 
+#include "bpf_helpers.h"
+
+struct bpf_map_def SEC("maps") mim_array = {
+   .type = BPF_MAP_TYPE_ARRAY_OF_MAPS,
+   .key_size = sizeof(int),
+   /* must be sizeof(__u32) for map in map */
+   .value_size = sizeof(__u32),
+   .max_entries = 1,
+   .map_flags = 0,
+};
+
+struct bpf_map_def SEC("maps") mim_hash = {
+   .type = BPF_MAP_TYPE_HASH_OF_MAPS,
+   .key_size = sizeof(int),
+   /* must be sizeof(__u32) for map in map */
+   .value_size = sizeof(__u32),
+   .max_entries = 1,
+   .map_flags = 0,
+};
+
+SEC("xdp_mimtest")
+int xdp_mimtest0(struct xdp_md *ctx)
+{
+   int value = 123;
+   int key = 0;
+   void *map;
+
+   map = bpf_map_lookup_elem(&mim_array, &key);
+   if (!map)
+       return XDP_DROP;
+
+   bpf_map_update_elem(map, &key, &value, 0);
+
+   map = bpf_map_lookup_elem(&mim_hash, &key);
+   if (!map)
+       return XDP_DROP;
+
+   bpf_map_update_elem(map, &key, &value, 0);
+
+   return XDP_PASS;
+}
+
+int _version SEC("version") = 1;
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
index 4db2116e52be..6f2cf1a8a1b6 100644
--- a/tools/testing/selftests/bpf/test_maps.c
+++ b/tools/testing/selftests/bpf/test_maps.c
@@ -1080,6 +1080,86 @@ static void test_sockmap(int tasks, void *data)
exit(1);
 }
 
+#define MAPINMAP_PROG "./test_mapinmap.o"
+static void test_mapinmap(void)
+{
+   struct bpf_program *prog;
+   struct bpf_object *obj;
+   struct bpf_map *map;
+   int mim_fd, fd, err;
+   int pos = 0;
+
+   obj = bpf_object__open(MAPINMAP_PROG);
+
+   fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(int), sizeof(int),
+   2, 0);
+   if (fd < 0) {
+   printf("Failed to create hashmap '%s'!\n", strerror(errno));
+   exit(1);
+   }
+
+   map = bpf_object__find_map_by_name(obj, "mim_array");
+   if (IS_ERR(map)) {
+   printf("Failed to load array of maps from test prog\n");
+   goto out_mapinmap;
+   }
+   bpf_map__set_inner_map_fd(map, fd);
+
+   map = bpf_object__find_map_by_name(obj, "mim_hash");
+   if (IS_ERR(map)) {
+   printf("Failed to load hash of maps from test prog\n");
+   goto out_mapinmap;
+   }
+   bpf_map__set_inner_map_fd(map, fd);
+
+   bpf_object__for_each_program(prog, obj) {
+   bpf_program__set_xdp(prog);
+   }
+   bpf_object__load(obj);
+
+   map = bpf_object__find_map_by_name(obj, "mim_array");
+   if (IS_ERR(map)) {
+   printf("Failed to load array of maps from test prog\n");
+   goto out_mapinmap;
+   }
+   mim_fd = bpf_map__fd(map);
+   if (mim_fd < 0) {
+   printf("Failed to get descriptor for array of maps\n");
+   goto out_mapinmap;
+   }
+
+   err = bpf_map_update_elem(mim_fd, &pos, &fd, 0);
+   if (err) {
+   printf("Failed to update array of maps\n");
+   goto out_mapinmap;
+   }
+
+   map = bpf_object__find_map_by_name(obj, "mim_hash");
+   if (IS_ERR(map)) {
+ 

[PATCH v4 bpf-next 0/2] bpf: adding support for mapinmap in libbpf

2018-11-20 Thread Nikita V. Shirokov
This patch series adds a helper to libbpf that allows it to load
map-in-map types (BPF_MAP_TYPE_ARRAY_OF_MAPS and
BPF_MAP_TYPE_HASH_OF_MAPS).
The first patch contains the new helper and explains the proposed workflow.
The second patch contains tests, which can also serve as a usage example.

v3->v4:
 - renamed helper to set_inner_map_fd
 - now we set this value only if it hasn't
   been set before and only for (array|hash) of maps

v2->v3:
 - fixing typo in patch description
 - initializing inner_map_fd to -1 by default

v1->v2:
 - addressing nits
 - removing const qualifier from fd in new helper
 - starting to check return value of bpf_map_update_elem

Nikita V. Shirokov (2):
  bpf: adding support for map in map in libbpf
  bpf: adding tests for mapinmap helper in libbpf

 tools/lib/bpf/libbpf.c  | 33 +---
 tools/lib/bpf/libbpf.h  |  2 +
 tools/testing/selftests/bpf/Makefile|  3 +-
 tools/testing/selftests/bpf/test_mapinmap.c | 49 +
 tools/testing/selftests/bpf/test_maps.c | 82 +
 5 files changed, 162 insertions(+), 7 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/test_mapinmap.c

-- 
2.15.1



[PATCH v4 bpf-next 1/2] bpf: adding support for map in map in libbpf

2018-11-20 Thread Nikita V. Shirokov
The idea is pretty simple: for a specified map (pointed to by struct bpf_map),
we provide the descriptor of an already loaded map, which is used as a
prototype for the inner map. Proposed workflow:
1) open the BPF object (bpf_object__open)
2) create the BPF map that is going to be used as a prototype
3) find (by name) the map-in-map you want to load and set the
descriptor of the inner map on it with the new helper from this patch
4) load the BPF program with bpf_object__load

Signed-off-by: Nikita V. Shirokov 
Acked-by: Yonghong Song 
---
 tools/lib/bpf/libbpf.c | 33 +++--
 tools/lib/bpf/libbpf.h |  2 ++
 2 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index a01eb9584e52..0f46e8497ab8 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -163,6 +163,7 @@ struct bpf_map {
char *name;
size_t offset;
int map_ifindex;
+   int inner_map_fd;
struct bpf_map_def def;
__u32 btf_key_type_id;
__u32 btf_value_type_id;
@@ -585,6 +586,14 @@ static int compare_bpf_map(const void *_a, const void *_b)
return a->offset - b->offset;
 }
 
+static bool bpf_map_type__is_mapinmap(enum bpf_map_type type)
+{
+   if (type == BPF_MAP_TYPE_ARRAY_OF_MAPS ||
+   type == BPF_MAP_TYPE_HASH_OF_MAPS)
+   return true;
+   return false;
+}
+
 static int
 bpf_object__init_maps(struct bpf_object *obj, int flags)
 {
@@ -648,13 +657,15 @@ bpf_object__init_maps(struct bpf_object *obj, int flags)
}
obj->nr_maps = nr_maps;
 
-   /*
-* fill all fd with -1 so won't close incorrect
-* fd (fd=0 is stdin) when failure (zclose won't close
-* negative fd)).
-*/
-   for (i = 0; i < nr_maps; i++)
+   for (i = 0; i < nr_maps; i++) {
+   /*
+* fill all fd with -1 so won't close incorrect
+* fd (fd=0 is stdin) when failure (zclose won't close
+* negative fd)).
+*/
obj->maps[i].fd = -1;
+   obj->maps[i].inner_map_fd = -1;
+   }
 
/*
 * Fill obj->maps using data in "maps" section.
@@ -1146,6 +1157,9 @@ bpf_object__create_maps(struct bpf_object *obj)
create_attr.btf_fd = 0;
create_attr.btf_key_type_id = 0;
create_attr.btf_value_type_id = 0;
+   if (bpf_map_type__is_mapinmap(def->type) &&
+   map->inner_map_fd >= 0)
+   create_attr.inner_map_fd = map->inner_map_fd;
 
if (obj->btf && !bpf_map_find_btf_info(map, obj->btf)) {
create_attr.btf_fd = btf__fd(obj->btf);
@@ -2562,6 +2576,13 @@ void bpf_map__set_ifindex(struct bpf_map *map, __u32 ifindex)
map->map_ifindex = ifindex;
 }
 
+void bpf_map__set_inner_map_fd(struct bpf_map *map, int fd)
+{
+   if (bpf_map_type__is_mapinmap(map->def.type) &&
+   map->inner_map_fd == -1)
+   map->inner_map_fd = fd;
+}
+
 static struct bpf_map *
 __bpf_map__iter(struct bpf_map *m, struct bpf_object *obj, int i)
 {
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index b1686a787102..e2132c8c84ae 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -293,6 +293,8 @@ LIBBPF_API void bpf_map__set_ifindex(struct bpf_map *map, __u32 ifindex);
 LIBBPF_API int bpf_map__pin(struct bpf_map *map, const char *path);
 LIBBPF_API int bpf_map__unpin(struct bpf_map *map, const char *path);
 
+LIBBPF_API void bpf_map__set_inner_map_fd(struct bpf_map *map, int fd);
+
 LIBBPF_API long libbpf_get_error(const void *ptr);
 
 struct bpf_prog_load_attr {
-- 
2.15.1
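
The create-time condition added in bpf_object__create_maps can be
sketched on its own. This is a model for illustration only:
struct create_attr_model, enum map_type and fill_inner_map_fd are names
invented here, and the sketch assumes the attribute struct starts out
zeroed, which the hunk itself does not show.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the libbpf/kernel types used in the hunk. */
enum map_type {
	MAP_TYPE_HASH,
	MAP_TYPE_ARRAY_OF_MAPS,
	MAP_TYPE_HASH_OF_MAPS,
};

struct create_attr_model {
	int inner_map_fd;
};

static bool is_mapinmap(enum map_type type)
{
	return type == MAP_TYPE_ARRAY_OF_MAPS ||
	       type == MAP_TYPE_HASH_OF_MAPS;
}

/*
 * Copy the prototype descriptor into the creation attributes only for
 * map-in-map types, and only when a descriptor has been set (>= 0).
 */
static void fill_inner_map_fd(struct create_attr_model *attr,
			      enum map_type type, int inner_map_fd)
{
	if (is_mapinmap(type) && inner_map_fd >= 0)
		attr->inner_map_fd = inner_map_fd;
}
```

With this guard an ordinary hash map, or a map-in-map whose descriptor
was never set, leaves the attribute untouched in the model.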



[PATCH v3 bpf-next 0/2] bpf: adding support for mapinmap in libbpf

2018-11-20 Thread Nikita V. Shirokov
This patch series adds a helper to libbpf that allows it to load
map-in-map types (BPF_MAP_TYPE_ARRAY_OF_MAPS and
BPF_MAP_TYPE_HASH_OF_MAPS).
The first patch contains the new helper and explains the proposed workflow.
The second patch contains tests, which can also serve as a usage example.

v2->v3:
 - fixing typo in patch description
 - initializing inner_map_fd to -1 by default

v1->v2:
 - addressing nits
 - removing const qualifier from fd in new helper
 - starting to check return value of bpf_map_update_elem

Nikita V. Shirokov (2):
  bpf: adding support for map in map in libbpf
  bpf: adding tests for mapinmap helper in libbpf

 tools/lib/bpf/libbpf.c  | 11 +++-
 tools/lib/bpf/libbpf.h  |  2 +
 tools/testing/selftests/bpf/Makefile|  3 +-
 tools/testing/selftests/bpf/test_mapinmap.c | 49 +
 tools/testing/selftests/bpf/test_maps.c | 82 +
 5 files changed, 145 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/test_mapinmap.c

-- 
2.15.1



[PATCH v3 bpf-next 2/2] bpf: adding tests for mapinmap helper in libbpf

2018-11-20 Thread Nikita V. Shirokov
adding test/example of bpf_map__add_inner_map_fd usage

Signed-off-by: Nikita V. Shirokov 
Acked-by: Yonghong Song 
---
 tools/testing/selftests/bpf/Makefile|  3 +-
 tools/testing/selftests/bpf/test_mapinmap.c | 49 +
 tools/testing/selftests/bpf/test_maps.c | 82 +
 3 files changed, 133 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/test_mapinmap.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 57b4712a6276..a3ea69dc9bdf 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -38,7 +38,8 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test
test_lwt_seg6local.o sendmsg4_prog.o sendmsg6_prog.o test_lirc_mode2_kern.o \
get_cgroup_id_kern.o socket_cookie_prog.o test_select_reuseport_kern.o \
test_skb_cgroup_id_kern.o bpf_flow.o netcnt_prog.o \
-   test_sk_lookup_kern.o test_xdp_vlan.o test_queue_map.o test_stack_map.o
+   test_sk_lookup_kern.o test_xdp_vlan.o test_queue_map.o test_stack_map.o \
+   test_mapinmap.o
 
 # Order correspond to 'make run_tests' order
 TEST_PROGS := test_kmod.sh \
diff --git a/tools/testing/selftests/bpf/test_mapinmap.c b/tools/testing/selftests/bpf/test_mapinmap.c
new file mode 100644
index ..ce923e67e08e
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_mapinmap.c
@@ -0,0 +1,49 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2018 Facebook */
+#include 
+#include 
+#include 
+#include "bpf_helpers.h"
+
+struct bpf_map_def SEC("maps") mim_array = {
+   .type = BPF_MAP_TYPE_ARRAY_OF_MAPS,
+   .key_size = sizeof(int),
+   /* must be sizeof(__u32) for map in map */
+   .value_size = sizeof(__u32),
+   .max_entries = 1,
+   .map_flags = 0,
+};
+
+struct bpf_map_def SEC("maps") mim_hash = {
+   .type = BPF_MAP_TYPE_HASH_OF_MAPS,
+   .key_size = sizeof(int),
+   /* must be sizeof(__u32) for map in map */
+   .value_size = sizeof(__u32),
+   .max_entries = 1,
+   .map_flags = 0,
+};
+
+SEC("xdp_mimtest")
+int xdp_mimtest0(struct xdp_md *ctx)
+{
+   int value = 123;
+   int key = 0;
+   void *map;
+
+   map = bpf_map_lookup_elem(&mim_array, &key);
+   if (!map)
+       return XDP_DROP;
+
+   bpf_map_update_elem(map, &key, &value, 0);
+
+   map = bpf_map_lookup_elem(&mim_hash, &key);
+   if (!map)
+       return XDP_DROP;
+
+   bpf_map_update_elem(map, &key, &value, 0);
+
+   return XDP_PASS;
+}
+
+int _version SEC("version") = 1;
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
index 4db2116e52be..6bbcb3ba5787 100644
--- a/tools/testing/selftests/bpf/test_maps.c
+++ b/tools/testing/selftests/bpf/test_maps.c
@@ -1080,6 +1080,86 @@ static void test_sockmap(int tasks, void *data)
exit(1);
 }
 
+#define MAPINMAP_PROG "./test_mapinmap.o"
+static void test_mapinmap(void)
+{
+   struct bpf_program *prog;
+   struct bpf_object *obj;
+   struct bpf_map *map;
+   int mim_fd, fd, err;
+   int pos = 0;
+
+   obj = bpf_object__open(MAPINMAP_PROG);
+
+   fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(int), sizeof(int),
+   2, 0);
+   if (fd < 0) {
+   printf("Failed to create hashmap '%s'!\n", strerror(errno));
+   exit(1);
+   }
+
+   map = bpf_object__find_map_by_name(obj, "mim_array");
+   if (IS_ERR(map)) {
+   printf("Failed to load array of maps from test prog\n");
+   goto out_mapinmap;
+   }
+   bpf_map__add_inner_map_fd(map, fd);
+
+   map = bpf_object__find_map_by_name(obj, "mim_hash");
+   if (IS_ERR(map)) {
+   printf("Failed to load hash of maps from test prog\n");
+   goto out_mapinmap;
+   }
+   bpf_map__add_inner_map_fd(map, fd);
+
+   bpf_object__for_each_program(prog, obj) {
+   bpf_program__set_xdp(prog);
+   }
+   bpf_object__load(obj);
+
+   map = bpf_object__find_map_by_name(obj, "mim_array");
+   if (IS_ERR(map)) {
+   printf("Failed to load array of maps from test prog\n");
+   goto out_mapinmap;
+   }
+   mim_fd = bpf_map__fd(map);
+   if (mim_fd < 0) {
+   printf("Failed to get descriptor for array of maps\n");
+   goto out_mapinmap;
+   }
+
+   err = bpf_map_update_elem(mim_fd, &pos, &fd, 0);
+   if (err) {
+   printf("Failed to update array of maps\n");
+   goto out_mapinmap;
+   }
+
+   map = bpf_object__find_map_by_name(obj, "mim_hash");
+   if (IS_ERR(map)) {
+ 

[PATCH v3 bpf-next 1/2] bpf: adding support for map in map in libbpf

2018-11-20 Thread Nikita V. Shirokov
The idea is pretty simple: for a specified map (pointed to by struct bpf_map),
we provide the descriptor of an already loaded map, which is used as a
prototype for the inner map. Proposed workflow:
1) open the BPF object (bpf_object__open)
2) create the BPF map that is going to be used as a prototype
3) find (by name) the map-in-map you want to load and set the
descriptor of the inner map on it with the new helper from this patch
4) load the BPF program with bpf_object__load

inner_map_fd is ignored by all map types other than (hash|array) of
maps

Signed-off-by: Nikita V. Shirokov 
Acked-by: Yonghong Song 
---
 tools/lib/bpf/libbpf.c | 11 ++-
 tools/lib/bpf/libbpf.h |  2 ++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index a01eb9584e52..7e130e0c8fc9 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -163,6 +163,7 @@ struct bpf_map {
char *name;
size_t offset;
int map_ifindex;
+   int inner_map_fd;
struct bpf_map_def def;
__u32 btf_key_type_id;
__u32 btf_value_type_id;
@@ -653,8 +654,10 @@ bpf_object__init_maps(struct bpf_object *obj, int flags)
 * fd (fd=0 is stdin) when failure (zclose won't close
 * negative fd)).
 */
-   for (i = 0; i < nr_maps; i++)
+   for (i = 0; i < nr_maps; i++) {
obj->maps[i].fd = -1;
+   obj->maps[i].inner_map_fd = -1;
+   }
 
/*
 * Fill obj->maps using data in "maps" section.
@@ -1146,6 +1149,7 @@ bpf_object__create_maps(struct bpf_object *obj)
create_attr.btf_fd = 0;
create_attr.btf_key_type_id = 0;
create_attr.btf_value_type_id = 0;
+   create_attr.inner_map_fd = map->inner_map_fd;
 
if (obj->btf && !bpf_map_find_btf_info(map, obj->btf)) {
create_attr.btf_fd = btf__fd(obj->btf);
@@ -2562,6 +2566,11 @@ void bpf_map__set_ifindex(struct bpf_map *map, __u32 ifindex)
map->map_ifindex = ifindex;
 }
 
+void bpf_map__add_inner_map_fd(struct bpf_map *map, int fd)
+{
+   map->inner_map_fd = fd;
+}
+
 static struct bpf_map *
 __bpf_map__iter(struct bpf_map *m, struct bpf_object *obj, int i)
 {
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index b1686a787102..0a0b7e0ed554 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -293,6 +293,8 @@ LIBBPF_API void bpf_map__set_ifindex(struct bpf_map *map, __u32 ifindex);
 LIBBPF_API int bpf_map__pin(struct bpf_map *map, const char *path);
 LIBBPF_API int bpf_map__unpin(struct bpf_map *map, const char *path);
 
+LIBBPF_API void bpf_map__add_inner_map_fd(struct bpf_map *map, int fd);
+
 LIBBPF_API long libbpf_get_error(const void *ptr);
 
 struct bpf_prog_load_attr {
-- 
2.15.1
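
The reason for initializing inner_map_fd to -1 can be illustrated with a
small model of the init loop. It is a sketch under assumptions rather
than the libbpf code: struct map_model and alloc_maps are hypothetical
names, and the sketch assumes the map array is zero-allocated, in which
case every descriptor field would otherwise read as 0, a valid file
descriptor (stdin, as the comment in the hunk notes).

```c
#include <stdlib.h>

/* Hypothetical stand-in for the two descriptor fields of struct bpf_map. */
struct map_model {
	int fd;
	int inner_map_fd;
};

/*
 * Allocate nr_maps zeroed entries, then mark both descriptors as
 * "not set" so that 0 is never mistaken for a real descriptor.
 */
static struct map_model *alloc_maps(size_t nr_maps)
{
	struct map_model *maps = calloc(nr_maps, sizeof(*maps));
	size_t i;

	if (!maps)
		return NULL;

	for (i = 0; i < nr_maps; i++) {
		maps[i].fd = -1;
		maps[i].inner_map_fd = -1;
	}
	return maps;
}
```

The caller owns the returned array and releases it with free().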



[PATCH v2 bpf-next 2/2] bpf: adding tests for mapinmap helper in libbpf

2018-11-19 Thread Nikita V. Shirokov
adding test/example of bpf_map__add_inner_map_fd usage

Signed-off-by: Nikita V. Shirokov 
---
 tools/testing/selftests/bpf/Makefile|  3 +-
 tools/testing/selftests/bpf/test_mapinmap.c | 49 +
 tools/testing/selftests/bpf/test_maps.c | 82 +
 3 files changed, 133 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/test_mapinmap.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 57b4712a6276..a3ea69dc9bdf 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -38,7 +38,8 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test
test_lwt_seg6local.o sendmsg4_prog.o sendmsg6_prog.o test_lirc_mode2_kern.o \
get_cgroup_id_kern.o socket_cookie_prog.o test_select_reuseport_kern.o \
test_skb_cgroup_id_kern.o bpf_flow.o netcnt_prog.o \
-   test_sk_lookup_kern.o test_xdp_vlan.o test_queue_map.o test_stack_map.o
+   test_sk_lookup_kern.o test_xdp_vlan.o test_queue_map.o test_stack_map.o \
+   test_mapinmap.o
 
 # Order correspond to 'make run_tests' order
 TEST_PROGS := test_kmod.sh \
diff --git a/tools/testing/selftests/bpf/test_mapinmap.c b/tools/testing/selftests/bpf/test_mapinmap.c
new file mode 100644
index ..ce923e67e08e
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_mapinmap.c
@@ -0,0 +1,49 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2018 Facebook */
+#include 
+#include 
+#include 
+#include "bpf_helpers.h"
+
+struct bpf_map_def SEC("maps") mim_array = {
+   .type = BPF_MAP_TYPE_ARRAY_OF_MAPS,
+   .key_size = sizeof(int),
+   /* must be sizeof(__u32) for map in map */
+   .value_size = sizeof(__u32),
+   .max_entries = 1,
+   .map_flags = 0,
+};
+
+struct bpf_map_def SEC("maps") mim_hash = {
+   .type = BPF_MAP_TYPE_HASH_OF_MAPS,
+   .key_size = sizeof(int),
+   /* must be sizeof(__u32) for map in map */
+   .value_size = sizeof(__u32),
+   .max_entries = 1,
+   .map_flags = 0,
+};
+
+SEC("xdp_mimtest")
+int xdp_mimtest0(struct xdp_md *ctx)
+{
+   int value = 123;
+   int key = 0;
+   void *map;
+
+   map = bpf_map_lookup_elem(&mim_array, &key);
+   if (!map)
+   return XDP_DROP;
+
+   bpf_map_update_elem(map, &key, &value, 0);
+
+   map = bpf_map_lookup_elem(&mim_hash, &key);
+   if (!map)
+   return XDP_DROP;
+
+   bpf_map_update_elem(map, &key, &value, 0);
+
+   return XDP_PASS;
+}
+
+int _version SEC("version") = 1;
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_maps.c 
b/tools/testing/selftests/bpf/test_maps.c
index 4db2116e52be..b84f069c2aa9 100644
--- a/tools/testing/selftests/bpf/test_maps.c
+++ b/tools/testing/selftests/bpf/test_maps.c
@@ -1080,6 +1080,86 @@ static void test_sockmap(int tasks, void *data)
exit(1);
 }
 
+#define MAPINMAP_PROG "./test_mapinmap.o"
+static void test_mapinmap(void)
+{
+   int mim_fd, fd, test_fd, err;
+   struct bpf_program *prog;
+   struct bpf_object *obj;
+   struct bpf_map *map;
+   int pos = 0;
+
+   obj = bpf_object__open(MAPINMAP_PROG);
+
+   fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(int), sizeof(int),
+   2, 0);
+   if (fd < 0) {
+   printf("Failed to create hashmap '%s'!\n", strerror(errno));
+   exit(1);
+   }
+
+   map = bpf_object__find_map_by_name(obj, "mim_array");
+   if (IS_ERR(map)) {
+   printf("Failed to load array of maps from test prog\n");
+   goto out_mapinmap;
+   }
+   bpf_map__add_inner_map_fd(map, fd);
+
+   map = bpf_object__find_map_by_name(obj, "mim_hash");
+   if (IS_ERR(map)) {
+   printf("Failed to load hash of maps from test prog\n");
+   goto out_mapinmap;
+   }
+   bpf_map__add_inner_map_fd(map, fd);
+
+   bpf_object__for_each_program(prog, obj) {
+   bpf_program__set_xdp(prog);
+   }
+   bpf_object__load(obj);
+
+   map = bpf_object__find_map_by_name(obj, "mim_array");
+   if (IS_ERR(map)) {
+   printf("Failed to load array of maps from test prog\n");
+   goto out_mapinmap;
+   }
+   mim_fd = bpf_map__fd(map);
+   if (mim_fd < 0) {
+   printf("Failed to get descriptor for array of maps\n");
+   goto out_mapinmap;
+   }
+
+   err = bpf_map_update_elem(mim_fd, &pos, &fd, 0);
+   if (err) {
+   printf("Failed to update array of maps\n");
+   goto out_mapinmap;
+   }
+
+   map = bpf_object__find_map_by_name(obj, "mim_hash");
+   if (IS_ERR(map)) {
+   printf("

[PATCH v2 bpf-next 1/2] bpf: adding support for map in map in libbpf

2018-11-19 Thread Nikita V. Shirokov
idea is pretty simple. for specified map (pointed by struct bpf_map)
we would provide descriptor of already loaded map, which is going to be
used as a prototype for inner map. proposed workflow:
1) open bpf's object (bpf_object__open)
2) create bpf's map which is going to be used as a prototype
3) find (by name) map-in-map which you want to load and update w/
descriptor of inner map w/ a new helper from this patch
4) load bpf program w/ bpf_object__load

inner_map_fd is ignored by any other maps aside from (hash|array) of
maps

Signed-off-by: Nikita V. Shirokov 
---
 tools/lib/bpf/libbpf.c | 7 +++
 tools/lib/bpf/libbpf.h | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index a01eb9584e52..0e229ab037dc 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -163,6 +163,7 @@ struct bpf_map {
char *name;
size_t offset;
int map_ifindex;
+   int inner_map_fd;
struct bpf_map_def def;
__u32 btf_key_type_id;
__u32 btf_value_type_id;
@@ -1146,6 +1147,7 @@ bpf_object__create_maps(struct bpf_object *obj)
create_attr.btf_fd = 0;
create_attr.btf_key_type_id = 0;
create_attr.btf_value_type_id = 0;
+   create_attr.inner_map_fd = map->inner_map_fd;
 
if (obj->btf && !bpf_map_find_btf_info(map, obj->btf)) {
create_attr.btf_fd = btf__fd(obj->btf);
@@ -2562,6 +2564,11 @@ void bpf_map__set_ifindex(struct bpf_map *map, __u32 
ifindex)
map->map_ifindex = ifindex;
 }
 
+void bpf_map__add_inner_map_fd(struct bpf_map *map, int fd)
+{
+   map->inner_map_fd = fd;
+}
+
 static struct bpf_map *
 __bpf_map__iter(struct bpf_map *m, struct bpf_object *obj, int i)
 {
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index b1686a787102..0a0b7e0ed554 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -293,6 +293,8 @@ LIBBPF_API void bpf_map__set_ifindex(struct bpf_map *map, 
__u32 ifindex);
 LIBBPF_API int bpf_map__pin(struct bpf_map *map, const char *path);
 LIBBPF_API int bpf_map__unpin(struct bpf_map *map, const char *path);
 
+LIBBPF_API void bpf_map__add_inner_map_fd(struct bpf_map *map, int fd);
+
 LIBBPF_API long libbpf_get_error(const void *ptr);
 
 struct bpf_prog_load_attr {
-- 
2.15.1
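
The ordering in the proposed workflow (set the prototype fd first, load second) can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: every model_* name below is hypothetical and is not part of libbpf, and "-1 means unset" is a convention of the model, not something the patch defines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; none of these exist in libbpf. */
enum model_map_type {
	MODEL_HASH,
	MODEL_ARRAY_OF_MAPS,
	MODEL_HASH_OF_MAPS,
};

struct model_map {
	enum model_map_type type;
	int inner_map_fd;	/* prototype fd, -1 when unset (model convention) */
	int fd;			/* -1 until the map is "created" */
};

static int model_is_map_in_map(enum model_map_type type)
{
	return type == MODEL_ARRAY_OF_MAPS || type == MODEL_HASH_OF_MAPS;
}

/* Same shape as bpf_map__add_inner_map_fd(): only records the fd. */
static void model_add_inner_map_fd(struct model_map *map, int fd)
{
	assert(map != NULL);
	map->inner_map_fd = fd;
}

/*
 * Models the create step that runs during load: an outer map can only be
 * created once a prototype fd is known, every other map type ignores it.
 * Returns 0 on success, -1 on failure.
 */
static int model_create_map(struct model_map *map, int new_fd)
{
	if (model_is_map_in_map(map->type) && map->inner_map_fd < 0)
		return -1;
	map->fd = new_fd;
	return 0;
}
```

In this model, creating an array-of-maps before model_add_inner_map_fd() was called fails, while a plain hash map is created regardless of inner_map_fd.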



[PATCH v2 bpf-next 0/2] bpf: adding support for mapinmap in libbpf

2018-11-19 Thread Nikita V. Shirokov
in this patch series i'm adding a helper for libbpf which would allow
it to load map-in-map (BPF_MAP_TYPE_ARRAY_OF_MAPS and
BPF_MAP_TYPE_HASH_OF_MAPS).
first patch contains new helper + explains proposed workflow
second patch contains tests which also could be used as example of
usage

v1->v2:
 - addressing nits
 - removing const identifier from fd in new helper
 - starting to check return val for bpf_map_update_elem

Nikita V. Shirokov (2):
  bpf: adding support for map in map in libbpf
  bpf: adding tests for mapinmap helper in libbpf

 tools/lib/bpf/libbpf.c  |  7 +++
 tools/lib/bpf/libbpf.h  |  2 +
 tools/testing/selftests/bpf/Makefile|  3 +-
 tools/testing/selftests/bpf/test_mapinmap.c | 49 +
 tools/testing/selftests/bpf/test_maps.c | 82 +
 5 files changed, 142 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/test_mapinmap.c

-- 
2.15.1



Re: [PATCH bpf-next 2/2] bpf: adding tests for mapinmap helper in libbpf

2018-11-19 Thread Nikita V. Shirokov
On Mon, Nov 19, 2018 at 05:18:46PM -0800, Y Song wrote:
> On Mon, Nov 19, 2018 at 4:13 PM Nikita V. Shirokov  
> wrote:
> >
> > adding test/example of bpf_map__add_inner_map_fd usage
> >
> > Signed-off-by: Nikita V. Shirokov 
> > ---
> >  tools/testing/selftests/bpf/Makefile|  3 +-
> >  tools/testing/selftests/bpf/test_mapinmap.c | 53 
> >  tools/testing/selftests/bpf/test_maps.c | 76 
> > +
> >  3 files changed, 131 insertions(+), 1 deletion(-)
> >  create mode 100644 tools/testing/selftests/bpf/test_mapinmap.c
> >
> > diff --git a/tools/testing/selftests/bpf/Makefile 
> > b/tools/testing/selftests/bpf/Makefile
> > index 57b4712a6276..a3ea69dc9bdf 100644
> > --- a/tools/testing/selftests/bpf/Makefile
> > +++ b/tools/testing/selftests/bpf/Makefile
> > @@ -38,7 +38,8 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o 
> > test_tcp_estats.o test
> > test_lwt_seg6local.o sendmsg4_prog.o sendmsg6_prog.o 
> > test_lirc_mode2_kern.o \
> > get_cgroup_id_kern.o socket_cookie_prog.o 
> > test_select_reuseport_kern.o \
> > test_skb_cgroup_id_kern.o bpf_flow.o netcnt_prog.o \
> > -   test_sk_lookup_kern.o test_xdp_vlan.o test_queue_map.o 
> > test_stack_map.o
> > +   test_sk_lookup_kern.o test_xdp_vlan.o test_queue_map.o 
> > test_stack_map.o \
> > +   test_mapinmap.o
> >
> >  # Order correspond to 'make run_tests' order
> >  TEST_PROGS := test_kmod.sh \
> > diff --git a/tools/testing/selftests/bpf/test_mapinmap.c 
> > b/tools/testing/selftests/bpf/test_mapinmap.c
> > new file mode 100644
> > index ..8aef6c652c9c
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/test_mapinmap.c
> > @@ -0,0 +1,53 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/* Copyright (c) 2018 Facebook */
> > +#include 
> > +#include 
> > +#include 
> > +#include "bpf_helpers.h"
> > +
> > +
> nit: extra new line
> 
> > +struct bpf_map_def SEC("maps") mim_array = {
> > +   .type = BPF_MAP_TYPE_ARRAY_OF_MAPS,
> > +   .key_size = sizeof(int),
> > +   /* must be sizeof(__u32) for map in map */
> > +   .value_size = sizeof(__u32),
> > +   .max_entries = 1,
> > +   .map_flags = 0,
> > +};
> > +
> > +struct bpf_map_def SEC("maps") mim_hash = {
> > +   .type = BPF_MAP_TYPE_HASH_OF_MAPS,
> > +   .key_size = sizeof(int),
> > +   /* must be sizeof(__u32) for map in map */
> > +   .value_size = sizeof(__u32),
> > +   .max_entries = 1,
> > +   .map_flags = 0,
> > +};
> > +
> > +
> > +
> nit: extra new lines.
> > +SEC("xdp_mimtest")
> > +int xdp_mimtest0(struct xdp_md *ctx)
> > +{
> > +   int value = 123;
> > +   int key = 0;
> > +   void *map;
> > +
> > +   map = bpf_map_lookup_elem(&mim_array, &key);
> > +   if (!map)
> > +   return XDP_DROP;
> > +
> > +   bpf_map_update_elem(map, &key, &value, 0);
> > +
> > +   map = bpf_map_lookup_elem(&mim_hash, &key);
> > +   if (!map)
> > +   return XDP_DROP;
> > +
> > +   bpf_map_update_elem(map, &key, &value, 0);
> > +
> > +   return XDP_PASS;
> > +}
> > +
> > +
> nit: extra new line
> > +int _version SEC("version") = 1;
> > +char _license[] SEC("license") = "GPL";
> > diff --git a/tools/testing/selftests/bpf/test_maps.c 
> > b/tools/testing/selftests/bpf/test_maps.c
> > index 4db2116e52be..a49ab294971d 100644
> > --- a/tools/testing/selftests/bpf/test_maps.c
> > +++ b/tools/testing/selftests/bpf/test_maps.c
> > @@ -1080,6 +1080,80 @@ static void test_sockmap(int tasks, void *data)
> > exit(1);
> >  }
> >
> > +#define MAPINMAP_PROG "./test_mapinmap.o"
> > +static void test_mapinmap(void)
> > +{
> > +   struct bpf_program *prog;
> > +   struct bpf_object *obj;
> > +   struct bpf_map *map;
> > +   int mim_fd, fd;
> > +   int pos = 0;
> > +
> > +   obj = bpf_object__open(MAPINMAP_PROG);
> > +
> > +   fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(int), sizeof(int),
> > +   2, 0);
> > +   if (fd < 0) {
> > +   printf("Failed to create hashmap '%s'!\n", strerror(errno));
> > +   exit(1);
> > + 

Re: [PATCH bpf-next 1/2] bpf: adding support for map in map in libbpf

2018-11-19 Thread Nikita V. Shirokov
On Mon, Nov 19, 2018 at 05:12:43PM -0800, Y Song wrote:
> On Mon, Nov 19, 2018 at 4:13 PM Nikita V. Shirokov  
> wrote:
> >
> > idea is pretty simple. for specified map (pointed by struct bpf_map)
> > we would provide descriptor of already loaded map, which is going to be
> > used as a prototype for inner map. proposed workflow:
> > 1) open bpf's object (bpf_object__open)
> > 2) create bpf's map which is going to be used as a prototype
> > 3) find (by name) map-in-map which you want to load and update w/
> > descriptor of inner map w/ a new helper from this patch
> > 4) load bpf program w/ bpf_object__load
> >
> > inner_map_fd is ignored by any other maps aside from (hash|array) of
> > maps
> >
> > Signed-off-by: Nikita V. Shirokov 
> > ---
> >  tools/lib/bpf/libbpf.c | 7 +++
> >  tools/lib/bpf/libbpf.h | 2 ++
> >  2 files changed, 9 insertions(+)
> >
> > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > index a01eb9584e52..a2ee1b1a93b6 100644
> > --- a/tools/lib/bpf/libbpf.c
> > +++ b/tools/lib/bpf/libbpf.c
> > @@ -163,6 +163,7 @@ struct bpf_map {
> > char *name;
> > size_t offset;
> > int map_ifindex;
> > +   int inner_map_fd;
> > struct bpf_map_def def;
> > __u32 btf_key_type_id;
> > __u32 btf_value_type_id;
> > @@ -1146,6 +1147,7 @@ bpf_object__create_maps(struct bpf_object *obj)
> > create_attr.btf_fd = 0;
> > create_attr.btf_key_type_id = 0;
> > create_attr.btf_value_type_id = 0;
> > +   create_attr.inner_map_fd = map->inner_map_fd;
> >
> > if (obj->btf && !bpf_map_find_btf_info(map, obj->btf)) {
> > create_attr.btf_fd = btf__fd(obj->btf);
> > @@ -2562,6 +2564,11 @@ void bpf_map__set_ifindex(struct bpf_map *map, __u32 
> > ifindex)
> > map->map_ifindex = ifindex;
> >  }
> >
> > +void bpf_map__add_inner_map_fd(struct bpf_map *map, const int fd)
> 
> Do we need "const" attribute here?
>

i can drop it in v2
 
> > +{
> > +   map->inner_map_fd = fd;
> > +}
> > +
> >  static struct bpf_map *
> >  __bpf_map__iter(struct bpf_map *m, struct bpf_object *obj, int i)
> >  {
> > diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> > index b1686a787102..7cb00cd41789 100644
> > --- a/tools/lib/bpf/libbpf.h
> > +++ b/tools/lib/bpf/libbpf.h
> > @@ -293,6 +293,8 @@ LIBBPF_API void bpf_map__set_ifindex(struct bpf_map 
> > *map, __u32 ifindex);
> >  LIBBPF_API int bpf_map__pin(struct bpf_map *map, const char *path);
> >  LIBBPF_API int bpf_map__unpin(struct bpf_map *map, const char *path);
> >
> > +LIBBPF_API void bpf_map__add_inner_map_fd(struct bpf_map *map, const int 
> > fd);
> > +
> >  LIBBPF_API long libbpf_get_error(const void *ptr);
> >
> >  struct bpf_prog_load_attr {
> > --
> > 2.15.1
> >

--
Nikita


[PATCH bpf-next 1/2] bpf: adding support for map in map in libbpf

2018-11-19 Thread Nikita V. Shirokov
idea is pretty simple. for specified map (pointed by struct bpf_map)
we would provide descriptor of already loaded map, which is going to be
used as a prototype for inner map. proposed workflow:
1) open bpf's object (bpf_object__open)
2) create bpf's map which is going to be used as a prototype
3) find (by name) map-in-map which you want to load and update w/
descriptor of inner map w/ a new helper from this patch
4) load bpf program w/ bpf_object__load

inner_map_fd is ignored by any other maps aside from (hash|array) of
maps

Signed-off-by: Nikita V. Shirokov 
---
 tools/lib/bpf/libbpf.c | 7 +++
 tools/lib/bpf/libbpf.h | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index a01eb9584e52..a2ee1b1a93b6 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -163,6 +163,7 @@ struct bpf_map {
char *name;
size_t offset;
int map_ifindex;
+   int inner_map_fd;
struct bpf_map_def def;
__u32 btf_key_type_id;
__u32 btf_value_type_id;
@@ -1146,6 +1147,7 @@ bpf_object__create_maps(struct bpf_object *obj)
create_attr.btf_fd = 0;
create_attr.btf_key_type_id = 0;
create_attr.btf_value_type_id = 0;
+   create_attr.inner_map_fd = map->inner_map_fd;
 
if (obj->btf && !bpf_map_find_btf_info(map, obj->btf)) {
create_attr.btf_fd = btf__fd(obj->btf);
@@ -2562,6 +2564,11 @@ void bpf_map__set_ifindex(struct bpf_map *map, __u32 
ifindex)
map->map_ifindex = ifindex;
 }
 
+void bpf_map__add_inner_map_fd(struct bpf_map *map, const int fd)
+{
+   map->inner_map_fd = fd;
+}
+
 static struct bpf_map *
 __bpf_map__iter(struct bpf_map *m, struct bpf_object *obj, int i)
 {
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index b1686a787102..7cb00cd41789 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -293,6 +293,8 @@ LIBBPF_API void bpf_map__set_ifindex(struct bpf_map *map, 
__u32 ifindex);
 LIBBPF_API int bpf_map__pin(struct bpf_map *map, const char *path);
 LIBBPF_API int bpf_map__unpin(struct bpf_map *map, const char *path);
 
+LIBBPF_API void bpf_map__add_inner_map_fd(struct bpf_map *map, const int fd);
+
 LIBBPF_API long libbpf_get_error(const void *ptr);
 
 struct bpf_prog_load_attr {
-- 
2.15.1



[PATCH bpf-next 0/2] bpf: adding support for mapinmap in libbpf

2018-11-19 Thread Nikita V. Shirokov
in this patch series i'm adding a helper for libbpf which would allow
it to load map-in-map (BPF_MAP_TYPE_ARRAY_OF_MAPS and
BPF_MAP_TYPE_HASH_OF_MAPS).
first patch contains new helper + explains proposed workflow
second patch contains tests which also could be used as example of
usage

Nikita V. Shirokov (2):
  bpf: adding support for map in map in libbpf
  bpf: adding tests for mapinmap helper in libbpf

 tools/lib/bpf/libbpf.c  |  7 +++
 tools/lib/bpf/libbpf.h  |  2 +
 tools/testing/selftests/bpf/Makefile|  3 +-
 tools/testing/selftests/bpf/test_mapinmap.c | 53 
 tools/testing/selftests/bpf/test_maps.c | 76 +
 5 files changed, 140 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/test_mapinmap.c

-- 
2.15.1



[PATCH bpf-next 2/2] bpf: adding tests for mapinmap helper in libbpf

2018-11-19 Thread Nikita V. Shirokov
adding test/example of bpf_map__add_inner_map_fd usage

Signed-off-by: Nikita V. Shirokov 
---
 tools/testing/selftests/bpf/Makefile|  3 +-
 tools/testing/selftests/bpf/test_mapinmap.c | 53 
 tools/testing/selftests/bpf/test_maps.c | 76 +
 3 files changed, 131 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/test_mapinmap.c

diff --git a/tools/testing/selftests/bpf/Makefile 
b/tools/testing/selftests/bpf/Makefile
index 57b4712a6276..a3ea69dc9bdf 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -38,7 +38,8 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o 
test_tcp_estats.o test
test_lwt_seg6local.o sendmsg4_prog.o sendmsg6_prog.o 
test_lirc_mode2_kern.o \
get_cgroup_id_kern.o socket_cookie_prog.o test_select_reuseport_kern.o \
test_skb_cgroup_id_kern.o bpf_flow.o netcnt_prog.o \
-   test_sk_lookup_kern.o test_xdp_vlan.o test_queue_map.o test_stack_map.o
+   test_sk_lookup_kern.o test_xdp_vlan.o test_queue_map.o test_stack_map.o 
\
+   test_mapinmap.o
 
 # Order correspond to 'make run_tests' order
 TEST_PROGS := test_kmod.sh \
diff --git a/tools/testing/selftests/bpf/test_mapinmap.c 
b/tools/testing/selftests/bpf/test_mapinmap.c
new file mode 100644
index ..8aef6c652c9c
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_mapinmap.c
@@ -0,0 +1,53 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2018 Facebook */
+#include 
+#include 
+#include 
+#include "bpf_helpers.h"
+
+
+struct bpf_map_def SEC("maps") mim_array = {
+   .type = BPF_MAP_TYPE_ARRAY_OF_MAPS,
+   .key_size = sizeof(int),
+   /* must be sizeof(__u32) for map in map */
+   .value_size = sizeof(__u32),
+   .max_entries = 1,
+   .map_flags = 0,
+};
+
+struct bpf_map_def SEC("maps") mim_hash = {
+   .type = BPF_MAP_TYPE_HASH_OF_MAPS,
+   .key_size = sizeof(int),
+   /* must be sizeof(__u32) for map in map */
+   .value_size = sizeof(__u32),
+   .max_entries = 1,
+   .map_flags = 0,
+};
+
+
+
+SEC("xdp_mimtest")
+int xdp_mimtest0(struct xdp_md *ctx)
+{
+   int value = 123;
+   int key = 0;
+   void *map;
+
+   map = bpf_map_lookup_elem(&mim_array, &key);
+   if (!map)
+   return XDP_DROP;
+
+   bpf_map_update_elem(map, &key, &value, 0);
+
+   map = bpf_map_lookup_elem(&mim_hash, &key);
+   if (!map)
+   return XDP_DROP;
+
+   bpf_map_update_elem(map, &key, &value, 0);
+
+   return XDP_PASS;
+}
+
+
+int _version SEC("version") = 1;
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_maps.c 
b/tools/testing/selftests/bpf/test_maps.c
index 4db2116e52be..a49ab294971d 100644
--- a/tools/testing/selftests/bpf/test_maps.c
+++ b/tools/testing/selftests/bpf/test_maps.c
@@ -1080,6 +1080,80 @@ static void test_sockmap(int tasks, void *data)
exit(1);
 }
 
+#define MAPINMAP_PROG "./test_mapinmap.o"
+static void test_mapinmap(void)
+{
+   struct bpf_program *prog;
+   struct bpf_object *obj;
+   struct bpf_map *map;
+   int mim_fd, fd;
+   int pos = 0;
+
+   obj = bpf_object__open(MAPINMAP_PROG);
+
+   fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(int), sizeof(int),
+   2, 0);
+   if (fd < 0) {
+   printf("Failed to create hashmap '%s'!\n", strerror(errno));
+   exit(1);
+   }
+   printf("fd is %d\n", fd);
+
+   map = bpf_object__find_map_by_name(obj, "mim_array");
+   if (IS_ERR(map)) {
+   printf("Failed to load array of maps from test prog\n");
+   goto out_mapinmap;
+   }
+   bpf_map__add_inner_map_fd(map, fd);
+
+   map = bpf_object__find_map_by_name(obj, "mim_hash");
+   if (IS_ERR(map)) {
+   printf("Failed to load hash of maps from test prog\n");
+   goto out_mapinmap;
+   }
+   bpf_map__add_inner_map_fd(map, fd);
+
+
+   bpf_object__for_each_program(prog, obj) {
+   bpf_program__set_xdp(prog);
+   }
+   bpf_object__load(obj);
+
+   map = bpf_object__find_map_by_name(obj, "mim_array");
+   if (IS_ERR(map)) {
+   printf("Failed to load array of maps from test prog\n");
+   goto out_mapinmap;
+   }
+   mim_fd = bpf_map__fd(map);
+   if (mim_fd < 0) {
+   printf("Failed to get descriptor for array of maps\n");
+   goto out_mapinmap;
+   }
+
+   bpf_map_update_elem(mim_fd, &pos, &fd, 0);
+
+   map = bpf_object__find_map_by_name(obj, "mim_hash");
+   if (IS_ERR(map)) {
+   printf("Failed to load hash of maps from test prog\n");
+ 

[PATCH bpf-next] adding selftest for bpf's (set|get)_sockopt for SAVE_SYN

2018-08-31 Thread Nikita V. Shirokov
adding selftest for feature, introduced in
commit 9452048c79404 ("bpf: add TCP_SAVE_SYN/TCP_SAVED_SYN options for
bpf_(set|get)sockopt")

Signed-off-by: Nikita V. Shirokov 
---
 .../testing/selftests/bpf/test_tcpbpf_kern.c  | 38 +--
 .../testing/selftests/bpf/test_tcpbpf_user.c  | 31 ++-
 2 files changed, 65 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_tcpbpf_kern.c 
b/tools/testing/selftests/bpf/test_tcpbpf_kern.c
index 4b7fd540cea9..74f73b33a7b0 100644
--- a/tools/testing/selftests/bpf/test_tcpbpf_kern.c
+++ b/tools/testing/selftests/bpf/test_tcpbpf_kern.c
@@ -5,6 +5,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -17,6 +18,13 @@ struct bpf_map_def SEC("maps") global_map = {
.type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(__u32),
.value_size = sizeof(struct tcpbpf_globals),
+   .max_entries = 4,
+};
+
+struct bpf_map_def SEC("maps") sockopt_results = {
+   .type = BPF_MAP_TYPE_ARRAY,
+   .key_size = sizeof(__u32),
+   .value_size = sizeof(int),
.max_entries = 2,
 };
 
@@ -45,11 +53,14 @@ int _version SEC("version") = 1;
 SEC("sockops")
 int bpf_testcb(struct bpf_sock_ops *skops)
 {
-   int rv = -1;
-   int bad_call_rv = 0;
+   char header[sizeof(struct ipv6hdr) + sizeof(struct tcphdr)];
+   struct tcphdr *thdr;
int good_call_rv = 0;
-   int op;
+   int bad_call_rv = 0;
+   int save_syn = 1;
+   int rv = -1;
int v = 0;
+   int op;
 
op = (int) skops->op;
 
@@ -82,6 +93,21 @@ int bpf_testcb(struct bpf_sock_ops *skops)
v = 0xff;
rv = bpf_setsockopt(skops, SOL_IPV6, IPV6_TCLASS, &v,
sizeof(v));
+   if (skops->family == AF_INET6) {
+   v = bpf_getsockopt(skops, IPPROTO_TCP, TCP_SAVED_SYN,
+  header, (sizeof(struct ipv6hdr) +
+   sizeof(struct tcphdr)));
+   if (!v) {
+   int offset = sizeof(struct ipv6hdr);
+
+   thdr = (struct tcphdr *)(header + offset);
+   v = thdr->syn;
+   __u32 key = 1;
+
+   bpf_map_update_elem(&sockopt_results, &key, &v,
+   BPF_ANY);
+   }
+   }
break;
case BPF_SOCK_OPS_RTO_CB:
break;
@@ -111,6 +137,12 @@ int bpf_testcb(struct bpf_sock_ops *skops)
break;
case BPF_SOCK_OPS_TCP_LISTEN_CB:
bpf_sock_ops_cb_flags_set(skops, BPF_SOCK_OPS_STATE_CB_FLAG);
+   v = bpf_setsockopt(skops, IPPROTO_TCP, TCP_SAVE_SYN,
+  &save_syn, sizeof(save_syn));
+   /* Update global map w/ result of setsockopt */
+   __u32 key = 0;
+
+   bpf_map_update_elem(&sockopt_results, &key, &v, BPF_ANY);
break;
default:
rv = -1;
diff --git a/tools/testing/selftests/bpf/test_tcpbpf_user.c 
b/tools/testing/selftests/bpf/test_tcpbpf_user.c
index a275c2971376..e6eebda7d112 100644
--- a/tools/testing/selftests/bpf/test_tcpbpf_user.c
+++ b/tools/testing/selftests/bpf/test_tcpbpf_user.c
@@ -54,6 +54,26 @@ int verify_result(const struct tcpbpf_globals *result)
return -1;
 }
 
+int verify_sockopt_result(int sock_map_fd)
+{
+   __u32 key = 0;
+   int res;
+   int rv;
+
+   /* check setsockopt for SAVE_SYN */
+   rv = bpf_map_lookup_elem(sock_map_fd, &key, &res);
+   EXPECT_EQ(0, rv, "d");
+   EXPECT_EQ(0, res, "d");
+   key = 1;
+   /* check getsockopt for SAVED_SYN */
+   rv = bpf_map_lookup_elem(sock_map_fd, &key, &res);
+   EXPECT_EQ(0, rv, "d");
+   EXPECT_EQ(1, res, "d");
+   return 0;
+err:
+   return -1;
+}
+
 static int bpf_find_map(const char *test, struct bpf_object *obj,
const char *name)
 {
@@ -70,11 +90,11 @@ static int bpf_find_map(const char *test, struct bpf_object 
*obj,
 int main(int argc, char **argv)
 {
const char *file = "test_tcpbpf_kern.o";
+   int prog_fd, map_fd, sock_map_fd;
struct tcpbpf_globals g = {0};
const char *cg_path = "/foo";
int error = EXIT_FAILURE;
struct bpf_object *obj;
-   int prog_fd, map_fd;
int cg_fd = -1;
__u32 key = 0;
int rv;
@@ -110,6 +130,10 @@ int main(int argc, char **argv)
if (map_fd < 0)
goto err;
 
+   sock_map_fd = bpf_find_map(__func__, obj, "sockopt_results");
+   if (sock_map_fd < 0)
+   goto err;
+
rv = bpf_map_lookup_elem(map_fd, &key, &g);
if 
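
The getsockopt half of the selftest reads the SYN bit out of the saved headers, at the start of the TCP header that follows the IPv6 header. A byte-level sketch of that check is given here, assuming a fixed 40-byte IPv6 header with no extension headers; the MODEL_* constants and the function name are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants for illustration; not taken from the patch. */
#define MODEL_IPV6_HDR_LEN	40	/* fixed IPv6 header, no extension headers */
#define MODEL_TCP_FLAGS_OFF	13	/* offset of the flags byte in the TCP header */
#define MODEL_TCP_FLAG_SYN	0x02	/* SYN bit inside the flags byte */

/*
 * Returns 1 if the saved IPv6 + TCP headers carry the SYN flag,
 * 0 if they do not, and -1 if the buffer is too short to tell.
 */
static int model_saved_syn_flag(const uint8_t *buf, size_t len)
{
	size_t off = MODEL_IPV6_HDR_LEN + MODEL_TCP_FLAGS_OFF;

	assert(buf != NULL);
	if (len <= off)
		return -1;
	return (buf[off] & MODEL_TCP_FLAG_SYN) ? 1 : 0;
}
```

The selftest expects 1 for this value, since the saved packet is the ingress SYN itself.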

[PATCH v3 bpf-next 0/2] bpf tcp save syn set/get sockoptions

2018-08-30 Thread Nikita V. Shirokov


adding support for two new bpf's tcp sockopts:
TCP_SAVE_SYN (set) and TCP_SAVED_SYN (get)
this would allow for tcp-bpf program to build some logic based on fields from
ingress syn packet (e.g. doing tcp's tos/tclass reflection (see sample prog))
and do it transparently from userspace program point of view

v2->v3:
 - make patch series public
v1->v2:
 - adding proper SPDX license

Nikita V. Shirokov (2):
  new options for bpf_(set|get)sockopt
  new sample bpf prog

 net/core/filter.c  | 25 +++--
 samples/bpf/Makefile   |  1 +
 samples/bpf/tcp_tos_reflect_kern.c | 87 ++
 3 files changed, 109 insertions(+), 4 deletions(-)
 create mode 100644 samples/bpf/tcp_tos_reflect_kern.c

-- 
2.17.1



[PATCH v3 bpf-next 1/2] new options for bpf_(set|get)sockopt

2018-08-30 Thread Nikita V. Shirokov
adding support for two new bpf's get/set sockopts: TCP_SAVE_SYN (set)
and TCP_SAVED_SYN (get). this would allow for bpf program to build
logic based on data from ingress SYN packet

Signed-off-by: Nikita V. Shirokov 
---
 net/core/filter.c | 25 +
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index c25eb36f1320..feb578506009 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4007,6 +4007,12 @@ BPF_CALL_5(bpf_setsockopt, struct bpf_sock_ops_kern *, 
bpf_sock,
tp->snd_ssthresh = val;
}
break;
+   case TCP_SAVE_SYN:
+   if (val < 0 || val > 1)
+   ret = -EINVAL;
+   else
+   tp->save_syn = val;
+   break;
default:
ret = -EINVAL;
}
@@ -4032,21 +4038,32 @@ static const struct bpf_func_proto bpf_setsockopt_proto 
= {
 BPF_CALL_5(bpf_getsockopt, struct bpf_sock_ops_kern *, bpf_sock,
   int, level, int, optname, char *, optval, int, optlen)
 {
+   struct inet_connection_sock *icsk;
struct sock *sk = bpf_sock->sk;
+   struct tcp_sock *tp;
 
if (!sk_fullsock(sk))
goto err_clear;
-
 #ifdef CONFIG_INET
if (level == SOL_TCP && sk->sk_prot->getsockopt == tcp_getsockopt) {
-   if (optname == TCP_CONGESTION) {
-   struct inet_connection_sock *icsk = inet_csk(sk);
+   switch (optname) {
+   case TCP_CONGESTION:
+   icsk = inet_csk(sk);
 
if (!icsk->icsk_ca_ops || optlen <= 1)
goto err_clear;
strncpy(optval, icsk->icsk_ca_ops->name, optlen);
optval[optlen - 1] = 0;
-   } else {
+   break;
+   case TCP_SAVED_SYN:
+   tp = tcp_sk(sk);
+
+   if (optlen <= 0 || !tp->saved_syn ||
+   optlen > tp->saved_syn[0])
+   goto err_clear;
+   memcpy(optval, tp->saved_syn + 1, optlen);
+   break;
+   default:
goto err_clear;
}
} else if (level == SOL_IP) {
-- 
2.17.1
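
The two new branches can be read as the following user-space model. This is a sketch only: the model_* functions are hypothetical, the saved SYN is represented the way the patch uses it (saved_syn[0] is the stored length in bytes, the headers start at saved_syn + 1), and zeroing the output buffer on failure is an assumption about what the err_clear path does.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the TCP_SAVE_SYN branch: only 0 and 1 are accepted. */
static int model_set_save_syn(unsigned int *save_syn, int val)
{
	assert(save_syn != NULL);
	if (val < 0 || val > 1)
		return -EINVAL;
	*save_syn = (unsigned int)val;
	return 0;
}

/*
 * Hypothetical model of the TCP_SAVED_SYN branch.
 * saved_syn[0] holds the length in bytes of the stored headers,
 * which start at saved_syn + 1. A request longer than what was
 * saved, or a missing saved SYN, is rejected.
 */
static int model_get_saved_syn(const uint32_t *saved_syn, char *optval,
			       int optlen)
{
	assert(optval != NULL);
	if (optlen <= 0)
		return -EINVAL;
	if (!saved_syn || (uint32_t)optlen > saved_syn[0]) {
		/* assumption: mirror err_clear by zeroing the caller's buffer */
		memset(optval, 0, (size_t)optlen);
		return -EINVAL;
	}
	memcpy(optval, saved_syn + 1, (size_t)optlen);
	return 0;
}
```

A caller may ask for fewer bytes than were saved (for example only the IP header), which is what the sample program in this series relies on.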



[PATCH v3 bpf-next 2/2] new sample bpf prog

2018-08-30 Thread Nikita V. Shirokov
sample program which shows TCP_SAVE_SYN/TCP_SAVED_SYN usage example:
bpf's program which is doing TOS/TCLASS reflection (server would reply
with the same TOS/TCLASS as the client)

Signed-off-by: Nikita V. Shirokov 
---
 samples/bpf/Makefile   |  1 +
 samples/bpf/tcp_tos_reflect_kern.c | 87 ++
 2 files changed, 88 insertions(+)
 create mode 100644 samples/bpf/tcp_tos_reflect_kern.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 36f9f41d094b..be0a961450bc 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -153,6 +153,7 @@ always += tcp_cong_kern.o
 always += tcp_iw_kern.o
 always += tcp_clamp_kern.o
 always += tcp_basertt_kern.o
+always += tcp_tos_reflect_kern.o
 always += xdp_redirect_kern.o
 always += xdp_redirect_map_kern.o
 always += xdp_redirect_cpu_kern.o
diff --git a/samples/bpf/tcp_tos_reflect_kern.c 
b/samples/bpf/tcp_tos_reflect_kern.c
new file mode 100644
index ..d51dab19eca6
--- /dev/null
+++ b/samples/bpf/tcp_tos_reflect_kern.c
@@ -0,0 +1,87 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2018 Facebook
+ *
+ * BPF program to automatically reflect TOS option from received syn packet
+ *
+ * Use load_sock_ops to load this BPF program.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "bpf_helpers.h"
+#include "bpf_endian.h"
+
+#define DEBUG 1
+
+#define bpf_printk(fmt, ...)   \
+({ \
+  char ____fmt[] = fmt;\
+  bpf_trace_printk(____fmt, sizeof(____fmt),   \
+   ##__VA_ARGS__); \
+})
+
+SEC("sockops")
+int bpf_basertt(struct bpf_sock_ops *skops)
+{
+   char header[sizeof(struct ipv6hdr)];
+   struct ipv6hdr *hdr6;
+   struct iphdr *hdr;
+   int hdr_size = 0;
+   int save_syn = 1;
+   int tos = 0;
+   int rv = 0;
+   int op;
+
+   op = (int) skops->op;
+
+#ifdef DEBUG
+   bpf_printk("BPF command: %d\n", op);
+#endif
+   switch (op) {
+   case BPF_SOCK_OPS_TCP_LISTEN_CB:
+   rv = bpf_setsockopt(skops, SOL_TCP, TCP_SAVE_SYN,
+  &save_syn, sizeof(save_syn));
+   break;
+   case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
+   if (skops->family == AF_INET)
+   hdr_size = sizeof(struct iphdr);
+   else
+   hdr_size = sizeof(struct ipv6hdr);
+   rv = bpf_getsockopt(skops, SOL_TCP, TCP_SAVED_SYN,
+   header, hdr_size);
+   if (!rv) {
+   if (skops->family == AF_INET) {
+   hdr = (struct iphdr *) header;
+   tos = hdr->tos;
+   if (tos != 0)
+   bpf_setsockopt(skops, SOL_IP, IP_TOS,
+  &tos, sizeof(tos));
+   } else {
+   hdr6 = (struct ipv6hdr *) header;
+   tos = ((hdr6->priority) << 4 |
+  (hdr6->flow_lbl[0]) >>  4);
+   if (tos)
+   bpf_setsockopt(skops, SOL_IPV6,
+  IPV6_TCLASS,
+  &tos, sizeof(tos));
+   }
+   rv = 0;
+   }
+   break;
+   default:
+   rv = -1;
+   }
+#ifdef DEBUG
+   bpf_printk("Returning %d\n", rv);
+#endif
+   skops->reply = rv;
+   return 1;
+}
+char _license[] SEC("license") = "GPL";
-- 
2.17.1
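
The TOS/TCLASS extraction in the sample boils down to a few bit operations on the first two bytes of the saved IP header. The sketch below works on raw bytes instead of struct iphdr / struct ipv6hdr so that it can be compiled anywhere; the model_* function names are hypothetical and are not part of the sample.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* IPv4: the TOS field is the second byte of the header. */
static int model_ipv4_tos(const uint8_t *hdr, size_t len)
{
	assert(hdr != NULL);
	if (len < 2)
		return -1;
	return hdr[1];
}

/*
 * IPv6: byte 0 carries the version (high nibble) and the upper half of
 * the traffic class (low nibble, "priority" in struct ipv6hdr); byte 1
 * carries the lower half of the traffic class in its high nibble
 * (flow_lbl[0] in struct ipv6hdr).
 */
static int model_ipv6_tclass(const uint8_t *hdr, size_t len)
{
	assert(hdr != NULL);
	if (len < 2)
		return -1;
	return ((hdr[0] & 0x0f) << 4) | (hdr[1] >> 4);
}
```

For example, an IPv6 header starting with bytes 0x6b 0x85 yields traffic class 0xb8, the same result as (priority << 4) | (flow_lbl[0] >> 4) in the sample.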



Re: ixgbe hangs when XDP_TX is enabled

2018-08-21 Thread Nikita V. Shirokov
On Tue, Aug 21, 2018 at 08:58:15AM -0700, Alexander Duyck wrote:
> On Mon, Aug 20, 2018 at 12:32 PM Nikita V. Shirokov  
> wrote:
> >
> > we are getting such errors:
> >
> > [  408.737313] ixgbe :03:00.0 eth0: Detected Tx Unit Hang (XDP)
> >  Tx Queue <46>
> >  TDH, TDT <0>, <2>
> >  next_to_use  <2>
> >  next_to_clean<0>
> >tx_buffer_info[next_to_clean]
> >  time_stamp   <0>
> >  jiffies  <1000197c0>
> > [  408.804438] ixgbe :03:00.0 eth0: tx hang 1 detected on queue 46, 
> > resetting adapter
> > [  408.804440] ixgbe :03:00.0 eth0: initiating reset due to tx timeout
> > [  408.817679] ixgbe :03:00.0 eth0: Reset adapter
> > [  408.866091] ixgbe :03:00.0 eth0: TXDCTL.ENABLE for one or more 
> > queues not cleared within the polling period
> > [  409.345289] ixgbe :03:00.0 eth0: detected SFP+: 3
> > [  409.497232] ixgbe :03:00.0 eth0: NIC Link is Up 10 Gbps, Flow 
> > Control: RX/TX
> >
> > while running XDP prog on ixgbe nic.
> > right now i'm seeing this on bpfnext kernel
> > (latest commit from Wed Aug 15 15:04:25 2018 -0700 ;
> > 9a76aba02a37718242d7cdc294f0a3901928aa57)
> >
> > looks like this is the same issue as reported by Brenden in
> > https://www.spinics.net/lists/netdev/msg439438.html
> >
> > --
> > Nikita V. Shirokov
> 
> Could you provide some additional information about your setup.
> Specifically useful would be "ethtool -i", "ethtool -l", and lspci
> -vvv info for your device. The total number of CPUs on the system
> would be useful to know as well. In addition could you try
> reproducing
sure:

ethtool -l eth0
Channel parameters for eth0:
Pre-set maximums:
RX: 0
TX: 0
Other:  1
Combined:   63
Current hardware settings:
RX: 0
TX: 0
Other:  1
Combined:   48

# ethtool -i eth0
driver: ixgbe
version: 5.1.0-k
firmware-version: 0x86f1
expansion-rom-version:
bus-info: :03:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes


# nproc
48

lspci:

03:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ 
Network Connection (rev 01)
Subsystem: Intel Corporation Device 000d
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ 
Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-

> the issue with one of the sample XDP programs provided with the kernel
> such as the xdp2 which I believe uses the XDP_TX function. We need to
> try and create a similar setup in our own environment for
> reproduction and debugging.

will try but this could take a while, because i'm not sure that we have
ixgbe in our test lab (and it would be hard to run such test in prod)

> 
> Thanks.
> 
> - Alex

--
Nikita V. Shirokov


ixgbe hangs when XDP_TX is enabled

2018-08-20 Thread Nikita V. Shirokov
we are getting such errors:

[  408.737313] ixgbe :03:00.0 eth0: Detected Tx Unit Hang (XDP)
 Tx Queue <46>
 TDH, TDT <0>, <2>
 next_to_use  <2>
 next_to_clean<0>
   tx_buffer_info[next_to_clean]
 time_stamp   <0>
 jiffies  <1000197c0>
[  408.804438] ixgbe :03:00.0 eth0: tx hang 1 detected on queue 46, 
resetting adapter
[  408.804440] ixgbe :03:00.0 eth0: initiating reset due to tx timeout
[  408.817679] ixgbe :03:00.0 eth0: Reset adapter
[  408.866091] ixgbe :03:00.0 eth0: TXDCTL.ENABLE for one or more queues 
not cleared within the polling period
[  409.345289] ixgbe :03:00.0 eth0: detected SFP+: 3
[  409.497232] ixgbe :03:00.0 eth0: NIC Link is Up 10 Gbps, Flow Control: 
RX/TX

while running XDP prog on ixgbe nic.
right now i'm seeing this on bpf-next kernel
(latest commit from Wed Aug 15 15:04:25 2018 -0700 ;
9a76aba02a37718242d7cdc294f0a3901928aa57)

looks like this is the same issue as reported by Brenden in
https://www.spinics.net/lists/netdev/msg439438.html

--
Nikita V. Shirokov


[PATCH bpf-next] bpf: fix xdp_generic for bpf_adjust_tail usecase

2018-04-25 Thread Nikita V. Shirokov
when bpf_xdp_adjust_tail support was introduced for generic xdp, it changed
skb's tail pointer so that it points to the new "end of the packet".
however skb's len field wasn't updated, so on the wire the ethernet frame
still had its original size (or an even bigger one, if adjust_head was
used). this diff fixes that.

Fixes: 198d83bb3 ("bpf: make generic xdp compatible w/ bpf_xdp_adjust_tail")

Signed-off-by: Nikita V. Shirokov <tehn...@tehnerd.com>
---

Notes:
original tests missed this because it looks like tap interface
ignores incorrect ethernet FCS (all tests were done in VM),
and even w/ mismatched l3 and l2 lengths the kernel was still
accepting this ICMP packet.
output was generated w/ xdp_adjust_tail prog from samples.
before this fix (see length field of the ethernet layer):

tehnerd@maindev:~$ sudo tcpdump -ni tap0 icmp -vvv -eee
tcpdump: listening on tap0, link-type EN10MB (Ethernet), capture size 
262144 bytes
06:38:15.546782 52:54:00:12:34:57 > 12:0e:a3:cc:78:b8, ethertype IPv4 
(0x0800), length 1454: (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto 
ICMP (1), length 112)
172.16.0.2 > 172.16.0.1: ICMP 172.16.0.2 unreachable - need to frag 
(mtu 586), length 92
(tos 0x0, ttl 64, id 48021, offset 0, flags [DF], proto TCP (6), 
length 1412)
172.16.0.1.50916 > 172.16.0.2.22: Flags [P.], seq 427401155:427402515, 
ack 3567613893, win 229, options [nop,nop,TS val 1287434011 ecr 2176566223], 
length 1360

after:
tehnerd@maindev:~$ sudo tcpdump -ni tap0 icmp -vvv -eee
tcpdump: listening on tap0, link-type EN10MB (Ethernet), capture size 
262144 bytes
06:47:37.226843 52:54:00:12:34:57 > 32:45:9f:69:35:ba, ethertype IPv4 
(0x0800), length 126: (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto 
ICMP (1), length 112)

172.16.0.2 > 172.16.0.1: ICMP 172.16.0.2 unreachable - need to frag (mtu 
586), length 92
(tos 0x0, ttl 64, id 29964, offset 0, flags [DF], proto TCP (6), length 
1412)
172.16.0.1.50918 > 172.16.0.2.22: Flags [P.], seq 14171614:14172974, ack 
1433043471, win 229, options [nop,nop,TS val 1287995744 ecr 3312743811], length 
1360

 net/core/dev.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index c624a04dad1f..8f8931b93140 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4057,8 +4057,10 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
 * pckt.
 */
off = orig_data_end - xdp.data_end;
-   if (off != 0)
+   if (off != 0) {
skb_set_tail_pointer(skb, xdp.data_end - xdp.data);
+   skb->len -= off;
+   }
 
switch (act) {
case XDP_REDIRECT:
-- 
2.15.1
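
To make the length accounting concrete, here is a small userspace sketch
(not kernel code) of the bookkeeping described in this patch. The struct
and function names are hypothetical stand-ins; only the arithmetic follows
the diff. The numbers come from the tcpdump output in the notes: a
1426-byte frame (1412-byte IP packet plus 14-byte ethernet header), 28
bytes of new IP + ICMP headers pushed at the head, and the tail trimmed by
1328 bytes so that 126 bytes remain.

```c
#include <stdio.h>

/* Hypothetical stand-in for the two sk_buff fields that matter here. */
struct fake_skb {
	unsigned int len;  /* bytes the stack will put on the wire */
	unsigned int tail; /* end of data, as an offset from skb->data */
};

/*
 * Models the head and tail handling after the XDP program has run.
 * with_fix selects whether skb->len is reduced together with the tail.
 */
static unsigned int wire_len(unsigned int orig_len, unsigned int head_grow,
			     unsigned int tail_shrink, int with_fix)
{
	struct fake_skb skb = { .len = orig_len, .tail = orig_len };

	/* head adjustment: pushing headers grows len and moves tail offset */
	skb.len += head_grow;
	skb.tail += head_grow;

	/* tail adjustment: the tail offset is always updated ... */
	if (tail_shrink != 0) {
		skb.tail -= tail_shrink;
		/* ... but len only follows with the fix applied */
		if (with_fix)
			skb.len -= tail_shrink;
	}

	return skb.len;
}
```

Under these assumptions wire_len(1426, 28, 1328, 0) gives the 1454 bytes
seen in the "before" capture, and wire_len(1426, 28, 1328, 1) gives the 126
bytes seen in the "after" capture.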



[PATCH bpf-next] bpf: fix virtio-net's length calc for XDP_PASS

2018-04-22 Thread Nikita V. Shirokov
In commit 6870de435b90 ("bpf: make virtio compatible w/ 
bpf_xdp_adjust_tail") i didn't account for vi->hdr_len during new 
packet's length calculation after bpf_prog_run in receive_mergeable. 
because of this all packets, if they were passed to the kernel, 
were truncated by 12 bytes.

Fixes: 6870de435b90 ("bpf: make virtio compatible w/ bpf_xdp_adjust_tail")
Reported-by: David Ahern <dsah...@gmail.com>

Signed-off-by: Nikita V. Shirokov <tehn...@tehnerd.com>
---

Notes:
unfortunately it looks like xdp_tx is still broken because the
fix by Jason (introduced in the "XDP_TX for virtio_net not working in
recent kernel?" thread) hasn't landed yet

 drivers/net/virtio_net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 779a4f798522..08ac2cc986aa 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -761,7 +761,7 @@ static struct sk_buff *receive_mergeable(struct net_device 
*dev,
/* recalculate len if xdp.data or xdp.data_end were
 * adjusted
 */
-   len = xdp.data_end - xdp.data;
+   len = xdp.data_end - xdp.data + vi->hdr_len;
/* We can only create skb based on xdp_page. */
if (unlikely(xdp_page != page)) {
rcu_read_unlock();
-- 
2.15.1
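
A minimal sketch of the off-by-header-length problem, assuming (from the
commit message and the "12 bytes missing" report) that len in
receive_mergeable is expected to count the virtio-net header too and that
the header is 12 bytes long in the reporter's setup. The function names
and the FAKE_HDR_LEN constant are hypothetical; offsets are used instead of
pointers to keep the example in plain userspace C.

```c
#include <stddef.h>

/* Assumed header length: 12 bytes, matching the reported truncation. */
#define FAKE_HDR_LEN 12

/* len as computed before the fix: packet bytes only. */
static size_t len_before_fix(size_t data_off, size_t data_end_off)
{
	return data_end_off - data_off;
}

/* len as computed by the fix: packet bytes plus the virtio-net header. */
static size_t len_after_fix(size_t data_off, size_t data_end_off,
			    size_t hdr_len)
{
	return data_end_off - data_off + hdr_len;
}
```

For a 60-byte packet the first variant yields 60 and the second 72, so any
later code that strips hdr_len again would hand a 48-byte packet to the
stack in the unfixed case.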



Re: XDP breakage with virtio due to 6870de435b90c083ae0f3f7f341287976ef56f03

2018-04-22 Thread Nikita V. Shirokov
On Sun, Apr 22, 2018 at 04:47:48PM -0600, David Ahern wrote:
> This commit breaks my FIB forwarding program:
> 
> commit 6870de435b90c083ae0f3f7f341287976ef56f03
> Author: Nikita V. Shirokov <tehn...@tehnerd.com>
> Date:   Tue Apr 17 21:42:20 2018 -0700
> 
> bpf: make virtio compatible w/ bpf_xdp_adjust_tail
> 
> w/ bpf_xdp_adjust_tail helper xdp's data_end pointer could be changed as
> well (only "decrease" of pointer's location is going to be supported).
> changing of this pointer will change packet's size.
> for virtio driver we need to adjust XDP_PASS handling by recalculating
> length of the packet if it was passed to the TCP/IP stack
> 
> Reviewed-by: Jason Wang <jasow...@redhat.com>
> Signed-off-by: Nikita V. Shirokov <tehn...@tehnerd.com>
> Signed-off-by: Daniel Borkmann <dan...@iogearbox.net>
> 
> ###
> 
> Some of the packets (e.g., ARP or those without a resolved neighbor) are
> passed to the networking stack. What shows up are clearly broken packets:
> 
> # tcpdump -n -i eth1
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
> 15:45:29.693238 [|ARP]
>   0x:  0001 0800 0604 0001 42c0 ce2f 3fa9 0a64  B../?..d
> 15:45:30.710327 [|ARP]
>   0x:  0001 0800 0604 0001 42c0 ce2f 3fa9 0a64  B../?..d
> 15:45:31.734296 [|ARP]
>   0x:  0001 0800 0604 0001 42c0 ce2f 3fa9 0a64  B../?..d
> 15:45:32.908720 IP6 truncated-ip6 - 12 bytes
> missing!fe80::40c0:ceff:fe2f:3fa9 > ff02::1:ff00:2: ICMP6, neighbor
> solicitation[|icmp6]
> 15:45:33.910530 IP6 truncated-ip6 - 12 bytes missing!2001:db8:1::64 >
> ff02::1:ff00:2: ICMP6, neighbor solicitation[|icmp6]
> 15:45:34.934437 IP6 truncated-ip6 - 12 bytes missing!2001:db8:1::64 >
> ff02::1:ff00:2: ICMP6, neighbor solicitation[|icmp6]
> 15:45:35.958394 IP6 truncated-ip6 - 12 bytes missing!2001:db8:1::64 >
> ff02::1:ff00:2: ICMP6, neighbor solicitation[|icmp6]
> 
> Reverting the mentioned patch fixes the problem.
Hi, David.
thanks for reporting this. looks like i've missed vi->hdr_len in the
new length calculation (it was len = xdp->data_end - xdp->data; but it
should also be + vi->hdr_len). will run a few more tests
before sending a fix.

--
Nikita


Re: [PATCH bpf-next v2 02/11] bpf: make generic xdp compatible w/ bpf_xdp_adjust_tail

2018-04-18 Thread Nikita V. Shirokov
On Wed, Apr 18, 2018 at 02:48:18PM +0200, Jesper Dangaard Brouer wrote:
> On Tue, 17 Apr 2018 21:29:42 -0700
> "Nikita V. Shirokov" <tehn...@tehnerd.com> wrote:
> 
> > w/ bpf_xdp_adjust_tail helper xdp's data_end pointer could be changed as
> > well (only "decrease" of pointer's location is going to be supported).
> > changing of this pointer will change packet's size.
> > for generic XDP we need to reflect this packet's length change by
> > adjusting skb's tail pointer
> > 
> > Acked-by: Alexei Starovoitov <a...@kernel.org>
> 
> You are missing your own Signed-off-by: line on all of the patches.
> 
yeah, somehow lost it between v1 and v2 :) thanks !
> BTW, thank you for working on this! It have been on my todo-list for a
> while now!
> 
> _After_ this patchset, I would like to see adding support for
> "increasing" the data_end location to create a larger packet.  For that
> we should likely add a data_hard_end pointer.  This, would also be
> helpful in cpu_map_build_skb() to know the data_hard_end, to determine
> the frame size (as some driver doesn't use PAGE_SIZE frames, ixgbe).
> 
yeah, increasing the size would be nice to have, but will require more
thinking / rework on drivers side (as you pointed out it's not as easy
as "every driver have at least PAGE_SIZE of data available for xdp".).
will add to my TODO
> 
> -- 
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
--
Nikita


Re: [PATCH bpf-next v2 00/11] introduction of bpf_xdp_adjust_tail

2018-04-18 Thread Nikita V. Shirokov
On Wed, Apr 18, 2018 at 02:37:40PM +0200, Daniel Borkmann wrote:
> On 04/18/2018 06:29 AM, Nikita V. Shirokov wrote:
> > In this patch series i'm adding a new bpf helper which allows manipulating
> > xdp's data_end pointer. right now only "shrinking" (reduce packet's size
> > by moving pointer) is supported (and i see no use case for "growing").
> > Main use case for such helper is to be able to generate control (ICMP)
> > messages from XDP context. such messages usually contain first N bytes
> > from original packets as a payload, and this is exactly what this helper
> > would allow us to do (see patch 3 for sample program, where we generate
> > ICMP "packet too big" message). This helper could be useful for load
> > balancing applications where after additional encapsulation, resulting
> > packet could be bigger than interface MTU.
> > Aside from new helper this patch series contains minor changes in device
> > drivers (for ones which require it), so they would recalculate packet's
> > length not only when head pointer was adjusted, but if tail's one as well.
> 
> The whole set doesn't have any SoBs from you which is mandatory before
> applying anything. Please add.
> 
sorry about that, somehow lost it between v1 and v2 ;)
> Thanks,
> Daniel
> 
> > v1->v2:
> >  * fixed kbuild warning
> >  * made offset eq 0 invalid for bpf_xdp_adjust_tail
> >  * split bpf_prog_test_run fix and selftests into separate commits
> >  * added SPDX licence where applicable
> >  * some reshuffling in patches order (tests now in the end)
> > 
> > 
> > Nikita V. Shirokov (11):
> >   bpf: making bpf_prog_test run aware of possible data_end ptr change
> >   bpf: adding tests for bpf_xdp_adjust_tail
> >   bpf: adding bpf_xdp_adjust_tail helper
> >   bpf: make generic xdp compatible w/ bpf_xdp_adjust_tail
> >   bpf: make mlx4 compatible w/ bpf_xdp_adjust_tail
> >   bpf: make bnxt compatible w/ bpf_xdp_adjust_tail
> >   bpf: make cavium thunder compatible w/ bpf_xdp_adjust_tail
> >   bpf: make netronome nfp compatible w/ bpf_xdp_adjust_tail
> >   bpf: make tun compatible w/ bpf_xdp_adjust_tail
> >   bpf: make virtio compatible w/ bpf_xdp_adjust_tail
> >   bpf: add bpf_xdp_adjust_tail sample prog
> > 
> >  drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c  |   2 +-
> >  drivers/net/ethernet/cavium/thunder/nicvf_main.c   |   2 +-
> >  drivers/net/ethernet/mellanox/mlx4/en_rx.c |   2 +-
> >  .../net/ethernet/netronome/nfp/nfp_net_common.c|   2 +-
> >  drivers/net/tun.c  |   3 +-
> >  drivers/net/virtio_net.c   |   7 +-
> >  include/uapi/linux/bpf.h   |  10 +-
> >  net/bpf/test_run.c |   3 +-
> >  net/core/dev.c |  10 +-
> >  net/core/filter.c  |  29 +++-
> >  samples/bpf/Makefile   |   4 +
> >  samples/bpf/xdp_adjust_tail_kern.c | 152 
> > +
> >  samples/bpf/xdp_adjust_tail_user.c | 142 
> > +++
> >  tools/include/uapi/linux/bpf.h |  10 +-
> >  tools/testing/selftests/bpf/Makefile   |   2 +-
> >  tools/testing/selftests/bpf/bpf_helpers.h  |   5 +
> >  tools/testing/selftests/bpf/test_adjust_tail.c |  30 
> >  tools/testing/selftests/bpf/test_progs.c   |  32 +
> >  18 files changed, 435 insertions(+), 12 deletions(-)
> >  create mode 100644 samples/bpf/xdp_adjust_tail_kern.c
> >  create mode 100644 samples/bpf/xdp_adjust_tail_user.c
> >  create mode 100644 tools/testing/selftests/bpf/test_adjust_tail.c
> > 
> 


[PATCH bpf-next v3 09/11] bpf: making bpf_prog_test run aware of possible data_end ptr change

2018-04-18 Thread Nikita V. Shirokov
after introduction of bpf_xdp_adjust_tail helper packet length
could be changed not only if xdp->data pointer has been changed
but xdp->data_end as well. making bpf_prog_test_run aware of this
possibility

Signed-off-by: Nikita V. Shirokov <tehn...@tehnerd.com>
---
 net/bpf/test_run.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 2ced48662c1f..68c3578343b4 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -170,7 +170,8 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const 
union bpf_attr *kattr,
xdp.rxq = &rxqueue->xdp_rxq;
 
retval = bpf_test_run(prog, &xdp, repeat, &duration);
-   if (xdp.data != data + XDP_PACKET_HEADROOM + NET_IP_ALIGN)
+   if (xdp.data != data + XDP_PACKET_HEADROOM + NET_IP_ALIGN ||
+   xdp.data_end != xdp.data + size)
size = xdp.data_end - xdp.data;
ret = bpf_test_finish(kattr, uattr, xdp.data, size, retval, duration);
kfree(data);
-- 
2.15.1
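
The changed condition can be sketched in userspace as follows. This is only
an illustration of the size recalculation, with offsets standing in for the
kernel pointers; the function names are hypothetical, and head_off plays
the role of data + XDP_PACKET_HEADROOM + NET_IP_ALIGN.

```c
/* Size reported back to userspace before the patch: head check only. */
static unsigned int out_size_old(unsigned int size, unsigned int head_off,
				 unsigned int data_off, unsigned int end_off)
{
	if (data_off != head_off)
		size = end_off - data_off;
	return size;
}

/* Size reported with the patch: a moved tail triggers recalculation too. */
static unsigned int out_size_new(unsigned int size, unsigned int head_off,
				 unsigned int data_off, unsigned int end_off)
{
	if (data_off != head_off || end_off != data_off + size)
		size = end_off - data_off;
	return size;
}
```

With a 100-byte test packet at offset 256 whose tail is trimmed to offset
316, the old check still reports 100 bytes while the new one reports 60.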



[PATCH bpf-next v3 02/11] bpf: make generic xdp compatible w/ bpf_xdp_adjust_tail

2018-04-18 Thread Nikita V. Shirokov
w/ bpf_xdp_adjust_tail helper xdp's data_end pointer could be changed as
well (only "decrease" of pointer's location is going to be supported).
changing of this pointer will change packet's size.
for generic XDP we need to reflect this packet's length change by
adjusting skb's tail pointer

Acked-by: Alexei Starovoitov <a...@kernel.org>
Signed-off-by: Nikita V. Shirokov <tehn...@tehnerd.com>
---
 net/core/dev.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 969462ebb296..11c789231a03 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3996,9 +3996,9 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
 struct bpf_prog *xdp_prog)
 {
struct netdev_rx_queue *rxqueue;
+   void *orig_data, *orig_data_end;
u32 metalen, act = XDP_DROP;
struct xdp_buff xdp;
-   void *orig_data;
int hlen, off;
u32 mac_len;
 
@@ -4037,6 +4037,7 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
xdp.data_meta = xdp.data;
xdp.data_end = xdp.data + hlen;
xdp.data_hard_start = skb->data - skb_headroom(skb);
+   orig_data_end = xdp.data_end;
orig_data = xdp.data;
 
rxqueue = netif_get_rxqueue(skb);
@@ -4051,6 +4052,13 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
__skb_push(skb, -off);
skb->mac_header += off;
 
+   /* check if bpf_xdp_adjust_tail was used. it can only "shrink"
+* pckt.
+*/
+   off = orig_data_end - xdp.data_end;
+   if (off != 0)
+   skb_set_tail_pointer(skb, xdp.data_end - xdp.data);
+
switch (act) {
case XDP_REDIRECT:
case XDP_TX:
-- 
2.15.1



[PATCH bpf-next v3 05/11] bpf: make cavium thunder compatible w/ bpf_xdp_adjust_tail

2018-04-18 Thread Nikita V. Shirokov
w/ bpf_xdp_adjust_tail helper xdp's data_end pointer could be changed as
well (only "decrease" of pointer's location is going to be supported).
changing of this pointer will change packet's size.
for cavium's thunder driver we will just calculate packet's length
unconditionally

Acked-by: Alexei Starovoitov <a...@kernel.org>
Signed-off-by: Nikita V. Shirokov <tehn...@tehnerd.com>
---
 drivers/net/ethernet/cavium/thunder/nicvf_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c 
b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
index 707db3304396..7135db45927e 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
@@ -538,9 +538,9 @@ static inline bool nicvf_xdp_rx(struct nicvf *nic, struct 
bpf_prog *prog,
action = bpf_prog_run_xdp(prog, &xdp);
rcu_read_unlock();
 
+   len = xdp.data_end - xdp.data;
/* Check if XDP program has changed headers */
if (orig_data != xdp.data) {
-   len = xdp.data_end - xdp.data;
offset = orig_data - xdp.data;
dma_addr -= offset;
}
-- 
2.15.1



[PATCH bpf-next v3 07/11] bpf: make tun compatible w/ bpf_xdp_adjust_tail

2018-04-18 Thread Nikita V. Shirokov
w/ bpf_xdp_adjust_tail helper xdp's data_end pointer could be changed as
well (only "decrease" of pointer's location is going to be supported).
changing of this pointer will change packet's size.
for tun driver we need to adjust XDP_PASS handling by recalculating
length of the packet if it was passed to the TCP/IP stack
(in case if after xdp's prog run data_end pointer was adjusted)

Reviewed-by: Jason Wang <jasow...@redhat.com>
Signed-off-by: Nikita V. Shirokov <tehn...@tehnerd.com>
---
 drivers/net/tun.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 1e58be152d5c..901351a6ed21 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1696,6 +1696,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct 
*tun,
return NULL;
case XDP_PASS:
delta = orig_data - xdp.data;
+   len = xdp.data_end - xdp.data;
break;
default:
bpf_warn_invalid_xdp_action(act);
@@ -1716,7 +1717,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct 
*tun,
}
 
skb_reserve(skb, pad - delta);
-   skb_put(skb, len + delta);
+   skb_put(skb, len);
get_page(alloc_frag->page);
alloc_frag->offset += buflen;
 
-- 
2.15.1



[PATCH bpf-next v3 04/11] bpf: make bnxt compatible w/ bpf_xdp_adjust_tail

2018-04-18 Thread Nikita V. Shirokov
w/ bpf_xdp_adjust_tail helper xdp's data_end pointer could be changed as
well (only "decrease" of pointer's location is going to be supported).
changing of this pointer will change packet's size.
for bnxt driver we will just calculate packet's length unconditionally

Acked-by: Alexei Starovoitov <a...@kernel.org>
Signed-off-by: Nikita V. Shirokov <tehn...@tehnerd.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
index 1389ab5e05df..1f0e872d0667 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
@@ -113,10 +113,10 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct 
bnxt_rx_ring_info *rxr, u16 cons,
if (tx_avail != bp->tx_ring_size)
*event &= ~BNXT_RX_EVENT;
 
+   *len = xdp.data_end - xdp.data;
if (orig_data != xdp.data) {
offset = xdp.data - xdp.data_hard_start;
*data_ptr = xdp.data_hard_start + offset;
-   *len = xdp.data_end - xdp.data;
}
switch (act) {
case XDP_PASS:
-- 
2.15.1



[PATCH bpf-next v3 06/11] bpf: make netronome nfp compatible w/ bpf_xdp_adjust_tail

2018-04-18 Thread Nikita V. Shirokov
w/ bpf_xdp_adjust_tail helper xdp's data_end pointer could be changed as
well (only "decrease" of pointer's location is going to be supported).
changing of this pointer will change packet's size.
for nfp driver we will just calculate packet's length unconditionally

Acked-by: Alexei Starovoitov <a...@kernel.org>
Acked-by: Jakub Kicinski <jakub.kicin...@netronome.com>
Signed-off-by: Nikita V. Shirokov <tehn...@tehnerd.com>
---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 1eb6549f2a54..d9111c077699 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1722,7 +1722,7 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, 
int budget)
 
act = bpf_prog_run_xdp(xdp_prog, &xdp);
 
-   pkt_len -= xdp.data - orig_data;
+   pkt_len = xdp.data_end - xdp.data;
pkt_off += xdp.data - orig_data;
 
switch (act) {
-- 
2.15.1
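
A short sketch of why the relative update had to become an absolute one.
Offsets replace the kernel pointers and both function names are
hypothetical; the two bodies follow the removed and the added line of the
diff.

```c
/* Removed line: pkt_len -= xdp.data - orig_data; (sees head moves only) */
static int pkt_len_old(int pkt_len, int orig_data, int data)
{
	return pkt_len - (data - orig_data);
}

/* Added line: pkt_len = xdp.data_end - xdp.data; (sees head and tail) */
static int pkt_len_new(int data, int data_end)
{
	return data_end - data;
}
```

For a 1000-byte packet at offset 64, a pure head move to offset 44 gives
1020 with both variants. If the program instead trims 400 bytes from the
tail, the old formula still returns 1000 while the new one returns 600.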



[PATCH bpf-next v3 11/11] bpf: add bpf_xdp_adjust_tail sample prog

2018-04-18 Thread Nikita V. Shirokov
adding bpf's sample program which is using bpf_xdp_adjust_tail helper
by generating ICMPv4 "packet too big" message if ingress packet's size is
bigger than 600 bytes

Signed-off-by: Nikita V. Shirokov <tehn...@tehnerd.com>
---
 samples/bpf/Makefile  |   4 +
 samples/bpf/xdp_adjust_tail_kern.c| 152 ++
 samples/bpf/xdp_adjust_tail_user.c| 142 
 tools/testing/selftests/bpf/bpf_helpers.h |   2 +
 4 files changed, 300 insertions(+)
 create mode 100644 samples/bpf/xdp_adjust_tail_kern.c
 create mode 100644 samples/bpf/xdp_adjust_tail_user.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 4d6a6edd4bf6..aa8c392e2e52 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -44,6 +44,7 @@ hostprogs-y += xdp_monitor
 hostprogs-y += xdp_rxq_info
 hostprogs-y += syscall_tp
 hostprogs-y += cpustat
+hostprogs-y += xdp_adjust_tail
 
 # Libbpf dependencies
 LIBBPF := ../../tools/lib/bpf/bpf.o ../../tools/lib/bpf/nlattr.o
@@ -95,6 +96,7 @@ xdp_monitor-objs := bpf_load.o $(LIBBPF) xdp_monitor_user.o
 xdp_rxq_info-objs := bpf_load.o $(LIBBPF) xdp_rxq_info_user.o
 syscall_tp-objs := bpf_load.o $(LIBBPF) syscall_tp_user.o
 cpustat-objs := bpf_load.o $(LIBBPF) cpustat_user.o
+xdp_adjust_tail-objs := bpf_load.o $(LIBBPF) xdp_adjust_tail_user.o
 
 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
@@ -148,6 +150,7 @@ always += xdp_rxq_info_kern.o
 always += xdp2skb_meta_kern.o
 always += syscall_tp_kern.o
 always += cpustat_kern.o
+always += xdp_adjust_tail_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 HOSTCFLAGS += -I$(srctree)/tools/lib/
@@ -193,6 +196,7 @@ HOSTLOADLIBES_xdp_monitor += -lelf
 HOSTLOADLIBES_xdp_rxq_info += -lelf
 HOSTLOADLIBES_syscall_tp += -lelf
 HOSTLOADLIBES_cpustat += -lelf
+HOSTLOADLIBES_xdp_adjust_tail += -lelf
 
 # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on 
cmdline:
 #  make samples/bpf/ LLC=~/git/llvm/build/bin/llc 
CLANG=~/git/llvm/build/bin/clang
diff --git a/samples/bpf/xdp_adjust_tail_kern.c 
b/samples/bpf/xdp_adjust_tail_kern.c
new file mode 100644
index ..411fdb21f8bc
--- /dev/null
+++ b/samples/bpf/xdp_adjust_tail_kern.c
@@ -0,0 +1,152 @@
+/* SPDX-License-Identifier: GPL-2.0
+ * Copyright (c) 2018 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program shows how to use bpf_xdp_adjust_tail() by
+ * generating ICMPv4 "packet too big" messages (unreachable / DF bit set,
+ * frag needed, to be more precise in case of v4) when receiving packets
+ * bigger than 600 bytes.
+ */
+#define KBUILD_MODNAME "foo"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "bpf_helpers.h"
+
+#define DEFAULT_TTL 64
+#define MAX_PCKT_SIZE 600
+#define ICMP_TOOBIG_SIZE 98
+#define ICMP_TOOBIG_PAYLOAD_SIZE 92
+
+struct bpf_map_def SEC("maps") icmpcnt = {
+   .type = BPF_MAP_TYPE_ARRAY,
+   .key_size = sizeof(__u32),
+   .value_size = sizeof(__u64),
+   .max_entries = 1,
+};
+
+static __always_inline void count_icmp(void)
+{
+   u64 key = 0;
+   u64 *icmp_count;
+
+   icmp_count = bpf_map_lookup_elem(&icmpcnt, &key);
+   if (icmp_count)
+   *icmp_count += 1;
+}
+
+static __always_inline void swap_mac(void *data, struct ethhdr *orig_eth)
+{
+   struct ethhdr *eth;
+
+   eth = data;
+   memcpy(eth->h_source, orig_eth->h_dest, ETH_ALEN);
+   memcpy(eth->h_dest, orig_eth->h_source, ETH_ALEN);
+   eth->h_proto = orig_eth->h_proto;
+}
+
+static __always_inline __u16 csum_fold_helper(__u32 csum)
+{
+   return ~((csum & 0xffff) + (csum >> 16));
+}
+
+static __always_inline void ipv4_csum(void *data_start, int data_size,
+ __u32 *csum)
+{
+   *csum = bpf_csum_diff(0, 0, data_start, data_size, *csum);
+   *csum = csum_fold_helper(*csum);
+}
+
+static __always_inline int send_icmp4_too_big(struct xdp_md *xdp)
+{
+   int headroom = (int)sizeof(struct iphdr) + (int)sizeof(struct icmphdr);
+
+   if (bpf_xdp_adjust_head(xdp, 0 - headroom))
+   return XDP_DROP;
+   void *data = (void *)(long)xdp->data;
+   void *data_end = (void *)(long)xdp->data_end;
+
+   if (data + (ICMP_TOOBIG_SIZE + headroom) > data_end)
+   return XDP_DROP;
+
+   struct iphdr *iph, *orig_iph;
+   struct icmphdr *icmp_hdr;
+   struct ethhdr *orig_eth;
+   __u32 csum = 0;
+   __u64 off = 0;
+
+   orig_eth = data + headroom;
+   swap_mac(data, orig_eth);
+   off += sizeof(struct ethhdr);
+   iph = data + off;
+   off += sizeof(struct iphdr);
+   icmp_hdr = data + off;
+   off += sizeof(struct icmphdr);
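
The checksum folding used by the sample can be tried out in userspace. The
sketch below is not the sample's code: it sums the buffer by hand instead
of calling bpf_csum_diff(), and it folds twice (the textbook form, so that
a carry out of the first fold is added back) where csum_fold_helper() in
the sample folds once. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit ones' complement sum into 16 bits and invert it. */
static uint16_t csum_fold(uint32_t csum)
{
	csum = (csum & 0xffff) + (csum >> 16);
	csum = (csum & 0xffff) + (csum >> 16);
	return (uint16_t)~csum;
}

/* Internet checksum over len bytes, read as big-endian 16-bit words. */
static uint16_t ip_checksum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8; /* pad odd byte with zero */

	return csum_fold(sum);
}
```

The checksum field must be zero while the sum is computed, which is why
the sample starts from csum = 0 on a freshly written header.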

[PATCH bpf-next v3 01/11] bpf: adding bpf_xdp_adjust_tail helper

2018-04-18 Thread Nikita V. Shirokov
Adding new bpf helper which would allow us to manipulate
xdp's data_end pointer, and allow us to reduce packet's size.
intended use case: to generate ICMP messages from XDP context,
where such message would contain truncated original packet.

Signed-off-by: Nikita V. Shirokov <tehn...@tehnerd.com>
---
 include/uapi/linux/bpf.h | 10 +-
 net/core/filter.c| 29 -
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c5ec89732a8d..9a2d1a04eb24 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -755,6 +755,13 @@ union bpf_attr {
  * @addr: pointer to struct sockaddr to bind socket to
  * @addr_len: length of sockaddr structure
  * Return: 0 on success or negative error code
+ *
+ * int bpf_xdp_adjust_tail(xdp_md, delta)
+ * Adjust the xdp_md.data_end by delta. Only shrinking of packet's
+ * size is supported.
+ * @xdp_md: pointer to xdp_md
+ * @delta: A negative integer to be added to xdp_md.data_end
+ * Return: 0 on success or negative on error
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -821,7 +828,8 @@ union bpf_attr {
FN(msg_apply_bytes),\
FN(msg_cork_bytes), \
FN(msg_pull_data),  \
-   FN(bind),
+   FN(bind),   \
+   FN(xdp_adjust_tail),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/net/core/filter.c b/net/core/filter.c
index a374b8560bc4..29318598fd60 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2725,6 +2725,30 @@ static const struct bpf_func_proto 
bpf_xdp_adjust_head_proto = {
.arg2_type  = ARG_ANYTHING,
 };
 
+BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
+{
+   void *data_end = xdp->data_end + offset;
+
+   /* only shrinking is allowed for now. */
+   if (unlikely(offset >= 0))
+   return -EINVAL;
+
+   if (unlikely(data_end < xdp->data + ETH_HLEN))
+   return -EINVAL;
+
+   xdp->data_end = data_end;
+
+   return 0;
+}
+
+static const struct bpf_func_proto bpf_xdp_adjust_tail_proto = {
+   .func   = bpf_xdp_adjust_tail,
+   .gpl_only   = false,
+   .ret_type   = RET_INTEGER,
+   .arg1_type  = ARG_PTR_TO_CTX,
+   .arg2_type  = ARG_ANYTHING,
+};
+
 BPF_CALL_2(bpf_xdp_adjust_meta, struct xdp_buff *, xdp, int, offset)
 {
void *meta = xdp->data_meta + offset;
@@ -3074,7 +3098,8 @@ bool bpf_helper_changes_pkt_data(void *func)
func == bpf_l4_csum_replace ||
func == bpf_xdp_adjust_head ||
func == bpf_xdp_adjust_meta ||
-   func == bpf_msg_pull_data)
+   func == bpf_msg_pull_data ||
+   func == bpf_xdp_adjust_tail)
return true;
 
return false;
@@ -3888,6 +3913,8 @@ xdp_func_proto(enum bpf_func_id func_id, const struct 
bpf_prog *prog)
return &bpf_xdp_redirect_proto;
case BPF_FUNC_redirect_map:
return &bpf_xdp_redirect_map_proto;
+   case BPF_FUNC_xdp_adjust_tail:
+   return &bpf_xdp_adjust_tail_proto;
default:
return bpf_base_func_proto(func_id);
}
-- 
2.15.1
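
The bounds checks of the new helper can be modelled in userspace. This is a
sketch of the rules in the diff, not the helper itself: it works on a
packet length instead of xdp_buff pointers, the function name is
hypothetical, and FAKE_ETH_HLEN is assumed to match the kernel's ETH_HLEN
(14).

```c
#include <errno.h>

#define FAKE_ETH_HLEN 14 /* assumed equal to the kernel's ETH_HLEN */

/*
 * Returns the new packet length after applying offset to the tail,
 * or -EINVAL when the helper would refuse the request.
 */
static long fake_adjust_tail(long pkt_len, int offset)
{
	long new_len = pkt_len + offset;

	/* only shrinking is allowed; an offset of 0 is rejected as well */
	if (offset >= 0)
		return -EINVAL;

	/* at least an ethernet header must remain */
	if (new_len < FAKE_ETH_HLEN)
		return -EINVAL;

	return new_len;
}
```

So for a 100-byte packet an offset of -40 leaves 60 bytes, -86 leaves
exactly the 14-byte minimum, and 0, any positive value, or -90 are all
rejected.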



[PATCH bpf-next v3 00/11] introduction of bpf_xdp_adjust_tail

2018-04-18 Thread Nikita V. Shirokov
In this patch series i'm adding a new bpf helper which allows manipulating
xdp's data_end pointer. right now only "shrinking" (reduce packet's size
by moving pointer) is supported (and i see no use case for "growing").
Main use case for such helper is to be able to generate control (ICMP)
messages from XDP context. such messages usually contain first N bytes
from original packets as a payload, and this is exactly what this helper
would allow us to do (see patch 11 for sample program, where we generate
ICMP "packet too big" message). This helper could be useful for load
balancing applications where after additional encapsulation, resulting
packet could be bigger than interface MTU.
Aside from new helper this patch series contains minor changes in device
drivers (for ones which require it), so they would recalculate packet's
length not only when head pointer was adjusted, but if tail's one as well.

v2->v3:
 * adding missed "signed off by" in v2

v1->v2:
 * fixed kbuild warning
 * made offset eq 0 invalid for bpf_xdp_adjust_tail
 * split bpf_prog_test_run fix and selftests into separate commits
 * added SPDX licence where applicable
 * some reshuffling in patches order (tests now in the end)


Nikita V. Shirokov (11):
  bpf: making bpf_prog_test run aware of possible data_end ptr change
  bpf: adding tests for bpf_xdp_adjust_tail
  bpf: adding bpf_xdp_adjust_tail helper
  bpf: make generic xdp compatible w/ bpf_xdp_adjust_tail
  bpf: make mlx4 compatible w/ bpf_xdp_adjust_tail
  bpf: make bnxt compatible w/ bpf_xdp_adjust_tail
  bpf: make cavium thunder compatible w/ bpf_xdp_adjust_tail
  bpf: make netronome nfp compatible w/ bpf_xdp_adjust_tail
  bpf: make tun compatible w/ bpf_xdp_adjust_tail
  bpf: make virtio compatible w/ bpf_xdp_adjust_tail
  bpf: add bpf_xdp_adjust_tail sample prog

 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c  |   2 +-
 drivers/net/ethernet/cavium/thunder/nicvf_main.c   |   2 +-
 drivers/net/ethernet/mellanox/mlx4/en_rx.c |   2 +-
 .../net/ethernet/netronome/nfp/nfp_net_common.c|   2 +-
 drivers/net/tun.c  |   3 +-
 drivers/net/virtio_net.c   |   7 +-
 include/uapi/linux/bpf.h   |  10 +-
 net/bpf/test_run.c |   3 +-
 net/core/dev.c |  10 +-
 net/core/filter.c  |  29 +++-
 samples/bpf/Makefile   |   4 +
 samples/bpf/xdp_adjust_tail_kern.c | 152 +
 samples/bpf/xdp_adjust_tail_user.c | 142 +++
 tools/include/uapi/linux/bpf.h |  10 +-
 tools/testing/selftests/bpf/Makefile   |   2 +-
 tools/testing/selftests/bpf/bpf_helpers.h  |   5 +
 tools/testing/selftests/bpf/test_adjust_tail.c |  30 
 tools/testing/selftests/bpf/test_progs.c   |  32 +
 18 files changed, 435 insertions(+), 12 deletions(-)
 create mode 100644 samples/bpf/xdp_adjust_tail_kern.c
 create mode 100644 samples/bpf/xdp_adjust_tail_user.c
 create mode 100644 tools/testing/selftests/bpf/test_adjust_tail.c

-- 
2.15.1



[PATCH bpf-next v3 03/11] bpf: make mlx4 compatible w/ bpf_xdp_adjust_tail

2018-04-18 Thread Nikita V. Shirokov
w/ bpf_xdp_adjust_tail helper xdp's data_end pointer could be changed as
well (only "decrease" of pointer's location is going to be supported).
changing of this pointer will change packet's size.
for mlx4 driver we will just calculate packet's length unconditionally
(the same way as it's already being done in mlx5)

Acked-by: Alexei Starovoitov <a...@kernel.org>
Acked-by: Tariq Toukan <tar...@mellanox.com>
Signed-off-by: Nikita V. Shirokov <tehn...@tehnerd.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c 
b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 5c613c6663da..efc55feddc5c 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -775,8 +775,8 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct 
mlx4_en_cq *cq, int bud
 
act = bpf_prog_run_xdp(xdp_prog, &xdp);
 
+   length = xdp.data_end - xdp.data;
if (xdp.data != orig_data) {
-   length = xdp.data_end - xdp.data;
frags[0].page_offset = xdp.data -
xdp.data_hard_start;
va = xdp.data;
-- 
2.15.1
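
A minimal user-space sketch of why the length has to be recomputed unconditionally once a program may trim the tail. `struct mock_xdp`, `len_if_head_moved()`, `len_unconditional()` and `demo_len()` are hypothetical stand-ins, not driver code; only the two compared conditions mirror the hunk above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two xdp_buff fields used here. */
struct mock_xdp {
	unsigned char *data;
	unsigned char *data_end;
};

/* Pre-patch logic: length is refreshed only when the head pointer moved. */
static size_t len_if_head_moved(const struct mock_xdp *xdp,
				const unsigned char *orig_data,
				size_t length)
{
	if (xdp->data != orig_data)
		length = (size_t)(xdp->data_end - xdp->data);
	return length;
}

/* Patched logic: length is always recomputed after the program ran. */
static size_t len_unconditional(const struct mock_xdp *xdp)
{
	return (size_t)(xdp->data_end - xdp->data);
}

/*
 * Simulate a 100-byte frame whose tail is trimmed by tail_trim bytes
 * while the head stays put, then report the length each variant sees.
 */
static size_t demo_len(int tail_trim, int patched)
{
	unsigned char frame[100];
	struct mock_xdp xdp = { frame, frame + sizeof(frame) };

	xdp.data_end -= tail_trim;	/* what a tail-only adjustment does */
	if (patched)
		return len_unconditional(&xdp);
	return len_if_head_moved(&xdp, frame, sizeof(frame));
}
```

With a tail-only trim the pre-patch variant keeps reporting the stale length, because `xdp.data` still equals `orig_data`; the unconditional variant follows `data_end`.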



[PATCH bpf-next v3 10/11] bpf: adding tests for bpf_xdp_adjust_tail

2018-04-18 Thread Nikita V. Shirokov
adding selftests for bpf_xdp_adjust_tail helper. in this synthetic test
we are testing that 1) if data_end < data helper will return EINVAL
2) for normal use case packet's length would be reduced.

Signed-off-by: Nikita V. Shirokov <tehn...@tehnerd.com>
---
 tools/include/uapi/linux/bpf.h | 10 +++-
 tools/testing/selftests/bpf/Makefile   |  2 +-
 tools/testing/selftests/bpf/bpf_helpers.h  |  3 +++
 tools/testing/selftests/bpf/test_adjust_tail.c | 30 
 tools/testing/selftests/bpf/test_progs.c   | 32 ++
 5 files changed, 75 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/test_adjust_tail.c

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 9d07465023a2..56bf493ba7ed 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -755,6 +755,13 @@ union bpf_attr {
  * @addr: pointer to struct sockaddr to bind socket to
  * @addr_len: length of sockaddr structure
  * Return: 0 on success or negative error code
+ *
+ * int bpf_xdp_adjust_tail(xdp_md, delta)
+ * Adjust the xdp_md.data_end by delta. Only shrinking of packet's
+ * size is supported.
+ * @xdp_md: pointer to xdp_md
+ * @delta: A negative integer to be added to xdp_md.data_end
+ * Return: 0 on success or negative on error
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -821,7 +828,8 @@ union bpf_attr {
FN(msg_apply_bytes),\
FN(msg_cork_bytes), \
FN(msg_pull_data),  \
-   FN(bind),
+   FN(bind),   \
+   FN(xdp_adjust_tail),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/tools/testing/selftests/bpf/Makefile 
b/tools/testing/selftests/bpf/Makefile
index 0a315ddabbf4..3e819dc70bee 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -31,7 +31,7 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o 
test_tcp_estats.o test
sockmap_verdict_prog.o dev_cgroup.o sample_ret0.o test_tracepoint.o \
test_l4lb_noinline.o test_xdp_noinline.o test_stacktrace_map.o \
sample_map_ret0.o test_tcpbpf_kern.o test_stacktrace_build_id.o \
-   sockmap_tcp_msg_prog.o connect4_prog.o connect6_prog.o
+   sockmap_tcp_msg_prog.o connect4_prog.o connect6_prog.o 
test_adjust_tail.o
 
 # Order correspond to 'make run_tests' order
 TEST_PROGS := test_kmod.sh \
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h 
b/tools/testing/selftests/bpf/bpf_helpers.h
index d8223d99f96d..50c607014b22 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -96,6 +96,9 @@ static int (*bpf_msg_pull_data)(void *ctx, int start, int 
end, int flags) =
(void *) BPF_FUNC_msg_pull_data;
 static int (*bpf_bind)(void *ctx, void *addr, int addr_len) =
(void *) BPF_FUNC_bind;
+static int (*bpf_xdp_adjust_tail)(void *ctx, int offset) =
+   (void *) BPF_FUNC_xdp_adjust_tail;
+
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions
diff --git a/tools/testing/selftests/bpf/test_adjust_tail.c 
b/tools/testing/selftests/bpf/test_adjust_tail.c
new file mode 100644
index 000000000000..4cd5e860c903
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_adjust_tail.c
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0
+ * Copyright (c) 2018 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include 
+#include 
+#include "bpf_helpers.h"
+
+int _version SEC("version") = 1;
+
+SEC("xdp_adjust_tail")
+int _xdp_adjust_tail(struct xdp_md *xdp)
+{
+   void *data_end = (void *)(long)xdp->data_end;
+   void *data = (void *)(long)xdp->data;
+   int offset = 0;
+
+   if (data_end - data == 54)
+   offset = 256;
+   else
+   offset = 20;
+   if (bpf_xdp_adjust_tail(xdp, 0 - offset))
+   return XDP_DROP;
+   return XDP_TX;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_progs.c 
b/tools/testing/selftests/bpf/test_progs.c
index faadbe233966..eedda98d7bb1 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -166,6 +166,37 @@ static void test_xdp(void)
bpf_object__close(obj);
 }
 
+static void test_xdp_adjust_tail(void)
+{
+   const char *file = "./test_adjust_tail.o";
+   struct bpf_object *obj;
+   char buf[128];
+   __u32 duration, retval, size;
+   int err, prog_fd;
+
+

[PATCH bpf-next v3 08/11] bpf: make virtio compatible w/ bpf_xdp_adjust_tail

2018-04-18 Thread Nikita V. Shirokov
w/ bpf_xdp_adjust_tail helper xdp's data_end pointer could be changed as
well (only "decrease" of pointer's location is going to be supported).
changing of this pointer will change packet's size.
for virtio driver we need to adjust XDP_PASS handling by recalculating
length of the packet if it was passed to the TCP/IP stack

Reviewed-by: Jason Wang <jasow...@redhat.com>
Signed-off-by: Nikita V. Shirokov <tehn...@tehnerd.com>
---
 drivers/net/virtio_net.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 01694e26f03e..779a4f798522 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -606,6 +606,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
case XDP_PASS:
/* Recalculate length in case bpf program changed it */
delta = orig_data - xdp.data;
+   len = xdp.data_end - xdp.data;
break;
case XDP_TX:
xdpf = convert_to_xdp_frame(&xdp);
@@ -642,7 +643,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
goto err;
}
skb_reserve(skb, headroom - delta);
-   skb_put(skb, len + delta);
+   skb_put(skb, len);
if (!delta) {
buf += header_offset;
memcpy(skb_vnet_hdr(skb), buf, vi->hdr_len);
@@ -757,6 +758,10 @@ static struct sk_buff *receive_mergeable(struct net_device 
*dev,
offset = xdp.data -
page_address(xdp_page) - vi->hdr_len;
 
+   /* recalculate len if xdp.data or xdp.data_end were
+* adjusted
+*/
+   len = xdp.data_end - xdp.data;
/* We can only create skb based on xdp_page. */
if (unlikely(xdp_page != page)) {
rcu_read_unlock();
-- 
2.15.1



[PATCH bpf-next v2 01/11] bpf: adding bpf_xdp_adjust_tail helper

2018-04-18 Thread Nikita V. Shirokov
Adding new bpf helper which would allow us to manipulate
xdp's data_end pointer, and allow us to reduce packet's size.
Intended use case: to generate ICMP messages from XDP context,
where such message would contain truncated original packet.
---
 include/uapi/linux/bpf.h | 10 +-
 net/core/filter.c| 29 -
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c5ec89732a8d..9a2d1a04eb24 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -755,6 +755,13 @@ union bpf_attr {
  * @addr: pointer to struct sockaddr to bind socket to
  * @addr_len: length of sockaddr structure
  * Return: 0 on success or negative error code
+ *
+ * int bpf_xdp_adjust_tail(xdp_md, delta)
+ * Adjust the xdp_md.data_end by delta. Only shrinking of packet's
+ * size is supported.
+ * @xdp_md: pointer to xdp_md
+ * @delta: A negative integer to be added to xdp_md.data_end
+ * Return: 0 on success or negative on error
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -821,7 +828,8 @@ union bpf_attr {
FN(msg_apply_bytes),\
FN(msg_cork_bytes), \
FN(msg_pull_data),  \
-   FN(bind),
+   FN(bind),   \
+   FN(xdp_adjust_tail),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/net/core/filter.c b/net/core/filter.c
index a374b8560bc4..29318598fd60 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2725,6 +2725,30 @@ static const struct bpf_func_proto 
bpf_xdp_adjust_head_proto = {
.arg2_type  = ARG_ANYTHING,
 };
 
+BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
+{
+   void *data_end = xdp->data_end + offset;
+
+   /* only shrinking is allowed for now. */
+   if (unlikely(offset >= 0))
+   return -EINVAL;
+
+   if (unlikely(data_end < xdp->data + ETH_HLEN))
+   return -EINVAL;
+
+   xdp->data_end = data_end;
+
+   return 0;
+}
+
+static const struct bpf_func_proto bpf_xdp_adjust_tail_proto = {
+   .func   = bpf_xdp_adjust_tail,
+   .gpl_only   = false,
+   .ret_type   = RET_INTEGER,
+   .arg1_type  = ARG_PTR_TO_CTX,
+   .arg2_type  = ARG_ANYTHING,
+};
+
 BPF_CALL_2(bpf_xdp_adjust_meta, struct xdp_buff *, xdp, int, offset)
 {
void *meta = xdp->data_meta + offset;
@@ -3074,7 +3098,8 @@ bool bpf_helper_changes_pkt_data(void *func)
func == bpf_l4_csum_replace ||
func == bpf_xdp_adjust_head ||
func == bpf_xdp_adjust_meta ||
-   func == bpf_msg_pull_data)
+   func == bpf_msg_pull_data ||
+   func == bpf_xdp_adjust_tail)
return true;
 
return false;
@@ -3888,6 +3913,8 @@ xdp_func_proto(enum bpf_func_id func_id, const struct 
bpf_prog *prog)
return &bpf_xdp_redirect_proto;
case BPF_FUNC_redirect_map:
return &bpf_xdp_redirect_map_proto;
+   case BPF_FUNC_xdp_adjust_tail:
+   return &bpf_xdp_adjust_tail_proto;
default:
return bpf_base_func_proto(func_id);
}
-- 
2.15.1
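
The helper's two checks can be modelled in user space. This is only a sketch under stated assumptions: `struct mock_xdp_buff`, `mock_xdp_adjust_tail()` and `try_adjust()` are hypothetical names, and `MOCK_ETH_HLEN` is assumed to carry the same value (14) as the kernel's ETH_HLEN. The bound is checked on lengths rather than on pointers so the model never forms an out-of-range pointer.

```c
#include <assert.h>
#include <errno.h>

#define MOCK_ETH_HLEN 14	/* assumed equal to the kernel's ETH_HLEN */

/* Hypothetical stand-in for the two xdp_buff fields the helper touches. */
struct mock_xdp_buff {
	unsigned char *data;
	unsigned char *data_end;
};

/* Model of the checks in the patch: shrink only, keep an Ethernet header. */
static int mock_xdp_adjust_tail(struct mock_xdp_buff *xdp, int offset)
{
	long new_len = (long)(xdp->data_end - xdp->data) + offset;

	/* only shrinking is allowed; offset == 0 is rejected as well */
	if (offset >= 0)
		return -EINVAL;

	/* equivalent to data_end + offset < data + ETH_HLEN */
	if (new_len < MOCK_ETH_HLEN)
		return -EINVAL;

	xdp->data_end += offset;
	return 0;
}

/* Returns the resulting packet length, or a negative errno on failure. */
static long try_adjust(int pkt_len, int offset)
{
	static unsigned char frame[2048];
	struct mock_xdp_buff xdp = { frame, frame + pkt_len };
	int err = mock_xdp_adjust_tail(&xdp, offset);

	if (err)
		return err;
	return (long)(xdp.data_end - xdp.data);
}
```

The first assertion in the test mirrors the selftest in this series: a 54-byte packet shrunk by 256 bytes is rejected, while any other packet shrunk by 20 bytes succeeds as long as an Ethernet header remains.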



[PATCH bpf-next v2 06/11] bpf: make netronome nfp compatible w/ bpf_xdp_adjust_tail

2018-04-18 Thread Nikita V. Shirokov
w/ bpf_xdp_adjust_tail helper xdp's data_end pointer could be changed as
well (only "decrease" of pointer's location is going to be supported).
changing of this pointer will change packet's size.
for nfp driver we will just calculate packet's length unconditionally

Acked-by: Alexei Starovoitov 
Acked-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 1eb6549f2a54..d9111c077699 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1722,7 +1722,7 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, 
int budget)
 
act = bpf_prog_run_xdp(xdp_prog, &xdp);
 
-   pkt_len -= xdp.data - orig_data;
+   pkt_len = xdp.data_end - xdp.data;
pkt_off += xdp.data - orig_data;
 
switch (act) {
-- 
2.15.1



[PATCH bpf-next v2 03/11] bpf: make mlx4 compatible w/ bpf_xdp_adjust_tail

2018-04-18 Thread Nikita V. Shirokov
w/ bpf_xdp_adjust_tail helper xdp's data_end pointer could be changed as
well (only "decrease" of pointer's location is going to be supported).
changing of this pointer will change packet's size.
for mlx4 driver we will just calculate packet's length unconditionally
(the same way as it's already being done in mlx5)

Acked-by: Alexei Starovoitov 
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c 
b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 5c613c6663da..efc55feddc5c 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -775,8 +775,8 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct 
mlx4_en_cq *cq, int bud
 
act = bpf_prog_run_xdp(xdp_prog, &xdp);
 
+   length = xdp.data_end - xdp.data;
if (xdp.data != orig_data) {
-   length = xdp.data_end - xdp.data;
frags[0].page_offset = xdp.data -
xdp.data_hard_start;
va = xdp.data;
-- 
2.15.1



[PATCH bpf-next v2 07/11] bpf: make tun compatible w/ bpf_xdp_adjust_tail

2018-04-18 Thread Nikita V. Shirokov
w/ bpf_xdp_adjust_tail helper xdp's data_end pointer could be changed as
well (only "decrease" of pointer's location is going to be supported).
changing of this pointer will change packet's size.
for tun driver we need to adjust XDP_PASS handling by recalculating
length of the packet if it was passed to the TCP/IP stack
(in case if after xdp's prog run data_end pointer was adjusted)

Reviewed-by: Jason Wang 
---
 drivers/net/tun.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 1e58be152d5c..901351a6ed21 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1696,6 +1696,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct 
*tun,
return NULL;
case XDP_PASS:
delta = orig_data - xdp.data;
+   len = xdp.data_end - xdp.data;
break;
default:
bpf_warn_invalid_xdp_action(act);
@@ -1716,7 +1717,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct 
*tun,
}
 
skb_reserve(skb, pad - delta);
-   skb_put(skb, len + delta);
+   skb_put(skb, len);
get_page(alloc_frag->page);
alloc_frag->offset += buflen;
 
-- 
2.15.1
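
A small arithmetic sketch of the XDP_PASS change, using offsets relative to orig_data instead of real pointers. The function names and the `head_move`/`tail_trim` parameters are hypothetical; `head_move` is how far xdp.data moved forward (negative when a header was pushed) and `tail_trim` is how far xdp.data_end moved back.

```c
#include <assert.h>

/* delta as computed in the XDP_PASS branch: orig_data - xdp.data */
static long pass_delta(long head_move)
{
	return -head_move;
}

/* Pre-patch: skb_put(skb, len + delta), which never looks at data_end. */
static long put_len_old(long len, long head_move, long tail_trim)
{
	(void)tail_trim;
	return len + pass_delta(head_move);
}

/* Patched: len = xdp.data_end - xdp.data, then skb_put(skb, len). */
static long put_len_new(long len, long head_move, long tail_trim)
{
	long data = head_move;		/* offset of xdp.data */
	long data_end = len - tail_trim;	/* offset of xdp.data_end */

	return data_end - data;
}
```

Both variants agree whenever only the head moved; they differ exactly by `tail_trim` once the tail was adjusted.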



[PATCH bpf-next v2 08/11] bpf: make virtio compatible w/ bpf_xdp_adjust_tail

2018-04-18 Thread Nikita V. Shirokov
w/ bpf_xdp_adjust_tail helper xdp's data_end pointer could be changed as
well (only "decrease" of pointer's location is going to be supported).
changing of this pointer will change packet's size.
for virtio driver we need to adjust XDP_PASS handling by recalculating
length of the packet if it was passed to the TCP/IP stack

Reviewed-by: Jason Wang 
---
 drivers/net/virtio_net.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 01694e26f03e..779a4f798522 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -606,6 +606,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
case XDP_PASS:
/* Recalculate length in case bpf program changed it */
delta = orig_data - xdp.data;
+   len = xdp.data_end - xdp.data;
break;
case XDP_TX:
xdpf = convert_to_xdp_frame(&xdp);
@@ -642,7 +643,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
goto err;
}
skb_reserve(skb, headroom - delta);
-   skb_put(skb, len + delta);
+   skb_put(skb, len);
if (!delta) {
buf += header_offset;
memcpy(skb_vnet_hdr(skb), buf, vi->hdr_len);
@@ -757,6 +758,10 @@ static struct sk_buff *receive_mergeable(struct net_device 
*dev,
offset = xdp.data -
page_address(xdp_page) - vi->hdr_len;
 
+   /* recalculate len if xdp.data or xdp.data_end were
+* adjusted
+*/
+   len = xdp.data_end - xdp.data;
/* We can only create skb based on xdp_page. */
if (unlikely(xdp_page != page)) {
rcu_read_unlock();
-- 
2.15.1



[PATCH bpf-next v2 05/11] bpf: make cavium thunder compatible w/ bpf_xdp_adjust_tail

2018-04-18 Thread Nikita V. Shirokov
w/ bpf_xdp_adjust_tail helper xdp's data_end pointer could be changed as
well (only "decrease" of pointer's location is going to be supported).
changing of this pointer will change packet's size.
for cavium's thunder driver we will just calculate packet's length
unconditionally

Acked-by: Alexei Starovoitov 
---
 drivers/net/ethernet/cavium/thunder/nicvf_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c 
b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
index 707db3304396..7135db45927e 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
@@ -538,9 +538,9 @@ static inline bool nicvf_xdp_rx(struct nicvf *nic, struct 
bpf_prog *prog,
action = bpf_prog_run_xdp(prog, &xdp);
rcu_read_unlock();
 
+   len = xdp.data_end - xdp.data;
/* Check if XDP program has changed headers */
if (orig_data != xdp.data) {
-   len = xdp.data_end - xdp.data;
offset = orig_data - xdp.data;
dma_addr -= offset;
}
-- 
2.15.1



[PATCH bpf-next v2 09/11] bpf: making bpf_prog_test run aware of possible data_end ptr change

2018-04-18 Thread Nikita V. Shirokov
after introduction of bpf_xdp_adjust_tail helper packet length
could be changed not only if xdp->data pointer has been changed
but xdp->data_end as well. making bpf_prog_test_run aware of this
possibility
---
 net/bpf/test_run.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 2ced48662c1f..68c3578343b4 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -170,7 +170,8 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const 
union bpf_attr *kattr,
xdp.rxq = &rxqueue->xdp_rxq;
 
retval = bpf_test_run(prog, &xdp, repeat, &duration);
-   if (xdp.data != data + XDP_PACKET_HEADROOM + NET_IP_ALIGN)
+   if (xdp.data != data + XDP_PACKET_HEADROOM + NET_IP_ALIGN ||
+   xdp.data_end != xdp.data + size)
size = xdp.data_end - xdp.data;
ret = bpf_test_finish(kattr, uattr, xdp.data, size, retval, duration);
kfree(data);
-- 
2.15.1



[PATCH bpf-next v2 10/11] bpf: adding tests for bpf_xdp_adjust_tail

2018-04-18 Thread Nikita V. Shirokov
adding selftests for bpf_xdp_adjust_tail helper. in this synthetic test
we are testing that 1) if data_end < data helper will return EINVAL
2) for normal use case packet's length would be reduced.
---
 tools/include/uapi/linux/bpf.h | 10 +++-
 tools/testing/selftests/bpf/Makefile   |  2 +-
 tools/testing/selftests/bpf/bpf_helpers.h  |  3 +++
 tools/testing/selftests/bpf/test_adjust_tail.c | 30 
 tools/testing/selftests/bpf/test_progs.c   | 32 ++
 5 files changed, 75 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/test_adjust_tail.c

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 9d07465023a2..56bf493ba7ed 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -755,6 +755,13 @@ union bpf_attr {
  * @addr: pointer to struct sockaddr to bind socket to
  * @addr_len: length of sockaddr structure
  * Return: 0 on success or negative error code
+ *
+ * int bpf_xdp_adjust_tail(xdp_md, delta)
+ * Adjust the xdp_md.data_end by delta. Only shrinking of packet's
+ * size is supported.
+ * @xdp_md: pointer to xdp_md
+ * @delta: A negative integer to be added to xdp_md.data_end
+ * Return: 0 on success or negative on error
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -821,7 +828,8 @@ union bpf_attr {
FN(msg_apply_bytes),\
FN(msg_cork_bytes), \
FN(msg_pull_data),  \
-   FN(bind),
+   FN(bind),   \
+   FN(xdp_adjust_tail),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/tools/testing/selftests/bpf/Makefile 
b/tools/testing/selftests/bpf/Makefile
index 0a315ddabbf4..3e819dc70bee 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -31,7 +31,7 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o 
test_tcp_estats.o test
sockmap_verdict_prog.o dev_cgroup.o sample_ret0.o test_tracepoint.o \
test_l4lb_noinline.o test_xdp_noinline.o test_stacktrace_map.o \
sample_map_ret0.o test_tcpbpf_kern.o test_stacktrace_build_id.o \
-   sockmap_tcp_msg_prog.o connect4_prog.o connect6_prog.o
+   sockmap_tcp_msg_prog.o connect4_prog.o connect6_prog.o 
test_adjust_tail.o
 
 # Order correspond to 'make run_tests' order
 TEST_PROGS := test_kmod.sh \
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h 
b/tools/testing/selftests/bpf/bpf_helpers.h
index d8223d99f96d..50c607014b22 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -96,6 +96,9 @@ static int (*bpf_msg_pull_data)(void *ctx, int start, int 
end, int flags) =
(void *) BPF_FUNC_msg_pull_data;
 static int (*bpf_bind)(void *ctx, void *addr, int addr_len) =
(void *) BPF_FUNC_bind;
+static int (*bpf_xdp_adjust_tail)(void *ctx, int offset) =
+   (void *) BPF_FUNC_xdp_adjust_tail;
+
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions
diff --git a/tools/testing/selftests/bpf/test_adjust_tail.c 
b/tools/testing/selftests/bpf/test_adjust_tail.c
new file mode 100644
index 000000000000..4cd5e860c903
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_adjust_tail.c
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0
+ * Copyright (c) 2018 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include 
+#include 
+#include "bpf_helpers.h"
+
+int _version SEC("version") = 1;
+
+SEC("xdp_adjust_tail")
+int _xdp_adjust_tail(struct xdp_md *xdp)
+{
+   void *data_end = (void *)(long)xdp->data_end;
+   void *data = (void *)(long)xdp->data;
+   int offset = 0;
+
+   if (data_end - data == 54)
+   offset = 256;
+   else
+   offset = 20;
+   if (bpf_xdp_adjust_tail(xdp, 0 - offset))
+   return XDP_DROP;
+   return XDP_TX;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_progs.c 
b/tools/testing/selftests/bpf/test_progs.c
index faadbe233966..eedda98d7bb1 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -166,6 +166,37 @@ static void test_xdp(void)
bpf_object__close(obj);
 }
 
+static void test_xdp_adjust_tail(void)
+{
+   const char *file = "./test_adjust_tail.o";
+   struct bpf_object *obj;
+   char buf[128];
+   __u32 duration, retval, size;
+   int err, prog_fd;
+
+   err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj, &prog_fd);
+   if (err) {
+   error_cnt++;
+   return;
+   }
+
+   err = 

[PATCH bpf-next v2 11/11] bpf: add bpf_xdp_adjust_tail sample prog

2018-04-18 Thread Nikita V. Shirokov
adding bpf's sample program which is using bpf_xdp_adjust_tail helper
by generating ICMPv4 "packet too big" message if ingress packet's size is
bigger than 600 bytes
---
 samples/bpf/Makefile  |   4 +
 samples/bpf/xdp_adjust_tail_kern.c| 152 ++
 samples/bpf/xdp_adjust_tail_user.c| 142 
 tools/testing/selftests/bpf/bpf_helpers.h |   2 +
 4 files changed, 300 insertions(+)
 create mode 100644 samples/bpf/xdp_adjust_tail_kern.c
 create mode 100644 samples/bpf/xdp_adjust_tail_user.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 4d6a6edd4bf6..aa8c392e2e52 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -44,6 +44,7 @@ hostprogs-y += xdp_monitor
 hostprogs-y += xdp_rxq_info
 hostprogs-y += syscall_tp
 hostprogs-y += cpustat
+hostprogs-y += xdp_adjust_tail
 
 # Libbpf dependencies
 LIBBPF := ../../tools/lib/bpf/bpf.o ../../tools/lib/bpf/nlattr.o
@@ -95,6 +96,7 @@ xdp_monitor-objs := bpf_load.o $(LIBBPF) xdp_monitor_user.o
 xdp_rxq_info-objs := bpf_load.o $(LIBBPF) xdp_rxq_info_user.o
 syscall_tp-objs := bpf_load.o $(LIBBPF) syscall_tp_user.o
 cpustat-objs := bpf_load.o $(LIBBPF) cpustat_user.o
+xdp_adjust_tail-objs := bpf_load.o $(LIBBPF) xdp_adjust_tail_user.o
 
 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
@@ -148,6 +150,7 @@ always += xdp_rxq_info_kern.o
 always += xdp2skb_meta_kern.o
 always += syscall_tp_kern.o
 always += cpustat_kern.o
+always += xdp_adjust_tail_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 HOSTCFLAGS += -I$(srctree)/tools/lib/
@@ -193,6 +196,7 @@ HOSTLOADLIBES_xdp_monitor += -lelf
 HOSTLOADLIBES_xdp_rxq_info += -lelf
 HOSTLOADLIBES_syscall_tp += -lelf
 HOSTLOADLIBES_cpustat += -lelf
+HOSTLOADLIBES_xdp_adjust_tail += -lelf
 
 # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on 
cmdline:
 #  make samples/bpf/ LLC=~/git/llvm/build/bin/llc 
CLANG=~/git/llvm/build/bin/clang
diff --git a/samples/bpf/xdp_adjust_tail_kern.c 
b/samples/bpf/xdp_adjust_tail_kern.c
new file mode 100644
index 000000000000..411fdb21f8bc
--- /dev/null
+++ b/samples/bpf/xdp_adjust_tail_kern.c
@@ -0,0 +1,152 @@
+/* SPDX-License-Identifier: GPL-2.0
+ * Copyright (c) 2018 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program shows how to use bpf_xdp_adjust_tail() by
+ * generating ICMPv4 "packet too big" messages (destination unreachable /
+ * fragmentation needed with DF set, to be more precise in the v4 case)
+ * when receiving packets bigger than 600 bytes.
+ */
+#define KBUILD_MODNAME "foo"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "bpf_helpers.h"
+
+#define DEFAULT_TTL 64
+#define MAX_PCKT_SIZE 600
+#define ICMP_TOOBIG_SIZE 98
+#define ICMP_TOOBIG_PAYLOAD_SIZE 92
+
+struct bpf_map_def SEC("maps") icmpcnt = {
+   .type = BPF_MAP_TYPE_ARRAY,
+   .key_size = sizeof(__u32),
+   .value_size = sizeof(__u64),
+   .max_entries = 1,
+};
+
+static __always_inline void count_icmp(void)
+{
+   u64 key = 0;
+   u64 *icmp_count;
+
+   icmp_count = bpf_map_lookup_elem(&icmpcnt, &key);
+   if (icmp_count)
+   *icmp_count += 1;
+}
+
+static __always_inline void swap_mac(void *data, struct ethhdr *orig_eth)
+{
+   struct ethhdr *eth;
+
+   eth = data;
+   memcpy(eth->h_source, orig_eth->h_dest, ETH_ALEN);
+   memcpy(eth->h_dest, orig_eth->h_source, ETH_ALEN);
+   eth->h_proto = orig_eth->h_proto;
+}
+
+static __always_inline __u16 csum_fold_helper(__u32 csum)
+{
+   return ~((csum & 0xffff) + (csum >> 16));
+}
+
+static __always_inline void ipv4_csum(void *data_start, int data_size,
+ __u32 *csum)
+{
+   *csum = bpf_csum_diff(0, 0, data_start, data_size, *csum);
+   *csum = csum_fold_helper(*csum);
+}
+
+static __always_inline int send_icmp4_too_big(struct xdp_md *xdp)
+{
+   int headroom = (int)sizeof(struct iphdr) + (int)sizeof(struct icmphdr);
+
+   if (bpf_xdp_adjust_head(xdp, 0 - headroom))
+   return XDP_DROP;
+   void *data = (void *)(long)xdp->data;
+   void *data_end = (void *)(long)xdp->data_end;
+
+   if (data + (ICMP_TOOBIG_SIZE + headroom) > data_end)
+   return XDP_DROP;
+
+   struct iphdr *iph, *orig_iph;
+   struct icmphdr *icmp_hdr;
+   struct ethhdr *orig_eth;
+   __u32 csum = 0;
+   __u64 off = 0;
+
+   orig_eth = data + headroom;
+   swap_mac(data, orig_eth);
+   off += sizeof(struct ethhdr);
+   iph = data + off;
+   off += sizeof(struct iphdr);
+   icmp_hdr = data + off;
+   off += sizeof(struct icmphdr);
+   orig_iph = data + off;
+   icmp_hdr->type = ICMP_DEST_UNREACH;
+   icmp_hdr->code = ICMP_FRAG_NEEDED;
+   icmp_hdr->un.frag.mtu = 

[PATCH bpf-next v2 04/11] bpf: make bnxt compatible w/ bpf_xdp_adjust_tail

2018-04-18 Thread Nikita V. Shirokov
w/ bpf_xdp_adjust_tail helper xdp's data_end pointer could be changed as
well (only "decrease" of pointer's location is going to be supported).
changing of this pointer will change packet's size.
for bnxt driver we will just calculate packet's length unconditionally

Acked-by: Alexei Starovoitov 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
index 1389ab5e05df..1f0e872d0667 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
@@ -113,10 +113,10 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct 
bnxt_rx_ring_info *rxr, u16 cons,
if (tx_avail != bp->tx_ring_size)
*event &= ~BNXT_RX_EVENT;
 
+   *len = xdp.data_end - xdp.data;
if (orig_data != xdp.data) {
offset = xdp.data - xdp.data_hard_start;
*data_ptr = xdp.data_hard_start + offset;
-   *len = xdp.data_end - xdp.data;
}
switch (act) {
case XDP_PASS:
-- 
2.15.1



[PATCH bpf-next v2 02/11] bpf: make generic xdp compatible w/ bpf_xdp_adjust_tail

2018-04-18 Thread Nikita V. Shirokov
w/ bpf_xdp_adjust_tail helper xdp's data_end pointer could be changed as
well (only "decrease" of pointer's location is going to be supported).
changing of this pointer will change packet's size.
for generic XDP we need to reflect this packet's length change by
adjusting skb's tail pointer

Acked-by: Alexei Starovoitov 
---
 net/core/dev.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 969462ebb296..11c789231a03 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3996,9 +3996,9 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
 struct bpf_prog *xdp_prog)
 {
struct netdev_rx_queue *rxqueue;
+   void *orig_data, *orig_data_end;
u32 metalen, act = XDP_DROP;
struct xdp_buff xdp;
-   void *orig_data;
int hlen, off;
u32 mac_len;
 
@@ -4037,6 +4037,7 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
xdp.data_meta = xdp.data;
xdp.data_end = xdp.data + hlen;
xdp.data_hard_start = skb->data - skb_headroom(skb);
+   orig_data_end = xdp.data_end;
orig_data = xdp.data;
 
rxqueue = netif_get_rxqueue(skb);
@@ -4051,6 +4052,13 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
__skb_push(skb, -off);
skb->mac_header += off;
 
+   /* check if bpf_xdp_adjust_tail was used. it can only "shrink"
+* pckt.
+*/
+   off = orig_data_end - xdp.data_end;
+   if (off != 0)
+   skb_set_tail_pointer(skb, xdp.data_end - xdp.data);
+
switch (act) {
case XDP_REDIRECT:
case XDP_TX:
-- 
2.15.1
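
A simplified user-space sketch of the tail fixup for generic XDP. `struct mock_skb`, `mock_skb_set_tail_pointer()` and `demo_tail_after_xdp()` are hypothetical; the model assumes skb data and xdp.data point at the same byte when the fixup runs, and it ignores every other sk_buff field.

```c
#include <assert.h>

/* Hypothetical stand-in: only the two pointers relevant to the fixup. */
struct mock_skb {
	unsigned char *data;
	unsigned char *tail;
};

/* Model of skb_set_tail_pointer(): tail becomes data + offset. */
static void mock_skb_set_tail_pointer(struct mock_skb *skb, long offset)
{
	skb->tail = skb->data + offset;
}

/*
 * Run the fixup from the hunk above on a packet of hlen bytes whose
 * data_end was moved back by tail_trim, and return the resulting
 * distance between the skb's data and tail pointers.
 */
static long demo_tail_after_xdp(long hlen, long tail_trim)
{
	static unsigned char buf[2048];
	struct mock_skb skb = { buf, buf + hlen };
	unsigned char *xdp_data = buf;
	unsigned char *xdp_data_end = buf + hlen;
	unsigned char *orig_data_end = xdp_data_end;
	long off;

	xdp_data_end -= tail_trim;	/* stands in for the program's helper call */

	/* same shape as the patch: act only if data_end actually moved */
	off = orig_data_end - xdp_data_end;
	if (off != 0)
		mock_skb_set_tail_pointer(&skb, xdp_data_end - xdp_data);

	return (long)(skb.tail - skb.data);
}
```

When the program leaves data_end alone, the tail pointer is not touched at all.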



[PATCH bpf-next v2 00/11] introduction of bpf_xdp_adjust_tail

2018-04-18 Thread Nikita V. Shirokov
In this patch series I'm adding a new bpf helper which allows to manipulate
xdp's data_end pointer. Right now only "shrinking" (reducing packet's size
by moving the pointer) is supported (and I see no use case for "growing").
Main use case for such helper is to be able to generate control (ICMP)
messages from XDP context. Such messages usually contain first N bytes
from original packets as a payload, and this is exactly what this helper
would allow us to do (see patch 11 for sample program, where we generate
ICMP "packet too big" message). This helper could be useful for load
balancing applications where after additional encapsulation, resulting
packet could be bigger than interface MTU.
Aside from new helper this patch series contains minor changes in device
drivers (for ones which require it), so they would recalculate packet's
length not only when head pointer was adjusted, but when tail's one as well.

v1->v2:
 * fixed kbuild warning
 * made offset eq 0 invalid for bpf_xdp_adjust_tail
 * split bpf_prog_test_run fix and selftests into separate commits
 * added SPDX licence where applicable
 * some reshuffling in patches order (tests now in the end)


Nikita V. Shirokov (11):
  bpf: making bpf_prog_test run aware of possible data_end ptr change
  bpf: adding tests for bpf_xdp_adjust_tail
  bpf: adding bpf_xdp_adjust_tail helper
  bpf: make generic xdp compatible w/ bpf_xdp_adjust_tail
  bpf: make mlx4 compatible w/ bpf_xdp_adjust_tail
  bpf: make bnxt compatible w/ bpf_xdp_adjust_tail
  bpf: make cavium thunder compatible w/ bpf_xdp_adjust_tail
  bpf: make netronome nfp compatible w/ bpf_xdp_adjust_tail
  bpf: make tun compatible w/ bpf_xdp_adjust_tail
  bpf: make virtio compatible w/ bpf_xdp_adjust_tail
  bpf: add bpf_xdp_adjust_tail sample prog

 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c  |   2 +-
 drivers/net/ethernet/cavium/thunder/nicvf_main.c   |   2 +-
 drivers/net/ethernet/mellanox/mlx4/en_rx.c |   2 +-
 .../net/ethernet/netronome/nfp/nfp_net_common.c|   2 +-
 drivers/net/tun.c  |   3 +-
 drivers/net/virtio_net.c   |   7 +-
 include/uapi/linux/bpf.h   |  10 +-
 net/bpf/test_run.c |   3 +-
 net/core/dev.c |  10 +-
 net/core/filter.c  |  29 +++-
 samples/bpf/Makefile   |   4 +
 samples/bpf/xdp_adjust_tail_kern.c | 152 +
 samples/bpf/xdp_adjust_tail_user.c | 142 +++
 tools/include/uapi/linux/bpf.h |  10 +-
 tools/testing/selftests/bpf/Makefile   |   2 +-
 tools/testing/selftests/bpf/bpf_helpers.h  |   5 +
 tools/testing/selftests/bpf/test_adjust_tail.c |  30 
 tools/testing/selftests/bpf/test_progs.c   |  32 +
 18 files changed, 435 insertions(+), 12 deletions(-)
 create mode 100644 samples/bpf/xdp_adjust_tail_kern.c
 create mode 100644 samples/bpf/xdp_adjust_tail_user.c
 create mode 100644 tools/testing/selftests/bpf/test_adjust_tail.c

-- 
2.15.1



[PATCH bpf-next 10/10] [bpf]: make virtio compatible w/ bpf_xdp_adjust_tail

2018-04-17 Thread Nikita V. Shirokov
with the bpf_xdp_adjust_tail helper, xdp's data_end pointer can be changed as
well (only "decreasing" the pointer's location is going to be supported).
changing this pointer changes the packet's size.
for the virtio driver we need to adjust XDP_PASS handling by recalculating
the length of the packet if it is passed to the TCP/IP stack
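
The difference between the old `len + delta` arithmetic and the new pointer-based length can be shown with a small user-space model. This is only a sketch, not driver code: `struct toy_pkt`, `toy_len_old` and `toy_len_new` are hypothetical names made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the pointers the driver looks at. */
struct toy_pkt {
	const char *orig_data;	/* xdp.data before the program ran */
	const char *data;	/* xdp.data after the program ran */
	const char *data_end;	/* xdp.data_end after the program ran */
};

/* Old scheme: start from the received length, correct for head moves only. */
static size_t toy_len_old(const struct toy_pkt *p, size_t rx_len)
{
	ptrdiff_t delta = p->orig_data - p->data;

	return (size_t)((ptrdiff_t)rx_len + delta);
}

/* New scheme: derive the length from the pointers, so both head and
 * tail adjustments are reflected.
 */
static size_t toy_len_new(const struct toy_pkt *p)
{
	assert(p->data <= p->data_end);
	return (size_t)(p->data_end - p->data);
}
```

Both functions agree as long as only the head pointer moves; once the tail is shrunk, only the pointer-based length follows it.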

Signed-off-by: Nikita V. Shirokov <tehn...@tehnerd.com>
---
 drivers/net/virtio_net.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 7b187ec7411e..115d85f7360a 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -604,6 +604,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
case XDP_PASS:
/* Recalculate length in case bpf program changed it */
delta = orig_data - xdp.data;
+   len = xdp.data_end - xdp.data;
break;
case XDP_TX:
sent = __virtnet_xdp_xmit(vi, &xdp);
@@ -637,7 +638,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
goto err;
}
skb_reserve(skb, headroom - delta);
-   skb_put(skb, len + delta);
+   skb_put(skb, len);
if (!delta) {
buf += header_offset;
memcpy(skb_vnet_hdr(skb), buf, vi->hdr_len);
@@ -752,6 +753,10 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
offset = xdp.data -
page_address(xdp_page) - vi->hdr_len;
 
+   /* recalculate len if xdp.data or xdp.data_end were
+* adjusted
+*/
+   len = xdp.data_end - xdp.data;
/* We can only create skb based on xdp_page. */
if (unlikely(xdp_page != page)) {
rcu_read_unlock();
-- 
2.15.1



[PATCH bpf-next 09/10] [bpf]: make tun compatible w/ bpf_xdp_adjust_tail

2018-04-17 Thread Nikita V. Shirokov
with the bpf_xdp_adjust_tail helper, xdp's data_end pointer can be changed as
well (only "decreasing" the pointer's location is going to be supported).
changing this pointer changes the packet's size.
for the tun driver we need to adjust XDP_PASS handling by recalculating
the length of the packet if it is passed to the TCP/IP stack
(in case the xdp prog adjusted the data_end pointer)

Signed-off-by: Nikita V. Shirokov <tehn...@tehnerd.com>
---
 drivers/net/tun.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 28583aa0c17d..0b488a958076 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1688,6 +1688,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
return NULL;
case XDP_PASS:
delta = orig_data - xdp.data;
+   len = xdp.data_end - xdp.data;
break;
default:
bpf_warn_invalid_xdp_action(act);
@@ -1708,7 +1709,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
}
 
skb_reserve(skb, pad - delta);
-   skb_put(skb, len + delta);
+   skb_put(skb, len);
get_page(alloc_frag->page);
alloc_frag->offset += buflen;
 
-- 
2.15.1



[PATCH bpf-next 08/10] [bpf]: make netronome nfp compatible w/ bpf_xdp_adjust_tail

2018-04-17 Thread Nikita V. Shirokov
with the bpf_xdp_adjust_tail helper, xdp's data_end pointer can be changed as
well (only "decreasing" the pointer's location is going to be supported).
changing this pointer changes the packet's size.
for the nfp driver we just calculate the packet's length unconditionally

Signed-off-by: Nikita V. Shirokov <tehn...@tehnerd.com>
---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 1eb6549f2a54..d9111c077699 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1722,7 +1722,7 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
 
act = bpf_prog_run_xdp(xdp_prog, &xdp);
 
-   pkt_len -= xdp.data - orig_data;
+   pkt_len = xdp.data_end - xdp.data;
pkt_off += xdp.data - orig_data;
 
switch (act) {
-- 
2.15.1



[PATCH bpf-next 02/10] [bpf]: adding tests for bpf_xdp_adjust_tail

2018-04-17 Thread Nikita V. Shirokov
adding selftests for the bpf_xdp_adjust_tail helper. in this synthetic test
we check that 1) if data_end < data the helper returns EINVAL
2) for the normal use case the packet's length is reduced.

aside from adding new tests i'm changing the behaviour of bpf_prog_test_run
so it recalculates the packet's length even if only the data_end pointer was
changed
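
The changed condition in bpf_prog_test_run_xdp can be modelled in user space as follows. This is a sketch under assumptions: `struct toy_run` and both function names are hypothetical, and `expected_data` stands in for `data + XDP_PACKET_HEADROOM + NET_IP_ALIGN` from the hunk below.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of the pointers left behind by a test run. */
struct toy_run {
	const char *data;	/* xdp.data after the program ran */
	const char *data_end;	/* xdp.data_end after the program ran */
};

/* Before the change: only a moved head triggers recalculation. */
static size_t toy_size_head_only(const struct toy_run *r,
				 const char *expected_data, size_t size)
{
	assert(r->data <= r->data_end);
	if (r->data != expected_data)
		size = (size_t)(r->data_end - r->data);
	return size;
}

/* After the change: a moved tail triggers recalculation too. */
static size_t toy_size_head_or_tail(const struct toy_run *r,
				    const char *expected_data, size_t size)
{
	assert(r->data <= r->data_end);
	if (r->data != expected_data || r->data_end != r->data + size)
		size = (size_t)(r->data_end - r->data);
	return size;
}
```

With a 100-byte input that the program trims to 54 bytes without touching the head, the first variant still reports 100 while the second reports 54.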

Signed-off-by: Nikita V. Shirokov <tehn...@tehnerd.com>
---
 net/bpf/test_run.c |  3 ++-
 tools/include/uapi/linux/bpf.h | 11 -
 tools/testing/selftests/bpf/Makefile   |  2 +-
 tools/testing/selftests/bpf/bpf_helpers.h  |  3 +++
 tools/testing/selftests/bpf/test_adjust_tail.c | 29 +++
 tools/testing/selftests/bpf/test_progs.c   | 32 ++
 6 files changed, 77 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/test_adjust_tail.c

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 2ced48662c1f..68c3578343b4 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -170,7 +170,8 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
xdp.rxq = &rxqueue->xdp_rxq;
 
retval = bpf_test_run(prog, &xdp, repeat, &duration);
-   if (xdp.data != data + XDP_PACKET_HEADROOM + NET_IP_ALIGN)
+   if (xdp.data != data + XDP_PACKET_HEADROOM + NET_IP_ALIGN ||
+   xdp.data_end != xdp.data + size)
size = xdp.data_end - xdp.data;
ret = bpf_test_finish(kattr, uattr, xdp.data, size, retval, duration);
kfree(data);
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 9d07465023a2..9a2d1a04eb24 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -755,6 +755,13 @@ union bpf_attr {
  * @addr: pointer to struct sockaddr to bind socket to
  * @addr_len: length of sockaddr structure
  * Return: 0 on success or negative error code
+ *
+ * int bpf_xdp_adjust_tail(xdp_md, delta)
+ * Adjust the xdp_md.data_end by delta. Only shrinking of packet's
+ * size is supported.
+ * @xdp_md: pointer to xdp_md
+ * @delta: A negative integer to be added to xdp_md.data_end
+ * Return: 0 on success or negative on error
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -821,7 +828,8 @@ union bpf_attr {
FN(msg_apply_bytes),\
FN(msg_cork_bytes), \
FN(msg_pull_data),  \
-   FN(bind),
+   FN(bind),   \
+   FN(xdp_adjust_tail),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -864,6 +872,7 @@ enum bpf_func_id {
 /* BPF_FUNC_skb_set_tunnel_key flags. */
 #define BPF_F_ZERO_CSUM_TX (1ULL << 1)
 #define BPF_F_DONT_FRAGMENT (1ULL << 2)
+#define BPF_F_SEQ_NUMBER   (1ULL << 3)
 
 /* BPF_FUNC_perf_event_output, BPF_FUNC_perf_event_read and
  * BPF_FUNC_perf_event_read_value flags.
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 0a315ddabbf4..3e819dc70bee 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -31,7 +31,7 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test
sockmap_verdict_prog.o dev_cgroup.o sample_ret0.o test_tracepoint.o \
test_l4lb_noinline.o test_xdp_noinline.o test_stacktrace_map.o \
sample_map_ret0.o test_tcpbpf_kern.o test_stacktrace_build_id.o \
-   sockmap_tcp_msg_prog.o connect4_prog.o connect6_prog.o
+   sockmap_tcp_msg_prog.o connect4_prog.o connect6_prog.o test_adjust_tail.o
 
 # Order correspond to 'make run_tests' order
 TEST_PROGS := test_kmod.sh \
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index d8223d99f96d..50c607014b22 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -96,6 +96,9 @@ static int (*bpf_msg_pull_data)(void *ctx, int start, int end, int flags) =
(void *) BPF_FUNC_msg_pull_data;
 static int (*bpf_bind)(void *ctx, void *addr, int addr_len) =
(void *) BPF_FUNC_bind;
+static int (*bpf_xdp_adjust_tail)(void *ctx, int offset) =
+   (void *) BPF_FUNC_xdp_adjust_tail;
+
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions
diff --git a/tools/testing/selftests/bpf/test_adjust_tail.c b/tools/testing/selftests/bpf/test_adjust_tail.c
new file mode 100644
index ..86239e792d6d
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_adjust_tail.c
@@ -0,0 +1,29 @@
+/* Copyright (c) 2016,2017 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Softw

[PATCH bpf-next 05/10] [bpf]: make mlx4 compatible w/ bpf_xdp_adjust_tail

2018-04-17 Thread Nikita V. Shirokov
with the bpf_xdp_adjust_tail helper, xdp's data_end pointer can be changed as
well (only "decreasing" the pointer's location is going to be supported).
changing this pointer changes the packet's size.
for the mlx4 driver we just calculate the packet's length unconditionally
(the same way as it's already done in mlx5)

Signed-off-by: Nikita V. Shirokov <tehn...@tehnerd.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 5c613c6663da..efc55feddc5c 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -775,8 +775,8 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 
act = bpf_prog_run_xdp(xdp_prog, &xdp);
 
+   length = xdp.data_end - xdp.data;
if (xdp.data != orig_data) {
-   length = xdp.data_end - xdp.data;
frags[0].page_offset = xdp.data -
xdp.data_hard_start;
va = xdp.data;
-- 
2.15.1



[PATCH bpf-next 04/10] [bpf]: make generic xdp compatible w/ bpf_xdp_adjust_tail

2018-04-17 Thread Nikita V. Shirokov
with the bpf_xdp_adjust_tail helper, xdp's data_end pointer can be changed as
well (only "decreasing" the pointer's location is going to be supported).
changing this pointer changes the packet's size.
for generic XDP we need to reflect this change in the packet's length by
adjusting the skb's tail pointer
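
The tail handling added in the hunk below can be sketched as a pure function. This is a toy model only: `toy_new_tail` is a hypothetical name, the tail is represented as a plain offset from the start of the packet data, and nothing else about the skb is modelled.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns the tail offset after applying the hunk's logic: if data_end
 * moved back, the tail becomes data_end - data; otherwise the current
 * tail offset is kept.
 */
static size_t toy_new_tail(size_t cur_tail, const char *data,
			   const char *orig_data_end, const char *data_end)
{
	ptrdiff_t off = orig_data_end - data_end;

	assert(off >= 0);	/* the helper can only shrink the packet */
	if (off != 0)
		cur_tail = (size_t)(data_end - data);
	return cur_tail;
}
```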

Signed-off-by: Nikita V. Shirokov <tehn...@tehnerd.com>
---
 net/core/dev.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 969462ebb296..11c789231a03 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3996,9 +3996,9 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
 struct bpf_prog *xdp_prog)
 {
struct netdev_rx_queue *rxqueue;
+   void *orig_data, *orig_data_end;
u32 metalen, act = XDP_DROP;
struct xdp_buff xdp;
-   void *orig_data;
int hlen, off;
u32 mac_len;
 
@@ -4037,6 +4037,7 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
xdp.data_meta = xdp.data;
xdp.data_end = xdp.data + hlen;
xdp.data_hard_start = skb->data - skb_headroom(skb);
+   orig_data_end = xdp.data_end;
orig_data = xdp.data;
 
rxqueue = netif_get_rxqueue(skb);
@@ -4051,6 +4052,13 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
__skb_push(skb, -off);
skb->mac_header += off;
 
+   /* check if bpf_xdp_adjust_tail was used. it can only "shrink"
+* pckt.
+*/
+   off = orig_data_end - xdp.data_end;
+   if (off != 0)
+   skb_set_tail_pointer(skb, xdp.data_end - xdp.data);
+
switch (act) {
case XDP_REDIRECT:
case XDP_TX:
-- 
2.15.1



[PATCH bpf-next 00/10] introduction of bpf_xdp_adjust_tail

2018-04-17 Thread Nikita V. Shirokov
In this patch series I'm adding a new bpf helper which allows manipulating
xdp's data_end pointer. Right now only "shrinking" (reducing the packet's size
by moving the pointer) is supported (and I see no use case for "growing").
The main use case for such a helper is to be able to generate control (ICMP)
messages from XDP context. Such messages usually contain the first N bytes
of the original packet as a payload, and this is exactly what this helper
allows us to do (see patch 3 for a sample program, where we generate an
ICMP "packet too big" message). This helper could be useful for load
balancing applications where, after additional encapsulation, the resulting
packet could be bigger than the interface MTU.
Aside from the new helper, this patch series contains minor changes in device
drivers (for the ones which require them), so they recalculate the packet's
length not only when the head pointer was adjusted, but when the tail pointer
was adjusted as well.

Nikita V. Shirokov (10):
  [bpf]: adding bpf_xdp_adjust_tail helper
  [bpf]: adding tests for bpf_xdp_adjust_tail
  [bpf]: add bpf_xdp_adjust_tail sample prog
  [bpf]: make generic xdp compatible w/ bpf_xdp_adjust_tail
  [bpf]: make mlx4 compatible w/ bpf_xdp_adjust_tail
  [bpf]: make bnxt compatible w/ bpf_xdp_adjust_tail
  [bpf]: make cavium thunder compatible w/ bpf_xdp_adjust_tail
  [bpf]: make netronome nfp compatible w/ bpf_xdp_adjust_tail
  [bpf]: make tun compatible w/ bpf_xdp_adjust_tail
  [bpf]: make virtio compatible w/ bpf_xdp_adjust_tail

 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c  |   2 +-
 drivers/net/ethernet/cavium/thunder/nicvf_main.c   |   2 +-
 drivers/net/ethernet/mellanox/mlx4/en_rx.c |   2 +-
 .../net/ethernet/netronome/nfp/nfp_net_common.c|   2 +-
 drivers/net/tun.c  |   3 +-
 drivers/net/virtio_net.c   |   7 +-
 include/uapi/linux/bpf.h   |  10 +-
 net/bpf/test_run.c |   3 +-
 net/core/dev.c |  10 +-
 net/core/filter.c  |  29 +++-
 samples/bpf/Makefile   |   4 +
 samples/bpf/xdp_adjust_tail_kern.c | 151 +
 samples/bpf/xdp_adjust_tail_user.c | 141 +++
 tools/include/uapi/linux/bpf.h |  11 +-
 tools/testing/selftests/bpf/Makefile   |   2 +-
 tools/testing/selftests/bpf/bpf_helpers.h  |   5 +
 tools/testing/selftests/bpf/test_adjust_tail.c |  29 
 tools/testing/selftests/bpf/test_progs.c   |  32 +
 18 files changed, 433 insertions(+), 12 deletions(-)
 create mode 100644 samples/bpf/xdp_adjust_tail_kern.c
 create mode 100644 samples/bpf/xdp_adjust_tail_user.c
 create mode 100644 tools/testing/selftests/bpf/test_adjust_tail.c

-- 
2.15.1



[PATCH bpf-next 06/10] [bpf]: make bnxt compatible w/ bpf_xdp_adjust_tail

2018-04-17 Thread Nikita V. Shirokov
with the bpf_xdp_adjust_tail helper, xdp's data_end pointer can be changed as
well (only "decreasing" the pointer's location is going to be supported).
changing this pointer changes the packet's size.
for the bnxt driver we just calculate the packet's length unconditionally

Signed-off-by: Nikita V. Shirokov <tehn...@tehnerd.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
index 1389ab5e05df..1f0e872d0667 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
@@ -113,10 +113,10 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
if (tx_avail != bp->tx_ring_size)
*event &= ~BNXT_RX_EVENT;
 
+   *len = xdp.data_end - xdp.data;
if (orig_data != xdp.data) {
offset = xdp.data - xdp.data_hard_start;
*data_ptr = xdp.data_hard_start + offset;
-   *len = xdp.data_end - xdp.data;
}
switch (act) {
case XDP_PASS:
-- 
2.15.1



[PATCH bpf-next 07/10] [bpf]: make cavium thunder compatible w/ bpf_xdp_adjust_tail

2018-04-17 Thread Nikita V. Shirokov
with the bpf_xdp_adjust_tail helper, xdp's data_end pointer can be changed as
well (only "decreasing" the pointer's location is going to be supported).
changing this pointer changes the packet's size.
for cavium's thunder driver we just calculate the packet's length
unconditionally

Signed-off-by: Nikita V. Shirokov <tehn...@tehnerd.com>
---
 drivers/net/ethernet/cavium/thunder/nicvf_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
index 707db3304396..7135db45927e 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
@@ -538,9 +538,9 @@ static inline bool nicvf_xdp_rx(struct nicvf *nic, struct bpf_prog *prog,
action = bpf_prog_run_xdp(prog, &xdp);
rcu_read_unlock();
 
+   len = xdp.data_end - xdp.data;
/* Check if XDP program has changed headers */
if (orig_data != xdp.data) {
-   len = xdp.data_end - xdp.data;
offset = orig_data - xdp.data;
dma_addr -= offset;
}
-- 
2.15.1



[PATCH bpf-next 01/10] [bpf]: adding bpf_xdp_adjust_tail helper

2018-04-17 Thread Nikita V. Shirokov
Adding a new bpf helper which allows us to manipulate
xdp's data_end pointer and to reduce the packet's size.
Intended use case: generating ICMP messages from XDP context,
where such a message contains the truncated original packet.
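
The helper's checks, as they appear in the net/core/filter.c hunk of this patch, can be exercised in user space with a toy model. This is a sketch, not kernel code: `struct toy_xdp_buff`, `toy_xdp_adjust_tail` and `TOY_ETH_HLEN` are hypothetical stand-ins, and the lower bound is expressed as a length comparison rather than a pointer comparison.

```c
#include <assert.h>
#include <errno.h>

#define TOY_ETH_HLEN 14	/* Ethernet header size, like the kernel's ETH_HLEN */

/* Hypothetical stand-in for the two xdp_buff pointers that matter here. */
struct toy_xdp_buff {
	char *data;
	char *data_end;
};

static int toy_xdp_adjust_tail(struct toy_xdp_buff *xdp, int offset)
{
	long new_len;

	assert(xdp->data <= xdp->data_end);

	/* only shrinking is allowed */
	if (offset > 0)
		return -EINVAL;

	/* same bound as "data_end + offset < data + ETH_HLEN" */
	new_len = (long)(xdp->data_end - xdp->data) + offset;
	if (new_len < TOY_ETH_HLEN)
		return -EINVAL;

	xdp->data_end = xdp->data + new_len;
	return 0;
}
```

In this model a positive offset and a shrink below the Ethernet header size are both rejected with -EINVAL, and a rejected call leaves data_end untouched.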

Signed-off-by: Nikita V. Shirokov <tehn...@tehnerd.com>
---
 include/uapi/linux/bpf.h | 10 +-
 net/core/filter.c| 29 -
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c5ec89732a8d..9a2d1a04eb24 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -755,6 +755,13 @@ union bpf_attr {
  * @addr: pointer to struct sockaddr to bind socket to
  * @addr_len: length of sockaddr structure
  * Return: 0 on success or negative error code
+ *
+ * int bpf_xdp_adjust_tail(xdp_md, delta)
+ * Adjust the xdp_md.data_end by delta. Only shrinking of packet's
+ * size is supported.
+ * @xdp_md: pointer to xdp_md
+ * @delta: A negative integer to be added to xdp_md.data_end
+ * Return: 0 on success or negative on error
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -821,7 +828,8 @@ union bpf_attr {
FN(msg_apply_bytes),\
FN(msg_cork_bytes), \
FN(msg_pull_data),  \
-   FN(bind),
+   FN(bind),   \
+   FN(xdp_adjust_tail),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/net/core/filter.c b/net/core/filter.c
index d31aff93270d..6c8ac7b548d6 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2717,6 +2717,30 @@ static const struct bpf_func_proto bpf_xdp_adjust_head_proto = {
.arg2_type  = ARG_ANYTHING,
 };
 
+BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
+{
+   /* only shrinking is allowed for now. */
+   if (unlikely(offset > 0))
+   return -EINVAL;
+
+   void *data_end = xdp->data_end + offset;
+
+   if (unlikely(data_end < xdp->data + ETH_HLEN))
+   return -EINVAL;
+
+   xdp->data_end = data_end;
+
+   return 0;
+}
+
+static const struct bpf_func_proto bpf_xdp_adjust_tail_proto = {
+   .func   = bpf_xdp_adjust_tail,
+   .gpl_only   = false,
+   .ret_type   = RET_INTEGER,
+   .arg1_type  = ARG_PTR_TO_CTX,
+   .arg2_type  = ARG_ANYTHING,
+};
+
 BPF_CALL_2(bpf_xdp_adjust_meta, struct xdp_buff *, xdp, int, offset)
 {
void *meta = xdp->data_meta + offset;
@@ -3053,7 +3077,8 @@ bool bpf_helper_changes_pkt_data(void *func)
func == bpf_l4_csum_replace ||
func == bpf_xdp_adjust_head ||
func == bpf_xdp_adjust_meta ||
-   func == bpf_msg_pull_data)
+   func == bpf_msg_pull_data ||
+   func == bpf_xdp_adjust_tail)
return true;
 
return false;
@@ -3867,6 +3892,8 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
return &bpf_xdp_redirect_proto;
case BPF_FUNC_redirect_map:
return &bpf_xdp_redirect_map_proto;
+   case BPF_FUNC_xdp_adjust_tail:
+   return &bpf_xdp_adjust_tail_proto;
default:
return bpf_base_func_proto(func_id);
}
-- 
2.15.1



[PATCH bpf-next 03/10] [bpf]: add bpf_xdp_adjust_tail sample prog

2018-04-17 Thread Nikita V. Shirokov
adding a bpf sample program which uses the bpf_xdp_adjust_tail helper
to generate an ICMPv4 "packet too big" message if the ingress packet's size
is bigger than 600 bytes
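
How much tail such a program has to cut can be worked out with plain arithmetic. The sketch below assumes that an oversized packet is trimmed down to exactly ICMP_TOOBIG_SIZE bytes before the new headers are prepended; `toy_toobig_tail_delta` is a hypothetical helper, and only the two constants are taken from the sample.

```c
#include <assert.h>

#define MAX_PCKT_SIZE 600	/* from the sample */
#define ICMP_TOOBIG_SIZE 98	/* from the sample */

/*
 * Returns the (negative) delta to pass to the tail helper, or 0 if the
 * packet is small enough to be left alone.
 */
static int toy_toobig_tail_delta(int pckt_size)
{
	assert(pckt_size >= 0);
	if (pckt_size <= MAX_PCKT_SIZE)
		return 0;
	return ICMP_TOOBIG_SIZE - pckt_size;
}
```

For a 1500-byte packet this gives a delta of -1402, which leaves the first 98 bytes of the original packet.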

Signed-off-by: Nikita V. Shirokov <tehn...@tehnerd.com>
---
 samples/bpf/Makefile  |   4 +
 samples/bpf/xdp_adjust_tail_kern.c| 151 ++
 samples/bpf/xdp_adjust_tail_user.c| 141 
 tools/testing/selftests/bpf/bpf_helpers.h |   2 +
 4 files changed, 298 insertions(+)
 create mode 100644 samples/bpf/xdp_adjust_tail_kern.c
 create mode 100644 samples/bpf/xdp_adjust_tail_user.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 4d6a6edd4bf6..aa8c392e2e52 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -44,6 +44,7 @@ hostprogs-y += xdp_monitor
 hostprogs-y += xdp_rxq_info
 hostprogs-y += syscall_tp
 hostprogs-y += cpustat
+hostprogs-y += xdp_adjust_tail
 
 # Libbpf dependencies
 LIBBPF := ../../tools/lib/bpf/bpf.o ../../tools/lib/bpf/nlattr.o
@@ -95,6 +96,7 @@ xdp_monitor-objs := bpf_load.o $(LIBBPF) xdp_monitor_user.o
 xdp_rxq_info-objs := bpf_load.o $(LIBBPF) xdp_rxq_info_user.o
 syscall_tp-objs := bpf_load.o $(LIBBPF) syscall_tp_user.o
 cpustat-objs := bpf_load.o $(LIBBPF) cpustat_user.o
+xdp_adjust_tail-objs := bpf_load.o $(LIBBPF) xdp_adjust_tail_user.o
 
 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
@@ -148,6 +150,7 @@ always += xdp_rxq_info_kern.o
 always += xdp2skb_meta_kern.o
 always += syscall_tp_kern.o
 always += cpustat_kern.o
+always += xdp_adjust_tail_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 HOSTCFLAGS += -I$(srctree)/tools/lib/
@@ -193,6 +196,7 @@ HOSTLOADLIBES_xdp_monitor += -lelf
 HOSTLOADLIBES_xdp_rxq_info += -lelf
 HOSTLOADLIBES_syscall_tp += -lelf
 HOSTLOADLIBES_cpustat += -lelf
+HOSTLOADLIBES_xdp_adjust_tail += -lelf
 
 # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
 #  make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
diff --git a/samples/bpf/xdp_adjust_tail_kern.c b/samples/bpf/xdp_adjust_tail_kern.c
new file mode 100644
index ..17570559fd08
--- /dev/null
+++ b/samples/bpf/xdp_adjust_tail_kern.c
@@ -0,0 +1,151 @@
+/* Copyright (c) 2018 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program shows how to use bpf_xdp_adjust_tail() by
+ * generating ICMPv4 "packet to big" (unreachable/ df bit set frag needed
+ * to be more preice in case of v4)" where receiving packets bigger then
+ * 600 bytes.
+ */
+#define KBUILD_MODNAME "foo"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "bpf_helpers.h"
+
+#define DEFAULT_TTL 64
+#define MAX_PCKT_SIZE 600
+#define ICMP_TOOBIG_SIZE 98
+#define ICMP_TOOBIG_PAYLOAD_SIZE 92
+
+struct bpf_map_def SEC("maps") icmpcnt = {
+   .type = BPF_MAP_TYPE_ARRAY,
+   .key_size = sizeof(__u32),
+   .value_size = sizeof(__u64),
+   .max_entries = 1,
+};
+
+static __always_inline void count_icmp(void)
+{
+   u64 key = 0;
+   u64 *icmp_count;
+
+   icmp_count = bpf_map_lookup_elem(&icmpcnt, &key);
+   if (icmp_count)
+   *icmp_count += 1;
+}
+
+static __always_inline void swap_mac(void *data, struct ethhdr *orig_eth)
+{
+   struct ethhdr *eth;
+
+   eth = data;
+   memcpy(eth->h_source, orig_eth->h_dest, ETH_ALEN);
+   memcpy(eth->h_dest, orig_eth->h_source, ETH_ALEN);
+   eth->h_proto = orig_eth->h_proto;
+}
+
+static __always_inline __u16 csum_fold_helper(__u32 csum)
+{
+   return ~((csum & 0xffff) + (csum >> 16));
+}
+
+static __always_inline void ipv4_csum(void *data_start, int data_size,
+ __u32 *csum)
+{
+   *csum = bpf_csum_diff(0, 0, data_start, data_size, *csum);
+   *csum = csum_fold_helper(*csum);
+}
+
+static __always_inline int send_icmp4_too_big(struct xdp_md *xdp)
+{
+   int headroom = (int)sizeof(struct iphdr) + (int)sizeof(struct icmphdr);
+
+   if (bpf_xdp_adjust_head(xdp, 0 - headroom))
+   return XDP_DROP;
+   void *data = (void *)(long)xdp->data;
+   void *data_end = (void *)(long)xdp->data_end;
+
+   if (data + (ICMP_TOOBIG_SIZE + headroom) > data_end)
+   return XDP_DROP;
+
+   struct iphdr *iph, *orig_iph;
+   struct icmphdr *icmp_hdr;
+   struct ethhdr *orig_eth;
+   __u32 csum = 0;
+   __u64 off = 0;
+
+   orig_eth = data + headroom;
+   swap_mac(data, orig_eth);
+   off += sizeof(struct ethhdr);
+   iph = data + off;
+   off += sizeof(struct iphdr);
+   icmp_hdr = data + off;
+   off += sizeof(struct icmphdr);
+   orig_iph = data + off;
+   icmp_hdr-&

[PATCH bpf-next]: add sock_ops R/W access to ipv4 tos

2018-03-26 Thread Nikita V. Shirokov
bpf: Add sock_ops R/W access to ipv4 tos

Sample usage for tos:

  bpf_getsockopt(skops, SOL_IP, IP_TOS, &v, sizeof(v))

where skops is a pointer to the ctx (struct bpf_sock_ops).
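
The write side applies a range check before storing the value. A minimal user-space model of that IP_TOS branch, assuming the logic in the bpf_setsockopt hunk below, could look like this; `toy_set_tos` is a hypothetical name and `*tos` stands in for inet->tos.

```c
#include <assert.h>
#include <errno.h>

/* Toy model of the IP_TOS branch: validate val, then store it. */
static int toy_set_tos(unsigned char *tos, int val)
{
	assert(tos);

	/* accepted range is -1..0xff */
	if (val < -1 || val > 0xff)
		return -EINVAL;
	/* -1 is treated as 0 */
	if (val == -1)
		val = 0;
	*tos = (unsigned char)val;
	return 0;
}
```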

Signed-off-by: Nikita V. Shirokov <tehn...@fb.com>
---
 net/core/filter.c | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index 00c711c..afd8255 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3462,6 +3462,27 @@ BPF_CALL_5(bpf_setsockopt, struct bpf_sock_ops_kern *, bpf_sock,
ret = -EINVAL;
}
 #ifdef CONFIG_INET
+   } else if (level == SOL_IP) {
+   if (optlen != sizeof(int) || sk->sk_family != AF_INET)
+   return -EINVAL;
+
+   val = *((int *)optval);
+   /* Only some options are supported */
+   switch (optname) {
+   case IP_TOS:
+   if (val < -1 || val > 0xff) {
+   ret = -EINVAL;
+   } else {
+   struct inet_sock *inet = inet_sk(sk);
+
+   if (val == -1)
+   val = 0;
+   inet->tos = val;
+   }
+   break;
+   default:
+   ret = -EINVAL;
+   }
 #if IS_ENABLED(CONFIG_IPV6)
} else if (level == SOL_IPV6) {
if (optlen != sizeof(int) || sk->sk_family != AF_INET6)
@@ -3561,6 +3582,20 @@ BPF_CALL_5(bpf_getsockopt, struct bpf_sock_ops_kern *, bpf_sock,
} else {
goto err_clear;
}
+   } else if (level == SOL_IP) {
+   struct inet_sock *inet = inet_sk(sk);
+
+   if (optlen != sizeof(int) || sk->sk_family != AF_INET)
+   goto err_clear;
+
+   /* Only some options are supported */
+   switch (optname) {
+   case IP_TOS:
+   *((int *)optval) = (int)inet->tos;
+   break;
+   default:
+   goto err_clear;
+   }
 #if IS_ENABLED(CONFIG_IPV6)
} else if (level == SOL_IPV6) {
struct ipv6_pinfo *np = inet6_sk(sk);
-- 
2.9.5



[PATCH net v2] adding missing rcu_read_unlock in ipxip6_rcv

2017-12-06 Thread Nikita V. Shirokov
commit 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
introduced a new exit point in ipxip6_rcv. however, rcu_read_unlock is
missing there. this patch fixes that

v1->v2:
 instead of calling rcu_read_unlock in place, we now jump to the "drop"
 label (to prevent leaking the skb)
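
The reasoning behind using the shared exit can be sketched with a toy model that counts lock depth and tracks whether the buffer was freed. Everything here is hypothetical (`toy_rcv` and the `toy_*` helpers); in particular, the body of the drop label is an assumption, since the hunk below does not show it.

```c
#include <assert.h>
#include <stdbool.h>

static int toy_rcu_depth;	/* > 0 while inside the read-side section */
static bool toy_skb_freed;

static void toy_rcu_read_lock(void)
{
	toy_rcu_depth++;
}

static void toy_rcu_read_unlock(void)
{
	assert(toy_rcu_depth > 0);
	toy_rcu_depth--;
}

static void toy_kfree_skb(void)
{
	toy_skb_freed = true;
}

/* alloc_ok == false models ipv6_tun_rx_dst() returning NULL. */
static int toy_rcv(bool alloc_ok)
{
	toy_rcu_read_lock();
	if (!alloc_ok)
		goto drop;	/* shared exit: unlock and free in one place */

	/* ... the normal receive path would hand the skb on here ... */
	toy_rcu_read_unlock();
	return 0;

drop:
	toy_rcu_read_unlock();
	toy_kfree_skb();
	return 0;
}
```

A bare `return 0` at the failure point would leave the depth counter at 1 and the buffer unfreed; going through the label leaves neither behind.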

Signed-off-by: Nikita V. Shirokov <tehn...@fb.com>
---
 net/ipv6/ip6_tunnel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 3d3092a..db84f52 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -904,7 +904,7 @@ static int ipxip6_rcv(struct sk_buff *skb, u8 ipproto,
if (t->parms.collect_md) {
tun_dst = ipv6_tun_rx_dst(skb, 0, 0, 0);
if (!tun_dst)
-   return 0;
+   goto drop;
}
ret = __ip6_tnl_rcv(t, skb, tpi, tun_dst, dscp_ecn_decapsulate,
log_ecn_error);
-- 
2.9.5



[PATCH net] adding missing rcu_read_unlock in ipxip6_rcv

2017-12-06 Thread Nikita V. Shirokov
commit 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
introduced a new exit point in ipxip6_rcv. however, rcu_read_unlock is
missing there. this patch fixes that

Signed-off-by: Nikita V. Shirokov <tehn...@fb.com>
---
 net/ipv6/ip6_tunnel.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 3d3092a..00f2c79 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -903,8 +903,10 @@ static int ipxip6_rcv(struct sk_buff *skb, u8 ipproto,
goto drop;
if (t->parms.collect_md) {
tun_dst = ipv6_tun_rx_dst(skb, 0, 0, 0);
-   if (!tun_dst)
+   if (!tun_dst) {
+   rcu_read_unlock();
return 0;
+   }
}
ret = __ip6_tnl_rcv(t, skb, tpi, tun_dst, dscp_ecn_decapsulate,
log_ecn_error);
-- 
2.9.5