Re: [PATCH bpf-next] tools: bpftool: add a command to dump the trace pipe

2018-12-06 Thread Quentin Monnet
2018-12-05 19:18 UTC-0800 ~ Alexei Starovoitov

> On Wed, Dec 05, 2018 at 06:15:23PM +0000, Quentin Monnet wrote:
>>>> +
>>>> +  /* Allow room for NULL terminating byte and pipe file name */
>>>> +  snprintf(format, sizeof(format), "%%*s %%%zds %%99s %%*s %%*d %%*d\\n",
>>>> +   PATH_MAX - strlen(pipe_name) - 1);
>>>
>>> before scanning trace_pipe could you add a check that trace_options are 
>>> compatible?
>>> Otherwise there will be a lot of garbage printed.
>>> afaik default is rarely changed, so the patch is ok as-is.
>>> The followup some time in the future would be perfect.
>>
>> Sure. What do you mean exactly by compatible options? I can check that
>> "trace_printk" is set, is there any other option that would be relevant?
> 
> See Documentation/trace/ftrace.rst
> a lot of the flags will change the format significantly.
> Like 'bin' will make it binary.
> I'm not suggesting to support all possible output formats.
> Only to check that trace flags match scanf.

fscanf() is only used to retrieve the name of the sysfs directory where
the pipe is located, when listing all the mount points on the system. It
is not used to dump the content from the pipe (which is done with
getline(), so formatting does not matter much).

If the "bin" option is set, "bpftool prog tracelog" will dump the same
binary content as "cat /sys/kernel/debug/tracing/trace_pipe", which is
the expected behaviour (at least with the current patch). Let me know if
you would like me to change this somehow.

Thanks,
Quentin


Re: [PATCH bpf-next] tools: bpftool: add a command to dump the trace pipe

2018-12-05 Thread Quentin Monnet
2018-12-05 08:50 UTC-0800 ~ Alexei Starovoitov 


On Wed, Dec 05, 2018 at 10:28:24AM +, Quentin Monnet wrote:

BPF programs can use the bpf_trace_printk() helper to print debug
information into the trace pipe. Add a subcommand
"bpftool prog tracelog" to simply dump this pipe to the console.

This is for a good part copied from iproute2, where the feature is
available with "tc exec bpf dbg". Changes include dumping pipe content
to stdout instead of stderr and adding JSON support (content is dumped
as an array of strings, one per line read from the pipe). This version
is dual-licensed, with Daniel's permission.

Cc: Daniel Borkmann 
Suggested-by: Daniel Borkmann 
Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---



+static bool find_tracefs_pipe(char *mnt)
+{
+   static const char * const known_mnts[] = {
+   "/sys/kernel/debug/tracing",
+   "/sys/kernel/tracing",
+   "/tracing",
+   "/trace",


I wonder where this list came from?
I only knew of 1st one.


I have only seen the first one too. I took the list from iproute2. I can 
change it in the future if we find out that trying all these paths is not 
relevant.



+   };
+   const char *pipe_name = "/trace_pipe";
+   const char *fstype = "tracefs";
+   char type[100], format[32];
+   const char * const *ptr;
+   bool found = false;
+   FILE *fp;
+
+   for (ptr = known_mnts; ptr < known_mnts + ARRAY_SIZE(known_mnts); ptr++)
+   if (find_tracefs_mnt_single(TRACEFS_MAGIC, mnt, *ptr))
+   goto exit_found;
+
+   fp = fopen("/proc/mounts", "r");
+   if (!fp)
+   return false;
+
+   /* Allow room for NULL terminating byte and pipe file name */
+   snprintf(format, sizeof(format), "%%*s %%%zds %%99s %%*s %%*d %%*d\\n",
+PATH_MAX - strlen(pipe_name) - 1);


before scanning trace_pipe could you add a check that trace_options are 
compatible?
Otherwise there will be a lot of garbage printed.
afaik default is rarely changed, so the patch is ok as-is.
The followup some time in the future would be perfect.


Sure. What do you mean exactly by compatible options? I can check that 
"trace_printk" is set, is there any other option that would be relevant?


Thanks,
Quentin


[PATCH bpf-next] tools: bpftool: add a command to dump the trace pipe

2018-12-05 Thread Quentin Monnet
BPF programs can use the bpf_trace_printk() helper to print debug
information into the trace pipe. Add a subcommand
"bpftool prog tracelog" to simply dump this pipe to the console.

This is for a good part copied from iproute2, where the feature is
available with "tc exec bpf dbg". Changes include dumping pipe content
to stdout instead of stderr and adding JSON support (content is dumped
as an array of strings, one per line read from the pipe). This version
is dual-licensed, with Daniel's permission.

Cc: Daniel Borkmann 
Suggested-by: Daniel Borkmann 
Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 tools/bpf/bpftool/Documentation/bpftool-prog.rst |  13 +-
 tools/bpf/bpftool/bash-completion/bpftool|   5 +-
 tools/bpf/bpftool/main.h |   1 +
 tools/bpf/bpftool/prog.c |   4 +-
 tools/bpf/bpftool/tracelog.c | 157 +++
 5 files changed, 176 insertions(+), 4 deletions(-)
 create mode 100644 tools/bpf/bpftool/tracelog.c

diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
index ab36e920e552..5524b6dccd85 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
@@ -26,8 +26,9 @@ MAP COMMANDS
 |  **bpftool** **prog dump jited**  *PROG* [{**file** *FILE* | **opcodes**}]
 |  **bpftool** **prog pin** *PROG* *FILE*
 |  **bpftool** **prog { load | loadall }** *OBJ* *PATH* [**type** *TYPE*] [**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
-|   **bpftool** **prog attach** *PROG* *ATTACH_TYPE* [*MAP*]
-|   **bpftool** **prog detach** *PROG* *ATTACH_TYPE* [*MAP*]
+|  **bpftool** **prog attach** *PROG* *ATTACH_TYPE* [*MAP*]
+|  **bpftool** **prog detach** *PROG* *ATTACH_TYPE* [*MAP*]
+|  **bpftool** **prog tracelog**
 |  **bpftool** **prog help**
 |
 |  *MAP* := { **id** *MAP_ID* | **pinned** *FILE* }
@@ -117,6 +118,14 @@ DESCRIPTION
  parameter, with the exception of *flow_dissector* which is
  detached from the current networking name space.
 
+   **bpftool prog tracelog**
+ Dump the trace pipe of the system to the console (stdout).
+ Hit <Ctrl+C> to stop printing. BPF programs can write to this
+ trace pipe at runtime with the **bpf_trace_printk()** helper.
+ This should be used only for debugging purposes. For
+ streaming data from BPF programs to user space, one can use
+ perf events (see also **bpftool-map**\ (8)).
+
**bpftool prog help**
  Print short help message.
 
diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
index 9a60080f085f..44c189ba072a 100644
--- a/tools/bpf/bpftool/bash-completion/bpftool
+++ b/tools/bpf/bpftool/bash-completion/bpftool
@@ -398,10 +398,13 @@ _bpftool()
 ;;
 esac
 ;;
+tracelog)
+return 0
+;;
 *)
 [[ $prev == $object ]] && \
COMPREPLY=( $( compgen -W 'dump help pin attach detach load \
-show list' -- "$cur" ) )
+show list tracelog' -- "$cur" ) )
 ;;
 esac
 ;;
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index 2761981669c8..0be0dd8f467f 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -167,6 +167,7 @@ int do_event_pipe(int argc, char **argv);
 int do_cgroup(int argc, char **arg);
 int do_perf(int argc, char **arg);
 int do_net(int argc, char **arg);
+int do_tracelog(int argc, char **arg);
 
 int parse_u32_arg(int *argc, char ***argv, __u32 *val, const char *what);
 int prog_parse_fd(int *argc, char ***argv);
diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index 56db61c5a91f..54c8dbf05c9c 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -1140,6 +1140,7 @@ static int do_help(int argc, char **argv)
" [pinmaps MAP_DIR]\n"
"   %s %s attach PROG ATTACH_TYPE [MAP]\n"
"   %s %s detach PROG ATTACH_TYPE [MAP]\n"
+   "   %s %s tracelog\n"
"   %s %s help\n"
"\n"
"   " HELP_SPEC_MAP "\n"
@@ -1158,7 +1159,7 @@ static int do_help(int argc, char **argv)
"",
bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
-   bin_name, argv[-2], bin_name, argv[-2

[PATCH bpf-next] bpf: fix documentation for eBPF helpers

2018-12-03 Thread Quentin Monnet
The missing indentation on the "Return" sections for bpf_map_pop_elem()
and bpf_map_peek_elem() helpers breaks RST and man page generation. This
patch fixes them, and moves the description of those two helpers towards
the end of the list (even though they are somewhat related to the first
three helpers for maps, the man page explicitly states that the helpers
are sorted in chronological order).

While at it, bring other minor formatting edits for eBPF helpers
documentation: mostly blank lines removal, RST formatting, or other
small nits for consistency.

Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 include/uapi/linux/bpf.h | 90 
 1 file changed, 45 insertions(+), 45 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 597afdbc1ab9..41426e0efd5a 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -482,18 +482,6 @@ union bpf_attr {
  * Return
  * 0 on success, or a negative error in case of failure.
  *
- * int bpf_map_pop_elem(struct bpf_map *map, void *value)
- * Description
- * Pop an element from *map*.
- * Return
- * 0 on success, or a negative error in case of failure.
- *
- * int bpf_map_peek_elem(struct bpf_map *map, void *value)
- * Description
- * Get an element from *map* without removing it.
- * Return
- * 0 on success, or a negative error in case of failure.
- *
  * int bpf_probe_read(void *dst, u32 size, const void *src)
  * Description
  * For tracing programs, safely attempt to read *size* bytes from
@@ -1917,9 +1905,9 @@ union bpf_attr {
  * is set to metric from route (IPv4/IPv6 only), and ifindex
  * is set to the device index of the nexthop from the FIB lookup.
  *
- * *plen* argument is the size of the passed in struct.
- * *flags* argument can be a combination of one or more of the
- * following values:
+ * *plen* argument is the size of the passed in struct.
+ * *flags* argument can be a combination of one or more of the
+ * following values:
  *
  * **BPF_FIB_LOOKUP_DIRECT**
  * Do a direct table lookup vs full lookup using FIB
@@ -1928,9 +1916,9 @@ union bpf_attr {
  * Perform lookup from an egress perspective (default is
  * ingress).
  *
- * *ctx* is either **struct xdp_md** for XDP programs or
- * **struct sk_buff** tc cls_act programs.
- * Return
+ * *ctx* is either **struct xdp_md** for XDP programs or
+ * **struct sk_buff** tc cls_act programs.
+ * Return
  * * < 0 if any input argument is invalid
  * *   0 on success (packet is forwarded, nexthop neighbor exists)
  * * > 0 one of **BPF_FIB_LKUP_RET_** codes explaining why the
@@ -2075,8 +2063,8 @@ union bpf_attr {
  * translated to a keycode using the rc keymap, and reported as
  * an input key down event. After a period a key up event is
  * generated. This period can be extended by calling either
- * **bpf_rc_keydown** () again with the same values, or calling
- * **bpf_rc_repeat** ().
+ * **bpf_rc_keydown**\ () again with the same values, or calling
+ * **bpf_rc_repeat**\ ().
  *
  * Some protocols include a toggle bit, in case the button was
  * released and pressed again between consecutive scancodes.
@@ -2159,21 +2147,22 @@ union bpf_attr {
  * The *flags* meaning is specific for each map type,
  * and has to be 0 for cgroup local storage.
  *
- * Depending on the bpf program type, a local storage area
- * can be shared between multiple instances of the bpf program,
+ * Depending on the BPF program type, a local storage area
+ * can be shared between multiple instances of the BPF program,
  * running simultaneously.
  *
  * A user should care about the synchronization by himself.
- * For example, by using the BPF_STX_XADD instruction to alter
+ * For example, by using the **BPF_STX_XADD** instruction to alter
  * the shared data.
  * Return
- * Pointer to the local storage area.
+ * A pointer to the local storage area.
  *
 * int bpf_sk_select_reuseport(struct sk_reuseport_md *reuse, struct bpf_map *map, void *key, u64 flags)
  * Description
- * Select a SO_REUSEPORT sk from a BPF_MAP_TYPE_REUSEPORT_ARRAY map
- * It checks the selected sk is matching the incoming
- * request in the skb.
+ * Select a **SO_REUSEPORT** socket from a
+ * **BPF_MAP_TYPE_REUSEPORT_ARRAY** *map*.
+ * It checks the selected socket is match

[PATCH bpf-next 2/5] tools: bpftool: fix bash completion for bpftool prog (attach|detach)

2018-11-30 Thread Quentin Monnet
Fix bash completion for "bpftool prog (attach|detach) PROG TYPE MAP" so
that the list of indices proposed for MAP are map indices, and not PROG
indices. Also use variables for map and prog reference types ("id",
"pinned", and "tag" for programs).

Fixes: b7d3826c2ed6 ("bpf: bpftool, add support for attaching programs to maps")
Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 tools/bpf/bpftool/bash-completion/bpftool | 73 +--
 1 file changed, 49 insertions(+), 24 deletions(-)

diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
index 45c2db257d2b..b7e6c4f25ad1 100644
--- a/tools/bpf/bpftool/bash-completion/bpftool
+++ b/tools/bpf/bpftool/bash-completion/bpftool
@@ -243,16 +243,20 @@ _bpftool()
 # Completion depends on object and command in use
 case $object in
 prog)
-if [[ $command != "load" && $command != "loadall" ]]; then
-case $prev in
-id)
-_bpftool_get_prog_ids
-return 0
-;;
-esac
-fi
+# Complete id, only for subcommands that use prog (but no map) ids
+case $command in
+show|list|dump|pin)
+case $prev in
+id)
+_bpftool_get_prog_ids
+return 0
+;;
+esac
+;;
+esac
 
 local PROG_TYPE='id pinned tag'
+local MAP_TYPE='id pinned'
 case $command in
 show|list)
 [[ $prev != "$command" ]] && return 0
@@ -293,22 +297,43 @@ _bpftool()
 return 0
 ;;
 attach|detach)
-if [[ ${#words[@]} == 7 ]]; then
-COMPREPLY=( $( compgen -W "id pinned" -- "$cur" ) )
-return 0
-fi
-
-if [[ ${#words[@]} == 6 ]]; then
-COMPREPLY=( $( compgen -W "msg_verdict skb_verdict \
-skb_parse flow_dissector" -- "$cur" ) )
-return 0
-fi
-
-if [[ $prev == "$command" ]]; then
-COMPREPLY=( $( compgen -W "id pinned" -- "$cur" ) )
-return 0
-fi
-return 0
+case $cword in
+3)
+COMPREPLY=( $( compgen -W "$PROG_TYPE" -- "$cur" ) )
+return 0
+;;
+4)
+case $prev in
+id)
+_bpftool_get_prog_ids
+;;
+pinned)
+_filedir
+;;
+esac
+return 0
+;;
+5)
+COMPREPLY=( $( compgen -W 'msg_verdict skb_verdict \
+skb_parse flow_dissector' -- "$cur" ) )
+return 0
+;;
+6)
+COMPREPLY=( $( compgen -W "$MAP_TYPE" -- "$cur" ) )
+return 0
+;;
+7)
+case $prev in
+id)
+_bpftool_get_map_ids
+;;
+pinned)
+_filedir
+;;
+esac
+return 0
+;;
+esac
 ;;
 load|loadall)
 local obj
-- 
2.7.4



[PATCH bpf-next 0/5] tools: bpftool: fixes and small improvements

2018-11-30 Thread Quentin Monnet
Hi,
Several items for bpftool are included in this set: the first three patches
are fixes for bpftool itself and bash completion, while the last two
slightly improve the information obtained when dumping programs or maps, on
Daniel's suggestion. Please refer to individual commit logs for more
details.

Quentin Monnet (5):
  tools: bpftool: use "/proc/self/" i.o. crafting links with getpid()
  tools: bpftool: fix bash completion for bpftool prog (attach|detach)
  tools: bpftool: fix bash completion for new map types (queue and
stack)
  tools: bpftool: mark offloaded programs more explicitly in plain
output
  tools: bpftool: add owner_prog_type and owner_jited to bpftool output

 tools/bpf/bpftool/bash-completion/bpftool | 75 ---
 tools/bpf/bpftool/common.c|  7 ++-
 tools/bpf/bpftool/jit_disasm.c| 11 +
 tools/bpf/bpftool/main.h  | 26 +++
 tools/bpf/bpftool/map.c   | 50 -
 tools/bpf/bpftool/prog.c  | 27 +--
 6 files changed, 130 insertions(+), 66 deletions(-)

-- 
2.7.4



[PATCH bpf-next 1/5] tools: bpftool: use "/proc/self/" i.o. crafting links with getpid()

2018-11-30 Thread Quentin Monnet
The getpid() function is called in a couple of places in bpftool to
craft links of the shape "/proc/<pid>/...". Instead, it is possible to
use the "/proc/self/" shortcut, which makes things a bit easier, in
particular in jit_disasm.c.

Do the replacement, and remove the includes of <unistd.h> from the
relevant files, now that we do not use getpid() anymore.
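Reassembled from the jit_disasm.c hunk below, the simplified function is
compact enough to quote in full as a standalone, compilable unit:

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Post-patch get_exec_path(): the "/proc/self/exe" link points at the
 * running binary, so no getpid()/snprintf()/strdup() dance is needed.
 */
static void get_exec_path(char *tpath, size_t size)
{
	ssize_t len = readlink("/proc/self/exe", tpath, size - 1);

	assert(len > 0);
	tpath[len] = 0;
}
```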

Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 tools/bpf/bpftool/common.c |  5 ++---
 tools/bpf/bpftool/jit_disasm.c | 11 +--
 2 files changed, 3 insertions(+), 13 deletions(-)

diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index 4e217d57118e..4349b6683ca8 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -48,7 +48,6 @@
 #include 
 #include 
 #include 
-#include <unistd.h>
 #include 
 
 #include 
@@ -276,7 +275,7 @@ int get_fd_type(int fd)
char buf[512];
ssize_t n;
 
-   snprintf(path, sizeof(path), "/proc/%d/fd/%d", getpid(), fd);
+   snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
 
n = readlink(path, buf, sizeof(buf));
if (n < 0) {
@@ -304,7 +303,7 @@ char *get_fdinfo(int fd, const char *key)
ssize_t n;
FILE *fdi;
 
-   snprintf(path, sizeof(path), "/proc/%d/fdinfo/%d", getpid(), fd);
+   snprintf(path, sizeof(path), "/proc/self/fdinfo/%d", fd);
 
fdi = fopen(path, "r");
if (!fdi) {
diff --git a/tools/bpf/bpftool/jit_disasm.c b/tools/bpf/bpftool/jit_disasm.c
index b2ed5ee1af5f..545a92471c33 100644
--- a/tools/bpf/bpftool/jit_disasm.c
+++ b/tools/bpf/bpftool/jit_disasm.c
@@ -19,7 +19,6 @@
 #include 
 #include 
 #include 
-#include <unistd.h>
 #include 
 #include 
 
@@ -28,20 +27,12 @@
 
 static void get_exec_path(char *tpath, size_t size)
 {
+   const char *path = "/proc/self/exe";
ssize_t len;
-   char *path;
-
-   snprintf(tpath, size, "/proc/%d/exe", (int) getpid());
-   tpath[size - 1] = 0;
-
-   path = strdup(tpath);
-   assert(path);
 
len = readlink(path, tpath, size - 1);
assert(len > 0);
tpath[len] = 0;
-
-   free(path);
 }
 
 static int oper_count;
-- 
2.7.4



[PATCH bpf-next 3/5] tools: bpftool: fix bash completion for new map types (queue and stack)

2018-11-30 Thread Quentin Monnet
Commit 197c2dac74e4 ("bpf: Add BPF_MAP_TYPE_QUEUE and BPF_MAP_TYPE_STACK
to bpftool-map") added support for queue and stack eBPF map types in
bpftool map handling. Let's update the bash completion accordingly.

Fixes: 197c2dac74e4 ("bpf: Add BPF_MAP_TYPE_QUEUE and BPF_MAP_TYPE_STACK to bpftool-map")
Signed-off-by: Quentin Monnet 
---
 tools/bpf/bpftool/bash-completion/bpftool | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
index b7e6c4f25ad1..9a60080f085f 100644
--- a/tools/bpf/bpftool/bash-completion/bpftool
+++ b/tools/bpf/bpftool/bash-completion/bpftool
@@ -436,7 +436,7 @@ _bpftool()
 lru_percpu_hash lpm_trie array_of_maps \
 hash_of_maps devmap sockmap cpumap xskmap \
 sockhash cgroup_storage reuseport_sockarray \
-percpu_cgroup_storage' -- \
+percpu_cgroup_storage queue stack' -- \
"$cur" ) )
 return 0
 ;;
-- 
2.7.4



[PATCH bpf-next 5/5] tools: bpftool: add owner_prog_type and owner_jited to bpftool output

2018-11-30 Thread Quentin Monnet
For prog array maps, the type of the owner program, and the JIT-ed state
of that program, are available from the file descriptor information
under /proc. Add them to "bpftool map show" output. Example output:

# bpftool map show
158225: prog_array  name jmp_table  flags 0x0
key 4B  value 4B  max_entries 8  memlock 4096B
owner_prog_type flow_dissector  owner jited
# bpftool --json --pretty map show
[{
"id": 1337,
"type": "prog_array",
"name": "jmp_table",
"flags": 0,
"bytes_key": 4,
"bytes_value": 4,
"max_entries": 8,
"bytes_memlock": 4096,
"owner_prog_type": "flow_dissector",
"owner_jited": true
}
]

As we move the table used for associating names to program types,
complete it with the missing types (lwt_seg6local and sk_reuseport).
Also add missing types to the help message for "bpftool prog"
(sk_reuseport and flow_dissector).

Suggested-by: Daniel Borkmann 
Signed-off-by: Quentin Monnet 
Acked-by: Jakub Kicinski 
---
 tools/bpf/bpftool/main.h | 26 +
 tools/bpf/bpftool/map.c  | 50 ++--
 tools/bpf/bpftool/prog.c | 25 +---
 3 files changed, 75 insertions(+), 26 deletions(-)

diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index 7431669fae0a..2761981669c8 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -78,6 +78,32 @@
 #define HELP_SPEC_MAP  \
"MAP := { id MAP_ID | pinned FILE }"
 
+static const char * const prog_type_name[] = {
+   [BPF_PROG_TYPE_UNSPEC]  = "unspec",
+   [BPF_PROG_TYPE_SOCKET_FILTER]   = "socket_filter",
+   [BPF_PROG_TYPE_KPROBE]  = "kprobe",
+   [BPF_PROG_TYPE_SCHED_CLS]   = "sched_cls",
+   [BPF_PROG_TYPE_SCHED_ACT]   = "sched_act",
+   [BPF_PROG_TYPE_TRACEPOINT]  = "tracepoint",
+   [BPF_PROG_TYPE_XDP] = "xdp",
+   [BPF_PROG_TYPE_PERF_EVENT]  = "perf_event",
+   [BPF_PROG_TYPE_CGROUP_SKB]  = "cgroup_skb",
+   [BPF_PROG_TYPE_CGROUP_SOCK] = "cgroup_sock",
+   [BPF_PROG_TYPE_LWT_IN]  = "lwt_in",
+   [BPF_PROG_TYPE_LWT_OUT] = "lwt_out",
+   [BPF_PROG_TYPE_LWT_XMIT]= "lwt_xmit",
+   [BPF_PROG_TYPE_SOCK_OPS]= "sock_ops",
+   [BPF_PROG_TYPE_SK_SKB]  = "sk_skb",
+   [BPF_PROG_TYPE_CGROUP_DEVICE]   = "cgroup_device",
+   [BPF_PROG_TYPE_SK_MSG]  = "sk_msg",
+   [BPF_PROG_TYPE_RAW_TRACEPOINT]  = "raw_tracepoint",
+   [BPF_PROG_TYPE_CGROUP_SOCK_ADDR]= "cgroup_sock_addr",
+   [BPF_PROG_TYPE_LWT_SEG6LOCAL]   = "lwt_seg6local",
+   [BPF_PROG_TYPE_LIRC_MODE2]  = "lirc_mode2",
+   [BPF_PROG_TYPE_SK_REUSEPORT]= "sk_reuseport",
+   [BPF_PROG_TYPE_FLOW_DISSECTOR]  = "flow_dissector",
+};
+
 enum bpf_obj_type {
BPF_OBJ_UNKNOWN,
BPF_OBJ_PROG,
diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index 3850f8d65703..8469ea6cf1c8 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -487,7 +487,6 @@ static int show_map_close_json(int fd, struct bpf_map_info *info)
char *memlock;
 
memlock = get_fdinfo(fd, "memlock");
-   close(fd);
 
jsonw_start_object(json_wtr);
 
@@ -514,6 +513,30 @@ static int show_map_close_json(int fd, struct bpf_map_info *info)
jsonw_int_field(json_wtr, "bytes_memlock", atoi(memlock));
free(memlock);
 
+   if (info->type == BPF_MAP_TYPE_PROG_ARRAY) {
+   char *owner_prog_type = get_fdinfo(fd, "owner_prog_type");
+   char *owner_jited = get_fdinfo(fd, "owner_jited");
+
+   if (owner_prog_type) {
+   unsigned int prog_type = atoi(owner_prog_type);
+
+   if (prog_type < ARRAY_SIZE(prog_type_name))
+   jsonw_string_field(json_wtr, "owner_prog_type",
+  prog_type_name[prog_type]);
+   else
+   jsonw_uint_field(json_wtr, "owner_prog_type",
+prog_type);
+   }
+ 

[PATCH bpf-next 4/5] tools: bpftool: mark offloaded programs more explicitly in plain output

2018-11-30 Thread Quentin Monnet
In bpftool (plain) output for "bpftool prog show" or "bpftool map show",
an offloaded BPF object is simply denoted with "dev ifname", which is
not really explicit. Change it with something that clearly shows the
program is offloaded.

While at it also add an additional space, as done between other
information fields.

Example output, before:

# bpftool prog show
1337: xdp  tag a04f5eef06a7f555 dev foo
loaded_at 2018-10-19T16:40:36+0100  uid 0
xlated 16B  not jited  memlock 4096B

After:

# bpftool prog show
1337: xdp  tag a04f5eef06a7f555  offloaded_to foo
loaded_at 2018-10-19T16:40:36+0100  uid 0
xlated 16B  not jited  memlock 4096B

Suggested-by: Daniel Borkmann 
Signed-off-by: Quentin Monnet 
Acked-by: Jakub Kicinski 
---
 tools/bpf/bpftool/common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index 4349b6683ca8..172d3761d9ab 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -604,7 +604,7 @@ void print_dev_plain(__u32 ifindex, __u64 ns_dev, __u64 ns_inode)
if (!ifindex)
return;
 
-   printf(" dev ");
+   printf("  offloaded_to ");
if (ifindex_to_name_ns(ifindex, ns_dev, ns_inode, name))
printf("%s", name);
else
-- 
2.7.4



Re: [PATCH bpf-next] bpf: libbpf: retry program creation without the name

2018-11-26 Thread Quentin Monnet
2018-11-26 11:08 UTC-0800 ~ Vlad Dumitrescu 
> On Fri, Nov 23, 2018 at 2:51 AM Quentin Monnet
>  wrote:
>>
>> 2018-11-21 09:28 UTC-0800 ~ Stanislav Fomichev 
>>> On 11/21, Quentin Monnet wrote:
>>>> 2018-11-20 15:26 UTC-0800 ~ Stanislav Fomichev 
>>>>> On 11/20, Alexei Starovoitov wrote:
>>>>>> On Wed, Nov 21, 2018 at 12:18:57AM +0100, Daniel Borkmann wrote:
>>>>>>> On 11/21/2018 12:04 AM, Alexei Starovoitov wrote:
>>>>>>>> On Tue, Nov 20, 2018 at 01:19:05PM -0800, Stanislav Fomichev wrote:
>>>>>>>>> On 11/20, Alexei Starovoitov wrote:
>>>>>>>>>> On Mon, Nov 19, 2018 at 04:46:25PM -0800, Stanislav Fomichev wrote:
>>>>>>>>>>> [Recent commit 23499442c319 ("bpf: libbpf: retry map creation 
>>>>>>>>>>> without
>>>>>>>>>>> the name") fixed this issue for maps, let's do the same for 
>>>>>>>>>>> programs.]
>>>>>>>>>>>
>>>>>>>>>>> Since commit 88cda1c9da02 ("bpf: libbpf: Provide basic API support
>>>>>>>>>>> to specify BPF obj name"), libbpf unconditionally sets 
>>>>>>>>>>> bpf_attr->name
>>>>>>>>>>> for programs. Pre v4.14 kernels don't know about programs names and
>>>>>>>>>>> return an error about unexpected non-zero data. Retry sys_bpf 
>>>>>>>>>>> without
>>>>>>>>>>> a program name to cover older kernels.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Stanislav Fomichev 
>>>>>>>>>>> ---
>>>>>>>>>>>   tools/lib/bpf/bpf.c | 10 ++
>>>>>>>>>>>   1 file changed, 10 insertions(+)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
>>>>>>>>>>> index 961e1b9fc592..cbe9d757c646 100644
>>>>>>>>>>> --- a/tools/lib/bpf/bpf.c
>>>>>>>>>>> +++ b/tools/lib/bpf/bpf.c
>>>>>>>>>>> @@ -212,6 +212,16 @@ int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
>>>>>>>>>>>   if (fd >= 0 || !log_buf || !log_buf_sz)
>>>>>>>>>>>   return fd;
>>>>>>>>>>>
>>>>>>>>>>> + if (fd < 0 && errno == E2BIG && load_attr->name) {
>>>>>>>>>>> + /* Retry the same syscall, but without the name.
>>>>>>>>>>> +  * Pre v4.14 kernels don't support prog names.
>>>>>>>>>>> +  */
>>>>>>>>>>
>>>>>>>>>> I'm afraid that will put unnecessary stress on the kernel.
>>>>>>>>>> This check needs to be tighter.
>>>>>>>>>> Like E2BIG and anything in the log_buf probably means that
>>>>>>>>>> E2BIG came from the verifier and nothing to do with prog_name.
>>>>>>>>>> Asking kernel to repeat is an unnecessary work.
>>>>>>>>>>
>>>>>>>>>> In general we need to think beyond this single prog_name field.
>>>>>>>>>> There are bunch of other fields in bpf_load_program_xattr() and 
>>>>>>>>>> older kernels
>>>>>>>>>> won't support them. Are we going to zero them out one by one
>>>>>>>>>> and retry? I don't think that would be practical.
>>>>>>>>> I general, we don't want to zero anything out. However,
>>>>>>>>> for this particular problem the rationale is the following:
>>>>>>>>> In commit 88cda1c9da02 we started unconditionally setting 
>>>>>>>>> {prog,map}->name
>>>>>>>>> from the 'higher' libbpfc layer which breaks users on the older 
>>>>>>>>> kernels.
>>>>>>>>>
>>>>>>>>>> Also libbpf silently ignoring prog_name is not great for debugging.
>>>>>>&

Re: [PATCH bpf-next 1/3] bpf: helper to pop data from messages

2018-11-26 Thread Quentin Monnet
2018-11-26 02:05 UTC+0100 ~ Daniel Borkmann 
> On 11/23/2018 02:38 AM, John Fastabend wrote:
>> This adds a BPF SK_MSG program helper so that we can pop data from a
>> msg. We use this to pop metadata from a previous push data call.
>>
>> Signed-off-by: John Fastabend 
>> ---
>>  include/uapi/linux/bpf.h |  13 +++-
>>  net/core/filter.c| 169 +++
>>  net/ipv4/tcp_bpf.c   |  14 +++-
>>  3 files changed, 192 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index c1554aa..64681f8 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -2268,6 +2268,16 @@ union bpf_attr {
>>   *
>>   *  Return
>>   *  0 on success, or a negative error in case of failure.
>> + *
>> + * int bpf_msg_pop_data(struct sk_msg_buff *msg, u32 start, u32 pop, u64 flags)
>> + *   Description
>> + *  Will remove 'pop' bytes from a msg starting at byte 'start'.
>> + *  This result in ENOMEM errors under certain situations where
>> + *  a allocation and copy are required due to a full ring buffer.
>> + *  However, the helper will try to avoid doing the allocation
>> + *  if possible. Other errors can occur if input parameters are
>> + *  invalid either do to start byte not being valid part of msg
>> + *  payload and/or pop value being to large.
>>   */

Hi John,

If you respin could you please update the helper documentation to use
RST syntax for argument and constant names (*pop* instead of 'pop',
*msg*, *start*, *flags*, **ENOMEM**), and document the return value from
the helper?

Thanks a lot,
Quentin


Re: [PATCH bpf-next] bpf: libbpf: retry program creation without the name

2018-11-23 Thread Quentin Monnet

2018-11-21 09:28 UTC-0800 ~ Stanislav Fomichev 

On 11/21, Quentin Monnet wrote:

2018-11-20 15:26 UTC-0800 ~ Stanislav Fomichev 

On 11/20, Alexei Starovoitov wrote:

On Wed, Nov 21, 2018 at 12:18:57AM +0100, Daniel Borkmann wrote:

On 11/21/2018 12:04 AM, Alexei Starovoitov wrote:

On Tue, Nov 20, 2018 at 01:19:05PM -0800, Stanislav Fomichev wrote:

On 11/20, Alexei Starovoitov wrote:

On Mon, Nov 19, 2018 at 04:46:25PM -0800, Stanislav Fomichev wrote:

[Recent commit 23499442c319 ("bpf: libbpf: retry map creation without
the name") fixed this issue for maps, let's do the same for programs.]

Since commit 88cda1c9da02 ("bpf: libbpf: Provide basic API support
to specify BPF obj name"), libbpf unconditionally sets bpf_attr->name
for programs. Pre v4.14 kernels don't know about programs names and
return an error about unexpected non-zero data. Retry sys_bpf without
a program name to cover older kernels.

Signed-off-by: Stanislav Fomichev 
---
  tools/lib/bpf/bpf.c | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 961e1b9fc592..cbe9d757c646 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -212,6 +212,16 @@ int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
if (fd >= 0 || !log_buf || !log_buf_sz)
return fd;
  
+	if (fd < 0 && errno == E2BIG && load_attr->name) {

+   /* Retry the same syscall, but without the name.
+* Pre v4.14 kernels don't support prog names.
+*/


I'm afraid that will put unnecessary stress on the kernel.
This check needs to be tighter.
Like E2BIG and anything in the log_buf probably means that
E2BIG came from the verifier and nothing to do with prog_name.
Asking kernel to repeat is an unnecessary work.

In general we need to think beyond this single prog_name field.
There are a bunch of other fields in bpf_load_program_xattr() and older kernels
won't support them. Are we going to zero them out one by one
and retry? I don't think that would be practical.

In general, we don't want to zero anything out. However,
for this particular problem the rationale is the following:
In commit 88cda1c9da02 we started unconditionally setting {prog,map}->name
from the 'higher' libbpfc layer which breaks users on the older kernels.


Also libbpf silently ignoring prog_name is not great for debugging.
A warning is needed.
But it cannot be done out of lib/bpf/bpf.c, since it's a set of syscall
wrappers.
Imo such "old kernel -> lets retry" feature should probably be done
at lib/bpf/libbpf.c level. inside load_program().

For maps, bpftool calls bpf_create_map_xattr directly, that's why
for maps I did the retry on the lower level (and why for programs I initially
thought about doing the same). However, in this case maybe asking
user to omit 'name' argument might be a better option.

For program names, I agree, we might think about doing it on the higher
level (although I'm not sure whether we want to have different API
expectations, i.e. bpf_create_map_xattr ignoring the name and
bpf_load_program_xattr not ignoring the name).

So given that rationale above, what do you think is the best way to
move forward?
1. Same patch, but tighten the retry check inside bpf_load_program_xattr ?
2. Move this retry logic into load_program and have different handling
for bpf_create_map_xattr vs bpf_load_program_xattr ?
3. Do 2 and move the retry check for maps from bpf_create_map_xattr
into bpf_object__create_maps ?

(I'm slightly leaning towards #3)


me too. I think it's cleaner for maps to do it in
bpf_object__create_maps().
Originally bpf.c was envisioned to be a thin layer on top of bpf syscall.
Whereas 'smart bits' would go into libbpf.c


Can't we create in bpf_object__load() a small helper bpf_object__probe_caps()
which would figure this out _once_ upon start with a few things to probe for
availability in the underlying kernel for maps and programs? E.g. programs
it could try to inject a tiny 'r0 = 0; exit' snippet where we figure out
things like prog name support etc. Given underlying kernel doesn't change, we
would only try this once and it doesn't require fallback every time.
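The probe-once idea above can be shown in miniature (names are illustrative, not the libbpf API; `kernel_accepts_name` stands in for loading a tiny 'r0 = 0; exit' program with a name set):

```c
/* Sketch of a one-time capability probe: ask the kernel once whether
 * it accepts a prog name, cache the answer, and consult the cache on
 * every subsequent load instead of retrying each time.
 * All identifiers here are illustrative mocks. */
#include <stdbool.h>

static int probed;	/* 0 = not probed yet, 1 = probed */
static bool has_name;

/* Stand-in for loading 'r0 = 0; exit' with a name attached;
 * pretend we run on a pre-4.14 kernel that rejects names. */
static bool kernel_accepts_name(void)
{
	return false;
}

static bool probe_caps_name(void)
{
	if (!probed) {
		has_name = kernel_accepts_name();
		probed = 1;	/* probe exactly once */
	}
	return has_name;
}
```

A caller then strips the name up front whenever `probe_caps_name()` returns false, with no per-load fallback.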


+1. great idea!

Sounds good, let me try to do it.

It sounds more like a recent LPC proposal/idea to have some sys_bpf option
to query BPF features. This new bpf_object__probe_caps can probably query
that in the future if we eventually add support for it.



Hi,

LPC proposal indeed. I've been working on implementing this kind of
probes in bpftool. I don't probe name support for now (but I can
certainly add it), but I detect supported program types, map types,
header functions, and a couple of other parameters. The idea (initially
from Daniel) was to dump "#define" declarations that could later be
included in a header file and used for a BPF project (or alternatively,
JSON output).

Oh, ni

Re: [PATCH bpf-next] bpf: libbpf: retry program creation without the name

2018-11-20 Thread Quentin Monnet
2018-11-20 15:26 UTC-0800 ~ Stanislav Fomichev 
> On 11/20, Alexei Starovoitov wrote:
>> On Wed, Nov 21, 2018 at 12:18:57AM +0100, Daniel Borkmann wrote:
>>> On 11/21/2018 12:04 AM, Alexei Starovoitov wrote:
 On Tue, Nov 20, 2018 at 01:19:05PM -0800, Stanislav Fomichev wrote:
> On 11/20, Alexei Starovoitov wrote:
>> On Mon, Nov 19, 2018 at 04:46:25PM -0800, Stanislav Fomichev wrote:
>>> [Recent commit 23499442c319 ("bpf: libbpf: retry map creation without
>>> the name") fixed this issue for maps, let's do the same for programs.]
>>>
>>> Since commit 88cda1c9da02 ("bpf: libbpf: Provide basic API support
>>> to specify BPF obj name"), libbpf unconditionally sets bpf_attr->name
>>> for programs. Pre v4.14 kernels don't know about programs names and
>>> return an error about unexpected non-zero data. Retry sys_bpf without
>>> a program name to cover older kernels.
>>>
>>> Signed-off-by: Stanislav Fomichev 
>>> ---
>>>  tools/lib/bpf/bpf.c | 10 ++
>>>  1 file changed, 10 insertions(+)
>>>
>>> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
>>> index 961e1b9fc592..cbe9d757c646 100644
>>> --- a/tools/lib/bpf/bpf.c
>>> +++ b/tools/lib/bpf/bpf.c
>>> @@ -212,6 +212,16 @@ int bpf_load_program_xattr(const struct 
>>> bpf_load_program_attr *load_attr,
>>> if (fd >= 0 || !log_buf || !log_buf_sz)
>>> return fd;
>>>  
>>> +   if (fd < 0 && errno == E2BIG && load_attr->name) {
>>> +   /* Retry the same syscall, but without the name.
>>> +* Pre v4.14 kernels don't support prog names.
>>> +*/
>>
>> I'm afraid that will put unnecessary stress on the kernel.
>> This check needs to be tighter.
>> Like E2BIG and anything in the log_buf probably means that
>> E2BIG came from the verifier and nothing to do with prog_name.
>> Asking the kernel to repeat is unnecessary work.
>>
>> In general we need to think beyond this single prog_name field.
>> There are a bunch of other fields in bpf_load_program_xattr() and older
>> kernels
>> won't support them. Are we going to zero them out one by one
>> and retry? I don't think that would be practical.
> In general, we don't want to zero anything out. However,
> for this particular problem the rationale is the following:
> In commit 88cda1c9da02 we started unconditionally setting {prog,map}->name
> from the 'higher' libbpf layer, which breaks users on older kernels.
>
>> Also libbpf silently ignoring prog_name is not great for debugging.
>> A warning is needed.
>> But it cannot be done out of lib/bpf/bpf.c, since it's a set of syscall
>> wrappers.
>> Imo such "old kernel -> lets retry" feature should probably be done
>> at lib/bpf/libbpf.c level. inside load_program().
> For maps, bpftool calls bpf_create_map_xattr directly, that's why
> for maps I did the retry on the lower level (and why for programs I 
> initially
> thought about doing the same). However, in this case maybe asking
> user to omit 'name' argument might be a better option.
>
> For program names, I agree, we might think about doing it on the higher
> level (although I'm not sure whether we want to have different API
> expectations, i.e. bpf_create_map_xattr ignoring the name and
> bpf_load_program_xattr not ignoring the name).
>
> So given that rationale above, what do you think is the best way to
> move forward?
> 1. Same patch, but tighten the retry check inside bpf_load_program_xattr ?
> 2. Move this retry logic into load_program and have different handling
>for bpf_create_map_xattr vs bpf_load_program_xattr ?
> 3. Do 2 and move the retry check for maps from bpf_create_map_xattr
>into bpf_object__create_maps ?
>
> (I'm slightly leaning towards #3)

 me too. I think it's cleaner for maps to do it in
 bpf_object__create_maps().
 Originally bpf.c was envisioned to be a thin layer on top of bpf syscall.
 Whereas 'smart bits' would go into libbpf.c
>>>
>>> Can't we create in bpf_object__load() a small helper 
>>> bpf_object__probe_caps()
>>> which would figure this out _once_ upon start with a few things to probe for
>>> availability in the underlying kernel for maps and programs? E.g. programs
>>> it could try to inject a tiny 'r0 = 0; exit' snippet where we figure out
>>> things like prog name support etc. Given underlying kernel doesn't change, 
>>> we
>>> would only try this once and it doesn't require fallback every time.
>>
>> +1. great idea!
> Sounds good, let me try to do it.
> 
> It sounds more like a recent LPC proposal/idea to have some sys_bpf option
> to query BPF features. This new bpf_object__probe_caps can probably query
> that in the future if we eventually add support for it.
> 

Hi,

LPC proposal indeed. I've 

[PATCH iproute2] bpf: initialise map symbol before retrieving and comparing its type

2018-11-19 Thread Quentin Monnet
In order to compare BPF map symbol type correctly in regard to the
latest LLVM, commit 7a04dd84a7f9 ("bpf: check map symbol type properly
with newer llvm compiler") compares map symbol type to both NOTYPE and
OBJECT. To do so, it first retrieves the type from "sym.st_info" and
stores it into a temporary variable.

However, the type is collected from the symbol "sym" before this latter
symbol is actually updated. gelf_getsym() is called after that and
updates "sym", and when comparison with OBJECT or NOTYPE happens it is
done on the type of the symbol collected in the previous passage of the
loop (or on an uninitialised symbol on the first passage). This may
eventually break map collection from the ELF file.

Fix this by assigning the type to the temporary variable only after the
call to gelf_getsym().

Fixes: 7a04dd84a7f9 ("bpf: check map symbol type properly with newer llvm 
compiler")
Reported-by: Ron Philip 
Signed-off-by: Quentin Monnet 
Reviewed-by: Jiong Wang 
---
 lib/bpf.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/lib/bpf.c b/lib/bpf.c
index 45f279fa4a41..6aff8f7bad7f 100644
--- a/lib/bpf.c
+++ b/lib/bpf.c
@@ -1758,11 +1758,12 @@ static const char *bpf_map_fetch_name(struct 
bpf_elf_ctx *ctx, int which)
int i;
 
for (i = 0; i < ctx->sym_num; i++) {
-   int type = GELF_ST_TYPE(sym.st_info);
+   int type;
 
		if (gelf_getsym(ctx->sym_tab, i, &sym) != &sym)
continue;
 
+   type = GELF_ST_TYPE(sym.st_info);
if (GELF_ST_BIND(sym.st_info) != STB_GLOBAL ||
(type != STT_NOTYPE && type != STT_OBJECT) ||
sym.st_shndx != ctx->sec_maps ||
@@ -1851,11 +1852,12 @@ static int bpf_map_num_sym(struct bpf_elf_ctx *ctx)
GElf_Sym sym;
 
for (i = 0; i < ctx->sym_num; i++) {
-   int type = GELF_ST_TYPE(sym.st_info);
+   int type;
 
		if (gelf_getsym(ctx->sym_tab, i, &sym) != &sym)
continue;
 
+   type = GELF_ST_TYPE(sym.st_info);
if (GELF_ST_BIND(sym.st_info) != STB_GLOBAL ||
(type != STT_NOTYPE && type != STT_OBJECT) ||
sym.st_shndx != ctx->sec_maps)
@@ -1931,10 +1933,12 @@ static int bpf_map_verify_all_offs(struct bpf_elf_ctx 
*ctx, int end)
 * the table again.
 */
for (i = 0; i < ctx->sym_num; i++) {
-   int type = GELF_ST_TYPE(sym.st_info);
+   int type;
 
		if (gelf_getsym(ctx->sym_tab, i, &sym) != &sym)
continue;
+
+   type = GELF_ST_TYPE(sym.st_info);
if (GELF_ST_BIND(sym.st_info) != STB_GLOBAL ||
(type != STT_NOTYPE && type != STT_OBJECT) ||
sym.st_shndx != ctx->sec_maps)
-- 
2.7.4
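The stale-read pattern fixed by the patch above can be reproduced in miniature with mock types (no libelf needed; `getsym` stands in for gelf_getsym()):

```c
/* Minimal reproduction of the ordering bug: reading sym.st_info
 * *before* the gelf_getsym()-style call updates sym yields the
 * previous iteration's symbol (or garbage on the first pass).
 * Types and data here are illustrative mocks. */
#include <string.h>

struct sym { int st_info; };

static const int infos[] = { 10, 20, 30 };

/* Stand-in for gelf_getsym(): fills in the symbol for index i. */
static int getsym(int i, struct sym *sym)
{
	sym->st_info = infos[i];
	return 0;
}

/* Buggy shape: 'type' is taken from the symbol of the previous pass. */
static int last_type_buggy(void)
{
	struct sym sym;
	int i, type = -1;

	memset(&sym, 0, sizeof(sym));	/* avoid UB for the demo */
	for (i = 0; i < 3; i++) {
		type = sym.st_info;	/* stale: read before update */
		getsym(i, &sym);
	}
	return type;	/* lags one iteration behind */
}

/* Fixed shape: read the type only after the symbol has been fetched. */
static int last_type_fixed(void)
{
	struct sym sym;
	int i, type = -1;

	for (i = 0; i < 3; i++) {
		getsym(i, &sym);
		type = sym.st_info;
	}
	return type;
}
```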



Re: [PATCH bpf-next v2] bpftool: make libbfd optional

2018-11-12 Thread Quentin Monnet
2018-11-12 14:02 UTC-0800 ~ Jakub Kicinski 
> On Mon, 12 Nov 2018 13:44:10 -0800, Stanislav Fomichev wrote:
>> Make it possible to build bpftool without libbfd. libbfd and libopcodes are
>> typically provided in dev/dbg packages (binutils-dev in debian) which we
>> usually don't have installed on the fleet machines and we'd like a way to 
>> have
>> bpftool version that works without installing any additional packages.
>> This excludes support for disassembling jit-ted code and prints an error if
>> the user tries to use these features.
>>
>> Tested by:
>> cat > FEATURES_DUMP.bpftool <<EOF
>> feature-libbfd=0
>> feature-disassembler-four-args=1
>> feature-reallocarray=0
>> feature-libelf=1
>> feature-libelf-mmap=1
>> feature-bpf=1
>> EOF
>> FEATURES_DUMP=$PWD/FEATURES_DUMP.bpftool make
>> ldd bpftool | grep libbfd
>>
>> Signed-off-by: Stanislav Fomichev 
> 
> Seems reasonable, thanks!
> 
> Acked-by: Jakub Kicinski 
> 

Thanks Stanislav!

There is a problem with this patch on some distributions, Ubuntu at least.

Feature detection for libbfd has been used for perf before being also
used with bpftool. Since commit 280e7c48c3b8 the feature needs libz and
libiberty to be present on the system, otherwise the feature would not
compile (and be detected) on OpenSuse.

On Ubuntu, libiberty is not needed (libbfd might be statically linked
against it, if I remember correctly?), which means that we are able to
build bpftool as long as binutils-dev has been installed, even if
libiberty-dev has not been installed. The BFD feature, in that case,
will appear as “undetected”. It is a bug. But since the Makefile does
not stop compilation in that case (another bug), in the end we're good.

With your patch, the problem is that libbpf detection will fail on
Ubuntu if libiberty-dev is not present, even though all the necessary
libraries for using the JIT disassembler are available. And in that case
it _will_ make a difference, since the Makefile will no longer compile the
libbfd-related bits.

So I'm not against the idea, but we have to fix libbfd detection first.

Thanks,
Quentin


Re: [PATCH v5 bpf-next 0/7] bpftool: support loading flow dissector

2018-11-09 Thread Quentin Monnet
2018-11-09 08:21 UTC-0800 ~ Stanislav Fomichev 
> v5 changes:
> * FILE -> PATH for load/loadall (can be either file or directory now)
> * simpler implementation for __bpf_program__pin_name
> * removed p_err for REQ_ARGS checks
> * parse_atach_detach_args -> parse_attach_detach_args
> * for -> while in bpf_object__pin_{programs,maps} recovery
> 
> v4 changes:
> * addressed another round of comments/style issues from Jakub Kicinski &
>   Quentin Monnet (thanks!)
> * implemented bpf_object__pin_maps and bpf_object__pin_programs helpers and
>   used them in bpf_program__pin
> * added new pin_name to bpf_program so bpf_program__pin
>   works with sections that contain '/'
> * moved *loadall* command implementation into a separate patch
> * added patch that implements *pinmaps* to pin maps when doing
>   load/loadall
> 
> v3 changes:
> * (maybe) better cleanup for partial failure in bpf_object__pin
> * added special case in bpf_program__pin for programs with single
>   instances
> 
> v2 changes:
> * addressed comments/style issues from Jakub Kicinski & Quentin Monnet
> * removed logic that populates jump table
> * added cleanup for partial failure in bpf_object__pin
> 
> This patch series adds support for loading and attaching flow dissector
> programs from the bpftool:
> 
> * first patch fixes flow dissector section name in the selftests (so
>   libbpf auto-detection works)
> * second patch adds proper cleanup to bpf_object__pin, parts of which are now
>   being used to attach all flow dissector progs/maps
> * third patch adds special case in bpf_program__pin for programs with
>   single instances (we don't create <prog>/0 pin anymore, just <prog>)
> * forth patch adds pin_name to the bpf_program struct
>   which is now used as a pin name in bpf_program__pin et al
> * fifth patch adds *loadall* command that pins all programs, not just
>   the first one
> * sixth patch adds *pinmaps* argument to load/loadall to let users pin
>   all maps of the obj file
> * seventh patch adds actual flow_dissector support to the bpftool and
>   an example

The series look good to me, thanks!

For the bpftool parts:
Acked-by: Quentin Monnet 



Re: [PATCH v4 bpf-next 5/7] bpftool: add loadall command

2018-11-09 Thread Quentin Monnet
2018-11-08 16:22 UTC-0800 ~ Stanislav Fomichev 
> From: Stanislav Fomichev 
> 
> This patch adds new *loadall* command which slightly differs from the
> existing *load*. *load* command loads all programs from the obj file,
> but pins only the first programs. *loadall* pins all programs from the
> obj file under specified directory.
> 
> The intended usecase is flow_dissector, where we want to load a bunch
> of progs, pin them all and after that construct a jump table.
> 
> Signed-off-by: Stanislav Fomichev 
> ---
>  .../bpftool/Documentation/bpftool-prog.rst| 14 +++-
>  tools/bpf/bpftool/bash-completion/bpftool |  4 +-
>  tools/bpf/bpftool/common.c| 31 
>  tools/bpf/bpftool/main.h  |  1 +
>  tools/bpf/bpftool/prog.c  | 74 ++-
>  5 files changed, 82 insertions(+), 42 deletions(-)
> 
> diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst 
> b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
> index ac4e904b10fb..d943d9b67a1d 100644
> --- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst
> +++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst

> @@ -24,7 +25,7 @@ MAP COMMANDS
>  |**bpftool** **prog dump xlated** *PROG* [{**file** *FILE* | **opcodes** 
> | **visual**}]
>  |**bpftool** **prog dump jited**  *PROG* [{**file** *FILE* | 
> **opcodes**}]
>  |**bpftool** **prog pin** *PROG* *FILE*
> -|**bpftool** **prog load** *OBJ* *FILE* [**type** *TYPE*] [**map** 
> {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
> +|**bpftool** **prog { load | loadall }** *OBJ* *FILE* [**type** *TYPE*] 
> [**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
>  |   **bpftool** **prog attach** *PROG* *ATTACH_TYPE* *MAP*
>  |   **bpftool** **prog detach** *PROG* *ATTACH_TYPE* *MAP*
>  |**bpftool** **prog help**
> @@ -79,8 +80,13 @@ DESCRIPTION
> contain a dot character ('.'), which is reserved for future
> extensions of *bpffs*.
>  
> - **bpftool prog load** *OBJ* *FILE* [**type** *TYPE*] [**map** {**idx** 
> *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
> -   Load bpf program from binary *OBJ* and pin as *FILE*.
> + **bpftool prog { load | loadall }** *OBJ* *FILE* [**type** *TYPE*] 
> [**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
> +   Load bpf program(s) from binary *OBJ* and pin as *FILE*.
> +   Both **bpftool prog load** and **bpftool prog loadall** load
> +   all maps and programs from the *OBJ* and differ only in
> +   pinning. **load** pins only the first program from the *OBJ*
> +   as *FILE*. **loadall** pins all programs from the *OBJ*
> +   under *FILE* directory.
> **type** is optional, if not specified program type will be
> inferred from section names.
> By default bpftool will create new maps as declared in the ELF

Thanks a lot for all the changes! The series looks really good to me
now. The last nit I might have is that we could maybe replace "FILE"
with "PATH" (as it can now be a directory), in the doc an below. No need
to respin just for this, though.

> @@ -1035,7 +1067,8 @@ static int do_help(int argc, char **argv)
>   "   %s %s dump xlated PROG [{ file FILE | opcodes | visual 
> }]\n"
>   "   %s %s dump jited  PROG [{ file FILE | opcodes }]\n"
>   "   %s %s pin   PROG FILE\n"
> - "   %s %s load  OBJ  FILE [type TYPE] [dev NAME] \\\n"
> + "   %s %s { load | loadall } OBJ  FILE \\\n"
> + " [type TYPE] [dev NAME] \\\n"
>   " [map { idx IDX | name NAME } MAP]\n"
>   "   %s %s attach PROG ATTACH_TYPE MAP\n"
>   "   %s %s detach PROG ATTACH_TYPE MAP\n"


[PATCH bpf-next 9/9] bpf: do not pass netdev to translate() and prepare() offload callbacks

2018-11-09 Thread Quentin Monnet
The kernel functions to prepare verifier and translate for offloaded
program retrieve "offload" from "prog", and "netdev" from "offload".
Then both "prog" and "netdev" are passed to the callbacks.

Simplify this by letting the drivers retrieve the net device themselves
from the offload object attached to prog - if they need it at all. There
is currently no need to pass the netdev as an argument to those
functions.

Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/bpf/offload.c | 9 -
 drivers/net/netdevsim/bpf.c  | 7 +++
 include/linux/bpf.h  | 4 ++--
 kernel/bpf/offload.c | 4 ++--
 4 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c 
b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
index e6b26d2f651d..f0283854fade 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
@@ -182,10 +182,9 @@ static void nfp_prog_free(struct nfp_prog *nfp_prog)
kfree(nfp_prog);
 }
 
-static int
-nfp_bpf_verifier_prep(struct net_device *netdev, struct bpf_prog *prog)
+static int nfp_bpf_verifier_prep(struct bpf_prog *prog)
 {
-   struct nfp_net *nn = netdev_priv(netdev);
+   struct nfp_net *nn = netdev_priv(prog->aux->offload->netdev);
struct nfp_app *app = nn->app;
struct nfp_prog *nfp_prog;
int ret;
@@ -213,10 +212,10 @@ nfp_bpf_verifier_prep(struct net_device *netdev, struct 
bpf_prog *prog)
return ret;
 }
 
-static int nfp_bpf_translate(struct net_device *netdev, struct bpf_prog *prog)
+static int nfp_bpf_translate(struct bpf_prog *prog)
 {
+   struct nfp_net *nn = netdev_priv(prog->aux->offload->netdev);
struct nfp_prog *nfp_prog = prog->aux->offload->dev_priv;
-   struct nfp_net *nn = netdev_priv(netdev);
unsigned int max_instr;
int err;
 
diff --git a/drivers/net/netdevsim/bpf.c b/drivers/net/netdevsim/bpf.c
index 560bdaf1c98b..6a5b7bd9a1f9 100644
--- a/drivers/net/netdevsim/bpf.c
+++ b/drivers/net/netdevsim/bpf.c
@@ -258,10 +258,9 @@ static int nsim_bpf_create_prog(struct netdevsim *ns, 
struct bpf_prog *prog)
return 0;
 }
 
-static int
-nsim_bpf_verifier_prep(struct net_device *dev, struct bpf_prog *prog)
+static int nsim_bpf_verifier_prep(struct bpf_prog *prog)
 {
-   struct netdevsim *ns = netdev_priv(dev);
+   struct netdevsim *ns = netdev_priv(prog->aux->offload->netdev);
 
if (!ns->bpf_bind_accept)
return -EOPNOTSUPP;
@@ -269,7 +268,7 @@ nsim_bpf_verifier_prep(struct net_device *dev, struct 
bpf_prog *prog)
return nsim_bpf_create_prog(ns, prog);
 }
 
-static int nsim_bpf_translate(struct net_device *dev, struct bpf_prog *prog)
+static int nsim_bpf_translate(struct bpf_prog *prog)
 {
struct nsim_bpf_bound_prog *state = prog->aux->offload->dev_priv;
 
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 888111350d0e..987815152629 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -268,8 +268,8 @@ struct bpf_prog_offload_ops {
int (*insn_hook)(struct bpf_verifier_env *env,
 int insn_idx, int prev_insn_idx);
int (*finalize)(struct bpf_verifier_env *env);
-   int (*prepare)(struct net_device *netdev, struct bpf_prog *prog);
-   int (*translate)(struct net_device *netdev, struct bpf_prog *prog);
+   int (*prepare)(struct bpf_prog *prog);
+   int (*translate)(struct bpf_prog *prog);
void (*destroy)(struct bpf_prog *prog);
 };
 
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index 397d206e184b..52c5617e3716 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -131,7 +131,7 @@ int bpf_prog_offload_verifier_prep(struct bpf_prog *prog)
	down_read(&bpf_devs_lock);
offload = prog->aux->offload;
if (offload)
-   ret = offload->offdev->ops->prepare(offload->netdev, prog);
+   ret = offload->offdev->ops->prepare(prog);
offload->dev_state = !ret;
	up_read(&bpf_devs_lock);
 
@@ -203,7 +203,7 @@ static int bpf_prog_offload_translate(struct bpf_prog *prog)
	down_read(&bpf_devs_lock);
offload = prog->aux->offload;
if (offload)
-   ret = offload->offdev->ops->translate(offload->netdev, prog);
+   ret = offload->offdev->ops->translate(prog);
	up_read(&bpf_devs_lock);
 
return ret;
-- 
2.17.1
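The shape of the refactor above — callbacks deriving the net device from a back-pointer on the program instead of taking it as a parameter — can be sketched with stand-in types (these are mocks, not the kernel's structs):

```c
/* Mock-type sketch: the offload object attached to the prog carries a
 * netdev back-pointer, so a callback needs only the prog argument.
 * All types here are illustrative stand-ins. */
struct net_device { int id; };

struct bpf_prog_offload { struct net_device *netdev; };
struct bpf_prog_aux { struct bpf_prog_offload *offload; };
struct bpf_prog { struct bpf_prog_aux *aux; };

/* New-style callback: no netdev parameter needed. */
static int translate(struct bpf_prog *prog)
{
	struct net_device *dev = prog->aux->offload->netdev;

	return dev->id;	/* pretend translation keyed on the device */
}
```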



[PATCH bpf-next 5/9] bpf: call verifier_prep from its callback in struct bpf_offload_dev

2018-11-09 Thread Quentin Monnet
In a way similar to the change previously brought to the verify_insn
hook and to the finalize callback, switch to the newly added ops in
struct bpf_prog_offload for calling the functions used to prepare driver
verifiers.

Since the dev_ops pointer in struct bpf_prog_offload is no longer used
by any callback, we can now remove it from struct bpf_prog_offload.

Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 .../net/ethernet/netronome/nfp/bpf/offload.c  | 11 +++
 drivers/net/netdevsim/bpf.c   | 32 ++-
 include/linux/bpf.h   |  2 +-
 include/linux/netdevice.h |  6 
 kernel/bpf/offload.c  | 22 ++---
 5 files changed, 32 insertions(+), 41 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c 
b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
index 2fca996a7e77..16a3a9c55852 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
@@ -188,10 +188,11 @@ static void nfp_prog_free(struct nfp_prog *nfp_prog)
 }
 
 static int
-nfp_bpf_verifier_prep(struct nfp_app *app, struct nfp_net *nn,
- struct netdev_bpf *bpf)
+nfp_bpf_verifier_prep(struct net_device *netdev, struct bpf_verifier_env *env)
 {
-   struct bpf_prog *prog = bpf->verifier.prog;
+   struct nfp_net *nn = netdev_priv(netdev);
+   struct bpf_prog *prog = env->prog;
+   struct nfp_app *app = nn->app;
struct nfp_prog *nfp_prog;
int ret;
 
@@ -209,7 +210,6 @@ nfp_bpf_verifier_prep(struct nfp_app *app, struct nfp_net 
*nn,
goto err_free;
 
nfp_prog->verifier_meta = nfp_prog_first_meta(nfp_prog);
-   bpf->verifier.ops = &nfp_bpf_dev_ops;
 
return 0;
 
@@ -422,8 +422,6 @@ nfp_bpf_map_free(struct nfp_app_bpf *bpf, struct 
bpf_offloaded_map *offmap)
 int nfp_ndo_bpf(struct nfp_app *app, struct nfp_net *nn, struct netdev_bpf 
*bpf)
 {
switch (bpf->command) {
-   case BPF_OFFLOAD_VERIFIER_PREP:
-   return nfp_bpf_verifier_prep(app, nn, bpf);
case BPF_OFFLOAD_TRANSLATE:
return nfp_bpf_translate(nn, bpf->offload.prog);
case BPF_OFFLOAD_DESTROY:
@@ -605,4 +603,5 @@ int nfp_net_bpf_offload(struct nfp_net *nn, struct bpf_prog 
*prog,
 const struct bpf_prog_offload_ops nfp_bpf_dev_ops = {
.insn_hook  = nfp_verify_insn,
.finalize   = nfp_bpf_finalize,
+   .prepare= nfp_bpf_verifier_prep,
 };
diff --git a/drivers/net/netdevsim/bpf.c b/drivers/net/netdevsim/bpf.c
index 135aee864162..d045b7d666d9 100644
--- a/drivers/net/netdevsim/bpf.c
+++ b/drivers/net/netdevsim/bpf.c
@@ -91,11 +91,6 @@ static int nsim_bpf_finalize(struct bpf_verifier_env *env)
return 0;
 }
 
-static const struct bpf_prog_offload_ops nsim_bpf_dev_ops = {
-   .insn_hook  = nsim_bpf_verify_insn,
-   .finalize   = nsim_bpf_finalize,
-};
-
 static bool nsim_xdp_offload_active(struct netdevsim *ns)
 {
return ns->xdp_hw.prog;
@@ -263,6 +258,17 @@ static int nsim_bpf_create_prog(struct netdevsim *ns, 
struct bpf_prog *prog)
return 0;
 }
 
+static int
+nsim_bpf_verifier_prep(struct net_device *dev, struct bpf_verifier_env *env)
+{
+   struct netdevsim *ns = netdev_priv(dev);
+
+   if (!ns->bpf_bind_accept)
+   return -EOPNOTSUPP;
+
+   return nsim_bpf_create_prog(ns, env->prog);
+}
+
 static void nsim_bpf_destroy_prog(struct bpf_prog *prog)
 {
struct nsim_bpf_bound_prog *state;
@@ -275,6 +281,12 @@ static void nsim_bpf_destroy_prog(struct bpf_prog *prog)
kfree(state);
 }
 
+static const struct bpf_prog_offload_ops nsim_bpf_dev_ops = {
+   .insn_hook  = nsim_bpf_verify_insn,
+   .finalize   = nsim_bpf_finalize,
+   .prepare= nsim_bpf_verifier_prep,
+};
+
 static int nsim_setup_prog_checks(struct netdevsim *ns, struct netdev_bpf *bpf)
 {
if (bpf->prog && bpf->prog->aux->offload) {
@@ -539,16 +551,6 @@ int nsim_bpf(struct net_device *dev, struct netdev_bpf 
*bpf)
ASSERT_RTNL();
 
switch (bpf->command) {
-   case BPF_OFFLOAD_VERIFIER_PREP:
-   if (!ns->bpf_bind_accept)
-   return -EOPNOTSUPP;
-
-   err = nsim_bpf_create_prog(ns, bpf->verifier.prog);
-   if (err)
-   return err;
-
-   bpf->verifier.ops = &nsim_bpf_dev_ops;
-   return 0;
case BPF_OFFLOAD_TRANSLATE:
state = bpf->offload.prog->aux->offload->dev_priv;
 
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 672714cd904f..f250494a4f56 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -268,6 +268,7 @@ struct bpf_prog_offload_ops {
int (*insn_hook)(struct bpf_verifier_env *env,
 int insn_idx

[PATCH bpf-next 4/9] bpf: call finalize() from its callback in struct bpf_offload_dev

2018-11-09 Thread Quentin Monnet
In a way similar to the change previously brought to the verify_insn
hook, switch to the newly added ops in struct bpf_prog_offload for
calling the functions used to perform final verification steps for
offloaded programs.

Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 kernel/bpf/offload.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index 2cd3c0d0417b..2c88cb4ddfd8 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -183,8 +183,8 @@ int bpf_prog_offload_finalize(struct bpf_verifier_env *env)
	down_read(&bpf_devs_lock);
offload = env->prog->aux->offload;
if (offload) {
-   if (offload->dev_ops->finalize)
-   ret = offload->dev_ops->finalize(env);
+   if (offload->offdev->ops->finalize)
+   ret = offload->offdev->ops->finalize(env);
else
ret = 0;
}
-- 
2.17.1



[PATCH bpf-next 7/9] bpf: pass destroy() as a callback and remove its ndo_bpf subcommand

2018-11-09 Thread Quentin Monnet
As part of the transition from ndo_bpf() to callbacks attached to struct
bpf_offload_dev for some of the eBPF offload operations, move the
functions related to program destruction to the struct and remove the
subcommand that was used to call them through the NDO.

Remove function __bpf_offload_ndo(), which is no longer used.

Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 .../net/ethernet/netronome/nfp/bpf/offload.c  |  7 ++
 drivers/net/netdevsim/bpf.c   |  4 +---
 include/linux/bpf.h   |  1 +
 include/linux/netdevice.h |  5 
 kernel/bpf/offload.c  | 24 +--
 5 files changed, 5 insertions(+), 36 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c 
b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
index 8653a2189c19..91085cc3c843 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
@@ -238,15 +238,13 @@ static int nfp_bpf_translate(struct net_device *netdev, 
struct bpf_prog *prog)
return nfp_map_ptrs_record(nfp_prog->bpf, nfp_prog, prog);
 }
 
-static int nfp_bpf_destroy(struct nfp_net *nn, struct bpf_prog *prog)
+static void nfp_bpf_destroy(struct bpf_prog *prog)
 {
struct nfp_prog *nfp_prog = prog->aux->offload->dev_priv;
 
kvfree(nfp_prog->prog);
nfp_map_ptrs_forget(nfp_prog->bpf, nfp_prog);
nfp_prog_free(nfp_prog);
-
-   return 0;
 }
 
 /* Atomic engine requires values to be in big endian, we need to byte swap
@@ -418,8 +416,6 @@ nfp_bpf_map_free(struct nfp_app_bpf *bpf, struct 
bpf_offloaded_map *offmap)
 int nfp_ndo_bpf(struct nfp_app *app, struct nfp_net *nn, struct netdev_bpf 
*bpf)
 {
switch (bpf->command) {
-   case BPF_OFFLOAD_DESTROY:
-   return nfp_bpf_destroy(nn, bpf->offload.prog);
case BPF_OFFLOAD_MAP_ALLOC:
return nfp_bpf_map_alloc(app->priv, bpf->offmap);
case BPF_OFFLOAD_MAP_FREE:
@@ -599,4 +595,5 @@ const struct bpf_prog_offload_ops nfp_bpf_dev_ops = {
.finalize   = nfp_bpf_finalize,
.prepare= nfp_bpf_verifier_prep,
.translate  = nfp_bpf_translate,
+   .destroy= nfp_bpf_destroy,
 };
diff --git a/drivers/net/netdevsim/bpf.c b/drivers/net/netdevsim/bpf.c
index 30c2cd516d1c..33e3d54c3a0a 100644
--- a/drivers/net/netdevsim/bpf.c
+++ b/drivers/net/netdevsim/bpf.c
@@ -294,6 +294,7 @@ static const struct bpf_prog_offload_ops nsim_bpf_dev_ops = 
{
.finalize   = nsim_bpf_finalize,
.prepare= nsim_bpf_verifier_prep,
.translate  = nsim_bpf_translate,
+   .destroy= nsim_bpf_destroy_prog,
 };
 
 static int nsim_setup_prog_checks(struct netdevsim *ns, struct netdev_bpf *bpf)
@@ -560,9 +561,6 @@ int nsim_bpf(struct net_device *dev, struct netdev_bpf *bpf)
ASSERT_RTNL();
 
switch (bpf->command) {
-   case BPF_OFFLOAD_DESTROY:
-   nsim_bpf_destroy_prog(bpf->offload.prog);
-   return 0;
case XDP_QUERY_PROG:
		return xdp_attachment_query(&ns->xdp, bpf);
case XDP_QUERY_PROG_HW:
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index d1eb3c8a3fa9..867d2801db64 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -270,6 +270,7 @@ struct bpf_prog_offload_ops {
int (*finalize)(struct bpf_verifier_env *env);
int (*prepare)(struct net_device *netdev, struct bpf_verifier_env *env);
int (*translate)(struct net_device *netdev, struct bpf_prog *prog);
+   void (*destroy)(struct bpf_prog *prog);
 };
 
 struct bpf_prog_offload {
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 27499127e038..17d52a647fe5 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -863,7 +863,6 @@ enum bpf_netdev_command {
XDP_QUERY_PROG,
XDP_QUERY_PROG_HW,
/* BPF program for offload callbacks, invoked at program load time. */
-   BPF_OFFLOAD_DESTROY,
BPF_OFFLOAD_MAP_ALLOC,
BPF_OFFLOAD_MAP_FREE,
XDP_QUERY_XSK_UMEM,
@@ -889,10 +888,6 @@ struct netdev_bpf {
/* flags with which program was installed */
u32 prog_flags;
};
-   /* BPF_OFFLOAD_DESTROY */
-   struct {
-   struct bpf_prog *prog;
-   } offload;
/* BPF_OFFLOAD_MAP_ALLOC, BPF_OFFLOAD_MAP_FREE */
struct {
struct bpf_offloaded_map *offmap;
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index ae0167366c12..d665e75a0ac3 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -123,23 +123,6 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union 
bpf_attr *attr)
return err;
 }
 
-static int __bpf_offload_ndo(struct bpf_p

[PATCH bpf-next 6/9] bpf: pass translate() as a callback and remove its ndo_bpf subcommand

2018-11-09 Thread Quentin Monnet
As part of the transition from ndo_bpf() to callbacks attached to struct
bpf_offload_dev for some of the eBPF offload operations, move the
functions related to code translation to the struct and remove the
subcommand that was used to call them through the NDO.

Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/bpf/offload.c | 11 +++
 drivers/net/netdevsim/bpf.c  | 14 +-
 include/linux/bpf.h  |  1 +
 include/linux/netdevice.h|  3 +--
 kernel/bpf/offload.c | 14 +++---
 5 files changed, 21 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c 
b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
index 16a3a9c55852..8653a2189c19 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
@@ -33,9 +33,6 @@ nfp_map_ptr_record(struct nfp_app_bpf *bpf, struct nfp_prog 
*nfp_prog,
struct nfp_bpf_neutral_map *record;
int err;
 
-   /* Map record paths are entered via ndo, update side is protected. */
-   ASSERT_RTNL();
-
/* Reuse path - other offloaded program is already tracking this map. */
	record = rhashtable_lookup_fast(&bpf->maps_neutral, &map->id,
					nfp_bpf_maps_neutral_params);
@@ -84,8 +81,6 @@ nfp_map_ptrs_forget(struct nfp_app_bpf *bpf, struct nfp_prog 
*nfp_prog)
bool freed = false;
int i;
 
-   ASSERT_RTNL();
-
for (i = 0; i < nfp_prog->map_records_cnt; i++) {
if (--nfp_prog->map_records[i]->count) {
nfp_prog->map_records[i] = NULL;
@@ -219,9 +214,10 @@ nfp_bpf_verifier_prep(struct net_device *netdev, struct 
bpf_verifier_env *env)
return ret;
 }
 
-static int nfp_bpf_translate(struct nfp_net *nn, struct bpf_prog *prog)
+static int nfp_bpf_translate(struct net_device *netdev, struct bpf_prog *prog)
 {
struct nfp_prog *nfp_prog = prog->aux->offload->dev_priv;
+   struct nfp_net *nn = netdev_priv(netdev);
unsigned int max_instr;
int err;
 
@@ -422,8 +418,6 @@ nfp_bpf_map_free(struct nfp_app_bpf *bpf, struct 
bpf_offloaded_map *offmap)
 int nfp_ndo_bpf(struct nfp_app *app, struct nfp_net *nn, struct netdev_bpf 
*bpf)
 {
switch (bpf->command) {
-   case BPF_OFFLOAD_TRANSLATE:
-   return nfp_bpf_translate(nn, bpf->offload.prog);
case BPF_OFFLOAD_DESTROY:
return nfp_bpf_destroy(nn, bpf->offload.prog);
case BPF_OFFLOAD_MAP_ALLOC:
@@ -604,4 +598,5 @@ const struct bpf_prog_offload_ops nfp_bpf_dev_ops = {
.insn_hook  = nfp_verify_insn,
.finalize   = nfp_bpf_finalize,
.prepare= nfp_bpf_verifier_prep,
+   .translate  = nfp_bpf_translate,
 };
diff --git a/drivers/net/netdevsim/bpf.c b/drivers/net/netdevsim/bpf.c
index d045b7d666d9..30c2cd516d1c 100644
--- a/drivers/net/netdevsim/bpf.c
+++ b/drivers/net/netdevsim/bpf.c
@@ -269,6 +269,14 @@ nsim_bpf_verifier_prep(struct net_device *dev, struct 
bpf_verifier_env *env)
return nsim_bpf_create_prog(ns, env->prog);
 }
 
+static int nsim_bpf_translate(struct net_device *dev, struct bpf_prog *prog)
+{
+   struct nsim_bpf_bound_prog *state = prog->aux->offload->dev_priv;
+
+   state->state = "xlated";
+   return 0;
+}
+
 static void nsim_bpf_destroy_prog(struct bpf_prog *prog)
 {
struct nsim_bpf_bound_prog *state;
@@ -285,6 +293,7 @@ static const struct bpf_prog_offload_ops nsim_bpf_dev_ops = 
{
.insn_hook  = nsim_bpf_verify_insn,
.finalize   = nsim_bpf_finalize,
.prepare= nsim_bpf_verifier_prep,
+   .translate  = nsim_bpf_translate,
 };
 
 static int nsim_setup_prog_checks(struct netdevsim *ns, struct netdev_bpf *bpf)
@@ -551,11 +560,6 @@ int nsim_bpf(struct net_device *dev, struct netdev_bpf 
*bpf)
ASSERT_RTNL();
 
switch (bpf->command) {
-   case BPF_OFFLOAD_TRANSLATE:
-   state = bpf->offload.prog->aux->offload->dev_priv;
-
-   state->state = "xlated";
-   return 0;
case BPF_OFFLOAD_DESTROY:
nsim_bpf_destroy_prog(bpf->offload.prog);
return 0;
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f250494a4f56..d1eb3c8a3fa9 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -269,6 +269,7 @@ struct bpf_prog_offload_ops {
 int insn_idx, int prev_insn_idx);
int (*finalize)(struct bpf_verifier_env *env);
int (*prepare)(struct net_device *netdev, struct bpf_verifier_env *env);
+   int (*translate)(struct net_device *netdev, struct bpf_prog *prog);
 };
 
 struct bpf_prog_offload {
diff --git a/include/linux/netd

[PATCH bpf-next 8/9] bpf: pass prog instead of env to bpf_prog_offload_verifier_prep()

2018-11-09 Thread Quentin Monnet
Function bpf_prog_offload_verifier_prep(), called from the kernel BPF
verifier to run a driver-specific callback for preparing for the
verification step for offloaded programs, takes a pointer to a struct
bpf_verifier_env object. However, no driver callback needs the whole
structure at this time: the two drivers supporting this, nfp and
netdevsim, only need a pointer to the struct bpf_prog instance held by
env.

Update the callback accordingly, on kernel side and in these two
drivers.

Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/bpf/offload.c | 3 +--
 drivers/net/netdevsim/bpf.c  | 4 ++--
 include/linux/bpf.h  | 2 +-
 include/linux/bpf_verifier.h | 2 +-
 kernel/bpf/offload.c | 6 +++---
 kernel/bpf/verifier.c| 2 +-
 6 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c 
b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
index 91085cc3c843..e6b26d2f651d 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
@@ -183,10 +183,9 @@ static void nfp_prog_free(struct nfp_prog *nfp_prog)
 }
 
 static int
-nfp_bpf_verifier_prep(struct net_device *netdev, struct bpf_verifier_env *env)
+nfp_bpf_verifier_prep(struct net_device *netdev, struct bpf_prog *prog)
 {
struct nfp_net *nn = netdev_priv(netdev);
-   struct bpf_prog *prog = env->prog;
struct nfp_app *app = nn->app;
struct nfp_prog *nfp_prog;
int ret;
diff --git a/drivers/net/netdevsim/bpf.c b/drivers/net/netdevsim/bpf.c
index 33e3d54c3a0a..560bdaf1c98b 100644
--- a/drivers/net/netdevsim/bpf.c
+++ b/drivers/net/netdevsim/bpf.c
@@ -259,14 +259,14 @@ static int nsim_bpf_create_prog(struct netdevsim *ns, 
struct bpf_prog *prog)
 }
 
 static int
-nsim_bpf_verifier_prep(struct net_device *dev, struct bpf_verifier_env *env)
+nsim_bpf_verifier_prep(struct net_device *dev, struct bpf_prog *prog)
 {
struct netdevsim *ns = netdev_priv(dev);
 
if (!ns->bpf_bind_accept)
return -EOPNOTSUPP;
 
-   return nsim_bpf_create_prog(ns, env->prog);
+   return nsim_bpf_create_prog(ns, prog);
 }
 
 static int nsim_bpf_translate(struct net_device *dev, struct bpf_prog *prog)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 867d2801db64..888111350d0e 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -268,7 +268,7 @@ struct bpf_prog_offload_ops {
int (*insn_hook)(struct bpf_verifier_env *env,
 int insn_idx, int prev_insn_idx);
int (*finalize)(struct bpf_verifier_env *env);
-   int (*prepare)(struct net_device *netdev, struct bpf_verifier_env *env);
+   int (*prepare)(struct net_device *netdev, struct bpf_prog *prog);
int (*translate)(struct net_device *netdev, struct bpf_prog *prog);
void (*destroy)(struct bpf_prog *prog);
 };
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index d93e89761a8b..11f5df1092d9 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -245,7 +245,7 @@ static inline struct bpf_reg_state *cur_regs(struct 
bpf_verifier_env *env)
return cur_func(env)->regs;
 }
 
-int bpf_prog_offload_verifier_prep(struct bpf_verifier_env *env);
+int bpf_prog_offload_verifier_prep(struct bpf_prog *prog);
 int bpf_prog_offload_verify_insn(struct bpf_verifier_env *env,
 int insn_idx, int prev_insn_idx);
 int bpf_prog_offload_finalize(struct bpf_verifier_env *env);
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index d665e75a0ac3..397d206e184b 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -123,15 +123,15 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union 
bpf_attr *attr)
return err;
 }
 
-int bpf_prog_offload_verifier_prep(struct bpf_verifier_env *env)
+int bpf_prog_offload_verifier_prep(struct bpf_prog *prog)
 {
struct bpf_prog_offload *offload;
int ret = -ENODEV;
 
	down_read(&bpf_devs_lock);
-   offload = env->prog->aux->offload;
+   offload = prog->aux->offload;
if (offload)
-   ret = offload->offdev->ops->prepare(offload->netdev, env);
+   ret = offload->offdev->ops->prepare(offload->netdev, prog);
offload->dev_state = !ret;
	up_read(&bpf_devs_lock);
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 75dab40b19a3..8d0977980cfa 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -6368,7 +6368,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr 
*attr)
goto skip_full_check;
 
if (bpf_prog_is_dev_bound(env->prog->aux)) {
-   ret = bpf_prog_offload_verifier_prep(env);
+   ret = bpf

[PATCH bpf-next 2/9] bpf: pass a struct with offload callbacks to bpf_offload_dev_create()

2018-11-09 Thread Quentin Monnet
For passing device functions for offloaded eBPF programs, there used to
be no place to store the pointer without making the non-offloaded
programs pay a memory price.

As a consequence, three functions were called with ndo_bpf() through
specific commands. Now that we have struct bpf_offload_dev, and since
none of those operations rely on RTNL, we can turn these three commands
into hooks inside the struct bpf_prog_offload_ops, and pass them as part
of bpf_offload_dev_create().

This commit effectively passes a pointer to the struct to
bpf_offload_dev_create(). We temporarily have two struct
bpf_prog_offload_ops instances, one under offdev->ops and one under
offload->dev_ops. The next patches will make the transition towards the
former, so that offload->dev_ops can be removed, and callbacks relying
on ndo_bpf() added to offdev->ops as well.

While at it, rename "nfp_bpf_analyzer_ops" as "nfp_bpf_dev_ops" (and
similarly for netdevsim).

Suggested-by: Jakub Kicinski 
Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/bpf/main.c| 2 +-
 drivers/net/ethernet/netronome/nfp/bpf/main.h| 2 +-
 drivers/net/ethernet/netronome/nfp/bpf/offload.c | 4 ++--
 drivers/net/netdevsim/bpf.c  | 6 +++---
 include/linux/bpf.h  | 3 ++-
 kernel/bpf/offload.c | 5 -
 6 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c 
b/drivers/net/ethernet/netronome/nfp/bpf/main.c
index 6243af0ab025..dccae0319204 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c
@@ -465,7 +465,7 @@ static int nfp_bpf_init(struct nfp_app *app)
app->ctrl_mtu = nfp_bpf_ctrl_cmsg_mtu(bpf);
}
 
-   bpf->bpf_dev = bpf_offload_dev_create();
+   bpf->bpf_dev = bpf_offload_dev_create(&nfp_bpf_dev_ops);
err = PTR_ERR_OR_ZERO(bpf->bpf_dev);
if (err)
goto err_free_neutral_maps;
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h 
b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index abdd93d14439..941277936475 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -513,7 +513,7 @@ int nfp_verify_insn(struct bpf_verifier_env *env, int 
insn_idx,
int prev_insn_idx);
 int nfp_bpf_finalize(struct bpf_verifier_env *env);
 
-extern const struct bpf_prog_offload_ops nfp_bpf_analyzer_ops;
+extern const struct bpf_prog_offload_ops nfp_bpf_dev_ops;
 
 struct netdev_bpf;
 struct nfp_app;
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c 
b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
index dc548bb4089e..2fca996a7e77 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
@@ -209,7 +209,7 @@ nfp_bpf_verifier_prep(struct nfp_app *app, struct nfp_net 
*nn,
goto err_free;
 
nfp_prog->verifier_meta = nfp_prog_first_meta(nfp_prog);
-   bpf->verifier.ops = &nfp_bpf_analyzer_ops;
+   bpf->verifier.ops = &nfp_bpf_dev_ops;
 
return 0;
 
@@ -602,7 +602,7 @@ int nfp_net_bpf_offload(struct nfp_net *nn, struct bpf_prog 
*prog,
return 0;
 }
 
-const struct bpf_prog_offload_ops nfp_bpf_analyzer_ops = {
+const struct bpf_prog_offload_ops nfp_bpf_dev_ops = {
.insn_hook  = nfp_verify_insn,
.finalize   = nfp_bpf_finalize,
 };
diff --git a/drivers/net/netdevsim/bpf.c b/drivers/net/netdevsim/bpf.c
index cb3518474f0e..135aee864162 100644
--- a/drivers/net/netdevsim/bpf.c
+++ b/drivers/net/netdevsim/bpf.c
@@ -91,7 +91,7 @@ static int nsim_bpf_finalize(struct bpf_verifier_env *env)
return 0;
 }
 
-static const struct bpf_prog_offload_ops nsim_bpf_analyzer_ops = {
+static const struct bpf_prog_offload_ops nsim_bpf_dev_ops = {
.insn_hook  = nsim_bpf_verify_insn,
.finalize   = nsim_bpf_finalize,
 };
@@ -547,7 +547,7 @@ int nsim_bpf(struct net_device *dev, struct netdev_bpf *bpf)
if (err)
return err;
 
-   bpf->verifier.ops = &nsim_bpf_analyzer_ops;
+   bpf->verifier.ops = &nsim_bpf_dev_ops;
return 0;
case BPF_OFFLOAD_TRANSLATE:
state = bpf->offload.prog->aux->offload->dev_priv;
@@ -599,7 +599,7 @@ int nsim_bpf_init(struct netdevsim *ns)
if (IS_ERR_OR_NULL(ns->sdev->ddir_bpf_bound_progs))
return -ENOMEM;
 
-   ns->sdev->bpf_dev = bpf_offload_dev_create();
+   ns->sdev->bpf_dev = bpf_offload_dev_create(&nsim_bpf_dev_ops);
err = PTR_ERR_OR_ZERO(ns->sdev->bpf_dev);
if (err)
return err;
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index b6a296e

[PATCH bpf-next 1/9] nfp: bpf: move nfp_bpf_analyzer_ops from verifier.c to offload.c

2018-11-09 Thread Quentin Monnet
We are about to add several new callbacks to the struct, all of them
defined in offload.c. Move the struct bpf_prog_offload_ops object in
that file. As a consequence, nfp_verify_insn() and nfp_bpf_finalize() can no
longer be static.

Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/bpf/main.h |  4 
 drivers/net/ethernet/netronome/nfp/bpf/offload.c  |  5 +
 drivers/net/ethernet/netronome/nfp/bpf/verifier.c | 11 +++
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h 
b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index 7f591d71ab28..abdd93d14439 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -509,6 +509,10 @@ void nfp_bpf_jit_prepare(struct nfp_prog *nfp_prog, 
unsigned int cnt);
 int nfp_bpf_jit(struct nfp_prog *prog);
 bool nfp_bpf_supported_opcode(u8 code);
 
+int nfp_verify_insn(struct bpf_verifier_env *env, int insn_idx,
+   int prev_insn_idx);
+int nfp_bpf_finalize(struct bpf_verifier_env *env);
+
 extern const struct bpf_prog_offload_ops nfp_bpf_analyzer_ops;
 
 struct netdev_bpf;
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c 
b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
index 07bdc1f61996..dc548bb4089e 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
@@ -601,3 +601,8 @@ int nfp_net_bpf_offload(struct nfp_net *nn, struct bpf_prog 
*prog,
 
return 0;
 }
+
+const struct bpf_prog_offload_ops nfp_bpf_analyzer_ops = {
+   .insn_hook  = nfp_verify_insn,
+   .finalize   = nfp_bpf_finalize,
+};
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c 
b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
index 99f977bfd8cc..337bb862ec1d 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
@@ -623,8 +623,8 @@ nfp_bpf_check_alu(struct nfp_prog *nfp_prog, struct 
nfp_insn_meta *meta,
return 0;
 }
 
-static int
-nfp_verify_insn(struct bpf_verifier_env *env, int insn_idx, int prev_insn_idx)
+int nfp_verify_insn(struct bpf_verifier_env *env, int insn_idx,
+   int prev_insn_idx)
 {
struct nfp_prog *nfp_prog = env->prog->aux->offload->dev_priv;
struct nfp_insn_meta *meta = nfp_prog->verifier_meta;
@@ -745,7 +745,7 @@ nfp_bpf_get_stack_usage(struct nfp_prog *nfp_prog, unsigned 
int cnt)
goto continue_subprog;
 }
 
-static int nfp_bpf_finalize(struct bpf_verifier_env *env)
+int nfp_bpf_finalize(struct bpf_verifier_env *env)
 {
struct bpf_subprog_info *info;
struct nfp_prog *nfp_prog;
@@ -788,8 +788,3 @@ static int nfp_bpf_finalize(struct bpf_verifier_env *env)
 
return 0;
 }
-
-const struct bpf_prog_offload_ops nfp_bpf_analyzer_ops = {
-   .insn_hook  = nfp_verify_insn,
-   .finalize   = nfp_bpf_finalize,
-};
-- 
2.17.1



[PATCH bpf-next 3/9] bpf: call verify_insn from its callback in struct bpf_offload_dev

2018-11-09 Thread Quentin Monnet
We intend to remove the dev_ops in struct bpf_prog_offload, and to only
keep the ops in struct bpf_offload_dev instead, which is accessible from
more locations for passing function pointers.

But dev_ops is used for calling the verify_insn hook. Switch to the ops
in struct bpf_offload_dev instead, reached through the offdev pointer
newly added to struct bpf_prog_offload.

To avoid a table lookup for each eBPF instruction to verify, we remember
the offdev attached to a netdev and modify bpf_offload_find_netdev() so
that the lookup is performed no more than once for a given offload
object.

Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 include/linux/bpf.h  | 1 +
 kernel/bpf/offload.c | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index c0197c37b2b2..672714cd904f 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -273,6 +273,7 @@ struct bpf_prog_offload_ops {
 struct bpf_prog_offload {
struct bpf_prog *prog;
struct net_device   *netdev;
+   struct bpf_offload_dev  *offdev;
void*dev_priv;
struct list_headoffloads;
booldev_state;
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index d513fbf9ca53..2cd3c0d0417b 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -107,6 +107,7 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union 
bpf_attr *attr)
err = -EINVAL;
goto err_unlock;
}
+   offload->offdev = ondev->offdev;
prog->aux->offload = offload;
	list_add_tail(&offload->offloads, &ondev->progs);
dev_put(offload->netdev);
@@ -167,7 +168,8 @@ int bpf_prog_offload_verify_insn(struct bpf_verifier_env 
*env,
	down_read(&bpf_devs_lock);
offload = env->prog->aux->offload;
if (offload)
-   ret = offload->dev_ops->insn_hook(env, insn_idx, prev_insn_idx);
+   ret = offload->offdev->ops->insn_hook(env, insn_idx,
+ prev_insn_idx);
	up_read(&bpf_devs_lock);
 
return ret;
-- 
2.17.1



[PATCH bpf-next 0/9] bpf: pass device ops as callbacks and remove some ndo_bpf subcommands

2018-11-09 Thread Quentin Monnet
For passing device functions for offloaded eBPF programs, there used to
be no place to store the pointer without making the non-offloaded
programs pay a memory price.

As a consequence, three functions were called with ndo_bpf() through
specific commands. Now that we have struct bpf_offload_dev, and since none
of those operations rely on RTNL, we can turn these three commands into
hooks inside the struct bpf_prog_offload_ops, and pass them as part of
bpf_offload_dev_create().

This patch set changes the offload architecture to do so, and brings the
relevant changes to the nfp and netdevsim drivers.

Quentin Monnet (9):
  nfp: bpf: move nfp_bpf_analyzer_ops from verifier.c to offload.c
  bpf: pass a struct with offload callbacks to bpf_offload_dev_create()
  bpf: call verify_insn from its callback in struct bpf_offload_dev
  bpf: call finalize() from its callback in struct bpf_offload_dev
  bpf: call verifier_prep from its callback in struct bpf_offload_dev
  bpf: pass translate() as a callback and remove its ndo_bpf subcommand
  bpf: pass destroy() as a callback and remove its ndo_bpf subcommand
  bpf: pass prog instead of env to bpf_prog_offload_verifier_prep()
  bpf: do not pass netdev to translate() and prepare() offload callbacks

 drivers/net/ethernet/netronome/nfp/bpf/main.c |  2 +-
 drivers/net/ethernet/netronome/nfp/bpf/main.h |  6 +-
 .../net/ethernet/netronome/nfp/bpf/offload.c  | 34 -
 .../net/ethernet/netronome/nfp/bpf/verifier.c | 11 +--
 drivers/net/netdevsim/bpf.c   | 51 +++--
 include/linux/bpf.h   |  8 +-
 include/linux/bpf_verifier.h  |  2 +-
 include/linux/netdevice.h | 12 ---
 kernel/bpf/offload.c  | 75 +++
 kernel/bpf/verifier.c |  2 +-
 10 files changed, 85 insertions(+), 118 deletions(-)

-- 
2.17.1



Re: [PATCH v3 bpf-next 4/4] bpftool: support loading flow dissector

2018-11-08 Thread Quentin Monnet
2018-11-08 10:01 UTC-0800 ~ Stanislav Fomichev 
> On 11/08, Quentin Monnet wrote:
>> Hi Stanislav, thanks for the changes! More comments below.
> Thank you for another round of review!
> 
>> 2018-11-07 21:39 UTC-0800 ~ Stanislav Fomichev 
>>> This commit adds support for loading/attaching/detaching flow
>>> dissector program. The structure of the flow dissector program is
>>> assumed to be the same as in the selftests:
>>>
>>> * flow_dissector section with the main entry point
>>> * a bunch of tail call progs
>>> * a jmp_table map that is populated with the tail call progs
>>>
>>> When `bpftool load` is called with a flow_dissector prog (i.e. when the
>>> first section is flow_dissector of 'type flow_dissector' argument is
>>> passed), we load and pin all the programs/maps. User is responsible to
>>> construct the jump table for the tail calls.
>>>
>>> The last argument of `bpftool attach` is made optional for this use
>>> case.
>>>
>>> Example:
>>> bpftool prog load tools/testing/selftests/bpf/bpf_flow.o \
>>> /sys/fs/bpf/flow type flow_dissector
>>>
>>> bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
>>>  key 0 0 0 0 \
>>>  value pinned /sys/fs/bpf/flow/IP
>>>
>>> bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
>>>  key 1 0 0 0 \
>>>  value pinned /sys/fs/bpf/flow/IPV6
>>>
>>> bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
>>>  key 2 0 0 0 \
>>>  value pinned /sys/fs/bpf/flow/IPV6OP
>>>
>>> bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
>>>  key 3 0 0 0 \
>>>  value pinned /sys/fs/bpf/flow/IPV6FR
>>>
>>> bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
>>>  key 4 0 0 0 \
>>>  value pinned /sys/fs/bpf/flow/MPLS
>>>
>>> bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
>>>  key 5 0 0 0 \
>>>  value pinned /sys/fs/bpf/flow/VLAN
>>>
>>> bpftool prog attach pinned /sys/fs/bpf/flow/flow_dissector flow_dissector
>>>
>>> Tested by using the above lines to load the prog in
>>> the test_flow_dissector.sh selftest.
>>>
>>> Signed-off-by: Stanislav Fomichev 
>>> ---
>>>   .../bpftool/Documentation/bpftool-prog.rst|  36 --
>>>   tools/bpf/bpftool/bash-completion/bpftool |   6 +-
>>>   tools/bpf/bpftool/common.c|  30 ++---
>>>   tools/bpf/bpftool/main.h  |   1 +
>>>   tools/bpf/bpftool/prog.c  | 112 +-
>>>   5 files changed, 126 insertions(+), 59 deletions(-)
>>>
>>> diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst 
>>> b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
>>> index ac4e904b10fb..0374634c3087 100644
>>> --- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst
>>> +++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
>>> @@ -15,7 +15,8 @@ SYNOPSIS
>>> *OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] | { 
>>> **-f** | **--bpffs** } }
>>> *COMMANDS* :=
>>> -   { **show** | **list** | **dump xlated** | **dump jited** | **pin** | 
>>> **load** | **help** }
>>> +   { **show** | **list** | **dump xlated** | **dump jited** | **pin** | 
>>> **load**
>>> +   | **loadall** | **help** }
>>>   MAP COMMANDS
>>>   =
>>> @@ -24,9 +25,9 @@ MAP COMMANDS
>>>   | **bpftool** **prog dump xlated** *PROG* [{**file** *FILE* | **opcodes** 
>>> | **visual**}]
>>>   | **bpftool** **prog dump jited**  *PROG* [{**file** *FILE* | 
>>> **opcodes**}]
>>>   | **bpftool** **prog pin** *PROG* *FILE*
>>> -|  **bpftool** **prog load** *OBJ* *FILE* [**type** *TYPE*] [**map** 
>>> {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
>>> -|   **bpftool** **prog attach** *PROG* *ATTACH_TYPE* *MAP*
>>> -|   **bpftool** **prog detach** *PROG* *ATTACH_TYPE* *MAP*
>>> +|  **bpftool** **prog { load | loadall }** *OBJ* *FILE* [**type** *TYPE*] 
>>> [**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
>>> +|   **bpftool** **prog attach** *PROG* *ATTACH_TYPE* [*MAP*]
>>> +|   **bpftool** **prog detach** *PROG* *ATTACH_TYPE* [*MAP*]
>>>   | **bpftool** **prog help**
>>>   |
>>>   | *MAP* := { **id** *MAP_ID* | **pinn

[PATCH bpf 2/4] tools: bpftool: fix plain output and doc for --bpffs option

2018-11-08 Thread Quentin Monnet
Edit the documentation of the -f|--bpffs option to make it explicit that
it dumps paths of pinned programs when bpftool is used to list the
programs only, so that users do not believe they will see the name of
the newly pinned program with "bpftool prog pin" or "bpftool prog load".

Also fix the plain output: do not add a blank line after each program
block, in order to remain consistent with what bpftool does when the
option is not passed.

Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 tools/bpf/bpftool/Documentation/bpftool-prog.rst | 3 ++-
 tools/bpf/bpftool/prog.c | 3 +--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst 
b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
index ac4e904b10fb..81fb97acfaeb 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
@@ -124,7 +124,8 @@ OPTIONS
  Generate human-readable JSON output. Implies **-j**.
 
-f, --bpffs
- Show file names of pinned programs.
+ When showing BPF programs, show file names of pinned
+ programs.
 
 EXAMPLES
 
diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index b9b84553bec4..763ddfa29045 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -357,10 +357,9 @@ static void print_prog_plain(struct bpf_prog_info *info, 
int fd)
if (!hash_empty(prog_table.table)) {
struct pinned_obj *obj;
 
-   printf("\n");
hash_for_each_possible(prog_table.table, obj, hash, info->id) {
if (obj->id == info->id)
-   printf("\tpinned %s\n", obj->path);
+   printf("\n\tpinned %s", obj->path);
}
}
 
-- 
2.7.4



[PATCH bpf 4/4] tools: bpftool: update references to other man pages in documentation

2018-11-08 Thread Quentin Monnet
Update references to other bpftool man pages at the bottom of each
manual page. Also reference the "bpf(2)" and "bpf-helpers(7)" man pages.

References are sorted by man section number; within section 8, the
"prog" and "map" pages come first, then the other pages follow in
alphabetical order.

Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 tools/bpf/bpftool/Documentation/bpftool-cgroup.rst | 8 +++-
 tools/bpf/bpftool/Documentation/bpftool-map.rst| 8 +++-
 tools/bpf/bpftool/Documentation/bpftool-net.rst| 8 +++-
 tools/bpf/bpftool/Documentation/bpftool-perf.rst   | 8 +++-
 tools/bpf/bpftool/Documentation/bpftool-prog.rst   | 8 +++-
 tools/bpf/bpftool/Documentation/bpftool.rst| 9 +++--
 6 files changed, 42 insertions(+), 7 deletions(-)

diff --git a/tools/bpf/bpftool/Documentation/bpftool-cgroup.rst 
b/tools/bpf/bpftool/Documentation/bpftool-cgroup.rst
index edbe81534c6d..d07ccf8a23f7 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-cgroup.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-cgroup.rst
@@ -137,4 +137,10 @@ EXAMPLES
 
 SEE ALSO
 
-   **bpftool**\ (8), **bpftool-prog**\ (8), **bpftool-map**\ (8)
+   **bpf**\ (2),
+   **bpf-helpers**\ (7),
+   **bpftool**\ (8),
+   **bpftool-prog**\ (8),
+   **bpftool-map**\ (8),
+   **bpftool-net**\ (8),
+   **bpftool-perf**\ (8)
diff --git a/tools/bpf/bpftool/Documentation/bpftool-map.rst 
b/tools/bpf/bpftool/Documentation/bpftool-map.rst
index f55a2daed59b..7bb787cfa971 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-map.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-map.rst
@@ -171,4 +171,10 @@ The following three commands are equivalent:
 
 SEE ALSO
 
-   **bpftool**\ (8), **bpftool-prog**\ (8), **bpftool-cgroup**\ (8)
+   **bpf**\ (2),
+   **bpf-helpers**\ (7),
+   **bpftool**\ (8),
+   **bpftool-prog**\ (8),
+   **bpftool-cgroup**\ (8),
+   **bpftool-net**\ (8),
+   **bpftool-perf**\ (8)
diff --git a/tools/bpf/bpftool/Documentation/bpftool-net.rst 
b/tools/bpf/bpftool/Documentation/bpftool-net.rst
index 408ec30d8872..ed87c9b619ad 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-net.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-net.rst
@@ -136,4 +136,10 @@ EXAMPLES
 
 SEE ALSO
 
-   **bpftool**\ (8), **bpftool-prog**\ (8), **bpftool-map**\ (8)
+   **bpf**\ (2),
+   **bpf-helpers**\ (7),
+   **bpftool**\ (8),
+   **bpftool-prog**\ (8),
+   **bpftool-map**\ (8),
+   **bpftool-cgroup**\ (8),
+   **bpftool-perf**\ (8)
diff --git a/tools/bpf/bpftool/Documentation/bpftool-perf.rst 
b/tools/bpf/bpftool/Documentation/bpftool-perf.rst
index e3eb0eab7641..f4c5e5538bb8 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-perf.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-perf.rst
@@ -78,4 +78,10 @@ EXAMPLES
 
 SEE ALSO
 
-   **bpftool**\ (8), **bpftool-prog**\ (8), **bpftool-map**\ (8)
+   **bpf**\ (2),
+   **bpf-helpers**\ (7),
+   **bpftool**\ (8),
+   **bpftool-prog**\ (8),
+   **bpftool-map**\ (8),
+   **bpftool-cgroup**\ (8),
+   **bpftool-net**\ (8)
diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst 
b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
index 81fb97acfaeb..ecf618807125 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
@@ -207,4 +207,10 @@ EXAMPLES
 
 SEE ALSO
 
-   **bpftool**\ (8), **bpftool-map**\ (8), **bpftool-cgroup**\ (8)
+   **bpf**\ (2),
+   **bpf-helpers**\ (7),
+   **bpftool**\ (8),
+   **bpftool-map**\ (8),
+   **bpftool-cgroup**\ (8),
+   **bpftool-net**\ (8),
+   **bpftool-perf**\ (8)
diff --git a/tools/bpf/bpftool/Documentation/bpftool.rst 
b/tools/bpf/bpftool/Documentation/bpftool.rst
index 04cd4f92ab89..129b7a9c0f9b 100644
--- a/tools/bpf/bpftool/Documentation/bpftool.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool.rst
@@ -63,5 +63,10 @@ OPTIONS
 
 SEE ALSO
 
-   **bpftool-map**\ (8), **bpftool-prog**\ (8), **bpftool-cgroup**\ (8)
-**bpftool-perf**\ (8), **bpftool-net**\ (8)
+   **bpf**\ (2),
+   **bpf-helpers**\ (7),
+   **bpftool-prog**\ (8),
+   **bpftool-map**\ (8),
+   **bpftool-cgroup**\ (8),
+   **bpftool-net**\ (8),
+   **bpftool-perf**\ (8)
-- 
2.7.4



[PATCH bpf 0/4] tools: bpftool: bring several minor fixes to bpftool

2018-11-08 Thread Quentin Monnet
Hi,
This set contains minor fixes for bpftool code and documentation.
Please refer to individual patches for details.

Quentin Monnet (4):
  tools: bpftool: prevent infinite loop in get_fdinfo()
  tools: bpftool: fix plain output and doc for --bpffs option
  tools: bpftool: pass an argument to silence open_obj_pinned()
  tools: bpftool: update references to other man pages in documentation

 tools/bpf/bpftool/Documentation/bpftool-cgroup.rst |  8 +++-
 tools/bpf/bpftool/Documentation/bpftool-map.rst|  8 +++-
 tools/bpf/bpftool/Documentation/bpftool-net.rst|  8 +++-
 tools/bpf/bpftool/Documentation/bpftool-perf.rst   |  8 +++-
 tools/bpf/bpftool/Documentation/bpftool-prog.rst   | 11 +--
 tools/bpf/bpftool/Documentation/bpftool.rst|  9 +++--
 tools/bpf/bpftool/common.c | 17 +
 tools/bpf/bpftool/main.h   |  2 +-
 tools/bpf/bpftool/prog.c   |  3 +--
 9 files changed, 55 insertions(+), 19 deletions(-)

-- 
2.7.4



[PATCH bpf 1/4] tools: bpftool: prevent infinite loop in get_fdinfo()

2018-11-08 Thread Quentin Monnet
Function getline() returns -1 on failure to read a line, thus creating
an infinite loop in get_fdinfo() if the key is not found. Fix it by
calling the function only as long as we get a strictly positive return
value.

Found by copying the code for a key which is not always present...

Fixes: 71bb428fe2c1 ("tools: bpf: add bpftool")
Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 tools/bpf/bpftool/common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index 1149565be4b1..acd839e0e801 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -312,7 +312,7 @@ char *get_fdinfo(int fd, const char *key)
return NULL;
}
 
-   while ((n = getline(&line, &line_n, fdi))) {
+   while ((n = getline(&line, &line_n, fdi)) > 0) {
char *value;
int len;
 
-- 
2.7.4



[PATCH bpf 3/4] tools: bpftool: pass an argument to silence open_obj_pinned()

2018-11-08 Thread Quentin Monnet
Function open_obj_pinned() prints error messages when it fails to open a
link in the BPF virtual file system. However, on some occasions it is
not desirable to print an error, for example when we parse all links
under the bpffs root, and the error is due to some paths actually being
symbolic links.

Example output:

# ls -l /sys/fs/bpf/
lrwxrwxrwx 1 root root 0 Oct 18 19:00 ip -> /sys/fs/bpf/tc/
drwx------ 3 root root 0 Oct 18 19:00 tc
lrwxrwxrwx 1 root root 0 Oct 18 19:00 xdp -> /sys/fs/bpf/tc/

# bpftool --bpffs prog show
Error: bpf obj get (/sys/fs/bpf): Permission denied
Error: bpf obj get (/sys/fs/bpf): Permission denied

# strace -e bpf bpftool --bpffs prog show
bpf(BPF_OBJ_GET, {pathname="/sys/fs/bpf/ip", bpf_fd=0}, 72) = -1 EACCES 
(Permission denied)
Error: bpf obj get (/sys/fs/bpf): Permission denied
bpf(BPF_OBJ_GET, {pathname="/sys/fs/bpf/xdp", bpf_fd=0}, 72) = -1 EACCES 
(Permission denied)
Error: bpf obj get (/sys/fs/bpf): Permission denied
...

To fix it, pass a bool as a second argument to the function, and prevent
it from printing an error when the argument is set to true.

Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 tools/bpf/bpftool/common.c | 15 ---
 tools/bpf/bpftool/main.h   |  2 +-
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index acd839e0e801..7b2388bec4a9 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -138,16 +138,17 @@ static int mnt_bpffs(const char *target, char *buff, 
size_t bufflen)
return 0;
 }
 
-int open_obj_pinned(char *path)
+int open_obj_pinned(char *path, bool quiet)
 {
int fd;
 
fd = bpf_obj_get(path);
if (fd < 0) {
-   p_err("bpf obj get (%s): %s", path,
- errno == EACCES && !is_bpffs(dirname(path)) ?
-   "directory not in bpf file system (bpffs)" :
-   strerror(errno));
+   if (!quiet)
+   p_err("bpf obj get (%s): %s", path,
+ errno == EACCES && !is_bpffs(dirname(path)) ?
+   "directory not in bpf file system (bpffs)" :
+   strerror(errno));
return -1;
}
 
@@ -159,7 +160,7 @@ int open_obj_pinned_any(char *path, enum bpf_obj_type 
exp_type)
enum bpf_obj_type type;
int fd;
 
-   fd = open_obj_pinned(path);
+   fd = open_obj_pinned(path, false);
if (fd < 0)
return -1;
 
@@ -392,7 +393,7 @@ int build_pinned_obj_table(struct pinned_obj_table *tab,
while ((ftse = fts_read(fts))) {
if (!(ftse->fts_info & FTS_F))
continue;
-   fd = open_obj_pinned(ftse->fts_path);
+   fd = open_obj_pinned(ftse->fts_path, true);
if (fd < 0)
continue;
 
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index 14857c273bf6..6d33baa51273 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -129,7 +129,7 @@ int cmd_select(const struct cmd *cmds, int argc, char 
**argv,
 int get_fd_type(int fd);
 const char *get_fd_type_name(enum bpf_obj_type type);
 char *get_fdinfo(int fd, const char *key);
-int open_obj_pinned(char *path);
+int open_obj_pinned(char *path, bool quiet);
 int open_obj_pinned_any(char *path, enum bpf_obj_type exp_type);
 int do_pin_any(int argc, char **argv, int (*get_fd_by_id)(__u32));
 int do_pin_fd(int fd, const char *name);
-- 
2.7.4



Re: [PATCH v3 bpf-next 4/4] bpftool: support loading flow dissector

2018-11-08 Thread Quentin Monnet

Hi Stanislav, thanks for the changes! More comments below.

2018-11-07 21:39 UTC-0800 ~ Stanislav Fomichev 

This commit adds support for loading/attaching/detaching flow
dissector program. The structure of the flow dissector program is
assumed to be the same as in the selftests:

* flow_dissector section with the main entry point
* a bunch of tail call progs
* a jmp_table map that is populated with the tail call progs

When `bpftool load` is called with a flow_dissector prog (i.e. when the
first section is flow_dissector of 'type flow_dissector' argument is
passed), we load and pin all the programs/maps. User is responsible to
construct the jump table for the tail calls.

The last argument of `bpftool attach` is made optional for this use
case.

Example:
bpftool prog load tools/testing/selftests/bpf/bpf_flow.o \
/sys/fs/bpf/flow type flow_dissector

bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
 key 0 0 0 0 \
 value pinned /sys/fs/bpf/flow/IP

bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
 key 1 0 0 0 \
 value pinned /sys/fs/bpf/flow/IPV6

bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
 key 2 0 0 0 \
 value pinned /sys/fs/bpf/flow/IPV6OP

bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
 key 3 0 0 0 \
 value pinned /sys/fs/bpf/flow/IPV6FR

bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
 key 4 0 0 0 \
 value pinned /sys/fs/bpf/flow/MPLS

bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
 key 5 0 0 0 \
 value pinned /sys/fs/bpf/flow/VLAN

bpftool prog attach pinned /sys/fs/bpf/flow/flow_dissector flow_dissector

Tested by using the above lines to load the prog in
the test_flow_dissector.sh selftest.

Signed-off-by: Stanislav Fomichev 
---
  .../bpftool/Documentation/bpftool-prog.rst|  36 --
  tools/bpf/bpftool/bash-completion/bpftool |   6 +-
  tools/bpf/bpftool/common.c|  30 ++---
  tools/bpf/bpftool/main.h  |   1 +
  tools/bpf/bpftool/prog.c  | 112 +-
  5 files changed, 126 insertions(+), 59 deletions(-)

diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst 
b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
index ac4e904b10fb..0374634c3087 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
@@ -15,7 +15,8 @@ SYNOPSIS
*OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] | { 
**-f** | **--bpffs** } }
  
  	*COMMANDS* :=

-   { **show** | **list** | **dump xlated** | **dump jited** | **pin** | 
**load** | **help** }
+   { **show** | **list** | **dump xlated** | **dump jited** | **pin** | 
**load**
+   | **loadall** | **help** }
  
  MAP COMMANDS

  =
@@ -24,9 +25,9 @@ MAP COMMANDS
  | **bpftool** **prog dump xlated** *PROG* [{**file** *FILE* | **opcodes** 
| **visual**}]
  | **bpftool** **prog dump jited**  *PROG* [{**file** *FILE* | 
**opcodes**}]
  | **bpftool** **prog pin** *PROG* *FILE*
-|  **bpftool** **prog load** *OBJ* *FILE* [**type** *TYPE*] [**map** 
{**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
-|   **bpftool** **prog attach** *PROG* *ATTACH_TYPE* *MAP*
-|   **bpftool** **prog detach** *PROG* *ATTACH_TYPE* *MAP*
+|  **bpftool** **prog { load | loadall }** *OBJ* *FILE* [**type** *TYPE*] 
[**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
+|   **bpftool** **prog attach** *PROG* *ATTACH_TYPE* [*MAP*]
+|   **bpftool** **prog detach** *PROG* *ATTACH_TYPE* [*MAP*]
  | **bpftool** **prog help**
  |
  | *MAP* := { **id** *MAP_ID* | **pinned** *FILE* }
@@ -39,7 +40,9 @@ MAP COMMANDS
  | **cgroup/bind4** | **cgroup/bind6** | **cgroup/post_bind4** | 
**cgroup/post_bind6** |
  | **cgroup/connect4** | **cgroup/connect6** | **cgroup/sendmsg4** 
| **cgroup/sendmsg6**
  | }
-|   *ATTACH_TYPE* := { **msg_verdict** | **skb_verdict** | **skb_parse** }
+|   *ATTACH_TYPE* := {
+|  **msg_verdict** | **skb_verdict** | **skb_parse** | 
**flow_dissector**
+|  }
  
  
  DESCRIPTION

@@ -79,8 +82,11 @@ DESCRIPTION
  contain a dot character ('.'), which is reserved for future
  extensions of *bpffs*.
  
-	**bpftool prog load** *OBJ* *FILE* [**type** *TYPE*] [**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]

+   **bpftool prog { load | loadall }** *OBJ* *FILE* [**type** *TYPE*] 
[**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
  Load bpf program from binary *OBJ* and pin as *FILE*.
+ **bpftool prog load** will pin only the first bpf program
+ from the *OBJ*, **bpftool prog loadall** will pin all maps
+ and programs from the *OBJ*.


This could be improved regarding maps: with "bpftool prog load" I think 
we also load 

Re: [PATCH bpf-next 2/2] bpftool: support loading flow dissector

2018-11-07 Thread Quentin Monnet

2018-11-07 12:32 UTC-0800 ~ Jakub Kicinski 

On Wed, 7 Nov 2018 20:08:53 +, Quentin Monnet wrote:

+   err = bpf_obj_pin(bpf_program__fd(prog), pinfile);
+   if (err) {
+   p_err("failed to pin program %s",
+ bpf_program__title(prog, false));
+   goto err_close_obj;
+   }


I don't have the same opinion as Jakub for pinning :). I was hoping we
could also load additional programs (for tail calls) for
non-flow_dissector programs. Could this be an occasion to update the
code in that direction?


Do you mean having the bpftool construct an array for tail calling
automatically when loading an object?  Or do a "mass pin" of all
programs in an object file?

I'm not convinced about this strategy of auto assembling a tail call
array by assuming that a flow dissector object carries programs for
protocols in order (apart from the main program which doesn't have to
be first, for some reason).


Not constructing the prog array, I don't think this should be the role 
of bpftool either. Much more a "mass pin", so that you have a link to 
each program loaded from the object file and can later add them to a 
prog array map with subsequent calls to bpftool.


Re: [PATCH bpf-next 2/2] bpftool: support loading flow dissector

2018-11-07 Thread Quentin Monnet
Hi Stanislav,

2018-11-07 11:35 UTC-0800 ~ Stanislav Fomichev 
> This commit adds support for loading/attaching/detaching flow
> dissector program. The structure of the flow dissector program is
> assumed to be the same as in the selftests:
> 
> * flow_dissector section with the main entry point
> * a bunch of tail call progs
> * a jmp_table map that is populated with the tail call progs
> 
> When `bpftool load` is called with a flow_dissector prog (i.e. when the
> first section is flow_dissector of 'type flow_dissector' argument is
> passed), we load and pin all the programs and build the jump table.
> 
> The last argument of `bpftool attach` is made optional for this use
> case.
> 
> Example:
> bpftool prog load tools/testing/selftests/bpf/bpf_flow.o \
>   /sys/fs/bpf/flow type flow_dissector
> bpftool prog attach pinned /sys/fs/bpf/flow/flow_dissector/0 flow_dissector
> 
> Tested by using the above two lines to load the prog in
> the test_flow_dissector.sh selftest.
> 
> Signed-off-by: Stanislav Fomichev 
> ---
>  .../bpftool/Documentation/bpftool-prog.rst|  16 ++-
>  tools/bpf/bpftool/common.c|  32 +++--
>  tools/bpf/bpftool/main.h  |   1 +
>  tools/bpf/bpftool/prog.c  | 135 +++---
>  4 files changed, 141 insertions(+), 43 deletions(-)
> 
> diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst 
> b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
> index ac4e904b10fb..3caa9153435b 100644
> --- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst
> +++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
> @@ -25,8 +25,8 @@ MAP COMMANDS
>  |**bpftool** **prog dump jited**  *PROG* [{**file** *FILE* | 
> **opcodes**}]
>  |**bpftool** **prog pin** *PROG* *FILE*
>  |**bpftool** **prog load** *OBJ* *FILE* [**type** *TYPE*] [**map** 
> {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
> -|   **bpftool** **prog attach** *PROG* *ATTACH_TYPE* *MAP*
> -|   **bpftool** **prog detach** *PROG* *ATTACH_TYPE* *MAP*
> +|   **bpftool** **prog attach** *PROG* *ATTACH_TYPE* [*MAP*]
> +|   **bpftool** **prog detach** *PROG* *ATTACH_TYPE* [*MAP*]
>  |**bpftool** **prog help**
>  |
>  |*MAP* := { **id** *MAP_ID* | **pinned** *FILE* }
> @@ -39,7 +39,9 @@ MAP COMMANDS
>  |**cgroup/bind4** | **cgroup/bind6** | **cgroup/post_bind4** | 
> **cgroup/post_bind6** |
>  |**cgroup/connect4** | **cgroup/connect6** | **cgroup/sendmsg4** 
> | **cgroup/sendmsg6**
>  |}
> -|   *ATTACH_TYPE* := { **msg_verdict** | **skb_verdict** | **skb_parse** 
> }
> +|   *ATTACH_TYPE* := {
> +||   **msg_verdict** | **skb_verdict** | **skb_parse** | 
> **flow_dissector**

^
Nitpick: Could you please remove the above pipe?

> +|}
>  
>  
>  DESCRIPTION
> @@ -97,13 +99,13 @@ DESCRIPTION
> contain a dot character ('.'), which is reserved for future
> extensions of *bpffs*.
>  
> -**bpftool prog attach** *PROG* *ATTACH_TYPE* *MAP*
> +**bpftool prog attach** *PROG* *ATTACH_TYPE* [*MAP*]
>Attach bpf program *PROG* (with type specified by 
> *ATTACH_TYPE*)
> -  to the map *MAP*.
> +  to the optional map *MAP*.
>  
> -**bpftool prog detach** *PROG* *ATTACH_TYPE* *MAP*
> +**bpftool prog detach** *PROG* *ATTACH_TYPE* [*MAP*]
>Detach bpf program *PROG* (with type specified by 
> *ATTACH_TYPE*)
> -  from the map *MAP*.
> +  from the optional map *MAP*.
>  
>   **bpftool prog help**
> Print short help message.
> diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
> index 25af85304ebe..963881142dfb 100644
> --- a/tools/bpf/bpftool/common.c
> +++ b/tools/bpf/bpftool/common.c
> @@ -169,34 +169,24 @@ int open_obj_pinned_any(char *path, enum bpf_obj_type 
> exp_type)
>   return fd;
>  }
>  
> -int do_pin_fd(int fd, const char *name)
> +int mount_bpffs_for_pin(const char *name)
>  {
>   char err_str[ERR_MAX_LEN];
>   char *file;
>   char *dir;
>   int err = 0;
>  
> - err = bpf_obj_pin(fd, name);
> - if (!err)
> - goto out;
> -
>   file = malloc(strlen(name) + 1);
>   strcpy(file, name);
>   dir = dirname(file);
>  
> - if (errno != EPERM || is_bpffs(dir)) {
> - p_err("can't pin the object (%s): %s", name, strerror(errno));
> + if (is_bpffs(dir)) {
> + /* nothing to do if already mounted */
>   goto out_free;
>   }
>  
> - /* Attempt to mount bpffs, then retry pinning. */
>   err = mnt_bpffs(dir, err_str, ERR_MAX_LEN);
> - if (!err) {
> - err = bpf_obj_pin(fd, name);
> - if (err)
> - p_err("can't pin the object (%s): %s", name,
> -   strerror(errno));
> - } else {
> + if (err) {
>   

Re: [PATCH bpf-next] tools: bpftool: adjust rlimit RLIMIT_MEMLOCK when loading programs, maps

2018-11-07 Thread Quentin Monnet
2018-11-07 16:59 UTC+ ~ Martin Lau 
> On Wed, Nov 07, 2018 at 12:29:30PM +0000, Quentin Monnet wrote:
>> The limit for memory locked in the kernel by a process is usually set to
>> 64 bytes by default. This can be an issue when creating large BPF maps
> hmm... 64 _k_bytes?

Ouch. That's true. Thanks! I can respin to fix the commit log if needed.

>> and/or loading many programs. A workaround is to raise this limit for
>> the current process before trying to create a new BPF map. Changing the
>> hard limit requires the CAP_SYS_RESOURCE and can usually only be done by
>> root user (for non-root users, a call to setrlimit fails (and sets
>> errno) and the program simply goes on with its rlimit unchanged).
>>
>> There is no API to get the current amount of memory locked for a user,
>> therefore we cannot raise the limit only when required. One solution,
>> used by bcc, is to try to create the map, and on getting a EPERM error,
>> raising the limit to infinity before giving another try. Another
>> approach, used in iproute2, is to raise the limit in all cases, before
>> trying to create the map.
>>
>> Here we do the same as in iproute2: the rlimit is raised to infinity
>> before trying to load programs or to create maps with bpftool.
>>
>> Signed-off-by: Quentin Monnet 
>> Reviewed-by: Jakub Kicinski 
> Patch LGTM.
> 
> Acked-by: Martin KaFai Lau 
> 

Thanks for this as well.
Quentin


[PATCH bpf-next] tools: bpftool: adjust rlimit RLIMIT_MEMLOCK when loading programs, maps

2018-11-07 Thread Quentin Monnet
The limit for memory locked in the kernel by a process is usually set to
64 bytes by default. This can be an issue when creating large BPF maps
and/or loading many programs. A workaround is to raise this limit for
the current process before trying to create a new BPF map. Changing the
hard limit requires the CAP_SYS_RESOURCE and can usually only be done by
root user (for non-root users, a call to setrlimit fails (and sets
errno) and the program simply goes on with its rlimit unchanged).

There is no API to get the current amount of memory locked for a user,
therefore we cannot raise the limit only when required. One solution,
used by bcc, is to try to create the map, and on getting a EPERM error,
raising the limit to infinity before giving another try. Another
approach, used in iproute2, is to raise the limit in all cases, before
trying to create the map.

Here we do the same as in iproute2: the rlimit is raised to infinity
before trying to load programs or to create maps with bpftool.

Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 tools/bpf/bpftool/common.c | 8 
 tools/bpf/bpftool/main.h   | 2 ++
 tools/bpf/bpftool/map.c| 2 ++
 tools/bpf/bpftool/prog.c   | 2 ++
 4 files changed, 14 insertions(+)

diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index 25af85304ebe..1149565be4b1 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -46,6 +46,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -99,6 +100,13 @@ static bool is_bpffs(char *path)
return (unsigned long)st_fs.f_type == BPF_FS_MAGIC;
 }
 
+void set_max_rlimit(void)
+{
+   struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
+
+   setrlimit(RLIMIT_MEMLOCK, &rinf);
+}
+
 static int mnt_bpffs(const char *target, char *buff, size_t bufflen)
 {
bool bind_done = false;
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index 28322ace2856..14857c273bf6 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -100,6 +100,8 @@ bool is_prefix(const char *pfx, const char *str);
 void fprint_hex(FILE *f, void *arg, unsigned int n, const char *sep);
 void usage(void) __noreturn;
 
+void set_max_rlimit(void);
+
 struct pinned_obj_table {
DECLARE_HASHTABLE(table, 16);
 };
diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index 7bf38f0e152e..101b8a881225 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -1140,6 +1140,8 @@ static int do_create(int argc, char **argv)
return -1;
}
 
+   set_max_rlimit();
+
	fd = bpf_create_map_xattr(&attr);
if (fd < 0) {
p_err("map create failed: %s", strerror(errno));
diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index 5302ee282409..b9b84553bec4 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -995,6 +995,8 @@ static int do_load(int argc, char **argv)
goto err_close_obj;
}
 
+   set_max_rlimit();
+
err = bpf_object__load(obj);
if (err) {
p_err("failed to load object file");
-- 
2.7.4



[PATCH bpf-next] selftests/bpf: enable (uncomment) all tests in test_libbpf.sh

2018-11-07 Thread Quentin Monnet
libbpf is now able to load successfully test_l4lb_noinline.o and
samples/bpf/tracex3_kern.o.

For the test_l4lb_noinline, uncomment related tests from test_libbpf.c
and remove the associated "TODO".

For tracex3_kern.o, instead of loading a program from samples/bpf/ that
might not have been compiled at this stage, try loading a program from
BPF selftests. Since this test case is about loading a program compiled
without the "-target bpf" flag, change the Makefile to compile one
program accordingly (instead of passing the flag for compiling all
programs).

Regarding test_xdp_noinline.o: in its current shape the program fails to
load because it provides no version section, but the loader needs one.
The test was added to make sure that libbpf could load XDP programs even
if they do not provide a version number in a dedicated section. But
libbpf is already capable of doing that: in our case loading fails
because the loader does not know that this is an XDP program (it does
not need to, since it does not attach the program). So trying to load
test_xdp_noinline.o does not bring much here: just delete this subtest.

For the record, the error message obtained with tracex3_kern.o was
fixed by commit e3d91b0ca523 ("tools/libbpf: handle issues with bpf ELF
objects containing .eh_frames")

I have not been able to reproduce the "libbpf: incorrect bpf_call
opcode" error for test_l4lb_noinline.o, even with the version of libbpf
present at the time when test_libbpf.sh and test_libbpf_open.c were
created.

RFC -> v1:
- Compile test_xdp without the "-target bpf" flag, and try to load it
  instead of ../../samples/bpf/tracex3_kern.o.
- Delete test_xdp_noinline.o subtest.

Cc: Jesper Dangaard Brouer 
Signed-off-by: Quentin Monnet 
Acked-by: Jakub Kicinski 
---
 tools/testing/selftests/bpf/Makefile   | 10 ++
 tools/testing/selftests/bpf/test_libbpf.sh | 14 --
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/tools/testing/selftests/bpf/Makefile 
b/tools/testing/selftests/bpf/Makefile
index e39dfb4e7970..ecd79b7fb107 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -135,6 +135,16 @@ endif
 endif
 endif
 
+# Have one program compiled without "-target bpf" to test whether libbpf loads
+# it successfully
+$(OUTPUT)/test_xdp.o: test_xdp.c
+   $(CLANG) $(CLANG_FLAGS) \
+   -O2 -emit-llvm -c $< -o - | \
+   $(LLC) -march=bpf -mcpu=$(CPU) $(LLC_FLAGS) -filetype=obj -o $@
+ifeq ($(DWARF2BTF),y)
+   $(BTF_PAHOLE) -J $@
+endif
+
 $(OUTPUT)/%.o: %.c
$(CLANG) $(CLANG_FLAGS) \
 -O2 -target bpf -emit-llvm -c $< -o - |  \
diff --git a/tools/testing/selftests/bpf/test_libbpf.sh 
b/tools/testing/selftests/bpf/test_libbpf.sh
index 156d89f1edcc..2989b2e2d856 100755
--- a/tools/testing/selftests/bpf/test_libbpf.sh
+++ b/tools/testing/selftests/bpf/test_libbpf.sh
@@ -33,17 +33,11 @@ trap exit_handler 0 2 3 6 9
 
 libbpf_open_file test_l4lb.o
 
-# TODO: fix libbpf to load noinline functions
-# [warning] libbpf: incorrect bpf_call opcode
-#libbpf_open_file test_l4lb_noinline.o
+# Load a program with BPF-to-BPF calls
+libbpf_open_file test_l4lb_noinline.o
 
-# TODO: fix test_xdp_meta.c to load with libbpf
-# [warning] libbpf: test_xdp_meta.o doesn't provide kernel version
-#libbpf_open_file test_xdp_meta.o
-
-# TODO: fix libbpf to handle .eh_frame
-# [warning] libbpf: relocation failed: no section(10)
-#libbpf_open_file ../../../../samples/bpf/tracex3_kern.o
+# Load a program compiled without the "-target bpf" flag
+libbpf_open_file test_xdp.o
 
 # Success
 exit 0
-- 
2.7.4



Re: [RFC bpf-next] libbpf: increase rlimit before trying to create BPF maps

2018-11-02 Thread Quentin Monnet
2018-11-02 10:08 UTC+0100 ~ Daniel Borkmann 
> On 11/01/2018 06:18 PM, Quentin Monnet wrote:
>> 2018-10-30 15:23 UTC+ ~ Quentin Monnet 
>>> The limit for memory locked in the kernel by a process is usually set to
>>> 64 bytes by default. This can be an issue when creating large BPF maps.
>>> A workaround is to raise this limit for the current process before
>>> trying to create a new BPF map. Changing the hard limit requires the
>>> CAP_SYS_RESOURCE and can usually only be done by root user (but then
>>> only root can create BPF maps).
>>
>> Sorry, the parenthesis is not correct: non-root users can in fact create
>> BPF maps as well. If a non-root user calls the function to create a map,
>> setrlimit() will fail silently (but set errno), and the program will
>> simply go on with its rlimit unchanged.
>>
>>> As far as I know there is not API to get the current amount of memory
>>> locked for a user, therefore we cannot raise the limit only when
>>> required. One solution, used by bcc, is to try to create the map, and on
>>> getting a EPERM error, raising the limit to infinity before giving
>>> another try. Another approach, used in iproute, is to raise the limit in
>>> all cases, before trying to create the map.
>>>
>>> Here we do the same as in iproute2: the rlimit is raised to infinity
>>> before trying to load the map.
>>>
>>> I send this patch as a RFC to see if people would prefer the bcc
>>> approach instead, or the rlimit change to be in bpftool rather than in
>>> libbpf.
> 
> I'd avoid doing something like this in a generic library; it's basically an
> ugly hack for the kind of accounting we're doing and only shows that while
> this was "good enough" to start off with in the early days, we should be
> doing something better today if every application raises it to inf anyway
> then it's broken. :) It just shows that this missed its purpose. Similarly
> to the jit_limit discussion on rlimit, perhaps we should be considering
> switching to something else entirely from kernel side. Could be something
> like memcg but this definitely needs some more evaluation first.

Changing the way limitations are enforced sounds like a cleaner
long-term approach indeed.

> (Meanwhile
> I'd not change the lib but callers instead and once we have something better
> in place we remove this type of "raising to inf" from the tree ...)

Understood, for the time being I'll repost a patch adding the
modification to bpftool once bpf-next is open.

Thanks Daniel!
Quentin


Re: [RFC bpf-next] libbpf: increase rlimit before trying to create BPF maps

2018-11-01 Thread Quentin Monnet
2018-10-30 15:23 UTC+ ~ Quentin Monnet 
> The limit for memory locked in the kernel by a process is usually set to
> 64 bytes by default. This can be an issue when creating large BPF maps.
> A workaround is to raise this limit for the current process before
> trying to create a new BPF map. Changing the hard limit requires the
> CAP_SYS_RESOURCE and can usually only be done by root user (but then
> only root can create BPF maps).

Sorry, the parenthesis is not correct: non-root users can in fact create
BPF maps as well. If a non-root user calls the function to create a map,
setrlimit() will fail silently (but set errno), and the program will
simply go on with its rlimit unchanged.

> 
> As far as I know there is not API to get the current amount of memory
> locked for a user, therefore we cannot raise the limit only when
> required. One solution, used by bcc, is to try to create the map, and on
> getting a EPERM error, raising the limit to infinity before giving
> another try. Another approach, used in iproute, is to raise the limit in
> all cases, before trying to create the map.
> 
> Here we do the same as in iproute2: the rlimit is raised to infinity
> before trying to load the map.
> 
> I send this patch as a RFC to see if people would prefer the bcc
> approach instead, or the rlimit change to be in bpftool rather than in
> libbpf.
> 
> Signed-off-by: Quentin Monnet 
> ---
>  tools/lib/bpf/bpf.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> index 03f9bcc4ef50..456a5a7b112c 100644
> --- a/tools/lib/bpf/bpf.c
> +++ b/tools/lib/bpf/bpf.c
> @@ -26,6 +26,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include "bpf.h"
>  #include "libbpf.h"
>  #include 
> @@ -68,8 +70,11 @@ static inline int sys_bpf(enum bpf_cmd cmd, union bpf_attr 
> *attr,
>  int bpf_create_map_xattr(const struct bpf_create_map_attr *create_attr)
>  {
>   __u32 name_len = create_attr->name ? strlen(create_attr->name) : 0;
> + struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
>   union bpf_attr attr;
>  
> + setrlimit(RLIMIT_MEMLOCK, &rinf);
> +
>   memset(&attr, '\0', sizeof(attr));
>  
>   attr.map_type = create_attr->map_type;
> 



[RFC bpf-next] libbpf: increase rlimit before trying to create BPF maps

2018-10-30 Thread Quentin Monnet
The limit for memory locked in the kernel by a process is usually set to
64 bytes by default. This can be an issue when creating large BPF maps.
A workaround is to raise this limit for the current process before
trying to create a new BPF map. Changing the hard limit requires the
CAP_SYS_RESOURCE and can usually only be done by root user (but then
only root can create BPF maps).

As far as I know there is not API to get the current amount of memory
locked for a user, therefore we cannot raise the limit only when
required. One solution, used by bcc, is to try to create the map, and on
getting a EPERM error, raising the limit to infinity before giving
another try. Another approach, used in iproute, is to raise the limit in
all cases, before trying to create the map.

Here we do the same as in iproute2: the rlimit is raised to infinity
before trying to load the map.

I send this patch as a RFC to see if people would prefer the bcc
approach instead, or the rlimit change to be in bpftool rather than in
libbpf.

Signed-off-by: Quentin Monnet 
---
 tools/lib/bpf/bpf.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 03f9bcc4ef50..456a5a7b112c 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -26,6 +26,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include "bpf.h"
 #include "libbpf.h"
 #include 
@@ -68,8 +70,11 @@ static inline int sys_bpf(enum bpf_cmd cmd, union bpf_attr 
*attr,
 int bpf_create_map_xattr(const struct bpf_create_map_attr *create_attr)
 {
__u32 name_len = create_attr->name ? strlen(create_attr->name) : 0;
+   struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
union bpf_attr attr;
 
+   setrlimit(RLIMIT_MEMLOCK, &rinf);
+
	memset(&attr, '\0', sizeof(attr));
 
attr.map_type = create_attr->map_type;
-- 
2.7.4



Re: [PATCH bpf-next] selftests/bpf: enable (uncomment) all tests in test_libbpf.sh

2018-10-22 Thread Quentin Monnet
2018-10-21 23:04 UTC+0200 ~ Jesper Dangaard Brouer 
> On Sun, 21 Oct 2018 16:37:08 +0100
> Quentin Monnet  wrote:
> 
>> 2018-10-21 11:57 UTC+0200 ~ Jesper Dangaard Brouer 
>>> On Sat, 20 Oct 2018 23:00:24 +0100
>>> Quentin Monnet  wrote:
>>>   
>>
>> [...]
>>
>>>> --- a/tools/testing/selftests/bpf/test_libbpf.sh
>>>> +++ b/tools/testing/selftests/bpf/test_libbpf.sh
>>>> @@ -33,17 +33,11 @@ trap exit_handler 0 2 3 6 9
>>>>   
>>>>   libbpf_open_file test_l4lb.o
>>>>   
>>>> -# TODO: fix libbpf to load noinline functions
>>>> -# [warning] libbpf: incorrect bpf_call opcode
>>>> -#libbpf_open_file test_l4lb_noinline.o
>>>> +libbpf_open_file test_l4lb_noinline.o
>>>>   
>>>> -# TODO: fix test_xdp_meta.c to load with libbpf
>>>> -# [warning] libbpf: test_xdp_meta.o doesn't provide kernel version
>>>> -#libbpf_open_file test_xdp_meta.o
>>>> +libbpf_open_file test_xdp_meta.o
>>>>   
>>>> -# TODO: fix libbpf to handle .eh_frame
>>>> -# [warning] libbpf: relocation failed: no section(10)
>>>> -#libbpf_open_file ../../../../samples/bpf/tracex3_kern.o
>>>> +libbpf_open_file ../../../../samples/bpf/tracex3_kern.o  
>>>
>>> I don't like the ../../../../samples/bpf/ reference (even-through I
>>> added this TODO), as the kselftests AFAIK support installing the
>>> selftests and then this tests will fail.
>>> Maybe we can find another example kern.o file?
>>> (which isn't compiled with -target bpf)  
>>
>> Hi Jesper, yeah maybe making the test rely on something from samples/bpf
>> instead of just the selftests/bpf directory is not a good idea. But
>> there is no program compiled without the "-target-bpf" in that
>> directory. What we could do is explicitly compile one without the flag
>> in the Makefile, as in the patch below, but I am not sure that doing so
>> is acceptable?
> 
> I think it makes sense to have a test program compiled without the
> "-target-bpf", as that will happen for users.  And I guess we can add
> some more specific test that are related to "-target-bpf".

Alright, I can repost my second version that takes a test out of the
default target for building BPF programs, after the merge window.

>> Or should tests for libbpf have a directory of their own,
>> with another Makefile?
> 
> Hmm, I'm not sure about that idea.
> 
> I did plan by naming the test "libbpf_open_file", what we add more
> libbpf_ prefixed tests to the test_libbpf.sh script, which should
> cover more aspects of the _base_ libbpf functionality.
> 
>> Another question regarding the test with test_xdp_meta.o: does the fix I
>> suggested (setting a version in the .C file) makes sense, or did you
>> leave this test for testing someday that libbpf would be able to open
>> even programs that do not set a version (in which case this is still not
>> the case if program type is not provided, and in fact my fix ruins
>> everything? :s).
> 
> Well, yes.  I was hinting if we should relax the version requirement
> for e.g. XDP BPF progs.

This is already the case. What happens for this test is that we never
tell libbpf that this program is XDP, we just ask it to open the ELF
file and the whole time libbpf treats it as a program of type
BPF_PROG_TYPE_UNSPEC. So we can fix the BPF source (by adding a version)
or we can fix test_libbpf_open.c (to tell libbpf this is XDP), but I
don't believe there is anything to add to libbpf in that regard. I think
we could simply remove the test on test_xdp_meta.o from test_libbpf.sh,
actually. What is your opinion?

Thanks,
Quentin


Re: [PATCH bpf-next] selftests/bpf: enable (uncomment) all tests in test_libbpf.sh

2018-10-21 Thread Quentin Monnet
2018-10-21 11:57 UTC+0200 ~ Jesper Dangaard Brouer 
> On Sat, 20 Oct 2018 23:00:24 +0100
> Quentin Monnet  wrote:
> 

[...]

>> --- a/tools/testing/selftests/bpf/test_libbpf.sh
>> +++ b/tools/testing/selftests/bpf/test_libbpf.sh
>> @@ -33,17 +33,11 @@ trap exit_handler 0 2 3 6 9
>>   
>>   libbpf_open_file test_l4lb.o
>>   
>> -# TODO: fix libbpf to load noinline functions
>> -# [warning] libbpf: incorrect bpf_call opcode
>> -#libbpf_open_file test_l4lb_noinline.o
>> +libbpf_open_file test_l4lb_noinline.o
>>   
>> -# TODO: fix test_xdp_meta.c to load with libbpf
>> -# [warning] libbpf: test_xdp_meta.o doesn't provide kernel version
>> -#libbpf_open_file test_xdp_meta.o
>> +libbpf_open_file test_xdp_meta.o
>>   
>> -# TODO: fix libbpf to handle .eh_frame
>> -# [warning] libbpf: relocation failed: no section(10)
>> -#libbpf_open_file ../../../../samples/bpf/tracex3_kern.o
>> +libbpf_open_file ../../../../samples/bpf/tracex3_kern.o
> 
> I don't like the ../../../../samples/bpf/ reference (even-through I
> added this TODO), as the kselftests AFAIK support installing the
> selftests and then this tests will fail.
> Maybe we can find another example kern.o file?
> (which isn't compiled with -target bpf)

Hi Jesper, yeah maybe making the test rely on something from samples/bpf
instead of just the selftests/bpf directory is not a good idea. But
there is no program compiled without the "-target bpf" flag in that
directory. What we could do is explicitly compile one without the flag
in the Makefile, as in the patch below, but I am not sure that doing so
is acceptable? Or should tests for libbpf have a directory of their own,
with another Makefile?

Another question regarding the test with test_xdp_meta.o: does the fix I
suggested (setting a version in the .c file) make sense, or did you
leave this test for testing someday that libbpf would be able to open
even programs that do not set a version (in which case this is still not
the case if program type is not provided, and in fact my fix ruins
everything? :s).

Thanks,
Quentin

---
 tools/testing/selftests/bpf/Makefile| 10 ++
 tools/testing/selftests/bpf/test_libbpf.sh  | 14 +-
 tools/testing/selftests/bpf/test_xdp_meta.c |  2 ++
 3 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/bpf/Makefile 
b/tools/testing/selftests/bpf/Makefile
index e39dfb4e7970..ecd79b7fb107 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -135,6 +135,16 @@ endif
 endif
 endif
 
+# Have one program compiled without "-target bpf" to test whether libbpf loads
+# it successfully
+$(OUTPUT)/test_xdp.o: test_xdp.c
+   $(CLANG) $(CLANG_FLAGS) \
+   -O2 -emit-llvm -c $< -o - | \
+   $(LLC) -march=bpf -mcpu=$(CPU) $(LLC_FLAGS) -filetype=obj -o $@
+ifeq ($(DWARF2BTF),y)
+   $(BTF_PAHOLE) -J $@
+endif
+
 $(OUTPUT)/%.o: %.c
$(CLANG) $(CLANG_FLAGS) \
 -O2 -target bpf -emit-llvm -c $< -o - |  \
diff --git a/tools/testing/selftests/bpf/test_libbpf.sh 
b/tools/testing/selftests/bpf/test_libbpf.sh
index 156d89f1edcc..b45962a44243 100755
--- a/tools/testing/selftests/bpf/test_libbpf.sh
+++ b/tools/testing/selftests/bpf/test_libbpf.sh
@@ -33,17 +33,13 @@ trap exit_handler 0 2 3 6 9
 
 libbpf_open_file test_l4lb.o
 
-# TODO: fix libbpf to load noinline functions
-# [warning] libbpf: incorrect bpf_call opcode
-#libbpf_open_file test_l4lb_noinline.o
+# Load a program with BPF-to-BPF calls
+libbpf_open_file test_l4lb_noinline.o
 
-# TODO: fix test_xdp_meta.c to load with libbpf
-# [warning] libbpf: test_xdp_meta.o doesn't provide kernel version
-#libbpf_open_file test_xdp_meta.o
+libbpf_open_file test_xdp_meta.o
 
-# TODO: fix libbpf to handle .eh_frame
-# [warning] libbpf: relocation failed: no section(10)
-#libbpf_open_file ../../../../samples/bpf/tracex3_kern.o
+# Load a program compiled without the "-target bpf" flag
+libbpf_open_file test_xdp.o
 
 # Success
 exit 0
diff --git a/tools/testing/selftests/bpf/test_xdp_meta.c 
b/tools/testing/selftests/bpf/test_xdp_meta.c
index 8d0182650653..2f42de66e2bb 100644
--- a/tools/testing/selftests/bpf/test_xdp_meta.c
+++ b/tools/testing/selftests/bpf/test_xdp_meta.c
@@ -8,6 +8,8 @@
#define round_up(x, y) ((((x) - 1) | __round_mask(x, y)) + 1)
 #define ctx_ptr(ctx, mem) (void *)(unsigned long)ctx->mem
 
+int _version SEC("version") = 1;
+
 SEC("t")
 int ing_cls(struct __sk_buff *ctx)
 {
-- 
2.7.4



[PATCH bpf-next 3/3] tools: bpftool: fix completion for "bpftool map update"

2018-10-20 Thread Quentin Monnet
When trying to complete "bpftool map update" commands, the call to
printf would print an error message on the command line if no map id
was found to complete the command.

Fix it by making sure we have map ids to complete the line with, before
we try to print something.
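The failure mode can be reproduced in a plain shell, outside of bpftool: an unquoted expansion of an empty variable leaves printf with no format argument at all. A minimal sketch (the variable name is illustrative, and the exact error text is shell-dependent):

```shell
# With an empty, unquoted expansion, printf receives zero arguments and
# prints a usage/missing-operand error on stderr instead of printing nothing:
type=""
printf $type 2>&1 | head -n 1

# Guarding the call, as the patch does, skips printf entirely when there is
# nothing to print:
[ -n "$type" ] && printf "%s" "$type"
```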

Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 tools/bpf/bpftool/bash-completion/bpftool | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/bpf/bpftool/bash-completion/bpftool 
b/tools/bpf/bpftool/bash-completion/bpftool
index c56545e87b0d..3f78e6404589 100644
--- a/tools/bpf/bpftool/bash-completion/bpftool
+++ b/tools/bpf/bpftool/bash-completion/bpftool
@@ -143,7 +143,7 @@ _bpftool_map_update_map_type()
 local type
 type=$(bpftool -jp map show $keyword $ref | \
 command sed -n 's/.*"type": "\(.*\)",$/\1/p')
-printf $type
+[[ -n $type ]] && printf $type
 }
 
 _bpftool_map_update_get_id()
-- 
2.7.4



[PATCH bpf-next 2/3] tools: bpftool: print nb of cmds to stdout (not stderr) for batch mode

2018-10-20 Thread Quentin Monnet
When batch mode is used and all commands succeed, bpftool prints the
number of commands processed to stderr. There is no particular reason to
use stderr for this, we could as well use stdout. It would avoid getting
unnecessary output on stderr if the standard output is redirected, for
example.
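The behaviour difference is a generic shell property and can be sketched without bpftool (the function names below are made up for the example):

```shell
# A command mixing payload (stdout) and a status message (stderr), as bpftool
# did before this patch:
batch_stderr() {
    echo "some result"
    echo "processed 2 commands" >&2
}

# Redirecting stdout does not silence stderr, so the status line still shows:
batch_stderr > /dev/null        # prints "processed 2 commands"

# With the message moved to stdout, the same redirection hides it:
batch_stdout() {
    echo "some result"
    echo "processed 2 commands"
}
batch_stdout > /dev/null        # prints nothing
```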

Reported-by: David Beckett 
Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 tools/bpf/bpftool/main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
index 828dde30e9ec..75a3296dc0bc 100644
--- a/tools/bpf/bpftool/main.c
+++ b/tools/bpf/bpftool/main.c
@@ -321,7 +321,8 @@ static int do_batch(int argc, char **argv)
p_err("reading batch file failed: %s", strerror(errno));
err = -1;
} else {
-   p_info("processed %d commands", lines);
+   if (!json_output)
+   printf("processed %d commands\n", lines);
err = 0;
}
 err_close:
-- 
2.7.4



[PATCH bpf-next 1/3] tools: bpftool: document restriction on '.' in names to pin in bpffs

2018-10-20 Thread Quentin Monnet
Names used to pin eBPF programs and maps under the eBPF virtual file
system cannot contain a dot character, which is reserved for future
extensions of this file system.

Document this in bpftool man pages to avoid users getting confused if
pinning fails because of a dot.

Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 tools/bpf/bpftool/Documentation/bpftool-map.rst  | 4 +++-
 tools/bpf/bpftool/Documentation/bpftool-prog.rst | 8 ++--
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/tools/bpf/bpftool/Documentation/bpftool-map.rst 
b/tools/bpf/bpftool/Documentation/bpftool-map.rst
index 3497f2d80328..f55a2daed59b 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-map.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-map.rst
@@ -86,7 +86,9 @@ DESCRIPTION
**bpftool map pin** *MAP*  *FILE*
  Pin map *MAP* as *FILE*.
 
- Note: *FILE* must be located in *bpffs* mount.
+ Note: *FILE* must be located in *bpffs* mount. It must not
+ contain a dot character ('.'), which is reserved for future
+ extensions of *bpffs*.
 
**bpftool** **map event_pipe** *MAP* [**cpu** *N* **index** *M*]
  Read events from a BPF_MAP_TYPE_PERF_EVENT_ARRAY map.
diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst 
b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
index 12c803003ab2..ac4e904b10fb 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
@@ -75,7 +75,9 @@ DESCRIPTION
**bpftool prog pin** *PROG* *FILE*
  Pin program *PROG* as *FILE*.
 
- Note: *FILE* must be located in *bpffs* mount.
+ Note: *FILE* must be located in *bpffs* mount. It must not
+ contain a dot character ('.'), which is reserved for future
+ extensions of *bpffs*.
 
**bpftool prog load** *OBJ* *FILE* [**type** *TYPE*] [**map** {**idx** 
*IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
  Load bpf program from binary *OBJ* and pin as *FILE*.
@@ -91,7 +93,9 @@ DESCRIPTION
  If **dev** *NAME* is specified program will be loaded onto
  given networking device (offload).
 
- Note: *FILE* must be located in *bpffs* mount.
+ Note: *FILE* must be located in *bpffs* mount. It must not
+ contain a dot character ('.'), which is reserved for future
+ extensions of *bpffs*.
 
 **bpftool prog attach** *PROG* *ATTACH_TYPE* *MAP*
   Attach bpf program *PROG* (with type specified by 
*ATTACH_TYPE*)
-- 
2.7.4



[PATCH bpf-next 0/3] tools: bpftool: bring minor fixes to bpftool

2018-10-20 Thread Quentin Monnet
Hi,
These are three minor fixes for bpftool, its documentation and its bash
completion function. Please refer to individual patches for details.

Quentin Monnet (3):
  tools: bpftool: document restriction on '.' in names to pin in bpffs
  tools: bpftool: print nb of cmds to stdout (not stderr) for batch mode
  tools: bpftool: fix completion for "bpftool map update"

 tools/bpf/bpftool/Documentation/bpftool-map.rst  | 4 +++-
 tools/bpf/bpftool/Documentation/bpftool-prog.rst | 8 ++--
 tools/bpf/bpftool/bash-completion/bpftool| 2 +-
 tools/bpf/bpftool/main.c | 3 ++-
 4 files changed, 12 insertions(+), 5 deletions(-)

-- 
2.7.4



[PATCH bpf-next] selftests/bpf: enable (uncomment) all tests in test_libbpf.sh

2018-10-20 Thread Quentin Monnet
libbpf is now able to load successfully test_l4lb_noinline.o and
samples/bpf/tracex3_kern.o, so we can uncomment related tests from
test_libbpf.c and remove the associated "TODO"s.

It is also trivial to fix test_xdp_meta.o so that it provides a
version and can be loaded. Fix it and uncomment this test as well.

For the record, the error message obtained with tracex3_kern.o was
fixed by commit e3d91b0ca523 ("tools/libbpf: handle issues with bpf ELF
objects containing .eh_frames").

I have not been able to reproduce the "libbpf: incorrect bpf_call
opcode" error for test_l4lb_noinline.o, even with the version of libbpf
present at the time when test_libbpf.sh and test_libbpf_open.c were
created.

Cc: Jesper Dangaard Brouer 
Signed-off-by: Quentin Monnet 
---
 tools/testing/selftests/bpf/test_libbpf.sh  | 12 +++-
 tools/testing/selftests/bpf/test_xdp_meta.c |  2 ++
 2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_libbpf.sh 
b/tools/testing/selftests/bpf/test_libbpf.sh
index 156d89f1edcc..a426f28163a5 100755
--- a/tools/testing/selftests/bpf/test_libbpf.sh
+++ b/tools/testing/selftests/bpf/test_libbpf.sh
@@ -33,17 +33,11 @@ trap exit_handler 0 2 3 6 9
 
 libbpf_open_file test_l4lb.o
 
-# TODO: fix libbpf to load noinline functions
-# [warning] libbpf: incorrect bpf_call opcode
-#libbpf_open_file test_l4lb_noinline.o
+libbpf_open_file test_l4lb_noinline.o
 
-# TODO: fix test_xdp_meta.c to load with libbpf
-# [warning] libbpf: test_xdp_meta.o doesn't provide kernel version
-#libbpf_open_file test_xdp_meta.o
+libbpf_open_file test_xdp_meta.o
 
-# TODO: fix libbpf to handle .eh_frame
-# [warning] libbpf: relocation failed: no section(10)
-#libbpf_open_file ../../../../samples/bpf/tracex3_kern.o
+libbpf_open_file ../../../../samples/bpf/tracex3_kern.o
 
 # Success
 exit 0
diff --git a/tools/testing/selftests/bpf/test_xdp_meta.c 
b/tools/testing/selftests/bpf/test_xdp_meta.c
index 8d0182650653..2f42de66e2bb 100644
--- a/tools/testing/selftests/bpf/test_xdp_meta.c
+++ b/tools/testing/selftests/bpf/test_xdp_meta.c
@@ -8,6 +8,8 @@
#define round_up(x, y) ((((x) - 1) | __round_mask(x, y)) + 1)
 #define ctx_ptr(ctx, mem) (void *)(unsigned long)ctx->mem
 
+int _version SEC("version") = 1;
+
 SEC("t")
 int ing_cls(struct __sk_buff *ctx)
 {
-- 
2.7.4



[PATCH bpf-next] selftests/bpf: fix return value comparison for tests in test_libbpf.sh

2018-10-20 Thread Quentin Monnet
The return value for each test in test_libbpf.sh is compared with

if (( $? == 0 )) ; then ...

This works well with bash, but not with dash, which /bin/sh points to
on some systems (such as Ubuntu).

Let's replace this comparison with something that works on both shells.
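A minimal sketch of the portable form: `(( ... ))` is bash arithmetic evaluation, which a strictly POSIX shell such as dash rejects with a syntax error, whereas the `test`/`[` utility is understood by both:

```shell
#!/bin/sh
# Check the exit status of the previous command with [ ... ] instead of the
# bash-only (( $? == 0 )) arithmetic form:
true
if [ $? -eq 0 ]; then
    echo "selftests: PASS"      # prints "selftests: PASS"
else
    echo "selftests: FAIL"
fi
```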

Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 tools/testing/selftests/bpf/test_libbpf.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/test_libbpf.sh 
b/tools/testing/selftests/bpf/test_libbpf.sh
index d97dc914cd49..156d89f1edcc 100755
--- a/tools/testing/selftests/bpf/test_libbpf.sh
+++ b/tools/testing/selftests/bpf/test_libbpf.sh
@@ -6,7 +6,7 @@ export TESTNAME=test_libbpf
 # Determine selftest success via shell exit code
 exit_handler()
 {
-   if (( $? == 0 )); then
+   if [ $? -eq 0 ]; then
echo "selftests: $TESTNAME [PASS]";
else
echo "$TESTNAME: failed at file $LAST_LOADED" 1>&2
-- 
2.7.4



Re: [PATCH bpf] bpf: fix doc of bpf_skb_adjust_room() in uapi

2018-10-17 Thread Quentin Monnet

2018-10-17 16:24 UTC+0200 ~ Nicolas Dichtel 

len_diff is signed.

Fixes: fa15601ab31e ("bpf: add documentation for eBPF helpers (33-41)")
CC: Quentin Monnet 
Signed-off-by: Nicolas Dichtel 
---
  include/uapi/linux/bpf.h   | 2 +-
  tools/include/uapi/linux/bpf.h | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 66917a4eba27..c4ffe91d5598 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1430,7 +1430,7 @@ union bpf_attr {
   *Return
   *0 on success, or a negative error in case of failure.
   *
- * int bpf_skb_adjust_room(struct sk_buff *skb, u32 len_diff, u32 mode, u64 
flags)
+ * int bpf_skb_adjust_room(struct sk_buff *skb, s32 len_diff, u32 mode, u64 
flags)
   *Description
   *Grow or shrink the room for data in the packet associated to
   **skb* by *len_diff*, and according to the selected *mode*.
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 66917a4eba27..c4ffe91d5598 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1430,7 +1430,7 @@ union bpf_attr {
   *Return
   *0 on success, or a negative error in case of failure.
   *
- * int bpf_skb_adjust_room(struct sk_buff *skb, u32 len_diff, u32 mode, u64 
flags)
+ * int bpf_skb_adjust_room(struct sk_buff *skb, s32 len_diff, u32 mode, u64 
flags)
   *Description
   *Grow or shrink the room for data in the packet associated to
   **skb* by *len_diff*, and according to the selected *mode*.



Correct, thank you Nicolas! :)

Reviewed-by: Quentin Monnet 


[PATCH bpf-next 11/12] nfp: bpf: support pointers to other stack frames for BPF-to-BPF calls

2018-10-07 Thread Quentin Monnet
Mark instructions that use pointers to areas in the stack outside of the
current stack frame, and process them accordingly in mem_op_stack().
This way, we also support BPF-to-BPF calls where the caller passes a
pointer to data in its own stack frame to the callee (typically, when
the caller passes an address to one of its local variables located in
the stack, as an argument).

Thanks to Jakub and Jiong for figuring out how to deal with this case,
I just had to turn their email discussion into this patch.

Suggested-by: Jiong Wang 
Suggested-by: Jakub Kicinski 
Signed-off-by: Quentin Monnet 
Reviewed-by: Jiong Wang 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/bpf/jit.c  | 3 ++-
 drivers/net/ethernet/netronome/nfp/bpf/main.h | 1 +
 drivers/net/ethernet/netronome/nfp/bpf/verifier.c | 3 +++
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c 
b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
index b393f9dea584..6ed1b5207ecd 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
@@ -1178,7 +1178,8 @@ mem_op_stack(struct nfp_prog *nfp_prog, struct 
nfp_insn_meta *meta,
bool lm3 = true;
int ret;
 
-   if (meta->ptr_not_const) {
+   if (meta->ptr_not_const ||
+   meta->flags & FLAG_INSN_PTR_CALLER_STACK_FRAME) {
/* Use of the last encountered ptr_off is OK, they all have
 * the same alignment.  Depend on low bits of value being
 * discarded when written to LMaddr register.
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h 
b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index 44b787a0bd4b..25e10cfa2678 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -267,6 +267,7 @@ struct nfp_bpf_reg_state {
 
 #define FLAG_INSN_IS_JUMP_DST  BIT(0)
 #define FLAG_INSN_IS_SUBPROG_START BIT(1)
+#define FLAG_INSN_PTR_CALLER_STACK_FRAME   BIT(2)
 
 /**
  * struct nfp_insn_meta - BPF instruction wrapper
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c 
b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
index f31721bd1fac..cddb70786a58 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
@@ -336,6 +336,9 @@ nfp_bpf_check_stack_access(struct nfp_prog *nfp_prog,
 {
s32 old_off, new_off;
 
+   if (reg->frameno != env->cur_state->curframe)
+   meta->flags |= FLAG_INSN_PTR_CALLER_STACK_FRAME;
+
if (!tnum_is_const(reg->var_off)) {
pr_vlog(env, "variable ptr stack access\n");
return -EINVAL;
-- 
2.7.4



[PATCH bpf-next 09/12] nfp: bpf: fix return address from register-saving subroutine to callee

2018-10-07 Thread Quentin Monnet
On performing a BPF-to-BPF call, we first jump to a subroutine that
pushes callee-saved registers (R6~R9) to the stack, and from there we
go to the start of the callee next. In order to do so, the caller must
pass to the subroutine the address of the NFP instruction to jump to at
the end of that subroutine. This cannot be reliably implemented when
translating the caller, as we do not always know the start offset of the
callee yet.

This patch implements the required fixup step for passing the start
offset of the callee via the register used by the subroutine to hold its
return address.

Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/bpf/jit.c | 28 +++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c 
b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
index e8b03d8f54f7..74423d3e714d 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
@@ -3340,10 +3340,25 @@ static const instr_cb_t instr_cb[256] = {
 };
 
 /* --- Assembler logic --- */
+static int
+nfp_fixup_immed_relo(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
+struct nfp_insn_meta *jmp_dst, u32 br_idx)
+{
+   if (immed_get_value(nfp_prog->prog[br_idx + 1])) {
+   pr_err("BUG: failed to fix up callee register saving\n");
+   return -EINVAL;
+   }
+
   immed_set_value(&nfp_prog->prog[br_idx + 1], jmp_dst->off);
+
+   return 0;
+}
+
 static int nfp_fixup_branches(struct nfp_prog *nfp_prog)
 {
struct nfp_insn_meta *meta, *jmp_dst;
u32 idx, br_idx;
+   int err;
 
	list_for_each_entry(meta, &nfp_prog->insns, l) {
if (meta->skip)
@@ -3380,7 +3395,7 @@ static int nfp_fixup_branches(struct nfp_prog *nfp_prog)
 
/* Leave special branches for later */
if (FIELD_GET(OP_RELO_TYPE, nfp_prog->prog[br_idx]) !=
-   RELO_BR_REL)
+   RELO_BR_REL && !is_mbpf_pseudo_call(meta))
continue;
 
if (!meta->jmp_dst) {
@@ -3395,6 +3410,17 @@ static int nfp_fixup_branches(struct nfp_prog *nfp_prog)
return -ELOOP;
}
 
+   if (is_mbpf_pseudo_call(meta)) {
+   err = nfp_fixup_immed_relo(nfp_prog, meta,
+  jmp_dst, br_idx);
+   if (err)
+   return err;
+   }
+
+   if (FIELD_GET(OP_RELO_TYPE, nfp_prog->prog[br_idx]) !=
+   RELO_BR_REL)
+   continue;
+
for (idx = meta->off; idx <= br_idx; idx++) {
if (!nfp_is_br(nfp_prog->prog[idx]))
continue;
-- 
2.7.4



[PATCH bpf-next 10/12] nfp: bpf: optimise save/restore for R6~R9 based on register usage

2018-10-07 Thread Quentin Monnet
When pre-processing the instructions, it is trivial to detect what
subprograms are using R6, R7, R8 or R9 as destination registers. If a
subprogram uses none of those, then we do not need to jump to the
subroutines dedicated to saving and restoring callee-saved registers in
its prologue and epilogue.

This patch introduces detection of callee-saved registers in subprograms
and prevents the JIT from adding calls to those subroutines whenever we
can: we save some instructions in the translated program, and some time
at runtime on BPF-to-BPF calls and returns.

If no subprogram needs to save those registers, we can avoid appending
the subroutines at the end of the program.

Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/bpf/jit.c  | 85 ++-
 drivers/net/ethernet/netronome/nfp/bpf/main.h |  2 +
 drivers/net/ethernet/netronome/nfp/bpf/verifier.c | 14 ++--
 3 files changed, 78 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c 
b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
index 74423d3e714d..b393f9dea584 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
@@ -3132,7 +3132,9 @@ bpf_to_bpf_call(struct nfp_prog *nfp_prog, struct 
nfp_insn_meta *meta)
NFP_CSR_ACT_LM_ADDR0);
}
 
-   /* The following steps are performed:
+   /* Two cases for jumping to the callee:
+*
+* - If callee uses and needs to save R6~R9 then:
 * 1. Put the start offset of the callee into imm_b(). This will
 *require a fixup step, as we do not necessarily know this
 *address yet.
@@ -3140,8 +3142,12 @@ bpf_to_bpf_call(struct nfp_prog *nfp_prog, struct 
nfp_insn_meta *meta)
 *register ret_reg().
 * 3. (After defer slots are consumed) Jump to the subroutine that
 *pushes the registers to the stack.
-* The subroutine acts as a trampoline, and returns to the address in
-* imm_b(), i.e. jumps to the callee.
+*   The subroutine acts as a trampoline, and returns to the address in
+*   imm_b(), i.e. jumps to the callee.
+*
+* - If callee does not need to save R6~R9 then just load return
+*   address to the caller in ret_reg(), and jump to the callee
+*   directly.
 *
 * Using ret_reg() to pass the return address to the callee is set here
 * as a convention. The callee can then push this address onto its
@@ -3157,11 +3163,21 @@ bpf_to_bpf_call(struct nfp_prog *nfp_prog, struct 
nfp_insn_meta *meta)
 *   execution of the callee, we will not have to push the return
 *   address to the stack for leaf functions.
 */
-   ret_tgt = nfp_prog_current_offset(nfp_prog) + 3;
-   emit_br_relo(nfp_prog, BR_UNC, BR_OFF_RELO, 2,
-RELO_BR_GO_CALL_PUSH_REGS);
-   offset_br = nfp_prog_current_offset(nfp_prog);
-   wrp_immed_relo(nfp_prog, imm_b(nfp_prog), 0, RELO_IMMED_REL);
+   if (!meta->jmp_dst) {
+   pr_err("BUG: BPF-to-BPF call has no destination recorded\n");
+   return -ELOOP;
+   }
+   if (nfp_prog->subprog[meta->jmp_dst->subprog_idx].needs_reg_push) {
+   ret_tgt = nfp_prog_current_offset(nfp_prog) + 3;
+   emit_br_relo(nfp_prog, BR_UNC, BR_OFF_RELO, 2,
+RELO_BR_GO_CALL_PUSH_REGS);
+   offset_br = nfp_prog_current_offset(nfp_prog);
+   wrp_immed_relo(nfp_prog, imm_b(nfp_prog), 0, RELO_IMMED_REL);
+   } else {
+   ret_tgt = nfp_prog_current_offset(nfp_prog) + 2;
+   emit_br(nfp_prog, BR_UNC, meta->n + 1 + meta->insn.imm, 1);
+   offset_br = nfp_prog_current_offset(nfp_prog);
+   }
wrp_immed_relo(nfp_prog, ret_reg(nfp_prog), ret_tgt, RELO_IMMED_REL);
 
if (!nfp_prog_confirm_current_offset(nfp_prog, ret_tgt))
@@ -3227,15 +3243,24 @@ static int goto_out(struct nfp_prog *nfp_prog, struct 
nfp_insn_meta *meta)
 static int
 nfp_subprog_epilogue(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 {
-   /* Pop R6~R9 to the stack via related subroutine.
-* Pop return address for BPF-to-BPF call from the stack and load it
-* into ret_reg() before we jump. This means that the subroutine does
-* not come back here, we make it jump back to the subprogram caller
-* directly!
-*/
-   emit_br_relo(nfp_prog, BR_UNC, BR_OFF_RELO, 1,
-RELO_BR_GO_CALL_POP_REGS);
-   wrp_mov(nfp_prog, ret_reg(nfp_prog), reg_lm(0, 0));
+   if (nfp_prog->subprog[meta->subprog_idx].needs_reg_push) {
+   /* Pop R6~R9 to the stack via related subroutine.
+* We loaded the return address to the caller into ret_reg().
+  

[PATCH bpf-next 05/12] nfp: bpf: account for BPF-to-BPF calls when preparing nfp JIT

2018-10-07 Thread Quentin Monnet
Similarly to "exit" or "helper call" instructions, BPF-to-BPF calls will
require additional processing before translation starts, in order to
record and mark jump destinations.

We also mark the instructions where each subprogram begins. This will be
used in a following commit to determine where to add prologues for
subprograms.

Signed-off-by: Quentin Monnet 
Reviewed-by: Jiong Wang 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/bpf/jit.c  | 35 +++
 drivers/net/ethernet/netronome/nfp/bpf/main.h |  3 ++-
 2 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c 
b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
index 98a94ca36bfa..ccb80a5ac828 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
@@ -4018,20 +4018,35 @@ void nfp_bpf_jit_prepare(struct nfp_prog *nfp_prog, 
unsigned int cnt)
 
/* Another pass to record jump information. */
	list_for_each_entry(meta, &nfp_prog->insns, l) {
+   struct nfp_insn_meta *dst_meta;
u64 code = meta->insn.code;
+   unsigned int dst_idx;
+   bool pseudo_call;
 
-   if (BPF_CLASS(code) == BPF_JMP && BPF_OP(code) != BPF_EXIT &&
-   BPF_OP(code) != BPF_CALL) {
-   struct nfp_insn_meta *dst_meta;
-   unsigned short dst_indx;
+   if (BPF_CLASS(code) != BPF_JMP)
+   continue;
+   if (BPF_OP(code) == BPF_EXIT)
+   continue;
+   if (is_mbpf_helper_call(meta))
+   continue;
 
-   dst_indx = meta->n + 1 + meta->insn.off;
-   dst_meta = nfp_bpf_goto_meta(nfp_prog, meta, dst_indx,
-cnt);
+   /* If opcode is BPF_CALL at this point, this can only be a
+* BPF-to-BPF call (a.k.a pseudo call).
+*/
+   pseudo_call = BPF_OP(code) == BPF_CALL;
 
-   meta->jmp_dst = dst_meta;
-   dst_meta->flags |= FLAG_INSN_IS_JUMP_DST;
-   }
+   if (pseudo_call)
+   dst_idx = meta->n + 1 + meta->insn.imm;
+   else
+   dst_idx = meta->n + 1 + meta->insn.off;
+
+   dst_meta = nfp_bpf_goto_meta(nfp_prog, meta, dst_idx, cnt);
+
+   if (pseudo_call)
+   dst_meta->flags |= FLAG_INSN_IS_SUBPROG_START;
+
+   dst_meta->flags |= FLAG_INSN_IS_JUMP_DST;
+   meta->jmp_dst = dst_meta;
}
 }
 
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h 
b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index 853a5346378c..20a98ce4b345 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -262,7 +262,8 @@ struct nfp_bpf_reg_state {
bool var_off;
 };
 
-#define FLAG_INSN_IS_JUMP_DST  BIT(0)
+#define FLAG_INSN_IS_JUMP_DST  BIT(0)
+#define FLAG_INSN_IS_SUBPROG_START BIT(1)
 
 /**
  * struct nfp_insn_meta - BPF instruction wrapper
-- 
2.7.4



[PATCH bpf-next 12/12] bpf: allow offload of programs with BPF-to-BPF function calls

2018-10-07 Thread Quentin Monnet
Now that there is at least one driver supporting BPF-to-BPF function
calls, lift the restriction, in the verifier, on hardware offload of
eBPF programs containing such calls. But prevent jit_subprogs(), still
in the verifier, from being run for offloaded programs.

Signed-off-by: Quentin Monnet 
Reviewed-by: Jiong Wang 
Reviewed-by: Jakub Kicinski 
---
 kernel/bpf/verifier.c | 10 +++---
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index a0454cb299ba..73cc136915fe 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1009,10 +1009,6 @@ static int check_subprogs(struct bpf_verifier_env *env)
verbose(env, "function calls to other bpf functions are 
allowed for root only\n");
return -EPERM;
}
-   if (bpf_prog_is_dev_bound(env->prog->aux)) {
-   verbose(env, "function calls in offloaded programs are 
not supported yet\n");
-   return -EINVAL;
-   }
ret = add_subprog(env, i + insn[i].imm + 1);
if (ret < 0)
return ret;
@@ -5968,10 +5964,10 @@ static int fixup_call_args(struct bpf_verifier_env *env)
struct bpf_insn *insn = prog->insnsi;
int i, depth;
 #endif
-   int err;
+   int err = 0;
 
-   err = 0;
-   if (env->prog->jit_requested) {
+   if (env->prog->jit_requested &&
+   !bpf_prog_is_dev_bound(env->prog->aux)) {
err = jit_subprogs(env);
if (err == 0)
return 0;
-- 
2.7.4



[PATCH bpf-next 04/12] nfp: bpf: ignore helper-related checks for BPF calls in nfp verifier

2018-10-07 Thread Quentin Monnet
The checks related to eBPF helper calls are performed each time the nfp
driver meets a BPF_JUMP | BPF_CALL instruction. However, these checks
are not relevant for BPF-to-BPF calls (same instruction code, different
value in the source register), so just skip the checks for such calls.

While at it, rename the function that runs those checks to make it clear
they apply to _helper_ calls only.

Signed-off-by: Quentin Monnet 
Reviewed-by: Jiong Wang 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/bpf/main.h | 8 
 drivers/net/ethernet/netronome/nfp/bpf/verifier.c | 9 +
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h 
b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index 7f6e850e42da..853a5346378c 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -423,6 +423,14 @@ static inline bool is_mbpf_div(const struct nfp_insn_meta 
*meta)
return is_mbpf_alu(meta) && mbpf_op(meta) == BPF_DIV;
 }
 
+static inline bool is_mbpf_helper_call(const struct nfp_insn_meta *meta)
+{
+   struct bpf_insn insn = meta->insn;
+
+   return insn.code == (BPF_JMP | BPF_CALL) &&
+   insn.src_reg != BPF_PSEUDO_CALL;
+}
+
 /**
  * struct nfp_bpf_subprog_info - nfp BPF sub-program (a.k.a. function) info
  * @stack_depth:   maximum stack depth used by this sub-program
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c 
b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
index 9ef74bc1ec1d..c642c2c07d96 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
@@ -155,8 +155,9 @@ nfp_bpf_map_call_ok(const char *fname, struct 
bpf_verifier_env *env,
 }
 
 static int
-nfp_bpf_check_call(struct nfp_prog *nfp_prog, struct bpf_verifier_env *env,
-  struct nfp_insn_meta *meta)
+nfp_bpf_check_helper_call(struct nfp_prog *nfp_prog,
+ struct bpf_verifier_env *env,
+ struct nfp_insn_meta *meta)
 {
const struct bpf_reg_state *reg1 = cur_regs(env) + BPF_REG_1;
const struct bpf_reg_state *reg2 = cur_regs(env) + BPF_REG_2;
@@ -620,8 +621,8 @@ nfp_verify_insn(struct bpf_verifier_env *env, int insn_idx, 
int prev_insn_idx)
return -EINVAL;
}
 
-   if (meta->insn.code == (BPF_JMP | BPF_CALL))
-   return nfp_bpf_check_call(nfp_prog, env, meta);
+   if (is_mbpf_helper_call(meta))
+   return nfp_bpf_check_helper_call(nfp_prog, env, meta);
if (meta->insn.code == (BPF_JMP | BPF_EXIT))
return nfp_bpf_check_exit(nfp_prog, env);
 
-- 
2.7.4



[PATCH bpf-next 01/12] bpf: add verifier callback to get stack usage info for offloaded progs

2018-10-07 Thread Quentin Monnet
In preparation for BPF-to-BPF calls in offloaded programs, add a new
callback to the struct bpf_prog_offload_ops so that drivers
supporting eBPF offload can hook at the end of program verification, and
potentially extract information collected by the verifier.

Implement a minimal callback (returning 0) in the drivers providing the
structs, namely netdevsim and nfp.

This will be useful in the nfp driver, in later commits, to extract the
number of subprograms as well as the stack depth for those subprograms.

Signed-off-by: Quentin Monnet 
Reviewed-by: Jiong Wang 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/bpf/verifier.c |  8 +++-
 drivers/net/netdevsim/bpf.c   |  8 +++-
 include/linux/bpf.h   |  1 +
 include/linux/bpf_verifier.h  |  1 +
 kernel/bpf/offload.c  | 18 ++
 kernel/bpf/verifier.c |  3 +++
 6 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
index a6e9248669e1..e470489021e3 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
@@ -640,6 +640,12 @@ nfp_verify_insn(struct bpf_verifier_env *env, int insn_idx, int prev_insn_idx)
return 0;
 }
 
+static int nfp_bpf_finalize(struct bpf_verifier_env *env)
+{
+   return 0;
+}
+
 const struct bpf_prog_offload_ops nfp_bpf_analyzer_ops = {
-   .insn_hook = nfp_verify_insn,
+   .insn_hook  = nfp_verify_insn,
+   .finalize   = nfp_bpf_finalize,
 };
diff --git a/drivers/net/netdevsim/bpf.c b/drivers/net/netdevsim/bpf.c
index 81444208b216..cb3518474f0e 100644
--- a/drivers/net/netdevsim/bpf.c
+++ b/drivers/net/netdevsim/bpf.c
@@ -86,8 +86,14 @@ nsim_bpf_verify_insn(struct bpf_verifier_env *env, int insn_idx, int prev_insn)
return 0;
 }
 
+static int nsim_bpf_finalize(struct bpf_verifier_env *env)
+{
+   return 0;
+}
+
 static const struct bpf_prog_offload_ops nsim_bpf_analyzer_ops = {
-   .insn_hook = nsim_bpf_verify_insn,
+   .insn_hook  = nsim_bpf_verify_insn,
+   .finalize   = nsim_bpf_finalize,
 };
 
 static bool nsim_xdp_offload_active(struct netdevsim *ns)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 027697b6a22f..9b558713447f 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -263,6 +263,7 @@ struct bpf_verifier_ops {
 struct bpf_prog_offload_ops {
int (*insn_hook)(struct bpf_verifier_env *env,
 int insn_idx, int prev_insn_idx);
+   int (*finalize)(struct bpf_verifier_env *env);
 };
 
 struct bpf_prog_offload {
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 7b6fd2ab3263..9e8056ec20fa 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -245,5 +245,6 @@ static inline struct bpf_reg_state *cur_regs(struct bpf_verifier_env *env)
 int bpf_prog_offload_verifier_prep(struct bpf_verifier_env *env);
 int bpf_prog_offload_verify_insn(struct bpf_verifier_env *env,
 int insn_idx, int prev_insn_idx);
+int bpf_prog_offload_finalize(struct bpf_verifier_env *env);
 
 #endif /* _LINUX_BPF_VERIFIER_H */
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index 177a52436394..8e93c47f0779 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -172,6 +172,24 @@ int bpf_prog_offload_verify_insn(struct bpf_verifier_env *env,
return ret;
 }
 
+int bpf_prog_offload_finalize(struct bpf_verifier_env *env)
+{
+   struct bpf_prog_offload *offload;
+   int ret = -ENODEV;
+
+   down_read(&bpf_devs_lock);
+   offload = env->prog->aux->offload;
+   if (offload) {
+   if (offload->dev_ops->finalize)
+   ret = offload->dev_ops->finalize(env);
+   else
+   ret = 0;
+   }
+   up_read(&bpf_devs_lock);
+
+   return ret;
+}
+
 static void __bpf_prog_offload_destroy(struct bpf_prog *prog)
 {
struct bpf_prog_offload *offload = prog->aux->offload;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 73c81bef6ae8..a0454cb299ba 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -6309,6 +6309,9 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr)
env->cur_state = NULL;
}
 
+   if (ret == 0 && bpf_prog_is_dev_bound(env->prog->aux))
+   ret = bpf_prog_offload_finalize(env);
+
 skip_full_check:
while (!pop_stack(env, NULL, NULL));
free_states(env);
-- 
2.7.4



[PATCH bpf-next 00/12] nfp: bpf: add support for BPF-to-BPF function calls

2018-10-07 Thread Quentin Monnet
This patch series adds support for hardware offload of programs containing
BPF-to-BPF function calls. First, a new callback is added to the kernel
verifier, to collect information after the main part of the verification
has been performed. Then support for BPF-to-BPF calls is incrementally
added to the nfp driver, before offloading programs containing such calls
is eventually allowed by lifting the restriction in the kernel verifier, in
the last patch. Please refer to individual patches for details.

Many thanks to Jiong and Jakub for their precious help and contribution on
the main patches for the JIT-compiler, and everything related to stack
accesses.

Quentin Monnet (12):
  bpf: add verifier callback to get stack usage info for offloaded progs
  nfp: bpf: rename nfp_prog->stack_depth as nfp_prog->stack_frame_depth
  nfp: bpf: copy eBPF subprograms information from kernel verifier
  nfp: bpf: ignore helper-related checks for BPF calls in nfp verifier
  nfp: bpf: account for BPF-to-BPF calls when preparing nfp JIT
  nfp: bpf: add main logics for BPF-to-BPF calls support in nfp driver
  nfp: bpf: account for additional stack usage when checking stack limit
  nfp: bpf: update fixup function for BPF-to-BPF calls support
  nfp: bpf: fix return address from register-saving subroutine to callee
  nfp: bpf: optimise save/restore for R6~R9 based on register usage
  nfp: bpf: support pointers to other stack frames for BPF-to-BPF calls
  bpf: allow offload of programs with BPF-to-BPF function calls

 drivers/net/ethernet/netronome/nfp/bpf/jit.c  | 381 --
 drivers/net/ethernet/netronome/nfp/bpf/main.h |  52 ++-
 drivers/net/ethernet/netronome/nfp/bpf/offload.c  |  11 +-
 drivers/net/ethernet/netronome/nfp/bpf/verifier.c | 141 +++-
 drivers/net/ethernet/netronome/nfp/nfp_asm.h  |   9 +
 drivers/net/netdevsim/bpf.c   |   8 +-
 include/linux/bpf.h   |   1 +
 include/linux/bpf_verifier.h  |   1 +
 kernel/bpf/offload.c  |  18 +
 kernel/bpf/verifier.c |  13 +-
 10 files changed, 589 insertions(+), 46 deletions(-)

-- 
2.7.4



[PATCH bpf-next 08/12] nfp: bpf: update fixup function for BPF-to-BPF calls support

2018-10-07 Thread Quentin Monnet
Relocations for targets of BPF-to-BPF calls are required at the end of
translation. Update the nfp_fixup_branches() function accordingly.

When checking that the last instruction of each block is a branch, we
must account for the length of the instructions required to pop the
return address from the stack.

Signed-off-by: Quentin Monnet 
Signed-off-by: Jiong Wang 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/bpf/jit.c  | 25 ++---
 drivers/net/ethernet/netronome/nfp/bpf/main.h |  2 ++
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
index 2d2c9148bd44..e8b03d8f54f7 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
@@ -3116,7 +3116,7 @@ static int jne_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 static int
 bpf_to_bpf_call(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 {
-   u32 ret_tgt, stack_depth;
+   u32 ret_tgt, stack_depth, offset_br;
swreg tmp_reg;
 
stack_depth = round_up(nfp_prog->stack_frame_depth, STACK_FRAME_ALIGN);
@@ -3160,6 +3160,7 @@ bpf_to_bpf_call(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
ret_tgt = nfp_prog_current_offset(nfp_prog) + 3;
emit_br_relo(nfp_prog, BR_UNC, BR_OFF_RELO, 2,
 RELO_BR_GO_CALL_PUSH_REGS);
+   offset_br = nfp_prog_current_offset(nfp_prog);
wrp_immed_relo(nfp_prog, imm_b(nfp_prog), 0, RELO_IMMED_REL);
wrp_immed_relo(nfp_prog, ret_reg(nfp_prog), ret_tgt, RELO_IMMED_REL);
 
@@ -3176,6 +3177,9 @@ bpf_to_bpf_call(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
wrp_nops(nfp_prog, 3);
}
 
+   meta->num_insns_after_br = nfp_prog_current_offset(nfp_prog);
+   meta->num_insns_after_br -= offset_br;
+
return 0;
 }
 
@@ -3344,21 +3348,36 @@ static int nfp_fixup_branches(struct nfp_prog *nfp_prog)
	list_for_each_entry(meta, &nfp_prog->insns, l) {
if (meta->skip)
continue;
-   if (meta->insn.code == (BPF_JMP | BPF_CALL))
-   continue;
if (BPF_CLASS(meta->insn.code) != BPF_JMP)
continue;
+   if (meta->insn.code == (BPF_JMP | BPF_EXIT) &&
+   !nfp_is_main_function(meta))
+   continue;
+   if (is_mbpf_helper_call(meta))
+   continue;
 
		if (list_is_last(&meta->l, &nfp_prog->insns))
br_idx = nfp_prog->last_bpf_off;
else
br_idx = list_next_entry(meta, l)->off - 1;
 
+   /* For BPF-to-BPF function call, a stack adjustment sequence is
+* generated after the return instruction. Therefore, we must
+* withdraw the length of this sequence to have br_idx pointing
+* to where the "branch" NFP instruction is expected to be.
+*/
+   if (is_mbpf_pseudo_call(meta))
+   br_idx -= meta->num_insns_after_br;
+
if (!nfp_is_br(nfp_prog->prog[br_idx])) {
pr_err("Fixup found block not ending in branch %d %02x 
%016llx!!\n",
   br_idx, meta->insn.code, nfp_prog->prog[br_idx]);
return -ELOOP;
}
+
+   if (meta->insn.code == (BPF_JMP | BPF_EXIT))
+   continue;
+
/* Leave special branches for later */
if (FIELD_GET(OP_RELO_TYPE, nfp_prog->prog[br_idx]) !=
RELO_BR_REL)
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index d9695bc316dd..1cef5136c198 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -283,6 +283,7 @@ struct nfp_bpf_reg_state {
  * @xadd_maybe_16bit: 16bit immediate is possible
  * @jmp_dst: destination info for jump instructions
+ * @jump_neg_op: jump instruction has inverted immediate, use ADD instead of SUB
+ * @num_insns_after_br: number of insns following a branch jump, used for fixup
  * @func_id: function id for call instructions
  * @arg1: arg1 for call instructions
  * @arg2: arg2 for call instructions
@@ -319,6 +320,7 @@ struct nfp_insn_meta {
struct {
struct nfp_insn_meta *jmp_dst;
bool jump_neg_op;
+   u32 num_insns_after_br; /* only for BPF-to-BPF calls */
};
/* function calls */
struct {
-- 
2.7.4



[PATCH bpf-next 03/12] nfp: bpf: copy eBPF subprograms information from kernel verifier

2018-10-07 Thread Quentin Monnet
In order to support BPF-to-BPF calls in offloaded programs, the nfp
driver must collect information about the distinct subprograms: namely,
the number of subprograms composing the complete program and the stack
depth of those subprograms. The latter in particular is non-trivial to
collect, so we copy those elements from the kernel verifier via the
newly added post-verification hook. The struct nfp_prog is extended to
store this information. Stack depths are stored in an array of dedicated
structs.

Subprogram start indexes are not collected. Instead, the meta
instructions associated with the start of each subprogram will be marked
with a flag in a later patch.

Signed-off-by: Quentin Monnet 
Reviewed-by: Jiong Wang 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/bpf/main.h | 12 
 drivers/net/ethernet/netronome/nfp/bpf/offload.c  |  2 ++
 drivers/net/ethernet/netronome/nfp/bpf/verifier.c | 15 +++
 3 files changed, 29 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index 7050535383b8..7f6e850e42da 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -424,6 +424,14 @@ static inline bool is_mbpf_div(const struct nfp_insn_meta *meta)
 }
 
 /**
+ * struct nfp_bpf_subprog_info - nfp BPF sub-program (a.k.a. function) info
+ * @stack_depth:   maximum stack depth used by this sub-program
+ */
+struct nfp_bpf_subprog_info {
+   u16 stack_depth;
+};
+
+/**
  * struct nfp_prog - nfp BPF program
  * @bpf: backpointer to the bpf app priv structure
  * @prog: machine code
@@ -439,7 +447,9 @@ static inline bool is_mbpf_div(const struct nfp_insn_meta *meta)
  * @stack_frame_depth: max stack depth for current frame
  * @adjust_head_location: if program has single adjust head call - the insn no.
  * @map_records_cnt: the number of map pointers recorded for this prog
+ * @subprog_cnt: number of sub-programs, including main function
  * @map_records: the map record pointers from bpf->maps_neutral
+ * @subprog: pointer to an array of objects holding info about sub-programs
  * @insns: list of BPF instruction wrappers (struct nfp_insn_meta)
  */
 struct nfp_prog {
@@ -464,7 +474,9 @@ struct nfp_prog {
unsigned int adjust_head_location;
 
unsigned int map_records_cnt;
+   unsigned int subprog_cnt;
struct nfp_bpf_neutral_map **map_records;
+   struct nfp_bpf_subprog_info *subprog;
 
struct list_head insns;
 };
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
index c9519bb00f8a..b683b03efd22 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
@@ -208,6 +208,8 @@ static void nfp_prog_free(struct nfp_prog *nfp_prog)
 {
struct nfp_insn_meta *meta, *tmp;
 
+   kfree(nfp_prog->subprog);
+
	list_for_each_entry_safe(meta, tmp, &nfp_prog->insns, l) {
		list_del(&meta->l);
kfree(meta);
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
index e470489021e3..9ef74bc1ec1d 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
@@ -642,6 +642,21 @@ nfp_verify_insn(struct bpf_verifier_env *env, int insn_idx, int prev_insn_idx)
 
 static int nfp_bpf_finalize(struct bpf_verifier_env *env)
 {
+   struct bpf_subprog_info *info;
+   struct nfp_prog *nfp_prog;
+   int i;
+
+   nfp_prog = env->prog->aux->offload->dev_priv;
+   nfp_prog->subprog_cnt = env->subprog_cnt;
+   nfp_prog->subprog = kcalloc(nfp_prog->subprog_cnt,
+   sizeof(nfp_prog->subprog[0]), GFP_KERNEL);
+   if (!nfp_prog->subprog)
+   return -ENOMEM;
+
+   info = env->subprog_info;
+   for (i = 0; i < nfp_prog->subprog_cnt; i++)
+   nfp_prog->subprog[i].stack_depth = info[i].stack_depth;
+
return 0;
 }
 
-- 
2.7.4



[PATCH bpf-next 02/12] nfp: bpf: rename nfp_prog->stack_depth as nfp_prog->stack_frame_depth

2018-10-07 Thread Quentin Monnet
In preparation for support for BPF-to-BPF calls in offloaded programs,
rename the "stack_depth" field of the struct nfp_prog as
"stack_frame_depth". This is to make it clear that the field refers to
the maximum size of the current stack frame (as opposed to the maximum
size of the whole stack memory).

Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/bpf/jit.c | 10 +-
 drivers/net/ethernet/netronome/nfp/bpf/main.h|  4 ++--
 drivers/net/ethernet/netronome/nfp/bpf/offload.c |  2 +-
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
index eff57f7d056a..98a94ca36bfa 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
@@ -1137,7 +1137,7 @@ mem_op_stack(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
 unsigned int size, unsigned int ptr_off, u8 gpr, u8 ptr_gpr,
 bool clr_gpr, lmem_step step)
 {
-   s32 off = nfp_prog->stack_depth + meta->insn.off + ptr_off;
+   s32 off = nfp_prog->stack_frame_depth + meta->insn.off + ptr_off;
bool first = true, last;
bool needs_inc = false;
swreg stack_off_reg;
@@ -1695,7 +1695,7 @@ map_call_stack_common(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
s64 lm_off;
 
/* We only have to reload LM0 if the key is not at start of stack */
-   lm_off = nfp_prog->stack_depth;
+   lm_off = nfp_prog->stack_frame_depth;
lm_off += meta->arg2.reg.var_off.value + meta->arg2.reg.off;
load_lm_ptr = meta->arg2.var_off || lm_off;
 
@@ -1808,10 +1808,10 @@ static int mov_reg64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
swreg stack_depth_reg;
 
stack_depth_reg = ur_load_imm_any(nfp_prog,
- nfp_prog->stack_depth,
+ nfp_prog->stack_frame_depth,
  stack_imm(nfp_prog));
-   emit_alu(nfp_prog, reg_both(dst),
-stack_reg(nfp_prog), ALU_OP_ADD, stack_depth_reg);
+   emit_alu(nfp_prog, reg_both(dst), stack_reg(nfp_prog),
+ALU_OP_ADD, stack_depth_reg);
wrp_immed(nfp_prog, reg_both(dst + 1), 0);
} else {
wrp_reg_mov(nfp_prog, dst, src);
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index 792ebc4081a3..7050535383b8 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -436,7 +436,7 @@ static inline bool is_mbpf_div(const struct nfp_insn_meta *meta)
  * @tgt_abort: jump target for abort (e.g. access outside of packet buffer)
  * @n_translated: number of successfully translated instructions (for errors)
  * @error: error code if something went wrong
- * @stack_depth: max stack depth from the verifier
+ * @stack_frame_depth: max stack depth for current frame
  * @adjust_head_location: if program has single adjust head call - the insn no.
  * @map_records_cnt: the number of map pointers recorded for this prog
  * @map_records: the map record pointers from bpf->maps_neutral
@@ -460,7 +460,7 @@ struct nfp_prog {
unsigned int n_translated;
int error;
 
-   unsigned int stack_depth;
+   unsigned int stack_frame_depth;
unsigned int adjust_head_location;
 
unsigned int map_records_cnt;
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
index 1ccd6371a15b..c9519bb00f8a 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
@@ -260,7 +260,7 @@ static int nfp_bpf_translate(struct nfp_net *nn, struct bpf_prog *prog)
prog->aux->stack_depth, stack_size);
return -EOPNOTSUPP;
}
-   nfp_prog->stack_depth = round_up(prog->aux->stack_depth, 4);
+   nfp_prog->stack_frame_depth = round_up(prog->aux->stack_depth, 4);
 
max_instr = nn_readw(nn, NFP_NET_CFG_BPF_MAX_LEN);
nfp_prog->__prog_alloc_len = max_instr * sizeof(u64);
-- 
2.7.4



[PATCH bpf-next 06/12] nfp: bpf: add main logics for BPF-to-BPF calls support in nfp driver

2018-10-07 Thread Quentin Monnet
This is the main patch for the logic of BPF-to-BPF calls in the nfp
driver.

The functions called on BPF_JMP | BPF_CALL and BPF_JMP | BPF_EXIT were
used to call helpers and exit from the program, respectively; make them
usable for calling into, or returning from, a BPF subprogram as well.

For all calls, push the return address as well as the callee-saved
registers (R6 to R9) to the stack, and pop them upon returning from the
calls. To limit the overhead in terms of instruction count, this is
done through dedicated subroutines. Jumping to the callee actually
consists of jumping to the subroutine, which "returns" to the callee;
this will require some fixup for passing the address in a later patch.
Similarly, returning consists of jumping to the subroutine, which pops
registers and then returns directly to the caller (no fixup is needed
here).

Return to the caller is performed with the RTN instruction newly added
to the JIT.

For the few steps where we need to know what subprogram an instruction
belongs to, the struct nfp_insn_meta is extended with a new subprog_idx
field.

Note that checks on the available stack size, to take into account the
additional requirements associated to BPF-to-BPF calls (storing R6-R9
and return addresses), are added in a later patch.

Signed-off-by: Quentin Monnet 
Signed-off-by: Jiong Wang 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/bpf/jit.c  | 235 +-
 drivers/net/ethernet/netronome/nfp/bpf/main.h |  20 ++
 drivers/net/ethernet/netronome/nfp/bpf/offload.c  |   1 -
 drivers/net/ethernet/netronome/nfp/bpf/verifier.c |  34 +++-
 drivers/net/ethernet/netronome/nfp/nfp_asm.h  |   9 +
 5 files changed, 295 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
index ccb80a5ac828..2d2c9148bd44 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
@@ -267,6 +267,38 @@ emit_br_bset(struct nfp_prog *nfp_prog, swreg src, u8 bit, u16 addr, u8 defer)
 }
 
 static void
+__emit_br_alu(struct nfp_prog *nfp_prog, u16 areg, u16 breg, u16 imm_hi,
+ u8 defer, bool dst_lmextn, bool src_lmextn)
+{
+   u64 insn;
+
+   insn = OP_BR_ALU_BASE |
+   FIELD_PREP(OP_BR_ALU_A_SRC, areg) |
+   FIELD_PREP(OP_BR_ALU_B_SRC, breg) |
+   FIELD_PREP(OP_BR_ALU_DEFBR, defer) |
+   FIELD_PREP(OP_BR_ALU_IMM_HI, imm_hi) |
+   FIELD_PREP(OP_BR_ALU_SRC_LMEXTN, src_lmextn) |
+   FIELD_PREP(OP_BR_ALU_DST_LMEXTN, dst_lmextn);
+
+   nfp_prog_push(nfp_prog, insn);
+}
+
+static void emit_rtn(struct nfp_prog *nfp_prog, swreg base, u8 defer)
+{
+   struct nfp_insn_ur_regs reg;
+   int err;
+
+   err = swreg_to_unrestricted(reg_none(), base, reg_imm(0), &reg);
+   if (err) {
+   nfp_prog->error = err;
+   return;
+   }
+
+   __emit_br_alu(nfp_prog, reg.areg, reg.breg, 0, defer, reg.dst_lmextn,
+ reg.src_lmextn);
+}
+
+static void
 __emit_immed(struct nfp_prog *nfp_prog, u16 areg, u16 breg, u16 imm_hi,
 enum immed_width width, bool invert,
 enum immed_shift shift, bool wr_both,
@@ -3081,7 +3113,73 @@ static int jne_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
return wrp_test_reg(nfp_prog, meta, ALU_OP_XOR, BR_BNE);
 }
 
-static int call(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+static int
+bpf_to_bpf_call(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+   u32 ret_tgt, stack_depth;
+   swreg tmp_reg;
+
+   stack_depth = round_up(nfp_prog->stack_frame_depth, STACK_FRAME_ALIGN);
+   /* Space for saving the return address is accounted for by the callee,
+* so stack_depth can be zero for the main function.
+*/
+   if (stack_depth) {
+   tmp_reg = ur_load_imm_any(nfp_prog, stack_depth,
+ stack_imm(nfp_prog));
+   emit_alu(nfp_prog, stack_reg(nfp_prog),
+stack_reg(nfp_prog), ALU_OP_ADD, tmp_reg);
+   emit_csr_wr(nfp_prog, stack_reg(nfp_prog),
+   NFP_CSR_ACT_LM_ADDR0);
+   }
+
+   /* The following steps are performed:
+* 1. Put the start offset of the callee into imm_b(). This will
+*require a fixup step, as we do not necessarily know this
+*address yet.
+* 2. Put the return address from the callee to the caller into
+*register ret_reg().
+* 3. (After defer slots are consumed) Jump to the subroutine that
+*pushes the registers to the stack.
+* The subroutine acts as a trampoline, and returns to the address in
+* imm_b(), i.e. jumps to the callee.
+*
+* Using ret_reg() to pass the return addre

[PATCH bpf-next 07/12] nfp: bpf: account for additional stack usage when checking stack limit

2018-10-07 Thread Quentin Monnet
Offloaded programs using BPF-to-BPF calls use the stack to store the
return address when calling into a subprogram. Callees also need some
space to save eBPF registers R6 to R9. And contrarily to kernel
verifier, we align stack frames on 64 bytes (and not 32). Account for
all this when checking the stack size limit before JIT-ing the program.
This means we have to recompute the maximum stack usage for the
program; we cannot simply reuse the value computed by the kernel.

In addition to adapting the checks on stack usage, move them to the
finalize() callback, now that we have it and because such checks are
part of the verification step rather than translation.

Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/bpf/offload.c  |  8 ---
 drivers/net/ethernet/netronome/nfp/bpf/verifier.c | 68 +++
 2 files changed, 68 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
index 2ebd13b29c97..49c7bead8113 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
@@ -252,17 +252,9 @@ nfp_bpf_verifier_prep(struct nfp_app *app, struct nfp_net *nn,
 static int nfp_bpf_translate(struct nfp_net *nn, struct bpf_prog *prog)
 {
struct nfp_prog *nfp_prog = prog->aux->offload->dev_priv;
-   unsigned int stack_size;
unsigned int max_instr;
int err;
 
-   stack_size = nn_readb(nn, NFP_NET_CFG_BPF_STACK_SZ) * 64;
-   if (prog->aux->stack_depth > stack_size) {
-   nn_info(nn, "stack too large: program %dB > FW stack %dB\n",
-   prog->aux->stack_depth, stack_size);
-   return -EOPNOTSUPP;
-   }
-
max_instr = nn_readw(nn, NFP_NET_CFG_BPF_MAX_LEN);
nfp_prog->__prog_alloc_len = max_instr * sizeof(u64);
 
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
index cc1b2c601f4e..81a463726d55 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
@@ -34,10 +34,12 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "../nfp_app.h"
 #include "../nfp_main.h"
+#include "../nfp_net.h"
 #include "fw.h"
 #include "main.h"
 
@@ -662,10 +664,67 @@ nfp_assign_subprog_idx(struct bpf_verifier_env *env, struct nfp_prog *nfp_prog)
return 0;
 }
 
+static unsigned int
+nfp_bpf_get_stack_usage(struct nfp_prog *nfp_prog, unsigned int cnt)
+{
+   struct nfp_insn_meta *meta = nfp_prog_first_meta(nfp_prog);
+   unsigned int max_depth = 0, depth = 0, frame = 0;
+   struct nfp_insn_meta *ret_insn[MAX_CALL_FRAMES];
+   unsigned short frame_depths[MAX_CALL_FRAMES];
+   unsigned short ret_prog[MAX_CALL_FRAMES];
+   unsigned short idx = meta->subprog_idx;
+
+   /* Inspired from check_max_stack_depth() from kernel verifier.
+* Starting from main subprogram, walk all instructions and recursively
+* walk all callees that given subprogram can call. Since recursion is
+* prevented by the kernel verifier, this algorithm only needs a local
+* stack of MAX_CALL_FRAMES to remember callsites.
+*/
+process_subprog:
+   frame_depths[frame] = nfp_prog->subprog[idx].stack_depth;
+   frame_depths[frame] = round_up(frame_depths[frame], STACK_FRAME_ALIGN);
+   depth += frame_depths[frame];
+   max_depth = max(max_depth, depth);
+
+continue_subprog:
+   for (; meta != nfp_prog_last_meta(nfp_prog) && meta->subprog_idx == idx;
+meta = nfp_meta_next(meta)) {
+   if (!is_mbpf_pseudo_call(meta))
+   continue;
+
+   /* We found a call to a subprogram. Remember instruction to
+* return to and subprog id.
+*/
+   ret_insn[frame] = nfp_meta_next(meta);
+   ret_prog[frame] = idx;
+
+   /* Find the callee and start processing it. */
+   meta = nfp_bpf_goto_meta(nfp_prog, meta,
+meta->n + 1 + meta->insn.imm, cnt);
+   idx = meta->subprog_idx;
+   frame++;
+   goto process_subprog;
+   }
+   /* End of for() loop means the last instruction of the subprog was
+* reached. If we popped all stack frames, return; otherwise, go on
+* processing remaining instructions from the caller.
+*/
+   if (frame == 0)
+   return max_depth;
+
+   depth -= frame_depths[frame];
+   frame--;
+   meta = ret_insn[frame];
+   idx = ret_prog[frame];
+   goto continue_subprog;
+}
+
 static int nfp_bpf_finalize(struct bpf_verifier_env *env)
 {
+   unsigned int stack_size, stack_needed;

Re: [PATCH bpf] tools: bpftool: return from do_event_pipe() on bad arguments

2018-08-23 Thread Quentin Monnet
2018-08-23 20:35 UTC+0300 ~ Sergei Shtylyov

> Hello!
> 
> On 08/23/2018 07:46 PM, Quentin Monnet wrote:
> 
>> When command line parsing fails in the while loop in do_event_pipe()
>> because the number of arguments is incorrect or because the keyword is
>> unknown, an error message is displayed, but bpfool
> 
>bp-who? ;-)
> 
>> remains stucked in
> 
>Stuck.
> 
>> the loop. Make sure we exit the loop upon failure.
>>
>> Fixes: f412eed9dfde ("tools: bpftool: add simple perf event output reader")
>> Signed-off-by: Quentin Monnet 
>> Reviewed-by: Jakub Kicinski 
> [...]
> 
> MBR, Sergei

Thanks Sergei! The patch has been applied so I cannot fix these, but
I'll make sure to give an additional pass to my future commit logs…

Best,
Quentin


[PATCH bpf] tools: bpftool: return from do_event_pipe() on bad arguments

2018-08-23 Thread Quentin Monnet
When command line parsing fails in the while loop in do_event_pipe()
because the number of arguments is incorrect or because the keyword is
unknown, an error message is displayed, but bpfool remains stucked in
the loop. Make sure we exit the loop upon failure.

Fixes: f412eed9dfde ("tools: bpftool: add simple perf event output reader")
Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 tools/bpf/bpftool/map_perf_ring.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/bpf/bpftool/map_perf_ring.c b/tools/bpf/bpftool/map_perf_ring.c
index 1832100d1b27..6d41323be291 100644
--- a/tools/bpf/bpftool/map_perf_ring.c
+++ b/tools/bpf/bpftool/map_perf_ring.c
@@ -194,8 +194,10 @@ int do_event_pipe(int argc, char **argv)
}
 
while (argc) {
-   if (argc < 2)
+   if (argc < 2) {
BAD_ARG();
+   goto err_close_map;
+   }
 
if (is_prefix(*argv, "cpu")) {
char *endptr;
@@ -221,6 +223,7 @@ int do_event_pipe(int argc, char **argv)
NEXT_ARG();
} else {
BAD_ARG();
+   goto err_close_map;
}
 
do_all = false;
-- 
2.14.1



[PATCH bpf-next 1/3] bpf: fix documentation for eBPF helpers

2018-07-12 Thread Quentin Monnet
Minor formatting edits for the eBPF helpers documentation, including
removal of spurious blank lines, a fix to the item list for return
values of bpf_fib_lookup(), and the missing "bpf_" prefix on
bpf_skb_load_bytes_relative().

Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 include/uapi/linux/bpf.h | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index b7db3261c62d..6bcb287a888d 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1826,7 +1826,7 @@ union bpf_attr {
  * A non-negative value equal to or less than *size* on success,
  * or a negative error in case of failure.
  *
- * int skb_load_bytes_relative(const struct sk_buff *skb, u32 offset, void *to, u32 len, u32 start_header)
+ * int bpf_skb_load_bytes_relative(const struct sk_buff *skb, u32 offset, void *to, u32 len, u32 start_header)
  * Description
  * This helper is similar to **bpf_skb_load_bytes**\ () in that
  * it provides an easy way to load *len* bytes from *offset*
@@ -1877,7 +1877,7 @@ union bpf_attr {
  * * < 0 if any input argument is invalid
  * *   0 on success (packet is forwarded, nexthop neighbor exists)
  * * > 0 one of **BPF_FIB_LKUP_RET_** codes explaining why the
- * * packet is not forwarded or needs assist from full stack
+ *   packet is not forwarded or needs assist from full stack
  *
 * int bpf_sock_hash_update(struct bpf_sock_ops_kern *skops, struct bpf_map *map, void *key, u64 flags)
  * Description
@@ -2033,7 +2033,6 @@ union bpf_attr {
  * This helper is only available is the kernel was compiled with
  * the **CONFIG_BPF_LIRC_MODE2** configuration option set to
  * "**y**".
- *
  * Return
  * 0
  *
@@ -2053,7 +2052,6 @@ union bpf_attr {
  * This helper is only available is the kernel was compiled with
  * the **CONFIG_BPF_LIRC_MODE2** configuration option set to
  * "**y**".
- *
  * Return
  * 0
  *
-- 
2.14.1



[PATCH bpf-next 2/3] tools: bpf: synchronise BPF UAPI header with tools

2018-07-12 Thread Quentin Monnet
Update with latest changes from include/uapi/linux/bpf.h header.

Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 tools/include/uapi/linux/bpf.h | 32 
 1 file changed, 24 insertions(+), 8 deletions(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 59b19b6a40d7..6bcb287a888d 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1826,7 +1826,7 @@ union bpf_attr {
  * A non-negative value equal to or less than *size* on success,
  * or a negative error in case of failure.
  *
- * int skb_load_bytes_relative(const struct sk_buff *skb, u32 offset, void *to, u32 len, u32 start_header)
+ * int bpf_skb_load_bytes_relative(const struct sk_buff *skb, u32 offset, void *to, u32 len, u32 start_header)
  * Description
  * This helper is similar to **bpf_skb_load_bytes**\ () in that
  * it provides an easy way to load *len* bytes from *offset*
@@ -1857,7 +1857,8 @@ union bpf_attr {
  * is resolved), the nexthop address is returned in ipv4_dst
  * or ipv6_dst based on family, smac is set to mac address of
  * egress device, dmac is set to nexthop mac address, rt_metric
- * is set to metric from route (IPv4/IPv6 only).
+ * is set to metric from route (IPv4/IPv6 only), and ifindex
+ * is set to the device index of the nexthop from the FIB lookup.
  *
  * *plen* argument is the size of the passed in struct.
  * *flags* argument can be a combination of one or more of the
@@ -1873,9 +1874,10 @@ union bpf_attr {
  * *ctx* is either **struct xdp_md** for XDP programs or
  * **struct sk_buff** tc cls_act programs.
  * Return
- * Egress device index on success, 0 if packet needs to continue
- * up the stack for further processing or a negative error in case
- * of failure.
+ * * < 0 if any input argument is invalid
+ * *   0 on success (packet is forwarded, nexthop neighbor exists)
+ * * > 0 one of **BPF_FIB_LKUP_RET_** codes explaining why the
+ *   packet is not forwarded or needs assist from full stack
  *
  * int bpf_sock_hash_update(struct bpf_sock_ops_kern *skops, struct bpf_map *map, void *key, u64 flags)
  * Description
@@ -2031,7 +2033,6 @@ union bpf_attr {
  * This helper is only available is the kernel was compiled with
  * the **CONFIG_BPF_LIRC_MODE2** configuration option set to
  * "**y**".
- *
  * Return
  * 0
  *
@@ -2051,7 +2052,6 @@ union bpf_attr {
  * This helper is only available is the kernel was compiled with
  * the **CONFIG_BPF_LIRC_MODE2** configuration option set to
  * "**y**".
- *
  * Return
  * 0
  *
@@ -2612,6 +2612,18 @@ struct bpf_raw_tracepoint_args {
 #define BPF_FIB_LOOKUP_DIRECT  BIT(0)
 #define BPF_FIB_LOOKUP_OUTPUT  BIT(1)
 
+enum {
+   BPF_FIB_LKUP_RET_SUCCESS,  /* lookup successful */
+   BPF_FIB_LKUP_RET_BLACKHOLE,/* dest is blackholed; can be dropped */
+   BPF_FIB_LKUP_RET_UNREACHABLE,  /* dest is unreachable; can be dropped */
+   BPF_FIB_LKUP_RET_PROHIBIT, /* dest not allowed; can be dropped */
+   BPF_FIB_LKUP_RET_NOT_FWDED,/* packet is not forwarded */
+   BPF_FIB_LKUP_RET_FWD_DISABLED, /* fwding is not enabled on ingress */
+   BPF_FIB_LKUP_RET_UNSUPP_LWT,   /* fwd requires encapsulation */
+   BPF_FIB_LKUP_RET_NO_NEIGH, /* no neighbor entry for nh */
+   BPF_FIB_LKUP_RET_FRAG_NEEDED,  /* fragmentation required to fwd */
+};
+
 struct bpf_fib_lookup {
/* input:  network family for lookup (AF_INET, AF_INET6)
 * output: network family of egress nexthop
@@ -2625,7 +2637,11 @@ struct bpf_fib_lookup {
 
/* total length of packet from network header - used for MTU check */
__u16   tot_len;
-   __u32   ifindex;  /* L3 device index for lookup */
+
+   /* input: L3 device index for lookup
+* output: device index from FIB lookup
+*/
+   __u32   ifindex;
 
union {
/* inputs to lookup */
-- 
2.14.1
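
The revised return contract for **bpf_fib_lookup**\ () (negative for an invalid
argument, 0 when the packet can be forwarded, positive **BPF_FIB_LKUP_RET_**
reason codes otherwise) can be modelled outside the kernel. The sketch below is
illustrative only, not part of the patch: a real consumer would be an XDP
program in restricted C, redirecting via ``params.ifindex`` on success. The
drop set follows the "can be dropped" comments in the enum above.

```python
# Illustrative model of the bpf_fib_lookup() return contract from the
# patch above. Numeric values mirror the BPF_FIB_LKUP_RET_* enum order.
BPF_FIB_LKUP_RET_SUCCESS = 0       # lookup successful, nexthop neighbor exists
BPF_FIB_LKUP_RET_BLACKHOLE = 1     # dest is blackholed; can be dropped
BPF_FIB_LKUP_RET_UNREACHABLE = 2   # dest is unreachable; can be dropped
BPF_FIB_LKUP_RET_PROHIBIT = 3      # dest not allowed; can be dropped
BPF_FIB_LKUP_RET_NOT_FWDED = 4     # packet is not forwarded
BPF_FIB_LKUP_RET_FWD_DISABLED = 5  # forwarding not enabled on ingress
BPF_FIB_LKUP_RET_UNSUPP_LWT = 6    # forwarding requires encapsulation
BPF_FIB_LKUP_RET_NO_NEIGH = 7      # no neighbor entry for nexthop
BPF_FIB_LKUP_RET_FRAG_NEEDED = 8   # fragmentation required to forward

# Codes for which the destination is known-bad and the packet may be dropped.
DROP_CODES = {
    BPF_FIB_LKUP_RET_BLACKHOLE,
    BPF_FIB_LKUP_RET_UNREACHABLE,
    BPF_FIB_LKUP_RET_PROHIBIT,
}

def xdp_verdict(rc):
    """Map a bpf_fib_lookup() return value to an XDP action name."""
    if rc < 0:
        return "XDP_ABORTED"       # invalid input argument
    if rc == BPF_FIB_LKUP_RET_SUCCESS:
        return "XDP_REDIRECT"      # forward via the returned ifindex
    if rc in DROP_CODES:
        return "XDP_DROP"
    return "XDP_PASS"              # needs assist from the full stack
```

Dropping on the blackhole/unreachable/prohibit codes is one plausible policy;
a program could equally pass everything non-zero up the stack.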



[PATCH bpf-next 0/3] bpf: install eBPF helper man page along with bpftool doc

2018-07-12 Thread Quentin Monnet
The three patches in this series are related to the documentation for eBPF
helpers. The first patch brings minor formatting edits to the documentation
in include/uapi/linux/bpf.h, and the second one updates the related header
file under tools/.

The third patch adds a Makefile under tools/bpf for generating the
documentation (man pages) about eBPF helpers. The targets defined in this
file can also be called from the bpftool directory (please refer to
relevant commit logs for details).

Quentin Monnet (3):
  bpf: fix documentation for eBPF helpers
  tools: bpf: synchronise BPF UAPI header with tools
  tools: bpf: build and install man page for eBPF helpers from bpftool/

 include/uapi/linux/bpf.h |  6 ++--
 tools/bpf/Makefile.helpers   | 59 
 tools/bpf/bpftool/Documentation/Makefile | 13 ---
 tools/include/uapi/linux/bpf.h   | 32 -
 4 files changed, 93 insertions(+), 17 deletions(-)
 create mode 100644 tools/bpf/Makefile.helpers

-- 
2.14.1



[PATCH bpf-next 3/3] tools: bpf: build and install man page for eBPF helpers from bpftool/

2018-07-12 Thread Quentin Monnet
Provide a new Makefile.helpers in tools/bpf, in order to build and
install the man page for eBPF helpers. This Makefile is also included in
the one used to build bpftool documentation, so that it can be called
either on its own (cd tools/bpf && make -f Makefile.helpers) or from
bpftool directory (cd tools/bpf/bpftool && make doc, or
cd tools/bpf/bpftool/Documentation && make helpers).

Makefile.helpers is not added directly to bpftool to avoid changing its
Makefile too much (helpers are not 100% directly related with bpftool).
But the possibility to build the page from bpftool directory makes us
able to package the helpers man page with bpftool, and to install it
along with bpftool documentation, so that the doc for helpers becomes
easily available to developers through the "man" program.

Cc: linux-...@vger.kernel.org
Suggested-by: Daniel Borkmann 
Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 tools/bpf/Makefile.helpers   | 59 
 tools/bpf/bpftool/Documentation/Makefile | 13 ---
 2 files changed, 67 insertions(+), 5 deletions(-)
 create mode 100644 tools/bpf/Makefile.helpers

diff --git a/tools/bpf/Makefile.helpers b/tools/bpf/Makefile.helpers
new file mode 100644
index ..c34fea77f39f
--- /dev/null
+++ b/tools/bpf/Makefile.helpers
@@ -0,0 +1,59 @@
+ifndef allow-override
+  include ../scripts/Makefile.include
+  include ../scripts/utilities.mak
+else
+  # Assume Makefile.helpers is being run from bpftool/Documentation
+  # subdirectory. Go up two more directories to fetch bpf.h header and
+  # associated script.
+  UP2DIR := ../../
+endif
+
+INSTALL ?= install
+RM ?= rm -f
+RMDIR ?= rmdir --ignore-fail-on-non-empty
+
+ifeq ($(V),1)
+  Q =
+else
+  Q = @
+endif
+
+prefix ?= /usr/local
+mandir ?= $(prefix)/man
+man7dir = $(mandir)/man7
+
+HELPERS_RST = bpf-helpers.rst
+MAN7_RST = $(HELPERS_RST)
+
+_DOC_MAN7 = $(patsubst %.rst,%.7,$(MAN7_RST))
+DOC_MAN7 = $(addprefix $(OUTPUT),$(_DOC_MAN7))
+
+helpers: man7
+man7: $(DOC_MAN7)
+
+RST2MAN_DEP := $(shell command -v rst2man 2>/dev/null)
+
+$(OUTPUT)$(HELPERS_RST): $(UP2DIR)../../include/uapi/linux/bpf.h
+   $(QUIET_GEN)$(UP2DIR)../../scripts/bpf_helpers_doc.py --filename $< > $@
+
+$(OUTPUT)%.7: $(OUTPUT)%.rst
+ifndef RST2MAN_DEP
+   $(error "rst2man not found, but required to generate man pages")
+endif
+   $(QUIET_GEN)rst2man $< > $@
+
+helpers-clean:
+   $(call QUIET_CLEAN, eBPF_helpers-manpage)
+   $(Q)$(RM) $(DOC_MAN7) $(OUTPUT)$(HELPERS_RST)
+
+helpers-install: helpers
+   $(call QUIET_INSTALL, eBPF_helpers-manpage)
+   $(Q)$(INSTALL) -d -m 755 $(DESTDIR)$(man7dir)
+   $(Q)$(INSTALL) -m 644 $(DOC_MAN7) $(DESTDIR)$(man7dir)
+
+helpers-uninstall:
+   $(call QUIET_UNINST, eBPF_helpers-manpage)
+   $(Q)$(RM) $(addprefix $(DESTDIR)$(man7dir)/,$(_DOC_MAN7))
+   $(Q)$(RMDIR) $(DESTDIR)$(man7dir)
+
+.PHONY: helpers helpers-clean helpers-install helpers-uninstall
diff --git a/tools/bpf/bpftool/Documentation/Makefile b/tools/bpf/bpftool/Documentation/Makefile
index a9d47c1558bb..f7663a3e60c9 100644
--- a/tools/bpf/bpftool/Documentation/Makefile
+++ b/tools/bpf/bpftool/Documentation/Makefile
@@ -15,12 +15,15 @@ prefix ?= /usr/local
 mandir ?= $(prefix)/man
 man8dir = $(mandir)/man8
 
-MAN8_RST = $(wildcard *.rst)
+# Load targets for building eBPF helpers man page.
+include ../../Makefile.helpers
+
+MAN8_RST = $(filter-out $(HELPERS_RST),$(wildcard *.rst))
 
 _DOC_MAN8 = $(patsubst %.rst,%.8,$(MAN8_RST))
 DOC_MAN8 = $(addprefix $(OUTPUT),$(_DOC_MAN8))
 
-man: man8
+man: man8 helpers
 man8: $(DOC_MAN8)
 
 RST2MAN_DEP := $(shell command -v rst2man 2>/dev/null)
@@ -31,16 +34,16 @@ ifndef RST2MAN_DEP
 endif
$(QUIET_GEN)rst2man $< > $@
 
-clean:
+clean: helpers-clean
$(call QUIET_CLEAN, Documentation)
$(Q)$(RM) $(DOC_MAN8)
 
-install: man
+install: man helpers-install
$(call QUIET_INSTALL, Documentation-man)
$(Q)$(INSTALL) -d -m 755 $(DESTDIR)$(man8dir)
$(Q)$(INSTALL) -m 644 $(DOC_MAN8) $(DESTDIR)$(man8dir)
 
-uninstall:
+uninstall: helpers-uninstall
$(call QUIET_UNINST, Documentation-man)
$(Q)$(RM) $(addprefix $(DESTDIR)$(man8dir)/,$(_DOC_MAN8))
$(Q)$(RMDIR) $(DESTDIR)$(man8dir)
-- 
2.14.1



Re: [PATCH bpf-net] bpf: Change bpf_fib_lookup to return lookup status

2018-06-19 Thread Quentin Monnet
Hi David,

2018-06-17 08:18 UTC-0700 ~ dsah...@kernel.org
> From: David Ahern 
> 
> For ACLs implemented using either FIB rules or FIB entries, the BPF
> program needs the FIB lookup status to be able to drop the packet.
> Since the bpf_fib_lookup API has not reached a released kernel yet,
> change the return code to contain an encoding of the FIB lookup
> result and return the nexthop device index in the params struct.
> 
> In addition, inform the BPF program of any post FIB lookup reason as
> to why the packet needs to go up the stack.
> 
> Update the sample program per the change in API.
> 
> Signed-off-by: David Ahern 
> ---
>  include/uapi/linux/bpf.h   | 28 ++
>  net/core/filter.c  | 74 
> --
>  samples/bpf/xdp_fwd_kern.c |  8 ++---
>  3 files changed, 78 insertions(+), 32 deletions(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 59b19b6a40d7..ceb80071c341 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -1857,7 +1857,8 @@ union bpf_attr {
>   *   is resolved), the nexthop address is returned in ipv4_dst
>   *   or ipv6_dst based on family, smac is set to mac address of
>   *   egress device, dmac is set to nexthop mac address, rt_metric
> - *   is set to metric from route (IPv4/IPv6 only).
> + *   is set to metric from route (IPv4/IPv6 only), and ifindex
> + *   is set to the device index of the nexthop from the FIB lookup.
>   *
>   * *plen* argument is the size of the passed in struct.
>   * *flags* argument can be a combination of one or more of the
> @@ -1873,9 +1874,9 @@ union bpf_attr {
>   * *ctx* is either **struct xdp_md** for XDP programs or
>   * **struct sk_buff** tc cls_act programs.
>   * Return
> - * Egress device index on success, 0 if packet needs to continue
> - * up the stack for further processing or a negative error in 
> case
> - * of failure.
> + *   < 0 if any input argument is invalid
> + * 0 on success (packet is forwarded and nexthop neighbor exists)
> + *   > 0 one of BPF_FIB_LKUP_RET_ codes on FIB lookup response


Since you are about to respin (I think?), could you please also fix the
formatting in your change to the doc? The "BPF_FIB_LKUP_RET_" is not
emphasized (and will even cause an error message when producing the man
page, because of the trailing underscore that gets interpreted in RST),
and the three cases for the return value are not formatted properly for
the conversion.

Something like the following would work:

---
 * Return
 *  * < 0 if any input argument is invalid.
 *  *   0 on success (packet is forwarded and nexthop neighbor exists).
 *  * > 0: one of **BPF_FIB_LKUP_RET_** codes on FIB lookup response.
---

Thank you,
Quentin


Re: [PATCH bpf-next] bpf: clean up eBPF helpers documentation

2018-05-30 Thread Quentin Monnet
On 29 May 2018 at 20:44, Daniel Borkmann  wrote:
> On 05/29/2018 08:27 PM, Song Liu wrote:
>> On Tue, May 29, 2018 at 4:27 AM, Quentin Monnet
>>  wrote:
>>> These are minor edits for the eBPF helpers documentation in
>>> include/uapi/linux/bpf.h.
>>>
>>> The main fix consists in removing "BPF_FIB_LOOKUP_", because it ends
>>> with a non-escaped underscore that gets interpreted by rst2man and
>>> produces the following message in the resulting manual page:
>>>
>>> DOCUTILS SYSTEM MESSAGES
>>>System Message: ERROR/3 (/tmp/bpf-helpers.rst:, line 1514)
>>>   Unknown target name: "bpf_fib_lookup".
>>>
>>> Other edits consist in:
>>>
>>> - Improving formatting for flag values for "bpf_fib_lookup()" helper.
>>> - Emphasising a parameter name in description of the return value for
>>>   "bpf_get_stack()" helper.
>>> - Removing unnecessary blank lines between "Description" and "Return"
>>>   sections for the few helpers that would use it, for consistency.
>>>
>>> Signed-off-by: Quentin Monnet 
> [...]
>>
>> Please also apply the same changes to tools/include/uapi/linux/bpf.h.

Ah, true, I forgot it... Thanks for the reminder.

> Just did while applying to bpf-next, thanks guys!
>
>> Other than this, it looks to me.
>>
>> Acked-by: Song Liu 

Thanks a lot Song, Daniel!
Quentin


Re: [PATCH bpf-next 06/11] bpf: add bpf_skb_cgroup_id helper

2018-05-29 Thread Quentin Monnet
Hi Daniel,

2018-05-28 02:43 UTC+0200 ~ Daniel Borkmann 
> Add a new bpf_skb_cgroup_id() helper that allows to retrieve the
> cgroup id from the skb's socket. This is useful in particular to
> enable bpf_get_cgroup_classid()-like behavior for cgroup v1 in
> cgroup v2 by allowing ID based matching on egress. This can in
> particular be used in combination with applying policy e.g. from
> map lookups, and also complements the older bpf_skb_under_cgroup()
> interface. In user space the cgroup id for a given path can be
> retrieved through the f_handle as demonstrated in [0] recently.
> 
>   [0] https://lkml.org/lkml/2018/5/22/1190
> 
> Signed-off-by: Daniel Borkmann 
> Acked-by: Alexei Starovoitov 
> ---
>  include/uapi/linux/bpf.h | 17 -
>  net/core/filter.c| 29 +++--
>  2 files changed, 43 insertions(+), 3 deletions(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 9b8c6e3..e2853aa 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2004,6 +2004,20 @@ union bpf_attr {
>   *   direct packet access.
>   *   Return
>   *   0 on success, or a negative error in case of failure.
> + *
> + * uint64_t bpf_skb_cgroup_id(struct sk_buff *skb)
> + *   Description
> + *   Return the cgroup v2 id of the socket associated with the *skb*.
> + *   This is roughly similar to the **bpf_get_cgroup_classid**\ ()
> + *   helper for cgroup v1 by providing a tag resp. identifier that
> + *   can be matched on or used for map lookups e.g. to implement
> + *   policy. The cgroup v2 id of a given path in the hierarchy is
> + *   exposed in user space through the f_handle API in order to get
> + *   to the same 64-bit id.
> + *
> + *   This helper can be used on TC egress path, but not on ingress.

Nitpick: Maybe mention that the kernel must be built with
CONFIG_SOCK_CGROUP_DATA option for the helper to be available?

Best,
Quentin


> + *   Return
> + *   The id is returned or 0 in case the id could not be retrieved.
>   */
>  #define __BPF_FUNC_MAPPER(FN)\
>   FN(unspec), \
> @@ -2082,7 +2096,8 @@ union bpf_attr {
>   FN(lwt_push_encap), \
>   FN(lwt_seg6_store_bytes),   \
>   FN(lwt_seg6_adjust_srh),\
> - FN(lwt_seg6_action),
> + FN(lwt_seg6_action),\
> + FN(skb_cgroup_id),
>  
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>   * function eBPF program intends to call
> diff --git a/net/core/filter.c b/net/core/filter.c
> index acf1f4f..717c740 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -3661,6 +3661,27 @@ static const struct bpf_func_proto 
> bpf_skb_under_cgroup_proto = {
>   .arg3_type  = ARG_ANYTHING,
>  };
>  
> +#ifdef CONFIG_SOCK_CGROUP_DATA
> +BPF_CALL_1(bpf_skb_cgroup_id, const struct sk_buff *, skb)
> +{
> + struct sock *sk = skb_to_full_sk(skb);
> + struct cgroup *cgrp;
> +
> + if (!sk || !sk_fullsock(sk))
> + return 0;
> +
> + cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
> + return cgrp->kn->id.id;
> +}
> +
> +static const struct bpf_func_proto bpf_skb_cgroup_id_proto = {
> + .func   = bpf_skb_cgroup_id,
> + .gpl_only   = false,
> + .ret_type   = RET_INTEGER,
> + .arg1_type  = ARG_PTR_TO_CTX,
> +};
> +#endif
> +
>  static unsigned long bpf_xdp_copy(void *dst_buff, const void *src_buff,
> unsigned long off, unsigned long len)
>  {
> @@ -4741,12 +4762,16 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const 
> struct bpf_prog *prog)
>   return &bpf_get_socket_cookie_proto;
>   case BPF_FUNC_get_socket_uid:
>   return &bpf_get_socket_uid_proto;
> + case BPF_FUNC_fib_lookup:
> + return &bpf_skb_fib_lookup_proto;
>  #ifdef CONFIG_XFRM
>   case BPF_FUNC_skb_get_xfrm_state:
>   return &bpf_skb_get_xfrm_state_proto;
>  #endif
> - case BPF_FUNC_fib_lookup:
> - return &bpf_skb_fib_lookup_proto;
> +#ifdef CONFIG_SOCK_CGROUP_DATA
> + case BPF_FUNC_skb_cgroup_id:
> + return &bpf_skb_cgroup_id_proto;
> +#endif
>   default:
>   return bpf_base_func_proto(func_id);
>   }
> 



[PATCH bpf-next] bpf: clean up eBPF helpers documentation

2018-05-29 Thread Quentin Monnet
These are minor edits for the eBPF helpers documentation in
include/uapi/linux/bpf.h.

The main fix consists in removing "BPF_FIB_LOOKUP_", because it ends
with a non-escaped underscore that gets interpreted by rst2man and
produces the following message in the resulting manual page:

DOCUTILS SYSTEM MESSAGES
   System Message: ERROR/3 (/tmp/bpf-helpers.rst:, line 1514)
  Unknown target name: "bpf_fib_lookup".

Other edits consist in:

- Improving formatting for flag values for "bpf_fib_lookup()" helper.
- Emphasising a parameter name in description of the return value for
  "bpf_get_stack()" helper.
- Removing unnecessary blank lines between "Description" and "Return"
  sections for the few helpers that would use it, for consistency.

Signed-off-by: Quentin Monnet 
---
 include/uapi/linux/bpf.h | 21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index cc68787f2d97..3f556b35ac8d 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1010,7 +1010,6 @@ union bpf_attr {
  * ::
  *
  * # sysctl kernel.perf_event_max_stack=
- *
  * Return
  * The positive or null stack id on success, or a negative error
  * in case of failure.
@@ -1821,10 +1820,9 @@ union bpf_attr {
  * ::
  *
  * # sysctl kernel.perf_event_max_stack=
- *
  * Return
- * a non-negative value equal to or less than size on success, or
- * a negative error in case of failure.
+ * A non-negative value equal to or less than *size* on success,
+ * or a negative error in case of failure.
  *
  * int skb_load_bytes_relative(const struct sk_buff *skb, u32 offset, void *to, u32 len, u32 start_header)
  * Description
@@ -1845,7 +1843,6 @@ union bpf_attr {
  * in socket filters where *skb*\ **->data** does not always point
  * to the start of the mac header and where "direct packet access"
  * is not available.
- *
  * Return
  * 0 on success, or a negative error in case of failure.
  *
@@ -1861,16 +1858,18 @@ union bpf_attr {
  * rt_metric is set to metric from route.
  *
  * *plen* argument is the size of the passed in struct.
- * *flags* argument can be one or more BPF_FIB_LOOKUP_ flags:
+ * *flags* argument can be a combination of one or more of the
+ * following values:
  *
- * **BPF_FIB_LOOKUP_DIRECT** means do a direct table lookup vs
- * full lookup using FIB rules
- * **BPF_FIB_LOOKUP_OUTPUT** means do lookup from an egress
- * perspective (default is ingress)
+ * **BPF_FIB_LOOKUP_DIRECT**
+ * Do a direct table lookup vs full lookup using FIB
+ * rules.
+ * **BPF_FIB_LOOKUP_OUTPUT**
+ * Perform lookup from an egress perspective (default is
+ * ingress).
  *
  * *ctx* is either **struct xdp_md** for XDP programs or
  * **struct sk_buff** tc cls_act programs.
- *
  * Return
  * Egress device index on success, 0 if packet needs to continue
  * up the stack for further processing or a negative error in case
  * of failure.
-- 
2.14.1
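
The trailing-underscore problem described in the commit message can be caught
mechanically: in reStructuredText, a bare word ending in ``_`` is an anonymous
reference, so ``BPF_FIB_LOOKUP_`` becomes the unknown target reported by
rst2man. A crude, hypothetical lint over a doc-comment line, not part of the
patch or of scripts/bpf_helpers_doc.py, might look like this:

```python
import re

# Flag bare words ending in '_' that rst2man would treat as reference
# targets. Words wrapped in **bold** markup are left alone (the '*'
# immediately before the word blocks the match, and the trailing '**'
# defeats the whitespace lookahead).
REF_RE = re.compile(r'(?<!\*)\b([A-Za-z0-9]\w*_)(?=\s|$)')

def dangling_rst_refs(line):
    """Return the RST 'words' in *line* that end in a bare underscore."""
    return REF_RE.findall(line)
```

Running this over the old text flags the offending token, while the fixed,
bold-marked constants pass clean. It is deliberately crude: it ignores
underscores followed by punctuation and inline-literal markup.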



Re: [PATCH v3 2/2] bpf: add selftest for rawir_event type program

2018-05-18 Thread Quentin Monnet
2018-05-18 14:33 UTC+0100 ~ Sean Young <s...@mess.org>
> On Fri, May 18, 2018 at 11:13:07AM +0100, Quentin Monnet wrote:
>> 2018-05-17 22:01 UTC+0100 ~ Sean Young <s...@mess.org>
>>> On Thu, May 17, 2018 at 10:17:59AM -0700, Y Song wrote:
>>>> On Wed, May 16, 2018 at 2:04 PM, Sean Young <s...@mess.org> wrote:
>>>>> This is simple test over rc-loopback.
>>>>>
>>>>> Signed-off-by: Sean Young <s...@mess.org>
>>>>> ---
>>>>>  tools/bpf/bpftool/prog.c  |   1 +
>>>>>  tools/include/uapi/linux/bpf.h|  57 +++-
>>>>>  tools/lib/bpf/libbpf.c|   1 +
>>>>>  tools/testing/selftests/bpf/Makefile  |   8 +-
>>>>>  tools/testing/selftests/bpf/bpf_helpers.h |   6 +
>>>>>  tools/testing/selftests/bpf/test_rawir.sh |  37 +
>>>>>  .../selftests/bpf/test_rawir_event_kern.c |  26 
>>>>>  .../selftests/bpf/test_rawir_event_user.c | 130 ++
>>>>>  8 files changed, 261 insertions(+), 5 deletions(-)
>>>>>  create mode 100755 tools/testing/selftests/bpf/test_rawir.sh
>>>>>  create mode 100644 tools/testing/selftests/bpf/test_rawir_event_kern.c
>>>>>  create mode 100644 tools/testing/selftests/bpf/test_rawir_event_user.c
>>
>> [...]
>>
>>>> Most people probably not really familiar with lircN device. It would be
>>>> good to provide more information about how to enable this, e.g.,
>>>>   CONFIG_RC_CORE=y
>>>>   CONFIG_BPF_RAWIR_EVENT=y
>>>>   CONFIG_RC_LOOPBACK=y
>>>>   ..
>>>
>>> Good point. I'll add some words explaining what is and how to make it work.
>>>
>>> Thanks
>>> Sean
>>
>>
>> By the way, shouldn't the two eBPF helpers bpf_rc_keydown() and
>> bpf_rc_repeat() be compiled out in patch 1 if e.g.
>> CONFIG_BPF_RAWIR_EVENT is not set? There are some other helpers that are
>> compiled only if relevant config options are set (bpf_get_xfrm_state()
>> for example).
> 
> So if CONFIG_BPF_RAWIR_EVENT is not set, then bpf-rawir-event.c is not
> compiled. Stubs are created in include/linux/bpf_rcdev.h, so this is
> already the case if I understand your correctly.

This is correct, sorry for the mistake.

>> (If you were to change that, please also update helper documentations to
>> indicate what configuration options are required to be able to use the
>> helpers.)
> 
> Ok, I'll add that.
Thanks a lot!

Quentin


Re: [PATCH v3 2/2] bpf: add selftest for rawir_event type program

2018-05-18 Thread Quentin Monnet
2018-05-17 22:01 UTC+0100 ~ Sean Young 
> On Thu, May 17, 2018 at 10:17:59AM -0700, Y Song wrote:
>> On Wed, May 16, 2018 at 2:04 PM, Sean Young  wrote:
>>> This is simple test over rc-loopback.
>>>
>>> Signed-off-by: Sean Young 
>>> ---
>>>  tools/bpf/bpftool/prog.c  |   1 +
>>>  tools/include/uapi/linux/bpf.h|  57 +++-
>>>  tools/lib/bpf/libbpf.c|   1 +
>>>  tools/testing/selftests/bpf/Makefile  |   8 +-
>>>  tools/testing/selftests/bpf/bpf_helpers.h |   6 +
>>>  tools/testing/selftests/bpf/test_rawir.sh |  37 +
>>>  .../selftests/bpf/test_rawir_event_kern.c |  26 
>>>  .../selftests/bpf/test_rawir_event_user.c | 130 ++
>>>  8 files changed, 261 insertions(+), 5 deletions(-)
>>>  create mode 100755 tools/testing/selftests/bpf/test_rawir.sh
>>>  create mode 100644 tools/testing/selftests/bpf/test_rawir_event_kern.c
>>>  create mode 100644 tools/testing/selftests/bpf/test_rawir_event_user.c

[...]

>> Most people probably not really familiar with lircN device. It would be
>> good to provide more information about how to enable this, e.g.,
>>   CONFIG_RC_CORE=y
>>   CONFIG_BPF_RAWIR_EVENT=y
>>   CONFIG_RC_LOOPBACK=y
>>   ..
> 
> Good point. I'll add some words explaining what is and how to make it work.
> 
> Thanks
> Sean


By the way, shouldn't the two eBPF helpers bpf_rc_keydown() and
bpf_rc_repeat() be compiled out in patch 1 if e.g.
CONFIG_BPF_RAWIR_EVENT is not set? There are some other helpers that are
compiled only if relevant config options are set (bpf_get_xfrm_state()
for example).

(If you were to change that, please also update helper documentations to
indicate what configuration options are required to be able to use the
helpers.)

Best regards,
Quentin


Re: [PATCH bpf-next] bpf: change eBPF helper doc parsing script to allow for smaller indent

2018-05-17 Thread Quentin Monnet
2018-05-17 17:38 UTC+0200 ~ Daniel Borkmann <dan...@iogearbox.net>
> On 05/17/2018 02:43 PM, Quentin Monnet wrote:
>> Documentation for eBPF helpers can be parsed from bpf.h and eventually
>> turned into a man page. Commit 6f96674dbd8c ("bpf: relax constraints on
>> formatting for eBPF helper documentation") changed the script used to
>> parse it, in order to allow for different indent style and to ease the
>> work for writing documentation for future helpers.
>>
>> The script currently considers that the first tab can be replaced by 6
>> to 8 spaces. But the documentation for bpf_fib_lookup() uses a mix of
>> tabs (for the "Description" part) and of spaces ("Return" part), and
>> only has 5 space long indent for the latter.
>>
>> We probably do not want to change the values accepted by the script each
>> time a new helper gets a new indent style. However, it is worth noting
>> that with those 5 spaces, the "Description" and "Return" part *look*
>> aligned in the generated patch and in `git show`, so it is likely other
>> helper authors will use the same length. Therefore, allow for helper
>> documentation to use 5 spaces only for the first indent level.
>>
>> Signed-off-by: Quentin Monnet <quentin.mon...@netronome.com>
> 
> Applied to bpf-next, thanks Quentin! Btw in the current uapi description
> some of the helpers have a new line before 'Return' and most have not. I
> presume it doesn't really matter though we might want to do a one-time
> cleanup on these cases at some point in time.

Thanks Daniel!

I did notice those new lines as well. The script was failing on the
5-space indent, but not on the new lines, so I let them as they are. I
agree for the cleanup, I can send a patch when the various helpers
currently being discussed on the list are merged.

Best,
Quentin



[PATCH bpf-next] bpf: change eBPF helper doc parsing script to allow for smaller indent

2018-05-17 Thread Quentin Monnet
Documentation for eBPF helpers can be parsed from bpf.h and eventually
turned into a man page. Commit 6f96674dbd8c ("bpf: relax constraints on
formatting for eBPF helper documentation") changed the script used to
parse it, in order to allow for different indent style and to ease the
work for writing documentation for future helpers.

The script currently considers that the first tab can be replaced by 6
to 8 spaces. But the documentation for bpf_fib_lookup() uses a mix of
tabs (for the "Description" part) and of spaces ("Return" part), and
only has 5 space long indent for the latter.

We probably do not want to change the values accepted by the script each
time a new helper gets a new indent style. However, it is worth noting
that with those 5 spaces, the "Description" and "Return" part *look*
aligned in the generated patch and in `git show`, so it is likely other
helper authors will use the same length. Therefore, allow for helper
documentation to use 5 spaces only for the first indent level.

Signed-off-by: Quentin Monnet <quentin.mon...@netronome.com>
---
 scripts/bpf_helpers_doc.py | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/scripts/bpf_helpers_doc.py b/scripts/bpf_helpers_doc.py
index 8f59897fbda1..5010a4d5bfba 100755
--- a/scripts/bpf_helpers_doc.py
+++ b/scripts/bpf_helpers_doc.py
@@ -95,7 +95,7 @@ class HeaderParser(object):
 return capture.group(1)
 
 def parse_desc(self):
-p = re.compile(' \* ?(?:\t| {6,8})Description$')
+p = re.compile(' \* ?(?:\t| {5,8})Description$')
 capture = p.match(self.line)
 if not capture:
 # Helper can have empty description and we might be parsing another
@@ -109,7 +109,7 @@ class HeaderParser(object):
 if self.line == ' *\n':
 desc += '\n'
 else:
-p = re.compile(' \* ?(?:\t| {6,8})(?:\t| {8})(.*)')
+p = re.compile(' \* ?(?:\t| {5,8})(?:\t| {8})(.*)')
 capture = p.match(self.line)
 if capture:
 desc += capture.group(1) + '\n'
@@ -118,7 +118,7 @@ class HeaderParser(object):
 return desc
 
 def parse_ret(self):
-p = re.compile(' \* ?(?:\t| {6,8})Return$')
+p = re.compile(' \* ?(?:\t| {5,8})Return$')
 capture = p.match(self.line)
 if not capture:
 # Helper can have empty retval and we might be parsing another
@@ -132,7 +132,7 @@ class HeaderParser(object):
 if self.line == ' *\n':
 ret += '\n'
 else:
-p = re.compile(' \* ?(?:\t| {6,8})(?:\t| {8})(.*)')
+p = re.compile(' \* ?(?:\t| {5,8})(?:\t| {8})(.*)')
 capture = p.match(self.line)
 if capture:
 ret += capture.group(1) + '\n'
-- 
2.14.1
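
The widened quantifier can be exercised in isolation. The standalone sketch
below (not part of the kernel tree) reproduces the old and new "Description"
patterns from scripts/bpf_helpers_doc.py and shows that only the relaxed one
accepts the 5-space indent used by bpf_fib_lookup():

```python
import re

# Old and new first-indent patterns from scripts/bpf_helpers_doc.py.
OLD = re.compile(r' \* ?(?:\t| {6,8})Description$')
NEW = re.compile(r' \* ?(?:\t| {5,8})Description$')

tab_line = ' *\tDescription'            # tab indent, as in most helpers
five_line = ' *' + ' ' * 5 + 'Description'  # 5-space indent (bpf_fib_lookup)

def matches(pattern, line):
    """True if the whole-line pattern accepts this doc-comment line."""
    return pattern.match(line) is not None
```

Both versions still accept the tab-indented style, so the change is purely
additive for existing helper documentation.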



Re: [PATCH v3 1/2] media: rc: introduce BPF_PROG_RAWIR_EVENT

2018-05-17 Thread Quentin Monnet
2018-05-16 22:04 UTC+0100 ~ Sean Young 
> Add support for BPF_PROG_RAWIR_EVENT. This type of BPF program can call
> rc_keydown() to reported decoded IR scancodes, or rc_repeat() to report
> that the last key should be repeated.
> 
> The bpf program can be attached to using the bpf(BPF_PROG_ATTACH) syscall;
> the target_fd must be the /dev/lircN device.
> 
> Signed-off-by: Sean Young 
> ---
>  drivers/media/rc/Kconfig   |  13 ++
>  drivers/media/rc/Makefile  |   1 +
>  drivers/media/rc/bpf-rawir-event.c | 363 +
>  drivers/media/rc/lirc_dev.c|  24 ++
>  drivers/media/rc/rc-core-priv.h|  24 ++
>  drivers/media/rc/rc-ir-raw.c   |  14 +-
>  include/linux/bpf_rcdev.h  |  30 +++
>  include/linux/bpf_types.h  |   3 +
>  include/uapi/linux/bpf.h   |  55 -
>  kernel/bpf/syscall.c   |   7 +
>  10 files changed, 531 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/media/rc/bpf-rawir-event.c
>  create mode 100644 include/linux/bpf_rcdev.h
> 

[...]

Hi Sean,

Please find below some nitpicks on the documentation for the two helpers.

> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index d94d333a8225..243e141e8a5b 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h

[...]

> @@ -1902,6 +1904,35 @@ union bpf_attr {
>   *   egress otherwise). This is the only flag supported for now.
>   *   Return
>   *   **SK_PASS** on success, or **SK_DROP** on error.
> + *
> + * int bpf_rc_keydown(void *ctx, u32 protocol, u32 scancode, u32 toggle)
> + *   Description
> + *   Report decoded scancode with toggle value. For use in
> + *   BPF_PROG_TYPE_RAWIR_EVENT, to report a successfully

Could you please use bold RST markup for constants and function names?
Typically for BPF_PROG_TYPE_RAWIR_EVENT here and the enum below.

> + *   decoded scancode. This is will generate a keydown event,

s/This is will/This will/?

> + *   and a keyup event once the scancode is no longer repeated.
> + *
> + *   *ctx* pointer to bpf_rawir_event, *protocol* is decoded
> + *   protocol (see RC_PROTO_* enum).

This documentation is intended to be compiled as a man page. Could you
please use a complete sentence here?
Also, this could do with additional markup as well: **struct
bpf_rawir_event**.

> + *
> + *   Some protocols include a toggle bit, in case the button
> + *   was released and pressed again between consecutive scancodes,
> + *   copy this bit into *toggle* if it exists, else set to 0.
> + *
> + * Return

The "Return" lines here and in the second helper use space indent
instead as tabs (as all other lines do). Would you mind fixing it for
consistency?

> + *   Always return 0 (for now)

Other helpers use just "0" in that case, but I do not really mind.
Out of curiosity, do you have anything specific in mind for changing the
return value here in the future?

> + *
> + * int bpf_rc_repeat(void *ctx)
> + *   Description
> + *   Repeat the last decoded scancode; some IR protocols like
> + *   NEC have a special IR message for repeat last button,

s/repeat/repeating/?

> + *   in case user is holding a button down; the scancode is
> + *   not repeated.
> + *
> + *   *ctx* pointer to bpf_rawir_event.

Please use a complete sentence here as well, if you do not mind.

> + *
> + * Return
> + *   Always return 0 (for now)
>   */
Thanks,
Quentin


Re: [PATCH bpf-next v4 3/4] bpf: selftest additions for SOCKHASH

2018-05-04 Thread Quentin Monnet
2018-05-03 11:28:27 UTC-0700 ~ John Fastabend 
> This runs existing SOCKMAP tests with SOCKHASH map type. To do this
> we push programs into include file and build two BPF programs. One
> for SOCKHASH and one for SOCKMAP.
> 
> We then run the entire test suite with each type.
> 
> Signed-off-by: John Fastabend 
> ---
>  tools/include/uapi/linux/bpf.h |  6 -
>  tools/testing/selftests/bpf/Makefile   |  3 ++-
>  tools/testing/selftests/bpf/test_sockhash_kern.c   |  4 
>  tools/testing/selftests/bpf/test_sockmap.c | 27 
> --
>  .../{test_sockmap_kern.c => test_sockmap_kern.h}   | 10 
>  5 files changed, 36 insertions(+), 14 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/test_sockhash_kern.c
>  rename tools/testing/selftests/bpf/{test_sockmap_kern.c => test_sockmap_kern.h} (97%)
> 
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index da77a93..5cb983d 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -116,6 +116,7 @@ enum bpf_map_type {
>   BPF_MAP_TYPE_DEVMAP,
>   BPF_MAP_TYPE_SOCKMAP,
>   BPF_MAP_TYPE_CPUMAP,
> + BPF_MAP_TYPE_SOCKHASH,
>  };
>  
>  enum bpf_prog_type {
> @@ -1835,7 +1836,10 @@ struct bpf_stack_build_id {
>   FN(msg_pull_data),  \
>   FN(bind),   \
>   FN(xdp_adjust_tail),\
> - FN(skb_get_xfrm_state),
> + FN(skb_get_xfrm_state), \
> + FN(sock_hash_update),   \
> + FN(msg_redirect_hash),  \
> + FN(sk_redirect_hash),
>  
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>   * function eBPF program intends to call

Thanks for documenting the helpers in include/uapi/linux/bpf.h! However
the doc is missing in the update to the bpf.h file under tools/ in this
patch, could you please fix it?

Best regards,
Quentin


[PATCH bpf-next v2] bpf: relax constraints on formatting for eBPF helper documentation

2018-05-02 Thread Quentin Monnet
The Python script used to parse and extract eBPF helpers documentation
from include/uapi/linux/bpf.h expects a very specific formatting for the
descriptions (single dot represents a space, '>' stands for a tab):

/*
 ...
 *.int bpf_helper(list of arguments)
 *.>Description
 *.>>   Start of description
 *.>>   Another line of description
 *.>>   And yet another line of description
 *.>Return
 *.>>   0 on success, or a negative error in case of failure
 ...
 */

This is too strict, and painful for developers who want to add
documentation for new helpers. Worse, it is extremely difficult to check
that the formatting is correct during reviews. Change the format
expected by the script and make it more flexible. The script now works
whether or not the initial space (right after the star) is present, and
accepts both tabs and white spaces (or a combination of both) for
indenting description sections and contents.

Concretely, something like the following would now be supported:

/*
 ...
 *int bpf_helper(list of arguments)
 *..Description
 *.>>   Start of description...
 *> >   Another line of description
 *..And yet another line of description
 *> Return
 *.>0 on success, or a negative error in case of failure
 ...
 */

While at it, remove unnecessary carets from each regex used with match()
in the script. They are redundant, as match() tries to match from the
beginning of the string by default.

v2: Remove unnecessary caret when a regex is used with match().

Signed-off-by: Quentin Monnet <quentin.mon...@netronome.com>
---
 scripts/bpf_helpers_doc.py | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/scripts/bpf_helpers_doc.py b/scripts/bpf_helpers_doc.py
index 30ba0fee36e4..8f59897fbda1 100755
--- a/scripts/bpf_helpers_doc.py
+++ b/scripts/bpf_helpers_doc.py
@@ -39,9 +39,9 @@ class Helper(object):
 Break down helper function protocol into smaller chunks: return type,
 name, distincts arguments.
 """
-arg_re = re.compile('^((const )?(struct )?(\w+|\.\.\.))( (\**)(\w+))?$')
+arg_re = re.compile('((const )?(struct )?(\w+|\.\.\.))( (\**)(\w+))?$')
 res = {}
-proto_re = re.compile('^(.+) (\**)(\w+)\(((([^,]+)(, )?){1,5})\)$')
+proto_re = re.compile('(.+) (\**)(\w+)\(((([^,]+)(, )?){1,5})\)$')
 
 capture = proto_re.match(self.proto)
 res['ret_type'] = capture.group(1)
@@ -87,7 +87,7 @@ class HeaderParser(object):
 #   - Same as above, with "const" and/or "struct" in front of type
 #   - "..." (undefined number of arguments, for bpf_trace_printk())
 # There is at least one term ("void"), and at most five arguments.
-p = re.compile('^ \* ((.+) \**\w+\((((const )?(struct )?(\w+|\.\.\.)( \**\w+)?)(, )?){1,5}\))$')
+p = re.compile(' \* ?((.+) \**\w+\((((const )?(struct )?(\w+|\.\.\.)( \**\w+)?)(, )?){1,5}\))$')
 capture = p.match(self.line)
 if not capture:
 raise NoHelperFound
@@ -95,7 +95,7 @@ class HeaderParser(object):
 return capture.group(1)
 
 def parse_desc(self):
-p = re.compile('^ \* \tDescription$')
+p = re.compile(' \* ?(?:\t| {6,8})Description$')
 capture = p.match(self.line)
 if not capture:
 # Helper can have empty description and we might be parsing another
@@ -109,7 +109,7 @@ class HeaderParser(object):
 if self.line == ' *\n':
 desc += '\n'
 else:
-p = re.compile('^ \* \t\t(.*)')
+p = re.compile(' \* ?(?:\t| {6,8})(?:\t| {8})(.*)')
 capture = p.match(self.line)
 if capture:
 desc += capture.group(1) + '\n'
@@ -118,7 +118,7 @@ class HeaderParser(object):
 return desc
 
 def parse_ret(self):
-p = re.compile('^ \* \tReturn$')
+p = re.compile(' \* ?(?:\t| {6,8})Return$')
 capture = p.match(self.line)
 if not capture:
 # Helper can have empty retval and we might be parsing another
@@ -132,7 +132,7 @@ class HeaderParser(object):
 if self.line == ' *\n':
 ret += '\n'
 else:
-p = re.compile('^ \* \t\t(.*)')
+p = re.compile(' \* ?(?:\t| {6,8})(?:\t| {8})(.*)')
 capture = p.match(self.line)
 if capture:
 ret += capture.group(1) + '\n'
-- 
2.14.1
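
The two behaviours the patch relies on — re.match() anchoring at the start of the string, and the relaxed indentation patterns — can be checked with a short Python snippet (a sketch using patterns in the same style as the script, not the script itself):

```python
import re

# re.match() is implicitly anchored at the start of the string,
# so the leading '^' removed by this patch was redundant.
assert re.match('abc', 'abcdef')
assert not re.match('abc', 'xabc')

# Relaxed section-header pattern from the patch: a tab or 6-8 spaces
# are both accepted after the (now optional) space following the star.
p = re.compile(r' \* ?(?:\t| {6,8})Description$')
assert p.match(' * \tDescription')               # original tab-based style
assert p.match(' *' + ' ' * 6 + 'Description')   # space indent now accepted
assert not p.match(' * Description')             # no indent: not a header
```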



[PATCH bpf-next] tools: bpftool: change time format for program 'loaded at:' information

2018-05-01 Thread Quentin Monnet
To make eBPF program load time easier to parse from "bpftool prog"
output for machines, change the time format used by the program. The
format now differs for plain and JSON version:

- Plain version uses a string formatted according to ISO 8601.
- JSON uses the number of seconds since the Epoch, which is less friendly
  for humans but even easier to process.

Example output:

# ./bpftool prog
41298: xdp  tag a04f5eef06a7f555 dev foo
loaded_at 2018-04-18T17:19:47+0100  uid 0
xlated 16B  not jited  memlock 4096B

# ./bpftool prog -p
[{
"id": 41298,
"type": "xdp",
"tag": "a04f5eef06a7f555",
"gpl_compatible": false,
"dev": {
"ifindex": 14,
"ns_dev": 3,
"ns_inode": 4026531993,
"ifname": "foo"
},
"loaded_at": 1524068387,
"uid": 0,
"bytes_xlated": 16,
"jited": false,
    "bytes_memlock": 4096
}
]

Previously, "Apr 18/17:19" would be used at both places.

Suggested-by: Alexei Starovoitov <a...@kernel.org>
Signed-off-by: Quentin Monnet <quentin.mon...@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicin...@netronome.com>
---
 tools/bpf/bpftool/prog.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index e71a0a11afde..9bdfdf2d3fbe 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -96,7 +96,10 @@ static void print_boot_time(__u64 nsecs, char *buf, unsigned int size)
return;
}
 
-   strftime(buf, size, "%b %d/%H:%M", &load_tm);
+   if (json_output)
+   strftime(buf, size, "%s", &load_tm);
+   else
+   strftime(buf, size, "%FT%T%z", &load_tm);
 }
 
 static int prog_fd_by_tag(unsigned char *tag)
@@ -245,7 +248,8 @@ static void print_prog_json(struct bpf_prog_info *info, int fd)
print_boot_time(info->load_time, buf, sizeof(buf));
 
/* Piggy back on load_time, since 0 uid is a valid one */
-   jsonw_string_field(json_wtr, "loaded_at", buf);
+   jsonw_name(json_wtr, "loaded_at");
+   jsonw_printf(json_wtr, "%s", buf);
jsonw_uint_field(json_wtr, "uid", info->created_by_uid);
}
 
-- 
2.7.4
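
For reference, the two formats produced by the patch can be mimicked in Python (a sketch; bpftool itself formats in the local timezone, UTC is used here to keep the result deterministic):

```python
from datetime import datetime, timezone

load_time = 1524068387  # epoch value from the JSON example above

# JSON output: plain seconds since the Epoch (strftime "%s" in the patch)
assert isinstance(load_time, int)

# Plain output: ISO 8601, equivalent to strftime "%FT%T%z"
iso = datetime.fromtimestamp(load_time, tz=timezone.utc).strftime(
    "%Y-%m-%dT%H:%M:%S%z")
assert iso == "2018-04-18T16:19:47+0000"  # 17:19:47+0100 in the example
```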



Re: [PATCH bpf-next] bpf: relax constraints on formatting for eBPF helper documentation

2018-04-30 Thread Quentin Monnet
2018-04-30 17:33 UTC+0100 ~ Edward Cree <ec...@solarflare.com>
> On 30/04/18 16:59, Quentin Monnet wrote:
>> The Python script used to parse and extract eBPF helpers documentation
>> from include/uapi/linux/bpf.h expects a very specific formatting for the
>> descriptions (single dots represent a space, '>' stands for a tab):
>>
>> /*
>>  ...
>>  *.int bpf_helper(list of arguments)
>>  *.>Description
>>  *.>>   Start of description
>>  *.>>   Another line of description
>>  *.>>   And yet another line of description
>>  *.>Return
>>  *.>>   0 on success, or a negative error in case of failure
>>  ...
>>  */
>>
>> This is too strict, and painful for developers who want to add
>> documentation for new helpers. Worse, it is extremely difficult to
>> check that the formatting is correct during reviews. Change the
>> format expected by the script and make it more flexible. The script now
>> works whether or not the initial space (right after the star) is
>> present, and accepts both tabs and white spaces (or a combination of
>> both) for indenting description sections and contents.
>>
>> Concretely, something like the following would now be supported:
>>
>> /*
>>  ...
>>  *int bpf_helper(list of arguments)
>>  *..Description
>>  *.>>   Start of description...
>>  *> >   Another line of description
>>  *..And yet another line of description
>>  *> Return
>>  *.>0 on success, or a negative error in case of failure
>>  ...
>>  */
>>
>> Signed-off-by: Quentin Monnet <quentin.mon...@netronome.com>
>> ---
>>  scripts/bpf_helpers_doc.py | 10 +-
>>  1 file changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/scripts/bpf_helpers_doc.py b/scripts/bpf_helpers_doc.py
>> index 30ba0fee36e4..717547e6f0a6 100755
>> --- a/scripts/bpf_helpers_doc.py
>> +++ b/scripts/bpf_helpers_doc.py
>> @@ -87,7 +87,7 @@ class HeaderParser(object):
>>  #   - Same as above, with "const" and/or "struct" in front of type
>>  #   - "..." (undefined number of arguments, for bpf_trace_printk())
>>  # There is at least one term ("void"), and at most five arguments.
>> -p = re.compile('^ \* ((.+) \**\w+\((((const )?(struct )?(\w+|\.\.\.)( \**\w+)?)(, )?){1,5}\))$')
>> +p = re.compile('^ \* ?((.+) \**\w+\((((const )?(struct )?(\w+|\.\.\.)( \**\w+)?)(, )?){1,5}\))$')
> The proper coding style for such things is to go straight to tabs after
>  the star and not have the space.  So if we're going to make the script
>  flexible here (and leave coding style enforcement to other tools such
>  as checkpatch), maybe the regexen should just begin '^ \*\s+' and avoid
>  relying on counting indentation to delimit sections (e.g. scan for the
>  section headers like '^ \*\s+Description$' instead).

Thanks Edward! I agree it would be cleaner. However, with the current
format of the doc, I see two shortcomings.

- First we need a way to detect the end of a section. There is no
"Return" section for helper returning void, so we cannot rely on it to
end the "Description" section. And there is no delimiter to indicate the
end of the description of a given helper. We cannot assume that a string
matching a function definition, alone on its line, indicate the start of
a new helper (this is not the case). So as I see it, this would at least
require some delimiter between the descriptions of different functions
in bpf.h. I could add them if you think this is better.

- Also, we loose the possibility to further indent the text from the
description. Think about code snippets in descriptions: were we to
extract the lines with a regex such as / *\s+(.*)/, I see no way to get
the additional indent that should appear in the man page, if we do not
know what indent level was used for the helper description. I do not see
any simple workaround.

This being said, I am ready to bring whatever changes are needed to make
writing new helper doc easier, so I am open to suggestions if you have
workarounds for these or if the consensus is that the formatting should
be completely revised.

> Btw, leading '^' is unnecessary as re.match() is already implicitly
>  anchored at start-of-string.  (The trailing '$' are still needed.)

Oh, thanks! I'll fix that.

Quentin
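
The second concern above — preserving the extra indentation of code snippets inside descriptions — can be illustrated with a content pattern in the style the script ended up using (a sketch, not the script itself):

```python
import re

# Description-content pattern: the base indent (two tabs, or the space
# equivalent) is consumed; anything beyond it is captured, so extra
# indentation survives into the generated RST.
p = re.compile(r' \* ?(?:\t| {6,8})(?:\t| {8})(.*)')

plain = ' *\t\tSome description text'
snippet = ' *\t\t\t# sysctl kernel.perf_event_max_stack=512'

assert p.match(plain).group(1) == 'Some description text'
# The third tab is kept, so the code snippet stays indented:
assert p.match(snippet).group(1) == \
    '\t# sysctl kernel.perf_event_max_stack=512'
```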


[PATCH bpf-next] bpf: relax constraints on formatting for eBPF helper documentation

2018-04-30 Thread Quentin Monnet
The Python script used to parse and extract eBPF helpers documentation
from include/uapi/linux/bpf.h expects a very specific formatting for the
descriptions (single dots represent a space, '>' stands for a tab):

/*
 ...
 *.int bpf_helper(list of arguments)
 *.>Description
 *.>>   Start of description
 *.>>   Another line of description
 *.>>   And yet another line of description
 *.>Return
 *.>>   0 on success, or a negative error in case of failure
 ...
 */

This is too strict, and painful for developers who want to add
documentation for new helpers. Worse, it is extremely difficult to
check that the formatting is correct during reviews. Change the
format expected by the script and make it more flexible. The script now
works whether or not the initial space (right after the star) is
present, and accepts both tabs and white spaces (or a combination of
both) for indenting description sections and contents.

Concretely, something like the following would now be supported:

/*
 ...
 *int bpf_helper(list of arguments)
 *..Description
 *.>>   Start of description...
 *> >   Another line of description
 *..And yet another line of description
 *> Return
 *.>0 on success, or a negative error in case of failure
 ...
 */

Signed-off-by: Quentin Monnet <quentin.mon...@netronome.com>
---
 scripts/bpf_helpers_doc.py | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/scripts/bpf_helpers_doc.py b/scripts/bpf_helpers_doc.py
index 30ba0fee36e4..717547e6f0a6 100755
--- a/scripts/bpf_helpers_doc.py
+++ b/scripts/bpf_helpers_doc.py
@@ -87,7 +87,7 @@ class HeaderParser(object):
 #   - Same as above, with "const" and/or "struct" in front of type
 #   - "..." (undefined number of arguments, for bpf_trace_printk())
 # There is at least one term ("void"), and at most five arguments.
-p = re.compile('^ \* ((.+) \**\w+\((((const )?(struct )?(\w+|\.\.\.)( \**\w+)?)(, )?){1,5}\))$')
+p = re.compile('^ \* ?((.+) \**\w+\((((const )?(struct )?(\w+|\.\.\.)( \**\w+)?)(, )?){1,5}\))$')
 capture = p.match(self.line)
 if not capture:
 raise NoHelperFound
@@ -95,7 +95,7 @@ class HeaderParser(object):
 return capture.group(1)
 
 def parse_desc(self):
-p = re.compile('^ \* \tDescription$')
+p = re.compile('^ \* ?(?:\t| {6,8})Description$')
 capture = p.match(self.line)
 if not capture:
 # Helper can have empty description and we might be parsing another
@@ -109,7 +109,7 @@ class HeaderParser(object):
 if self.line == ' *\n':
 desc += '\n'
 else:
-p = re.compile('^ \* \t\t(.*)')
+p = re.compile('^ \* ?(?:\t| {6,8})(?:\t| {8})(.*)')
 capture = p.match(self.line)
 if capture:
 desc += capture.group(1) + '\n'
@@ -118,7 +118,7 @@ class HeaderParser(object):
 return desc
 
 def parse_ret(self):
-p = re.compile('^ \* \tReturn$')
+p = re.compile('^ \* ?(?:\t| {6,8})Return$')
 capture = p.match(self.line)
 if not capture:
 # Helper can have empty retval and we might be parsing another
@@ -132,7 +132,7 @@ class HeaderParser(object):
 if self.line == ' *\n':
 ret += '\n'
 else:
-p = re.compile('^ \* \t\t(.*)')
+p = re.compile('^ \* ?(?:\t| {6,8})(?:\t| {8})(.*)')
 capture = p.match(self.line)
 if capture:
 ret += capture.group(1) + '\n'
-- 
2.14.1



Re: [PATCH bpf-next 2/3] bpf: fix formatting for bpf_get_stack() helper doc

2018-04-30 Thread Quentin Monnet
2018-04-30 09:12 UTC-0600 ~ David Ahern 
> On 4/30/18 9:08 AM, Alexei Starovoitov wrote:
>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>> index 530ff6588d8f..8daef7326bb7 100644
>>> --- a/include/uapi/linux/bpf.h
>>> +++ b/include/uapi/linux/bpf.h
>>> @@ -1770,33 +1770,33 @@ union bpf_attr {
>>>   *
>>>   * int bpf_get_stack(struct pt_regs *regs, void *buf, u32 size, u64 flags)
>>>   * Description
>>> - * Return a user or a kernel stack in bpf program provided buffer.
>>> - * To achieve this, the helper needs *ctx*, which is a pointer
>>> + * Return a user or a kernel stack in bpf program provided 
>>> buffer.
>>> + * To achieve this, the helper needs *ctx*, which is a 
>>> pointer
>> I still don't quite get the difference.
>> It's replacing 2 tabs in above with 1 space + 2 tabs ?

Yes, exactly (plus in this case, the "::" a few lines below has a missing
tab).

>> Can you please teach the python script to accept both?
>> I bet that will be recurring mistake and it's impossible to spot in code 
>> review.
> And checkpatch throws an error on the 1 space + 2 tabs so it gets
> confusing on which format should be used.

Sorry about that :/. I will send a patch to make the script more flexible.

Quentin



[PATCH bpf-next 2/3] bpf: fix formatting for bpf_get_stack() helper doc

2018-04-30 Thread Quentin Monnet
Fix formatting (indent) for bpf_get_stack() helper documentation, so
that the doc is rendered correctly with the Python script.

Fixes: c195651e565a ("bpf: add bpf_get_stack helper")
Cc: Yonghong Song <y...@fb.com>
Signed-off-by: Quentin Monnet <quentin.mon...@netronome.com>
---

Note: The error was a missing space between the '*' marking the
comments, and the tabs. This expected mixed indent comes from the fact I
started to write the doc as a RST, then copied my contents (tabs
included) in the header file and added a " * " (with a space) prefix
everywhere.

On second thought, using such an indent style was maybe... not my best idea
ever. Anyway, if indenting for documenting eBPF helpers really gets too painful,
we could relax parsing rules in the Python script to make things easier.
---
 include/uapi/linux/bpf.h | 54 
 1 file changed, 27 insertions(+), 27 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 530ff6588d8f..8daef7326bb7 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1770,33 +1770,33 @@ union bpf_attr {
  *
  * int bpf_get_stack(struct pt_regs *regs, void *buf, u32 size, u64 flags)
  * Description
- * Return a user or a kernel stack in bpf program provided buffer.
- * To achieve this, the helper needs *ctx*, which is a pointer
- * to the context on which the tracing program is executed.
- * To store the stacktrace, the bpf program provides *buf* with
- * a nonnegative *size*.
- *
- * The last argument, *flags*, holds the number of stack frames to
- * skip (from 0 to 255), masked with
- * **BPF_F_SKIP_FIELD_MASK**. The next bits can be used to set
- * the following flags:
- *
- * **BPF_F_USER_STACK**
- * Collect a user space stack instead of a kernel stack.
- * **BPF_F_USER_BUILD_ID**
- * Collect buildid+offset instead of ips for user stack,
- * only valid if **BPF_F_USER_STACK** is also specified.
- *
- * **bpf_get_stack**\ () can collect up to
- * **PERF_MAX_STACK_DEPTH** both kernel and user frames, subject
- * to sufficient large buffer size. Note that
- * this limit can be controlled with the **sysctl** program, and
- * that it should be manually increased in order to profile long
- * user stacks (such as stacks for Java programs). To do so, use:
- *
- * ::
- *
- * # sysctl kernel.perf_event_max_stack=<new value>
+ * Return a user or a kernel stack in bpf program provided buffer.
+ * To achieve this, the helper needs *ctx*, which is a pointer
+ * to the context on which the tracing program is executed.
+ * To store the stacktrace, the bpf program provides *buf* with
+ * a nonnegative *size*.
+ *
+ * The last argument, *flags*, holds the number of stack frames to
+ * skip (from 0 to 255), masked with
+ * **BPF_F_SKIP_FIELD_MASK**. The next bits can be used to set
+ * the following flags:
+ *
+ * **BPF_F_USER_STACK**
+ * Collect a user space stack instead of a kernel stack.
+ * **BPF_F_USER_BUILD_ID**
+ * Collect buildid+offset instead of ips for user stack,
+ * only valid if **BPF_F_USER_STACK** is also specified.
+ *
+ * **bpf_get_stack**\ () can collect up to
+ * **PERF_MAX_STACK_DEPTH** both kernel and user frames, subject
+ * to sufficient large buffer size. Note that
+ * this limit can be controlled with the **sysctl** program, and
+ * that it should be manually increased in order to profile long
+ * user stacks (such as stacks for Java programs). To do so, use:
+ *
+ * ::
+ *
+ * # sysctl kernel.perf_event_max_stack=<new value>
  *
  * Return
  * a non-negative value equal to or less than size on success, or
-- 
2.14.1
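
The *flags* layout described in the helper documentation above can be sketched as follows (constant values mirror those in include/uapi/linux/bpf.h, but treat them as illustrative):

```python
# Assumed values, mirroring the UAPI header
BPF_F_SKIP_FIELD_MASK = 0xff       # frames to skip: bits 0-7
BPF_F_USER_STACK      = 1 << 8
BPF_F_USER_BUILD_ID   = 1 << 11

def make_flags(skip, user_stack=False, build_id=False):
    """Build the flags argument for bpf_get_stack()."""
    flags = skip & BPF_F_SKIP_FIELD_MASK
    if user_stack:
        flags |= BPF_F_USER_STACK
    if build_id:
        flags |= BPF_F_USER_BUILD_ID   # only valid with user_stack
    return flags

assert make_flags(3) == 0x3
assert make_flags(0, user_stack=True) == 0x100
assert make_flags(2, user_stack=True, build_id=True) == 0x902
```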



[PATCH bpf-next 3/3] bpf: update bpf.h uapi header for tools

2018-04-30 Thread Quentin Monnet
Bring fixes for eBPF helper documentation formatting to bpf.h under
tools/ as well.

Signed-off-by: Quentin Monnet <quentin.mon...@netronome.com>
---
 tools/include/uapi/linux/bpf.h | 62 +-
 1 file changed, 31 insertions(+), 31 deletions(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 23b334bba1a6..8daef7326bb7 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -828,12 +828,12 @@ union bpf_attr {
  *
  * Also, be aware that the newer helper
  * **bpf_perf_event_read_value**\ () is recommended over
- * **bpf_perf_event_read*\ () in general. The latter has some ABI
+ * **bpf_perf_event_read**\ () in general. The latter has some ABI
  * quirks where error and counter value are used as a return code
  * (which is wrong to do since ranges may overlap). This issue is
- * fixed with bpf_perf_event_read_value(), which at the same time
- * provides more features over the **bpf_perf_event_read**\ ()
- * interface. Please refer to the description of
+ * fixed with **bpf_perf_event_read_value**\ (), which at the same
+ * time provides more features over the **bpf_perf_event_read**\
+ * () interface. Please refer to the description of
  * **bpf_perf_event_read_value**\ () for details.
  * Return
  * The value of the perf event counter read from the map, or a
@@ -1770,33 +1770,33 @@ union bpf_attr {
  *
  * int bpf_get_stack(struct pt_regs *regs, void *buf, u32 size, u64 flags)
  * Description
- * Return a user or a kernel stack in bpf program provided buffer.
- * To achieve this, the helper needs *ctx*, which is a pointer
- * to the context on which the tracing program is executed.
- * To store the stacktrace, the bpf program provides *buf* with
- * a nonnegative *size*.
- *
- * The last argument, *flags*, holds the number of stack frames to
- * skip (from 0 to 255), masked with
- * **BPF_F_SKIP_FIELD_MASK**. The next bits can be used to set
- * the following flags:
- *
- * **BPF_F_USER_STACK**
- * Collect a user space stack instead of a kernel stack.
- * **BPF_F_USER_BUILD_ID**
- * Collect buildid+offset instead of ips for user stack,
- * only valid if **BPF_F_USER_STACK** is also specified.
- *
- * **bpf_get_stack**\ () can collect up to
- * **PERF_MAX_STACK_DEPTH** both kernel and user frames, subject
- * to sufficient large buffer size. Note that
- * this limit can be controlled with the **sysctl** program, and
- * that it should be manually increased in order to profile long
- * user stacks (such as stacks for Java programs). To do so, use:
- *
- * ::
- *
- * # sysctl kernel.perf_event_max_stack=<new value>
+ * Return a user or a kernel stack in bpf program provided buffer.
+ * To achieve this, the helper needs *ctx*, which is a pointer
+ * to the context on which the tracing program is executed.
+ * To store the stacktrace, the bpf program provides *buf* with
+ * a nonnegative *size*.
+ *
+ * The last argument, *flags*, holds the number of stack frames to
+ * skip (from 0 to 255), masked with
+ * **BPF_F_SKIP_FIELD_MASK**. The next bits can be used to set
+ * the following flags:
+ *
+ * **BPF_F_USER_STACK**
+ * Collect a user space stack instead of a kernel stack.
+ * **BPF_F_USER_BUILD_ID**
+ * Collect buildid+offset instead of ips for user stack,
+ * only valid if **BPF_F_USER_STACK** is also specified.
+ *
+ * **bpf_get_stack**\ () can collect up to
+ * **PERF_MAX_STACK_DEPTH** both kernel and user frames, subject
+ * to sufficient large buffer size. Note that
+ * this limit can be controlled with the **sysctl** program, and
+ * that it should be manually increased in order to profile long
+ * user stacks (such as stacks for Java programs). To do so, use:
+ *
+ * ::
+ *
+ * # sysctl kernel.perf_event_max_stack=<new value>
  *
  * Return
  * a non-negative value equal to or less than size on success, or
-- 
2.14.1



[PATCH bpf-next 1/3] bpf: fix formatting for bpf_perf_event_read() helper doc

2018-04-30 Thread Quentin Monnet
Some edits brought to the last iteration of BPF helper functions
documentation introduced an error with RST formatting. As a result, most
of one paragraph is rendered in bold text when only the name of a helper
should be. Fix it, and fix formatting of another function name in the
same paragraph.

Fixes: c6b5fb8690fa ("bpf: add documentation for eBPF helpers (42-50)")
Signed-off-by: Quentin Monnet <quentin.mon...@netronome.com>
---
 include/uapi/linux/bpf.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 23b334bba1a6..530ff6588d8f 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -828,12 +828,12 @@ union bpf_attr {
  *
  * Also, be aware that the newer helper
  * **bpf_perf_event_read_value**\ () is recommended over
- * **bpf_perf_event_read*\ () in general. The latter has some ABI
+ * **bpf_perf_event_read**\ () in general. The latter has some ABI
  * quirks where error and counter value are used as a return code
  * (which is wrong to do since ranges may overlap). This issue is
- * fixed with bpf_perf_event_read_value(), which at the same time
- * provides more features over the **bpf_perf_event_read**\ ()
- * interface. Please refer to the description of
+ * fixed with **bpf_perf_event_read_value**\ (), which at the same
+ * time provides more features over the **bpf_perf_event_read**\
+ * () interface. Please refer to the description of
  * **bpf_perf_event_read_value**\ () for details.
  * Return
  * The value of the perf event counter read from the map, or a
-- 
2.14.1



[PATCH bpf-next 0/3] bpf: fix formatting for eBPF helper doc

2018-04-30 Thread Quentin Monnet
Hi,
Here is a follow-up set for eBPF helper documentation, with two patches to
fix formatting issues:

- One to fix an error I introduced with the initial set for the doc.
- One for the newly added bpf_get_stack(), that is currently not parsed
  correctly with the Python script (function signature is fine, but
  description and return value appear as empty).

There is no change to text contents in this set.

Quentin Monnet (3):
  bpf: fix formatting for bpf_perf_event_read() helper doc
  bpf: fix formatting for bpf_get_stack() helper documentation
  bpf: update bpf.h uapi header for tools

 include/uapi/linux/bpf.h   | 62 +-
 tools/include/uapi/linux/bpf.h | 62 +-
 2 files changed, 62 insertions(+), 62 deletions(-)

-- 
2.14.1



Re: [PATCH bpf-next 0/2] Fix BPF helpers documentation

2018-04-29 Thread Quentin Monnet
On 29 April 2018 at 00:06, Andrey Ignatov <r...@fb.com> wrote:
> BPF helpers documentation in UAPI refers to kernel ctx structures when it
> has to refer to user visible ones. Fix it.
>
> Andrey Ignatov (2):
>   bpf: Fix helpers ctx struct types in uapi doc
>   bpf: Sync bpf.h to tools/
>
>  include/uapi/linux/bpf.h   | 12 ++--
>  tools/include/uapi/linux/bpf.h | 12 ++--
>  2 files changed, 12 insertions(+), 12 deletions(-)

Correct. Thanks a lot for the fix, Andrey!
Reviewed-by: Quentin Monnet <quentin.mon...@netronome.com>


[PATCH bpf-next v4 03/10] bpf: add documentation for eBPF helpers (12-22)

2018-04-25 Thread Quentin Monnet
Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide a RST document
that can later be converted into a man page.

The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.

This patch contains descriptions for the following helper functions, all
written by Alexei:

- bpf_get_current_pid_tgid()
- bpf_get_current_uid_gid()
- bpf_get_current_comm()
- bpf_skb_vlan_push()
- bpf_skb_vlan_pop()
- bpf_skb_get_tunnel_key()
- bpf_skb_set_tunnel_key()
- bpf_redirect()
- bpf_perf_event_output()
- bpf_get_stackid()
- bpf_get_current_task()

v4:
- bpf_redirect(): Fix typo: "XDP_ABORT" changed to "XDP_ABORTED". Add
  note on bpf_redirect_map() providing better performance. Replace "Save
  for" with "Except for".
- bpf_skb_vlan_push(): Clarify comment about invalidated verifier
  checks.
- bpf_skb_vlan_pop(): Clarify comment about invalidated verifier
  checks.
- bpf_skb_get_tunnel_key(): Add notes on tunnel_id, "collect metadata"
  mode, and example tunneling protocols with which it can be used.
- bpf_skb_set_tunnel_key(): Add a reference to the description of
  bpf_skb_get_tunnel_key().
- bpf_perf_event_output(): Specify that, and for what purpose, the
  helper can be used with programs attached to TC and XDP.

v3:
- bpf_skb_get_tunnel_key(): Change and improve description and example.
- bpf_redirect(): Improve description of BPF_F_INGRESS flag.
- bpf_perf_event_output(): Fix first sentence of description. Delete
  wrong statement on context being evaluated as a struct pt_reg. Remove
  the long yet incomplete example.
- bpf_get_stackid(): Add a note about PERF_MAX_STACK_DEPTH being
  configurable.

Cc: Alexei Starovoitov <a...@kernel.org>
Signed-off-by: Quentin Monnet <quentin.mon...@netronome.com>
Acked-by: Alexei Starovoitov <a...@kernel.org>
---
 include/uapi/linux/bpf.h | 254 +++
 1 file changed, 254 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 6ac56df3ea8d..368d83cf02d8 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -623,6 +623,260 @@ union bpf_attr {
  * direct packet access.
  * Return
  * 0 on success, or a negative error in case of failure.
+ *
+ * u64 bpf_get_current_pid_tgid(void)
+ * Return
+ * A 64-bit integer containing the current tgid and pid, and
+ * created as such:
+ * *current_task*\ **->tgid << 32 \|**
+ * *current_task*\ **->pid**.
+ *
+ * u64 bpf_get_current_uid_gid(void)
+ * Return
+ * A 64-bit integer containing the current GID and UID, and
+ * created as such: *current_gid* **<< 32 \|** *current_uid*.
+ *
+ * int bpf_get_current_comm(char *buf, u32 size_of_buf)
+ * Description
+ * Copy the **comm** attribute of the current task into *buf* of
+ * *size_of_buf*. The **comm** attribute contains the name of
+ * the executable (excluding the path) for the current task. The
+ * *size_of_buf* must be strictly positive. On success, the
+ * helper makes sure that the *buf* is NUL-terminated. On failure,
+ * it is filled with zeroes.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_skb_vlan_push(struct sk_buff *skb, __be16 vlan_proto, u16 vlan_tci)
+ * Description
+ * Push a *vlan_tci* (VLAN tag control information) of protocol
+ * *vlan_proto* to the packet associated to *skb*, then update
+ * the checksum. Note that if *vlan_proto* is different from
+ * **ETH_P_8021Q** and **ETH_P_8021AD**, it is considered to
+ * be **ETH_P_8021Q**.
+ *
+ * A call to this helper is susceptible to change the underlying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_skb_vlan_pop(struct sk_buff *skb)
+ * Description
+ * Pop a VLAN header from the packet associated to *skb*.
+ *
+ * A call to this helper is susceptible to change the underlying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
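
The bit packing documented above for bpf_get_current_pid_tgid() can be checked with a quick sketch (illustrative values, not a real kernel call):

```python
# bpf_get_current_pid_tgid() returns current_task->tgid << 32 |
# current_task->pid; splitting it back out is straightforward.
tgid, pid = 1234, 5678          # hypothetical task IDs
pid_tgid = (tgid << 32) | pid

assert pid_tgid >> 32 == tgid           # upper 32 bits: tgid
assert pid_tgid & 0xffffffff == pid     # lower 32 bits: pid
```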

[PATCH bpf-next v4 02/10] bpf: add documentation for eBPF helpers (01-11)

2018-04-25 Thread Quentin Monnet
Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide a RST document
that can later be converted into a man page.

The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.

This patch contains descriptions for the following helper functions, all
written by Alexei:

- bpf_map_lookup_elem()
- bpf_map_update_elem()
- bpf_map_delete_elem()
- bpf_probe_read()
- bpf_ktime_get_ns()
- bpf_trace_printk()
- bpf_skb_store_bytes()
- bpf_l3_csum_replace()
- bpf_l4_csum_replace()
- bpf_tail_call()
- bpf_clone_redirect()

v4:
- bpf_map_lookup_elem(): Add "const" qualifier for key.
- bpf_map_update_elem(): Add "const" qualifier for key and value.
- bpf_map_delete_elem(): Add "const" qualifier for key.
- bpf_skb_store_bytes(): Clarify comment about invalidated verifier
  checks.
- bpf_l3_csum_replace(): Mention L3 instead of just IP, and add a note
  about bpf_csum_diff().
- bpf_l4_csum_replace(): Mention L4 instead of just TCP/UDP, and add a
  note about bpf_csum_diff().
- bpf_tail_call(): Bring minor edits to description.
- bpf_clone_redirect(): Add a note about the relation with
  bpf_redirect(). Also clarify comment about invalidated verifier
  checks.

v3:
- bpf_map_lookup_elem(): Fix description of restrictions for flags
  related to the existence of the entry.
- bpf_trace_printk(): State that trace_pipe can be configured. Fix
  return value in case an unknown format specifier is met. Add a note on
  kernel log notice when the helper is used. Edit example.
- bpf_tail_call(): Improve comment on stack inheritance.
- bpf_clone_redirect(): Improve description of BPF_F_INGRESS flag.

Cc: Alexei Starovoitov <a...@kernel.org>
Signed-off-by: Quentin Monnet <quentin.mon...@netronome.com>
Acked-by: Alexei Starovoitov <a...@kernel.org>
---
 include/uapi/linux/bpf.h | 230 +++
 1 file changed, 230 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index df28a60d314c..6ac56df3ea8d 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -393,6 +393,236 @@ union bpf_attr {
  * intentional, removing them would break paragraphs for rst2man.
  *
  * Start of BPF helper function descriptions:
+ *
+ * void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
+ * Description
+ * Perform a lookup in *map* for an entry associated to *key*.
+ * Return
+ * Map value associated to *key*, or **NULL** if no entry was
+ * found.
+ *
+ * int bpf_map_update_elem(struct bpf_map *map, const void *key, const void *value, u64 flags)
+ * Description
+ * Add or update the value of the entry associated to *key* in
+ * *map* with *value*. *flags* is one of:
+ *
+ * **BPF_NOEXIST**
+ * The entry for *key* must not exist in the map.
+ * **BPF_EXIST**
+ * The entry for *key* must already exist in the map.
+ * **BPF_ANY**
+ * No condition on the existence of the entry for *key*.
+ *
+ * Flag value **BPF_NOEXIST** cannot be used for maps of types
+ * **BPF_MAP_TYPE_ARRAY** or **BPF_MAP_TYPE_PERCPU_ARRAY** (all
+ * elements always exist); the helper would return an error.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
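The three flag values above can be modelled in plain C. This is a userspace sketch, not kernel code: the numeric constants match the ones defined in the uapi header, while `check_update_flags()` is a hypothetical stand-in for the kernel's precondition check.

```c
#include <errno.h>
#include <stdbool.h>

#define BPF_ANY		0 /* create new element or update existing */
#define BPF_NOEXIST	1 /* create new element only if it did not exist */
#define BPF_EXIST	2 /* only update an existing element */

/* Return 0 if the update may proceed, or the error the kernel would
 * report: -EEXIST for BPF_NOEXIST on a present key, -ENOENT for
 * BPF_EXIST on an absent key.
 */
static int check_update_flags(bool key_exists, unsigned int flags)
{
	if (flags == BPF_NOEXIST && key_exists)
		return -EEXIST;
	if (flags == BPF_EXIST && !key_exists)
		return -ENOENT;
	return 0;
}
```

Note that for array maps, where every index always exists, the BPF_NOEXIST case can never succeed, which is why the text above rules it out.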
+ * int bpf_map_delete_elem(struct bpf_map *map, const void *key)
+ * Description
+ * Delete entry with *key* from *map*.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_probe_read(void *dst, u32 size, const void *src)
+ * Description
+ * For tracing programs, safely attempt to read *size* bytes from
+ * address *src* and store the data in *dst*.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * u64 bpf_ktime_get_ns(void)
+ * Description
+ * Return the time elapsed since system boot, in nanoseconds.
+ * Return
+ * Current *ktime*.
+ *
+ * int bpf_trace_printk(const char *fmt, u32 fmt_size, ...)
+ * Description
+ * This helper is a "printk()-like" facility for debugging. It
+ * prints a message defined by format *fmt* (of size *fmt_size*)
+ * to file *\/sys/kernel/debug/tracing/trace* from DebugFS, if
+ * available. It can take up to three additional **u64**
+ * arguments (as for eBPF helpers, the total number of arguments is
+ * limited to five).
+ *
+ * Each time the helper is called, it appends a line to the trace.
+ * The format of the trace is customizable, and the exact output
+ * will depend on the options set in
+ * *\/sys/kernel/debug/tracing/trace_options* (see also the
+ * *README* file under the same directory).

[PATCH bpf-next v4 07/10] bpf: add documentation for eBPF helpers (51-57)

2018-04-25 Thread Quentin Monnet
Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide a RST document
that can later be converted into a man page.

The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.

This patch contains descriptions for the following helper functions:

Helpers from Lawrence:
- bpf_setsockopt()
- bpf_getsockopt()
- bpf_sock_ops_cb_flags_set()

Helpers from Yonghong:
- bpf_perf_event_read_value()
- bpf_perf_prog_read_value()

Helper from Josef:
- bpf_override_return()

Helper from Andrey:
- bpf_bind()

v4:
- bpf_perf_event_read_value(): State that this helper should be
  preferred over bpf_perf_event_read().

v3:
- bpf_perf_event_read_value(): Fix time of selection for perf event type
  in description. Remove occurrences of "cores" to avoid confusion with
  "CPU".
- bpf_bind(): Remove last paragraph of description, which was off topic.

Cc: Lawrence Brakmo <bra...@fb.com>
Cc: Yonghong Song <y...@fb.com>
Cc: Josef Bacik <jba...@fb.com>
Cc: Andrey Ignatov <r...@fb.com>
Signed-off-by: Quentin Monnet <quentin.mon...@netronome.com>
Acked-by: Yonghong Song <y...@fb.com>
[for bpf_perf_event_read_value(), bpf_perf_prog_read_value()]
Acked-by: Andrey Ignatov <r...@fb.com>
[for bpf_bind()]
---
 include/uapi/linux/bpf.h | 180 +++
 1 file changed, 180 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index f06225f3a01d..9fe008dd51e7 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1361,6 +1361,28 @@ union bpf_attr {
  * Return
  * 0
  *
+ * int bpf_setsockopt(struct bpf_sock_ops_kern *bpf_socket, int level, int optname, char *optval, int optlen)
+ * Description
+ * Emulate a call to **setsockopt()** on the socket associated to
+ * *bpf_socket*, which must be a full socket. The *level* at
+ * which the option resides and the name *optname* of the option
+ * must be specified, see **setsockopt(2)** for more information.
+ * The option value of length *optlen* is pointed to by *optval*.
+ *
+ * This helper actually implements a subset of **setsockopt()**.
+ * It supports the following *level*\ s:
+ *
+ * * **SOL_SOCKET**, which supports the following *optname*\ s:
+ *   **SO_RCVBUF**, **SO_SNDBUF**, **SO_MAX_PACING_RATE**,
+ *   **SO_PRIORITY**, **SO_RCVLOWAT**, **SO_MARK**.
+ * * **IPPROTO_TCP**, which supports the following *optname*\ s:
+ *   **TCP_CONGESTION**, **TCP_BPF_IW**,
+ *   **TCP_BPF_SNDCWND_CLAMP**.
+ * * **IPPROTO_IP**, which supports *optname* **IP_TOS**.
+ * * **IPPROTO_IPV6**, which supports *optname* **IPV6_TCLASS**.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
 * int bpf_skb_adjust_room(struct sk_buff *skb, u32 len_diff, u32 mode, u64 flags)
  * Description
  * Grow or shrink the room for data in the packet associated to
@@ -1410,6 +1432,164 @@ union bpf_attr {
  * direct packet access.
  * Return
  * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_perf_event_read_value(struct bpf_map *map, u64 flags, struct bpf_perf_event_value *buf, u32 buf_size)
+ * Description
+ * Read the value of a perf event counter, and store it into *buf*
+ * of size *buf_size*. This helper relies on a *map* of type
+ * **BPF_MAP_TYPE_PERF_EVENT_ARRAY**. The nature of the perf event
+ * counter is selected when *map* is updated with perf event file
+ * descriptors. The *map* is an array whose size is the number of
+ * available CPUs, and each cell contains a value relative to one
+ * CPU. The value to retrieve is indicated by *flags*, that
+ * contains the index of the CPU to look up, masked with
+ * **BPF_F_INDEX_MASK**. Alternatively, *flags* can be set to
+ * **BPF_F_CURRENT_CPU** to indicate that the value for the
+ * current CPU should be retrieved.
+ *
+ * This helper behaves in a way close to the
+ * **bpf_perf_event_read**\ () helper, save that instead of
+ * just returning the value observed, it fills the *buf*
+ * structure. This allows for additional data to be retrieved: in
+ * particular, the enabled and running times (in *buf*\
+ * **->enabled** and *buf*\ **->running**, respectively) are
+ * copied. In general, **bpf_perf_event_read_value**\ () is
+ * recommended over **bpf_perf_event_read**\ (), which has some ABI
+ * quirks where error and counter value are used as a return code
+ * (which is wrong to do since ranges may overlap).

[PATCH bpf-next v4 10/10] bpf: update bpf.h uapi header for tools

2018-04-25 Thread Quentin Monnet
Update tools/include/uapi/linux/bpf.h file in order to reflect the
changes for BPF helper functions documentation introduced in previous
commits.

Signed-off-by: Quentin Monnet <quentin.mon...@netronome.com>
---
 tools/include/uapi/linux/bpf.h | 1776 +++-
 1 file changed, 1380 insertions(+), 396 deletions(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index e6679393b687..3b91e22f68c1 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -377,412 +377,1396 @@ union bpf_attr {
};
 } __attribute__((aligned(8)));
 
-/* BPF helper function descriptions:
- *
- * void *bpf_map_lookup_elem(&map, &key)
- * Return: Map value or NULL
- *
- * int bpf_map_update_elem(&map, &key, &value, flags)
- * Return: 0 on success or negative error
- *
- * int bpf_map_delete_elem(&map, &key)
- * Return: 0 on success or negative error
- *
- * int bpf_probe_read(void *dst, int size, void *src)
- * Return: 0 on success or negative error
+/* The description below is an attempt at providing documentation to eBPF
+ * developers about the multiple available eBPF helper functions. It can be
+ * parsed and used to produce a manual page. The workflow is the following,
+ * and requires the rst2man utility:
+ *
+ * $ ./scripts/bpf_helpers_doc.py \
+ * --filename include/uapi/linux/bpf.h > /tmp/bpf-helpers.rst
+ * $ rst2man /tmp/bpf-helpers.rst > /tmp/bpf-helpers.7
+ * $ man /tmp/bpf-helpers.7
+ *
+ * Note that in order to produce this external documentation, some RST
+ * formatting is used in the descriptions to get "bold" and "italics" in
+ * manual pages. Also note that the few trailing white spaces are
+ * intentional, removing them would break paragraphs for rst2man.
+ *
+ * Start of BPF helper function descriptions:
+ *
+ * void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
+ * Description
+ * Perform a lookup in *map* for an entry associated to *key*.
+ * Return
+ * Map value associated to *key*, or **NULL** if no entry was
+ * found.
+ *
+ * int bpf_map_update_elem(struct bpf_map *map, const void *key, const void *value, u64 flags)
+ * Description
+ * Add or update the value of the entry associated to *key* in
+ * *map* with *value*. *flags* is one of:
+ *
+ * **BPF_NOEXIST**
+ * The entry for *key* must not exist in the map.
+ * **BPF_EXIST**
+ * The entry for *key* must already exist in the map.
+ * **BPF_ANY**
+ * No condition on the existence of the entry for *key*.
+ *
+ * Flag value **BPF_NOEXIST** cannot be used for maps of types
+ * **BPF_MAP_TYPE_ARRAY** or **BPF_MAP_TYPE_PERCPU_ARRAY** (all
+ * elements always exist); the helper would return an error.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_map_delete_elem(struct bpf_map *map, const void *key)
+ * Description
+ * Delete entry with *key* from *map*.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_probe_read(void *dst, u32 size, const void *src)
+ * Description
+ * For tracing programs, safely attempt to read *size* bytes from
+ * address *src* and store the data in *dst*.
+ * Return
+ * 0 on success, or a negative error in case of failure.
  *
  * u64 bpf_ktime_get_ns(void)
- * Return: current ktime
- *
- * int bpf_trace_printk(const char *fmt, int fmt_size, ...)
- * Return: length of buffer written or negative error
- *
- * u32 bpf_prandom_u32(void)
- * Return: random value
- *
- * u32 bpf_raw_smp_processor_id(void)
- * Return: SMP processor ID
- *
- * int bpf_skb_store_bytes(skb, offset, from, len, flags)
- * store bytes into packet
- * @skb: pointer to skb
- * @offset: offset within packet from skb->mac_header
- * @from: pointer where to copy bytes from
- * @len: number of bytes to store into packet
- * @flags: bit 0 - if true, recompute skb->csum
- * other bits - reserved
- * Return: 0 on success or negative error
- *
- * int bpf_l3_csum_replace(skb, offset, from, to, flags)
- * recompute IP checksum
- * @skb: pointer to skb
- * @offset: offset within packet where IP checksum is located
- * @from: old value of header field
- * @to: new value of header field
- * @flags: bits 0-3 - size of header field
- * other bits - reserved
- * Return: 0 on success or negative error
- *
- * int bpf_l4_csum_replace(skb, offset, from, to, flags)
- * recompute TCP/UDP checksum
- * @skb: pointer to skb
- * @offset: offset within packet where TCP/UDP checksum is located
- * @from: old value of header field
- * @to: new value of header field

[PATCH bpf-next v4 05/10] bpf: add documentation for eBPF helpers (33-41)

2018-04-25 Thread Quentin Monnet
Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide a RST document
that can later be converted into a man page.

The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.

This patch contains descriptions for the following helper functions, all
written by Daniel:

- bpf_get_hash_recalc()
- bpf_skb_change_tail()
- bpf_skb_pull_data()
- bpf_csum_update()
- bpf_set_hash_invalid()
- bpf_get_numa_node_id()
- bpf_set_hash()
- bpf_skb_adjust_room()
- bpf_xdp_adjust_meta()

v4:
- bpf_skb_change_tail(): Clarify comment about invalidated verifier
  checks.
- bpf_skb_pull_data(): Clarify the motivation for using this helper or
  bpf_skb_load_bytes(), on non-linear buffers. Fix RST formatting for
  *skb*. Clarify comment about invalidated verifier checks.
- bpf_csum_update(): Fix description of checksum (entire packet, not IP
  checksum). Fix a typo: "header" instead of "helper".
- bpf_set_hash_invalid(): Mention bpf_get_hash_recalc().
- bpf_get_numa_node_id(): State that the helper is not restricted to
  programs attached to sockets.
- bpf_skb_adjust_room(): Clarify comment about invalidated verifier
  checks.
- bpf_xdp_adjust_meta(): Clarify comment about invalidated verifier
  checks.

Cc: Daniel Borkmann <dan...@iogearbox.net>
Signed-off-by: Quentin Monnet <quentin.mon...@netronome.com>
Acked-by: Alexei Starovoitov <a...@kernel.org>
---
 include/uapi/linux/bpf.h | 164 +++
 1 file changed, 164 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 28681cef14d4..d1207ba39280 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1071,9 +1071,173 @@ union bpf_attr {
  * Return
  * 0 on success, or a negative error in case of failure.
  *
+ * u32 bpf_get_hash_recalc(struct sk_buff *skb)
+ * Description
+ * Retrieve the hash of the packet, *skb*\ **->hash**. If it is
+ * not set, in particular if the hash was cleared due to mangling,
+ * recompute this hash. Later accesses to the hash can be done
+ * directly with *skb*\ **->hash**.
+ *
+ * Calling **bpf_set_hash_invalid**\ (), changing a packet
+ * protocol with **bpf_skb_change_proto**\ (), or calling
+ * **bpf_skb_store_bytes**\ () with the
+ * **BPF_F_INVALIDATE_HASH** flag are actions susceptible to clear
+ * the hash and to trigger a new computation for the next call to
+ * **bpf_get_hash_recalc**\ ().
+ * Return
+ * The 32-bit hash.
+ *
  * u64 bpf_get_current_task(void)
  * Return
  * A pointer to the current task struct.
+ *
+ * int bpf_skb_change_tail(struct sk_buff *skb, u32 len, u64 flags)
+ * Description
+ * Resize (trim or grow) the packet associated to *skb* to the
+ * new *len*. The *flags* are reserved for future usage, and must
+ * be left at zero.
+ *
+ * The basic idea is that the helper performs the needed work to
+ * change the size of the packet, then the eBPF program rewrites
+ * the rest via helpers like **bpf_skb_store_bytes**\ (),
+ * **bpf_l3_csum_replace**\ (), **bpf_l4_csum_replace**\ ()
+ * and others. This helper is a slow path utility intended for
+ * replies with control messages. And because it is targeted for
+ * slow path, the helper itself can afford to be slow: it
+ * implicitly linearizes, unclones and drops offloads from the
+ * *skb*.
+ *
+ * A call to this helper is susceptible to change the underlying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_skb_pull_data(struct sk_buff *skb, u32 len)
+ * Description
+ * Pull in non-linear data in case the *skb* is non-linear and not
+ * all of *len* are part of the linear section. Make *len* bytes
+ * from *skb* readable and writable. If a zero value is passed for
+ * *len*, then the whole length of the *skb* is pulled.
+ *
+ * This helper is only needed for reading and writing with direct
+ * packet access.
+ *
+ * For direct packet access, testing that offsets to access
+ * are within packet boundaries (test on *skb*\ **->data_end**) is
+ * susceptible to fail if offsets are invalid, or if the requested
+ * data is in non-linear parts of the *skb*.

[PATCH bpf-next v4 09/10] bpf: add documentation for eBPF helpers (65-66)

2018-04-25 Thread Quentin Monnet
Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide a RST document
that can later be converted into a man page.

The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.

This patch contains descriptions for the following helper functions:

Helper from Nikita:
- bpf_xdp_adjust_tail()

Helper from Eyal:
- bpf_skb_get_xfrm_state()

v4:
- New patch (helpers did not exist yet for previous versions).

Cc: Nikita V. Shirokov <tehn...@tehnerd.com>
Cc: Eyal Birger <eyal.bir...@gmail.com>
Signed-off-by: Quentin Monnet <quentin.mon...@netronome.com>
---
 include/uapi/linux/bpf.h | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 335ac427d43b..3b91e22f68c1 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1737,6 +1737,36 @@ union bpf_attr {
  * must be set to zero.
  * Return
  * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_xdp_adjust_tail(struct xdp_buff *xdp_md, int delta)
+ * Description
+ * Adjust (move) *xdp_md*\ **->data_end** by *delta* bytes. It is
+ * only possible to shrink the packet as of this writing;
+ * therefore, *delta* must be a negative integer.
+ *
+ * A call to this helper is susceptible to change the underlying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
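The shrink-only rule can be modelled over a plain byte buffer. This is a userspace sketch under the constraints stated above; `adjust_tail_model()` is a hypothetical name, not the kernel implementation.

```c
#include <stddef.h>

/* Model of the documented bpf_xdp_adjust_tail() constraints: move the
 * end-of-data pointer by 'delta'. Only shrinking (negative delta) is
 * allowed as of this writing, and data_end may not move before data.
 */
static int adjust_tail_model(const char *data, const char **data_end,
			     int delta)
{
	const char *new_end = *data_end + delta;

	if (delta > 0)		/* growing the packet is not supported */
		return -1;
	if (new_end < data)	/* cannot shrink past the start of data */
		return -1;
	*data_end = new_end;
	return 0;
}
```

After a successful call, any previously verified bounds on the packet pointers must be re-checked, as the description notes.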
+ * int bpf_skb_get_xfrm_state(struct sk_buff *skb, u32 index, struct bpf_xfrm_state *xfrm_state, u32 size, u64 flags)
+ * Description
+ * Retrieve the XFRM state (IP transform framework, see also
+ * **ip-xfrm(8)**) at *index* in XFRM "security path" for *skb*.
+ *
+ * The retrieved value is stored in the **struct bpf_xfrm_state**
+ * pointed by *xfrm_state* and of length *size*.
+ *
+ * All values for *flags* are reserved for future usage, and must
+ * be left at zero.
+ *
+ * This helper is available only if the kernel was compiled with
+ * **CONFIG_XFRM** configuration option.
+ * Return
+ * 0 on success, or a negative error in case of failure.
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
-- 
2.14.1



[PATCH bpf-next v4 06/10] bpf: add documentation for eBPF helpers (42-50)

2018-04-25 Thread Quentin Monnet
Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide a RST document
that can later be converted into a man page.

The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.

This patch contains descriptions for the following helper functions:

Helper from Kaixu:
- bpf_perf_event_read()

Helpers from Martin:
- bpf_skb_under_cgroup()
- bpf_xdp_adjust_head()

Helpers from Sargun:
- bpf_probe_write_user()
- bpf_current_task_under_cgroup()

Helper from Thomas:
- bpf_skb_change_head()

Helper from Gianluca:
- bpf_probe_read_str()

Helpers from Chenbo:
- bpf_get_socket_cookie()
- bpf_get_socket_uid()

v4:
- bpf_perf_event_read(): State that bpf_perf_event_read_value() should
  be preferred over this helper.
- bpf_skb_change_head(): Clarify comment about invalidated verifier
  checks.
- bpf_xdp_adjust_head(): Clarify comment about invalidated verifier
  checks.
- bpf_probe_write_user(): Add that dst must be a valid user space
  address.
- bpf_get_socket_cookie(): Improve description by making clearer that
  the cookie belongs to the socket, and state that it remains stable for
  the life of the socket.

v3:
- bpf_perf_event_read(): Fix time of selection for perf event type in
  description. Remove occurrences of "cores" to avoid confusion with
  "CPU".

Cc: Martin KaFai Lau <ka...@fb.com>
Cc: Sargun Dhillon <sar...@sargun.me>
Cc: Thomas Graf <tg...@suug.ch>
Cc: Gianluca Borello <g.bore...@gmail.com>
Cc: Chenbo Feng <fe...@google.com>
Signed-off-by: Quentin Monnet <quentin.mon...@netronome.com>
Acked-by: Alexei Starovoitov <a...@kernel.org>
Acked-by: Martin KaFai Lau <ka...@fb.com>
[for bpf_skb_under_cgroup(), bpf_xdp_adjust_head()]
---
 include/uapi/linux/bpf.h | 172 +++
 1 file changed, 172 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d1207ba39280..f06225f3a01d 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -810,6 +810,35 @@ union bpf_attr {
  * Return
  * 0 on success, or a negative error in case of failure.
  *
+ * u64 bpf_perf_event_read(struct bpf_map *map, u64 flags)
+ * Description
+ * Read the value of a perf event counter. This helper relies on a
+ * *map* of type **BPF_MAP_TYPE_PERF_EVENT_ARRAY**. The nature of
+ * the perf event counter is selected when *map* is updated with
+ * perf event file descriptors. The *map* is an array whose size
+ * is the number of available CPUs, and each cell contains a value
+ * relative to one CPU. The value to retrieve is indicated by
+ * *flags*, that contains the index of the CPU to look up, masked
+ * with **BPF_F_INDEX_MASK**. Alternatively, *flags* can be set to
+ * **BPF_F_CURRENT_CPU** to indicate that the value for the
+ * current CPU should be retrieved.
+ *
+ * Note that before Linux 4.13, only hardware perf events could
+ * be retrieved.
+ *
+ * Also, be aware that the newer helper
+ * **bpf_perf_event_read_value**\ () is recommended over
+ * **bpf_perf_event_read**\ () in general. The latter has some ABI
+ * quirks where error and counter value are used as a return code
+ * (which is wrong to do since ranges may overlap). This issue is
+ * fixed with **bpf_perf_event_read_value**\ (), which at the same time
+ * provides more features over the **bpf_perf_event_read**\ ()
+ * interface. Please refer to the description of
+ * **bpf_perf_event_read_value**\ () for details.
+ * Return
+ * The value of the perf event counter read from the map, or a
+ * negative error code in case of failure.
+ *
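The *flags* decoding described above can be sketched in plain C. The two constants match the uapi definitions (**BPF_F_CURRENT_CPU** is defined as **BPF_F_INDEX_MASK**); `resolve_cpu_index()` is a hypothetical name for this userspace illustration.

```c
#include <stdint.h>

#define BPF_F_INDEX_MASK	0xffffffffULL
#define BPF_F_CURRENT_CPU	BPF_F_INDEX_MASK

/* Decode the CPU index carried in 'flags': the low 32 bits hold the
 * index, and the special value BPF_F_CURRENT_CPU selects the CPU the
 * program runs on (passed here as 'cur_cpu').
 */
static uint64_t resolve_cpu_index(uint64_t flags, uint64_t cur_cpu)
{
	uint64_t index = flags & BPF_F_INDEX_MASK;

	return index == BPF_F_CURRENT_CPU ? cur_cpu : index;
}
```

The same flags convention is used by **bpf_perf_event_read_value**\ () and **bpf_perf_event_output**\ ().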
  * int bpf_redirect(u32 ifindex, u64 flags)
  * Description
  * Redirect the packet to another net device of index *ifindex*.
@@ -1071,6 +1100,17 @@ union bpf_attr {
  * Return
  * 0 on success, or a negative error in case of failure.
  *
+ * int bpf_skb_under_cgroup(struct sk_buff *skb, struct bpf_map *map, u32 index)
+ * Description
+ * Check whether *skb* is a descendant of the cgroup2 held by
+ * *map* of type **BPF_MAP_TYPE_CGROUP_ARRAY**, at *index*.
+ * Return
+ * The return value depends on the result of the test, and can be:
+ *
+ * * 0, if the *skb* failed the cgroup2 descendant test.
+ * * 1, if the *skb* succeeded the cgroup2 descendant test.
+ * * A negative error code, if an error occurred.
+ *
  * u32 bpf_get_hash_recalc(struct sk_buff *skb)
  * Description
 * Retrieve the hash of the packet, *skb*\ **->hash**.

[PATCH bpf-next v4 08/10] bpf: add documentation for eBPF helpers (58-64)

2018-04-25 Thread Quentin Monnet
Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide a RST document
that can later be converted into a man page.

The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.

This patch contains descriptions for the following helper functions, all
written by John:

- bpf_redirect_map()
- bpf_sk_redirect_map()
- bpf_sock_map_update()
- bpf_msg_redirect_map()
- bpf_msg_apply_bytes()
- bpf_msg_cork_bytes()
- bpf_msg_pull_data()

v4:
- bpf_redirect_map(): Fix typos: "XDP_ABORT" changed to "XDP_ABORTED",
  "his" to "this". Also add a paragraph on performance improvement over
  bpf_redirect() helper.

v3:
- bpf_sk_redirect_map(): Improve description of BPF_F_INGRESS flag.
- bpf_msg_redirect_map(): Improve description of BPF_F_INGRESS flag.
- bpf_redirect_map(): Fix note on CPU redirection, not fully implemented
  for generic XDP but supported on native XDP.
- bpf_msg_pull_data(): Clarify comment about invalidated verifier
  checks.

Cc: Jesper Dangaard Brouer <bro...@redhat.com>
Cc: John Fastabend <john.fastab...@gmail.com>
Signed-off-by: Quentin Monnet <quentin.mon...@netronome.com>
---
 include/uapi/linux/bpf.h | 147 +++
 1 file changed, 147 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 9fe008dd51e7..335ac427d43b 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1404,6 +1404,56 @@ union bpf_attr {
  * Return
  * 0 on success, or a negative error in case of failure.
  *
+ * int bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags)
+ * Description
+ * Redirect the packet to the endpoint referenced by *map* at
+ * index *key*. Depending on its type, this *map* can contain
+ * references to net devices (for forwarding packets through other
+ * ports), or to CPUs (for redirecting XDP frames to another CPU;
+ * but this is only implemented for native XDP (with driver
+ * support) as of this writing).
+ *
+ * All values for *flags* are reserved for future usage, and must
+ * be left at zero.
+ *
+ * When used to redirect packets to net devices, this helper
+ * provides a high performance increase over **bpf_redirect**\ ().
+ * This is due to various implementation details of the underlying
+ * mechanisms, one of which is the fact that **bpf_redirect_map**\
+ * () tries to send packets in a "bulk" to the device.
+ * Return
+ * **XDP_REDIRECT** on success, or **XDP_ABORTED** on error.
+ *
+ * int bpf_sk_redirect_map(struct bpf_map *map, u32 key, u64 flags)
+ * Description
+ * Redirect the packet to the socket referenced by *map* (of type
+ * **BPF_MAP_TYPE_SOCKMAP**) at index *key*. Both ingress and
+ * egress interfaces can be used for redirection. The
+ * **BPF_F_INGRESS** value in *flags* is used to make the
+ * distinction (ingress path is selected if the flag is present,
+ * egress path otherwise). This is the only flag supported for now.
+ * Return
+ * **SK_PASS** on success, or **SK_DROP** on error.
+ *
+ * int bpf_sock_map_update(struct bpf_sock_ops_kern *skops, struct bpf_map *map, void *key, u64 flags)
+ * Description
+ * Add an entry to, or update a *map* referencing sockets. The
+ * *skops* is used as a new value for the entry associated to
+ * *key*. *flags* is one of:
+ *
+ * **BPF_NOEXIST**
+ * The entry for *key* must not exist in the map.
+ * **BPF_EXIST**
+ * The entry for *key* must already exist in the map.
+ * **BPF_ANY**
+ * No condition on the existence of the entry for *key*.
+ *
+ * If the *map* has eBPF programs (parser and verdict), those will
+ * be inherited by the socket being added. If the socket is
+ * already attached to eBPF programs, this results in an error.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
  * int bpf_xdp_adjust_meta(struct xdp_buff *xdp_md, int delta)
  * Description
  * Adjust the address pointed by *xdp_md*\ **->data_meta** by
@@ -1574,6 +1624,103 @@ union bpf_attr {
  * be set is returned (which comes down to 0 if all bits were set
  * as required).
  *
+ * int bpf_msg_redirect_map(struct sk_msg_buff *msg, struct bpf_map *map, u32 key, u64 flags)
+ * Description
+ * This helper is used in programs implementing policies at the
+ * socket level.

[PATCH bpf-next v4 04/10] bpf: add documentation for eBPF helpers (23-32)

2018-04-25 Thread Quentin Monnet
Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide a RST document
that can later be converted into a man page.

The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.

This patch contains descriptions for the following helper functions, all
written by Daniel:

- bpf_get_prandom_u32()
- bpf_get_smp_processor_id()
- bpf_get_cgroup_classid()
- bpf_get_route_realm()
- bpf_skb_load_bytes()
- bpf_csum_diff()
- bpf_skb_get_tunnel_opt()
- bpf_skb_set_tunnel_opt()
- bpf_skb_change_proto()
- bpf_skb_change_type()

v4:
- bpf_get_prandom_u32(): Warn that the prng is not cryptographically
  secure.
- bpf_get_smp_processor_id(): Fix a typo (case).
- bpf_get_cgroup_classid(): Clarify description. Add notes on the helper
  being limited to cgroup v1, and to egress path.
- bpf_get_route_realm(): Add comparison with bpf_get_cgroup_classid().
  Add a note about usage with TC and advantage of clsact. Fix a typo in
  return value ("sdb" instead of "skb").
- bpf_skb_load_bytes(): Make explicit that loading large data loads it
  to the eBPF stack.
- bpf_csum_diff(): Add a note on seed that can be cascaded. Link to
  bpf_l3|l4_csum_replace().
- bpf_skb_get_tunnel_opt(): Add a note about usage with "collect
  metadata" mode, and example of this with Geneve.
- bpf_skb_set_tunnel_opt(): Add a link to bpf_skb_get_tunnel_opt()
  description.
- bpf_skb_change_proto(): Mention that the main use case is NAT64.
  Clarify comment about invalidated verifier checks.

v3:
- bpf_get_prandom_u32(): Fix helper name :(. Add description, including
  a note on the internal random state.
- bpf_get_smp_processor_id(): Add description, including a note on the
  processor id remaining stable during program run.
- bpf_get_cgroup_classid(): State that CONFIG_CGROUP_NET_CLASSID is
  required to use the helper. Add a reference to related documentation.
  State that placing a task in net_cls controller disables cgroup-bpf.
- bpf_get_route_realm(): State that CONFIG_IP_ROUTE_CLASSID is required
  to use this helper.
- bpf_skb_load_bytes(): Fix comment on current use cases for the helper.

Cc: Daniel Borkmann <dan...@iogearbox.net>
Signed-off-by: Quentin Monnet <quentin.mon...@netronome.com>
Acked-by: Alexei Starovoitov <a...@kernel.org>
---
 include/uapi/linux/bpf.h | 197 +++
 1 file changed, 197 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 368d83cf02d8..28681cef14d4 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -495,6 +495,27 @@ union bpf_attr {
  * The number of bytes written to the buffer, or a negative error
  * in case of failure.
  *
+ * u32 bpf_get_prandom_u32(void)
+ * Description
+ * Get a pseudo-random number.
+ *
+ * From a security point of view, this helper uses its own
+ * pseudo-random internal state, and cannot be used to infer the
+ * seed of other random functions in the kernel. However, it is
+ * essential to note that the generator used by the helper is not
+ * cryptographically secure.
+ * Return
+ * A random 32-bit unsigned value.
+ *
+ * u32 bpf_get_smp_processor_id(void)
+ * Description
+ * Get the SMP (symmetric multiprocessing) processor id. Note that
+ * all programs run with preemption disabled, which means that the
+ * SMP processor id is stable during all the execution of the
+ * program.
+ * Return
+ * The SMP id of the processor running the program.
+ *
 * int bpf_skb_store_bytes(struct sk_buff *skb, u32 offset, const void *from, u32 len, u64 flags)
  * Description
  * Store *len* bytes from address *from* into the packet
@@ -647,6 +668,32 @@ union bpf_attr {
  * Return
  * 0 on success, or a negative error in case of failure.
  *
+ * u32 bpf_get_cgroup_classid(struct sk_buff *skb)
+ * Description
+ * Retrieve the classid for the current task, i.e. for the net_cls
+ * cgroup to which *skb* belongs.
+ *
+ * This helper can be used on TC egress path, but not on ingress.
+ *
+ * The net_cls cgroup provides an interface to tag network packets
+ * based on a user-provided identifier for all traffic coming from
+ * the tasks belonging to the related cgroup. See also the related
+ * kernel documentation, available from the Linux sources in file
+ * *Documentation/cgroup-v1/net_cls.txt*.
+ *
+ * The Linux kernel has two versions for cgroups: there are
+ * cgroups v1 and cgroups v2. Both are available to users, who can
+ * 
