Re: [PATCH bpf-next] tools: bpftool: add a command to dump the trace pipe

2018-12-05 Thread Daniel Borkmann
On 12/05/2018 11:28 AM, Quentin Monnet wrote:
> BPF programs can use the bpf_trace_printk() helper to print debug
> information into the trace pipe. Add a subcommand
> "bpftool prog tracelog" to simply dump this pipe to the console.
> 
> This is in large part copied from iproute2, where the feature is
> available with "tc exec bpf dbg". Changes include dumping pipe content
> to stdout instead of stderr and adding JSON support (content is dumped
> as an array of strings, one per line read from the pipe). This version
> is dual-licensed, with Daniel's permission.
> 
> Cc: Daniel Borkmann 
> Suggested-by: Daniel Borkmann 
> Signed-off-by: Quentin Monnet 
> Reviewed-by: Jakub Kicinski 

Looks good, applied, thanks!
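
For reference, a minimal sketch of the kind of program whose output the new
subcommand dumps (a sketch assuming the selftests' bpf_helpers.h wrapper;
program and section names here are illustrative, not from the patch):

/* Output of bpf_trace_printk() lands in the trace pipe that
 * "bpftool prog tracelog" reads.
 */
#include <linux/bpf.h>
#include "bpf_helpers.h"

SEC("tracepoint/syscalls/sys_enter_nanosleep")
int log_nanosleep(void *ctx)
{
	char fmt[] = "nanosleep entered\n";

	bpf_trace_printk(fmt, sizeof(fmt));
	return 0;
}

char _license[] SEC("license") = "GPL";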


Re: [PATCH bpf 0/3] bpf: improve verifier resilience

2018-12-04 Thread Daniel Borkmann
On 12/04/2018 07:46 AM, Alexei Starovoitov wrote:
> Three patches to improve verifier ability to handle pathological bpf programs
> with a lot of branches:
> - make sure prog_load syscall can be aborted
> - improve branch taken analysis
> - introduce per-insn complexity limit for unprivileged programs
> 
> Alexei Starovoitov (3):
>   bpf: check pending signals while verifying programs
>   bpf: improve verifier branch analysis
>   bpf: add per-insn complexity limit
> 
>  kernel/bpf/verifier.c   | 103 +---
>  tools/testing/selftests/bpf/test_verifier.c |   4 +-
>  2 files changed, 91 insertions(+), 16 deletions(-)
> 

Applied to bpf, thanks!
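
For context, the first patch boils down to a bail-out check in the verifier's
main instruction-walking loop, roughly like this (a simplified sketch, not the
exact hunk from kernel/bpf/verifier.c):

for (;;) {
	if (signal_pending(current))
		return -EAGAIN;		/* lets the prog_load syscall be aborted */
	if (need_resched())
		cond_resched();
	/* ... verify the next instruction ... */
}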


Re: [PATCH v2] samples: bpf: fix: seg fault with NULL pointer arg

2018-12-03 Thread Daniel Borkmann
On 12/03/2018 11:39 AM, Daniel T. Lee wrote:
> When a NULL pointer is accidentally passed to write_kprobe_events(),
> strlen(NULL) causes a segmentation fault.
> The changed code returns -1 to deal with this situation.
> 
> Bug found with Smatch static analysis.
> 
> Signed-off-by: Daniel T. Lee 

Applied to bpf-next, thanks!
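
The fix pattern described in the log is roughly the following (a sketch under
the assumption that write_kprobe_events() otherwise opens and writes the
kprobe_events file; the body is elided):

static int write_kprobe_events(const char *val)
{
	/* return early instead of letting strlen(NULL) fault */
	if (val == NULL)
		return -1;

	/* ... write val to /sys/kernel/debug/tracing/kprobe_events ... */
	return 0;
}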


Re: [PATCH bpf-next] bpf: fix documentation for eBPF helpers

2018-12-03 Thread Daniel Borkmann
On 12/03/2018 01:13 PM, Quentin Monnet wrote:
> The missing indentation on the "Return" sections for the bpf_map_pop_elem()
> and bpf_map_peek_elem() helpers breaks RST and man page generation. This
> patch fixes them, and moves the description of those two helpers towards
> the end of the list (even though they are somehow related to the first
> three helpers for maps, the man page explicitly states that the helpers
> are sorted in chronological order).
> 
> While at it, bring other minor formatting edits for eBPF helpers
> documentation: mostly blank lines removal, RST formatting, or other
> small nits for consistency.
> 
> Signed-off-by: Quentin Monnet 
> Reviewed-by: Jakub Kicinski 

Applied to bpf-next, thanks!
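
For readers unfamiliar with the helper-doc format: the descriptions live as
comments in include/uapi/linux/bpf.h, and the text under the "Return" keyword
needs one extra level of indentation for the RST/man generation to parse it,
roughly like this (a sketch of the layout, not the exact hunk):

/* int bpf_map_pop_elem(struct bpf_map *map, void *value)
 * 	Description
 * 		Pop an element from *map*.
 * 	Return
 * 		0 on success, or a negative error in case of failure.
 */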


Re: [PATCH bpf-next v2] bpf: allow BPF read access to qdisc pkt_len

2018-12-03 Thread Daniel Borkmann
On 12/03/2018 02:18 AM, Willem de Bruijn wrote:
> From: Petar Penkov 
> 
> The pkt_len field in qdisc_skb_cb stores the skb length as it will
> appear on the wire after segmentation. For byte accounting, this value
> is more accurate than skb->len. It is computed on entry to the TC
> layer, so only valid there.
> 
> Allow read access to this field from BPF tc classifier and action
> programs. The implementation is analogous to tc_classid, aside from
> restricting to read access.
> 
> To distinguish it from skb->len and make it self-describing, export it
> as wire_len.
> 
> Changes v1->v2
>   - Rename pkt_len to wire_len
> 
> Signed-off-by: Petar Penkov 
> Signed-off-by: Vlad Dumitrescu 
> Signed-off-by: Willem de Bruijn 

Applied to bpf-next, thanks!
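
A minimal sketch of how a tc program would use the new field (section name
and the bpf_trace_printk() output are illustrative; assumes the selftests'
bpf_helpers.h):

#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include "bpf_helpers.h"

SEC("classifier")
int count_wire_bytes(struct __sk_buff *skb)
{
	char fmt[] = "wire_len=%u len=%u\n";

	/* wire_len: post-segmentation on-wire length, read-only */
	bpf_trace_printk(fmt, sizeof(fmt), skb->wire_len, skb->len);
	return TC_ACT_OK;
}

char _license[] SEC("license") = "GPL";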


Re: [PATCH bpf-next] libbpf: Fix license in README.rst

2018-12-03 Thread Daniel Borkmann
On 12/03/2018 08:48 AM, Song Liu wrote:
> On Sun, Dec 2, 2018 at 1:04 PM Andrey Ignatov  wrote:
>>
>> The whole libbpf is licensed as (LGPL-2.1 OR BSD-2-Clause). I missed it
>> while adding README.rst. Fix it and use same license as all other files
>> in libbpf do. Since I'm the only author of README.rst so far, no others'
>> permissions should be needed.
>>
>> Fixes: 76d1b894c515 ("libbpf: Document API and ABI conventions")
>> Signed-off-by: Andrey Ignatov 
> Acked-by: Song Liu 

Applied, thanks!


Re: [PATCH bpf-next] bpf: allow BPF read access to qdisc pkt_len

2018-11-30 Thread Daniel Borkmann
On 12/01/2018 12:42 AM, Willem de Bruijn wrote:
> On Fri, Nov 30, 2018 at 5:48 PM Song Liu  wrote:
>>
>> On Fri, Nov 30, 2018 at 12:09 PM Willem de Bruijn
>>  wrote:
>>>
>>> From: Petar Penkov 
>>>
>>> The pkt_len field in qdisc_skb_cb stores the skb length as it will
>>> appear on the wire after segmentation. For byte accounting, this value
>>> is more accurate than skb->len. It is computed on entry to the TC
>>> layer, so only valid there.
>>>
>>> Allow read access to this field from BPF tc classifier and action
>>> programs. The implementation is analogous to tc_classid, aside from
>>> restricting to read access.
>>>
>>> Signed-off-by: Petar Penkov 
>>> Signed-off-by: Vlad Dumitrescu 
>>> Signed-off-by: Willem de Bruijn 
>>> ---
>>>  include/uapi/linux/bpf.h|  1 +
>>>  net/core/filter.c   | 16 +++
>>>  tools/include/uapi/linux/bpf.h  |  1 +
>>>  tools/testing/selftests/bpf/test_verifier.c | 32 +
>>>  4 files changed, 50 insertions(+)
>>
>> Please split this into 3 patches:
>> 1 for include/uapi/linux/bpf.h and filter.c
>> 1 for tools/include/uapi/linux/bpf.h
>> 1 for tools/testing/selftests/bpf/test_verifier.c
>>
>> Other than this
>> Acked-by: Song Liu 
> 
> Thanks for the fast review.
> 
> I'm happy to resend in three parts, of course, but am curious: what is
> the rationale for splitting this up?
> 
> This patch follows the process for commit  f11216b24219 ("bpf: add
> skb->tstamp r/w access from tc clsact and cg skb progs"), which went
> in as a single patch just last week.

Yeah, I think it's fine as is, one small thing I'm wondering though is
given that we now would have both 'skb->len' and 'skb->pkt_len', would
it be more intuitive for a BPF program developer to distinguish the two
by having the latter named e.g. 'skb->wire_len' so it's slightly more
obvious that it's including header size at post-segmentation? If not
probably some comment in the uapi header similar as in qdisc_pkt_len_init()
might be helpful in any case.

Thanks,
Daniel


[PATCH bpf] bpf: fix pointer offsets in context for 32 bit

2018-11-30 Thread Daniel Borkmann
Currently, pointer offsets in three BPF context structures are
broken in two scenarios: i) 32 bit compiled applications running
on 64 bit kernels, and ii) LLVM compiled BPF programs running
on 32 bit kernels. The latter is due to BPF target machine being
strictly 64 bit. So in each of the cases the offsets will mismatch
in verifier when checking / rewriting context access. Fix this by
providing a helper macro __bpf_md_ptr() that will enforce padding
up to 64 bit and proper alignment, and for context access a macro
bpf_ctx_range_ptr() which will cover full 64 bit member range on
32 bit archs. For flow_keys, we additionally need to force the
size check to sizeof(__u64) as with other pointer types.

Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook")
Fixes: 4f738adba30a ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data")
Fixes: 2dbb9b9e6df6 ("bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT")
Reported-by: David S. Miller 
Signed-off-by: Daniel Borkmann 
Acked-by: David S. Miller 
---
 include/linux/filter.h |  7 +++
 include/uapi/linux/bpf.h   | 17 -
 net/core/filter.c  | 16 
 tools/include/uapi/linux/bpf.h | 17 -
 4 files changed, 39 insertions(+), 18 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 448dcc4..795ff0b 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -449,6 +449,13 @@ struct sock_reuseport;
offsetof(TYPE, MEMBER) ... offsetofend(TYPE, MEMBER) - 1
 #define bpf_ctx_range_till(TYPE, MEMBER1, MEMBER2) 
\
offsetof(TYPE, MEMBER1) ... offsetofend(TYPE, MEMBER2) - 1
+#if BITS_PER_LONG == 64
+# define bpf_ctx_range_ptr(TYPE, MEMBER)   
\
+   offsetof(TYPE, MEMBER) ... offsetofend(TYPE, MEMBER) - 1
+#else
+# define bpf_ctx_range_ptr(TYPE, MEMBER)   
\
+   offsetof(TYPE, MEMBER) ... offsetof(TYPE, MEMBER) + 8 - 1
+#endif /* BITS_PER_LONG == 64 */
 
 #define bpf_target_off(TYPE, MEMBER, SIZE, PTR_SIZE)   
\
({  
\
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 852dc17..426b5c8 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2422,6 +2422,12 @@ enum bpf_lwt_encap_mode {
BPF_LWT_ENCAP_SEG6_INLINE
 };
 
+#define __bpf_md_ptr(type, name)   \
+union {\
+   type name;  \
+   __u64 :64;  \
+} __attribute__((aligned(8)))
+
 /* user accessible mirror of in-kernel sk_buff.
  * new fields can only be added to the end of this structure
  */
@@ -2456,7 +2462,7 @@ struct __sk_buff {
/* ... here. */
 
__u32 data_meta;
-   struct bpf_flow_keys *flow_keys;
+   __bpf_md_ptr(struct bpf_flow_keys *, flow_keys);
 };
 
 struct bpf_tunnel_key {
@@ -2572,8 +2578,8 @@ enum sk_action {
  * be added to the end of this structure
  */
 struct sk_msg_md {
-   void *data;
-   void *data_end;
+   __bpf_md_ptr(void *, data);
+   __bpf_md_ptr(void *, data_end);
 
__u32 family;
__u32 remote_ip4;   /* Stored in network byte order */
@@ -2589,8 +2595,9 @@ struct sk_reuseport_md {
 * Start of directly accessible data. It begins from
 * the tcp/udp header.
 */
-   void *data;
-   void *data_end; /* End of directly accessible data */
+   __bpf_md_ptr(void *, data);
+   /* End of directly accessible data */
+   __bpf_md_ptr(void *, data_end);
/*
 * Total length of packet (starting from the tcp/udp header).
 * Note that the directly accessible bytes (data_end - data)
diff --git a/net/core/filter.c b/net/core/filter.c
index 9a1327e..6ee605d 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5435,8 +5435,8 @@ static bool bpf_skb_is_valid_access(int off, int size, 
enum bpf_access_type type
if (size != size_default)
return false;
break;
-   case bpf_ctx_range(struct __sk_buff, flow_keys):
-   if (size != sizeof(struct bpf_flow_keys *))
+   case bpf_ctx_range_ptr(struct __sk_buff, flow_keys):
+   if (size != sizeof(__u64))
return false;
break;
default:
@@ -5464,7 +5464,7 @@ static bool sk_filter_is_valid_access(int off, int size,
case bpf_ctx_range(struct __sk_buff, data):
case bpf_ctx_range(struct __sk_buff, data_meta):
case bpf_ctx_range(struct __sk_buff, data_end):
-   case bpf_ctx_range(struct __sk_buff, flow_keys):
+   case bpf_ctx_range_ptr(struct __sk_buff, flow_keys):
case bpf_ctx_range_till(struct __sk_buff, family, local_port):
return false;
}
@@ 
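
To see the effect of the __bpf_md_ptr() macro from this patch in isolation,
here is a standalone illustration (not part of the patch) that holds on both
32-bit and 64-bit targets:

#include <linux/types.h>

#define __bpf_md_ptr(type, name)	\
union {					\
	type name;			\
	__u64 :64;			\
} __attribute__((aligned(8)))

struct layout_check {
	__u32 before;
	__bpf_md_ptr(void *, ptr);	/* 8 bytes, 8-byte aligned everywhere */
};

/* 4 bytes + 4 bytes padding + 8 bytes == 16, regardless of pointer width */
_Static_assert(sizeof(struct layout_check) == 16, "stable layout");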

Re: BPF uapi structures and 32-bit

2018-11-30 Thread Daniel Borkmann
On 12/01/2018 12:33 AM, Alexei Starovoitov wrote:
> On Wed, Nov 28, 2018 at 11:02:00AM -0800, David Miller wrote:
>> From: Daniel Borkmann 
>> Date: Wed, 28 Nov 2018 11:34:55 +0100
>>
>>> Yeah fully agree. Thinking diff below should address it, do you
>>> have a chance to give this a spin for sparc / 32 bit to check if
>>> test_verifier still explodes?
>>
>> Great, let me play with this.
>>
>> I did something simpler yesterday, just changing the data pointers to
>> "u64" and that made at least one test pass that didn't before :-)
>>
>> I'll get back to you with results.
> 
> Did you have a chance to test it ?

David got back to me and mentioned it worked fine on sparc.

> I'd like to add a tested-by before I apply Daniel's patch
> which looks good to me. btw.

Yeah, wanted to cook an official patch today, let me do that now.


pull-request: bpf-next 2018-11-30

2018-11-29 Thread Daniel Borkmann
Hi David,

The following pull-request contains BPF updates for your *net-next* tree.

(Getting this out a bit earlier this time to pull in a dependency from bpf.)

The main changes are:

1) Add libbpf ABI versioning and document API naming conventions
   as well as ABI versioning process, from Andrey.

2) Add a new sk_msg_pop_data() helper for sk_msg based BPF
   programs that is used in conjunction with sk_msg_push_data()
   for adding / removing meta data to the msg data, from John.

3) Optimize convert_bpf_ld_abs() for 0 offset and fix various
   lib and testsuite build failures on 32 bit, from David.

4) Make BPF prog dump for !JIT identical to how we dump subprogs
   when JIT is in use, from Yonghong.

5) Rename btf_get_from_id() to make it more conform with libbpf
   API naming conventions, from Martin.

6) Add a missing BPF kselftest config item, from Naresh.

Please consider pulling these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git

Thanks a lot!



The following changes since commit 4afe60a97ba6ffacc4d030b13653dc64099fea26:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next (2018-11-26 13:08:17 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git 

for you to fetch changes up to b42699547fc9fb1057795bccc21a6445743a7fde:

  tools/bpf: make libbpf _GNU_SOURCE friendly (2018-11-30 02:41:02 +0100)


Alexei Starovoitov (2):
  Merge branch 'non-jit-btf-func_info'
  Merge branch 'libbpf-versioning-doc'

Andrey Ignatov (3):
  libbpf: Add version script for DSO
  libbpf: Verify versioned symbols
  libbpf: Document API and ABI conventions

Daniel Borkmann (1):
  Merge branch 'bpf-sk-msg-pop-data'

David Miller (2):
  bpf: Avoid unnecessary instruction in convert_bpf_ld_abs()
  bpf: Fix various lib and testsuite build failures on 32-bit.

John Fastabend (3):
  bpf: helper to pop data from messages
  bpf: add msg_pop_data helper to tools
  bpf: test_sockmap, add options for msg_pop_data() helper

Martin KaFai Lau (1):
  libbpf: Name changing for btf_get_from_id

Naresh Kamboju (1):
  selftests/bpf: add config fragment CONFIG_FTRACE_SYSCALLS

Yonghong Song (3):
  bpf: btf: support proper non-jit func info
  tools/bpf: change selftest test_btf for both jit and non-jit
  tools/bpf: make libbpf _GNU_SOURCE friendly

 include/linux/bpf.h |   6 +-
 include/linux/bpf_verifier.h|   1 -
 include/uapi/linux/bpf.h|  16 ++-
 kernel/bpf/core.c   |   3 +-
 kernel/bpf/syscall.c|  33 ++---
 kernel/bpf/verifier.c   |  55 +---
 net/core/filter.c   | 174 +++-
 net/ipv4/tcp_bpf.c  |  17 ++-
 net/tls/tls_sw.c|  11 +-
 tools/bpf/bpftool/map.c |   4 +-
 tools/bpf/bpftool/prog.c|   2 +-
 tools/include/uapi/linux/bpf.h  |  16 ++-
 tools/lib/bpf/Makefile  |  23 +++-
 tools/lib/bpf/README.rst| 139 +++
 tools/lib/bpf/btf.c |   4 +-
 tools/lib/bpf/btf.h |   2 +-
 tools/lib/bpf/libbpf.c  |   2 +
 tools/lib/bpf/libbpf.map| 121 
 tools/lib/bpf/libbpf_errno.c|   1 +
 tools/testing/selftests/bpf/bpf_helpers.h   |   2 +
 tools/testing/selftests/bpf/config  |   1 +
 tools/testing/selftests/bpf/test_btf.c  |  35 +
 tools/testing/selftests/bpf/test_progs.c|  10 +-
 tools/testing/selftests/bpf/test_sockmap.c  | 127 -
 tools/testing/selftests/bpf/test_sockmap_kern.h |  70 --
 25 files changed, 760 insertions(+), 115 deletions(-)
 create mode 100644 tools/lib/bpf/README.rst
 create mode 100644 tools/lib/bpf/libbpf.map


Re: [PATCH bpf-next v2] tools/bpf: make libbpf _GNU_SOURCE friendly

2018-11-29 Thread Daniel Borkmann
On 11/30/2018 12:47 AM, Jakub Kicinski wrote:
> On Thu, 29 Nov 2018 15:31:45 -0800, Yonghong Song wrote:
>> During porting libbpf to bcc, I got some warnings like below:
>>   ...
>>   [  2%] Building C object 
>> src/cc/CMakeFiles/bpf-shared.dir/libbpf/src/libbpf.c.o
>>   /home/yhs/work/bcc2/src/cc/libbpf/src/libbpf.c:12:0:
>>   warning: "_GNU_SOURCE" redefined [enabled by default]
>>#define _GNU_SOURCE
>>   ...
>>   [  3%] Building C object 
>> src/cc/CMakeFiles/bpf-shared.dir/libbpf/src/libbpf_errno.c.o
>>   /home/yhs/work/bcc2/src/cc/libbpf/src/libbpf_errno.c: In function 
>> ‘libbpf_strerror’:
>>   /home/yhs/work/bcc2/src/cc/libbpf/src/libbpf_errno.c:45:7:
>>   warning: assignment makes integer from pointer without a cast [enabled by 
>> default]
>>  ret = strerror_r(err, buf, size);
>>   ...
>>
>> bcc is built with _GNU_SOURCE defined and this caused the above warning.
>> This patch intends to make libbpf _GNU_SOURCE friendly by:
>>   . defining _GNU_SOURCE in libbpf.c if it is not already defined
>>   . undefining _GNU_SOURCE where the non-GNU version of strerror_r is expected.
>>
>> Signed-off-by: Yonghong Song 
> 
> Acked-by: Jakub Kicinski 
> 
> FWIW :)
> 

Applied, thanks.
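
The two-part pattern described above looks roughly like this (a sketch; the
exact placement is in libbpf.c and libbpf_errno.c respectively):

/* libbpf.c: define _GNU_SOURCE only if the embedding build system
 * (e.g. bcc's CMake) has not already defined it.
 */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif

/* libbpf_errno.c: force the XSI strerror_r(), which returns int,
 * by ensuring _GNU_SOURCE is unset before <string.h> is pulled in.
 */
#undef _GNU_SOURCE
#include <string.h>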


Re: [PATCH] selftests/bpf: add config fragment CONFIG_FTRACE_SYSCALLS

2018-11-28 Thread Daniel Borkmann
On 11/27/2018 04:24 PM, Naresh Kamboju wrote:
> CONFIG_FTRACE_SYSCALLS=y is required for the get_cgroup_id_user test case;
> this test reads a file from the debug trace path
> /sys/kernel/debug/tracing/events/syscalls/sys_enter_nanosleep/id
> 
> Signed-off-by: Naresh Kamboju 
> ---
>  tools/testing/selftests/bpf/config | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tools/testing/selftests/bpf/config 
> b/tools/testing/selftests/bpf/config
> index 7f90d3645af8..37f947ec44ed 100644
> --- a/tools/testing/selftests/bpf/config
> +++ b/tools/testing/selftests/bpf/config
> @@ -22,3 +22,4 @@ CONFIG_NET_CLS_FLOWER=m
>  CONFIG_LWTUNNEL=y
>  CONFIG_BPF_STREAM_PARSER=y
>  CONFIG_XDP_SOCKETS=y
> +CONFIG_FTRACE_SYSCALLS=y
> 

Applied to bpf-next, thanks!


Re: [PATCH bpf-next v2 0/3] bpf: add sk_msg helper sk_msg_pop_data

2018-11-28 Thread Daniel Borkmann
On 11/26/2018 11:16 PM, John Fastabend wrote:
> After being able to add metadata to messages with sk_msg_push_data we
> have also found it useful to be able to "pop" this metadata off before
> sending it to applications in some cases. This series adds a new helper
> sk_msg_pop_data() and the associated patches to add tests and tools/lib
> support.
> 
> Thanks!
> 
> v2: Daniel caught that we missed adding sk_msg_pop_data to the changes
> data helper so that the verifier ensures BPF programs revalidate
> data after using this helper. Also improve documentation adding a
> return description and using RST syntax per Quentin's comment. And
> delta calculations for DROP with pop'd data (albeit a strange set
> of operations for a program to be doing) had potential to be
> incorrect possibly confusing user space applications, so fix it.
> 
> John Fastabend (3):
>   bpf: helper to pop data from messages
>   bpf: add msg_pop_data helper to tools
>   bpf: test_sockmap, add options for msg_pop_data() helper usage
> 
>  include/uapi/linux/bpf.h|  13 +-
>  net/core/filter.c   | 169 
> 
>  net/ipv4/tcp_bpf.c  |  14 +-
>  tools/include/uapi/linux/bpf.h  |  13 +-
>  tools/testing/selftests/bpf/bpf_helpers.h   |   2 +
>  tools/testing/selftests/bpf/test_sockmap.c  | 127 +-
>  tools/testing/selftests/bpf/test_sockmap_kern.h |  70 --
>  7 files changed, 386 insertions(+), 22 deletions(-)
> 

Applied to bpf-next, thanks!
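
A minimal sketch of a consumer of the new helper (offsets, sizes and section
name are illustrative; assumes the series' bpf_helpers.h addition for
bpf_msg_pop_data):

#include <linux/bpf.h>
#include "bpf_helpers.h"

SEC("sk_msg")
int strip_metadata(struct sk_msg_md *msg)
{
	/* drop 8 bytes of metadata previously added with
	 * bpf_msg_push_data(); flags must currently be 0
	 */
	if (bpf_msg_pop_data(msg, 0, 8, 0))
		return SK_DROP;
	return SK_PASS;
}

char _license[] SEC("license") = "GPL";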


Re: BPF uapi structures and 32-bit

2018-11-28 Thread Daniel Borkmann
On 11/27/2018 11:25 PM, David Miller wrote:
> 
> In the linux/bpf.h UAPI header, we must absolutely avoid any
> non-fixed-sized types.
> 
> Otherwise we have serious problems on 32-bit.
> 
> Unfortunately I discovered today that we have take on two such cases,
> sk_msg_md and sk_reuseport_md, both of which start with two void
> pointers.
> 
> I hit this because test_verifier.c on my sparc64 box was built as
> a 32-bit binary and this causes a hundred or so tests to fail, and
> many if not all are because of the changing struct layout.
> 
> I could built 64-bit but this absolutely should work properly.
> 
> But for fully native 32-bit it is even worse, this will really
> need to be resolved because llvm/clang is always going to build
> the BPF programs with 64-bit void pointers.
> 
> I would strongly suggest we try and fix this now if we can.

Yeah fully agree. Thinking diff below should address it, do you
have a chance to give this a spin for sparc / 32 bit to check if
test_verifier still explodes?

Thanks a lot,
Daniel

 include/linux/filter.h |  7 +++
 include/uapi/linux/bpf.h   | 17 -
 net/core/filter.c  | 16 
 tools/include/uapi/linux/bpf.h | 17 -
 4 files changed, 39 insertions(+), 18 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 448dcc4..795ff0b 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -449,6 +449,13 @@ struct sock_reuseport;
offsetof(TYPE, MEMBER) ... offsetofend(TYPE, MEMBER) - 1
 #define bpf_ctx_range_till(TYPE, MEMBER1, MEMBER2) 
\
offsetof(TYPE, MEMBER1) ... offsetofend(TYPE, MEMBER2) - 1
+#if BITS_PER_LONG == 64
+# define bpf_ctx_range_ptr(TYPE, MEMBER)   
\
+   offsetof(TYPE, MEMBER) ... offsetofend(TYPE, MEMBER) - 1
+#else
+# define bpf_ctx_range_ptr(TYPE, MEMBER)   
\
+   offsetof(TYPE, MEMBER) ... offsetof(TYPE, MEMBER) + 8 - 1
+#endif /* BITS_PER_LONG == 64 */

 #define bpf_target_off(TYPE, MEMBER, SIZE, PTR_SIZE)   
\
({  
\
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 852dc17..426b5c8 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2422,6 +2422,12 @@ enum bpf_lwt_encap_mode {
BPF_LWT_ENCAP_SEG6_INLINE
 };

+#define __bpf_md_ptr(type, name)   \
+union {\
+   type name;  \
+   __u64 :64;  \
+} __attribute__((aligned(8)))
+
 /* user accessible mirror of in-kernel sk_buff.
  * new fields can only be added to the end of this structure
  */
@@ -2456,7 +2462,7 @@ struct __sk_buff {
/* ... here. */

__u32 data_meta;
-   struct bpf_flow_keys *flow_keys;
+   __bpf_md_ptr(struct bpf_flow_keys *, flow_keys);
 };

 struct bpf_tunnel_key {
@@ -2572,8 +2578,8 @@ enum sk_action {
  * be added to the end of this structure
  */
 struct sk_msg_md {
-   void *data;
-   void *data_end;
+   __bpf_md_ptr(void *, data);
+   __bpf_md_ptr(void *, data_end);

__u32 family;
__u32 remote_ip4;   /* Stored in network byte order */
@@ -2589,8 +2595,9 @@ struct sk_reuseport_md {
 * Start of directly accessible data. It begins from
 * the tcp/udp header.
 */
-   void *data;
-   void *data_end; /* End of directly accessible data */
+   __bpf_md_ptr(void *, data);
+   /* End of directly accessible data */
+   __bpf_md_ptr(void *, data_end);
/*
 * Total length of packet (starting from the tcp/udp header).
 * Note that the directly accessible bytes (data_end - data)
diff --git a/net/core/filter.c b/net/core/filter.c
index 9a1327e..6ee605d 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5435,8 +5435,8 @@ static bool bpf_skb_is_valid_access(int off, int size, 
enum bpf_access_type type
if (size != size_default)
return false;
break;
-   case bpf_ctx_range(struct __sk_buff, flow_keys):
-   if (size != sizeof(struct bpf_flow_keys *))
+   case bpf_ctx_range_ptr(struct __sk_buff, flow_keys):
+   if (size != sizeof(__u64))
return false;
break;
default:
@@ -5464,7 +5464,7 @@ static bool sk_filter_is_valid_access(int off, int size,
case bpf_ctx_range(struct __sk_buff, data):
case bpf_ctx_range(struct __sk_buff, data_meta):
case bpf_ctx_range(struct __sk_buff, data_end):
-   case bpf_ctx_range(struct __sk_buff, flow_keys):
+   case bpf_ctx_range_ptr(struct __sk_buff, flow_keys):
case bpf_ctx_range_till(struct __sk_buff, family, local_port):
return false;
}
@@ -5489,7 +5489,7 @@ static bool 

Re: [PATCH bpf 2/2] bpf: test_verifier, test for lookup on queue/stack maps

2018-11-28 Thread Daniel Borkmann
On 11/28/2018 08:51 AM, Prashant Bhole wrote:
> This patch adds tests to check whether bpf verifier prevents lookup
> on queue/stack maps
> 
> Signed-off-by: Prashant Bhole 
> ---
>  tools/testing/selftests/bpf/test_verifier.c | 52 +
>  1 file changed, 52 insertions(+)
> 
> diff --git a/tools/testing/selftests/bpf/test_verifier.c 
> b/tools/testing/selftests/bpf/test_verifier.c
> index 550b7e46bf4a..becd9f4f3980 100644
> --- a/tools/testing/selftests/bpf/test_verifier.c
> +++ b/tools/testing/selftests/bpf/test_verifier.c
> @@ -74,6 +74,8 @@ struct bpf_test {
>   int fixup_map_in_map[MAX_FIXUPS];
>   int fixup_cgroup_storage[MAX_FIXUPS];
>   int fixup_percpu_cgroup_storage[MAX_FIXUPS];
> + int fixup_map_queue[MAX_FIXUPS];
> + int fixup_map_stack[MAX_FIXUPS];
>   const char *errstr;
>   const char *errstr_unpriv;
>   uint32_t retval, retval_unpriv;
> @@ -4611,6 +4613,38 @@ static struct bpf_test tests[] = {
>   .errstr = "cannot pass map_type 7 into func 
> bpf_map_lookup_elem",
>   .prog_type = BPF_PROG_TYPE_PERF_EVENT,
>   },
> + {
> + "prevent map lookup in queue map",
> + .insns = {
> + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
> + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
> + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
> + BPF_LD_MAP_FD(BPF_REG_1, 0),
> + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
> +  BPF_FUNC_map_lookup_elem),
> + BPF_EXIT_INSN(),
> + },
> + .fixup_map_queue = { 3 },
> + .result = REJECT,
> + .errstr = "invalid stack type R2 off=-8 access_size=0",
> + .prog_type = BPF_PROG_TYPE_XDP,

Hmm, the approach in patch 1 is very fragile, and we're lucky in this case
that the verifier bailed out with 'invalid stack type R2 off=-8 access_size=0'
because of the key size being zero. Had this not been the case, the added
ERR_PTR(-EOPNOTSUPP) would basically be seen as a valid pointer and the
program could read/write into it. Instead, this needs to be prevented much
earlier, e.g. in check_map_func_compatibility(), and I would like to have a
split on these approaches to make the verifier more robust. While you want
ERR_PTR(-EOPNOTSUPP) for the user space syscall side, the BPF prog should
only ever see (if anything) a NULL here, because this is what the verifier
matches later on to set the map value_or_null pointer to a map value pointer.

> + },
> + {
> + "prevent map lookup in stack map",
> + .insns = {
> + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
> + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
> + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
> + BPF_LD_MAP_FD(BPF_REG_1, 0),
> + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
> +  BPF_FUNC_map_lookup_elem),
> + BPF_EXIT_INSN(),
> + },
> + .fixup_map_stack = { 3 },
> + .result = REJECT,
> + .errstr = "invalid stack type R2 off=-8 access_size=0",
> + .prog_type = BPF_PROG_TYPE_XDP,
> + },
>   {
>   "prevent map lookup in prog array",
>   .insns = {
> @@ -14048,6 +14082,8 @@ static void do_test_fixup(struct bpf_test *test, enum 
> bpf_map_type prog_type,
>   int *fixup_map_sockhash = test->fixup_map_sockhash;
>   int *fixup_map_xskmap = test->fixup_map_xskmap;
>   int *fixup_map_stacktrace = test->fixup_map_stacktrace;
> + int *fixup_map_queue = test->fixup_map_queue;
> + int *fixup_map_stack = test->fixup_map_stack;
>   int *fixup_prog1 = test->fixup_prog1;
>   int *fixup_prog2 = test->fixup_prog2;
>   int *fixup_map_in_map = test->fixup_map_in_map;
> @@ -14168,6 +14204,22 @@ static void do_test_fixup(struct bpf_test *test, 
> enum bpf_map_type prog_type,
>   fixup_map_stacktrace++;
>   } while (fixup_map_stacktrace);
>   }
> + if (*fixup_map_queue) {
> + map_fds[13] = create_map(BPF_MAP_TYPE_QUEUE, 0,
> +  sizeof(u32), 1);
> + do {
> + prog[*fixup_map_queue].imm = map_fds[13];
> + fixup_map_queue++;
> + } while (*fixup_map_queue);
> + }
> + if (*fixup_map_stack) {
> + map_fds[14] = create_map(BPF_MAP_TYPE_STACK, 0,
> +  sizeof(u32), 1);
> + do {
> + prog[*fixup_map_stack].imm = map_fds[14];
> + fixup_map_stack++;
> + } while (*fixup_map_stack);
> + }
>  }
>  
>  static int set_admin(bool admin)
> 
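
For reference, the kind of early rejection asked for above would sit in the
verifier's map/helper compatibility check, roughly as follows (a simplified
sketch; the eventual fix may differ):

/* Inside check_map_func_compatibility() in kernel/bpf/verifier.c:
 * only the queue/stack helpers are allowed on these map types, so a
 * program can never obtain a lookup pointer from them.
 */
case BPF_MAP_TYPE_QUEUE:
case BPF_MAP_TYPE_STACK:
	if (func_id != BPF_FUNC_map_peek_elem &&
	    func_id != BPF_FUNC_map_pop_elem &&
	    func_id != BPF_FUNC_map_push_elem)
		goto error;
	break;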



Re: [PATCH bpf-next v2 0/4] libbpf: ABI versioning and documentation

2018-11-27 Thread Daniel Borkmann
On 11/27/2018 04:06 AM, Alexei Starovoitov wrote:
> On Fri, Nov 23, 2018 at 04:44:31PM -0800, Andrey Ignatov wrote:
>> This patch set adds ABI versioning and documentation to libbpf.
>>
>> Patch 1 renames btf_get_from_id to btf__get_from_id to follow naming
>> convention.
>> Patch 2 adds version script and has more details on ABI versioning.
>> Patch 3 adds simple check that all global symbols are versioned.
>> Patch 4 documents a few aspects of libbpf API and ABI in dev process.
>>
>> v1->v2:
>> * add patch from Martin KaFai Lau  to rename btf_get_from_id;
>> * add documentation for libbpf API and ABI.
> 
> All looks great to me.
> Thank you for adding the doc.
> Applied to bpf-next.
> 
> We need to discuss the release model and version bumps.
> For example I don't think it's necessary to bump the version
> and update libbpf.map every time the new function is added.
> I mean we can synchronize release of libbpf with the release of the kernel
> or release it every N weeks.

+1, synchronizing with release of the kernel seems a natural fit
given it's part of the kernel tree. Every N weeks might not even
work since changes are not in Linus' tree at that point; I think
this would probably just lead to confusion.

> So if we add new api functions during this release we can simply
> add them to libbpf.map while keeping the version as 0.0.1
> 
> I'd also consider the first 0.0.1 release to be experimental
> while we're figuring out the right process.
> For the next kernel/libbpf release I propose to bump it to 1.0.0
> 
> Another idea is to version it just like kernel and make libbpf version
> always equal kernel version.
> But I think that would be an overkill. libbpf isn't tightly coupled to
> the kernel. For example, we just merged a patch (prog_name/map_name
> probing) that allows new libbpf to work with an older kernel.

Right, though we might end up bumping version each kernel release
anyway. This would allow for more automation on that part if we
could simply say that version looks like libbpf.so.4.20 or
libbpf.so.4.21, etc, without us forgetting to send an extra patch
every release just to bump version. Whether it's version -0.0.1,
-1.0.0, -4.20 or say -2018.11.27 is probably mostly a matter of
preference but for the former two we would have to additionally
establish some convention / process on when to bump which part
of the three version component. I'd rather prefer if we could
have kernel-like convention where a bump in major version is "just"
yet another release but has no other special meaning attached to
it in which case we could just adapt the same numbering scheme.

Thanks,
Daniel
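
For readers following along, the libbpf.map added in the series is a GNU ld
version script along these lines (the version node matches the discussion;
the symbol selection here is illustrative):

# Exported symbols are tied to an ABI version node; everything else
# stays local to the DSO.
LIBBPF_0.0.1 {
	global:
		bpf_object__open;
		bpf_object__load;
		bpf_program__fd;
	local:
		*;
};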


pull-request: bpf 2018-11-27

2018-11-27 Thread Daniel Borkmann
Hi David,

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix several bugs in BPF sparc JIT, that is, convergence for fused
   branches, initialization of frame pointer register, and moving all
   arguments into output registers from input registers in prologue
   to fix BPF to BPF calls, from David.

2) Fix a bug in arm64 JIT for fetching BPF to BPF call addresses where
   they are not guaranteed to fit into imm field and therefore must be
   retrieved through prog aux data, from Daniel.

3) Explicitly add all JITs to MAINTAINERS file with developers able to
   help out in feature development, fixes, review, etc.

Please consider pulling these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Thanks a lot!



The following changes since commit 69500127424cd90ff2cf8191256b2ac3b0a4af56:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf (2018-11-25 20:04:58 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git 

for you to fetch changes up to 2b9034b5eaddc09c0e9529b93446eb975f97f814:

  sparc: Adjust bpf JIT prologue for PSEUDO calls. (2018-11-27 09:46:52 +0100)


Alexei Starovoitov (1):
  Merge branch 'arm64-jit-fixes'

Daniel Borkmann (3):
  bpf, ppc64: generalize fetching subprog into bpf_jit_get_func_addr
  bpf, arm64: fix getting subprog addr from aux for calls
  bpf, doc: add entries of who looks over which jits

David Miller (3):
  sparc: Fix JIT fused branch convergance.
  sparc: Correct ctx->saw_frame_pointer logic.
  sparc: Adjust bpf JIT prologue for PSEUDO calls.

 MAINTAINERS   | 63 -
 arch/arm64/net/bpf_jit_comp.c | 26 +++
 arch/powerpc/net/bpf_jit_comp64.c | 57 +++
 arch/sparc/net/bpf_jit_comp_64.c  | 97 +++
 include/linux/filter.h|  4 ++
 kernel/bpf/core.c | 34 ++
 6 files changed, 223 insertions(+), 58 deletions(-)


Re: [PATCH bpf] sparc: Adjust bpf JIT prologue for PSEUDO calls.

2018-11-27 Thread Daniel Borkmann
On 11/27/2018 06:55 AM, David Miller wrote:
> 
> Move all arguments into output registers from input registers.
> 
> This path is exercised by test_verifier.c's "calls: two calls with
> args" test.  Adjust BPF_TAILCALL_PROLOGUE_SKIP as needed.
> 
> Let's also make the prologue length a constant size regardless of
> the combination of ->saw_frame_pointer and ->saw_tail_call
> settings.
> 
> Signed-off-by: David S. Miller 

Applied to bpf, thanks David!


[PATCH bpf] bpf, doc: add entries of who looks over which jits

2018-11-26 Thread Daniel Borkmann
Make the high-level BPF JIT entry a general 'catch-all' and add
architecture specific entries to make it more clear who actively
maintains which BPF JIT compiler. The list (L) address implies
that this eventually lands in the bpf patchwork bucket. Goal is
that this set of responsible developers listed here is always up
to date and a point of contact for helping out in e.g. feature
development, fixes, review or testing patches in order to help
long-term in ensuring quality of the BPF JITs and therefore BPF
core under a given architecture. Every new JIT in future /must/
have an entry here as well.

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
Acked-by: Naveen N. Rao 
Acked-by: Sandipan Das 
Acked-by: Martin Schwidefsky 
Acked-by: Heiko Carstens 
Acked-by: David S. Miller 
Acked-by: Zi Shen Lim 
Acked-by: Paul Burton 
Acked-by: Jakub Kicinski 
Acked-by: Wang YanQing 
---
 MAINTAINERS | 63 -
 1 file changed, 62 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 03c46f4..bfaa411 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2801,7 +2801,7 @@ T:git 
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git
 Q: https://patchwork.ozlabs.org/project/netdev/list/?delegate=77147
 S: Supported
-F: arch/x86/net/bpf_jit*
+F: arch/*/net/*
 F: Documentation/networking/filter.txt
 F: Documentation/bpf/
 F: include/linux/bpf*
@@ -2821,6 +2821,67 @@ F:   tools/bpf/
 F: tools/lib/bpf/
 F: tools/testing/selftests/bpf/
 
+BPF JIT for ARM
+M: Shubham Bansal 
+L: netdev@vger.kernel.org
+S: Maintained
+F: arch/arm/net/
+
+BPF JIT for ARM64
+M: Daniel Borkmann 
+M: Alexei Starovoitov 
+M: Zi Shen Lim 
+L: netdev@vger.kernel.org
+S: Supported
+F: arch/arm64/net/
+
+BPF JIT for MIPS (32-BIT AND 64-BIT)
+M: Paul Burton 
+L: netdev@vger.kernel.org
+S: Maintained
+F: arch/mips/net/
+
+BPF JIT for NFP NICs
+M: Jakub Kicinski 
+L: netdev@vger.kernel.org
+S: Supported
+F: drivers/net/ethernet/netronome/nfp/bpf/
+
+BPF JIT for POWERPC (32-BIT AND 64-BIT)
+M: Naveen N. Rao 
+M: Sandipan Das 
+L: netdev@vger.kernel.org
+S: Maintained
+F: arch/powerpc/net/
+
+BPF JIT for S390
+M: Martin Schwidefsky 
+M: Heiko Carstens 
+L: netdev@vger.kernel.org
+S: Maintained
+F: arch/s390/net/
+X: arch/s390/net/pnet.c
+
+BPF JIT for SPARC (32-BIT AND 64-BIT)
+M: David S. Miller 
+L: netdev@vger.kernel.org
+S: Maintained
+F: arch/sparc/net/
+
+BPF JIT for X86 32-BIT
+M: Wang YanQing 
+L: netdev@vger.kernel.org
+S: Maintained
+F: arch/x86/net/bpf_jit_comp32.c
+
+BPF JIT for X86 64-BIT
+M: Alexei Starovoitov 
+M: Daniel Borkmann 
+L: netdev@vger.kernel.org
+S: Supported
+F: arch/x86/net/
+X: arch/x86/net/bpf_jit_comp32.c
+
 BROADCOM B44 10/100 ETHERNET DRIVER
 M: Michael Chan 
 L: netdev@vger.kernel.org
-- 
2.9.5



Re: [PATCH bpf-next] bpf: Avoid unnecessary instruction in convert_bpf_ld_abs()

2018-11-26 Thread Daniel Borkmann
On 11/26/2018 10:42 PM, David Miller wrote:
> 
> 'offset' is constant and if it is zero, no need to subtract it
> from BPF_REG_TMP.
> 
> Signed-off-by: David S. Miller 

Applied to bpf-next, thanks!
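
The change amounts to guarding the emitted subtraction (a sketch of the
relevant spot in convert_bpf_ld_abs(); the surrounding code is elided):

/* Only emit the fixup when there is actually something to subtract. */
if (offset)
	*insn++ = BPF_ALU64_IMM(BPF_SUB, BPF_REG_TMP, offset);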


Re: [PATCH v3 3/4] libbpf: add bpf_prog_test_run_xattr

2018-11-26 Thread Daniel Borkmann
On 11/26/2018 01:45 PM, Lorenz Bauer wrote:
> On Sat, 24 Nov 2018 at 22:20, Alexei Starovoitov
>  wrote:
>> On Fri, Nov 23, 2018 at 11:25:11PM +0100, Daniel Borkmann wrote:
>>> On 11/22/2018 03:09 PM, Lorenz Bauer wrote:
[...]
>>>>  LIBBPF_API int bpf_prog_test_run(int prog_fd, int repeat, void *data,
>>>>  __u32 size, void *data_out, __u32 *size_out,
>>>>  __u32 *retval, __u32 *duration);
>>>
>>> Could we add a comment into the header here stating that we discourage
>>> bpf_prog_test_run()'s use?
>>>
>>> It would probably also make sense since we go that route that we would
>>> convert the 10 bpf_prog_test_run() instances under test_progs.c at the
>>> same time so that people extending or looking at BPF kselftests don't
>>> copy discouraged bpf_prog_test_run() api as examples from this point
>>> onwards anymore.
>>
>> I would keep bpf_prog_test_run() and test_progs.c as-is.
>> I think the prog_test_run in the current form is actually fine.
>> I don't find it 'unsafe'.
>> When it's called on a given bpf prog the user knows what the prog is
>> supposed to do. If the prog is increasing the packet size the test
>> writer knows that and should size the output buffer accordingly.
> 
> I guess this is a matter of perspective. I prefer an interface that
> gives me back an error message, rather than corrupting my stack / heap,
> when the assumptions change. It's also nicer to use from "managed"
> languages like Go where users aren't as accustomed to issues like these.
> 
> Do you object to me adding the disclaimer to the header as Daniel suggested?

Agree that prog_test_run() in the current form is actually fine, I was mostly
thinking that it may be non-obvious to users extending the tests or writing
their own test framework and checking how BPF kselftests does it (since it's
kind of a prime example), so they might end up using the wrong API and run
into mentioned issues w/o realizing. At min some comment with more context
would be needed.

>> Also there are plenty of tests where progs don't change the packet size
>> and any test with 'repeat > 1' should have the packet size
>> return to initial value. Like if the test is doing ipip encap
>> at the end of the run the bpf prog should remove that encap.
>> Otherwise 'repeat > 1' will produce wrong results.
> 
> Yup. Another thorny part of the test interface, which I'd like to improve! :)
> Don't know how to do it yet though.

Yep, somehow here we would need to restore original input packet layout/data
so that the test program always runs on the user defined one.


pull-request: bpf-next 2018-11-26

2018-11-26 Thread Daniel Borkmann
Hi David,

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Extend BTF to support function call types and improve the BPF
   symbol handling with this info for kallsyms and bpftool program
   dump to make debugging easier, from Martin and Yonghong.

2) Optimize LPM lookups by making longest_prefix_match() handle
   multiple bytes at a time, from Eric.

3) Adds support for loading and attaching flow dissector BPF progs
   from bpftool, from Stanislav.

4) Extend the sk_lookup() helper to be supported from XDP, from Nitin.

5) Enable verifier to support narrow context loads with offset > 0
   to adapt to LLVM code generation (previously only an offset of 0
   was supported). Add test cases as well, from Andrey.

6) Simplify passing device functions for offloaded BPF progs by
   adding callbacks to bpf_prog_offload_ops instead of ndo_bpf.
   Also convert nfp and netdevsim to make use of them, from Quentin.

7) Add support for sock_ops based BPF programs to send events to
   the perf ring-buffer through perf_event_output helper, from
   Sowmini and Daniel.

8) Add read / write support for skb->tstamp from tc BPF and cg BPF
   programs to allow for supporting rate-limiting in EDT qdiscs
   like fq from BPF side, from Vlad.

9) Extend libbpf API to support map in map types and add test cases
   for it as well to BPF kselftests, from Nikita.

10) Account the maximum packet offset accessed by a BPF program in
the verifier and use it for optimizing nfp JIT, from Jiong.

11) Fix error handling regarding kprobe_events in BPF sample loader,
from Daniel T.

12) Add support for queue and stack map type in bpftool, from David.

Please consider pulling these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git

Note, there is a tiny merge conflict in BPF's kselftest Makefile.
Resolution is to take both chunks, like:

  [...]
  test_sk_lookup_kern.o test_xdp_vlan.o test_queue_map.o test_stack_map.o \
  xdp_dummy.o test_map_in_map.o

Thanks a lot!



The following changes since commit f601a85bd7883708f48911d8c88e69fe5bde2b4d:

  net: hns3: Remove set but not used variable 'reset_level' (2018-11-07 11:46:37 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git 

for you to fetch changes up to ffac28f95a98a87db0850801cd98771a08bb1dec:

  bpf: align map type names formatting. (2018-11-26 01:31:16 +0100)


Alexei Starovoitov (5):
  Merge branch 'device-ops-as-cb'
  Merge branch 'bpftool-flow-dissector'
  Merge branch 'narrow-loads'
  Merge branch 'socket-lookup-cg_sock'
  Merge branch 'btf-func-info'

Andrey Ignatov (6):
  bpf: Allow narrow loads with offset > 0
  selftests/bpf: Test narrow loads with off > 0 in test_verifier
  selftests/bpf: Test narrow loads with off > 0 for bpf_sock_addr
  bpf: Fix IPv6 dport byte order in bpf_sk_lookup_udp
  bpf: Support socket lookup in CGROUP_SOCK_ADDR progs
  selftest/bpf: Use bpf_sk_lookup_{tcp, udp} in test_sock_addr

Colin Ian King (3):
  bpf: fix null pointer dereference on pointer offload
  tools/bpf: fix spelling mistake "memeory" -> "memory"
  bpf: btf: fix spelling mistake "Memmber" -> "Member"

Daniel Borkmann (3):
  Merge branch 'bpf-max-pkt-offset'
  Merge branch 'bpf-zero-hash-seed'
  Merge branch 'bpf-libbpf-mapinmap'

Daniel T. Lee (1):
  samples: bpf: fix: error handling regarding kprobe_events

David Ahern (1):
  bpftool: Improve handling of ENOENT on map dumps

David Calavera (2):
  bpf: Add BPF_MAP_TYPE_QUEUE and BPF_MAP_TYPE_STACK to bpftool-map
  bpf: align map type names formatting.

Eric Dumazet (1):
  bpf, lpm: make longest_prefix_match() faster

Jiong Wang (2):
  bpf: let verifier to calculate and record max_pkt_offset
  nfp: bpf: relax prog rejection through max_pkt_offset

Joe Stringer (1):
  selftests/bpf: Fix uninitialized duration warning

Lorenz Bauer (4):
  bpf: allow zero-initializing hash map seed
  bpf: move BPF_F_QUERY_EFFECTIVE after map flags
  tools: sync linux/bpf.h
  tools: add selftest for BPF_F_ZERO_SEED

Martin KaFai Lau (5):
  bpf: libbpf: Fix bpf_program__next() API
  bpf: btf: Break up btf_type_is_void()
  bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO
  tools/bpf: Sync kernel btf.h header
  tools/bpf: Add tests for BTF_KIND_FUNC_PROTO and BTF_KIND_FUNC

Nathan Chancellor (1):
  bpf: Remove unused variable in nsim_bpf

Nikita V. Shirokov (3):
  bpf: adding support for map in map in libbpf
  bpf: adding tests for map_in_map helpber in libbpf
  libbpf: make bpf_object__open default to UNSPEC

Nitin Hande (1):
  bpf: Extend the sk_lookup() helper to XDP

[PATCH bpf 1/2] bpf, ppc64: generalize fetching subprog into bpf_jit_get_func_addr

2018-11-26 Thread Daniel Borkmann
Make fetching of the BPF call address from ppc64 JIT generic. ppc64
was using a slightly different variant rather than through the insns'
imm field encoding as the target address would not fit into that space.
Therefore, the target subprog number was encoded into the insns' offset
and fetched through fp->aux->func[off]->bpf_func instead. Given there
are other JITs with this issue and the mechanism of fetching the address
is JIT-generic, move it into the core as a helper instead. On the JIT
side, we get information on whether the retrieved address is a fixed
one, that is, not changing through JIT passes, or a dynamic one. For
the former, JITs can optimize their imm emission because this doesn't
change jump offsets throughout JIT process.

Signed-off-by: Daniel Borkmann 
Reviewed-by: Sandipan Das 
Tested-by: Sandipan Das 
---
 arch/powerpc/net/bpf_jit_comp64.c | 57 ++-
 include/linux/filter.h|  4 +++
 kernel/bpf/core.c | 34 +++
 3 files changed, 76 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
b/arch/powerpc/net/bpf_jit_comp64.c
index 50b1297..17482f5 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -166,7 +166,33 @@ static void bpf_jit_build_epilogue(u32 *image, struct 
codegen_context *ctx)
PPC_BLR();
 }
 
-static void bpf_jit_emit_func_call(u32 *image, struct codegen_context *ctx, 
u64 func)
+static void bpf_jit_emit_func_call_hlp(u32 *image, struct codegen_context *ctx,
+  u64 func)
+{
+#ifdef PPC64_ELF_ABI_v1
+   /* func points to the function descriptor */
+   PPC_LI64(b2p[TMP_REG_2], func);
+   /* Load actual entry point from function descriptor */
+   PPC_BPF_LL(b2p[TMP_REG_1], b2p[TMP_REG_2], 0);
+   /* ... and move it to LR */
+   PPC_MTLR(b2p[TMP_REG_1]);
+   /*
+* Load TOC from function descriptor at offset 8.
+* We can clobber r2 since we get called through a
+* function pointer (so caller will save/restore r2)
+* and since we don't use a TOC ourself.
+*/
+   PPC_BPF_LL(2, b2p[TMP_REG_2], 8);
+#else
+   /* We can clobber r12 */
+   PPC_FUNC_ADDR(12, func);
+   PPC_MTLR(12);
+#endif
+   PPC_BLRL();
+}
+
+static void bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx,
+  u64 func)
 {
unsigned int i, ctx_idx = ctx->idx;
 
@@ -273,7 +299,7 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 
*image,
 {
const struct bpf_insn *insn = fp->insnsi;
int flen = fp->len;
-   int i;
+   int i, ret;
 
/* Start of epilogue code - will only be valid 2nd pass onwards */
u32 exit_addr = addrs[flen];
@@ -284,8 +310,9 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 
*image,
u32 src_reg = b2p[insn[i].src_reg];
s16 off = insn[i].off;
s32 imm = insn[i].imm;
+   bool func_addr_fixed;
+   u64 func_addr;
u64 imm64;
-   u8 *func;
u32 true_cond;
u32 tmp_idx;
 
@@ -711,23 +738,15 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 
*image,
case BPF_JMP | BPF_CALL:
ctx->seen |= SEEN_FUNC;
 
-   /* bpf function call */
-   if (insn[i].src_reg == BPF_PSEUDO_CALL)
-   if (!extra_pass)
-   func = NULL;
-   else if (fp->aux->func && off < 
fp->aux->func_cnt)
-   /* use the subprog id from the off
-* field to lookup the callee address
-*/
-   func = (u8 *) 
fp->aux->func[off]->bpf_func;
-   else
-   return -EINVAL;
-   /* kernel helper call */
-   else
-   func = (u8 *) __bpf_call_base + imm;
-
-   bpf_jit_emit_func_call(image, ctx, (u64)func);
+   ret = bpf_jit_get_func_addr(fp, [i], extra_pass,
+   _addr, 
_addr_fixed);
+   if (ret < 0)
+   return ret;
 
+   if (func_addr_fixed)
+   bpf_jit_emit_func_call_hlp(image, ctx, 
func_addr);
+   else
+   bpf_jit_emit_func_call_rel(image, ctx, 
func_addr);
/* move return value from r3 to BPF_REG_0 */
PPC_MR(b2p[BPF_REG_0], 3);
break;
diff --git a/inclu

[PATCH bpf 2/2] bpf, arm64: fix getting subprog addr from aux for calls

2018-11-26 Thread Daniel Borkmann
The arm64 JIT has the same issue as ppc64 JIT in that the relative BPF
to BPF call offset can be too far away from core kernel in that relative
encoding into imm is not sufficient and could potentially be truncated,
see also fd045f6cd98e ("arm64: add support for module PLTs") which adds
spill-over space for module_alloc() and therefore bpf_jit_binary_alloc().
Therefore, use the recently added bpf_jit_get_func_addr() helper for
properly fetching the address through prog->aux->func[off]->bpf_func
instead. This also has the benefit to optimize normal helper calls since
their address can use the optimized emission. Tested on Cavium ThunderX
CN8890.

Fixes: db496944fdaa ("bpf: arm64: add JIT support for multi-function programs")
Signed-off-by: Daniel Borkmann 
---
 arch/arm64/net/bpf_jit_comp.c | 26 +-
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index a6fdaea..89198017 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -351,7 +351,8 @@ static void build_epilogue(struct jit_ctx *ctx)
  * >0 - successfully JITed a 16-byte eBPF instruction.
  * <0 - failed to JIT.
  */
-static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
+static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
+ bool extra_pass)
 {
const u8 code = insn->code;
const u8 dst = bpf2a64[insn->dst_reg];
@@ -625,12 +626,19 @@ static int build_insn(const struct bpf_insn *insn, struct 
jit_ctx *ctx)
case BPF_JMP | BPF_CALL:
{
const u8 r0 = bpf2a64[BPF_REG_0];
-   const u64 func = (u64)__bpf_call_base + imm;
+   bool func_addr_fixed;
+   u64 func_addr;
+   int ret;
 
-   if (ctx->prog->is_func)
-   emit_addr_mov_i64(tmp, func, ctx);
+   ret = bpf_jit_get_func_addr(ctx->prog, insn, extra_pass,
+   _addr, _addr_fixed);
+   if (ret < 0)
+   return ret;
+   if (func_addr_fixed)
+   /* We can use optimized emission here. */
+   emit_a64_mov_i64(tmp, func_addr, ctx);
else
-   emit_a64_mov_i64(tmp, func, ctx);
+   emit_addr_mov_i64(tmp, func_addr, ctx);
emit(A64_BLR(tmp), ctx);
emit(A64_MOV(1, r0, A64_R(0)), ctx);
break;
@@ -753,7 +761,7 @@ static int build_insn(const struct bpf_insn *insn, struct 
jit_ctx *ctx)
return 0;
 }
 
-static int build_body(struct jit_ctx *ctx)
+static int build_body(struct jit_ctx *ctx, bool extra_pass)
 {
const struct bpf_prog *prog = ctx->prog;
int i;
@@ -762,7 +770,7 @@ static int build_body(struct jit_ctx *ctx)
const struct bpf_insn *insn = >insnsi[i];
int ret;
 
-   ret = build_insn(insn, ctx);
+   ret = build_insn(insn, ctx, extra_pass);
if (ret > 0) {
i++;
if (ctx->image == NULL)
@@ -858,7 +866,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
/* 1. Initial fake pass to compute ctx->idx. */
 
/* Fake pass to fill in ctx->offset. */
-   if (build_body()) {
+   if (build_body(, extra_pass)) {
prog = orig_prog;
goto out_off;
}
@@ -888,7 +896,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 
build_prologue(, was_classic);
 
-   if (build_body()) {
+   if (build_body(, extra_pass)) {
bpf_jit_binary_free(header);
prog = orig_prog;
goto out_off;
-- 
2.9.5



[PATCH bpf 0/2] Fix for arm64 jit

2018-11-26 Thread Daniel Borkmann
This set contains a fix for arm64 BPF JIT. First patch generalizes
ppc64 way of retrieving subprog into bpf_jit_get_func_addr() as core
code and uses the same on arm64 in second patch. Tested on both arm64
and ppc64.

Thanks!

Daniel Borkmann (2):
  bpf, ppc64: generalize fetching subprog into bpf_jit_get_func_addr
  bpf, arm64: fix getting subprog addr from aux for calls

 arch/arm64/net/bpf_jit_comp.c | 26 +++---
 arch/powerpc/net/bpf_jit_comp64.c | 57 ++-
 include/linux/filter.h|  4 +++
 kernel/bpf/core.c | 34 +++
 4 files changed, 93 insertions(+), 28 deletions(-)

-- 
2.9.5



Re: [PATCH bpf-next 1/3] bpf: helper to pop data from messages

2018-11-25 Thread Daniel Borkmann
On 11/23/2018 02:38 AM, John Fastabend wrote:
> This adds a BPF SK_MSG program helper so that we can pop data from a
> msg. We use this to pop metadata from a previous push data call.
> 
> Signed-off-by: John Fastabend 
> ---
>  include/uapi/linux/bpf.h |  13 +++-
>  net/core/filter.c| 169 
> +++
>  net/ipv4/tcp_bpf.c   |  14 +++-
>  3 files changed, 192 insertions(+), 4 deletions(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index c1554aa..64681f8 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2268,6 +2268,16 @@ union bpf_attr {
>   *
>   *   Return
>   *   0 on success, or a negative error in case of failure.
> + *
> + * int bpf_msg_pop_data(struct sk_msg_buff *msg, u32 start, u32 pop, u64 flags)
> + *Description
> + *   Will remove 'pop' bytes from a msg starting at byte 'start'.
> + *   This can result in ENOMEM errors under certain situations where
> + *   an allocation and copy are required due to a full ring buffer.
> + *   However, the helper will try to avoid doing the allocation
> + *   if possible. Other errors can occur if input parameters are
> + *   invalid, either due to the start byte not being a valid part of
> + *   the msg payload and/or the pop value being too large.
>   */
>  #define __BPF_FUNC_MAPPER(FN)\
>   FN(unspec), \
> @@ -2360,7 +2370,8 @@ union bpf_attr {
>   FN(map_push_elem),  \
>   FN(map_pop_elem),   \
>   FN(map_peek_elem),  \
> - FN(msg_push_data),
> + FN(msg_push_data),  \
> + FN(msg_pop_data),
>  
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>   * function eBPF program intends to call
> diff --git a/net/core/filter.c b/net/core/filter.c
> index f6ca38a..c6b35b5 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -2428,6 +2428,173 @@ static const struct bpf_func_proto 
> bpf_msg_push_data_proto = {
>   .arg4_type  = ARG_ANYTHING,
>  };
>  
> +static void sk_msg_shift_left(struct sk_msg *msg, int i)
> +{
> + int prev;
> +
> + do {
> + prev = i;
> + sk_msg_iter_var_next(i);
> + msg->sg.data[prev] = msg->sg.data[i];
> + } while (i != msg->sg.end);
> +
> + sk_msg_iter_prev(msg, end);
> +}
> +
> +static void sk_msg_shift_right(struct sk_msg *msg, int i)
> +{
> + struct scatterlist tmp, sge;
> +
> + sk_msg_iter_next(msg, end);
> + sge = sk_msg_elem_cpy(msg, i);
> + sk_msg_iter_var_next(i);
> + tmp = sk_msg_elem_cpy(msg, i);
> +
> + while (i != msg->sg.end) {
> + msg->sg.data[i] = sge;
> + sk_msg_iter_var_next(i);
> + sge = tmp;
> + tmp = sk_msg_elem_cpy(msg, i);
> + }
> +}
> +
> +BPF_CALL_4(bpf_msg_pop_data, struct sk_msg *, msg, u32, start,
> +u32, len, u64, flags)
> +{
> + u32 i = 0, l, space, offset = 0;
> + u64 last = start + len;
> + int pop;
> +
> + if (unlikely(flags))
> + return -EINVAL;
> +
> + /* First find the starting scatterlist element */
> + i = msg->sg.start;
> + do {
> + l = sk_msg_elem(msg, i)->length;
> +
> + if (start < offset + l)
> + break;
> + offset += l;
> + sk_msg_iter_var_next(i);
> + } while (i != msg->sg.end);
> +
> + /* Bounds checks: start and pop must be inside message */
> + if (start >= offset + l || last >= msg->sg.size)
> + return -EINVAL;
> +
> + space = MAX_MSG_FRAGS - sk_msg_elem_used(msg);
> +
> + pop = len;
> + /* --| offset
> +  * -| start  |--- len --|
> +  *
> +  *  |- a | pop ---|- b |
> +  *  |__| length
> +  *
> +  *
> +  * a:   region at front of scatter element to save
> +  * b:   region at back of scatter element to save when length > A + pop
> +  * pop: region to pop from element, same as input 'pop' here will be
> +  *  decremented below per iteration.
> +  *
> +  * Two top-level cases to handle when start != offset, first B is non
> +  * zero and second B is zero corresponding to when a pop includes more
> +  * than one element.
> +  *
> +  * Then if B is non-zero AND there is no space allocate space and
> +  * compact A, B regions into page. If there is space shift ring to
> +  * the right, freeing the next element in ring to place B, leaving
> +  * A untouched except to reduce length.
> +  */
> + if (start != offset) {
> + struct scatterlist *nsge, *sge = sk_msg_elem(msg, i);
> + int a = start;
> + int b = sge->length - pop - a;
> +
> + sk_msg_iter_var_next(i);
> +
> + if (pop < 

Re: [PATCH bpf-next 0/3] bpf: add sk_msg helper sk_msg_pop_data

2018-11-25 Thread Daniel Borkmann
On 11/23/2018 02:38 AM, John Fastabend wrote:
> After being able to add metadata to messages with sk_msg_push_data we
> have also found it useful to be able to "pop" this metadata off before
> sending it to applications in some cases. This series adds a new helper
> sk_msg_pop_data() and the associated patches to add tests and tools/lib
> support.
> 
> Thanks!
> 
> John Fastabend (3):
>   bpf: helper to pop data from messages
>   bpf: add msg_pop_data helper to tools
>   bpf: test_sockmap, add options for msg_pop_data() helper usage
> 
>  include/uapi/linux/bpf.h|  13 +-
>  net/core/filter.c   | 169 
> 
>  net/ipv4/tcp_bpf.c  |  14 +-
>  tools/include/uapi/linux/bpf.h  |  13 +-
>  tools/testing/selftests/bpf/bpf_helpers.h   |   2 +
>  tools/testing/selftests/bpf/test_sockmap.c  | 127 +-
>  tools/testing/selftests/bpf/test_sockmap_kern.h |  70 --
>  7 files changed, 386 insertions(+), 22 deletions(-)
> 

Applied to bpf-next, thanks.


Re: [PATCH bpf-next] bpf: align map type names formatting.

2018-11-25 Thread Daniel Borkmann
On 11/24/2018 12:58 AM, David Calavera wrote:
> Make the formatting for map_type_name array consistent.
> 
> Signed-off-by: David Calavera 

Applied, thanks!


Re: [PATCH] tags: Fix DEFINE_PER_CPU expansion

2018-11-25 Thread Daniel Borkmann
On 11/24/2018 12:48 AM, Rustam Kovhaev wrote:
> Building tags produces warning:
>  ctags: Warning: kernel/bpf/local_storage.c:10: null expansion of name 
> pattern "\1"
> 
> Let's use the same fix as in commit 25528213fe9f75f4, even though it
> violates the usual code style.
> 
> Signed-off-by: Rustam Kovhaev 

Applied to bpf-next, thanks!


pull-request: bpf 2018-11-25

2018-11-25 Thread Daniel Borkmann
Hi David,

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix an off-by-one bug when adjusting subprog start offsets after
   patching, from Edward.

2) Fix several bugs such as overflow in size allocation in queue /
   stack map creation, from Alexei.

3) Fix wrong IPv6 destination port byte order in bpf_sk_lookup_udp
   helper, from Andrey.

4) Fix several bugs in bpftool such as preventing an infinite loop
   in get_fdinfo, error handling and man page references, from Quentin.

5) Fix a warning in bpf_trace_printk() that wasn't catching an
   invalid format string, from Martynas.

6) Fix a bug in BPF cgroup local storage where non-atomic allocation
   was used in atomic context, from Roman.

7) Fix a NULL pointer dereference bug in bpftool from reallocarray()
   error handling, from Jakub and Wen.

8) Add a copy of pkt_cls.h and tc_bpf.h uapi headers to the tools
   include infrastructure so that bpftool compiles on older RHEL7-like
   user space which does not ship these headers, from Yonghong.

9) Fix BPF kselftests for user space where to get ping test working
   with ping6 and ping -6, from Li.

Please consider pulling these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Thanks a lot!



The following changes since commit 85b18b0237ce9986a81a1b9534b5e2ee116f5504:

  net: smsc95xx: Fix MTU range (2018-11-08 19:54:49 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git 

for you to fetch changes up to 1efb6ee3edea57f57f9fb05dba8dcb3f7333f61f:

  bpf: fix check of allowed specifiers in bpf_trace_printk (2018-11-23 21:54:14 
+0100)


Alexei Starovoitov (1):
  bpf: fix integer overflow in queue_stack_map

Andrey Ignatov (1):
  bpf: Fix IPv6 dport byte order in bpf_sk_lookup_udp

Edward Cree (1):
  bpf: fix off-by-one error in adjust_subprog_starts

Jakub Kicinski (1):
  tools: bpftool: fix potential NULL pointer dereference in do_load

Li Zhijian (1):
  kselftests/bpf: use ping6 as the default ipv6 ping binary when it exists

Martynas Pumputis (1):
  bpf: fix check of allowed specifiers in bpf_trace_printk

Quentin Monnet (4):
  tools: bpftool: prevent infinite loop in get_fdinfo()
  tools: bpftool: fix plain output and doc for --bpffs option
  tools: bpftool: pass an argument to silence open_obj_pinned()
  tools: bpftool: update references to other man pages in documentation

Roman Gushchin (1):
  bpf: allocate local storage buffers using GFP_ATOMIC

Yonghong Song (1):
  tools/bpftool: copy a few net uapi headers to tools directory

 kernel/bpf/local_storage.c |   3 +-
 kernel/bpf/queue_stack_maps.c  |  16 +-
 kernel/bpf/verifier.c  |   2 +-
 kernel/trace/bpf_trace.c   |   8 +-
 net/core/filter.c  |   5 +-
 tools/bpf/bpftool/Documentation/bpftool-cgroup.rst |   8 +-
 tools/bpf/bpftool/Documentation/bpftool-map.rst|   8 +-
 tools/bpf/bpftool/Documentation/bpftool-net.rst|   8 +-
 tools/bpf/bpftool/Documentation/bpftool-perf.rst   |   8 +-
 tools/bpf/bpftool/Documentation/bpftool-prog.rst   |  11 +-
 tools/bpf/bpftool/Documentation/bpftool.rst|   9 +-
 tools/bpf/bpftool/common.c |  17 +-
 tools/bpf/bpftool/main.h   |   2 +-
 tools/bpf/bpftool/prog.c   |  13 +-
 tools/include/uapi/linux/pkt_cls.h | 612 +
 tools/include/uapi/linux/tc_act/tc_bpf.h   |  37 ++
 tools/testing/selftests/bpf/test_netcnt.c  |   5 +-
 tools/testing/selftests/bpf/test_verifier.c|  19 +
 18 files changed, 752 insertions(+), 39 deletions(-)
 create mode 100644 tools/include/uapi/linux/pkt_cls.h
 create mode 100644 tools/include/uapi/linux/tc_act/tc_bpf.h


Re: [PATCH v3 3/4] libbpf: add bpf_prog_test_run_xattr

2018-11-23 Thread Daniel Borkmann
On 11/22/2018 03:09 PM, Lorenz Bauer wrote:
> Add a new function, which encourages safe usage of the test interface.
> bpf_prog_test_run continues to work as before, but should be considered
> unsafe.
> 
> Signed-off-by: Lorenz Bauer 

Set looks good to me, thanks! Three small things below:

> ---
>  tools/lib/bpf/bpf.c | 27 +++
>  tools/lib/bpf/bpf.h | 13 +
>  2 files changed, 40 insertions(+)
> 
> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> index 961e1b9fc592..f8518bef6886 100644
> --- a/tools/lib/bpf/bpf.c
> +++ b/tools/lib/bpf/bpf.c
> @@ -424,6 +424,33 @@ int bpf_prog_test_run(int prog_fd, int repeat, void 
> *data, __u32 size,
>   return ret;
>  }
>  
> +int bpf_prog_test_run_xattr(const struct bpf_prog_test_run_attr *test_attr,
> + __u32 *size_out, __u32 *retval, __u32 *duration)
> +{
> + union bpf_attr attr;
> + int ret;
> +
> + if (!test_attr->data_out && test_attr->size_out > 0)
> + return -EINVAL;
> +
> +	bzero(&attr, sizeof(attr));
> + attr.test.prog_fd = test_attr->prog_fd;
> + attr.test.data_in = ptr_to_u64(test_attr->data);
> + attr.test.data_out = ptr_to_u64(test_attr->data_out);
> + attr.test.data_size_in = test_attr->size;
> + attr.test.data_size_out = test_attr->size_out;
> + attr.test.repeat = test_attr->repeat;
> +
> +	ret = sys_bpf(BPF_PROG_TEST_RUN, &attr, sizeof(attr));
> + if (size_out)
> + *size_out = attr.test.data_size_out;
> + if (retval)
> + *retval = attr.test.retval;
> + if (duration)
> + *duration = attr.test.duration;
> + return ret;
> +}
> +
>  int bpf_prog_get_next_id(__u32 start_id, __u32 *next_id)
>  {
>   union bpf_attr attr;
> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
> index 26a51538213c..570f19f77f42 100644
> --- a/tools/lib/bpf/bpf.h
> +++ b/tools/lib/bpf/bpf.h
> @@ -110,6 +110,19 @@ LIBBPF_API int bpf_prog_attach(int prog_fd, int 
> attachable_fd,
>  LIBBPF_API int bpf_prog_detach(int attachable_fd, enum bpf_attach_type type);
>  LIBBPF_API int bpf_prog_detach2(int prog_fd, int attachable_fd,
>   enum bpf_attach_type type);
> +
> +struct bpf_prog_test_run_attr {
> + int prog_fd;
> + int repeat;
> + const void *data;
> + __u32 size;
> + void *data_out; /* optional */
> + __u32 size_out;

Small nit: could we name these data_{in,out} and data_size_{in,out} as
well, so it's analog to the ones from the bpf_attr?

> +};
> +
> +LIBBPF_API int bpf_prog_test_run_xattr(const struct bpf_prog_test_run_attr 
> *test_attr,
> +__u32 *size_out, __u32 *retval,
> +__u32 *duration);
>  LIBBPF_API int bpf_prog_test_run(int prog_fd, int repeat, void *data,
>__u32 size, void *data_out, __u32 *size_out,
>__u32 *retval, __u32 *duration);

Could we add a comment into the header here stating that we discourage
bpf_prog_test_run()'s use?

It would probably also make sense since we go that route that we would
convert the 10 bpf_prog_test_run() instances under test_progs.c at the
same time so that people extending or looking at BPF kselftests don't
copy discouraged bpf_prog_test_run() api as examples from this point
onwards anymore.

Thanks,
Daniel
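
For reference, a caller using the v3 interface exactly as posted would look
roughly like the following (sketch only; prog_fd is assumed to be a loaded
program, and note the field names may still change per the review comments
above):

	char in[64] = {}, out[128];
	__u32 size_out = 0, retval = 0, duration = 0;
	struct bpf_prog_test_run_attr tattr = {
		.prog_fd  = prog_fd,
		.repeat   = 1,
		.data     = in,
		.size     = sizeof(in),
		.data_out = out,	/* optional output buffer */
		.size_out = sizeof(out),
	};
	int err = bpf_prog_test_run_xattr(&tattr, &size_out, &retval,
					  &duration);

Unlike bpf_prog_test_run(), the caller states the output buffer size up
front, which is what makes the interface safe against overruns.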


Re: [PATCH v2 bpf-next 1/1] libbpf: make bpf_object__open default to UNSPEC

2018-11-23 Thread Daniel Borkmann
On 11/23/2018 09:58 PM, Nikita V. Shirokov wrote:
> currently by default libbpf's bpf_object__open requires
> bpf programs to specify a version in the code because of two things:
> 1) default prog type is set to KPROBE
> 2) KPROBE requires (in kernel/bpf/syscall.c) version to be specified
> 
> in this patch i'm changing the default prog type to UNSPEC and also
> changing the requirements for the version section to be present in the
> object file. now it would reflect what we have today in the kernel
> (only KPROBE prog type requires for version to be explicitly set).
> 
> v1 -> v2:
>  - RFC tag has been dropped
> 
> Signed-off-by: Nikita V. Shirokov 

Applied, thanks!


Re: [PATCH v2] samples: bpf: fix: error handling regarding kprobe_events

2018-11-23 Thread Daniel Borkmann
On 11/22/2018 11:14 PM, Daniel T. Lee wrote:
> Currently, kprobe_events failure won't be handled properly.
> Due to calling system() indirectly to write to kprobe_events,
> it can't be identified whether an error is derived from kprobe or system.
> 
> // buf = "echo '%c:%s %s' >> /s/k/d/t/kprobe_events"
> err = system(buf);
> if (err < 0) {
> printf("failed to create kprobe ..");
> return -1;
> }
> 
> For example, running ./tracex7 sample in ext4 partition,
> "echo p:open_ctree open_ctree >> /s/k/d/t/kprobe_events"
> gets 256 error code system() failure.
> => The error comes from kprobe, but it's not handled correctly.
> 
> According to man of system(3), it's return value
> just passes the termination status of the child shell
> rather than treating the error as -1. (don't care success)
> 
> Which means, currently it's not working as desired.
> (According to the upper code snippet)
> 
> ex) running ./tracex7 with ext4 env.
> # Current Output
> sh: echo: I/O error
> failed to open event open_ctree
> 
> # Desired Output
> failed to create kprobe 'open_ctree' error 'No such file or directory'
> 
> The problem is, error can't be verified whether from child ps or system.
> 
> But using write() directly can verify the command failure,
> and it will treat all error as -1.
> 
> So I suggest using write() directly to 'kprobe_events'
> rather than calling system().
> 
> Signed-off-by: Daniel T. Lee 

Applied to bpf-next, thanks!


Re: [PATCH bpf-next] bpf: Add BPF_MAP_TYPE_QUEUE and BPF_MAP_TYPE_QUEUE to bpftool-map

2018-11-23 Thread Daniel Borkmann
On 11/23/2018 06:48 PM, David Calavera wrote:
> Hi,
> 
> Sorry for the mistake, I'll send a new patch. Before doing that, I've
> noticed that the array of map names in tools/bpf/bpftool/map.c is very
> inconsistent in formatting, some lines use tabs to align the names, others
> use spaces, and others are not aligned at all. Is there any formatting
> convention for this? I can fix those lines if you have a preferred method
> now that I'm adding new elements to that array.

I've fixed the typo from the subject and applied your patch. If you want to
send a patch with white-space cleanup for all the entries that would be fine
with me, sure. You could align all the '=' with tabs to the one from
percpu_cgroup_storage.

Thanks,
Daniel

> On Fri, Nov 23, 2018 at 2:56 AM Edward Cree  wrote:
> 
>> On 22/11/18 20:59, David Calavera wrote:
>>> I noticed that these two new BPF Maps are not defined in bpftool.
>>> This patch defines those two maps and adds their names to the
>>> bpftool-map documentation.
>>>
>>> Signed-off-by: David Calavera 
>>> ---
>> Subject line says 'QUEUE' twice, should one of those be 'STACK'?
>>>  tools/bpf/bpftool/Documentation/bpftool-map.rst | 3 ++-
>>>  tools/bpf/bpftool/map.c | 2 ++
>>>  2 files changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tools/bpf/bpftool/Documentation/bpftool-map.rst
>> b/tools/bpf/bpftool/Documentation/bpftool-map.rst
>>> index f55a2daed59b..9e827e342d9e 100644
>>> --- a/tools/bpf/bpftool/Documentation/bpftool-map.rst
>>> +++ b/tools/bpf/bpftool/Documentation/bpftool-map.rst
>>> @@ -42,7 +42,8 @@ MAP COMMANDS
>>>  || **percpu_array** | **stack_trace** | **cgroup_array** |
>> **lru_hash**
>>>  || **lru_percpu_hash** | **lpm_trie** | **array_of_maps** |
>> **hash_of_maps**
>>>  || **devmap** | **sockmap** | **cpumap** | **xskmap** |
>> **sockhash**
>>> -|| **cgroup_storage** | **reuseport_sockarray** |
>> **percpu_cgroup_storage** }
>>> +|| **cgroup_storage** | **reuseport_sockarray** |
>> **percpu_cgroup_storage**
>>> +|| **queue** | **stack** }
>>>
>>>  DESCRIPTION
>>>  ===
>>> diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
>>> index 7bf38f0e152e..68b656b6edcc 100644
>>> --- a/tools/bpf/bpftool/map.c
>>> +++ b/tools/bpf/bpftool/map.c
>>> @@ -74,6 +74,8 @@ static const char * const map_type_name[] = {
>>>   [BPF_MAP_TYPE_CGROUP_STORAGE]   = "cgroup_storage",
>>>   [BPF_MAP_TYPE_REUSEPORT_SOCKARRAY] = "reuseport_sockarray",
>>>   [BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE]= "percpu_cgroup_storage",
>>> + [BPF_MAP_TYPE_QUEUE] = "queue",
>>> + [BPF_MAP_TYPE_STACK] = "stack",
>>>  };
>>>
>>>  static bool map_is_per_cpu(__u32 type)
>>
>>
>>
> 



Re: [PATCH v2] bpf: fix check of allowed specifiers in bpf_trace_printk

2018-11-23 Thread Daniel Borkmann
On 11/23/2018 05:43 PM, Martynas Pumputis wrote:
> A format string consisting of "%p" or "%s" followed by an invalid
> specifier (e.g. "%p%\n" or "%s%") could pass the check which
> would make format_decode (lib/vsprintf.c) to warn.
> 
> Reported-by: syzbot+1ec5c5ec949c4adaa...@syzkaller.appspotmail.com
> Signed-off-by: Martynas Pumputis 

Applied to bpf, thanks!


Re: [RFC PATCH bpf-next] libbpf: make bpf_object__open default to UNSPEC

2018-11-22 Thread Daniel Borkmann
[ +Wang ]

On 11/22/2018 07:03 AM, Nikita V. Shirokov wrote:
> currently by default libbpf's bpf_object__open requires
> bpf programs to specify a version in the code because of two things:
> 1) default prog type is set to KPROBE
> 2) KPROBE requires (in kernel/bpf/syscall.c) version to be specified
> 
> in this RFC i'm proposing to change the default to UNSPEC and also to
> change the logic of libbpf so that it would reflect what we have today
> in the kernel (aka only the KPROBE prog type requires the version to be
> explicitly set).
> 
> reason for change:
> currently only libbpf requires by default the version to be
> explicitly set. it would be really hard for maintainers of other custom
> bpf loaders to migrate to libbpf (as they don't control users' code
> and migration to the new loader (libbpf) won't be transparent for end
> users).
> 
> what is going to be broken after this change:
> if someone were relying on default to be KPROBE for bpf_object__open
> his code will stop working. however i'm really doubtful that anyone
> is using this for kprobe type of programs (instead of, say, bcc or
> other tracing frameworks)
> 
> other possible solutions (for discussion, would require more machinery):
> add another function like bpf_object__open w/ default to unspec
> 
> Signed-off-by: Nikita V. Shirokov 
> ---
>  tools/lib/bpf/libbpf.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 0f14f7c074c2..ed4212a4c5f9 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -333,7 +333,7 @@ bpf_program__init(void *data, size_t size, char 
> *section_name, int idx,
>   prog->idx = idx;
>   prog->instances.fds = NULL;
>   prog->instances.nr = -1;
> - prog->type = BPF_PROG_TYPE_KPROBE;
> + prog->type = BPF_PROG_TYPE_UNSPEC;
>   prog->btf_fd = -1;

Seems this was mostly for historic reasons, but for a generic library this
would indeed be an odd convention for default. Wang, given 5f44e4c810bf
("tools lib bpf: New API to adjust type of a BPF program"), are you in any
way relying on this default or using things like bpf_program__set_kprobe()
instead which you've added there? If latter, I'd say we should then change
it better now than later when there's even more lib usage (and in particular
before we add official ABI versioning).

>   return 0;
> @@ -1649,12 +1649,12 @@ static bool bpf_prog_type__needs_kver(enum 
> bpf_prog_type type)
>   case BPF_PROG_TYPE_LIRC_MODE2:
>   case BPF_PROG_TYPE_SK_REUSEPORT:
>   case BPF_PROG_TYPE_FLOW_DISSECTOR:
> - return false;
>   case BPF_PROG_TYPE_UNSPEC:
> - case BPF_PROG_TYPE_KPROBE:
>   case BPF_PROG_TYPE_TRACEPOINT:
> - case BPF_PROG_TYPE_PERF_EVENT:
>   case BPF_PROG_TYPE_RAW_TRACEPOINT:
> + case BPF_PROG_TYPE_PERF_EVENT:
> + return false;
> + case BPF_PROG_TYPE_KPROBE:
>   default:
>   return true;
>   }
> 

Thanks,
Daniel


Re: [PATCH] bpf: fix check of allowed specifiers in bpf_trace_printk

2018-11-22 Thread Daniel Borkmann
Hi Martynas,

On 11/22/2018 05:00 PM, Martynas Pumputis wrote:
> A format string consisting of "%p" or "%s" followed by an invalid
> specifier (e.g. "%p%\n" or "%s%") could pass the check which
> would make format_decode (lib/vsprintf.c) to warn.
> 
> Reported-by: syzbot+1ec5c5ec949c4adaa...@syzkaller.appspotmail.com
> Signed-off-by: Martynas Pumputis 
> ---
>  kernel/trace/bpf_trace.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 08fcfe440c63..9ab05736e1a1 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -225,6 +225,8 @@ BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, 
> u64, arg1,
>   (void *) (long) unsafe_addr,
>   sizeof(buf));
>   }
> + if (fmt[i] == '%')
> + i--;
>   continue;
>   }

Thanks for the fix! Could we simplify the logic a bit to avoid having to
navigate i back and forth which got us in trouble in the first place? Like
below (untested) perhaps?

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 08fcfe4..ff83b8c 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -196,11 +196,13 @@ BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, 
u64, arg1,
i++;
} else if (fmt[i] == 'p' || fmt[i] == 's') {
mod[fmt_cnt]++;
-   i++;
-   if (!isspace(fmt[i]) && !ispunct(fmt[i]) && fmt[i] != 0)
+   /* Disallow any further format extensions. */
+   if (fmt[i + 1] != 0 &&
+   !isspace(fmt[i + 1]) &&
+   !ispunct(fmt[i + 1]))
return -EINVAL;
fmt_cnt++;
-   if (fmt[i - 1] == 's') {
+   if (fmt[i] == 's') {
if (str_seen)
/* allow only one '%s' per fmt string */
return -EINVAL;

Thanks,
Daniel
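
To make the accepted and rejected cases concrete (illustrative snippets,
not from the patch itself), the check permits a specifier followed by
whitespace, punctuation or the end of the string, and nothing else:

	/* Accepted: '%s' is followed by whitespace, so the check passes. */
	char ok[] = "comm %s pid %d\n";
	bpf_trace_printk(ok, sizeof(ok), comm, pid);

	/* Rejected with -EINVAL: '%s' is followed by another '%', the
	 * exact pattern that previously slipped through to format_decode().
	 */
	char bad[] = "comm %s%";
	bpf_trace_printk(bad, sizeof(bad), comm);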


Re: [PATCH bpf] bpf: fix integer overflow in queue_stack_map

2018-11-22 Thread Daniel Borkmann
On 11/22/2018 07:49 PM, Alexei Starovoitov wrote:
> fix the following issues:
> - allow queue_stack_map for root only
> - fix u32 max_entries overflow
> - disallow value_size == 0
> 
> Reported-by: Wei Wu 
> Fixes: f1a2e44a3aec ("bpf: add queue and stack maps")
> Signed-off-by: Alexei Starovoitov 

Applied, thanks everyone!
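
After the fix, a valid (privileged) queue map creation looks like the
sketch below; a zero value_size or a max_entries that overflows the size
calculation is now rejected at creation time, and unprivileged callers get
-EPERM:

	/* Queue/stack maps take no key: key_size must be 0, value_size > 0. */
	int fd = bpf_create_map(BPF_MAP_TYPE_QUEUE, 0, sizeof(__u32), 64, 0);

	if (fd < 0)
		return -1;	/* -EINVAL for bad sizes, -EPERM without root */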


Re: selftests/bpf :get_cgroup_id_user: File not found: /sys/kernel/debug/tracing/events/syscalls/sys_enter_nanosleep/id

2018-11-22 Thread Daniel Borkmann
Hi Naresh,

On 11/21/2018 06:53 PM, Y Song wrote:
> On Wed, Nov 21, 2018 at 3:44 AM Naresh Kamboju
>  wrote:
>>
>> Kselftest bpf get_cgroup_id_user is failed on all devices.
>>
>> selftests: bpf: get_cgroup_id_user
>> main:PASS:setup_cgroup_environment
>> main:PASS:create_and_get_cgroup
>> main:PASS:join_cgroup
>> main:PASS:bpf_prog_load
>> main:PASS:bpf_find_map
>> main:PASS:bpf_find_map
>> main:FAIL:open err -1 errno 2
>> not ok 1..15 selftests: bpf: get_cgroup_id_user [FAIL]
>>
>> The strace output shows,
>> This expected file not found,
>> "/sys/kernel/debug/tracing/events/syscalls/sys_enter_nanosleep/id"
>>
>> bpf(BPF_MAP_UPDATE_ELEM, {map_fd=5, key=0x7fff0c68c138,
>> value=0x7fff0c68c13c, flags=BPF_ANY}, 72) = 0
>> open("/sys/kernel/debug/tracing/events/syscalls/sys_enter_nanosleep/id",
>> O_RDONLY) = -1 ENOENT (No such file or directory)
>> write(1, "main:FAIL:open err -1 errno 2\n", 30main:FAIL:open err -1 errno 2
>>
>> Am I missing any pre-requirement ?
> 
> You probably miss kernel config CONFIG_FTRACE_SYSCALLS.

If that does the trick for you, could you add that to the BPF kselftest
config snippet?

Thanks,
Daniel
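
Concretely, the BPF kselftest config fragment
(tools/testing/selftests/bpf/config) would gain a line like:

CONFIG_FTRACE_SYSCALLS=y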


Re: [PATCH] samples: bpf: fix: error handling regarding kprobe_events

2018-11-22 Thread Daniel Borkmann
On 11/21/2018 05:07 PM, Daniel T. Lee wrote:
> Currently, kprobe_events failure won't be handled properly.
> Due to calling system() indirectly to write to kprobe_events,
> it can't be identified whether an error is derived from kprobe or system.
> 
> // buf = "echo '%c:%s %s' >> /s/k/d/t/kprobe_events"
> err = system(buf);
> if (err < 0) {
> printf("failed to create kprobe ..");
> return -1;
> }
> 
> For example, running ./tracex7 sample in ext4 partition,
> "echo p:open_ctree open_ctree >> /s/k/d/t/kprobe_events"
> gets 256 error code system() failure.
> => The error comes from kprobe, but it's not handled correctly.
> 
> According to man of system(3), it's return value
> just passes the termination status of the child shell
> rather than treating the error as -1. (don't care success)
> 
> Which means, currently it's not working as desired.
> (According to the upper code snippet)
> 
> ex) running ./tracex7 with ext4 env.
> # Current Output
> sh: echo: I/O error
> failed to open event open_ctree
> 
> # Desired Output
> failed to create kprobe 'open_ctree' error 'No such file or directory'
> 
> The problem is, error can't be verified whether from child ps or system.
> 
> But using write() directly can verify the command failure,
> and it will treat all error as -1.
> 
> So I suggest using write() directly to 'kprobe_events'
> rather than calling system().
> 
> Signed-off-by: Daniel T. Lee 

Looks reasonable to me, thanks for the patch! One tiny nit below:

> ---
>  samples/bpf/bpf_load.c | 33 -
>  1 file changed, 24 insertions(+), 9 deletions(-)
> 
> diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
> index e6d7e0fe155b..3e34d733f5f8 100644
> --- a/samples/bpf/bpf_load.c
> +++ b/samples/bpf/bpf_load.c
> @@ -54,6 +54,23 @@ static int populate_prog_array(const char *event, int 
> prog_fd)
>   return 0;
>  }
>  
> +static int write_kprobe_events(const char *val)
> +{
> + int ret, flags;
> +
> + if ((val != NULL) && (val[0] == '\0'))
> + flags = O_WRONLY | O_TRUNC;
> + else
> + flags = O_WRONLY | O_APPEND;
> +
> + int fd = open("/sys/kernel/debug/tracing/kprobe_events", flags);

Nit: could you move fd declaration to above where you have ret and
flags instead of mixing with the code in order to match kernel style?

> + ret = write(fd, val, strlen(val));
> + close(fd);
> +
> + return ret;
> +}
> +
>  static int load_and_attach(const char *event, struct bpf_insn *prog, int 
> size)
>  {
>   bool is_socket = strncmp(event, "socket", 6) == 0;
> @@ -165,10 +182,9 @@ static int load_and_attach(const char *event, struct 
> bpf_insn *prog, int size)
>  
>  #ifdef __x86_64__
>   if (strncmp(event, "sys_", 4) == 0) {
> - snprintf(buf, sizeof(buf),
> -  "echo '%c:__x64_%s __x64_%s' >> 
> /sys/kernel/debug/tracing/kprobe_events",
> -  is_kprobe ? 'p' : 'r', event, event);
> - err = system(buf);
> + snprintf(buf, sizeof(buf), "%c:__x64_%s __x64_%s",
> + is_kprobe ? 'p' : 'r', event, event);
> + err = write_kprobe_events(buf);
>   if (err >= 0) {
>   need_normal_check = false;
>   event_prefix = "__x64_";
> @@ -176,10 +192,9 @@ static int load_and_attach(const char *event, struct 
> bpf_insn *prog, int size)
>   }
>  #endif
>   if (need_normal_check) {
> - snprintf(buf, sizeof(buf),
> -  "echo '%c:%s %s' >> 
> /sys/kernel/debug/tracing/kprobe_events",
> -  is_kprobe ? 'p' : 'r', event, event);
> - err = system(buf);
> + snprintf(buf, sizeof(buf), "%c:%s %s",
> + is_kprobe ? 'p' : 'r', event, event);
> + err = write_kprobe_events(buf);
>   if (err < 0) {
>   printf("failed to create kprobe '%s' error 
> '%s'\n",
>  event, strerror(errno));
> @@ -519,7 +534,7 @@ static int do_load_bpf_file(const char *path, 
> fixup_map_cb fixup_map)
>   return 1;
>  
>   /* clear all kprobes */
> - i = system("echo \"\" > /sys/kernel/debug/tracing/kprobe_events");
> + i = write_kprobe_events("");
>  
>   /* scan over all elf sections to get license and map info */
>   for (i = 1; i < ehdr.e_shnum; i++) {
> 

Thanks,
Daniel


Re: [PATCH bpf-next 1/2] libbpf: Add version script for DSO

2018-11-22 Thread Daniel Borkmann
On 11/21/2018 11:22 PM, Alexei Starovoitov wrote:
> On 11/21/18 12:18 PM, Yonghong Song wrote:
>> On 11/21/18 9:40 AM, Andrey Ignatov wrote:
>>> More and more projects use libbpf and one day it'll likely be packaged
>>> and distributed as DSO and that requires ABI versioning so that both
>>> compatible and incompatible changes to ABI can be introduced in a safe
>>> way in the future without breaking executables dynamically linked with a
>>> previous version of the library.
>>>
>>> Usual way to do ABI versioning is version script for the linker. Add
>>> such a script for libbpf. All global symbols currently exported via
>>> LIBBPF_API macro are added to the version script libbpf.map.
>>>
>>> The version name LIBBPF_0.0.1 is constructed from the name of the
>>> library + version specified by $(LIBBPF_VERSION) in Makefile.
>>>
>>> Version script does not duplicate the work done by LIBBPF_API macro, it
>>> rather complements it. The macro is used at compile time and can be used
>>> by compiler to do optimization that can't be done at link time, it is
>>> purely about global symbol visibility. The version script, in turn, is
>>> used at link time and takes care of ABI versioning. Both techniques are
>>> described in details in [1].
>>>
>>> Whenever ABI is changed in the future, version script should be changed
>>> appropriately.
>>
>> Maybe we should clarify the policy of how version numbers should be
>> change? Each commit which changes default global symbol ABI? Each kernel
>> release?

+1, could you add a documentation file into tools/lib/bpf/ where we
keep note on this process?

>>> [1] https://www.akkadia.org/drepper/dsohowto.pdf
>>>
>>> Signed-off-by: Andrey Ignatov 
>>> ---
>>>tools/lib/bpf/Makefile   |   4 +-
>>>tools/lib/bpf/libbpf.map | 120 +++
>>>2 files changed, 123 insertions(+), 1 deletion(-)
>>>create mode 100644 tools/lib/bpf/libbpf.map
>>>
>>> diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
>>> index 425b480bda75..d76c41fa2d39 100644
>>> --- a/tools/lib/bpf/Makefile
>>> +++ b/tools/lib/bpf/Makefile
>>> @@ -145,6 +145,7 @@ include $(srctree)/tools/build/Makefile.include
>>>
>>>BPF_IN:= $(OUTPUT)libbpf-in.o
>>>LIB_FILE := $(addprefix $(OUTPUT),$(LIB_FILE))
>>> +VERSION_SCRIPT := libbpf.map
>>>
>>>CMD_TARGETS = $(LIB_FILE)
>>>
>>> @@ -170,7 +171,8 @@ $(BPF_IN): force elfdep bpfdep
>>> $(Q)$(MAKE) $(build)=libbpf
>>>
>>>$(OUTPUT)libbpf.so: $(BPF_IN)
>>> -   $(QUIET_LINK)$(CC) --shared $^ -o $@
>>> +   $(QUIET_LINK)$(CC) --shared -Wl,--version-script=$(VERSION_SCRIPT) \
>>> +   $^ -o $@
>>>
>>>$(OUTPUT)libbpf.a: $(BPF_IN)
>>> $(QUIET_LINK)$(RM) $@; $(AR) rcs $@ $^
>>> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
>>> new file mode 100644
>>> index ..9fe416b68c7d
>>> --- /dev/null
>>> +++ b/tools/lib/bpf/libbpf.map
>>> @@ -0,0 +1,120 @@
>>> +LIBBPF_0.0.1 {
>>> +   global:
>>> +   bpf_btf_get_fd_by_id;
>>
>> Do you think we could use this opportunities to
>> make naming more consistent? For example,
>> bpf_btf_get_fd_by_id => btf__get_fd_by_id?
> 
> I think this one is fine since it matches
> bpf_[map|prog]_get_fd_by_id()
> and it's a wrapper.
> 
>>> +   bpf_create_map;
>>> +   bpf_create_map_in_map;
>>> +   bpf_create_map_in_map_node;
>>> +   bpf_create_map_name;
>>> +   bpf_create_map_node;
>>> +   bpf_create_map_xattr;
>>> +   bpf_load_btf;
>>> +   bpf_load_program;
>>> +   bpf_load_program_xattr;
>>> +   bpf_map__btf_key_type_id;
>>> +   bpf_map__btf_value_type_id;
>>> +   bpf_map__def;
>>> +   bpf_map_delete_elem;
>>> +   bpf_map__fd;
>>> +   bpf_map_get_fd_by_id;
>>> +   bpf_map_get_next_id;
>>> +   bpf_map_get_next_key;
>>> +   bpf_map__is_offload_neutral;
>>> +   bpf_map_lookup_and_delete_elem;
>>> +   bpf_map_lookup_elem;
>>> +   bpf_map__name;
>>> +   bpf_map__next;
>>> +   bpf_map__pin;
>>> +   bpf_map__prev;
>>> +   bpf_map__priv;
>>> +   bpf_map__reuse_fd;
>>> +   bpf_map__set_ifindex;
>>> +   bpf_map__set_priv;
>>> +   bpf_map__unpin;
>>> +   bpf_map_update_elem;
>>> +   bpf_object__btf_fd;
>>> +   bpf_object__close;
>>> +   bpf_object__find_map_by_name;
>>> +   bpf_object__find_map_by_offset;
>>> +   bpf_object__find_program_by_title;
>>> +   bpf_object__kversion;
>>> +   bpf_object__load;
>>> +   bpf_object__name;
>>> +   bpf_object__next;
>>> +   bpf_object__open;
>>> +   bpf_object__open_buffer;
>>> +   bpf_object__open_xattr;
>>> +   bpf_object__pin;
>>> +   bpf_object__pin_maps;
>>> +   bpf_object__pin_programs;
>>> +   bpf_object__priv;
>>> +   

Re: [PATCH bpf-next v2 1/1] bpf, lpm: make longest_prefix_match() faster

2018-11-22 Thread Daniel Borkmann
On 11/22/2018 06:39 AM, Eric Dumazet wrote:
> At LPC 2018 in Vancouver, Vlad Dumitrescu mentioned that 
> longest_prefix_match()
> has a high cost [1].
> 
> One reason for that cost is a loop handling one byte at a time.
> 
> We can handle more bytes at a time, if enough attention is paid
> to endianness.
> 
> I was able to remove ~55 % of longest_prefix_match() cpu costs.
> 
> [1] 
> https://linuxplumbersconf.org/event/2/contributions/88/attachments/76/87/lpc-bpf-2018-shaping.pdf
> 
> Signed-off-by: Eric Dumazet 
> Cc: Vlad Dumitrescu 
> Cc: Alexei Starovoitov 
> Cc: Daniel Borkmann 

Looks good, applied to bpf-next, thanks!
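
The core idea, roughly, is the following (a standalone userspace sketch,
not the kernel patch itself; it assumes max_bits is a multiple of 8 and
the big-endian bit-string semantics used by the LPM trie):

#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Count the number of leading bits two byte strings share,
 * comparing four bytes per iteration instead of one.
 */
static uint32_t prefix_bits(const uint8_t *a, const uint8_t *b,
			    uint32_t max_bits)
{
	uint32_t bits = 0, i = 0, nbytes = max_bits / 8;

	for (; i + 4 <= nbytes; i += 4) {
		uint32_t wa, wb;

		memcpy(&wa, a + i, 4);
		memcpy(&wb, b + i, 4);
		if (wa != wb)	/* first differing bit is in this word */
			return bits + __builtin_clz(ntohl(wa ^ wb));
		bits += 32;
	}
	for (; i < nbytes; i++) {	/* trailing bytes, one at a time */
		if (a[i] != b[i])
			return bits +
			       __builtin_clz((uint32_t)(a[i] ^ b[i]) << 24);
		bits += 8;
	}
	return bits;
}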


Re: [PATCH bpf] tools: bpftool: fix potential NULL pointer dereference in do_load

2018-11-21 Thread Daniel Borkmann
On 11/21/2018 10:53 PM, Jakub Kicinski wrote:
> This patch fixes a possible null pointer dereference in
> do_load, detected by the semantic patch deref_null.cocci,
> with the following warning:
> 
> ./tools/bpf/bpftool/prog.c:1021:23-25: ERROR: map_replace is NULL but 
> dereferenced.
> 
> The following code has potential null pointer references:
> 881 map_replace = reallocarray(map_replace, old_map_fds + 1,
> 882sizeof(*map_replace));
> 883 if (!map_replace) {
> 884 p_err("mem alloc failed");
> 885 goto err_free_reuse_maps;
> 886 }
> 
> ...
> 1019 err_free_reuse_maps:
> 1020 for (i = 0; i < old_map_fds; i++)
> 1021 close(map_replace[i].fd);
> 1022 free(map_replace);
> 
> Fixes: 3ff5a4dc5d89 ("tools: bpftool: allow reuse of maps with bpftool prog 
> load")
> Co-developed-by: Wen Yang 
> Signed-off-by: Wen Yang 
> Signed-off-by: Jakub Kicinski 

Applied to bpf, thanks everyone!


Re: [PATCH bpf-next] bpf: add read/write access to skb->tstamp from tc clsact progs

2018-11-21 Thread Daniel Borkmann
On 11/21/2018 07:48 PM, Vlad Dumitrescu wrote:
> On Wed, Nov 21, 2018 at 5:08 AM Eric Dumazet  wrote:
>> On 11/20/2018 06:40 PM, Alexei Starovoitov wrote:
>>>
>>> looks good to me.
>>>
>>> Any particular reason you decided to disable it for cg_skb ?
>>> It seems to me the same EDT approach will work from
>>> cgroup-bpf skb hooks just as well and then we can have neat
>>> way of controlling traffic per-container instead of tc-clsbpf global.
>>> If you're already on cgroup v2 it will save you a lot of classifier
>>> cycles, since you'd be able to group apps by cgroup
>>> instead of relying on ip only.
>>
>> Vlad first wrote a complete version, but we felt explaining the _why_
>> was probably harder.
>>
>> No particular reason, other than having to write more tests perhaps.
> 
> This sounds reasonable to me. I can prepare a v2.
> 
> Any concerns regarding capabilities? For example data and data_end are
> only available to CAP_SYS_ADMIN. Note that enforcement of this would
> be done by a global component later in the pipeline (e.g., FQ qdisc).

cg_skb_is_valid_access() has the CAP_SYS_ADMIN enforcement for direct
packet access since cg_skb can also run from unprivileged. Makes sense
to do the same for skb->tstamp for the STX_MEM part at least.

> Any opinions on sk_filter, lwt, and sk_skb before I send v2?

I'd probably leave that out for the time being if there is no concrete
use at this point.

Thanks,
Daniel
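
For context, the EDT pattern the thread refers to is roughly the
following from a clsact program (illustrative sketch; the 10us delay is
arbitrary, and real code would derive it from a per-flow rate):

#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include "bpf_helpers.h"

SEC("classifier")
int edt_stamp(struct __sk_buff *skb)
{
	__u64 now = bpf_ktime_get_ns();

	/* Set an earliest departure time; a later stage such as the fq
	 * qdisc can then enforce the pacing decision made here.
	 */
	if (skb->tstamp < now + 10000)
		skb->tstamp = now + 10000;
	return TC_ACT_OK;
}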


Re: [PATCH v5 bpf-next 0/2] bpf: adding support for mapinmap in libbpf

2018-11-21 Thread Daniel Borkmann
On 11/21/2018 05:55 AM, Nikita V. Shirokov wrote:
> in this patch series i'm adding a helper for libbpf which would allow
> it to load map-in-map(BPF_MAP_TYPE_ARRAY_OF_MAPS and
> BPF_MAP_TYPE_HASH_OF_MAPS).
> first patch contains new helper + explains proposed workflow
> second patch contains tests which also could be used as example of usage
> 
> v4->v5:
>  - naming: renamed everything to map_in_map instead of mapinmap
>  - start to return nonzero val if set_inner_map_fd failed
> 
> v3->v4:
>  - renamed helper to set_inner_map_fd
>  - now we set this value only if it hasn't
>    been set before and only for (array|hash) of maps
> 
> v2->v3:
>  - fixing typo in patch description
>  - initializing inner_map_fd to -1 by default
> 
> v1->v2:
>  - addressing nits
>  - removing const identifier from fd in new helper
>  - starting to check return val for bpf_map_update_elem
> 
> Nikita V. Shirokov (2):
>   bpf: adding support for map in map in libbpf
>   bpf: adding tests for mapinmap helpber in libbpf
> 
>  tools/lib/bpf/libbpf.c| 40 ++--
>  tools/lib/bpf/libbpf.h|  2 +
>  tools/testing/selftests/bpf/Makefile  |  3 +-
>  tools/testing/selftests/bpf/test_map_in_map.c | 49 +++
>  tools/testing/selftests/bpf/test_maps.c   | 90 
> +++
>  5 files changed, 177 insertions(+), 7 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/test_map_in_map.c
> 

Applied, thanks!
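
The intended workflow from the cover letter, sketched below (the object
and map names are made up and error handling is elided): open the object,
create a throwaway inner map so the kernel can derive the inner map
metadata at load time, then set it on the outer map before loading:

	struct bpf_object *obj = bpf_object__open("map_in_map_prog.o");
	struct bpf_map *outer = bpf_object__find_map_by_name(obj,
							     "outer_map");
	int inner_fd = bpf_create_map(BPF_MAP_TYPE_ARRAY, sizeof(__u32),
				      sizeof(__u32), 1, 0);

	bpf_map__set_inner_map_fd(outer, inner_fd);
	bpf_object__load(obj);
	close(inner_fd);	/* no longer needed once the object is loaded */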


Re: [PATCH bpf-next v3 1/3] bpf, libbpf: introduce bpf_object__probe_caps to test BPF capabilities

2018-11-21 Thread Daniel Borkmann
On 11/21/2018 02:11 AM, Stanislav Fomichev wrote:
> It currently only checks whether kernel supports map/prog names.
> This capability check will be used in the next two commits to skip setting
> prog/map names.
> 
> Suggested-by: Daniel Borkmann 
> Signed-off-by: Stanislav Fomichev 

Looks great, thanks for following through. Applied, thanks!


Re: [PATCH bpf-next v3] libbpf: make sure bpf headers are c++ include-able

2018-11-21 Thread Daniel Borkmann
On 11/21/2018 06:29 PM, Stanislav Fomichev wrote:
> Wrap headers in extern "C", to turn off C++ mangling.
> This simplifies including libbpf in c++ and linking against it.
> 
> v2 changes:
> * do the same for btf.h
> 
> v3 changes:
> * test_libbpf.cpp to test for possible future c++ breakages
> 
> Signed-off-by: Stanislav Fomichev 

Applied, thanks!


Re: [PATCH bpf-next] bpf: fix a libbpf loader issue

2018-11-21 Thread Daniel Borkmann
On 11/21/2018 08:22 PM, Yonghong Song wrote:
> Commit 2993e0515bb4 ("tools/bpf: add support to read .BTF.ext sections")
> added support to read .BTF.ext sections from an object file, create
> and pass prog_btf_fd and func_info to the kernel.
> 
> The program btf_fd (prog->btf_fd) is initialized to be -1 to please
> zclose so we do not need special handling dur prog close.
> Passing -1 to the kernel, however, will cause loading error.
> Passing btf_fd 0 to the kernel if prog->btf_fd is invalid
> fixed the problem.
> 
> Fixes: 2993e0515bb4 ("tools/bpf: add support to read .BTF.ext sections")
> Reported-by: Andrey Ignatov 
> Reported-by: Emre Cantimur 
> Tested-by: Andrey Ignatov 
> Signed-off-by: Yonghong Song 

Applied, thanks!


Re: [RFC PATCH] net: don't keep lonely packets forever in the gro hash

2018-11-21 Thread Daniel Borkmann
Hi Eric,

On 11/20/2018 02:49 PM, Eric Dumazet wrote:
> On 11/20/2018 02:17 AM, Paolo Abeni wrote:
>> Eric noted that with UDP GRO and napi timeout, we could keep a single
>> UDP packet inside the GRO hash forever, if the related NAPI instance
>> calls napi_gro_complete() at an higher frequency than the napi timeout.
>> Willem noted that even TCP packets could be trapped there, till the
>> next retransmission.
>> This patch tries to address the issue, flushing the oldest packets before
>> scheduling the NAPI timeout. The rationale is that such a timeout should be
>> well below a jiffy and we are not flushing packets eligible for sane GRO.
>>
>> Reported-by: Eric Dumazet 
>> Signed-off-by: Paolo Abeni 
>> ---
>> Sending as RFC, as I fear I'm missing some relevant pieces.
>> Also I'm unsure if this should considered a fixes for "udp: implement
>> GRO for plain UDP sockets." or for "net: gro: add a per device gro flush 
>> timer"
> 
> You can add both, no worries.
> 
> Google DC TCP forces a PSH flag on all TSO packets, so for us the flush is 
> done
> because of the PSH flag, not upon a timer/jiffie.

Any plans to upstream this change? Or is the concern that this should only
be used DC internally? (If the latter, perhaps it would make sense to add
this under DCTCP mode then?)

Thanks a lot,
Daniel


Re: [PATCH bpf-next] bpf: libbpf: retry program creation without the name

2018-11-20 Thread Daniel Borkmann
On 11/21/2018 12:04 AM, Alexei Starovoitov wrote:
> On Tue, Nov 20, 2018 at 01:19:05PM -0800, Stanislav Fomichev wrote:
>> On 11/20, Alexei Starovoitov wrote:
>>> On Mon, Nov 19, 2018 at 04:46:25PM -0800, Stanislav Fomichev wrote:
 [Recent commit 23499442c319 ("bpf: libbpf: retry map creation without
 the name") fixed this issue for maps, let's do the same for programs.]

 Since commit 88cda1c9da02 ("bpf: libbpf: Provide basic API support
 to specify BPF obj name"), libbpf unconditionally sets bpf_attr->name
 for programs. Pre v4.14 kernels don't know about programs names and
 return an error about unexpected non-zero data. Retry sys_bpf without
 a program name to cover older kernels.

 Signed-off-by: Stanislav Fomichev 
 ---
  tools/lib/bpf/bpf.c | 10 ++
  1 file changed, 10 insertions(+)

 diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
 index 961e1b9fc592..cbe9d757c646 100644
 --- a/tools/lib/bpf/bpf.c
 +++ b/tools/lib/bpf/bpf.c
 @@ -212,6 +212,16 @@ int bpf_load_program_xattr(const struct 
 bpf_load_program_attr *load_attr,
if (fd >= 0 || !log_buf || !log_buf_sz)
return fd;
  
 +  if (fd < 0 && errno == E2BIG && load_attr->name) {
 +  /* Retry the same syscall, but without the name.
 +   * Pre v4.14 kernels don't support prog names.
 +   */
>>>
>>> I'm afraid that will put unnecessary stress on the kernel.
>>> This check needs to be tighter.
>>> Like E2BIG and anything in the log_buf probably means that
>>> E2BIG came from the verifier and nothing to do with prog_name.
>>> Asking kernel to repeat is an unnecessary work.
>>>
>>> In general we need to think beyond this single prog_name field.
>>> There are bunch of other fields in bpf_load_program_xattr() and older 
>>> kernels
>>> won't support them. Are we going to zero them out one by one
>>> and retry? I don't think that would be practical.
>> I general, we don't want to zero anything out. However,
>> for this particular problem the rationale is the following:
>> In commit 88cda1c9da02 we started unconditionally setting {prog,map}->name
>> from the 'higher' libbpfc layer which breaks users on the older kernels.
>>
>>> Also libbpf silently ignoring prog_name is not great for debugging.
>>> A warning is needed.
>>> But it cannot be done out of lib/bpf/bpf.c, since it's a set of syscall
>>> wrappers.
>>> Imo such "old kernel -> lets retry" feature should probably be done
>>> at lib/bpf/libbpf.c level. inside load_program().
>> For maps bpftools calls bpf_create_map_xattr directly, that's why
>> for maps I did the retry on the lower level (and why for programs I initially
>> thought about doing the same). However, in this case maybe asking
>> user to omit 'name' argument might be a better option.
>>
>> For program names, I agree, we might think about doing it on the higher
>> level (although I'm not sure whether we want to have different API
>> expectations, i.e. bpf_create_map_xattr ignoring the name and
>> bpf_load_program_xattr not ignoring the name).
>>
>> So given that rationale above, what do you think is the best way to
>> move forward?
>> 1. Same patch, but tighten the retry check inside bpf_load_program_xattr ?
>> 2. Move this retry logic into load_program and have different handling
>>for bpf_create_map_xattr vs bpf_load_program_xattr ?
>> 3. Do 2 and move the retry check for maps from bpf_create_map_xattr
>>into bpf_object__create_maps ?
>>
>> (I'm slightly leaning towards #3)
> 
> me too. I think it's cleaner for maps to do it in
> bpf_object__create_maps().
> Originally bpf.c was envisioned to be a thin layer on top of bpf syscall.
> Whereas 'smart bits' would go into libbpf.c

Can't we create in bpf_object__load() a small helper bpf_object__probe_caps()
which would figure this out _once_ upon start with a few things to probe for
availability in the underlying kernel for maps and programs? E.g. programs
it could try to inject a tiny 'r0 = 0; exit' snippet where we figure out
things like prog name support etc. Given underlying kernel doesn't change, we
would only try this once and it doesn't require fallback every time.
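
A minimal probe along those lines could look as follows (sketch only: the
function name is made up, the insn macros are assumed to come from
tools/include/linux/filter.h, and error handling is elided; pre-v4.14
kernels reject the non-zero name field with E2BIG):

static bool probe_prog_name(void)
{
	struct bpf_insn insns[] = {	/* r0 = 0; exit */
		BPF_MOV64_IMM(BPF_REG_0, 0),
		BPF_EXIT_INSN(),
	};
	struct bpf_load_program_attr attr = {
		.prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
		.insns	   = insns,
		.insns_cnt = 2,
		.license   = "GPL",
		.name	   = "probe",
	};
	int fd = bpf_load_program_xattr(&attr, NULL, 0);

	if (fd >= 0)
		close(fd);
	return fd >= 0;	/* name accepted -> kernel supports prog names */
}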

> Right now this boundary is unfortunately blurry.
> May be as #4 long term option we'll introduce another 'smart' layer
> between bpf.c that will assume the latest kernel and libbpf.c that deals
> with elf. May be will call this new layer a 'compat' layer?
> For now I think doing #3 as you suggested is probably the best short term.
> 



Re: [PATCH bpf-next v2] filter: add BPF_ADJ_ROOM_DATA mode to bpf_skb_adjust_room()

2018-11-19 Thread Daniel Borkmann
On 11/13/2018 05:35 PM, Nicolas Dichtel wrote:
> This new mode enables to add or remove an l2 header in a programmatic way
> with cls_bpf.
> For example, it enables to play with mpls headers.
> 
> Signed-off-by: Nicolas Dichtel 
> Acked-by: Martin KaFai Lau 

(Sorry for late reply, swamped due to Plumbers.)

> ---
> 
> v2: use skb_set_network_header()
> 
>  include/uapi/linux/bpf.h   |  3 ++
>  net/core/filter.c  | 53 ++
>  tools/include/uapi/linux/bpf.h |  3 ++
>  3 files changed, 59 insertions(+)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 47d606d744cc..9762ef3a536c 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -1467,6 +1467,8 @@ union bpf_attr {
>   *
>   *   * **BPF_ADJ_ROOM_NET**: Adjust room at the network layer
>   * (room space is added or removed below the layer 3 header).
> + *   * **BPF_ADJ_ROOM_DATA**: Adjust room at the beginning of the
> + * packet (room space is added or removed below skb->data).

Nit: in this case it would probably make more sense to name it BPF_ADJ_ROOM_MAC.
BPF_ADJ_ROOM_DATA reads like we're adjusting payload data.

>   *   All values for *flags* are reserved for future usage, and must
>   *   be left at zero.
> @@ -2412,6 +2414,7 @@ enum bpf_func_id {
>  /* Mode for BPF_FUNC_skb_adjust_room helper. */
>  enum bpf_adj_room_mode {
>   BPF_ADJ_ROOM_NET,
> + BPF_ADJ_ROOM_DATA,
>  };
>  
>  /* Mode for BPF_FUNC_skb_load_bytes_relative helper. */
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 53d50fb75ea1..e56da3077dca 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -2884,6 +2884,57 @@ static int bpf_skb_adjust_net(struct sk_buff *skb, s32 
> len_diff)
>   return ret;
>  }
>  
> +static int bpf_skb_data_shrink(struct sk_buff *skb, u32 len)
> +{
> + unsigned short hhlen = skb->dev->header_ops ?
> +skb->dev->hard_header_len : 0;
> + int ret;
> +
> + ret = skb_unclone(skb, GFP_ATOMIC);
> + if (unlikely(ret < 0))
> + return ret;
> +
> + __skb_pull(skb, len);
> + skb_reset_mac_header(skb);
> + skb_set_network_header(skb, hhlen);
> + skb_reset_transport_header(skb);
> + return 0;
> +}
> +
> +static int bpf_skb_data_grow(struct sk_buff *skb, u32 len)
> +{
> + unsigned short hhlen = skb->dev->header_ops ?
> +skb->dev->hard_header_len : 0;
> + int ret;
> +
> + ret = skb_cow(skb, len);
> + if (unlikely(ret < 0))
> + return ret;
> +
> + skb_push(skb, len);
> + skb_reset_mac_header(skb);

Looks like this could leak uninitialized mem into the BPF prog as it's
not zeroed out. Can't we just reuse bpf_skb_generic_push() and the
bpf_skb_generic_pop()?

bpf_skb_vlan_push() and bpf_skb_vlan_pop() do a bpf_push_mac_rcsum() /
bpf_pull_mac_rcsum() in order to not screw up things like csum complete.
Wouldn't this be needed here as well?

How does this interact with MPLS GSO?

If the packet gets pushed back into the stack, would AF_MPLS handle it?
Probably good to add test cases to BPF kselftest suite with it.

Don't we need to reject the packet in case of skb->encapsulation?

Looking at push_mpls() and pop_mpls() from OVS implementation, do we
also need to deal with skb inner network header / proto?

Given all this would it rather make sense to add MPLS specific helpers
in similar fashion to the vlan ones we have?

Is there any other use case aside from MPLS?

> + return 0;
> +}
> +
> +static int bpf_skb_adjust_data(struct sk_buff *skb, s32 len_diff)
> +{
> + u32 len_diff_abs = abs(len_diff);
> + bool shrink = len_diff < 0;
> + int ret;
> +
> + if (unlikely(len_diff_abs > 0xfffU))
> + return -EFAULT;
> +
> + if (shrink && len_diff_abs >= skb_headlen(skb))
> + return -EFAULT;
> +
> + ret = shrink ? bpf_skb_data_shrink(skb, len_diff_abs) :
> +bpf_skb_data_grow(skb, len_diff_abs);

Hmm, I think this has a bug in that the above two do not adjust
skb->mac_len. Then things like cls_bpf_classify() will __skb_pull()
based on the old mac_len, getting you to a wrong offset in the
packet when this is for example pushed back into ingress, etc.

> + bpf_compute_data_pointers(skb);
> + return ret;
> +}
> +
>  BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
>  u32, mode, u64, flags)
>  {
> @@ -2891,6 +2942,8 @@ BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, 
> s32, len_diff,
>   return -EINVAL;
>   if (likely(mode == BPF_ADJ_ROOM_NET))
>   return bpf_skb_adjust_net(skb, len_diff);
> + if (likely(mode == BPF_ADJ_ROOM_DATA))
> + return bpf_skb_adjust_data(skb, len_diff);
>  
>   return -ENOTSUPP;
>  }
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 

Re: [PATCH 0/3] Fix unsafe BPF_PROG_TEST_RUN interface

2018-11-19 Thread Daniel Borkmann
On 11/19/2018 03:30 PM, Lorenz Bauer wrote:
> On Sun, 18 Nov 2018 at 06:13, Y Song  wrote:
>>
>> There is a slight change of user space behavior for this patch.
>> Without this patch, the value bpf_attr.test.data_size_out is output only.
>> For example,
>>output buffer : out_buf (user allocated size 10)
>>data_size_out is a random value (e.g., 1),
>>
>> The actual data to copy is 5.
>>
>> In today's implementation, the kernel will copy 5 and set data_size_out is 5.
>>
>> With this patch, the kernel will copy 1 and set data_size_out is 5.
>>
>> I am not 100% sure at this time whether we CAN overload data_size_out
>> since it MAY break existing applications.
> 
> Yes, that's correct. I think that the likelihood of this is low. It would
> affect code that uses bpf_attr without zeroing it first. I had a look around,
> and I could only find code that would keep working:

Agree, it seems like this would be rather unlikely to break the old behavior
and only if some test app forgot to zero it (given data_size_out is also in
the middle and not at the end). I'd rather prefer this approach here and then
push the patch via stable than adding yet another data_size_out-like member.

I think it also makes sense to return a -ENOSPC as Yonghong suggested in order
to indicate to user space that the buffer is not sufficient. Right now this
would have no such indication to the user so it would not be possible to
distinguish whether truncation or not happened. Was thinking whether it makes
sense to indicate through a new flag member that buffer truncation happened,
but I do like -ENOSPC better.

Thanks,
Daniel


Re: [PATCH bpf-next v2] bpf: libbpf: retry map creation without the name

2018-11-19 Thread Daniel Borkmann
On 11/19/2018 11:49 PM, Stanislav Fomichev wrote:
> Since commit 88cda1c9da02 ("bpf: libbpf: Provide basic API support
> to specify BPF obj name"), libbpf unconditionally sets bpf_attr->name
> for maps. Pre v4.14 kernels don't know about map names and return an
> error about unexpected non-zero data. Retry sys_bpf without a map
> name to cover older kernels.
> 
> v2 changes:
> * check for errno == EINVAL as suggested by Daniel Borkmann
> 
> Signed-off-by: Stanislav Fomichev 

Applied to bpf-next, thanks!


Re: [PATCH v3 0/4] bpf: allow zero-initialising hash map seed

2018-11-19 Thread Daniel Borkmann
On 11/16/2018 12:41 PM, Lorenz Bauer wrote:
> Allow forcing the seed of a hash table to zero, for deterministic
> execution during benchmarking and testing.
> 
> Changes from v2:
> * Change ordering of BPF_F_ZERO_SEED in linux/bpf.h
> 
> Comments adressed from v1:
> * Add comment to discourage production use to linux/bpf.h
> * Require CAP_SYS_ADMIN
> 
> Lorenz Bauer (4):
>   bpf: allow zero-initializing hash map seed
>   bpf: move BPF_F_QUERY_EFFECTIVE after map flags
>   tools: sync linux/bpf.h
>   tools: add selftest for BPF_F_ZERO_SEED
> 
>  include/uapi/linux/bpf.h|  9 ++--
>  kernel/bpf/hashtab.c| 13 -
>  tools/include/uapi/linux/bpf.h  | 13 +++--
>  tools/testing/selftests/bpf/test_maps.c | 68 +
>  4 files changed, 84 insertions(+), 19 deletions(-)
> 

Applied to bpf-next, thanks!
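
Usage ends up being a one-flag change at map creation time (sketch;
requires CAP_SYS_ADMIN as noted in the cover letter):

	struct bpf_create_map_attr attr = {
		.name	     = "det_htab",
		.map_type    = BPF_MAP_TYPE_HASH,
		.map_flags   = BPF_F_ZERO_SEED,	/* deterministic hashing */
		.key_size    = sizeof(__u32),
		.value_size  = sizeof(__u32),
		.max_entries = 1024,
	};
	int fd = bpf_create_map_xattr(&attr);

Two such maps filled with the same keys then lay out identically, which
is what makes benchmark runs reproducible.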


Re: [PATCH bpf-next] bpf: libbpf: retry map creation without the name

2018-11-19 Thread Daniel Borkmann
On 11/19/2018 10:35 PM, Stanislav Fomichev wrote:
> Since commit 88cda1c9da02 ("bpf: libbpf: Provide basic API support
> to specify BPF obj name"), libbpf unconditionally sets bpf_attr->name
> for maps. Pre v4.14 kernels don't know about map names and return an
> error about unexpected non-zero data. Retry sys_bpf without a map
> name to cover older kernels.

Looks good, small nit below:

> Signed-off-by: Stanislav Fomichev 
> ---
>  tools/lib/bpf/bpf.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> index 03f9bcc4ef50..673175bc06ee 100644
> --- a/tools/lib/bpf/bpf.c
> +++ b/tools/lib/bpf/bpf.c
> @@ -69,6 +69,7 @@ int bpf_create_map_xattr(const struct bpf_create_map_attr 
> *create_attr)
>  {
>   __u32 name_len = create_attr->name ? strlen(create_attr->name) : 0;
>   union bpf_attr attr;
> + int ret;
>  
>   memset(&attr, '\0', sizeof(attr));
>  
> @@ -86,7 +87,15 @@ int bpf_create_map_xattr(const struct bpf_create_map_attr 
> *create_attr)
>   attr.map_ifindex = create_attr->map_ifindex;
>   attr.inner_map_fd = create_attr->inner_map_fd;
>  
> - return sys_bpf(BPF_MAP_CREATE, , sizeof(attr));
> +	ret = sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
> + if (ret < 0 && create_attr->name) {

Could you also check errno being EINVAL here so that we do not try to
confuse it with an error coming from insufficient memlock which is
EPERM in that case.

> + /* Retry the same syscall, but without the name.
> +  * Pre v4.14 kernels don't support map names.
> +  */
> + memset(attr.map_name, 0, sizeof(attr.map_name));
> +		return sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
> + }
> + return ret;
>  }
>  
>  int bpf_create_map_node(enum bpf_map_type map_type, const char *name,
> 



Re: [PATCH net-next 2/6] net/bpf: split VLAN_PRESENT bit handling from VLAN_TCI

2018-11-19 Thread Daniel Borkmann
On 11/10/2018 07:58 PM, Michał Mirosław wrote:
> Signed-off-by: Michał Mirosław 

Why you have empty commit messages for non-trivial changes like this in
4 out of 6 of your patches ...

How was it tested on the JITs you were changing? Did you test on both,
big and little endian machines?

> ---
>  net/core/filter.c | 40 +---
>  1 file changed, 21 insertions(+), 19 deletions(-)
> 
> diff --git a/net/core/filter.c b/net/core/filter.c
> index e521c5ebc7d1..c151b906df53 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -296,22 +296,21 @@ static u32 convert_skb_access(int skb_field, int 
> dst_reg, int src_reg,
>   break;
>  
>   case SKF_AD_VLAN_TAG:
> - case SKF_AD_VLAN_TAG_PRESENT:
>   BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, vlan_tci) != 2);
> - BUILD_BUG_ON(VLAN_TAG_PRESENT != 0x1000);
>  
>   /* dst_reg = *(u16 *) (src_reg + offsetof(vlan_tci)) */
>   *insn++ = BPF_LDX_MEM(BPF_H, dst_reg, src_reg,
> offsetof(struct sk_buff, vlan_tci));
> - if (skb_field == SKF_AD_VLAN_TAG) {
> - *insn++ = BPF_ALU32_IMM(BPF_AND, dst_reg,
> - ~VLAN_TAG_PRESENT);
> - } else {
> - /* dst_reg >>= 12 */
> - *insn++ = BPF_ALU32_IMM(BPF_RSH, dst_reg, 12);
> - /* dst_reg &= 1 */
> +#ifdef VLAN_TAG_PRESENT
> + *insn++ = BPF_ALU32_IMM(BPF_AND, dst_reg, ~VLAN_TAG_PRESENT);
> +#endif
> + break;
> + case SKF_AD_VLAN_TAG_PRESENT:
> + *insn++ = BPF_LDX_MEM(BPF_B, dst_reg, src_reg, 
> PKT_VLAN_PRESENT_OFFSET());
> + if (PKT_VLAN_PRESENT_BIT)
> + *insn++ = BPF_ALU32_IMM(BPF_RSH, dst_reg, 
> PKT_VLAN_PRESENT_BIT);
> + if (PKT_VLAN_PRESENT_BIT < 7)
>   *insn++ = BPF_ALU32_IMM(BPF_AND, dst_reg, 1);
> - }
>   break;
>   }
>  
> @@ -6140,19 +6139,22 @@ static u32 bpf_convert_ctx_access(enum 
> bpf_access_type type,
>   break;
>  
>   case offsetof(struct __sk_buff, vlan_present):
> + *target_size = 1;
> + *insn++ = BPF_LDX_MEM(BPF_B, si->dst_reg, si->src_reg,
> +   PKT_VLAN_PRESENT_OFFSET());
> + if (PKT_VLAN_PRESENT_BIT)
> + *insn++ = BPF_ALU32_IMM(BPF_RSH, si->dst_reg, 
> PKT_VLAN_PRESENT_BIT);
> + if (PKT_VLAN_PRESENT_BIT < 7)
> + *insn++ = BPF_ALU32_IMM(BPF_AND, si->dst_reg, 1);
> + break;
> +
>   case offsetof(struct __sk_buff, vlan_tci):
> - BUILD_BUG_ON(VLAN_TAG_PRESENT != 0x1000);
> -
>   *insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->src_reg,
> bpf_target_off(struct sk_buff, vlan_tci, 
> 2,
>target_size));
> - if (si->off == offsetof(struct __sk_buff, vlan_tci)) {
> - *insn++ = BPF_ALU32_IMM(BPF_AND, si->dst_reg,
> - ~VLAN_TAG_PRESENT);
> - } else {
> - *insn++ = BPF_ALU32_IMM(BPF_RSH, si->dst_reg, 12);
> - *insn++ = BPF_ALU32_IMM(BPF_AND, si->dst_reg, 1);
> - }
> +#ifdef VLAN_TAG_PRESENT
> + *insn++ = BPF_ALU32_IMM(BPF_AND, si->dst_reg, 
> ~VLAN_TAG_PRESENT);
> +#endif
>   break;
>  
>   case offsetof(struct __sk_buff, cb[0]) ...
> 



Re: [PATCH net-next 0/6] Remove VLAN.CFI overload

2018-11-19 Thread Daniel Borkmann
On 11/10/2018 10:47 PM, David Miller wrote:
> From: Michał Mirosław 
> Date: Sat, 10 Nov 2018 19:58:29 +0100
> 
>> Fix BPF code/JITs to allow for separate VLAN_PRESENT flag
>> storage and finally move the flag to separate storage in skbuff.
>>
>> This is the final step to make VLAN.CFI transparent to the core Linux
>> networking stack.
>>
>> An #ifdef is introduced temporarily to mark fragments masking
>> VLAN_TAG_PRESENT. This is removed altogether in the final patch.
> 
> Daniel and Alexei, please review.

Sorry, was completely swamped due to plumbers, just getting to it now.


Re: [PATCH 1/4] bpf: account for freed JIT allocations in arch code

2018-11-19 Thread Daniel Borkmann
On 11/17/2018 07:57 PM, Ard Biesheuvel wrote:
> Commit ede95a63b5e84 ("bpf: add bpf_jit_limit knob to restrict unpriv
> allocations") added a call to bpf_jit_uncharge_modmem() to the routine
> bpf_jit_binary_free() which is called from the __weak bpf_jit_free().
> This function is overridden by arches, some of which do not call
> bpf_jit_binary_free() to release the memory, and so the released
> memory is not accounted for, potentially leading to spurious allocation
> failures.
> 
> So replace the direct calls to module_memfree() in the arch code with
> calls to bpf_jit_binary_free().

Sorry but this patch is completely buggy, and above description on the
accounting incorrect as well. Looks like this patch was not tested at all.

The below cBPF JITs that use module_memfree() which you replace with
bpf_jit_binary_free() are using module_alloc() internally to get the JIT
image buffer ...

> Signed-off-by: Ard Biesheuvel 
> ---
>  arch/mips/net/bpf_jit.c   | 2 +-
>  arch/powerpc/net/bpf_jit_comp.c   | 2 +-
>  arch/powerpc/net/bpf_jit_comp64.c | 5 +
>  arch/sparc/net/bpf_jit_comp_32.c  | 2 +-
>  4 files changed, 4 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/mips/net/bpf_jit.c b/arch/mips/net/bpf_jit.c
> index 4d8cb9bb8365..1b69897274a1 100644
> --- a/arch/mips/net/bpf_jit.c
> +++ b/arch/mips/net/bpf_jit.c
> @@ -1264,7 +1264,7 @@ void bpf_jit_compile(struct bpf_prog *fp)
>  void bpf_jit_free(struct bpf_prog *fp)
>  {
>   if (fp->jited)
> - module_memfree(fp->bpf_func);
> + bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
>  
>   bpf_prog_unlock_free(fp);
>  }
> diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
> index d5bfe24bb3b5..a1ea1ea6b40d 100644
> --- a/arch/powerpc/net/bpf_jit_comp.c
> +++ b/arch/powerpc/net/bpf_jit_comp.c
> @@ -683,7 +683,7 @@ void bpf_jit_compile(struct bpf_prog *fp)
>  void bpf_jit_free(struct bpf_prog *fp)
>  {
>   if (fp->jited)
> - module_memfree(fp->bpf_func);
> + bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
>  
>   bpf_prog_unlock_free(fp);
>  }
> diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
> b/arch/powerpc/net/bpf_jit_comp64.c
> index 50b129785aee..84c8f013a6c6 100644
> --- a/arch/powerpc/net/bpf_jit_comp64.c
> +++ b/arch/powerpc/net/bpf_jit_comp64.c
> @@ -1024,11 +1024,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog 
> *fp)
>  /* Overriding bpf_jit_free() as we don't set images read-only. */
>  void bpf_jit_free(struct bpf_prog *fp)
>  {
> - unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
> - struct bpf_binary_header *bpf_hdr = (void *)addr;
> -
>   if (fp->jited)
> - bpf_jit_binary_free(bpf_hdr);
> + bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
>  
>   bpf_prog_unlock_free(fp);
>  }
> diff --git a/arch/sparc/net/bpf_jit_comp_32.c 
> b/arch/sparc/net/bpf_jit_comp_32.c
> index a5ff88643d5c..01bda6bc9e7f 100644
> --- a/arch/sparc/net/bpf_jit_comp_32.c
> +++ b/arch/sparc/net/bpf_jit_comp_32.c
> @@ -759,7 +759,7 @@ cond_branch:  f_offset = addrs[i + 
> filter[i].jf];
>  void bpf_jit_free(struct bpf_prog *fp)
>  {
>   if (fp->jited)
> - module_memfree(fp->bpf_func);
> + bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
>  
>   bpf_prog_unlock_free(fp);
>  }
> 



Re: [PATCH v2 bpf-next] bpf: Extend the sk_lookup() helper to XDP hookpoint.

2018-11-09 Thread Daniel Borkmann
On 10/29/2018 05:02 AM, Nitin Hande wrote:
> 
> This patch proposes to extend the sk_lookup() BPF API to the
> XDP hookpoint. The sk_lookup() helper supports a lookup
> on incoming packet to find the corresponding socket that will
> receive this packet. Current support for this BPF API is
> at the tc hookpoint. This patch will extend this API to the XDP
> hookpoint. An XDP program can map the incoming packet to the
> 5-tuple parameter and invoke the API to find the corresponding
> socket structure.
> 
> Signed-off-by: Nitin Hande 

Looks good to me, applied to bpf-next, thanks Nitin!


Re: [PATCH bpf-next] bpftool: Improve handling of ENOENT on map dumps

2018-11-09 Thread Daniel Borkmann
On 11/08/2018 10:25 PM, Jakub Kicinski wrote:
> On Thu,  8 Nov 2018 13:00:07 -0800, David Ahern wrote:
>> From: David Ahern 
>>
>> bpftool output is not user friendly when dumping a map with only a few
>> populated entries:
>>
>> $ bpftool map
>> 1: devmap  name tx_devmap  flags 0x0
>> key 4B  value 4B  max_entries 64  memlock 4096B
>> 2: array  name tx_idxmap  flags 0x0
>> key 4B  value 4B  max_entries 64  memlock 4096B
>>
>> $ bpftool map dump id 1
>> key:
>> 00 00 00 00
>> value:
>> No such file or directory
>> key:
>> 01 00 00 00
>> value:
>> No such file or directory
>> key:
>> 02 00 00 00
>> value:
>> No such file or directory
>> key: 03 00 00 00  value: 03 00 00 00
>>
>> Handle ENOENT by keeping the line format sane and dumping
>> "" for the value
>>
>> $ bpftool map dump id 1
>> key: 00 00 00 00  value: <no entry>
>> key: 01 00 00 00  value: <no entry>
>> key: 02 00 00 00  value: <no entry>
>> key: 03 00 00 00  value: 03 00 00 00
>> ...
>>
>> Signed-off-by: David Ahern 
> 
> Seems good.  I wonder why "fd" maps report all indexes in get_next..
> 
> Acked-by: Jakub Kicinski 
> 
>> Alternatively, could just omit the value, so:
>> key: 00 00 00 00  value:
>> key: 01 00 00 00  value:
>> key: 02 00 00 00  value:
>> key: 03 00 00 00  value: 03 00 00 00
>>
>>  tools/bpf/bpftool/map.c | 19 +++
>>  1 file changed, 15 insertions(+), 4 deletions(-)
>>
>> diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
>> index 101b8a881225..1f0060644e0c 100644
>> --- a/tools/bpf/bpftool/map.c
>> +++ b/tools/bpf/bpftool/map.c
>> @@ -383,7 +383,10 @@ static void print_entry_plain(struct bpf_map_info 
>> *info, unsigned char *key,
>>  printf(single_line ? "  " : "\n");
>>  
>>  printf("value:%c", break_names ? '\n' : ' ');
>> -fprint_hex(stdout, value, info->value_size, " ");
>> +if (value)
>> +fprint_hex(stdout, value, info->value_size, " ");
>> +else
>> +printf("<no entry>");
>>  
>>  printf("\n");
>>  } else {
>> @@ -398,8 +401,12 @@ static void print_entry_plain(struct bpf_map_info 
>> *info, unsigned char *key,
>>  for (i = 0; i < n; i++) {
>>  printf("value (CPU %02d):%c",
>> i, info->value_size > 16 ? '\n' : ' ');
>> -fprint_hex(stdout, value + i * step,
>> -   info->value_size, " ");
>> +if (value) {
>> +fprint_hex(stdout, value + i * step,
>> +   info->value_size, " ");
>> +} else {
>> +printf("<no entry>");
>> +}
> 
> nit: in other places you don't add { }

Applied to bpf-next and fixed this nit up while doing so, thanks everyone!


Re: [PATCH bpf-next 0/2] bpf: offer maximum packet offset info

2018-11-09 Thread Daniel Borkmann
On 11/08/2018 10:08 AM, Jiong Wang wrote:
> The maximum packet offset accessed by one BPF program is useful
> information.
> 
> Because sometimes there could be a packet split, and for some reasons
> (for example, performance) we may want to reject the BPF program if the
> maximum packet size would trigger such a split. Normally, the MTU value is
> treated as the maximum packet size, but one BPF program does not always
> access the whole packet; it could access only the head portion of the data.
> 
> We could let the verifier calculate the maximum packet offset ever used and
> record it inside the prog auxiliary information structure as a new field
> "max_pkt_offset".
> 
> Jiong Wang (2):
>   bpf: let verifier to calculate and record max_pkt_offset
>   nfp: bpf: relax prog rejection through max_pkt_offset
> 
>  drivers/net/ethernet/netronome/nfp/bpf/offload.c |  9 +
>  include/linux/bpf.h  |  1 +
>  kernel/bpf/verifier.c| 12 
>  3 files changed, 18 insertions(+), 4 deletions(-)
> 

Applied to bpf-next, thanks!
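
To illustrate what gets recorded, a minimal sketch in restricted C
(hypothetical program for clang -target bpf, not part of the set) that never
reads past byte 34 of the packet, so its max_pkt_offset stays at 34
regardless of the MTU:

#include <linux/bpf.h>
#include <linux/in.h>

__attribute__((section("xdp"), used))
int xdp_head_only(struct xdp_md *ctx)
{
	void *data     = (void *)(long)ctx->data;
	void *data_end = (void *)(long)ctx->data_end;

	/* Ethernet header (14) plus minimal IPv4 header (20). */
	if (data + 34 > data_end)
		return XDP_PASS;

	/* ip->protocol sits at offset 14 + 9 = 23. */
	return *((unsigned char *)data + 23) == IPPROTO_UDP ?
	       XDP_DROP : XDP_PASS;
}

An offload driver can then compare the recorded 34 against its own buffer
split point instead of rejecting based on the full MTU.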


Re: [PATCH bpf] bpf: Fix IPv6 dport byte order in bpf_sk_lookup_udp

2018-11-08 Thread Daniel Borkmann
On 11/07/2018 10:36 PM, Andrey Ignatov wrote:
> Lookup functions in sk_lookup have different expectations about byte
> order of provided arguments.
> 
> Specifically __inet_lookup, __udp4_lib_lookup and __udp6_lib_lookup
> expect dport to be in network byte order and do ntohs(dport) internally.
> 
> At the same time __inet6_lookup expects dport to be in host byte order
> and correspondingly name the argument hnum.
> 
> sk_lookup works correctly with __inet_lookup, __udp4_lib_lookup and
> __inet6_lookup with regard to dport. But in __udp6_lib_lookup case it
> uses host instead of expected network byte order. It makes result
> returned by bpf_sk_lookup_udp for IPv6 incorrect.
> 
> The patch fixes byte order of dport passed to __udp6_lib_lookup.
> 
> Originally sk_lookup properly handled UDPv6, but not TCPv6. 5ef0ae84f02a
> fixes TCPv6 but breaks UDPv6.
> 
> Fixes: 5ef0ae84f02a ("bpf: Fix IPv6 dport byte-order in bpf_sk_lookup")
> Signed-off-by: Andrey Ignatov 

Applied to bpf, thanks Andrey!
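
The host/network byte order distinction at the heart of the fix is easy to
demo in plain userspace C; on a little-endian machine the two forms of port
53 differ (0x0035 vs 0x3500):

#include <arpa/inet.h>
#include <stdio.h>

int main(void)
{
	unsigned short dport = 53;	/* host byte order, aka hnum */

	printf("host order:    0x%04x\n", dport);
	printf("network order: 0x%04x\n", htons(dport));
	return 0;
}

__udp6_lib_lookup() wants the htons() form and does the ntohs() itself,
which is why passing hnum broke the IPv6 UDP lookup.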


Re: [PATCH bpf v2] tools/bpftool: copy a few net uapi headers to tools directory

2018-11-08 Thread Daniel Borkmann
On 11/08/2018 04:55 AM, Yonghong Song wrote:
> Commit f6f3bac08ff9 ("tools/bpf: bpftool: add net support")
> added certain networking support to bpftool.
> The implementation relies on a relatively recent uapi header file
> linux/tc_act/tc_bpf.h on the host which contains the macro
> definition of TCA_ACT_BPF_ID.
> 
> Unfortunately, this is not the case for all distributions.
> See the email message below where rhel-7.2 does not have
> an up-to-date linux/tc_act/tc_bpf.h.
>   https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1799211.html
> Further investigation found that linux/pkt_cls.h is also needed for macro
> TCA_BPF_TAG.
> 
> This patch fixed the issue by copying linux/tc_act/tc_bpf.h
> and linux/pkt_cls.h from kernel include/uapi directory to
> tools/include/uapi directory so building the bpftool does not depend
> on host system for these files.
> 
> Fixes: f6f3bac08ff9 ("tools/bpf: bpftool: add net support")
> Reported-by: kernel test robot 
> Cc: Li Zhijian 
> Signed-off-by: Yonghong Song 

Applied to bpf, thanks Yonghong!


Re: [PATCH bpf 0/4] tools: bpftool: bring several minor fixes to bpftool

2018-11-08 Thread Daniel Borkmann
On 11/08/2018 12:52 PM, Quentin Monnet wrote:
> Hi,
> This set contains minor fixes for bpftool code and documentation.
> Please refer to individual patches for details.
> 
> Quentin Monnet (4):
>   tools: bpftool: prevent infinite loop in get_fdinfo()
>   tools: bpftool: fix plain output and doc for --bpffs option
>   tools: bpftool: pass an argument to silence open_obj_pinned()
>   tools: bpftool: update references to other man pages in documentation
> 
>  tools/bpf/bpftool/Documentation/bpftool-cgroup.rst |  8 +++-
>  tools/bpf/bpftool/Documentation/bpftool-map.rst|  8 +++-
>  tools/bpf/bpftool/Documentation/bpftool-net.rst|  8 +++-
>  tools/bpf/bpftool/Documentation/bpftool-perf.rst   |  8 +++-
>  tools/bpf/bpftool/Documentation/bpftool-prog.rst   | 11 +--
>  tools/bpf/bpftool/Documentation/bpftool.rst|  9 +++--
>  tools/bpf/bpftool/common.c | 17 +
>  tools/bpf/bpftool/main.h   |  2 +-
>  tools/bpf/bpftool/prog.c   |  3 +--
>  9 files changed, 55 insertions(+), 19 deletions(-)
> 

Applied to bpf, thanks Quentin!


Re: [PATCH bpf-next] xdp: sample code for redirecting vlan packets to specific cpus

2018-11-07 Thread Daniel Borkmann
On 10/29/2018 11:11 PM, John Fastabend wrote:
> On 10/29/2018 02:19 PM, Shannon Nelson wrote:
>> This is an example of using XDP to redirect the processing of
>> particular vlan packets to specific CPUs.  This is in response
>> to comments received on a kernel patch put forth previously
>> to do something similar using RPS.
>>  https://www.spinics.net/lists/netdev/msg528210.html
>>  [PATCH net-next] net: enable RPS on vlan devices
>>
>> This XDP application watches for inbound vlan-tagged packets
>> and redirects those packets to be processed on a specific CPU
>> as configured in a BPF map.  The BPF map can be modified by
>> this user program, which can also load and unload the kernel
>> XDP code.
>>
>> One example use is for supporting VMs where we can't control the
>> OS being used: we'd like to separate the VM CPU processing from
>> the host's CPUs as a way to help mitigate L1TF related issues.
>> When running the VM's traffic on a vlan we can stick the host's
>> Rx processing on one set of CPUs separate from the VM's CPUs.
>>
>> This example currently uses a vlan key and cpu value in the
>> BPF map, so only can do one CPU per vlan.  This could easily
>> be modified to use a bitpattern of CPUs rather than a CPU id
>> to allow multiple CPUs per vlan.
> 
> Great, so does this solve your use case then? At least on drivers
> with XDP support?
> 
>>
>> Signed-off-by: Shannon Nelson 
>> ---
> 
> Some really small and trivial nits below.
> 
> Acked-by: John Fastabend 
> 
> [...]
> 
>> +if (install) {
>> +
> 
> new line probably not needed. 
> 
>> +/* check to see if already installed */
>> +errno = 0;
>> +access(pin_prog_name, R_OK);
>> +if (errno != ENOENT) {
>> +fprintf(stderr, "ERR: %s is already installed\n", 
>> argv[0]);
>> +return -1;
>> +}
>> +
>> +/* load the XDP program and maps with the convenient library */
>> +if (load_bpf_file(filename)) {
>> +fprintf(stderr, "ERR: load_bpf_file(%s): \n%s",
>> +filename, bpf_log_buf);
>> +return -1;
>> +}
>> +if (!prog_fd[0]) {
>> +fprintf(stderr, "ERR: load_bpf_file(%s): %d %s\n",
>> +filename, errno, strerror(errno));
>> +return -1;
>> +}
>> +
>> +/* pin the XDP program and maps */
>> +if (bpf_obj_pin(prog_fd[0], pin_prog_name) < 0) {
>> +fprintf(stderr, "ERR: bpf_obj_pin(%s): %d %s\n",
>> +pin_prog_name, errno, strerror(errno));
>> +if (errno == 2)
>> +fprintf(stderr, " (is the BPF fs mounted on 
>> /sys/fs/bpf?)\n");
>> +return -1;
>> +}
>> +if (bpf_obj_pin(map_fd[0], pin_vlanmap_name) < 0) {
>> +fprintf(stderr, "ERR: bpf_obj_pin(%s): %d %s\n",
>> +pin_vlanmap_name, errno, strerror(errno));
>> +return -1;
>> +}
>> +if (bpf_obj_pin(map_fd[2], pin_countermap_name) < 0) {
>> +fprintf(stderr, "ERR: bpf_obj_pin(%s): %d %s\n",
>> +pin_countermap_name, errno, strerror(errno));
>> +return -1;
>> +}
>> +
>> +/* prep the vlan map with "not used" values */
>> +c64 = UNDEF_CPU;
>> +for (v64 = 0; v64 < 4096; v64++) {
> 
> maybe #define MAX_VLANS 4096 just to avoid constants.
> 
>> +if (bpf_map_update_elem(map_fd[0], &v64, &c64, 0)) {
>> +fprintf(stderr, "ERR: preping vlan map failed 
>> on v=%llu: %d %s\n",
>> +v64, errno, strerror(errno));
>> +return -1;
>> +}
>> +}
>> +
>> +/* prep the cpumap with queue sizes */
>> +c64 = 128+64;  /* see note in xdp_redirect_cpu_user.c */
>> +for (v64 = 0; v64 < MAX_CPUS; v64++) {
>> +if (bpf_map_update_elem(map_fd[1], &v64, &c64, 0)) {
>> +if (errno == ENODEV) {
>> +/* Save the last CPU number attempted
>> + * into the counters map
>> + */
>> +c64 = CPU_COUNT;
>> +ret = bpf_map_update_elem(map_fd[2], 
>> &c64, &v64, 0);
>> +break;
>> +}
>> +
>> +fprintf(stderr, "ERR: preping cpu map failed on 
>> v=%llu: %d %s\n",
>> +v64, errno, strerror(errno));
>> +return -1;
>> +}
>> +}
>> +
>> +/* wire 

Re: [PATCH bpf-next] bpf_load: add map name to load_maps error message

2018-11-07 Thread Daniel Borkmann
On 11/07/2018 01:26 AM, Song Liu wrote:
> On Mon, Oct 29, 2018 at 3:12 PM John Fastabend  
> wrote:
>>
>> On 10/29/2018 02:14 PM, Shannon Nelson wrote:
>>> To help when debugging bpf/xdp load issues, have the load_map()
>>> error message include the number and name of the map that
>>> failed.
>>>
>>> Signed-off-by: Shannon Nelson 
>>> ---
>>>  samples/bpf/bpf_load.c | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
>>> index 89161c9..5de0357 100644
>>> --- a/samples/bpf/bpf_load.c
>>> +++ b/samples/bpf/bpf_load.c
>>> @@ -282,8 +282,8 @@ static int load_maps(struct bpf_map_data *maps, int 
>>> nr_maps,
>>>   numa_node);
>>>   }
>>>   if (map_fd[i] < 0) {
>>> - printf("failed to create a map: %d %s\n",
>>> -errno, strerror(errno));
>>> + printf("failed to create map %d (%s): %d %s\n",
>>> +i, maps[i].name, errno, strerror(errno));
>>>   return 1;
>>>   }
>>>   maps[i].fd = map_fd[i];
>>>
>>
>> LGTM
>>
>> Acked-by: John Fastabend 
> 
> Acked-by: Song Liu 

Applied to bpf-next, thanks!


Re: [PATCH bpf-next 2/2] selftests/bpf: add a test case for sock_ops perf-event notification

2018-11-07 Thread Daniel Borkmann
On 11/06/2018 09:28 PM, Sowmini Varadhan wrote:
> This patch provides a tcp_bpf based eBPF sample. The test
> - uses ncat(1) as the TCP client program to connect() to a port
>   with the intention of triggering SYN retransmissions: we
>   first install an iptables DROP rule to make sure ncat SYNs are
>   resent (instead of aborting instantly after a TCP RST)
> - has a bpf kernel module that sends a perf-event notification for
>   each TCP retransmit, and also tracks the number of such notifications
>   sent in the global_map
> The test passes when the number of event notifications intercepted
> in user-space matches the value in the global_map.
> 
> Signed-off-by: Sowmini Varadhan 
> ---
>  tools/testing/selftests/bpf/Makefile  |4 +-
>  tools/testing/selftests/bpf/perf-sys.h|   74 
>  tools/testing/selftests/bpf/test_tcpnotify.h  |   19 ++
>  tools/testing/selftests/bpf/test_tcpnotify_kern.c |   95 +++
>  tools/testing/selftests/bpf/test_tcpnotify_user.c |  186 
> +
>  5 files changed, 377 insertions(+), 1 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/perf-sys.h
>  create mode 100644 tools/testing/selftests/bpf/test_tcpnotify.h
>  create mode 100644 tools/testing/selftests/bpf/test_tcpnotify_kern.c
>  create mode 100644 tools/testing/selftests/bpf/test_tcpnotify_user.c
> 
> diff --git a/tools/testing/selftests/bpf/Makefile 
> b/tools/testing/selftests/bpf/Makefile
> index e39dfb4..6c94048 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -24,12 +24,13 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps 
> test_lru_map test_lpm_map test
>   test_align test_verifier_log test_dev_cgroup test_tcpbpf_user \
>   test_sock test_btf test_sockmap test_lirc_mode2_user get_cgroup_id_user 
> \
>   test_socket_cookie test_cgroup_storage test_select_reuseport 
> test_section_names \
> - test_netcnt
> + test_netcnt test_tcpnotify_user
>  
>  TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o 
> test_obj_id.o \
>   test_pkt_md_access.o test_xdp_redirect.o test_xdp_meta.o 
> sockmap_parse_prog.o \
>   sockmap_verdict_prog.o dev_cgroup.o sample_ret0.o test_tracepoint.o \
>   test_l4lb_noinline.o test_xdp_noinline.o test_stacktrace_map.o \
> + test_tcpnotify_kern.o \
>   sample_map_ret0.o test_tcpbpf_kern.o test_stacktrace_build_id.o \
>   sockmap_tcp_msg_prog.o connect4_prog.o connect6_prog.o 
> test_adjust_tail.o \
>   test_btf_haskv.o test_btf_nokv.o test_sockmap_kern.o test_tunnel_kern.o 
> \
> @@ -74,6 +75,7 @@ $(OUTPUT)/test_sock_addr: cgroup_helpers.c
>  $(OUTPUT)/test_socket_cookie: cgroup_helpers.c
>  $(OUTPUT)/test_sockmap: cgroup_helpers.c
>  $(OUTPUT)/test_tcpbpf_user: cgroup_helpers.c
> +$(OUTPUT)/test_tcpnotify_user: cgroup_helpers.c trace_helpers.c
>  $(OUTPUT)/test_progs: trace_helpers.c
>  $(OUTPUT)/get_cgroup_id_user: cgroup_helpers.c
>  $(OUTPUT)/test_cgroup_storage: cgroup_helpers.c
> diff --git a/tools/testing/selftests/bpf/perf-sys.h 
> b/tools/testing/selftests/bpf/perf-sys.h
> new file mode 100644
> index 000..3eb7a39
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/perf-sys.h
> @@ -0,0 +1,74 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _PERF_SYS_H
> +#define _PERF_SYS_H
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#ifdef __powerpc__
> +#define CPUINFO_PROC {"cpu"}
> +#endif
> +
> +#ifdef __s390__
> +#define CPUINFO_PROC {"vendor_id"}
> +#endif
> +
> +#ifdef __sh__
> +#define CPUINFO_PROC {"cpu type"}
> +#endif
> +
> +#ifdef __hppa__
> +#define CPUINFO_PROC {"cpu"}
> +#endif
> +
> +#ifdef __sparc__
> +#define CPUINFO_PROC {"cpu"}
> +#endif
> +
> +#ifdef __alpha__
> +#define CPUINFO_PROC {"cpu model"}
> +#endif
> +
> +#ifdef __arm__
> +#define CPUINFO_PROC {"model name", "Processor"}
> +#endif
> +
> +#ifdef __mips__
> +#define CPUINFO_PROC {"cpu model"}
> +#endif
> +
> +#ifdef __arc__
> +#define CPUINFO_PROC {"Processor"}
> +#endif
> +
> +#ifdef __xtensa__
> +#define CPUINFO_PROC {"core ID"}
> +#endif
> +
> +#ifndef CPUINFO_PROC
> +#define CPUINFO_PROC { "model name", }
> +#endif
> +
> +static inline int
> +sys_perf_event_open(struct perf_event_attr *attr,
> +   pid_t pid, int cpu, int group_fd,
> +   unsigned long flags)
> +{
> + int fd;
> +
> + fd = syscall(__NR_perf_event_open, attr, pid, cpu,
> +  group_fd, flags);
> +
> +#ifdef HAVE_ATTR_TEST
> + if (unlikely(test_attr__enabled))
> + test_attr__open(attr, pid, cpu, fd, group_fd, flags);
> +#endif
> + return fd;
> +}

I would prefer if we could avoid adding a whole perf-sys duplicate right
into the BPF kselftest directory. Agree it would be nice to have the mini
wrapper somewhere, but then lets make that a separate commit and place
the wrapper-only somewhere as tools/include/linux/perf.h that all 

Re: [PATCH bpf-next] selftests/bpf: enable (uncomment) all tests in test_libbpf.sh

2018-11-07 Thread Daniel Borkmann
On 11/07/2018 01:28 PM, Quentin Monnet wrote:
> libbpf is now able to load successfully test_l4lb_noinline.o and
> samples/bpf/tracex3_kern.o.
> 
> For the test_l4lb_noinline, uncomment related tests from test_libbpf.c
> and remove the associated "TODO".
> 
> For tracex3_kern.o, instead of loading a program from samples/bpf/ that
> might not have been compiled at this stage, try loading a program from
> BPF selftests. Since this test case is about loading a program compiled
> without the "-target bpf" flag, change the Makefile to compile one
> program accordingly (instead of passing the flag for compiling all
> programs).
> 
> Regarding test_xdp_noinline.o: in its current shape the program fails to
> load because it provides no version section, but the loader needs one.
> The test was added to make sure that libbpf could load XDP programs even
> if they do not provide a version number in a dedicated section. But
> libbpf is already capable of doing that: in our case loading fails
> because the loader does not know that this is an XDP program (it does
> not need to, since it does not attach the program). So trying to load
> test_xdp_noinline.o does not bring much here: just delete this subtest.
> 
> For the record, the error message obtained with tracex3_kern.o was
> fixed by commit e3d91b0ca523 ("tools/libbpf: handle issues with bpf ELF
> objects containing .eh_frames")
> 
> I have not been able to reproduce the "libbpf: incorrect bpf_call
> opcode" error for test_l4lb_noinline.o, even with the version of libbpf
> present at the time when test_libbpf.sh and test_libbpf_open.c were
> created.
> 
> RFC -> v1:
> - Compile test_xdp without the "-target bpf" flag, and try to load it
>   instead of ../../samples/bpf/tracex3_kern.o.
> - Delete test_xdp_noinline.o subtest.
> 
> Cc: Jesper Dangaard Brouer 
> Signed-off-by: Quentin Monnet 
> Acked-by: Jakub Kicinski 

Applied to bpf-next, thanks!


Re: [PATCH bpf-next] tools: bpftool: adjust rlimit RLIMIT_MEMLOCK when loading programs, maps

2018-11-07 Thread Daniel Borkmann
On 11/07/2018 01:29 PM, Quentin Monnet wrote:
> The limit for memory locked in the kernel by a process is usually set to
> 64 bytes by default. This can be an issue when creating large BPF maps
> and/or loading many programs. A workaround is to raise this limit for
> the current process before trying to create a new BPF map. Changing the
> hard limit requires the CAP_SYS_RESOURCE and can usually only be done by
> root user (for non-root users, a call to setrlimit fails (and sets
> errno) and the program simply goes on with its rlimit unchanged).
> 
> There is no API to get the current amount of memory locked for a user,
> therefore we cannot raise the limit only when required. One solution,
> used by bcc, is to try to create the map, and on getting a EPERM error,
> raising the limit to infinity before giving another try. Another
> approach, used in iproute2, is to raise the limit in all cases, before
> trying to create the map.
> 
> Here we do the same as in iproute2: the rlimit is raised to infinity
> before trying to load programs or to create maps with bpftool.
> 
> Signed-off-by: Quentin Monnet 
> Reviewed-by: Jakub Kicinski 

Applied to bpf-next and fixed commit log, thanks!
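
For reference, the iproute2-style approach described above amounts to a few
lines of userspace C (minimal sketch; the failure mode for non-root users is
the interesting part):

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
	struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };

	/* Raising the hard limit needs CAP_SYS_RESOURCE; for an
	 * unprivileged user this fails with EPERM and the process
	 * simply keeps its old RLIMIT_MEMLOCK. */
	if (setrlimit(RLIMIT_MEMLOCK, &rinf))
		perror("setrlimit(RLIMIT_MEMLOCK)");

	return 0;
}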


bpf-next is OPEN

2018-11-06 Thread Daniel Borkmann
Merge window is over so bpf-next is open again!

Thanks,
Daniel


pull-request: bpf 2018-11-03

2018-11-02 Thread Daniel Borkmann
Hi David,

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix BPF prog kallsyms and bpf_prog_get_info_by_fd() jited address export
   to not use page start but actual start instead to work correctly with
   profiling e.g. finding hot instructions from stack traces. Also fix latter
   with regards to dumping address and jited len for !multi prog, from Song.

2) Fix bpf_prog_get_info_by_fd() for !root to zero info.nr_jited_func_lens
   instead of leaving the user provided length, from Daniel.

Please consider pulling these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Thanks a lot!



The following changes since commit 7de414a9dd91426318df7b63da024b2b07e53df5:

  net: drop skb on failure in ip_check_defrag() (2018-11-01 13:55:30 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git 

for you to fetch changes up to 28c2fae726bf5003cd209b0d5910a642af98316f:

  bpf: fix bpf_prog_get_info_by_fd to return 0 func_lens for unpriv (2018-11-02 
13:51:15 -0700)


Daniel Borkmann (2):
  Merge branch 'bpf-accurate-prog-addr'
  bpf: fix bpf_prog_get_info_by_fd to return 0 func_lens for unpriv

Song Liu (3):
  bpf: show real jited prog address in /proc/kallsyms
  bpf: show real jited address in bpf_prog_info->jited_ksyms
  bpf: show main program address and length in bpf_prog_info

 kernel/bpf/core.c|  4 +---
 kernel/bpf/syscall.c | 35 +--
 2 files changed, 26 insertions(+), 13 deletions(-)


Re: [PATCH v2 bpf 0/3] show more accurrate bpf program address

2018-11-02 Thread Daniel Borkmann
On 11/02/2018 06:16 PM, Song Liu wrote:
> Changes v1 -> v2:
> 1. Added main program length to bpf_prog_info->jited_func_lens (3/3).
> 2. Updated commit message of 1/3 and 2/3 with more background about the
>    address masking, and why it is still safe after the changes.
> 3. Replace "ulong" with "unsigned long".
> 
> This set improves bpf program address showed in /proc/kallsyms and in
> bpf_prog_info. First, real program address is showed instead of page
> address. Second, when there is no subprogram, bpf_prog_info->jited_ksyms
> and bpf_prog_info->jited_fun_lens returns the main prog address and
> length.
> 
> Song Liu (3):
>   bpf: show real jited prog address in /proc/kallsyms
>   bpf: show real jited address in bpf_prog_info->jited_ksyms
>   bpf: show main program address and length in bpf_prog_info
> 
>  kernel/bpf/core.c|  4 +---
>  kernel/bpf/syscall.c | 34 --
>  2 files changed, 25 insertions(+), 13 deletions(-)
> 
> --
> 2.17.1
> 

Applied to bpf, thanks!


[PATCH bpf] bpf: fix bpf_prog_get_info_by_fd to return 0 func_lens for unpriv

2018-11-02 Thread Daniel Borkmann
While dbecd7388476 ("bpf: get kernel symbol addresses via syscall")
zeroed info.nr_jited_ksyms in bpf_prog_get_info_by_fd() for queries
from unprivileged users, commit 815581c11cc2 ("bpf: get JITed image
lengths of functions via syscall") forgot about doing so and therefore
returns the #elems of the user set up buffer which is incorrect. It
also needs to indicate a info.nr_jited_func_lens of zero.

Fixes: 815581c11cc2 ("bpf: get JITed image lengths of functions via syscall")
Signed-off-by: Daniel Borkmann 
Cc: Sandipan Das 
Cc: Song Liu 
---
 kernel/bpf/syscall.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index ccb9327..1ea5ce1 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2078,6 +2078,7 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
info.jited_prog_len = 0;
info.xlated_prog_len = 0;
info.nr_jited_ksyms = 0;
+   info.nr_jited_func_lens = 0;
goto done;
}
 
-- 
2.9.5



Re: [PATCH bpf 2/3] bpf: show real jited address in bpf_prog_info->jited_ksyms

2018-11-02 Thread Daniel Borkmann
On 11/02/2018 11:09 AM, Daniel Borkmann wrote:
> On 11/01/2018 08:00 AM, Song Liu wrote:
>> Currently, jited_ksyms in bpf_prog_info shows page addresses of jited
>> bpf program. This is not ideal for detailed profiling (find hot
>> instructions from stack traces). This patch replaces the page address
>> with real prog start address.
>>
>> Signed-off-by: Song Liu 
>> ---
>>  kernel/bpf/syscall.c | 1 -
>>  1 file changed, 1 deletion(-)
>>
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index ccb93277aae2..34a9eef5992c 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -2172,7 +2172,6 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog 
>> *prog,
>>  user_ksyms = u64_to_user_ptr(info.jited_ksyms);
>>  for (i = 0; i < ulen; i++) {
>>  ksym_addr = (ulong) 
>> prog->aux->func[i]->bpf_func;
>> -ksym_addr &= PAGE_MASK;
> 
> Note that the masking was done on purpose here and in patch 1/3 in order to
> not expose randomized start address to kallsyms at least. I suppose it's
> okay to change it here and for kallsyms given bpf_prog_get_info_by_fd() dump
> is for root only, and in each of the two cases we additionally apply
> kallsyms_show_value() logic, so for unpriv this is zeroed out plus only root
> loaded programs are added under kallsyms (capable(CAP_SYS_ADMIN)) anyway.

(Btw, something like above should have been in changelog to provide some more
historical context of why we used to do it like that and explaining why it is
okay to change it this way.)

>>  if (put_user((u64) ksym_addr, &user_ksyms[i]))
>>  return -EFAULT;
>>  }
>>
> 



Re: [PATCH bpf 2/3] bpf: show real jited address in bpf_prog_info->jited_ksyms

2018-11-02 Thread Daniel Borkmann
On 11/01/2018 08:00 AM, Song Liu wrote:
> Currently, jited_ksyms in bpf_prog_info shows page addresses of jited
> bpf program. This is not ideal for detailed profiling (find hot
> instructions from stack traces). This patch replaces the page address
> with real prog start address.
> 
> Signed-off-by: Song Liu 
> ---
>  kernel/bpf/syscall.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index ccb93277aae2..34a9eef5992c 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -2172,7 +2172,6 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog 
> *prog,
>   user_ksyms = u64_to_user_ptr(info.jited_ksyms);
>   for (i = 0; i < ulen; i++) {
>   ksym_addr = (ulong) 
> prog->aux->func[i]->bpf_func;
> - ksym_addr &= PAGE_MASK;

Note that the masking was done on purpose here and in patch 1/3 in order to
not expose randomized start address to kallsyms at least. I suppose it's
okay to change it here and for kallsyms given bpf_prog_get_info_by_fd() dump
is for root only, and in each of the two cases we additionally apply
kallsyms_show_value() logic, so for unpriv this is zeroed out plus only root
loaded programs are added under kallsyms (capable(CAP_SYS_ADMIN)) anyway.

>   if (put_user((u64) ksym_addr, &user_ksyms[i]))
>   return -EFAULT;
>   }
> 



Re: [PATCH bpf 3/3] bpf: show main program address in bpf_prog_info->jited_ksyms

2018-11-02 Thread Daniel Borkmann
On 11/01/2018 08:00 AM, Song Liu wrote:
> Currently, when there is no subprog (prog->aux->func_cnt == 0),
> bpf_prog_info does not return any jited_ksyms. This patch adds
> main program address (prog->bpf_func) to jited_ksyms.
> 
> Signed-off-by: Song Liu 
> ---
>  kernel/bpf/syscall.c | 16 
>  1 file changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 34a9eef5992c..7293b17ca62a 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -2158,7 +2158,7 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog 
> *prog,
>   }
>  
>   ulen = info.nr_jited_ksyms;
> - info.nr_jited_ksyms = prog->aux->func_cnt;
> + info.nr_jited_ksyms = prog->aux->func_cnt ? : 1;
>   if (info.nr_jited_ksyms && ulen) {
>   if (bpf_dump_raw_ok()) {
>   u64 __user *user_ksyms;
> @@ -2170,9 +2170,17 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog 
> *prog,
>*/
>   ulen = min_t(u32, info.nr_jited_ksyms, ulen);
>   user_ksyms = u64_to_user_ptr(info.jited_ksyms);
> - for (i = 0; i < ulen; i++) {
> - ksym_addr = (ulong) 
> prog->aux->func[i]->bpf_func;
> - if (put_user((u64) ksym_addr, &user_ksyms[i]))
> + if (prog->aux->func_cnt) {
> + for (i = 0; i < ulen; i++) {
> + ksym_addr = (ulong)

Small nit: can we change ksym_addr, the above and below cast to kernel-style
'unsigned long' while at it?

> + prog->aux->func[i]->bpf_func;
> + if (put_user((u64) ksym_addr,
> +  &user_ksyms[i]))
> + return -EFAULT;
> + }
> + } else {
> + ksym_addr = (ulong) prog->bpf_func;
> + if (put_user((u64) ksym_addr, &user_ksyms[0]))
>   return -EFAULT;

If we do this here, I think we should also update nr_jited_func_lens to copy
prog->jited_len to user space to be consistent with this change here. In case
of multi-func, the latter copies the len of the main program, and the lens of
the subprogs. Given we push the address for it to user space, we should then
also push the main prog len if it's only main prog there so this case doesn't
need any special handling by user space.

>   }
>   } else {
> 



Re: [RFC bpf-next] libbpf: increase rlimit before trying to create BPF maps

2018-11-02 Thread Daniel Borkmann
On 11/01/2018 06:18 PM, Quentin Monnet wrote:
> 2018-10-30 15:23 UTC+ ~ Quentin Monnet 
>> The limit for memory locked in the kernel by a process is usually set to
>> 64 kbytes by default. This can be an issue when creating large BPF maps.
>> A workaround is to raise this limit for the current process before
>> trying to create a new BPF map. Changing the hard limit requires the
>> CAP_SYS_RESOURCE and can usually only be done by root user (but then
>> only root can create BPF maps).
> 
> Sorry, the parenthesis is not correct: non-root users can in fact create
> BPF maps as well. If a non-root user calls the function to create a map,
> setrlimit() will fail silently (but set errno), and the program will
> simply go on with its rlimit unchanged.
> 
>> As far as I know there is no API to get the current amount of memory
>> locked for a user, therefore we cannot raise the limit only when
>> required. One solution, used by bcc, is to try to create the map, and on
>> getting an EPERM error, raising the limit to infinity before giving
>> another try. Another approach, used in iproute, is to raise the limit in
>> all cases, before trying to create the map.
>>
>> Here we do the same as in iproute2: the rlimit is raised to infinity
>> before trying to load the map.
>>
>> I send this patch as a RFC to see if people would prefer the bcc
>> approach instead, or the rlimit change to be in bpftool rather than in
>> libbpf.

I'd avoid doing something like this in a generic library; it's basically an
ugly hack for the kind of accounting we're doing and only shows that while
this was "good enough" to start off with in the early days, we should be
doing something better today if every application raises it to inf anyway
then it's broken. :) It just shows that this missed its purpose. Similarly
to the jit_limit discussion on rlimit, perhaps we should be considering
switching to something else entirely from kernel side. Could be something
like memcg but this definitely needs some more evaluation first. (Meanwhile
I'd not change the lib but callers instead and once we have something better
in place we remove this type of "raising to inf" from the tree ...)

>> Signed-off-by: Quentin Monnet 
>> ---
>>  tools/lib/bpf/bpf.c | 5 +
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
>> index 03f9bcc4ef50..456a5a7b112c 100644
>> --- a/tools/lib/bpf/bpf.c
>> +++ b/tools/lib/bpf/bpf.c
>> @@ -26,6 +26,8 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>> +#include 
>>  #include "bpf.h"
>>  #include "libbpf.h"
>>  #include 
>> @@ -68,8 +70,11 @@ static inline int sys_bpf(enum bpf_cmd cmd, union 
>> bpf_attr *attr,
>>  int bpf_create_map_xattr(const struct bpf_create_map_attr *create_attr)
>>  {
>>  __u32 name_len = create_attr->name ? strlen(create_attr->name) : 0;
>> +struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
>>  union bpf_attr attr;
>>  
>> +setrlimit(RLIMIT_MEMLOCK, &rinf);
>> +
>>  memset(&attr, '\0', sizeof(attr));
>>  
>>  attr.map_type = create_attr->map_type;
>>
> 



pull-request: bpf 2018-11-01

2018-10-31 Thread Daniel Borkmann
Hi David,

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix tcp_bpf_recvmsg() to return -EAGAIN instead of 0 in non-blocking
   case when no data is available yet, from John.

2) Fix a compilation error in libbpf_attach_type_by_name() when compiled
   with clang 3.8, from Andrey.

3) Fix a partial copy of map pointer on scalar alu and remove id
   generation for RET_PTR_TO_MAP_VALUE return types, from Daniel.

4) Add unlimited memlock limit for kernel selftest's flow_dissector_load
   program, from Yonghong.

5) Fix ping for some BPF shell based kselftests where distro does not
   ship "ping -6" anymore, from Li.

Please consider pulling these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Thanks a lot!



The following changes since commit a6b3a3fa042343e29ffaf9169f5ba3c819d4f9a2:

  net: mvpp2: Fix affinity hint allocation (2018-10-30 11:34:41 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git 

for you to fetch changes up to dfeb8f4c9692fd5e6c3eef19c2e4ae5338dbdb01:

  Merge branch 'verifier-fixes' (2018-10-31 16:53:18 -0700)


Alexei Starovoitov (1):
  Merge branch 'verifier-fixes'

Andrey Ignatov (1):
  libbpf: Fix compile error in libbpf_attach_type_by_name

Daniel Borkmann (4):
  bpf: fix partial copy of map_ptr when dst is scalar
  bpf: don't set id on after map lookup with ptr_to_map_val return
  bpf: add various test cases to test_verifier
  bpf: test make sure to run unpriv test cases in test_verifier

John Fastabend (1):
  bpf: tcp_bpf_recvmsg should return EAGAIN when nonblocking and no data

Li Zhijian (1):
  kselftests/bpf: use ping6 as the default ipv6 ping binary if it exists

Yonghong Song (1):
  tools/bpf: add unlimited rlimit for flow_dissector_load

 include/linux/bpf_verifier.h  |   3 +
 kernel/bpf/verifier.c |  21 +-
 net/ipv4/tcp_bpf.c|   1 +
 tools/lib/bpf/libbpf.c|  13 +-
 tools/testing/selftests/bpf/flow_dissector_load.c |   2 +
 tools/testing/selftests/bpf/test_skb_cgroup_id.sh |   3 +-
 tools/testing/selftests/bpf/test_sock_addr.sh |   3 +-
 tools/testing/selftests/bpf/test_verifier.c   | 321 +++---
 8 files changed, 319 insertions(+), 48 deletions(-)


[PATCH bpf 0/4] BPF fixes and tests

2018-10-31 Thread Daniel Borkmann
The series contains two fixes in BPF core and test cases. For details
please see individual patches. Thanks!

Daniel Borkmann (4):
  bpf: fix partial copy of map_ptr when dst is scalar
  bpf: don't set id on after map lookup with ptr_to_map_val return
  bpf: add various test cases to test_verifier
  bpf: test make sure to run unpriv test cases in test_verifier

 include/linux/bpf_verifier.h|   3 +
 kernel/bpf/verifier.c   |  21 +-
 tools/testing/selftests/bpf/test_verifier.c | 321 +---
 3 files changed, 305 insertions(+), 40 deletions(-)

-- 
2.9.5



[PATCH bpf 1/4] bpf: fix partial copy of map_ptr when dst is scalar

2018-10-31 Thread Daniel Borkmann
ALU operations on pointers such as scalar_reg += map_value_ptr are
handled in adjust_ptr_min_max_vals(). Problem is however that map_ptr
and range in the register state share a union, so transferring state
through dst_reg->range = ptr_reg->range is just buggy as any new
map_ptr in the dst_reg is then truncated (or null) for subsequent
checks. Fix this by adding a raw member and use it for copying state
over to dst_reg.

Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
Signed-off-by: Daniel Borkmann 
Cc: Edward Cree 
Acked-by: Alexei Starovoitov 
---
 include/linux/bpf_verifier.h |  3 +++
 kernel/bpf/verifier.c| 10 ++
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 9e8056e..d93e897 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -51,6 +51,9 @@ struct bpf_reg_state {
 *   PTR_TO_MAP_VALUE_OR_NULL
 */
struct bpf_map *map_ptr;
+
+   /* Max size from any of the above. */
+   unsigned long raw;
};
/* Fixed part of pointer offset, pointer types only */
s32 off;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 171a2c8..774fa40 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3046,7 +3046,7 @@ static int adjust_ptr_min_max_vals(struct 
bpf_verifier_env *env,
dst_reg->umax_value = umax_ptr;
dst_reg->var_off = ptr_reg->var_off;
dst_reg->off = ptr_reg->off + smin_val;
-   dst_reg->range = ptr_reg->range;
+   dst_reg->raw = ptr_reg->raw;
break;
}
/* A new variable offset is created.  Note that off_reg->off
@@ -3076,10 +3076,11 @@ static int adjust_ptr_min_max_vals(struct 
bpf_verifier_env *env,
}
dst_reg->var_off = tnum_add(ptr_reg->var_off, off_reg->var_off);
dst_reg->off = ptr_reg->off;
+   dst_reg->raw = ptr_reg->raw;
if (reg_is_pkt_pointer(ptr_reg)) {
dst_reg->id = ++env->id_gen;
/* something was added to pkt_ptr, set range to zero */
-   dst_reg->range = 0;
+   dst_reg->raw = 0;
}
break;
case BPF_SUB:
@@ -3108,7 +3109,7 @@ static int adjust_ptr_min_max_vals(struct 
bpf_verifier_env *env,
dst_reg->var_off = ptr_reg->var_off;
dst_reg->id = ptr_reg->id;
dst_reg->off = ptr_reg->off - smin_val;
-   dst_reg->range = ptr_reg->range;
+   dst_reg->raw = ptr_reg->raw;
break;
}
/* A new variable offset is created.  If the subtrahend is known
@@ -3134,11 +3135,12 @@ static int adjust_ptr_min_max_vals(struct 
bpf_verifier_env *env,
}
dst_reg->var_off = tnum_sub(ptr_reg->var_off, off_reg->var_off);
dst_reg->off = ptr_reg->off;
+   dst_reg->raw = ptr_reg->raw;
if (reg_is_pkt_pointer(ptr_reg)) {
dst_reg->id = ++env->id_gen;
/* something was added to pkt_ptr, set range to zero */
if (smin_val < 0)
-   dst_reg->range = 0;
+   dst_reg->raw = 0;
}
break;
case BPF_AND:
-- 
2.9.5
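
The aliasing hazard the patch closes can be sketched in isolation
(trimmed-down mock of bpf_reg_state, for illustration only):

struct reg_sketch {
	union {
		int		 range;		/* PTR_TO_PACKET* */
		struct bpf_map	*map_ptr;	/* PTR_TO_MAP_VALUE* */
		unsigned long	 raw;		/* max size of the above */
	};
};

Since range and map_ptr overlay each other, "dst_reg->range =
ptr_reg->range" silently truncates a live 64-bit map_ptr to 32 bits;
copying the widest member (raw) transfers whichever state is in the union.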



[PATCH bpf 2/4] bpf: don't set id on after map lookup with ptr_to_map_val return

2018-10-31 Thread Daniel Borkmann
In the verifier there is no such semantics where registers with
PTR_TO_MAP_VALUE type have an id assigned to them. This is only
used in PTR_TO_MAP_VALUE_OR_NULL and later on nullified once the
test against NULL has been pattern matched and type transformed
into PTR_TO_MAP_VALUE.

Fixes: 3e6a4b3e0289 ("bpf/verifier: introduce BPF_PTR_TO_MAP_VALUE")
Signed-off-by: Daniel Borkmann 
Cc: Roman Gushchin 
Acked-by: Alexei Starovoitov 
---
 kernel/bpf/verifier.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 774fa40..1971ca32 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2852,10 +2852,6 @@ static int check_helper_call(struct bpf_verifier_env 
*env, int func_id, int insn
regs[BPF_REG_0].type = NOT_INIT;
} else if (fn->ret_type == RET_PTR_TO_MAP_VALUE_OR_NULL ||
   fn->ret_type == RET_PTR_TO_MAP_VALUE) {
-   if (fn->ret_type == RET_PTR_TO_MAP_VALUE)
-   regs[BPF_REG_0].type = PTR_TO_MAP_VALUE;
-   else
-   regs[BPF_REG_0].type = PTR_TO_MAP_VALUE_OR_NULL;
/* There is no offset yet applied, variable or fixed */
mark_reg_known_zero(env, regs, BPF_REG_0);
/* remember map_ptr, so that check_map_access()
@@ -2868,7 +2864,12 @@ static int check_helper_call(struct bpf_verifier_env 
*env, int func_id, int insn
return -EINVAL;
}
regs[BPF_REG_0].map_ptr = meta.map_ptr;
-   regs[BPF_REG_0].id = ++env->id_gen;
+   if (fn->ret_type == RET_PTR_TO_MAP_VALUE) {
+   regs[BPF_REG_0].type = PTR_TO_MAP_VALUE;
+   } else {
+   regs[BPF_REG_0].type = PTR_TO_MAP_VALUE_OR_NULL;
+   regs[BPF_REG_0].id = ++env->id_gen;
+   }
} else if (fn->ret_type == RET_PTR_TO_SOCKET_OR_NULL) {
int id = acquire_reference_state(env, insn_idx);
if (id < 0)
-- 
2.9.5



[PATCH bpf 4/4] bpf: test make sure to run unpriv test cases in test_verifier

2018-10-31 Thread Daniel Borkmann
Right now unprivileged tests are never executed as a BPF test run,
only loaded. Allow for running them as well so that we can check
the outcome and probe for regressions.

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 tools/testing/selftests/bpf/test_verifier.c | 71 -
 1 file changed, 40 insertions(+), 31 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_verifier.c 
b/tools/testing/selftests/bpf/test_verifier.c
index 4c7445d..6f61df6 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -76,7 +76,7 @@ struct bpf_test {
int fixup_percpu_cgroup_storage[MAX_FIXUPS];
const char *errstr;
const char *errstr_unpriv;
-   uint32_t retval;
+   uint32_t retval, retval_unpriv;
enum {
UNDEF,
ACCEPT,
@@ -3084,6 +3084,8 @@ static struct bpf_test tests[] = {
.fixup_prog1 = { 2 },
.result = ACCEPT,
.retval = 42,
+   /* Verifier rewrite for unpriv skips tail call here. */
+   .retval_unpriv = 2,
},
{
"stack pointer arithmetic",
@@ -14149,6 +14151,33 @@ static void do_test_fixup(struct bpf_test *test, enum 
bpf_map_type prog_type,
}
 }
 
+static int set_admin(bool admin)
+{
+   cap_t caps;
+   const cap_value_t cap_val = CAP_SYS_ADMIN;
+   int ret = -1;
+
+   caps = cap_get_proc();
+   if (!caps) {
+   perror("cap_get_proc");
+   return -1;
+   }
+   if (cap_set_flag(caps, CAP_EFFECTIVE, 1, &cap_val,
+   admin ? CAP_SET : CAP_CLEAR)) {
+   perror("cap_set_flag");
+   goto out;
+   }
+   if (cap_set_proc(caps)) {
+   perror("cap_set_proc");
+   goto out;
+   }
+   ret = 0;
+out:
+   if (cap_free(caps))
+   perror("cap_free");
+   return ret;
+}
+
 static void do_test_single(struct bpf_test *test, bool unpriv,
   int *passes, int *errors)
 {
@@ -14157,6 +14186,7 @@ static void do_test_single(struct bpf_test *test, bool 
unpriv,
struct bpf_insn *prog = test->insns;
int map_fds[MAX_NR_MAPS];
const char *expected_err;
+   uint32_t expected_val;
uint32_t retval;
int i, err;
 
@@ -14176,6 +14206,8 @@ static void do_test_single(struct bpf_test *test, bool 
unpriv,
   test->result_unpriv : test->result;
expected_err = unpriv && test->errstr_unpriv ?
   test->errstr_unpriv : test->errstr;
+   expected_val = unpriv && test->retval_unpriv ?
+  test->retval_unpriv : test->retval;
 
reject_from_alignment = fd_prog < 0 &&
(test->flags & 
F_NEEDS_EFFICIENT_UNALIGNED_ACCESS) &&
@@ -14209,16 +14241,20 @@ static void do_test_single(struct bpf_test *test, 
bool unpriv,
__u8 tmp[TEST_DATA_LEN << 2];
__u32 size_tmp = sizeof(tmp);
 
+   if (unpriv)
+   set_admin(true);
err = bpf_prog_test_run(fd_prog, 1, test->data,
sizeof(test->data), tmp, &size_tmp,
&retval, NULL);
+   if (unpriv)
+   set_admin(false);
if (err && errno != 524/*ENOTSUPP*/ && errno != EPERM) {
printf("Unexpected bpf_prog_test_run error\n");
goto fail_log;
}
-   if (!err && retval != test->retval &&
-   test->retval != POINTER_VALUE) {
-   printf("FAIL retval %d != %d\n", retval, test->retval);
+   if (!err && retval != expected_val &&
+   expected_val != POINTER_VALUE) {
+   printf("FAIL retval %d != %d\n", retval, expected_val);
goto fail_log;
}
}
@@ -14261,33 +14297,6 @@ static bool is_admin(void)
return (sysadmin == CAP_SET);
 }
 
-static int set_admin(bool admin)
-{
-   cap_t caps;
-   const cap_value_t cap_val = CAP_SYS_ADMIN;
-   int ret = -1;
-
-   caps = cap_get_proc();
-   if (!caps) {
-   perror("cap_get_proc");
-   return -1;
-   }
-   if (cap_set_flag(caps, CAP_EFFECTIVE, 1, &cap_val,
-   admin ? CAP_SET : CAP_CLEAR)) {
-   perror("cap_set_flag");
-   goto out;
-   }
-   if (cap_set_proc(caps)) {
-   perror("cap_set_proc");
-   goto out;
-   }
-   ret = 0;
-out:
-   if (cap_free(caps))
-   perror("cap_free");
-   return ret;
-}
-
 static void get_unpriv_disabled()
 {
char buf[2];
-- 
2.9.5



[PATCH bpf 3/4] bpf: add various test cases to test_verifier

2018-10-31 Thread Daniel Borkmann
Add some more map related test cases to test_verifier kselftest
to improve test coverage. Summary: 1012 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 tools/testing/selftests/bpf/test_verifier.c | 250 
 1 file changed, 250 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_verifier.c 
b/tools/testing/selftests/bpf/test_verifier.c
index 36f3d30..4c7445d 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -6455,6 +6455,256 @@ static struct bpf_test tests[] = {
.prog_type = BPF_PROG_TYPE_TRACEPOINT,
},
{
+   "map access: known scalar += value_ptr",
+   .insns = {
+   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+   BPF_LD_MAP_FD(BPF_REG_1, 0),
+   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+BPF_FUNC_map_lookup_elem),
+   BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3),
+   BPF_MOV64_IMM(BPF_REG_1, 4),
+   BPF_ALU64_REG(BPF_ADD, BPF_REG_1, BPF_REG_0),
+   BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1, 0),
+   BPF_MOV64_IMM(BPF_REG_0, 1),
+   BPF_EXIT_INSN(),
+   },
+   .fixup_map_array_48b = { 3 },
+   .result = ACCEPT,
+   .retval = 1,
+   },
+   {
+   "map access: value_ptr += known scalar",
+   .insns = {
+   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+   BPF_LD_MAP_FD(BPF_REG_1, 0),
+   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+BPF_FUNC_map_lookup_elem),
+   BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3),
+   BPF_MOV64_IMM(BPF_REG_1, 4),
+   BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1),
+   BPF_LDX_MEM(BPF_B, BPF_REG_1, BPF_REG_0, 0),
+   BPF_MOV64_IMM(BPF_REG_0, 1),
+   BPF_EXIT_INSN(),
+   },
+   .fixup_map_array_48b = { 3 },
+   .result = ACCEPT,
+   .retval = 1,
+   },
+   {
+   "map access: unknown scalar += value_ptr",
+   .insns = {
+   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+   BPF_LD_MAP_FD(BPF_REG_1, 0),
+   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+BPF_FUNC_map_lookup_elem),
+   BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 4),
+   BPF_LDX_MEM(BPF_B, BPF_REG_1, BPF_REG_0, 0),
+   BPF_ALU64_IMM(BPF_AND, BPF_REG_1, 0xf),
+   BPF_ALU64_REG(BPF_ADD, BPF_REG_1, BPF_REG_0),
+   BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1, 0),
+   BPF_MOV64_IMM(BPF_REG_0, 1),
+   BPF_EXIT_INSN(),
+   },
+   .fixup_map_array_48b = { 3 },
+   .result = ACCEPT,
+   .retval = 1,
+   },
+   {
+   "map access: value_ptr += unknown scalar",
+   .insns = {
+   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+   BPF_LD_MAP_FD(BPF_REG_1, 0),
+   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+BPF_FUNC_map_lookup_elem),
+   BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 4),
+   BPF_LDX_MEM(BPF_B, BPF_REG_1, BPF_REG_0, 0),
+   BPF_ALU64_IMM(BPF_AND, BPF_REG_1, 0xf),
+   BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1),
+   BPF_LDX_MEM(BPF_B, BPF_REG_1, BPF_REG_0, 0),
+   BPF_MOV64_IMM(BPF_REG_0, 1),
+   BPF_EXIT_INSN(),
+   },
+   .fixup_map_array_48b = { 3 },
+   .result = ACCEPT,
+   .retval = 1,
+   },
+   {
+   "map access: value_ptr += value_ptr",
+   .insns = {
+   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),

Re: [PATCH bpf] libbpf: Fix compile error in libbpf_attach_type_by_name

2018-10-31 Thread Daniel Borkmann
On 10/31/2018 09:49 PM, Arnaldo Carvalho de Melo wrote:
> Em Wed, Oct 31, 2018 at 12:57:18PM -0700, Andrey Ignatov escreveu:
>> Arnaldo Carvalho de Melo reported build error in libbpf when clang
>> version 3.8.1-24 (tags/RELEASE_381/final) is used:
>>
>> libbpf.c:2201:36: error: comparison of constant -22 with expression of
>> type 'const enum bpf_attach_type' is always false
>> [-Werror,-Wtautological-constant-out-of-range-compare]
>> if (section_names[i].attach_type == -EINVAL)
>>  ^  ~~~
>> 1 error generated.
>>
>> Fix the error by keeping "is_attachable" property of a program in a
>> separate struct field instead of trying to use attach_type itself.
> 
> Thanks, now it builds in all the previously failing systems:
> 
> # export PERF_TARBALL=http://192.168.86.4/perf/perf-4.19.0.tar.xz
> # dm debian:9 fedora:25 fedora:26 fedora:27 ubuntu:16.04 ubuntu:17.10
>1 debian:9: Ok   gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516   
>clang version 3.8.1-24 (tags/RELEASE_381/final)
>2 fedora:25   : Ok   gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1)
>clang version 3.9.1 (tags/RELEASE_391/final)
>3 fedora:26   : Ok   gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2)
>clang version 4.0.1 (tags/RELEASE_401/final)
>4 fedora:27   : Ok   gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6)
>clang version 5.0.2 (tags/RELEASE_502/final)
>5 ubuntu:16.04: Ok   gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 
> 20160609  clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final)
>6 ubuntu:17.10: Ok   gcc (Ubuntu 7.2.0-8ubuntu3.2) 7.2.0   
>clang version 4.0.1-6 (tags/RELEASE_401/final)
> #
> 
> Tested-by: Arnaldo Carvalho de Melo 

Thanks everyone, applied to bpf tree.
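
The shape of the fix, sketched with assumed field names for illustration:
an enum cannot double as an error marker, so attachability becomes its own
flag.

struct bpf_sec_def_sketch {
	const char		*sec;
	enum bpf_attach_type	 attach_type;
	bool			 is_attachable;	/* replaces attach_type == -EINVAL */
};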


Re: [PATCH] bpf: tcp_bpf_recvmsg should return EAGAIN when nonblocking and no data

2018-10-30 Thread Daniel Borkmann
On 10/29/2018 08:31 PM, John Fastabend wrote:
> We return 0 in the case of a nonblocking socket that has no data
> available. However, this is incorrect and may confuse applications.
> After this patch we do the correct thing and return the error
> EAGAIN.
> 
> Quoting return codes from recvmsg manpage,
> 
> EAGAIN or EWOULDBLOCK
>  The socket is marked nonblocking and the receive operation would
>  block, or a receive timeout had been set and the timeout expired
>  before data was received.
> 
> Signed-off-by: John Fastabend 

Applied to bpf, thanks!
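
From the application side the distinction matters for any nonblocking read
loop; a short sketch of the contract the fix restores:

#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Returns >0 on data, 0 for "no data yet, retry", -1 on EOF or error. */
static ssize_t try_read(int fd, char *buf, size_t len)
{
	ssize_t n = recv(fd, buf, len, MSG_DONTWAIT);

	if (n > 0)
		return n;	/* got data */
	if (n == 0)
		return -1;	/* peer performed an orderly shutdown */
	if (errno == EAGAIN || errno == EWOULDBLOCK)
		return 0;	/* the case the patch fixes */
	perror("recv");
	return -1;
}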


Re: [PATCH bpf] tools/bpf: add unlimited rlimit for flow_dissector_load

2018-10-30 Thread Daniel Borkmann
On 10/29/2018 10:56 PM, Yonghong Song wrote:
> On our test machine, bpf selftest test_flow_dissector.sh failed
> with the following error:
>   # ./test_flow_dissector.sh
>   bpffs not mounted. Mounting...
>   libbpf: failed to create map (name: 'jmp_table'): Operation not permitted
>   libbpf: failed to load object 'bpf_flow.o'
>   ./flow_dissector_load: bpf_prog_load bpf_flow.o
>   selftests: test_flow_dissector [FAILED]
> 
> Let us increase the rlimit to remove the above map
> creation failure.
> 
> Signed-off-by: Yonghong Song 

Applied to bpf, thanks!


Re: [PATCH iproute2] bpf: check map symbol type properly with newer llvm compiler

2018-10-29 Thread Daniel Borkmann
On 10/29/2018 11:32 PM, Yonghong Song wrote:
> With llvm 7.0 or earlier, the map symbol type is STT_NOTYPE.
>   -bash-4.4$ cat t.c
>   __attribute__((section("maps"))) int g;
>   -bash-4.4$ clang -target bpf -O2 -c t.c
>   -bash-4.4$ readelf -s t.o
> 
>   Symbol table '.symtab' contains 2 entries:
>  Num:    Value          Size Type    Bind   Vis      Ndx Name
>    0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
>    1: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT    3 g
>   -bash-4.4$
> 
> The following llvm commit enables BPF target to generate
> proper symbol type and size.
>   commit bf6ec206615b9718869d48b4e5400d0c6e3638dd
>   Author: Yonghong Song 
>   Date:   Wed Sep 19 16:04:13 2018 +
> 
>   [bpf] Symbol sizes and types in object file
> 
>   Clang-compiled object files currently don't include the symbol sizes and
>   types.  Some tools however need that information.  For example, 
> ctfconvert
>   uses that information to generate FreeBSD's CTF representation from ELF
>   files.
>   With this patch, symbol sizes and types are included in object files.
> 
>   Signed-off-by: Paul Chaignon 
>   Reported-by: Yutaro Hayakawa 
> 
> Hence, for llvm 8.0.0 (currently trunk), the symbol type will not be NOTYPE, but
> OBJECT.
>   -bash-4.4$ clang -target bpf -O2 -c t.c
>   -bash-4.4$ readelf -s t.o
> 
>   Symbol table '.symtab' contains 3 entries:
>  Num:    Value          Size Type    Bind   Vis      Ndx Name
>    0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
>    1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS t.c
>    2: 0000000000000000     4 OBJECT  GLOBAL DEFAULT    3 g
> -bash-4.4$
> 
> This patch makes sure bpf library accepts both NOTYPE and OBJECT types
> of global map symbols.
> 
> Signed-off-by: Yonghong Song 

Thanks!

Acked-by: Daniel Borkmann 


pull-request: bpf 2018-10-27

2018-10-26 Thread Daniel Borkmann
Hi David,

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix toctou race in BTF header validation, from Martin and Wenwen.

2) Fix devmap interface comparison in notifier call which was
   neglecting netns, from Taehee.

3) Several fixes in various places, for example, correcting direct
   packet access and helper function availability, from Daniel.

4) Fix BPF kselftest config fragment to include af_xdp and sockmap,
   from Naresh.

Please consider pulling these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Thanks a lot!



The following changes since commit 42d0f71c9b5fd48861d61cfc05c9e001f847c9d5:

  octeontx2-af: Use GFP_ATOMIC under spin lock (2018-10-25 11:36:29 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git 

for you to fetch changes up to d8fd9e106fbc291167ebb675ad69234597d0fd98:

  bpf: fix wrong helper enablement in cgroup local storage (2018-10-26 16:03:30 
-0700)


Alexei Starovoitov (1):
  Merge branch 'pkt-access-fixes'

Daniel Borkmann (9):
  bpf: fix test suite to enable all unpriv program types
  bpf: disallow direct packet access for unpriv in cg_skb
  bpf: fix direct packet access for flow dissector progs
  bpf: fix cg_skb types to hint access type in may_access_direct_pkt_data
  bpf: fix direct packet write into pop/peek helpers
  bpf: fix leaking uninitialized memory on pop/peek helpers
  bpf: make direct packet write unclone more robust
  bpf: add bpf_jit_limit knob to restrict unpriv allocations
  bpf: fix wrong helper enablement in cgroup local storage

Martin Lau (1):
  bpf, btf: fix a missing check bug in btf_parse

Naresh Kamboju (1):
  selftests/bpf: add config fragments BPF_STREAM_PARSER and XDP_SOCKETS

Taehee Yoo (1):
  bpf: devmap: fix wrong interface selection in notifier_call

 Documentation/sysctl/net.txt|  8 
 include/linux/filter.h  |  1 +
 kernel/bpf/btf.c| 58 +
 kernel/bpf/core.c   | 49 ++--
 kernel/bpf/devmap.c |  3 +-
 kernel/bpf/helpers.c|  2 -
 kernel/bpf/queue_stack_maps.c   |  2 +
 kernel/bpf/verifier.c   | 13 +--
 net/core/filter.c   | 21 +--
 net/core/sysctl_net_core.c  | 10 -
 tools/testing/selftests/bpf/config  |  2 +
 tools/testing/selftests/bpf/test_verifier.c | 15 +++-
 12 files changed, 133 insertions(+), 51 deletions(-)


[PATCH bpf] bpf: fix wrong helper enablement in cgroup local storage

2018-10-26 Thread Daniel Borkmann
Commit cd3394317653 ("bpf: introduce the bpf_get_local_storage()
helper function") enabled the bpf_get_local_storage() helper also
for BPF program types where it does not make sense to use them.

They have been added both in sk_skb_func_proto() and sk_msg_func_proto()
even though both program types are not invoked in combination with
cgroups, and neither through BPF_PROG_RUN_ARRAY(). In the latter the
bpf_cgroup_storage_set() is set shortly before BPF program invocation.

Later, the helper bpf_get_local_storage() retrieves this prior set
up per-cpu pointer and hands the buffer to the BPF program. The map
argument in there solely retrieves the enum bpf_cgroup_storage_type
from a local storage map associated with the program and based on the
type returns either the global or per-cpu storage. However, there
is no specific association between the program's map and the actual
content in bpf_cgroup_storage[].

Meaning, any BPF program that would have been properly run from the
cgroup side through BPF_PROG_RUN_ARRAY() where bpf_cgroup_storage_set()
was performed, and that is later unloaded such that prog / maps are
torn down will cause a use after free if that pointer is retrieved
from programs that are not run through BPF_PROG_RUN_ARRAY() but have
the cgroup local storage helper enabled in their func proto.

Lets just remove it from the two sock_map program types to fix it.
Auditing through the types where this helper is enabled, it appears
that these are the only ones where it was mistakenly allowed.

Fixes: cd3394317653 ("bpf: introduce the bpf_get_local_storage() helper function")
Signed-off-by: Daniel Borkmann 
Cc: Roman Gushchin 
Acked-by: John Fastabend 
---
 net/core/filter.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 35c6933..4d7c511 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5264,8 +5264,6 @@ sk_msg_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_msg_pull_data_proto;
 	case BPF_FUNC_msg_push_data:
 		return &bpf_msg_push_data_proto;
-	case BPF_FUNC_get_local_storage:
-		return &bpf_get_local_storage_proto;
 	default:
 		return bpf_base_func_proto(func_id);
 	}
@@ -5296,8 +5294,6 @@ sk_skb_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_sk_redirect_map_proto;
 	case BPF_FUNC_sk_redirect_hash:
 		return &bpf_sk_redirect_hash_proto;
-	case BPF_FUNC_get_local_storage:
-		return &bpf_get_local_storage_proto;
 #ifdef CONFIG_INET
 	case BPF_FUNC_sk_lookup_tcp:
 		return &bpf_sk_lookup_tcp_proto;
-- 
2.9.5
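
For context, a minimal sketch of the legitimate use of this helper,
i.e. from a program type that actually runs through
BPF_PROG_RUN_ARRAY() (illustrative only; 2018-era map_def style, names
like pkt_cnt are made up and not from the patch):

#include <linux/bpf.h>
#include "bpf_helpers.h"

struct bpf_map_def SEC("maps") pkt_cnt = {
	.type = BPF_MAP_TYPE_CGROUP_STORAGE,
	.key_size = sizeof(struct bpf_cgroup_storage_key),
	.value_size = sizeof(__u64),
};

SEC("cgroup_skb/egress")
int count_egress(struct __sk_buff *skb)
{
	/* Returns the per-cgroup buffer that bpf_cgroup_storage_set()
	 * installed right before BPF_PROG_RUN_ARRAY() invoked us. */
	__u64 *cnt = bpf_get_local_storage(&pkt_cnt, 0);

	__sync_fetch_and_add(cnt, 1);
	return 1; /* allow packet */
}

char _license[] SEC("license") = "GPL";

In sk_skb/sk_msg programs the same call would hand back whatever stale
per-cpu pointer happened to be set last, hence the use-after-free
described above.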



Re: [PATCH v2 bpf] bpf: devmap: fix wrong interface selection in notifier_call

2018-10-25 Thread Daniel Borkmann
On 10/24/2018 01:15 PM, Taehee Yoo wrote:
> The dev_map_notification() removes an interface from the devmap if the
> unregistering interface's ifindex matches. But checking only the
> ifindex is not enough, because another netns can have the same ifindex,
> so the wrong interface could be selected. Hence a netdev pointer
> comparison is added.
> 
> v2: compare netdev pointer instead of using net_eq() (Daniel Borkmann)
> v1: Initial patch
> 
> Fixes: 2ddf71e23cc2 ("net: add notifier hooks for devmap bpf map")
> Signed-off-by: Taehee Yoo 

Applied to bpf, thanks Taehee!
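
For readers following along, the fix boils down to comparing the
net_device pointer itself instead of the bare ifindex. Roughly (a
simplified sketch of the NETDEV_UNREGISTER handling, not the verbatim
kernel code):

case NETDEV_UNREGISTER:
	for (i = 0; i < dtab->map.max_entries; i++) {
		struct bpf_dtab_netdev *dev = READ_ONCE(dtab->netdev_map[i]);

		/* A pointer compare is netns-safe: devices in different
		 * namespaces may share an ifindex, but never a
		 * struct net_device instance. */
		if (!dev || netdev != dev->dev)
			continue;
		/* remove the stale entry from the map */
	}
	break;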


Re: [PATCH] selftests/bpf: add config fragments BPF_STREAM_PARSER and XDP_SOCKETS

2018-10-25 Thread Daniel Borkmann
On 10/25/2018 04:47 PM, Naresh Kamboju wrote:
> BPF sockmap and hashmap are dependent on CONFIG_BPF_STREAM_PARSER, and
> xskmap is dependent on CONFIG_XDP_SOCKETS.
> 
> Signed-off-by: Naresh Kamboju 

Applied to bpf, thanks Naresh!
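
The fragment itself is tiny; per the diffstat in the pull request above
(two added lines in tools/testing/selftests/bpf/config), it presumably
amounts to:

CONFIG_BPF_STREAM_PARSER=y
CONFIG_XDP_SOCKETS=y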


Re: [PATCH bpf 7/7] bpf: make direct packet write unclone more robust

2018-10-24 Thread Daniel Borkmann
On 10/24/2018 11:42 PM, Song Liu wrote:
> On Wed, Oct 24, 2018 at 1:06 PM Daniel Borkmann  wrote:
>>
>> Given this seems to be quite fragile and can easily slip through the
>> cracks, lets make direct packet write more robust by requiring that
>> future program types which allow for such write must provide a prologue
>> callback. In case of XDP and sk_msg it's noop, thus add a generic noop
>> handler there. The latter starts out with NULL data/data_end unconditionally
>> when sg pages are shared.
>>
>> Signed-off-by: Daniel Borkmann 
>> Acked-by: Alexei Starovoitov 
>> ---
>>  kernel/bpf/verifier.c |  6 +-
>>  net/core/filter.c | 11 +++
>>  2 files changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 5fc9a65..171a2c8 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -5709,7 +5709,11 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
>> bool is_narrower_load;
>> u32 target_size;
>>
>> -   if (ops->gen_prologue) {
>> +   if (ops->gen_prologue || env->seen_direct_write) {
>> +   if (!ops->gen_prologue) {
>> +   verbose(env, "bpf verifier is misconfigured\n");
>> +   return -EINVAL;
>> +   }
> 
> nit: how about this?
> 
> diff --git i/kernel/bpf/verifier.c w/kernel/bpf/verifier.c
> index 6fbe7a8afed7..d35078024e35 100644
> --- i/kernel/bpf/verifier.c
> +++ w/kernel/bpf/verifier.c
> @@ -5286,6 +5286,11 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
> bool is_narrower_load;
> u32 target_size;
> 
> +   if (!ops->gen_prologue && env->seen_direct_write) {
> +   verbose(env, "bpf verifier is misconfigured\n");
> +   return -EINVAL;
> +   }
> +
> if (ops->gen_prologue) {
> cnt = ops->gen_prologue(insn_buf, env->seen_direct_write,
> env->prog);
> 

Hm, probably a matter of differing style preferences; personally I'd
prefer the one as-is though.

Thanks,
Daniel


[PATCH bpf 1/7] bpf: fix test suite to enable all unpriv program types

2018-10-24 Thread Daniel Borkmann
Given BPF_PROG_TYPE_CGROUP_SKB program types are also valid in an
unprivileged setting, let's not omit these tests and potentially
have issues fall through the cracks. Make this more obvious by
adding a small test_as_unpriv() helper.

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 tools/testing/selftests/bpf/test_verifier.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index 769d68a..8e1a79d 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -4891,6 +4891,8 @@ static struct bpf_test tests[] = {
BPF_EXIT_INSN(),
},
.result = ACCEPT,
+   .result_unpriv = REJECT,
+   .errstr_unpriv = "R3 pointer comparison prohibited",
.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
},
{
@@ -5146,6 +5148,7 @@ static struct bpf_test tests[] = {
.fixup_cgroup_storage = { 1 },
.result = REJECT,
.errstr = "get_local_storage() doesn't support non-zero flags",
+   .errstr_unpriv = "R2 leaks addr into helper function",
.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
},
{
@@ -5261,6 +5264,7 @@ static struct bpf_test tests[] = {
.fixup_percpu_cgroup_storage = { 1 },
.result = REJECT,
.errstr = "get_local_storage() doesn't support non-zero flags",
+   .errstr_unpriv = "R2 leaks addr into helper function",
.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
},
{
@@ -14050,6 +14054,13 @@ static void get_unpriv_disabled()
fclose(fd);
 }
 
+static bool test_as_unpriv(struct bpf_test *test)
+{
+   return !test->prog_type ||
+  test->prog_type == BPF_PROG_TYPE_SOCKET_FILTER ||
+  test->prog_type == BPF_PROG_TYPE_CGROUP_SKB;
+}
+
 static int do_test(bool unpriv, unsigned int from, unsigned int to)
 {
int i, passes = 0, errors = 0, skips = 0;
@@ -14060,10 +14071,10 @@ static int do_test(bool unpriv, unsigned int from, unsigned int to)
/* Program types that are not supported by non-root we
 * skip right away.
 */
-   if (!test->prog_type && unpriv_disabled) {
+   if (test_as_unpriv(test) && unpriv_disabled) {
printf("#%d/u %s SKIP\n", i, test->descr);
skips++;
-   } else if (!test->prog_type) {
+   } else if (test_as_unpriv(test)) {
if (!unpriv)
set_admin(false);
printf("#%d/u %s ", i, test->descr);
-- 
2.9.5



[PATCH bpf 5/7] bpf: fix direct packet write into pop/peek helpers

2018-10-24 Thread Daniel Borkmann
Commit f1a2e44a3aec ("bpf: add queue and stack maps") probably just
copy-pasted .pkt_access for bpf_map_{pop,peek}_elem() helpers, but
this is buggy in this context since it would allow writes into cloned
skbs, which is invalid. Therefore, disable .pkt_access for the two.

Fixes: f1a2e44a3aec ("bpf: add queue and stack maps")
Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
Cc: Mauricio Vasquez B 
---
 kernel/bpf/helpers.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index ab0d5e3..a74972b 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -99,7 +99,6 @@ BPF_CALL_2(bpf_map_pop_elem, struct bpf_map *, map, void *, value)
 const struct bpf_func_proto bpf_map_pop_elem_proto = {
.func   = bpf_map_pop_elem,
.gpl_only   = false,
-   .pkt_access = true,
.ret_type   = RET_INTEGER,
.arg1_type  = ARG_CONST_MAP_PTR,
.arg2_type  = ARG_PTR_TO_UNINIT_MAP_VALUE,
@@ -113,7 +112,6 @@ BPF_CALL_2(bpf_map_peek_elem, struct bpf_map *, map, void *, value)
 const struct bpf_func_proto bpf_map_peek_elem_proto = {
.func   = bpf_map_pop_elem,
.gpl_only   = false,
-   .pkt_access = true,
.ret_type   = RET_INTEGER,
.arg1_type  = ARG_CONST_MAP_PTR,
.arg2_type  = ARG_PTR_TO_UNINIT_MAP_VALUE,
-- 
2.9.5
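
To illustrate what .pkt_access gates, a hedged sketch of a tc program
(illustrative names; the queue map and program below are made up, not
from the patch):

#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include "bpf_helpers.h"

struct bpf_map_def SEC("maps") queue = {
	.type = BPF_MAP_TYPE_QUEUE,
	.key_size = 0,
	.value_size = sizeof(__u32),
	.max_entries = 32,
};

SEC("classifier")
int pop_demo(struct __sk_buff *skb)
{
	__u32 val;

	/* Fine: the helper writes into private stack memory. */
	if (bpf_map_pop_elem(&queue, &val))
		return TC_ACT_OK;

	/* What .pkt_access would additionally have allowed is passing
	 * a direct packet pointer as the value buffer, i.e. a helper
	 * write straight into possibly shared (cloned) skb data. */
	return TC_ACT_OK;
}

char _license[] SEC("license") = "GPL";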



[PATCH bpf 4/7] bpf: fix cg_skb types to hint access type in may_access_direct_pkt_data

2018-10-24 Thread Daniel Borkmann
Commit b39b5f411dcf ("bpf: add cg_skb_is_valid_access for
BPF_PROG_TYPE_CGROUP_SKB") added direct packet access for skbs in
cg_skb program types, however the allowed access type was not added to
the may_access_direct_pkt_data() helper. Therefore the latter always
returns false. This is not directly an issue; it just means that not
only writes (which is correct) but also reads are unconditionally
disabled. The latter is relevant in this function when BPF helpers may
read direct packet data, which is then unconditionally disabled as well.
Fix it by properly adding BPF_PROG_TYPE_CGROUP_SKB to
may_access_direct_pkt_data().

Fixes: b39b5f411dcf ("bpf: add cg_skb_is_valid_access for BPF_PROG_TYPE_CGROUP_SKB")
Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
Cc: Song Liu 
---
 kernel/bpf/verifier.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index b0cc8f2..5fc9a65 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1393,6 +1393,7 @@ static bool may_access_direct_pkt_data(struct bpf_verifier_env *env,
case BPF_PROG_TYPE_LWT_SEG6LOCAL:
case BPF_PROG_TYPE_SK_REUSEPORT:
case BPF_PROG_TYPE_FLOW_DISSECTOR:
+   case BPF_PROG_TYPE_CGROUP_SKB:
if (t == BPF_WRITE)
return false;
/* fallthrough */
-- 
2.9.5
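
With that fixed, a cg_skb program can both read packet data directly
and pass packet pointers to .pkt_access helpers such as
bpf_skb_load_bytes(). A minimal sketch of a direct read (assuming, as
usual for cgroup skb programs, that skb->data starts at the network
header; illustrative only, and per patch 2/7 in this series loading it
requires root):

#include <linux/bpf.h>
#include <linux/ip.h>
#include <linux/in.h>
#include "bpf_helpers.h"

SEC("cgroup_skb/ingress")
int block_icmp(struct __sk_buff *skb)
{
	void *data = (void *)(long)skb->data;
	void *data_end = (void *)(long)skb->data_end;
	struct iphdr *iph = data;

	/* Bounds check first; beyond it, reads are allowed for
	 * cg_skb, while writes remain rejected by the verifier. */
	if ((void *)(iph + 1) > data_end)
		return 1;

	return iph->protocol == IPPROTO_ICMP ? 0 /* drop */ : 1 /* allow */;
}

char _license[] SEC("license") = "GPL";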



[PATCH bpf 6/7] bpf: fix leaking uninitialized memory on pop/peek helpers

2018-10-24 Thread Daniel Borkmann
Commit f1a2e44a3aec ("bpf: add queue and stack maps") added helpers
with ARG_PTR_TO_UNINIT_MAP_VALUE. Meaning, the helper is supposed to
fill the map value buffer with data instead of reading from it like
in other helpers such as map update. However, given the buffer is
allowed to be uninitialized (since we fill it in the helper anyway),
it also means that the helper is obliged to wipe the memory in case
of an error in order to not allow for leaking uninitialized memory.
Given pop/peek are both handled inside __{stack,queue}_map_get(),
let's wipe it there in the error case, that is, on an empty stack/queue.

Fixes: f1a2e44a3aec ("bpf: add queue and stack maps")
Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
Cc: Mauricio Vasquez B 
---
 kernel/bpf/queue_stack_maps.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/bpf/queue_stack_maps.c b/kernel/bpf/queue_stack_maps.c
index 12a93fb..8bbd72d 100644
--- a/kernel/bpf/queue_stack_maps.c
+++ b/kernel/bpf/queue_stack_maps.c
@@ -122,6 +122,7 @@ static int __queue_map_get(struct bpf_map *map, void *value, bool delete)
 	raw_spin_lock_irqsave(&qs->lock, flags);
 
if (queue_stack_map_is_empty(qs)) {
+   memset(value, 0, qs->map.value_size);
err = -ENOENT;
goto out;
}
@@ -151,6 +152,7 @@ static int __stack_map_get(struct bpf_map *map, void *value, bool delete)
 	raw_spin_lock_irqsave(&qs->lock, flags);
 
if (queue_stack_map_is_empty(qs)) {
+   memset(value, 0, qs->map.value_size);
err = -ENOENT;
goto out;
}
-- 
2.9.5
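
Reusing the queue shape from the earlier sketch, the user-visible
contract after this fix is (fragment, not a complete program):

	__u32 val;	/* uninitialized BPF stack slot */

	if (bpf_map_pop_elem(&queue, &val)) {
		/* Empty queue. Before the fix, val still held stale
		 * stack bytes here, which a program could exfiltrate,
		 * e.g. by storing them into a user-readable map. Now
		 * the helper zeroes the buffer before returning. */
	}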



[PATCH bpf 7/7] bpf: make direct packet write unclone more robust

2018-10-24 Thread Daniel Borkmann
Given this seems to be quite fragile and can easily slip through the
cracks, let's make direct packet write more robust by requiring that
future program types which allow for such writes must provide a
prologue callback. In the case of XDP and sk_msg it's a noop, thus add
a generic noop handler there. The latter starts out with NULL
data/data_end unconditionally when sg pages are shared.

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 kernel/bpf/verifier.c |  6 +-
 net/core/filter.c | 11 +++
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 5fc9a65..171a2c8 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5709,7 +5709,11 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
bool is_narrower_load;
u32 target_size;
 
-   if (ops->gen_prologue) {
+   if (ops->gen_prologue || env->seen_direct_write) {
+   if (!ops->gen_prologue) {
+   verbose(env, "bpf verifier is misconfigured\n");
+   return -EINVAL;
+   }
cnt = ops->gen_prologue(insn_buf, env->seen_direct_write,
env->prog);
if (cnt >= ARRAY_SIZE(insn_buf)) {
diff --git a/net/core/filter.c b/net/core/filter.c
index 3fdddfa..cd648d0 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5644,6 +5644,15 @@ static bool sock_filter_is_valid_access(int off, int size,
   prog->expected_attach_type);
 }
 
+static int bpf_noop_prologue(struct bpf_insn *insn_buf, bool direct_write,
+const struct bpf_prog *prog)
+{
+   /* Neither direct read nor direct write requires any preliminary
+* action.
+*/
+   return 0;
+}
+
 static int bpf_unclone_prologue(struct bpf_insn *insn_buf, bool direct_write,
const struct bpf_prog *prog, int drop_verdict)
 {
@@ -7210,6 +7219,7 @@ const struct bpf_verifier_ops xdp_verifier_ops = {
.get_func_proto = xdp_func_proto,
.is_valid_access= xdp_is_valid_access,
.convert_ctx_access = xdp_convert_ctx_access,
+   .gen_prologue   = bpf_noop_prologue,
 };
 
 const struct bpf_prog_ops xdp_prog_ops = {
@@ -7308,6 +7318,7 @@ const struct bpf_verifier_ops sk_msg_verifier_ops = {
.get_func_proto = sk_msg_func_proto,
.is_valid_access= sk_msg_is_valid_access,
.convert_ctx_access = sk_msg_convert_ctx_access,
+   .gen_prologue   = bpf_noop_prologue,
 };
 
 const struct bpf_prog_ops sk_msg_prog_ops = {
-- 
2.9.5
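
For anyone wiring up a new program type, the gen_prologue() contract
being enforced here is roughly the following (a simplified sketch of
the callback shape, not the exact kernel code):

/* Invoked during the verifier's rewrite phase. The callback may emit
 * instructions into insn_buf that are prepended to the program, and
 * returns how many it wrote (0 for none) or a negative error.
 * bpf_unclone_prologue(), for instance, emits a bpf_skb_pull_data()
 * call so that direct writes never land in shared (cloned) skb data. */
static int example_gen_prologue(struct bpf_insn *insn_buf, bool direct_write,
				const struct bpf_prog *prog)
{
	if (!direct_write)
		return 0;	/* reads need no preparation */

	/* emit unclone/pull instructions into insn_buf here */
	return 0;
}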



[PATCH bpf 3/7] bpf: fix direct packet access for flow dissector progs

2018-10-24 Thread Daniel Borkmann
Commit d58e468b1112 ("flow_dissector: implements flow dissector BPF
hook") added direct packet access for skbs to the
may_access_direct_pkt_data() function, which enables both read and
write access to skb->data. This is buggy because without a prologue
generator such as bpf_unclone_prologue() we would allow writes into
cloned skbs. The original intention was likely to allow read access
only, where no such prologue is needed (similar to what
flow_dissector_func_proto() indicates by enabling only
bpf_skb_load_bytes()), therefore this patch restricts it to read-only.

Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook")
Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
Cc: Petar Penkov 
---
 kernel/bpf/verifier.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 98fa0be..b0cc8f2 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1387,21 +1387,23 @@ static bool may_access_direct_pkt_data(struct bpf_verifier_env *env,
   enum bpf_access_type t)
 {
switch (env->prog->type) {
+   /* Program types only with direct read access go here! */
case BPF_PROG_TYPE_LWT_IN:
case BPF_PROG_TYPE_LWT_OUT:
case BPF_PROG_TYPE_LWT_SEG6LOCAL:
case BPF_PROG_TYPE_SK_REUSEPORT:
-   /* dst_input() and dst_output() can't write for now */
+   case BPF_PROG_TYPE_FLOW_DISSECTOR:
if (t == BPF_WRITE)
return false;
/* fallthrough */
+
+   /* Program types with direct read + write access go here! */
case BPF_PROG_TYPE_SCHED_CLS:
case BPF_PROG_TYPE_SCHED_ACT:
case BPF_PROG_TYPE_XDP:
case BPF_PROG_TYPE_LWT_XMIT:
case BPF_PROG_TYPE_SK_SKB:
case BPF_PROG_TYPE_SK_MSG:
-   case BPF_PROG_TYPE_FLOW_DISSECTOR:
if (meta)
return meta->pkt_access;
 
-- 
2.9.5
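
Read-only access is all a dissector needs anyway: its job is to parse
headers into the flow keys, not to mutate the packet. A minimal sketch
(assuming the flow dissector API introduced in d58e468b1112; simplified
and incomplete, illustrative only):

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include "bpf_helpers.h"

SEC("flow_dissector")
int dissect(struct __sk_buff *skb)
{
	struct bpf_flow_keys *keys = skb->flow_keys;
	void *data = (void *)(long)skb->data;
	void *data_end = (void *)(long)skb->data_end;
	struct iphdr *iph = data + keys->nhoff;

	/* Direct reads like these stay legal; any store through
	 * iph would now be rejected by the verifier. */
	if ((void *)(iph + 1) > data_end)
		return BPF_DROP;

	keys->addr_proto = ETH_P_IP;
	keys->ipv4_src = iph->saddr;
	keys->ipv4_dst = iph->daddr;
	keys->ip_proto = iph->protocol;
	keys->thoff = keys->nhoff + (iph->ihl << 2);

	return BPF_OK;
}

char _license[] SEC("license") = "GPL";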



[PATCH bpf 2/7] bpf: disallow direct packet access for unpriv in cg_skb

2018-10-24 Thread Daniel Borkmann
Commit b39b5f411dcf ("bpf: add cg_skb_is_valid_access for
BPF_PROG_TYPE_CGROUP_SKB") added support for returning pkt pointers
for direct packet access. Given this program type is allowed for both
unprivileged and privileged users, we shouldn't allow unprivileged
ones to use it; among other reasons, this avoids any potential
speculation on the packet test itself. Thus, guard this for root only.

Fixes: b39b5f411dcf ("bpf: add cg_skb_is_valid_access for BPF_PROG_TYPE_CGROUP_SKB")
Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
Cc: Song Liu 
---
 net/core/filter.c   | 6 ++
 tools/testing/selftests/bpf/test_verifier.c | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 35c6933..3fdddfa 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5496,7 +5496,13 @@ static bool cg_skb_is_valid_access(int off, int size,
case bpf_ctx_range(struct __sk_buff, data_meta):
case bpf_ctx_range(struct __sk_buff, flow_keys):
return false;
+   case bpf_ctx_range(struct __sk_buff, data):
+   case bpf_ctx_range(struct __sk_buff, data_end):
+   if (!capable(CAP_SYS_ADMIN))
+   return false;
+   break;
}
+
if (type == BPF_WRITE) {
switch (off) {
case bpf_ctx_range(struct __sk_buff, mark):
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index 8e1a79d..36f3d30 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -4892,7 +4892,7 @@ static struct bpf_test tests[] = {
},
.result = ACCEPT,
.result_unpriv = REJECT,
-   .errstr_unpriv = "R3 pointer comparison prohibited",
+   .errstr_unpriv = "invalid bpf_context access off=76 size=4",
.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
},
{
-- 
2.9.5
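
The updated errstr_unpriv is easy to decode: 76 is the offset of the
data field in struct __sk_buff, so the rejected access is the 4-byte
load of skb->data. A quick userspace check:

#include <linux/bpf.h>
#include <stddef.h>
#include <stdio.h>

int main(void)
{
	/* Prints 76 and 80, matching "off=76 size=4" above. */
	printf("data: %zu, data_end: %zu\n",
	       offsetof(struct __sk_buff, data),
	       offsetof(struct __sk_buff, data_end));
	return 0;
}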



[PATCH bpf 0/7] Batch of direct packet access fixes for BPF

2018-10-24 Thread Daniel Borkmann
Several fixes to get direct packet access in order from the verifier
side. Also a test suite fix to run cg_skb as unpriv, and an improvement
to make direct packet write less error-prone in the future.

Thanks!

Daniel Borkmann (7):
  bpf: fix test suite to enable all unpriv program types
  bpf: disallow direct packet access for unpriv in cg_skb
  bpf: fix direct packet access for flow dissector progs
  bpf: fix cg_skb types to hint access type in may_access_direct_pkt_data
  bpf: fix direct packet write into pop/peek helpers
  bpf: fix leaking uninitialized memory on pop/peek helpers
  bpf: make direct packet write unclone more robust

 kernel/bpf/helpers.c|  2 --
 kernel/bpf/queue_stack_maps.c   |  2 ++
 kernel/bpf/verifier.c   | 13 ++---
 net/core/filter.c   | 17 +
 tools/testing/selftests/bpf/test_verifier.c | 15 +--
 5 files changed, 42 insertions(+), 7 deletions(-)

-- 
2.9.5


