Re: [PATCH bpf v3] x86/cpufeature: bpf hack for clang not supporting asm goto
On Thu, May 10, 2018 at 9:28 AM Borislav Petkov wrote: > As someone already pointed out on IRC, arch/x86/include/asm/cpufeature.h > is solely a kernel header so nothing but kernel should include it. So > forget the userspace breakage "argument". For what is worth, I have the same exact problem in a relatively popular open source system call tracer, and my attempt to fix the issue from user space has been: https://github.com/draios/sysdig/commit/2958eb1d52e047f4b93d1238be803e7c405bdec2 While I can definitely live with that (and I'd be happy to submit a patch to samples/ following the same approach) and absolutely respect the technical authority of the reviewers here, it would be nice to recognize that these changes actually affect users to a certain degree, even if from a technical point of view they don't break userspace. From a practical point of view, BPF is widely used from userspace programs to access some kernel data structures to gather visibility information, and even the simplest use case, such as including linux/sched.h to access some task_struct members, ends up pulling in arch/x86/include/asm/cpufeature.h, thus (ab)using that file. Adding these quirks definitely increases the complexity a developer needs to keep in mind in order to take advantage of a BPF based instrumentation. Thanks
Re: [RFC bpf] bpf, x64: fix JIT emission for dead code
On Wed, Apr 25, 2018 at 8:34 AM Daniel Borkmann wrote: > I've applied this fix to bpf tree, thanks Gianluca! Thank you all for the quick review, really appreciated!
[RFC bpf] bpf, x64: fix JIT emission for dead code
Commit 2a5418a13fcf ("bpf: improve dead code sanitizing") replaced dead code with a series of ja-1 instructions, for safety. That made JIT compilation much more complex for some BPF programs. One instance of such programs is, for example: bool flag = false ... /* A bunch of other code */ ... if (flag) do_something() In some cases llvm is not able to remove at compile time the code for do_something(), so the generated BPF program ends up with a large amount of dead instructions. In one specific real life example, there are two series of ~500 and ~1000 dead instructions in the program. When the verifier replaces them with a series of ja-1 instructions, it causes an interesting behavior at JIT time. During the first pass, since all the instructions are estimated at 64 bytes, the ja-1 instructions end up being translated as 5 bytes JMP instructions (0xE9), since the jump offsets become increasingly large (> 127) as each instruction gets discovered to be 5 bytes instead of the estimated 64. Starting from the second pass, the first N instructions of the ja-1 sequence get translated into 2 bytes JMPs (0xEB) because the jump offsets become <= 127 this time. In particular, N is defined as roughly 127 / (5 - 2) ~= 42. So, each further pass will make the subsequent N JMP instructions shrink from 5 to 2 bytes, making the image shrink every time. This means that in order to have the entire program converge, there need to be, in the real example above, at least ~1000 / 42 ~= 24 passes just for translating the dead code. If we add this number to the passes needed to translate the other non dead code, it brings such program to 40+ passes, and JIT doesn't complete. Ultimately the userspace loader fails because such BPF program was supposed to be part of a prog array owner being JITed. While it is certainly possible to try to refactor such programs to help the compiler remove dead code, the behavior is not really intuitive and it puts further burden on the BPF developer who is not expecting such behavior. To make things worse, such programs are working just fine in all the kernel releases prior to the ja-1 fix. A possible approach to mitigate this behavior consists into noticing that for ja-1 instructions we don't really need to rely on the estimated size of the previous and current instructions, we know that a -1 BPF jump offset can be safely translated into a 0xEB instruction with a jump offset of -2. Such fix brings the BPF program in the previous example to complete again in ~9 passes. Fixes: 2a5418a13fcf ("bpf: improve dead code sanitizing") Signed-off-by: Gianluca Borello --- Hi Posting this as RFC since I just wanted to report this potential bug and propose a possible fix, although I'm not sure if: 1) There might be a better fix on the JIT side 2) We might want to replace the ja-1 instructions with something else 3) We might want to ignore this issue If we choose option 3, I'd just like to point out that this caused a real regression on a couple BPF programs that are part of a larger collection of programs that used to work fine on 4.15 and I recently found broken in 4.16, so there would be some value in somehow addressing this. Thanks arch/x86/net/bpf_jit_comp.c | 12 +++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index b725154182cc..abce27ceb411 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -1027,7 +1027,17 @@ xadd:if (is_imm8(insn->off)) break; case BPF_JMP | BPF_JA: - jmp_offset = addrs[i + insn->off] - addrs[i]; + if (insn->off == -1) + /* -1 jmp instructions will always jump +* backwards two bytes. Explicitly handling +* this case avoids wasting too many passes +* when there are long sequences of replaced +* dead code. +*/ + jmp_offset = -2; + else + jmp_offset = addrs[i + insn->off] - addrs[i]; + if (!jmp_offset) /* optimize out nop jumps */ break; -- 2.17.0
Re: Issue accessing task_struct from BPF due to 4.16 stack-protector changes
On Fri, Mar 2, 2018 at 12:42 PM, Alexei Starovoitov wrote: > > good catch! > I wonder why sched.h is using this flag insead of relying on #defines from > autoconf.h > It could have been using CONFIG_HAVE_CC_STACKPROTECTOR > instead of CONFIG_CC_STACKPROTECTOR, no ? > Thanks for your reply Alexei. I think switching to HAVE_CC_STACKPROTECTOR could indeed solve this particular BPF issue in a cleaner way (I tested it), at the cost of having that struct member always present for the supported architectures even if the stack protector is actually disabled (e.g. CONFIG_CC_STACKPROTECTOR_NONE=y). Not sure if this could be frowned upon by someone considering how critical task_struct is, but on the other hand is really just 8 bytes. Thanks
Issue accessing task_struct from BPF due to 4.16 stack-protector changes
Hello, While testing bpf-next, I noticed that I was reading garbage when accessing some task_struct members, and the issue seems caused by the recent commit 2bc2f688fdf8 ("Makefile: move stack-protector availability out of Kconfig") which removes CONFIG_CC_STACKPROTECTOR from autoconf.h. When I compile my BPF program, offsetof(struct task_struct, files), which is the member I'm dereferencing, returns 1768 (where the garbage is), whereas doing it on 4.15 returns 1776 (where the correct member is). I believe when compiling with clang this portion of the task_struct doesn't get considered anymore: #ifdef CONFIG_CC_STACKPROTECTOR /* Canary value for the -fstack-protector GCC feature: */ unsigned long stack_canary; #endif I solved it by adding $(KBUILD_CPPFLAGS) to my BPF Makefile (which is pretty similar to the one used in samples/bpf/Makefile). Two questions: 1) Do you confirm this is the proper way to handle this moving forward? Or should there be a better way? 2) Would you consider useful a simple patch to samples/bpf/Makefile so that other developers will not be stuck in a long bisect to figure out why they read garbage when dereferencing task_struct? I assume that several people use that Makefile as a template to start their project, like I did (perhaps I'm assuming wrong though). Thanks
[PATCH bpf-next] bpf: fix stacksafe exploration when comparing states
Commit cc2b14d51053 ("bpf: teach verifier to recognize zero initialized stack") introduced a very relaxed check when comparing stacks of different states, effectively returning a positive result in many cases where it shouldn't. This can create problems in cases such as this following C pseudocode: long var; long *x = bpf_map_lookup(...); if (!x) return; if (*x != 0xbeef) var = 0; else var = 1; /* This is the key part, calling a helper causes an explored state * to be saved with the information that "var" is on the stack as * STACK_ZERO, since the helper is first met by the verifier after * the "var = 0" assignment. This state will however be wrongly used * also for the "var = 1" case, so the verifier assumes "var" is always * 0 and will replace the NULL assignment with nops, because the * search pruning prevents it from exploring the faulty branch. */ bpf_ktime_get_ns(); if (var) *(long *)0 = 0xbeef; Fix the issue by making sure that the stack is fully explored before returning a positive comparison result. Also attach a couple tests that highlight the bad behavior. In the first test, without this fix instructions 16 and 17 are replaced with nops instead of being rejected by the verifier. The second test, instead, allows a program to make a potentially illegal read from the stack. Fixes: cc2b14d51053 ("bpf: teach verifier to recognize zero initialized stack") Signed-off-by: Gianluca Borello Acked-by: Alexei Starovoitov --- kernel/bpf/verifier.c | 2 +- tools/testing/selftests/bpf/test_verifier.c | 51 + 2 files changed, 52 insertions(+), 1 deletion(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 8b442ae125d0..93e1c77dae1d 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -4107,7 +4107,7 @@ static bool stacksafe(struct bpf_func_state *old, if (!(old->stack[spi].spilled_ptr.live & REG_LIVE_READ)) /* explored state didn't use this */ - return true; + continue; if (old->stack[spi].slot_type[i % BPF_REG_SIZE] == STACK_INVALID) continue; diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c index 3bacff0d6f91..5e79515d10c5 100644 --- a/tools/testing/selftests/bpf/test_verifier.c +++ b/tools/testing/selftests/bpf/test_verifier.c @@ -9715,6 +9715,57 @@ static struct bpf_test tests[] = { .result = REJECT, .prog_type = BPF_PROG_TYPE_XDP, }, + { + "search pruning: all branches should be verified (nop operation)", + .insns = { + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 11), + BPF_LDX_MEM(BPF_DW, BPF_REG_3, BPF_REG_0, 0), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_3, 0xbeef, 2), + BPF_MOV64_IMM(BPF_REG_4, 0), + BPF_JMP_A(1), + BPF_MOV64_IMM(BPF_REG_4, 1), + BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_4, -16), + BPF_EMIT_CALL(BPF_FUNC_ktime_get_ns), + BPF_LDX_MEM(BPF_DW, BPF_REG_5, BPF_REG_10, -16), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_5, 0, 2), + BPF_MOV64_IMM(BPF_REG_6, 0), + BPF_ST_MEM(BPF_DW, BPF_REG_6, 0, 0xdead), + BPF_EXIT_INSN(), + }, + .fixup_map1 = { 3 }, + .errstr = "R6 invalid mem access 'inv'", + .result = REJECT, + .prog_type = BPF_PROG_TYPE_TRACEPOINT, + }, + { + "search pruning: all branches should be verified (invalid stack access)", + .insns = { + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 8), + BPF_LDX_MEM(BPF_DW, BPF_REG_3, BPF_REG_0, 0), + BPF_MOV64_IMM(BPF_REG_4, 0), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_3, 0xbeef, 2), + BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_4, -16), + BPF_JMP_A(1), +
Re: len = bpf_probe_read_str(); bpf_perf_event_output(... len) == FAIL
On Tue, Nov 21, 2017 at 2:31 PM, Alexei Starovoitov wrote: > > yeah sorry about this hack. Gianluca reported this issue as well. > Yonghong fixed it for bpf_probe_read only. We will extend > the fix to bpf_probe_read_str() and bpf_perf_event_output() asap. > The above workaround gets too much into llvm and verifier details > we should strive to make bpf program writing as easy as possible. > Hi Arnaldo With the help of Alexei, Daniel and Yonghong I just submitted a new series ("bpf: fix semantics issues with helpers receiving NULL arguments") that includes a fix in bpf_perf_event_output. This should simplify the way you write your bpf programs, so you shouldn't be required to write those convoluted branches anymore (there are a few usage examples in the commit log). In my case it made writing the code much easier, after applying it I haven't been surprised by the compiler output in a while, and I hope your experience will be improved as well. Thanks
[PATCH net 3/4] bpf: change bpf_probe_read_str arg2 type to ARG_CONST_SIZE_OR_ZERO
Commit 9fd29c08e520 ("bpf: improve verifier ARG_CONST_SIZE_OR_ZERO semantics") relaxed the treatment of ARG_CONST_SIZE_OR_ZERO due to the way the compiler generates optimized BPF code when checking boundaries of an argument from C code. A typical example of this optimized code can be generated using the bpf_probe_read_str helper when operating on variable memory: /* len is a generic scalar */ if (len > 0 && len <= 0x7fff) bpf_probe_read_str(p, len, s); 251: (79) r1 = *(u64 *)(r10 -88) 252: (07) r1 += -1 253: (25) if r1 > 0x7ffe goto pc-42 254: (bf) r1 = r7 255: (79) r2 = *(u64 *)(r10 -88) 256: (bf) r8 = r4 257: (85) call bpf_probe_read_str#45 R2 min value is negative, either use unsigned or 'var &= const' With this code, the verifier loses track of the variable. Replacing arg2 with ARG_CONST_SIZE_OR_ZERO is thus desirable since it avoids this quite common case which leads to usability issues, and the compiler generates code that the verifier can more easily test: if (len <= 0x7fff) bpf_probe_read_str(p, len, s); or bpf_probe_read_str(p, len & 0x7fff, s); No changes to the bpf_probe_read_str helper are necessary since strncpy_from_unsafe itself immediately returns if the size passed is 0. Signed-off-by: Gianluca Borello Acked-by: Alexei Starovoitov Acked-by: Daniel Borkmann --- kernel/trace/bpf_trace.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index 728909f7951c..ed8601a1a861 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -494,7 +494,7 @@ static const struct bpf_func_proto bpf_probe_read_str_proto = { .gpl_only = true, .ret_type = RET_INTEGER, .arg1_type = ARG_PTR_TO_UNINIT_MEM, - .arg2_type = ARG_CONST_SIZE, + .arg2_type = ARG_CONST_SIZE_OR_ZERO, .arg3_type = ARG_ANYTHING, }; -- 2.14.1
[PATCH net 4/4] bpf: change bpf_perf_event_output arg5 type to ARG_CONST_SIZE_OR_ZERO
Commit 9fd29c08e520 ("bpf: improve verifier ARG_CONST_SIZE_OR_ZERO semantics") relaxed the treatment of ARG_CONST_SIZE_OR_ZERO due to the way the compiler generates optimized BPF code when checking boundaries of an argument from C code. A typical example of this optimized code can be generated using the bpf_perf_event_output helper when operating on variable memory: /* len is a generic scalar */ if (len > 0 && len <= 0x7fff) bpf_perf_event_output(ctx, &perf_map, 0, buf, len); 110: (79) r5 = *(u64 *)(r10 -40) 111: (bf) r1 = r5 112: (07) r1 += -1 113: (25) if r1 > 0x7ffe goto pc+6 114: (bf) r1 = r6 115: (18) r2 = 0x94e5f166c200 117: (b7) r3 = 0 118: (bf) r4 = r7 119: (85) call bpf_perf_event_output#25 R5 min value is negative, either use unsigned or 'var &= const' With this code, the verifier loses track of the variable. Replacing arg5 with ARG_CONST_SIZE_OR_ZERO is thus desirable since it avoids this quite common case which leads to usability issues, and the compiler generates code that the verifier can more easily test: if (len <= 0x7fff) bpf_perf_event_output(ctx, &perf_map, 0, buf, len); or bpf_perf_event_output(ctx, &perf_map, 0, buf, len & 0x7fff); No changes to the bpf_perf_event_output helper are necessary since it can handle a case where size is 0, and an empty frame is pushed. Reported-by: Arnaldo Carvalho de Melo Signed-off-by: Gianluca Borello Acked-by: Alexei Starovoitov Acked-by: Daniel Borkmann --- kernel/trace/bpf_trace.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index ed8601a1a861..27d1f4ffa3de 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -403,7 +403,7 @@ static const struct bpf_func_proto bpf_perf_event_output_proto = { .arg2_type = ARG_CONST_MAP_PTR, .arg3_type = ARG_ANYTHING, .arg4_type = ARG_PTR_TO_MEM, - .arg5_type = ARG_CONST_SIZE, + .arg5_type = ARG_CONST_SIZE_OR_ZERO, }; static DEFINE_PER_CPU(struct pt_regs, bpf_pt_regs); @@ -605,7 +605,7 @@ static const struct bpf_func_proto bpf_perf_event_output_proto_tp = { .arg2_type = ARG_CONST_MAP_PTR, .arg3_type = ARG_ANYTHING, .arg4_type = ARG_PTR_TO_MEM, - .arg5_type = ARG_CONST_SIZE, + .arg5_type = ARG_CONST_SIZE_OR_ZERO, }; BPF_CALL_3(bpf_get_stackid_tp, void *, tp_buff, struct bpf_map *, map, -- 2.14.1
[PATCH net 2/4] bpf: remove explicit handling of 0 for arg2 in bpf_probe_read
Commit 9c019e2bc4b2 ("bpf: change helper bpf_probe_read arg2 type to ARG_CONST_SIZE_OR_ZERO") changed arg2 type to ARG_CONST_SIZE_OR_ZERO to simplify writing bpf programs by taking advantage of the new semantics introduced for ARG_CONST_SIZE_OR_ZERO which allows arguments. In order to prevent the helper from actually passing a NULL pointer to probe_kernel_read, which can happen when is passed to the helper, the commit also introduced an explicit check against size == 0. After the recent introduction of the ARG_PTR_TO_MEM_OR_NULL type, bpf_probe_read can not receive a pair of arguments anymore, thus the check is not needed anymore and can be removed, since probe_kernel_read can correctly handle a call. This also fixes the semantics of the helper before it gets officially released and bpf programs start relying on this check. Fixes: 9c019e2bc4b2 ("bpf: change helper bpf_probe_read arg2 type to ARG_CONST_SIZE_OR_ZERO") Signed-off-by: Gianluca Borello Acked-by: Alexei Starovoitov Acked-by: Daniel Borkmann Acked-by: Yonghong Song --- kernel/trace/bpf_trace.c | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index a5580c670866..728909f7951c 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -78,16 +78,12 @@ EXPORT_SYMBOL_GPL(trace_call_bpf); BPF_CALL_3(bpf_probe_read, void *, dst, u32, size, const void *, unsafe_ptr) { - int ret = 0; - - if (unlikely(size == 0)) - goto out; + int ret; ret = probe_kernel_read(dst, unsafe_ptr, size); if (unlikely(ret < 0)) memset(dst, 0, size); - out: return ret; } -- 2.14.1
[PATCH net 1/4] bpf: introduce ARG_PTR_TO_MEM_OR_NULL
With the current ARG_PTR_TO_MEM/ARG_PTR_TO_UNINIT_MEM semantics, an helper argument can be NULL when the next argument type is ARG_CONST_SIZE_OR_ZERO and the verifier can prove the value of this next argument is 0. However, most helpers are just interested in handling , so forcing them to deal with makes the implementation of those helpers more complicated for no apparent benefits, requiring them to explicitly handle those corner cases with checks that bpf programs could start relying upon, preventing the possibility of removing them later. Solve this by making ARG_PTR_TO_MEM/ARG_PTR_TO_UNINIT_MEM never accept NULL even when ARG_CONST_SIZE_OR_ZERO is set, and introduce a new argument type ARG_PTR_TO_MEM_OR_NULL to explicitly deal with the NULL case. Currently, the only helper that needs this is bpf_csum_diff_proto(), so change arg1 and arg3 to this new type as well. Also add a new battery of tests that explicitly test the !ARG_PTR_TO_MEM_OR_NULL combination: all the current ones testing the various variations are focused on bpf_csum_diff, so cover also other helpers. Signed-off-by: Gianluca Borello Acked-by: Alexei Starovoitov Acked-by: Daniel Borkmann --- include/linux/bpf.h | 1 + kernel/bpf/verifier.c | 4 +- net/core/filter.c | 4 +- tools/testing/selftests/bpf/test_verifier.c | 113 ++-- 4 files changed, 112 insertions(+), 10 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 76c577281d78..e55e4255a210 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -78,6 +78,7 @@ enum bpf_arg_type { * functions that access data on eBPF program stack */ ARG_PTR_TO_MEM, /* pointer to valid memory (stack, packet, map value) */ + ARG_PTR_TO_MEM_OR_NULL, /* pointer to valid memory or NULL */ ARG_PTR_TO_UNINIT_MEM, /* pointer to memory does not need to be initialized, * helper function must fill all bytes or clear * them in error case. diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index dd54d20ace2f..308b0638ec5d 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -1384,13 +1384,15 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno, if (type != expected_type) goto err_type; } else if (arg_type == ARG_PTR_TO_MEM || + arg_type == ARG_PTR_TO_MEM_OR_NULL || arg_type == ARG_PTR_TO_UNINIT_MEM) { expected_type = PTR_TO_STACK; /* One exception here. In case function allows for NULL to be * passed in as argument, it's a SCALAR_VALUE type. Final test * happens during stack boundary checking. */ - if (register_is_null(*reg)) + if (register_is_null(*reg) && + arg_type == ARG_PTR_TO_MEM_OR_NULL) /* final test in check_stack_boundary() */; else if (!type_is_pkt_pointer(type) && type != PTR_TO_MAP_VALUE && diff --git a/net/core/filter.c b/net/core/filter.c index 1afa17935954..6a85e67fafce 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -1646,9 +1646,9 @@ static const struct bpf_func_proto bpf_csum_diff_proto = { .gpl_only = false, .pkt_access = true, .ret_type = RET_INTEGER, - .arg1_type = ARG_PTR_TO_MEM, + .arg1_type = ARG_PTR_TO_MEM_OR_NULL, .arg2_type = ARG_CONST_SIZE_OR_ZERO, - .arg3_type = ARG_PTR_TO_MEM, + .arg3_type = ARG_PTR_TO_MEM_OR_NULL, .arg4_type = ARG_CONST_SIZE_OR_ZERO, .arg5_type = ARG_ANYTHING, }; diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c index 2a5267bef160..3c64f30cf63c 100644 --- a/tools/testing/selftests/bpf/test_verifier.c +++ b/tools/testing/selftests/bpf/test_verifier.c @@ -5631,7 +5631,7 @@ static struct bpf_test tests[] = { .prog_type = BPF_PROG_TYPE_TRACEPOINT, }, { - "helper access to variable memory: size = 0 allowed on NULL", + "helper access to variable memory: size = 0 allowed on NULL (ARG_PTR_TO_MEM_OR_NULL)", .insns = { BPF_MOV64_IMM(BPF_REG_1, 0), BPF_MOV64_IMM(BPF_REG_2, 0), @@ -5645,7 +5645,7 @@ static struct bpf_test tests[] = { .prog_type = BPF_PROG_TYPE_SCHED_CLS, }, { - "helper access to variable memory: size > 0 not allowed on NULL", + "helper access to variable memory: size > 0 not allowed on NULL (ARG_PTR_TO_MEM_OR_NULL)", .insns = {
[PATCH net 0/4] bpf: fix semantics issues with helpers receiving NULL arguments
This set includes some fixes in semantics and usability issues that emerged recently, and would be good to have them in net before the next release. In particular, ARG_CONST_SIZE_OR_ZERO semantics was recently changed in commit 9fd29c08e520 ("bpf: improve verifier ARG_CONST_SIZE_OR_ZERO semantics") with the goal of letting the compiler generate simpler code that the verifier can more easily accept. To handle this change in semantics, a few checks in some helpers were added, like in commit 9c019e2bc4b2 ("bpf: change helper bpf_probe_read arg2 type to ARG_CONST_SIZE_OR_ZERO"), and those checks are less than ideal because once they make it into a released kernel bpf programs can start relying on them, preventing the possibility of being removed later on. This patch tries to fix the issue by introducing a new argument type ARG_PTR_TO_MEM_OR_NULL that can be used for helpers that can receive a tuple. By doing so, we can fix the semantics of the other helpers that don't need and can just handle , allowing the code to get rid of those checks. Gianluca Borello (4): bpf: introduce ARG_PTR_TO_MEM_OR_NULL bpf: remove explicit handling of 0 for arg2 in bpf_probe_read bpf: change bpf_probe_read_str arg2 type to ARG_CONST_SIZE_OR_ZERO bpf: change bpf_perf_event_output arg5 type to ARG_CONST_SIZE_OR_ZERO include/linux/bpf.h | 1 + kernel/bpf/verifier.c | 4 +- kernel/trace/bpf_trace.c| 12 +-- net/core/filter.c | 4 +- tools/testing/selftests/bpf/test_verifier.c | 113 ++-- 5 files changed, 116 insertions(+), 18 deletions(-) -- 2.14.1
[PATCH net-next] bpf: remove tail_call and get_stackid helper declarations from bpf.h
commit afdb09c720b6 ("security: bpf: Add LSM hooks for bpf object related syscall") included linux/bpf.h in linux/security.h. As a result, bpf programs including bpf_helpers.h and some other header that ends up pulling in also security.h, such as several examples under samples/bpf, fail to compile because bpf_tail_call and bpf_get_stackid are now "redefined as different kind of symbol". >From bpf.h: u64 bpf_tail_call(u64 ctx, u64 r2, u64 index, u64 r4, u64 r5); u64 bpf_get_stackid(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5); Whereas in bpf_helpers.h they are: static void (*bpf_tail_call)(void *ctx, void *map, int index); static int (*bpf_get_stackid)(void *ctx, void *map, int flags); Fix this by removing the unused declaration of bpf_tail_call and moving the declaration of bpf_get_stackid in bpf_trace.c, which is the only place where it's needed. Signed-off-by: Gianluca Borello Acked-by: Alexei Starovoitov --- include/linux/bpf.h | 3 --- kernel/trace/bpf_trace.c | 2 ++ 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 172be7faf7ba..520aeebe0d93 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -231,9 +231,6 @@ struct bpf_event_entry { struct rcu_head rcu; }; -u64 bpf_tail_call(u64 ctx, u64 r2, u64 index, u64 r4, u64 r5); -u64 bpf_get_stackid(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5); - bool bpf_prog_array_compatible(struct bpf_array *array, const struct bpf_prog *fp); int bpf_prog_calc_tag(struct bpf_prog *fp); diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index b65011d320e3..136aa6bb0422 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -15,6 +15,8 @@ #include #include "trace.h" +u64 bpf_get_stackid(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5); + /** * trace_call_bpf - invoke BPF program * @call: tracepoint event -- 2.15.0.rc2
[PATCH net-next] bpf: add bpf_probe_read_str helper
Provide a simple helper with the same semantics of strncpy_from_unsafe(): int bpf_probe_read_str(void *dst, int size, const void *unsafe_addr) This gives more flexibility to a bpf program. A typical use case is intercepting a file name during sys_open(). The current approach is: SEC("kprobe/sys_open") void bpf_sys_open(struct pt_regs *ctx) { char buf[PATHLEN]; // PATHLEN is defined to 256 bpf_probe_read(buf, sizeof(buf), ctx->di); /* consume buf */ } This is suboptimal because the size of the string needs to be estimated at compile time, causing more memory to be copied than often necessary, and can become more problematic if further processing on buf is done, for example by pushing it to userspace via bpf_perf_event_output(), since the real length of the string is unknown and the entire buffer must be copied (and defining an unrolled strnlen() inside the bpf program is a very inefficient and unfeasible approach). With the new helper, the code can easily operate on the actual string length rather than the buffer size: SEC("kprobe/sys_open") void bpf_sys_open(struct pt_regs *ctx) { char buf[PATHLEN]; // PATHLEN is defined to 256 int res = bpf_probe_read_str(buf, sizeof(buf), ctx->di); /* consume buf, for example push it to userspace via * bpf_perf_event_output(), but this time we can use * res (the string length) as event size, after checking * its boundaries. */ } Another useful use case is when parsing individual process arguments or individual environment variables navigating current->mm->arg_start and current->mm->env_start: using this helper and the return value, one can quickly iterate at the right offset of the memory area. The code changes simply leverage the already existent strncpy_from_unsafe() kernel function, which is safe to be called from a bpf program as it is used in bpf_trace_printk(). Signed-off-by: Gianluca Borello Acked-by: Alexei Starovoitov Acked-by: Daniel Borkmann --- include/uapi/linux/bpf.h | 15 ++- kernel/trace/bpf_trace.c | 32 2 files changed, 46 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 0eb0e87dbe9f..54a5894bb4ea 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -430,6 +430,18 @@ union bpf_attr { * @xdp_md: pointer to xdp_md * @delta: An positive/negative integer to be added to xdp_md.data * Return: 0 on success or negative on error + * + * int bpf_probe_read_str(void *dst, int size, const void *unsafe_ptr) + * Copy a NUL terminated string from unsafe address. In case the string + * length is smaller than size, the target is not padded with further NUL + * bytes. In case the string length is larger than size, just count-1 + * bytes are copied and the last byte is set to NUL. + * @dst: destination address + * @size: maximum number of bytes to copy, including the trailing NUL + * @unsafe_ptr: unsafe address + * Return: + * > 0 length of the string including the trailing NUL on success + * < 0 error */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -476,7 +488,8 @@ union bpf_attr { FN(set_hash_invalid), \ FN(get_numa_node_id), \ FN(skb_change_head),\ - FN(xdp_adjust_head), + FN(xdp_adjust_head),\ + FN(probe_read_str), /* integer value in 'imm' field of BPF_CALL instruction selects which helper * function eBPF program intends to call diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index c22a961d1a42..424daa4586d1 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -395,6 +395,36 @@ static const struct bpf_func_proto bpf_current_task_under_cgroup_proto = { .arg2_type = ARG_ANYTHING, }; +BPF_CALL_3(bpf_probe_read_str, void *, dst, u32, size, + const void *, unsafe_ptr) +{ + int ret; + + /* +* The strncpy_from_unsafe() call will likely not fill the entire +* buffer, but that's okay in this circumstance as we're probing +* arbitrary memory anyway similar to bpf_probe_read() and might +* as well probe the stack. Thus, memory is explicitly cleared +* only in error case, so that improper users ignoring return +* code altogether don't copy garbage; otherwise length of string +* is returned that can be used for bpf_perf_event_output() et al. +*/ + ret = strncpy_from_unsafe(dst, unsafe_ptr, size); + if (unlikely(ret < 0)) + memset(dst, 0, size); + + return ret; +} + +static const struct bpf_func_proto bpf_probe_read_str_proto = { + .func = bpf_probe_read_str, + .gpl_only = true, + .ret_t