Re: [PATCH bpf] selftests/bpf: add missing pointer dereference for map stacktrace fixup
On 12/7/2018 1:14 PM, Stanislav Fomichev wrote: I get a segfault without it, other fixups always do dereference, and without dereference I don't understand how it can ever work. Fixes: 7c85c448e7d74 ("selftests/bpf: test_verifier, check bpf_map_lookup_elem access in bpf prog") Signed-off-by: Stanislav Fomichev --- tools/testing/selftests/bpf/test_verifier.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c index df6f751cc1e8..d23929a1985d 100644 --- a/tools/testing/selftests/bpf/test_verifier.c +++ b/tools/testing/selftests/bpf/test_verifier.c @@ -14166,7 +14166,7 @@ static void do_test_fixup(struct bpf_test *test, enum bpf_map_type prog_type, do { prog[*fixup_map_stacktrace].imm = map_fds[12]; fixup_map_stacktrace++; - } while (fixup_map_stacktrace); + } while (*fixup_map_stacktrace); } } It was my mistake. Thanks for the fix! -Prashant
[PATCH v2 net] tun: remove skb access after netif_receive_skb
In tun.c skb->len was accessed while doing stats accounting after a call to netif_receive_skb. We can not access skb after this call because buffers may be dropped. The fix for this bug would be to store skb->len in local variable and then use it after netif_receive_skb(). IMO using xdp data size for accounting bytes will be better because input for tun_xdp_one() is xdp_buff. Hence this patch: - fixes a bug by removing skb access after netif_receive_skb() - uses xdp data size for accounting bytes [613.019057] BUG: KASAN: use-after-free in tun_sendmsg+0x77c/0xc50 [tun] [613.021062] Read of size 4 at addr 8881da9ab7c0 by task vhost-1115/1155 [613.023073] [613.024003] CPU: 0 PID: 1155 Comm: vhost-1115 Not tainted 4.20.0-rc3-vm+ #232 [613.026029] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [613.029116] Call Trace: [613.031145] dump_stack+0x5b/0x90 [613.032219] print_address_description+0x6c/0x23c [613.034156] ? tun_sendmsg+0x77c/0xc50 [tun] [613.036141] kasan_report.cold.5+0x241/0x308 [613.038125] tun_sendmsg+0x77c/0xc50 [tun] [613.040109] ? tun_get_user+0x1960/0x1960 [tun] [613.042094] ? __isolate_free_page+0x270/0x270 [613.045173] vhost_tx_batch.isra.14+0xeb/0x1f0 [vhost_net] [613.047127] ? peek_head_len.part.13+0x90/0x90 [vhost_net] [613.049096] ? get_tx_bufs+0x5a/0x2c0 [vhost_net] [613.051106] ? vhost_enable_notify+0x2d8/0x420 [vhost] [613.053139] handle_tx_copy+0x2d0/0x8f0 [vhost_net] [613.053139] ? vhost_net_buf_peek+0x340/0x340 [vhost_net] [613.053139] ? __mutex_lock+0x8d9/0xb30 [613.053139] ? finish_task_switch+0x8f/0x3f0 [613.053139] ? handle_tx+0x32/0x120 [vhost_net] [613.053139] ? mutex_trylock+0x110/0x110 [613.053139] ? finish_task_switch+0xcf/0x3f0 [613.053139] ? finish_task_switch+0x240/0x3f0 [613.053139] ? __switch_to_asm+0x34/0x70 [613.053139] ? __switch_to_asm+0x40/0x70 [613.053139] ? __schedule+0x506/0xf10 [613.053139] handle_tx+0xc7/0x120 [vhost_net] [613.053139] vhost_worker+0x166/0x200 [vhost] [613.053139] ? vhost_dev_init+0x580/0x580 [vhost] [613.053139] ? __kthread_parkme+0x77/0x90 [613.053139] ? vhost_dev_init+0x580/0x580 [vhost] [613.053139] kthread+0x1b1/0x1d0 [613.053139] ? kthread_park+0xb0/0xb0 [613.053139] ret_from_fork+0x35/0x40 [613.088705] [613.088705] Allocated by task 1155: [613.088705] kasan_kmalloc+0xbf/0xe0 [613.088705] kmem_cache_alloc+0xdc/0x220 [613.088705] __build_skb+0x2a/0x160 [613.088705] build_skb+0x14/0xc0 [613.088705] tun_sendmsg+0x4f0/0xc50 [tun] [613.088705] vhost_tx_batch.isra.14+0xeb/0x1f0 [vhost_net] [613.088705] handle_tx_copy+0x2d0/0x8f0 [vhost_net] [613.088705] handle_tx+0xc7/0x120 [vhost_net] [613.088705] vhost_worker+0x166/0x200 [vhost] [613.088705] kthread+0x1b1/0x1d0 [613.088705] ret_from_fork+0x35/0x40 [613.088705] [613.088705] Freed by task 1155: [613.088705] __kasan_slab_free+0x12e/0x180 [613.088705] kmem_cache_free+0xa0/0x230 [613.088705] ip6_mc_input+0x40f/0x5a0 [613.088705] ipv6_rcv+0xc9/0x1e0 [613.088705] __netif_receive_skb_one_core+0xc1/0x100 [613.088705] netif_receive_skb_internal+0xc4/0x270 [613.088705] br_pass_frame_up+0x2b9/0x2e0 [613.088705] br_handle_frame_finish+0x2fb/0x7a0 [613.088705] br_handle_frame+0x30f/0x6c0 [613.088705] __netif_receive_skb_core+0x61a/0x15b0 [613.088705] __netif_receive_skb_one_core+0x8e/0x100 [613.088705] netif_receive_skb_internal+0xc4/0x270 [613.088705] tun_sendmsg+0x738/0xc50 [tun] [613.088705] vhost_tx_batch.isra.14+0xeb/0x1f0 [vhost_net] [613.088705] handle_tx_copy+0x2d0/0x8f0 [vhost_net] [613.088705] handle_tx+0xc7/0x120 [vhost_net] [613.088705] vhost_worker+0x166/0x200 [vhost] [613.088705] kthread+0x1b1/0x1d0 [613.088705] ret_from_fork+0x35/0x40 [613.088705] [613.088705] The buggy address belongs to the object at 8881da9ab740 [613.088705] which belongs to the cache skbuff_head_cache of size 232 Fixes: 043d222f93ab ("tuntap: accept an array of XDP buffs through sendmsg()") Reviewed-by: Toshiaki Makita Signed-off-by: Prashant Bhole Acked-by: Jason Wang --- v1 -> v2: No change. Reposted due to patchwork status. drivers/net/tun.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index e244f5d7512a..6e388792c0a8 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -2385,6 +2385,7 @@ static int tun_xdp_one(struct tun_struct *tun, struct tun_file *tfile, struct xdp_buff *xdp, int *flush) { + unsigned int datasize = xdp->data_end - xdp->data; struct tun_xdp_hdr *hdr = xdp->data_hard_start; struct virtio_net_hdr *gso = >gso; struct tun_pcpu_stats *stats; @@ -2461,7 +2462,7 @@ static int tun_xdp_one(struct tun_struct *tun, stats = get_cpu_ptr(tun->pcpu_stats); u64_stats_update_begin(>syncp); stats->rx_packets++; - stats->rx_bytes += skb->len; + stats->rx_bytes
[PATCH net] tun: remove skb access after netif_receive_skb
In tun.c skb->len was accessed while doing stats accounting after a call to netif_receive_skb. We can not access skb after this call because buffers may be dropped. The fix for this bug would be to store skb->len in local variable and then use it after netif_receive_skb(). IMO using xdp data size for accounting bytes will be better because input for tun_xdp_one() is xdp_buff. Hence this patch: - fixes a bug by removing skb access after netif_receive_skb() - uses xdp data size for accounting bytes [613.019057] BUG: KASAN: use-after-free in tun_sendmsg+0x77c/0xc50 [tun] [613.021062] Read of size 4 at addr 8881da9ab7c0 by task vhost-1115/1155 [613.023073] [613.024003] CPU: 0 PID: 1155 Comm: vhost-1115 Not tainted 4.20.0-rc3-vm+ #232 [613.026029] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [613.029116] Call Trace: [613.031145] dump_stack+0x5b/0x90 [613.032219] print_address_description+0x6c/0x23c [613.034156] ? tun_sendmsg+0x77c/0xc50 [tun] [613.036141] kasan_report.cold.5+0x241/0x308 [613.038125] tun_sendmsg+0x77c/0xc50 [tun] [613.040109] ? tun_get_user+0x1960/0x1960 [tun] [613.042094] ? __isolate_free_page+0x270/0x270 [613.045173] vhost_tx_batch.isra.14+0xeb/0x1f0 [vhost_net] [613.047127] ? peek_head_len.part.13+0x90/0x90 [vhost_net] [613.049096] ? get_tx_bufs+0x5a/0x2c0 [vhost_net] [613.051106] ? vhost_enable_notify+0x2d8/0x420 [vhost] [613.053139] handle_tx_copy+0x2d0/0x8f0 [vhost_net] [613.053139] ? vhost_net_buf_peek+0x340/0x340 [vhost_net] [613.053139] ? __mutex_lock+0x8d9/0xb30 [613.053139] ? finish_task_switch+0x8f/0x3f0 [613.053139] ? handle_tx+0x32/0x120 [vhost_net] [613.053139] ? mutex_trylock+0x110/0x110 [613.053139] ? finish_task_switch+0xcf/0x3f0 [613.053139] ? finish_task_switch+0x240/0x3f0 [613.053139] ? __switch_to_asm+0x34/0x70 [613.053139] ? __switch_to_asm+0x40/0x70 [613.053139] ? __schedule+0x506/0xf10 [613.053139] handle_tx+0xc7/0x120 [vhost_net] [613.053139] vhost_worker+0x166/0x200 [vhost] [613.053139] ? vhost_dev_init+0x580/0x580 [vhost] [613.053139] ? __kthread_parkme+0x77/0x90 [613.053139] ? vhost_dev_init+0x580/0x580 [vhost] [613.053139] kthread+0x1b1/0x1d0 [613.053139] ? kthread_park+0xb0/0xb0 [613.053139] ret_from_fork+0x35/0x40 [613.088705] [613.088705] Allocated by task 1155: [613.088705] kasan_kmalloc+0xbf/0xe0 [613.088705] kmem_cache_alloc+0xdc/0x220 [613.088705] __build_skb+0x2a/0x160 [613.088705] build_skb+0x14/0xc0 [613.088705] tun_sendmsg+0x4f0/0xc50 [tun] [613.088705] vhost_tx_batch.isra.14+0xeb/0x1f0 [vhost_net] [613.088705] handle_tx_copy+0x2d0/0x8f0 [vhost_net] [613.088705] handle_tx+0xc7/0x120 [vhost_net] [613.088705] vhost_worker+0x166/0x200 [vhost] [613.088705] kthread+0x1b1/0x1d0 [613.088705] ret_from_fork+0x35/0x40 [613.088705] [613.088705] Freed by task 1155: [613.088705] __kasan_slab_free+0x12e/0x180 [613.088705] kmem_cache_free+0xa0/0x230 [613.088705] ip6_mc_input+0x40f/0x5a0 [613.088705] ipv6_rcv+0xc9/0x1e0 [613.088705] __netif_receive_skb_one_core+0xc1/0x100 [613.088705] netif_receive_skb_internal+0xc4/0x270 [613.088705] br_pass_frame_up+0x2b9/0x2e0 [613.088705] br_handle_frame_finish+0x2fb/0x7a0 [613.088705] br_handle_frame+0x30f/0x6c0 [613.088705] __netif_receive_skb_core+0x61a/0x15b0 [613.088705] __netif_receive_skb_one_core+0x8e/0x100 [613.088705] netif_receive_skb_internal+0xc4/0x270 [613.088705] tun_sendmsg+0x738/0xc50 [tun] [613.088705] vhost_tx_batch.isra.14+0xeb/0x1f0 [vhost_net] [613.088705] handle_tx_copy+0x2d0/0x8f0 [vhost_net] [613.088705] handle_tx+0xc7/0x120 [vhost_net] [613.088705] vhost_worker+0x166/0x200 [vhost] [613.088705] kthread+0x1b1/0x1d0 [613.088705] ret_from_fork+0x35/0x40 [613.088705] [613.088705] The buggy address belongs to the object at 8881da9ab740 [613.088705] which belongs to the cache skbuff_head_cache of size 232 Fixes: 043d222f93ab ("tuntap: accept an array of XDP buffs through sendmsg()") Reviewed-by: Toshiaki Makita Signed-off-by: Prashant Bhole --- drivers/net/tun.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index e244f5d7512a..6e388792c0a8 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -2385,6 +2385,7 @@ static int tun_xdp_one(struct tun_struct *tun, struct tun_file *tfile, struct xdp_buff *xdp, int *flush) { + unsigned int datasize = xdp->data_end - xdp->data; struct tun_xdp_hdr *hdr = xdp->data_hard_start; struct virtio_net_hdr *gso = >gso; struct tun_pcpu_stats *stats; @@ -2461,7 +2462,7 @@ static int tun_xdp_one(struct tun_struct *tun, stats = get_cpu_ptr(tun->pcpu_stats); u64_stats_update_begin(>syncp); stats->rx_packets++; - stats->rx_bytes += skb->len; + stats->rx_bytes += datasize; u64_stats_update_end(>syncp); put_cpu_ptr(stats); -- 2.17.2
Re: [PATCH bpf 2/2] bpf: test_verifier, test for lookup on queue/stack maps
On 11/28/2018 10:06 PM, Mauricio Vasquez wrote: On 11/28/18 3:45 AM, Daniel Borkmann wrote: On 11/28/2018 08:51 AM, Prashant Bhole wrote: This patch adds tests to check whether bpf verifier prevents lookup on queue/stack maps Signed-off-by: Prashant Bhole --- tools/testing/selftests/bpf/test_verifier.c | 52 + 1 file changed, 52 insertions(+) diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c index 550b7e46bf4a..becd9f4f3980 100644 --- a/tools/testing/selftests/bpf/test_verifier.c +++ b/tools/testing/selftests/bpf/test_verifier.c @@ -74,6 +74,8 @@ struct bpf_test { int fixup_map_in_map[MAX_FIXUPS]; int fixup_cgroup_storage[MAX_FIXUPS]; int fixup_percpu_cgroup_storage[MAX_FIXUPS]; + int fixup_map_queue[MAX_FIXUPS]; + int fixup_map_stack[MAX_FIXUPS]; const char *errstr; const char *errstr_unpriv; uint32_t retval, retval_unpriv; @@ -4611,6 +4613,38 @@ static struct bpf_test tests[] = { .errstr = "cannot pass map_type 7 into func bpf_map_lookup_elem", .prog_type = BPF_PROG_TYPE_PERF_EVENT, }, + { + "prevent map lookup in queue map", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, + BPF_FUNC_map_lookup_elem), + BPF_EXIT_INSN(), + }, + .fixup_map_queue = { 3 }, + .result = REJECT, + .errstr = "invalid stack type R2 off=-8 access_size=0", + .prog_type = BPF_PROG_TYPE_XDP, Hmm, the approach in patch 1 is very fragile, and we're lucky in this case that the verifier bailed out with 'invalid stack type R2 off=-8 access_size=0' because of key size being zero. If this would have not been the case then the added ERR_PTR(-EOPNOTSUPP) would basically be seen as a valid pointer and the program could read/write into it. Instead, this needs to be prevented much earlier like check_map_func_compatibility(), Actually it is prevented in check_map_func_compatibility(), but stack boundary check is done before in the verifier. and I would like to have a split on these approaches to make verifier more robust. While you want ERR_PTR(-EOPNOTSUPP) for user space syscall side, In the case of QUEUE and STACK maps this is not relevant because the lookup syscall is mapped into peek operation. In fact queue_stack_map_lookup_elem() & queue_stack_map_update_elem() should be never called, I think we can remove them safely. Got it. Shall we keep these verifier tests (patch 2)? Thanks, Prashant
[PATCH bpf 1/2] bpf: queue/stack map, fix return value of map_lookup_elem
Since commit 509db2833e0d ("bpf: error handling when map_lookup_elem isn't supported") when map lookup isn't supported, the map_lookup_elem function should return ERR_PTR(-EOPNOTSUPP). Fixes: f1a2e44a3aec ("bpf: add queue and stack maps") Signed-off-by: Prashant Bhole --- kernel/bpf/queue_stack_maps.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/bpf/queue_stack_maps.c b/kernel/bpf/queue_stack_maps.c index b384ea9f3254..8fd19b06da66 100644 --- a/kernel/bpf/queue_stack_maps.c +++ b/kernel/bpf/queue_stack_maps.c @@ -240,7 +240,7 @@ static int queue_stack_map_push_elem(struct bpf_map *map, void *value, /* Called from syscall or from eBPF program */ static void *queue_stack_map_lookup_elem(struct bpf_map *map, void *key) { - return NULL; + return ERR_PTR(-EOPNOTSUPP); } /* Called from syscall or from eBPF program */ -- 2.17.2
[PATCH bpf 0/2] fix map_lookup_elem return value for queue/stack map
This set fixes map_lookup_elem return value for queue/stack map. Also adds verifier tests to check whether verifier prevents lookup on these maps. Note that patch 2 isn't dependant on patch 1. The verifier prevents lookup on queue/stack map because key size is zero. Prashant Bhole (2): bpf: queue/stack map, fix return value of map_lookup_elem bpf: test_verifier, test for lookup on queue/stack maps kernel/bpf/queue_stack_maps.c | 2 +- tools/testing/selftests/bpf/test_verifier.c | 52 + 2 files changed, 53 insertions(+), 1 deletion(-) -- 2.17.2
[PATCH bpf 2/2] bpf: test_verifier, test for lookup on queue/stack maps
This patch adds tests to check whether bpf verifier prevents lookup on queue/stack maps Signed-off-by: Prashant Bhole --- tools/testing/selftests/bpf/test_verifier.c | 52 + 1 file changed, 52 insertions(+) diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c index 550b7e46bf4a..becd9f4f3980 100644 --- a/tools/testing/selftests/bpf/test_verifier.c +++ b/tools/testing/selftests/bpf/test_verifier.c @@ -74,6 +74,8 @@ struct bpf_test { int fixup_map_in_map[MAX_FIXUPS]; int fixup_cgroup_storage[MAX_FIXUPS]; int fixup_percpu_cgroup_storage[MAX_FIXUPS]; + int fixup_map_queue[MAX_FIXUPS]; + int fixup_map_stack[MAX_FIXUPS]; const char *errstr; const char *errstr_unpriv; uint32_t retval, retval_unpriv; @@ -4611,6 +4613,38 @@ static struct bpf_test tests[] = { .errstr = "cannot pass map_type 7 into func bpf_map_lookup_elem", .prog_type = BPF_PROG_TYPE_PERF_EVENT, }, + { + "prevent map lookup in queue map", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, +BPF_FUNC_map_lookup_elem), + BPF_EXIT_INSN(), + }, + .fixup_map_queue = { 3 }, + .result = REJECT, + .errstr = "invalid stack type R2 off=-8 access_size=0", + .prog_type = BPF_PROG_TYPE_XDP, + }, + { + "prevent map lookup in stack map", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, +BPF_FUNC_map_lookup_elem), + BPF_EXIT_INSN(), + }, + .fixup_map_stack = { 3 }, + .result = REJECT, + .errstr = "invalid stack type R2 off=-8 access_size=0", + .prog_type = BPF_PROG_TYPE_XDP, + }, { "prevent map lookup in prog array", .insns = { @@ -14048,6 +14082,8 @@ static void do_test_fixup(struct bpf_test *test, enum bpf_map_type prog_type, int *fixup_map_sockhash = test->fixup_map_sockhash; int *fixup_map_xskmap = test->fixup_map_xskmap; int *fixup_map_stacktrace = test->fixup_map_stacktrace; + int *fixup_map_queue = test->fixup_map_queue; + int *fixup_map_stack = test->fixup_map_stack; int *fixup_prog1 = test->fixup_prog1; int *fixup_prog2 = test->fixup_prog2; int *fixup_map_in_map = test->fixup_map_in_map; @@ -14168,6 +14204,22 @@ static void do_test_fixup(struct bpf_test *test, enum bpf_map_type prog_type, fixup_map_stacktrace++; } while (fixup_map_stacktrace); } + if (*fixup_map_queue) { + map_fds[13] = create_map(BPF_MAP_TYPE_QUEUE, 0, +sizeof(u32), 1); + do { + prog[*fixup_map_queue].imm = map_fds[13]; + fixup_map_queue++; + } while (*fixup_map_queue); + } + if (*fixup_map_stack) { + map_fds[14] = create_map(BPF_MAP_TYPE_STACK, 0, +sizeof(u32), 1); + do { + prog[*fixup_map_stack].imm = map_fds[14]; + fixup_map_stack++; + } while (*fixup_map_stack); + } } static int set_admin(bool admin) -- 2.17.2
Re: [PATCH bpf-next 6/6] selftests/bpf: test_verifier, check bpf_map_lookup_elem access in bpf prog
On 10/25/2018 5:54 PM, Naresh Kamboju wrote: On Tue, 9 Oct 2018 at 12:32, Song Liu wrote: On Mon, Oct 8, 2018 at 6:07 PM Prashant Bhole wrote: map_lookup_elem isn't supported by certain map types like: - BPF_MAP_TYPE_PROG_ARRAY - BPF_MAP_TYPE_STACK_TRACE - BPF_MAP_TYPE_XSKMAP - BPF_MAP_TYPE_SOCKMAP/BPF_MAP_TYPE_SOCKHASH Let's add verfier tests to check whether verifier prevents bpf_map_lookup_elem call on above programs from bpf program. Signed-off-by: Prashant Bhole Acked-by: Alexei Starovoitov Acked-by: Song Liu --- tools/testing/selftests/bpf/test_verifier.c | 121 +++- 1 file changed, 120 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c index 65ae44c85d27..cf4cd32b6772 100644 --- a/tools/testing/selftests/bpf/test_verifier.c +++ b/tools/testing/selftests/bpf/test_verifier.c @@ -48,7 +48,7 @@ #define MAX_INSNS BPF_MAXINSNS #define MAX_FIXUPS 8 -#define MAX_NR_MAPS8 +#define MAX_NR_MAPS13 #define POINTER_VALUE 0xcafe4all #define TEST_DATA_LEN 64 @@ -65,6 +65,10 @@ struct bpf_test { int fixup_map_hash_48b[MAX_FIXUPS]; int fixup_map_hash_16b[MAX_FIXUPS]; int fixup_map_array_48b[MAX_FIXUPS]; + int fixup_map_sockmap[MAX_FIXUPS]; + int fixup_map_sockhash[MAX_FIXUPS]; + int fixup_map_xskmap[MAX_FIXUPS]; + int fixup_map_stacktrace[MAX_FIXUPS]; int fixup_prog1[MAX_FIXUPS]; int fixup_prog2[MAX_FIXUPS]; int fixup_map_in_map[MAX_FIXUPS]; @@ -4541,6 +4545,85 @@ static struct bpf_test tests[] = { .errstr = "invalid access to packet", .prog_type = BPF_PROG_TYPE_SCHED_CLS, }, + { + "prevent map lookup in sockmap", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, +BPF_FUNC_map_lookup_elem), + BPF_EXIT_INSN(), + }, + .fixup_map_sockmap = { 3 }, + .result = REJECT, + .errstr = "cannot pass map_type 15 into func bpf_map_lookup_elem", + .prog_type = BPF_PROG_TYPE_SOCK_OPS, + }, + { + "prevent map lookup in sockhash", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, +BPF_FUNC_map_lookup_elem), + BPF_EXIT_INSN(), + }, + .fixup_map_sockhash = { 3 }, + .result = REJECT, + .errstr = "cannot pass map_type 18 into func bpf_map_lookup_elem", + .prog_type = BPF_PROG_TYPE_SOCK_OPS, + }, + { + "prevent map lookup in xskmap", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, +BPF_FUNC_map_lookup_elem), + BPF_EXIT_INSN(), + }, + .fixup_map_xskmap = { 3 }, + .result = REJECT, + .errstr = "cannot pass map_type 17 into func bpf_map_lookup_elem", + .prog_type = BPF_PROG_TYPE_XDP, + }, + { + "prevent map lookup in stack trace", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, +BPF_FUNC_map_lookup_elem), + BPF_EXIT_INSN(), + }, + .fixup_map_stacktrace = { 3 }, + .result = REJECT, + .errstr = "cannot pass map_type 7 into func bpf_map_lookup_elem", + .prog_type = BPF_PROG_TYPE_PERF_EVENT, + }, + { + "prevent map lookup in prog array", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), +
[PATCH bpf-next 6/6] selftests/bpf: test_verifier, check bpf_map_lookup_elem access in bpf prog
map_lookup_elem isn't supported by certain map types like: - BPF_MAP_TYPE_PROG_ARRAY - BPF_MAP_TYPE_STACK_TRACE - BPF_MAP_TYPE_XSKMAP - BPF_MAP_TYPE_SOCKMAP/BPF_MAP_TYPE_SOCKHASH Let's add verfier tests to check whether verifier prevents bpf_map_lookup_elem call on above programs from bpf program. Signed-off-by: Prashant Bhole Acked-by: Alexei Starovoitov --- tools/testing/selftests/bpf/test_verifier.c | 121 +++- 1 file changed, 120 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c index 65ae44c85d27..cf4cd32b6772 100644 --- a/tools/testing/selftests/bpf/test_verifier.c +++ b/tools/testing/selftests/bpf/test_verifier.c @@ -48,7 +48,7 @@ #define MAX_INSNS BPF_MAXINSNS #define MAX_FIXUPS 8 -#define MAX_NR_MAPS8 +#define MAX_NR_MAPS13 #define POINTER_VALUE 0xcafe4all #define TEST_DATA_LEN 64 @@ -65,6 +65,10 @@ struct bpf_test { int fixup_map_hash_48b[MAX_FIXUPS]; int fixup_map_hash_16b[MAX_FIXUPS]; int fixup_map_array_48b[MAX_FIXUPS]; + int fixup_map_sockmap[MAX_FIXUPS]; + int fixup_map_sockhash[MAX_FIXUPS]; + int fixup_map_xskmap[MAX_FIXUPS]; + int fixup_map_stacktrace[MAX_FIXUPS]; int fixup_prog1[MAX_FIXUPS]; int fixup_prog2[MAX_FIXUPS]; int fixup_map_in_map[MAX_FIXUPS]; @@ -4541,6 +4545,85 @@ static struct bpf_test tests[] = { .errstr = "invalid access to packet", .prog_type = BPF_PROG_TYPE_SCHED_CLS, }, + { + "prevent map lookup in sockmap", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, +BPF_FUNC_map_lookup_elem), + BPF_EXIT_INSN(), + }, + .fixup_map_sockmap = { 3 }, + .result = REJECT, + .errstr = "cannot pass map_type 15 into func bpf_map_lookup_elem", + .prog_type = BPF_PROG_TYPE_SOCK_OPS, + }, + { + "prevent map lookup in sockhash", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, +BPF_FUNC_map_lookup_elem), + BPF_EXIT_INSN(), + }, + .fixup_map_sockhash = { 3 }, + .result = REJECT, + .errstr = "cannot pass map_type 18 into func bpf_map_lookup_elem", + .prog_type = BPF_PROG_TYPE_SOCK_OPS, + }, + { + "prevent map lookup in xskmap", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, +BPF_FUNC_map_lookup_elem), + BPF_EXIT_INSN(), + }, + .fixup_map_xskmap = { 3 }, + .result = REJECT, + .errstr = "cannot pass map_type 17 into func bpf_map_lookup_elem", + .prog_type = BPF_PROG_TYPE_XDP, + }, + { + "prevent map lookup in stack trace", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, +BPF_FUNC_map_lookup_elem), + BPF_EXIT_INSN(), + }, + .fixup_map_stacktrace = { 3 }, + .result = REJECT, + .errstr = "cannot pass map_type 7 into func bpf_map_lookup_elem", + .prog_type = BPF_PROG_TYPE_PERF_EVENT, + }, + { + "prevent map lookup in prog array", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), +
[PATCH bpf-next 4/6] tools/bpf: bpftool, print strerror when map lookup error occurs
Since map lookup error can be ENOENT or EOPNOTSUPP, let's print strerror() as error message in normal and JSON output. This patch adds helper function print_entry_error() to print entry from lookup error occurs Example: Following example dumps a map which does not support lookup. Output before: root# bpftool map -jp dump id 40 [ "key": ["0x0a","0x00","0x00","0x00" ], "value": { "error": "can\'t lookup element" }, "key": ["0x0b","0x00","0x00","0x00" ], "value": { "error": "can\'t lookup element" } ] root# bpftool map dump id 40 can't lookup element with key: 0a 00 00 00 can't lookup element with key: 0b 00 00 00 Found 0 elements Output after changes: root# bpftool map dump -jp id 45 [ "key": ["0x0a","0x00","0x00","0x00" ], "value": { "error": "Operation not supported" }, "key": ["0x0b","0x00","0x00","0x00" ], "value": { "error": "Operation not supported" } ] root# bpftool map dump id 45 key: 0a 00 00 00 value: Operation not supported key: 0b 00 00 00 value: Operation not supported Found 0 elements Signed-off-by: Prashant Bhole Acked-by: Jakub Kicinski Acked-by: Alexei Starovoitov --- tools/bpf/bpftool/map.c | 29 - 1 file changed, 24 insertions(+), 5 deletions(-) diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c index 28d365435fea..9f5de48f8a99 100644 --- a/tools/bpf/bpftool/map.c +++ b/tools/bpf/bpftool/map.c @@ -336,6 +336,25 @@ static void print_entry_json(struct bpf_map_info *info, unsigned char *key, jsonw_end_object(json_wtr); } +static void print_entry_error(struct bpf_map_info *info, unsigned char *key, + const char *value) +{ + int value_size = strlen(value); + bool single_line, break_names; + + break_names = info->key_size > 16 || value_size > 16; + single_line = info->key_size + value_size <= 24 && !break_names; + + printf("key:%c", break_names ? '\n' : ' '); + fprint_hex(stdout, key, info->key_size, " "); + + printf(single_line ? " " : "\n"); + + printf("value:%c%s", break_names ? '\n' : ' ', value); + + printf("\n"); +} + static void print_entry_plain(struct bpf_map_info *info, unsigned char *key, unsigned char *value) { @@ -663,6 +682,7 @@ static int dump_map_elem(int fd, void *key, void *value, json_writer_t *btf_wtr) { int num_elems = 0; + int lookup_errno; if (!bpf_map_lookup_elem(fd, key, value)) { if (json_output) { @@ -685,6 +705,8 @@ static int dump_map_elem(int fd, void *key, void *value, } /* lookup error handling */ + lookup_errno = errno; + if (map_is_map_of_maps(map_info->type) || map_is_map_of_progs(map_info->type)) return 0; @@ -694,13 +716,10 @@ static int dump_map_elem(int fd, void *key, void *value, print_hex_data_json(key, map_info->key_size); jsonw_name(json_wtr, "value"); jsonw_start_object(json_wtr); - jsonw_string_field(json_wtr, "error", - "can't lookup element"); + jsonw_string_field(json_wtr, "error", strerror(lookup_errno)); jsonw_end_object(json_wtr); } else { - p_info("can't lookup element with key: "); - fprint_hex(stderr, key, map_info->key_size, " "); - fprintf(stderr, "\n"); + print_entry_error(map_info, key, strerror(lookup_errno)); } return 0; -- 2.17.1
[PATCH bpf-next 3/6] tools/bpf: bpftool, split the function do_dump()
do_dump() function in bpftool/map.c has deep indentations. In order to reduce deep indent, let's move element printing code out of do_dump() into dump_map_elem() function. Signed-off-by: Prashant Bhole Acked-by: Jakub Kicinski Acked-by: Alexei Starovoitov --- tools/bpf/bpftool/map.c | 83 - 1 file changed, 49 insertions(+), 34 deletions(-) diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c index 6003e9598973..28d365435fea 100644 --- a/tools/bpf/bpftool/map.c +++ b/tools/bpf/bpftool/map.c @@ -658,6 +658,54 @@ static int do_show(int argc, char **argv) return errno == ENOENT ? 0 : -1; } +static int dump_map_elem(int fd, void *key, void *value, +struct bpf_map_info *map_info, struct btf *btf, +json_writer_t *btf_wtr) +{ + int num_elems = 0; + + if (!bpf_map_lookup_elem(fd, key, value)) { + if (json_output) { + print_entry_json(map_info, key, value, btf); + } else { + if (btf) { + struct btf_dumper d = { + .btf = btf, + .jw = btf_wtr, + .is_plain_text = true, + }; + + do_dump_btf(, map_info, key, value); + } else { + print_entry_plain(map_info, key, value); + } + num_elems++; + } + return num_elems; + } + + /* lookup error handling */ + if (map_is_map_of_maps(map_info->type) || + map_is_map_of_progs(map_info->type)) + return 0; + + if (json_output) { + jsonw_name(json_wtr, "key"); + print_hex_data_json(key, map_info->key_size); + jsonw_name(json_wtr, "value"); + jsonw_start_object(json_wtr); + jsonw_string_field(json_wtr, "error", + "can't lookup element"); + jsonw_end_object(json_wtr); + } else { + p_info("can't lookup element with key: "); + fprint_hex(stderr, key, map_info->key_size, " "); + fprintf(stderr, "\n"); + } + + return 0; +} + static int do_dump(int argc, char **argv) { struct bpf_map_info info = {}; @@ -713,40 +761,7 @@ static int do_dump(int argc, char **argv) err = 0; break; } - - if (!bpf_map_lookup_elem(fd, key, value)) { - if (json_output) - print_entry_json(, key, value, btf); - else - if (btf) { - struct btf_dumper d = { - .btf = btf, - .jw = btf_wtr, - .is_plain_text = true, - }; - - do_dump_btf(, , key, value); - } else { - print_entry_plain(, key, value); - } - num_elems++; - } else if (!map_is_map_of_maps(info.type) && - !map_is_map_of_progs(info.type)) { - if (json_output) { - jsonw_name(json_wtr, "key"); - print_hex_data_json(key, info.key_size); - jsonw_name(json_wtr, "value"); - jsonw_start_object(json_wtr); - jsonw_string_field(json_wtr, "error", - "can't lookup element"); - jsonw_end_object(json_wtr); - } else { - p_info("can't lookup element with key: "); - fprint_hex(stderr, key, info.key_size, " "); - fprintf(stderr, "\n"); - } - } - + num_elems += dump_map_elem(fd, key, value, , btf, btf_wtr); prev_key = key; } -- 2.17.1
[PATCH bpf-next 2/6] bpf: return EOPNOTSUPP when map lookup isn't supported
Return ERR_PTR(-EOPNOTSUPP) from map_lookup_elem() methods of below map types: - BPF_MAP_TYPE_PROG_ARRAY - BPF_MAP_TYPE_STACK_TRACE - BPF_MAP_TYPE_XSKMAP - BPF_MAP_TYPE_SOCKMAP/BPF_MAP_TYPE_SOCKHASH Signed-off-by: Prashant Bhole Acked-by: Alexei Starovoitov --- kernel/bpf/arraymap.c | 2 +- kernel/bpf/sockmap.c | 2 +- kernel/bpf/stackmap.c | 2 +- kernel/bpf/xskmap.c | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c index dded84cbe814..24583da9ffd1 100644 --- a/kernel/bpf/arraymap.c +++ b/kernel/bpf/arraymap.c @@ -449,7 +449,7 @@ static void fd_array_map_free(struct bpf_map *map) static void *fd_array_map_lookup_elem(struct bpf_map *map, void *key) { - return NULL; + return ERR_PTR(-EOPNOTSUPP); } /* only called from syscall */ diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c index d37a1a0a6e1e..5d0677d808ae 100644 --- a/kernel/bpf/sockmap.c +++ b/kernel/bpf/sockmap.c @@ -2096,7 +2096,7 @@ int sockmap_get_from_fd(const union bpf_attr *attr, int type, static void *sock_map_lookup(struct bpf_map *map, void *key) { - return NULL; + return ERR_PTR(-EOPNOTSUPP); } static int sock_map_update_elem(struct bpf_map *map, diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c index 8061a439ef18..b2ade10f7ec3 100644 --- a/kernel/bpf/stackmap.c +++ b/kernel/bpf/stackmap.c @@ -505,7 +505,7 @@ const struct bpf_func_proto bpf_get_stack_proto = { /* Called from eBPF program */ static void *stack_map_lookup_elem(struct bpf_map *map, void *key) { - return NULL; + return ERR_PTR(-EOPNOTSUPP); } /* Called from syscall */ diff --git a/kernel/bpf/xskmap.c b/kernel/bpf/xskmap.c index 9f8463afda9c..ef0b7b6ef8a5 100644 --- a/kernel/bpf/xskmap.c +++ b/kernel/bpf/xskmap.c @@ -154,7 +154,7 @@ void __xsk_map_flush(struct bpf_map *map) static void *xsk_map_lookup_elem(struct bpf_map *map, void *key) { - return NULL; + return ERR_PTR(-EOPNOTSUPP); } static int xsk_map_update_elem(struct bpf_map *map, void *key, void *value, -- 2.17.1
[PATCH bpf-next 5/6] selftests/bpf: test_verifier, change names of fixup maps
Currently fixup map are named like fixup_map1, fixup_map2, and so on. As suggested by Alexei let's change change map names such that we can identify map type by looking at the name. This patch is basically a find and replace change: fixup_map1 -> fixup_map_hash_8b fixup_map2 -> fixup_map_hash_48b fixup_map3 -> fixup_map_hash_16b fixup_map4 -> fixup_map_array_48b Suggested-by: Alexei Starovoitov Signed-off-by: Prashant Bhole Acked-by: Alexei Starovoitov --- tools/testing/selftests/bpf/test_verifier.c | 380 ++-- 1 file changed, 190 insertions(+), 190 deletions(-) diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c index bc9cd8537467..65ae44c85d27 100644 --- a/tools/testing/selftests/bpf/test_verifier.c +++ b/tools/testing/selftests/bpf/test_verifier.c @@ -61,10 +61,10 @@ static bool unpriv_disabled = false; struct bpf_test { const char *descr; struct bpf_insn insns[MAX_INSNS]; - int fixup_map1[MAX_FIXUPS]; - int fixup_map2[MAX_FIXUPS]; - int fixup_map3[MAX_FIXUPS]; - int fixup_map4[MAX_FIXUPS]; + int fixup_map_hash_8b[MAX_FIXUPS]; + int fixup_map_hash_48b[MAX_FIXUPS]; + int fixup_map_hash_16b[MAX_FIXUPS]; + int fixup_map_array_48b[MAX_FIXUPS]; int fixup_prog1[MAX_FIXUPS]; int fixup_prog2[MAX_FIXUPS]; int fixup_map_in_map[MAX_FIXUPS]; @@ -876,7 +876,7 @@ static struct bpf_test tests[] = { BPF_FUNC_map_lookup_elem), BPF_EXIT_INSN(), }, - .fixup_map1 = { 2 }, + .fixup_map_hash_8b = { 2 }, .errstr = "invalid indirect read from stack", .result = REJECT, }, @@ -1110,7 +1110,7 @@ static struct bpf_test tests[] = { BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0), BPF_EXIT_INSN(), }, - .fixup_map1 = { 3 }, + .fixup_map_hash_8b = { 3 }, .errstr = "R0 invalid mem access 'map_value_or_null'", .result = REJECT, }, @@ -1127,7 +1127,7 @@ static struct bpf_test tests[] = { BPF_ST_MEM(BPF_DW, BPF_REG_0, 4, 0), BPF_EXIT_INSN(), }, - .fixup_map1 = { 3 }, + .fixup_map_hash_8b = { 3 }, .errstr = "misaligned value access", .result = REJECT, .flags = F_LOAD_WITH_STRICT_ALIGNMENT, @@ -1147,7 +1147,7 @@ static struct bpf_test tests[] = { BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 1), BPF_EXIT_INSN(), }, - .fixup_map1 = { 3 }, + .fixup_map_hash_8b = { 3 }, .errstr = "R0 invalid mem access", .errstr_unpriv = "R0 leaks addr", .result = REJECT, @@ -1237,7 +1237,7 @@ static struct bpf_test tests[] = { BPF_FUNC_map_delete_elem), BPF_EXIT_INSN(), }, - .fixup_map1 = { 24 }, + .fixup_map_hash_8b = { 24 }, .errstr_unpriv = "R1 pointer comparison", .result_unpriv = REJECT, .result = ACCEPT, @@ -1391,7 +1391,7 @@ static struct bpf_test tests[] = { offsetof(struct __sk_buff, pkt_type)), BPF_EXIT_INSN(), }, - .fixup_map1 = { 4 }, + .fixup_map_hash_8b = { 4 }, .errstr = "different pointers", .errstr_unpriv = "R1 pointer comparison", .result = REJECT, @@ -1414,7 +1414,7 @@ static struct bpf_test tests[] = { BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), BPF_JMP_IMM(BPF_JA, 0, 0, -12), }, - .fixup_map1 = { 6 }, + .fixup_map_hash_8b = { 6 }, .errstr = "different pointers", .errstr_unpriv = "R1 pointer comparison", .result = REJECT, @@ -1438,7 +1438,7 @@ static struct bpf_test tests[] = { BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), BPF_JMP_IMM(BPF_JA, 0, 0, -13), }, - .fixup_map1 = { 7 }, + .fixup_map_hash_8b = { 7 }, .errstr = "different pointers", .errstr_unpriv = "R1 pointer comparison", .result = REJECT, @@ -2575,7 +2575,7 @@ static struct bpf_test tests[] = { BPF_MOV64_IMM(BPF_REG_0, 0), BPF_EXIT_INSN(), }, - .fixup_map1 = { 3 }, + .fixup_map_hash_8b = { 3 },
[PATCH bpf-next 1/6] bpf: error handling when map_lookup_elem isn't supported
The error value returned by map_lookup_elem doesn't differentiate whether lookup was failed because of invalid key or lookup is not supported. Lets add handling for -EOPNOTSUPP return value of map_lookup_elem() method of map, with expectation from map's implementation that it should return -EOPNOTSUPP if lookup is not supported. The errno for bpf syscall for BPF_MAP_LOOKUP_ELEM command will be set to EOPNOTSUPP if map lookup is not supported. Signed-off-by: Prashant Bhole Acked-by: Alexei Starovoitov --- kernel/bpf/syscall.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 5742df21598c..4f416234251f 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -719,10 +719,15 @@ static int map_lookup_elem(union bpf_attr *attr) } else { rcu_read_lock(); ptr = map->ops->map_lookup_elem(map, key); - if (ptr) + if (IS_ERR(ptr)) { + err = PTR_ERR(ptr); + } else if (!ptr) { + err = -ENOENT; + } else { + err = 0; memcpy(value, ptr, value_size); + } rcu_read_unlock(); - err = ptr ? 0 : -ENOENT; } if (err) -- 2.17.1
[PATCH bpf-next 0/6] Error handling when map lookup isn't supported
Currently when map a lookup fails, user space API can not make any distinction whether given key was not found or lookup is not supported by particular map. In this series we modify return value of maps which do not support lookup. Lookup on such map implementation will return -EOPNOTSUPP. bpf() syscall with BPF_MAP_LOOKUP_ELEM command will set EOPNOTSUPP errno. We also handle this error in bpftool to print appropriate message. Patch 1: adds handling of BPF_MAP_LOOKUP ELEM command of bpf syscall such that errno will set to EOPNOTSUPP when map doesn't support lookup Patch 2: Modifies the return value of map_lookup_elem() to EOPNOTSUPP for maps which do not support lookup Patch 3: Splits do_dump() in bpftool/map.c. Element printing code is moved out into new function dump_map_elem(). This was done in order to reduce deep indentation and accomodate further changes. Patch 4: Changes in bpftool to print strerror() message when lookup error is occured. This will result in appropriate message like "Operation not supported" when map doesn't support lookup. Patch 5: test_verifier: change fixup map naming convention as suggested by Alexei Patch 6: Added verifier tests to check whether verifier rejects call to bpf_map_lookup_elem from bpf program. For all map types those do not support map lookup. Prashant Bhole (6): bpf: error handling when map_lookup_elem isn't supported bpf: return EOPNOTSUPP when map lookup isn't supported tools/bpf: bpftool, split the function do_dump() tools/bpf: bpftool, print strerror when map lookup error occurs selftests/bpf: test_verifier, change names of fixup maps selftests/bpf: test_verifier, check bpf_map_lookup_elem access in bpf prog kernel/bpf/arraymap.c | 2 +- kernel/bpf/sockmap.c| 2 +- kernel/bpf/stackmap.c | 2 +- kernel/bpf/syscall.c| 9 +- kernel/bpf/xskmap.c | 2 +- tools/bpf/bpftool/map.c | 102 ++-- tools/testing/selftests/bpf/test_verifier.c | 501 7 files changed, 389 insertions(+), 231 deletions(-) -- 2.17.1
Re: [PATCH bpf-next 0/6] Error handling when map lookup isn't supported
On 10/9/2018 9:43 AM, Daniel Borkmann wrote: On 10/09/2018 02:02 AM, Prashant Bhole wrote: On 10/6/2018 3:35 AM, Alexei Starovoitov wrote: On Fri, Oct 05, 2018 at 12:35:55PM +0900, Prashant Bhole wrote: Currently when map a lookup fails, user space API can not make any distinction whether given key was not found or lookup is not supported by particular map. In this series we modify return value of maps which do not support lookup. Lookup on such map implementation will return -EOPNOTSUPP. bpf() syscall with BPF_MAP_LOOKUP_ELEM command will set EOPNOTSUPP errno. We also handle this error in bpftool to print appropriate message. Patch 1: adds handling of BPF_MAP_LOOKUP ELEM command of bpf syscall such that errno will set to EOPNOTSUPP when map doesn't support lookup Patch 2: Modifies the return value of map_lookup_elem() to EOPNOTSUPP for maps which do not support lookup Patch 3: Splits do_dump() in bpftool/map.c. Element printing code is moved out into new function dump_map_elem(). This was done in order to reduce deep indentation and accomodate further changes. Patch 4: Changes in bpftool to print strerror() message when lookup error is occured. This will result in appropriate message like "Operation not supported" when map doesn't support lookup. Patch 5: test_verifier: change fixup map naming convention as suggested by Alexei Patch 6: Added verifier tests to check whether verifier rejects call to bpf_map_lookup_elem from bpf program. For all map types those do not support map lookup. for the set: Acked-by: Alexei Starovoitov Thanks. Is there any reason this series did not get posted on netdev-list and can not be seen in the patchwork? Hmm, could you repost to netdev? Perhaps a netdev or patchwork issue that it did not land there. I just double-checked and it's indeed not present. Shall I repost with the same version and Alexei's Acked-by for the series? -Prashant
Re: [PATCH bpf-next 0/6] Error handling when map lookup isn't supported
On 10/6/2018 3:35 AM, Alexei Starovoitov wrote: On Fri, Oct 05, 2018 at 12:35:55PM +0900, Prashant Bhole wrote: Currently when map a lookup fails, user space API can not make any distinction whether given key was not found or lookup is not supported by particular map. In this series we modify return value of maps which do not support lookup. Lookup on such map implementation will return -EOPNOTSUPP. bpf() syscall with BPF_MAP_LOOKUP_ELEM command will set EOPNOTSUPP errno. We also handle this error in bpftool to print appropriate message. Patch 1: adds handling of BPF_MAP_LOOKUP ELEM command of bpf syscall such that errno will set to EOPNOTSUPP when map doesn't support lookup Patch 2: Modifies the return value of map_lookup_elem() to EOPNOTSUPP for maps which do not support lookup Patch 3: Splits do_dump() in bpftool/map.c. Element printing code is moved out into new function dump_map_elem(). This was done in order to reduce deep indentation and accomodate further changes. Patch 4: Changes in bpftool to print strerror() message when lookup error is occured. This will result in appropriate message like "Operation not supported" when map doesn't support lookup. Patch 5: test_verifier: change fixup map naming convention as suggested by Alexei Patch 6: Added verifier tests to check whether verifier rejects call to bpf_map_lookup_elem from bpf program. For all map types those do not support map lookup. for the set: Acked-by: Alexei Starovoitov Thanks. Is there any reason this series did not get posted on netdev-list and can not be seen in the patchwork?
Re: [RFC v2 bpf-next 5/5] selftests/bpf: verifier, check bpf_map_lookup_elem access in bpf prog
On 10/5/2018 10:51 AM, Alexei Starovoitov wrote: On Tue, Oct 02, 2018 at 02:35:19PM +0900, Prashant Bhole wrote: map_lookup_elem isn't supported by certain map types like: - BPF_MAP_TYPE_PROG_ARRAY - BPF_MAP_TYPE_STACK_TRACE - BPF_MAP_TYPE_XSKMAP - BPF_MAP_TYPE_SOCKMAP/BPF_MAP_TYPE_SOCKHASH Let's add verfier tests to check whether verifier prevents bpf_map_lookup_elem call on above programs from bpf program. Signed-off-by: Prashant Bhole --- tools/testing/selftests/bpf/test_verifier.c | 121 +++- 1 file changed, 120 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c index c7d25f23baf9..afa7e67f66e4 100644 --- a/tools/testing/selftests/bpf/test_verifier.c +++ b/tools/testing/selftests/bpf/test_verifier.c @@ -47,7 +47,7 @@ #define MAX_INSNS BPF_MAXINSNS #define MAX_FIXUPS8 -#define MAX_NR_MAPS8 +#define MAX_NR_MAPS13 #define POINTER_VALUE 0xcafe4all #define TEST_DATA_LEN 64 @@ -64,6 +64,10 @@ struct bpf_test { int fixup_map2[MAX_FIXUPS]; int fixup_map3[MAX_FIXUPS]; int fixup_map4[MAX_FIXUPS]; + int fixup_map5[MAX_FIXUPS]; + int fixup_map6[MAX_FIXUPS]; + int fixup_map7[MAX_FIXUPS]; + int fixup_map8[MAX_FIXUPS]; int fixup_prog1[MAX_FIXUPS]; int fixup_prog2[MAX_FIXUPS]; int fixup_map_in_map[MAX_FIXUPS]; @@ -4391,6 +4395,85 @@ static struct bpf_test tests[] = { .errstr = "invalid access to packet", .prog_type = BPF_PROG_TYPE_SCHED_CLS, }, + { + "prevent map lookup in sockmap", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, +BPF_FUNC_map_lookup_elem), + BPF_EXIT_INSN(), + }, + .fixup_map5 = { 3 }, + .result = REJECT, + .errstr = "cannot pass map_type 15 into func bpf_map_lookup_elem", + .prog_type = BPF_PROG_TYPE_SOCK_OPS, + }, + { + "prevent map lookup in sockhash", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, +BPF_FUNC_map_lookup_elem), + BPF_EXIT_INSN(), + }, + .fixup_map6 = { 3 }, + .result = REJECT, + .errstr = "cannot pass map_type 18 into func bpf_map_lookup_elem", + .prog_type = BPF_PROG_TYPE_SOCK_OPS, + }, + { + "prevent map lookup in xskmap", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, +BPF_FUNC_map_lookup_elem), + BPF_EXIT_INSN(), + }, + .fixup_map7 = { 3 }, + .result = REJECT, + .errstr = "cannot pass map_type 17 into func bpf_map_lookup_elem", + .prog_type = BPF_PROG_TYPE_XDP, + }, + { + "prevent map lookup in stack trace", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, +BPF_FUNC_map_lookup_elem), + BPF_EXIT_INSN(), + }, + .fixup_map8 = { 3 }, + .result = REJECT, + .errstr = "cannot pass map_type 7 into func bpf_map_lookup_elem", + .prog_type = BPF_PROG_TYPE_PERF_EVENT, + }, + { + "prevent map lookup in prog array", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), +
Re: [RFC v2 bpf-next 2/5] bpf: return EOPNOTSUPP when map lookup isn't supported
On 10/5/2018 9:10 AM, Alexei Starovoitov wrote: On Tue, Oct 02, 2018 at 02:35:16PM +0900, Prashant Bhole wrote: Return ERR_PTR(-EOPNOTSUPP) from map_lookup_elem() methods of below map types: - BPF_MAP_TYPE_PROG_ARRAY - BPF_MAP_TYPE_STACK_TRACE - BPF_MAP_TYPE_XSKMAP - BPF_MAP_TYPE_SOCKMAP/BPF_MAP_TYPE_SOCKHASH Signed-off-by: Prashant Bhole --- kernel/bpf/arraymap.c | 2 +- kernel/bpf/sockmap.c | 2 +- kernel/bpf/stackmap.c | 2 +- kernel/bpf/xskmap.c | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c index dded84cbe814..24583da9ffd1 100644 --- a/kernel/bpf/arraymap.c +++ b/kernel/bpf/arraymap.c @@ -449,7 +449,7 @@ static void fd_array_map_free(struct bpf_map *map) static void *fd_array_map_lookup_elem(struct bpf_map *map, void *key) { - return NULL; + return ERR_PTR(-EOPNOTSUPP); last time we discussed that the verifier should make sure that lookup is not called from bpf program for these map types. I'd like to see test cases in test_verifier.c for these map types to make sure we don't introduce crashes. Hi Alexei, Patch 05/05 adds such tests in test_verifier.c. Please review those changes. Thank you. -Prashant
[RFC v2 bpf-next 2/5] bpf: return EOPNOTSUPP when map lookup isn't supported
Return ERR_PTR(-EOPNOTSUPP) from map_lookup_elem() methods of below map types: - BPF_MAP_TYPE_PROG_ARRAY - BPF_MAP_TYPE_STACK_TRACE - BPF_MAP_TYPE_XSKMAP - BPF_MAP_TYPE_SOCKMAP/BPF_MAP_TYPE_SOCKHASH Signed-off-by: Prashant Bhole --- kernel/bpf/arraymap.c | 2 +- kernel/bpf/sockmap.c | 2 +- kernel/bpf/stackmap.c | 2 +- kernel/bpf/xskmap.c | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c index dded84cbe814..24583da9ffd1 100644 --- a/kernel/bpf/arraymap.c +++ b/kernel/bpf/arraymap.c @@ -449,7 +449,7 @@ static void fd_array_map_free(struct bpf_map *map) static void *fd_array_map_lookup_elem(struct bpf_map *map, void *key) { - return NULL; + return ERR_PTR(-EOPNOTSUPP); } /* only called from syscall */ diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c index d37a1a0a6e1e..5d0677d808ae 100644 --- a/kernel/bpf/sockmap.c +++ b/kernel/bpf/sockmap.c @@ -2096,7 +2096,7 @@ int sockmap_get_from_fd(const union bpf_attr *attr, int type, static void *sock_map_lookup(struct bpf_map *map, void *key) { - return NULL; + return ERR_PTR(-EOPNOTSUPP); } static int sock_map_update_elem(struct bpf_map *map, diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c index 8061a439ef18..b2ade10f7ec3 100644 --- a/kernel/bpf/stackmap.c +++ b/kernel/bpf/stackmap.c @@ -505,7 +505,7 @@ const struct bpf_func_proto bpf_get_stack_proto = { /* Called from eBPF program */ static void *stack_map_lookup_elem(struct bpf_map *map, void *key) { - return NULL; + return ERR_PTR(-EOPNOTSUPP); } /* Called from syscall */ diff --git a/kernel/bpf/xskmap.c b/kernel/bpf/xskmap.c index 9f8463afda9c..ef0b7b6ef8a5 100644 --- a/kernel/bpf/xskmap.c +++ b/kernel/bpf/xskmap.c @@ -154,7 +154,7 @@ void __xsk_map_flush(struct bpf_map *map) static void *xsk_map_lookup_elem(struct bpf_map *map, void *key) { - return NULL; + return ERR_PTR(-EOPNOTSUPP); } static int xsk_map_update_elem(struct bpf_map *map, void *key, void *value, -- 2.17.1
[RFC v2 bpf-next 5/5] selftests/bpf: verifier, check bpf_map_lookup_elem access in bpf prog
map_lookup_elem isn't supported by certain map types like: - BPF_MAP_TYPE_PROG_ARRAY - BPF_MAP_TYPE_STACK_TRACE - BPF_MAP_TYPE_XSKMAP - BPF_MAP_TYPE_SOCKMAP/BPF_MAP_TYPE_SOCKHASH Let's add verfier tests to check whether verifier prevents bpf_map_lookup_elem call on above programs from bpf program. Signed-off-by: Prashant Bhole --- tools/testing/selftests/bpf/test_verifier.c | 121 +++- 1 file changed, 120 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c index c7d25f23baf9..afa7e67f66e4 100644 --- a/tools/testing/selftests/bpf/test_verifier.c +++ b/tools/testing/selftests/bpf/test_verifier.c @@ -47,7 +47,7 @@ #define MAX_INSNS BPF_MAXINSNS #define MAX_FIXUPS 8 -#define MAX_NR_MAPS8 +#define MAX_NR_MAPS13 #define POINTER_VALUE 0xcafe4all #define TEST_DATA_LEN 64 @@ -64,6 +64,10 @@ struct bpf_test { int fixup_map2[MAX_FIXUPS]; int fixup_map3[MAX_FIXUPS]; int fixup_map4[MAX_FIXUPS]; + int fixup_map5[MAX_FIXUPS]; + int fixup_map6[MAX_FIXUPS]; + int fixup_map7[MAX_FIXUPS]; + int fixup_map8[MAX_FIXUPS]; int fixup_prog1[MAX_FIXUPS]; int fixup_prog2[MAX_FIXUPS]; int fixup_map_in_map[MAX_FIXUPS]; @@ -4391,6 +4395,85 @@ static struct bpf_test tests[] = { .errstr = "invalid access to packet", .prog_type = BPF_PROG_TYPE_SCHED_CLS, }, + { + "prevent map lookup in sockmap", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, +BPF_FUNC_map_lookup_elem), + BPF_EXIT_INSN(), + }, + .fixup_map5 = { 3 }, + .result = REJECT, + .errstr = "cannot pass map_type 15 into func bpf_map_lookup_elem", + .prog_type = BPF_PROG_TYPE_SOCK_OPS, + }, + { + "prevent map lookup in sockhash", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, +BPF_FUNC_map_lookup_elem), + BPF_EXIT_INSN(), + }, + .fixup_map6 = { 3 }, + .result = REJECT, + .errstr = "cannot pass map_type 18 into func bpf_map_lookup_elem", + .prog_type = BPF_PROG_TYPE_SOCK_OPS, + }, + { + "prevent map lookup in xskmap", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, +BPF_FUNC_map_lookup_elem), + BPF_EXIT_INSN(), + }, + .fixup_map7 = { 3 }, + .result = REJECT, + .errstr = "cannot pass map_type 17 into func bpf_map_lookup_elem", + .prog_type = BPF_PROG_TYPE_XDP, + }, + { + "prevent map lookup in stack trace", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, +BPF_FUNC_map_lookup_elem), + BPF_EXIT_INSN(), + }, + .fixup_map8 = { 3 }, + .result = REJECT, + .errstr = "cannot pass map_type 7 into func bpf_map_lookup_elem", + .prog_type = BPF_PROG_TYPE_PERF_EVENT, + }, + { + "prevent map lookup in prog array", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, +BPF_FUNC_map_lookup_e
[RFC v2 bpf-next 4/5] tools/bpf: bpftool, print strerror when map lookup error occurs
Since map lookup error can be ENOENT or EOPNOTSUPP, let's print strerror() as error message in normal and JSON output. This patch adds helper function print_entry_error() to print entry from lookup error occurs Example: Following example dumps a map which does not support lookup. Output before: root# bpftool map -jp dump id 40 [ "key": ["0x0a","0x00","0x00","0x00" ], "value": { "error": "can\'t lookup element" }, "key": ["0x0b","0x00","0x00","0x00" ], "value": { "error": "can\'t lookup element" } ] root# bpftool map dump id 40 can't lookup element with key: 0a 00 00 00 can't lookup element with key: 0b 00 00 00 Found 0 elements Output after changes: root# bpftool map dump -jp id 45 [ "key": ["0x0a","0x00","0x00","0x00" ], "value": { "error": "Operation not supported" }, "key": ["0x0b","0x00","0x00","0x00" ], "value": { "error": "Operation not supported" } ] root# bpftool map dump id 45 key: 0a 00 00 00 value: Operation not supported key: 0b 00 00 00 value: Operation not supported Found 0 elements Signed-off-by: Prashant Bhole --- tools/bpf/bpftool/map.c | 29 - 1 file changed, 24 insertions(+), 5 deletions(-) diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c index 28d365435fea..9f5de48f8a99 100644 --- a/tools/bpf/bpftool/map.c +++ b/tools/bpf/bpftool/map.c @@ -336,6 +336,25 @@ static void print_entry_json(struct bpf_map_info *info, unsigned char *key, jsonw_end_object(json_wtr); } +static void print_entry_error(struct bpf_map_info *info, unsigned char *key, + const char *value) +{ + int value_size = strlen(value); + bool single_line, break_names; + + break_names = info->key_size > 16 || value_size > 16; + single_line = info->key_size + value_size <= 24 && !break_names; + + printf("key:%c", break_names ? '\n' : ' '); + fprint_hex(stdout, key, info->key_size, " "); + + printf(single_line ? " " : "\n"); + + printf("value:%c%s", break_names ? '\n' : ' ', value); + + printf("\n"); +} + static void print_entry_plain(struct bpf_map_info *info, unsigned char *key, unsigned char *value) { @@ -663,6 +682,7 @@ static int dump_map_elem(int fd, void *key, void *value, json_writer_t *btf_wtr) { int num_elems = 0; + int lookup_errno; if (!bpf_map_lookup_elem(fd, key, value)) { if (json_output) { @@ -685,6 +705,8 @@ static int dump_map_elem(int fd, void *key, void *value, } /* lookup error handling */ + lookup_errno = errno; + if (map_is_map_of_maps(map_info->type) || map_is_map_of_progs(map_info->type)) return 0; @@ -694,13 +716,10 @@ static int dump_map_elem(int fd, void *key, void *value, print_hex_data_json(key, map_info->key_size); jsonw_name(json_wtr, "value"); jsonw_start_object(json_wtr); - jsonw_string_field(json_wtr, "error", - "can't lookup element"); + jsonw_string_field(json_wtr, "error", strerror(lookup_errno)); jsonw_end_object(json_wtr); } else { - p_info("can't lookup element with key: "); - fprint_hex(stderr, key, map_info->key_size, " "); - fprintf(stderr, "\n"); + print_entry_error(map_info, key, strerror(lookup_errno)); } return 0; -- 2.17.1
[RFC v2 bpf-next 3/5] tools/bpf: bpftool, split the function do_dump()
do_dump() function in bpftool/map.c has deep indentations. In order to reduce deep indent, let's move element printing code out of do_dump() into dump_map_elem() function. Signed-off-by: Prashant Bhole --- tools/bpf/bpftool/map.c | 83 - 1 file changed, 49 insertions(+), 34 deletions(-) diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c index 6003e9598973..28d365435fea 100644 --- a/tools/bpf/bpftool/map.c +++ b/tools/bpf/bpftool/map.c @@ -658,6 +658,54 @@ static int do_show(int argc, char **argv) return errno == ENOENT ? 0 : -1; } +static int dump_map_elem(int fd, void *key, void *value, +struct bpf_map_info *map_info, struct btf *btf, +json_writer_t *btf_wtr) +{ + int num_elems = 0; + + if (!bpf_map_lookup_elem(fd, key, value)) { + if (json_output) { + print_entry_json(map_info, key, value, btf); + } else { + if (btf) { + struct btf_dumper d = { + .btf = btf, + .jw = btf_wtr, + .is_plain_text = true, + }; + + do_dump_btf(, map_info, key, value); + } else { + print_entry_plain(map_info, key, value); + } + num_elems++; + } + return num_elems; + } + + /* lookup error handling */ + if (map_is_map_of_maps(map_info->type) || + map_is_map_of_progs(map_info->type)) + return 0; + + if (json_output) { + jsonw_name(json_wtr, "key"); + print_hex_data_json(key, map_info->key_size); + jsonw_name(json_wtr, "value"); + jsonw_start_object(json_wtr); + jsonw_string_field(json_wtr, "error", + "can't lookup element"); + jsonw_end_object(json_wtr); + } else { + p_info("can't lookup element with key: "); + fprint_hex(stderr, key, map_info->key_size, " "); + fprintf(stderr, "\n"); + } + + return 0; +} + static int do_dump(int argc, char **argv) { struct bpf_map_info info = {}; @@ -713,40 +761,7 @@ static int do_dump(int argc, char **argv) err = 0; break; } - - if (!bpf_map_lookup_elem(fd, key, value)) { - if (json_output) - print_entry_json(, key, value, btf); - else - if (btf) { - struct btf_dumper d = { - .btf = btf, - .jw = btf_wtr, - .is_plain_text = true, - }; - - do_dump_btf(, , key, value); - } else { - print_entry_plain(, key, value); - } - num_elems++; - } else if (!map_is_map_of_maps(info.type) && - !map_is_map_of_progs(info.type)) { - if (json_output) { - jsonw_name(json_wtr, "key"); - print_hex_data_json(key, info.key_size); - jsonw_name(json_wtr, "value"); - jsonw_start_object(json_wtr); - jsonw_string_field(json_wtr, "error", - "can't lookup element"); - jsonw_end_object(json_wtr); - } else { - p_info("can't lookup element with key: "); - fprint_hex(stderr, key, info.key_size, " "); - fprintf(stderr, "\n"); - } - } - + num_elems += dump_map_elem(fd, key, value, , btf, btf_wtr); prev_key = key; } -- 2.17.1
[RFC v2 bpf-next 1/5] bpf: error handling when map_lookup_elem isn't supported
The error value returned by map_lookup_elem doesn't differentiate whether lookup was failed because of invalid key or lookup is not supported. Lets add handling for -EOPNOTSUPP return value of map_lookup_elem() method of map, with expectation from map's implementation that it should return -EOPNOTSUPP if lookup is not supported. The errno for bpf syscall for BPF_MAP_LOOKUP_ELEM command will be set to EOPNOTSUPP if map lookup is not supported. Signed-off-by: Prashant Bhole --- kernel/bpf/syscall.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 5742df21598c..4f416234251f 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -719,10 +719,15 @@ static int map_lookup_elem(union bpf_attr *attr) } else { rcu_read_lock(); ptr = map->ops->map_lookup_elem(map, key); - if (ptr) + if (IS_ERR(ptr)) { + err = PTR_ERR(ptr); + } else if (!ptr) { + err = -ENOENT; + } else { + err = 0; memcpy(value, ptr, value_size); + } rcu_read_unlock(); - err = ptr ? 0 : -ENOENT; } if (err) -- 2.17.1
[RFC v2 bpf-next 0/5] Error handling when map lookup isn't supported
Currently when map a lookup fails, user space API can not make any distinction whether given key was not found or lookup is not supported by particular map. In this series we modify return value of maps which do not support lookup. Lookup on such map implementation will return -EOPNOTSUPP. bpf() syscall with BPF_MAP_LOOKUP_ELEM command will set EOPNOTSUPP errno. We also handle this error in bpftool to print appropriate message. Patch 1: adds handling of BPF_MAP_LOOKUP ELEM command of bpf syscall such that errno will set to EOPNOTSUPP when map doesn't support lookup Patch 2: Modifies the return value of map_lookup_elem() to EOPNOTSUPP for maps which do not support lookup Patch 3: Splits do_dump() in bpftool/map.c. Element printing code is moved out into new function dump_map_elem(). This was done in order to reduce deep indentation and accomodate further changes. Patch 4: Changes in bpftool to print strerror() message when lookup error is occured. This will result in appropriate message like "Operation not supported" when map doesn't support lookup. Patch 5: Added verifier tests to check whether verifier rejects call to bpf_map_lookup_elem from bpf program. For all map types those do not support map lookup. v2: - bpftool: all nit-pick fixes pointed out by Jakub - bpftool: removed usage of error strings. Now using strerror(), suggested by Jakub - added tests in verifier_tests, suggested by Alexei Prashant Bhole (5): bpf: error handling when map_lookup_elem isn't supported bpf: return EOPNOTSUPP when map lookup isn't supported tools/bpf: bpftool, split the function do_dump() tools/bpf: bpftool, print strerror when map lookup error occurs selftests/bpf: verifier, check bpf_map_lookup_elem access in bpf prog kernel/bpf/arraymap.c | 2 +- kernel/bpf/sockmap.c| 2 +- kernel/bpf/stackmap.c | 2 +- kernel/bpf/syscall.c| 9 +- kernel/bpf/xskmap.c | 2 +- tools/bpf/bpftool/map.c | 102 +++-- tools/testing/selftests/bpf/test_verifier.c | 121 +++- 7 files changed, 199 insertions(+), 41 deletions(-) -- 2.17.1
Re: [RFC bpf-next 4/4] tools/bpf: handle EOPNOTSUPP when map lookup is failed
On 9/21/2018 12:59 AM, Jakub Kicinski wrote: On Thu, 20 Sep 2018 14:04:19 +0900, Prashant Bhole wrote: On 9/20/2018 12:29 AM, Jakub Kicinski wrote: On Wed, 19 Sep 2018 16:51:43 +0900, Prashant Bhole wrote: Let's add a check for EOPNOTSUPP error when map lookup is failed. Also in case map doesn't support lookup, the output of map dump is changed from "can't lookup element" to "lookup not supported for this map". Patch adds function print_entry_error() function to print the error value. Following example dumps a map which does not support lookup. Output before: root# bpftool map -jp dump id 40 [ "key": ["0x0a","0x00","0x00","0x00" ], "value": { "error": "can\'t lookup element" }, "key": ["0x0b","0x00","0x00","0x00" ], "value": { "error": "can\'t lookup element" } ] root# bpftool map dump id 40 can't lookup element with key: 0a 00 00 00 can't lookup element with key: 0b 00 00 00 Found 0 elements Output after changes: root# bpftool map dump -jp id 45 [ "key": ["0x0a","0x00","0x00","0x00" ], "value": { "error": "lookup not supported for this map" }, "key": ["0x0b","0x00","0x00","0x00" ], "value": { "error": "lookup not supported for this map" } ] root# bpftool map dump id 45 key: 0a 00 00 00 value: lookup not supported for this map key: 0b 00 00 00 value: lookup not supported for this map Found 0 elements Nice improvement, thanks for the changes! I wonder what your thoughts would be on just printing some form of "lookup not supported for this map" only once? It seems slightly like repeated information - if lookup is not supported for one key it likely won't be for other keys too, so we could shorten the output. Would that make sense? Signed-off-by: Prashant Bhole --- tools/bpf/bpftool/main.h | 5 + tools/bpf/bpftool/map.c | 35 ++- 2 files changed, 35 insertions(+), 5 deletions(-) diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h index 40492cdc4e53..1a8c683f949b 100644 --- a/tools/bpf/bpftool/main.h +++ b/tools/bpf/bpftool/main.h @@ -46,6 +46,11 @@ #include "json_writer.h" +#define ERR_CANNOT_LOOKUP \ + "can't lookup element" +#define ERR_LOOKUP_NOT_SUPPORTED \ + "lookup not supported for this map" Do we need these? Are we going to reused them in more parts of the code? These are used only once. These can be used in do_lookup(). Currently do_lookup() prints strerror(errno) when lookup is failed. Shall I change that do_lookup() output? I actually prefer to stick to strerror(), the standard errors more clearly correlate with what happened in my mind (i.e. "Operation not supported" == kernel sent EOPNOTSUPP). strerror() may also print in local language if translation/localization matters. We could even use strerr() in dump_map_elem() but up to you. The one in do_lookup() I'd prefer to leave be ;) Sorry for the late reply. In v2 I have removed the error strings altogether. As you suggested output will be strerror(). Also added verifier tests as Alexei suggested. I am sending the RFC-v2 series soon. Thanks. -Prashant
[PATCH bpf-next] samples/bpf: fix compilation failure
following commit: commit d58e468b1112 ("flow_dissector: implements flow dissector BPF hook") added struct bpf_flow_keys which conflicts with the struct with same name in sockex2_kern.c and sockex3_kern.c similar to commit: commit 534e0e52bc23 ("samples/bpf: fix a compilation failure") we tried the rename it "flow_keys" but it also conflicted with struct having same name in include/net/flow_dissector.h. Hence renaming the struct to "flow_key_record". Also, this commit doesn't fix the compilation error completely because the similar struct is present in sockex3_kern.c. Hence renaming it in both files sockex3_user.c and sockex3_kern.c Signed-off-by: Prashant Bhole --- samples/bpf/sockex2_kern.c | 11 ++- samples/bpf/sockex3_kern.c | 8 samples/bpf/sockex3_user.c | 4 ++-- 3 files changed, 12 insertions(+), 11 deletions(-) diff --git a/samples/bpf/sockex2_kern.c b/samples/bpf/sockex2_kern.c index f58acfc92556..f2f9dbc021b0 100644 --- a/samples/bpf/sockex2_kern.c +++ b/samples/bpf/sockex2_kern.c @@ -14,7 +14,7 @@ struct vlan_hdr { __be16 h_vlan_encapsulated_proto; }; -struct bpf_flow_keys { +struct flow_key_record { __be32 src; __be32 dst; union { @@ -59,7 +59,7 @@ static inline __u32 ipv6_addr_hash(struct __sk_buff *ctx, __u64 off) } static inline __u64 parse_ip(struct __sk_buff *skb, __u64 nhoff, __u64 *ip_proto, -struct bpf_flow_keys *flow) +struct flow_key_record *flow) { __u64 verlen; @@ -83,7 +83,7 @@ static inline __u64 parse_ip(struct __sk_buff *skb, __u64 nhoff, __u64 *ip_proto } static inline __u64 parse_ipv6(struct __sk_buff *skb, __u64 nhoff, __u64 *ip_proto, - struct bpf_flow_keys *flow) + struct flow_key_record *flow) { *ip_proto = load_byte(skb, nhoff + offsetof(struct ipv6hdr, nexthdr)); @@ -96,7 +96,8 @@ static inline __u64 parse_ipv6(struct __sk_buff *skb, __u64 nhoff, __u64 *ip_pro return nhoff; } -static inline bool flow_dissector(struct __sk_buff *skb, struct bpf_flow_keys *flow) +static inline bool flow_dissector(struct __sk_buff *skb, + struct flow_key_record *flow) { __u64 nhoff = ETH_HLEN; __u64 ip_proto; @@ -198,7 +199,7 @@ struct bpf_map_def SEC("maps") hash_map = { SEC("socket2") int bpf_prog2(struct __sk_buff *skb) { - struct bpf_flow_keys flow = {}; + struct flow_key_record flow = {}; struct pair *value; u32 key; diff --git a/samples/bpf/sockex3_kern.c b/samples/bpf/sockex3_kern.c index 95907f8d2b17..c527b57d3ec8 100644 --- a/samples/bpf/sockex3_kern.c +++ b/samples/bpf/sockex3_kern.c @@ -61,7 +61,7 @@ struct vlan_hdr { __be16 h_vlan_encapsulated_proto; }; -struct bpf_flow_keys { +struct flow_key_record { __be32 src; __be32 dst; union { @@ -88,7 +88,7 @@ static inline __u32 ipv6_addr_hash(struct __sk_buff *ctx, __u64 off) } struct globals { - struct bpf_flow_keys flow; + struct flow_key_record flow; }; struct bpf_map_def SEC("maps") percpu_map = { @@ -114,14 +114,14 @@ struct pair { struct bpf_map_def SEC("maps") hash_map = { .type = BPF_MAP_TYPE_HASH, - .key_size = sizeof(struct bpf_flow_keys), + .key_size = sizeof(struct flow_key_record), .value_size = sizeof(struct pair), .max_entries = 1024, }; static void update_stats(struct __sk_buff *skb, struct globals *g) { - struct bpf_flow_keys key = g->flow; + struct flow_key_record key = g->flow; struct pair *value; value = bpf_map_lookup_elem(_map, ); diff --git a/samples/bpf/sockex3_user.c b/samples/bpf/sockex3_user.c index 22f74d0e1493..9d02e0404719 100644 --- a/samples/bpf/sockex3_user.c +++ b/samples/bpf/sockex3_user.c @@ -13,7 +13,7 @@ #define PARSE_IP_PROG_FD (prog_fd[0]) #define PROG_ARRAY_FD (map_fd[0]) -struct flow_keys { +struct flow_key_record { __be32 src; __be32 dst; union { @@ -64,7 +64,7 @@ int main(int argc, char **argv) (void) f; for (i = 0; i < 5; i++) { - struct flow_keys key = {}, next_key; + struct flow_key_record key = {}, next_key; struct pair value; sleep(1); -- 2.17.1
Re: [RFC bpf-next 4/4] tools/bpf: handle EOPNOTSUPP when map lookup is failed
On 9/20/2018 12:29 AM, Jakub Kicinski wrote: On Wed, 19 Sep 2018 16:51:43 +0900, Prashant Bhole wrote: Let's add a check for EOPNOTSUPP error when map lookup is failed. Also in case map doesn't support lookup, the output of map dump is changed from "can't lookup element" to "lookup not supported for this map". Patch adds function print_entry_error() function to print the error value. Following example dumps a map which does not support lookup. Output before: root# bpftool map -jp dump id 40 [ "key": ["0x0a","0x00","0x00","0x00" ], "value": { "error": "can\'t lookup element" }, "key": ["0x0b","0x00","0x00","0x00" ], "value": { "error": "can\'t lookup element" } ] root# bpftool map dump id 40 can't lookup element with key: 0a 00 00 00 can't lookup element with key: 0b 00 00 00 Found 0 elements Output after changes: root# bpftool map dump -jp id 45 [ "key": ["0x0a","0x00","0x00","0x00" ], "value": { "error": "lookup not supported for this map" }, "key": ["0x0b","0x00","0x00","0x00" ], "value": { "error": "lookup not supported for this map" } ] root# bpftool map dump id 45 key: 0a 00 00 00 value: lookup not supported for this map key: 0b 00 00 00 value: lookup not supported for this map Found 0 elements Nice improvement, thanks for the changes! I wonder what your thoughts would be on just printing some form of "lookup not supported for this map" only once? It seems slightly like repeated information - if lookup is not supported for one key it likely won't be for other keys too, so we could shorten the output. Would that make sense? Signed-off-by: Prashant Bhole --- tools/bpf/bpftool/main.h | 5 + tools/bpf/bpftool/map.c | 35 ++- 2 files changed, 35 insertions(+), 5 deletions(-) diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h index 40492cdc4e53..1a8c683f949b 100644 --- a/tools/bpf/bpftool/main.h +++ b/tools/bpf/bpftool/main.h @@ -46,6 +46,11 @@ #include "json_writer.h" +#define ERR_CANNOT_LOOKUP \ + "can't lookup element" +#define ERR_LOOKUP_NOT_SUPPORTED \ + "lookup not supported for this map" Do we need these? Are we going to reused them in more parts of the code? These are used only once. These can be used in do_lookup(). Currently do_lookup() prints strerror(errno) when lookup is failed. Shall I change that do_lookup() output? -Prashant
Re: [RFC bpf-next 4/4] tools/bpf: handle EOPNOTSUPP when map lookup is failed
On 9/20/2018 12:29 AM, Jakub Kicinski wrote: On Wed, 19 Sep 2018 16:51:43 +0900, Prashant Bhole wrote: Let's add a check for EOPNOTSUPP error when map lookup is failed. Also in case map doesn't support lookup, the output of map dump is changed from "can't lookup element" to "lookup not supported for this map". Patch adds function print_entry_error() function to print the error value. Following example dumps a map which does not support lookup. Output before: root# bpftool map -jp dump id 40 [ "key": ["0x0a","0x00","0x00","0x00" ], "value": { "error": "can\'t lookup element" }, "key": ["0x0b","0x00","0x00","0x00" ], "value": { "error": "can\'t lookup element" } ] root# bpftool map dump id 40 can't lookup element with key: 0a 00 00 00 can't lookup element with key: 0b 00 00 00 Found 0 elements Output after changes: root# bpftool map dump -jp id 45 [ "key": ["0x0a","0x00","0x00","0x00" ], "value": { "error": "lookup not supported for this map" }, "key": ["0x0b","0x00","0x00","0x00" ], "value": { "error": "lookup not supported for this map" } ] root# bpftool map dump id 45 key: 0a 00 00 00 value: lookup not supported for this map key: 0b 00 00 00 value: lookup not supported for this map Found 0 elements Nice improvement, thanks for the changes! I wonder what your thoughts would be on just printing some form of "lookup not supported for this map" only once? It seems slightly like repeated information - if lookup is not supported for one key it likely won't be for other keys too, so we could shorten the output. Would that make sense? I too thought that the message is repeated information. One idea was as you said, stop iterating once we get EOPNOTSUPP. But only reason for keeping this way is that user will at least see dump of keys along with it. Also it will be consistent with Json output. What is your opinion on it? I will fix other things that you pointed out. Thanks! -Prashant Signed-off-by: Prashant Bhole --- tools/bpf/bpftool/main.h | 5 + tools/bpf/bpftool/map.c | 35 ++- 2 files changed, 35 insertions(+), 5 deletions(-) diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h index 40492cdc4e53..1a8c683f949b 100644 --- a/tools/bpf/bpftool/main.h +++ b/tools/bpf/bpftool/main.h @@ -46,6 +46,11 @@ #include "json_writer.h" +#define ERR_CANNOT_LOOKUP \ + "can't lookup element" +#define ERR_LOOKUP_NOT_SUPPORTED \ + "lookup not supported for this map" Do we need these? Are we going to reused them in more parts of the code? #define ptr_to_u64(ptr) ((__u64)(unsigned long)(ptr)) #define NEXT_ARG() ({ argc--; argv++; if (argc < 0) usage(); }) diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c index 284e12a289c0..2faccd2098c9 100644 --- a/tools/bpf/bpftool/map.c +++ b/tools/bpf/bpftool/map.c @@ -333,6 +333,25 @@ static void print_entry_json(struct bpf_map_info *info, unsigned char *key, jsonw_end_object(json_wtr); } +static void print_entry_error(struct bpf_map_info *info, unsigned char *key, + char *value) +{ + bool single_line, break_names; + int value_size = strlen(value); nit: order variables declaration lines to shortest, please. + + break_names = info->key_size > 16 || value_size > 16; + single_line = info->key_size + value_size <= 24 && !break_names; + + printf("key:%c", break_names ? '\n' : ' '); + fprint_hex(stdout, key, info->key_size, " "); + + printf(single_line ? " " : "\n"); + + printf("value:%c%s", break_names ? '\n' : ' ', value); + + printf("\n"); +} + static void print_entry_plain(struct bpf_map_info *info, unsigned char *key, unsigned char *value) { @@ -660,6 +679,8 @@ static int dump_map_elem(int fd, void *key, void *value, json_writer_t *btf_wtr) { int num_elems = 0; + int lookup_errno; + char *errstr; nit: const char? if (!bpf_map_lookup_elem(fd, key, value)) { if (json_output) {
Re: [RFC bpf-next 3/4] tools/bpf: bpftool, split the function do_dump()
On 9/20/2018 12:26 AM, Jakub Kicinski wrote: On Wed, 19 Sep 2018 16:51:42 +0900, Prashant Bhole wrote: +static int dump_map_elem(int fd, void *key, void *value, +struct bpf_map_info *map_info, struct btf *btf, +json_writer_t *btf_wtr) +{ + int num_elems = 0; + + if (!bpf_map_lookup_elem(fd, key, value)) { + if (json_output) { + print_entry_json(map_info, key, value, btf); + } else { + if (btf) { + struct btf_dumper d = { + .btf = btf, + .jw = btf_wtr, + .is_plain_text = true, + }; + + do_dump_btf(, map_info, key, value); + } else { + print_entry_plain(map_info, key, value); + } + num_elems++; + } + goto out; + } + + /* lookup error handling */ + if (map_is_map_of_maps(map_info->type) || + map_is_map_of_progs(map_info->type)) + goto out; + nit: why not just return? the goto seems to only do a return anyway, is this suggested by some coding style? Is it to help the compiler? I see people do this from time to time.. Thanks for reviewing. I agree, goto and the label isn't needed. I will fix it. -Prashant [...] +out: + return num_elems;
Re: [RFC bpf-next 2/4] bpf: return EOPNOTSUPP when map lookup isn't supported
On 9/20/2018 12:14 AM, Alexei Starovoitov wrote: On Wed, Sep 19, 2018 at 04:51:41PM +0900, Prashant Bhole wrote: Return ERR_PTR(-EOPNOTSUPP) from map_lookup_elem() methods of below map types: - BPF_MAP_TYPE_PROG_ARRAY - BPF_MAP_TYPE_STACK_TRACE - BPF_MAP_TYPE_XSKMAP - BPF_MAP_TYPE_SOCKMAP/BPF_MAP_TYPE_SOCKHASH Signed-off-by: Prashant Bhole --- kernel/bpf/arraymap.c | 2 +- kernel/bpf/sockmap.c | 2 +- kernel/bpf/stackmap.c | 2 +- kernel/bpf/xskmap.c | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c index dded84cbe814..24583da9ffd1 100644 --- a/kernel/bpf/arraymap.c +++ b/kernel/bpf/arraymap.c @@ -449,7 +449,7 @@ static void fd_array_map_free(struct bpf_map *map) static void *fd_array_map_lookup_elem(struct bpf_map *map, void *key) { - return NULL; + return ERR_PTR(-EOPNOTSUPP); } conceptually the set looks good to me. Please add a test to test_verifier.c to make sure that these lookup helpers cannot be called from BPF program. Otherwise this diff may cause crashes. Thanks for reviewing. Is the verifier change below sufficient? --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -2128,10 +2128,18 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env, if (env->subprog_cnt > 1) { verbose(env, "tail_calls are not allowed in programs with bpf-to-bpf calls\n"); return -EINVAL; } break; + case BPF_FUNC_map_lookup_elem: + if (map->map_type == BPF_MAP_TYPE_PROG_ARRAY || + map->map_type == BPF_MAP_TYPE_STACK_TRACE || + map->map_type == BPF_MAP_TYPE_XSKMAP || + map->map_type == BPF_MAP_TYPE_SOCKMAP || + map->map_type == BPF_MAP_TYPE_SOCKHASH) + goto error; + break; case BPF_FUNC_perf_event_read: case BPF_FUNC_perf_event_output: case BPF_FUNC_perf_event_read_value: if (map->map_type != BPF_MAP_TYPE_PERF_EVENT_ARRAY) goto error; -Prashant
Re: [RFC bpf-next 2/4] bpf: return EOPNOTSUPP when map lookup isn't supported
On 9/20/2018 3:40 AM, Mauricio Vasquez wrote: On 09/19/2018 10:14 AM, Alexei Starovoitov wrote: On Wed, Sep 19, 2018 at 04:51:41PM +0900, Prashant Bhole wrote: Return ERR_PTR(-EOPNOTSUPP) from map_lookup_elem() methods of below map types: - BPF_MAP_TYPE_PROG_ARRAY - BPF_MAP_TYPE_STACK_TRACE - BPF_MAP_TYPE_XSKMAP - BPF_MAP_TYPE_SOCKMAP/BPF_MAP_TYPE_SOCKHASH Signed-off-by: Prashant Bhole --- kernel/bpf/arraymap.c | 2 +- kernel/bpf/sockmap.c | 2 +- kernel/bpf/stackmap.c | 2 +- kernel/bpf/xskmap.c | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c index dded84cbe814..24583da9ffd1 100644 --- a/kernel/bpf/arraymap.c +++ b/kernel/bpf/arraymap.c @@ -449,7 +449,7 @@ static void fd_array_map_free(struct bpf_map *map) static void *fd_array_map_lookup_elem(struct bpf_map *map, void *key) { - return NULL; + return ERR_PTR(-EOPNOTSUPP); } conceptually the set looks good to me. Please add a test to test_verifier.c to make sure that these lookup helpers cannot be called from BPF program. Otherwise this diff may cause crashes. I think we can remove all these stub functions that return EOPNOTSUPP or EINVAL and return the error in syscall.c if ops->map_[update, delete, lookup, ...] is null. This will require to extend (and test) the verifier to guarantee that those helpers are not called in wrong maps, for example map_delete_element in array like maps. Would it make sense? Thanks for review and suggestion. I had thought about this way too (except adding restrictions in the verifier). There is no strong reason for choosing current implementation. I thought there must be some reason that those methods are implemented and just returning NULL. Also there are no NULL checks for map_lookup_elem stub. So I decided to simply change the return value. If some more people agree with removing stub function idea, I will do it. -Prashant
[RFC bpf-next 1/4] bpf: error handling when map_lookup_elem isn't supported
The error value returned by map_lookup_elem doesn't differentiate whether lookup was failed because of invalid key or lookup is not supported. Lets add handling for -EOPNOTSUPP return value of map_lookup_elem() method of map, with expectation from map's implementation that it should return -EOPNOTSUPP if lookup is not supported. The errno for bpf syscall for BPF_MAP_LOOKUP_ELEM command will be set to EOPNOTSUPP if map lookup is not supported. Signed-off-by: Prashant Bhole --- kernel/bpf/syscall.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index b3c2d09bcf7a..ecb06352b5a0 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -716,10 +716,15 @@ static int map_lookup_elem(union bpf_attr *attr) } else { rcu_read_lock(); ptr = map->ops->map_lookup_elem(map, key); - if (ptr) + if (IS_ERR(ptr)) { + err = PTR_ERR(ptr); + } else if (!ptr) { + err = -ENOENT; + } else { + err = 0; memcpy(value, ptr, value_size); + } rcu_read_unlock(); - err = ptr ? 0 : -ENOENT; } if (err) -- 2.17.1
[RFC bpf-next 3/4] tools/bpf: bpftool, split the function do_dump()
do_dump() function in bpftool/map.c has deep indentations. In order to reduce deep indent, let's move element printing code out of do_dump() into dump_map_elem() function. Signed-off-by: Prashant Bhole --- tools/bpf/bpftool/map.c | 83 - 1 file changed, 49 insertions(+), 34 deletions(-) diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c index af8ad32fa6e9..284e12a289c0 100644 --- a/tools/bpf/bpftool/map.c +++ b/tools/bpf/bpftool/map.c @@ -655,6 +655,54 @@ static int do_show(int argc, char **argv) return errno == ENOENT ? 0 : -1; } +static int dump_map_elem(int fd, void *key, void *value, +struct bpf_map_info *map_info, struct btf *btf, +json_writer_t *btf_wtr) +{ + int num_elems = 0; + + if (!bpf_map_lookup_elem(fd, key, value)) { + if (json_output) { + print_entry_json(map_info, key, value, btf); + } else { + if (btf) { + struct btf_dumper d = { + .btf = btf, + .jw = btf_wtr, + .is_plain_text = true, + }; + + do_dump_btf(, map_info, key, value); + } else { + print_entry_plain(map_info, key, value); + } + num_elems++; + } + goto out; + } + + /* lookup error handling */ + if (map_is_map_of_maps(map_info->type) || + map_is_map_of_progs(map_info->type)) + goto out; + + if (json_output) { + jsonw_name(json_wtr, "key"); + print_hex_data_json(key, map_info->key_size); + jsonw_name(json_wtr, "value"); + jsonw_start_object(json_wtr); + jsonw_string_field(json_wtr, "error", + "can't lookup element"); + jsonw_end_object(json_wtr); + } else { + p_info("can't lookup element with key: "); + fprint_hex(stderr, key, map_info->key_size, " "); + fprintf(stderr, "\n"); + } +out: + return num_elems; +} + static int do_dump(int argc, char **argv) { struct bpf_map_info info = {}; @@ -710,40 +758,7 @@ static int do_dump(int argc, char **argv) err = 0; break; } - - if (!bpf_map_lookup_elem(fd, key, value)) { - if (json_output) - print_entry_json(, key, value, btf); - else - if (btf) { - struct btf_dumper d = { - .btf = btf, - .jw = btf_wtr, - .is_plain_text = true, - }; - - do_dump_btf(, , key, value); - } else { - print_entry_plain(, key, value); - } - num_elems++; - } else if (!map_is_map_of_maps(info.type) && - !map_is_map_of_progs(info.type)) { - if (json_output) { - jsonw_name(json_wtr, "key"); - print_hex_data_json(key, info.key_size); - jsonw_name(json_wtr, "value"); - jsonw_start_object(json_wtr); - jsonw_string_field(json_wtr, "error", - "can't lookup element"); - jsonw_end_object(json_wtr); - } else { - p_info("can't lookup element with key: "); - fprint_hex(stderr, key, info.key_size, " "); - fprintf(stderr, "\n"); - } - } - + num_elems += dump_map_elem(fd, key, value, , btf, btf_wtr); prev_key = key; } -- 2.17.1
[RFC bpf-next 4/4] tools/bpf: handle EOPNOTSUPP when map lookup is failed
Let's add a check for EOPNOTSUPP error when map lookup is failed. Also in case map doesn't support lookup, the output of map dump is changed from "can't lookup element" to "lookup not supported for this map". Patch adds function print_entry_error() function to print the error value. Following example dumps a map which does not support lookup. Output before: root# bpftool map -jp dump id 40 [ "key": ["0x0a","0x00","0x00","0x00" ], "value": { "error": "can\'t lookup element" }, "key": ["0x0b","0x00","0x00","0x00" ], "value": { "error": "can\'t lookup element" } ] root# bpftool map dump id 40 can't lookup element with key: 0a 00 00 00 can't lookup element with key: 0b 00 00 00 Found 0 elements Output after changes: root# bpftool map dump -jp id 45 [ "key": ["0x0a","0x00","0x00","0x00" ], "value": { "error": "lookup not supported for this map" }, "key": ["0x0b","0x00","0x00","0x00" ], "value": { "error": "lookup not supported for this map" } ] root# bpftool map dump id 45 key: 0a 00 00 00 value: lookup not supported for this map key: 0b 00 00 00 value: lookup not supported for this map Found 0 elements Signed-off-by: Prashant Bhole --- tools/bpf/bpftool/main.h | 5 + tools/bpf/bpftool/map.c | 35 ++- 2 files changed, 35 insertions(+), 5 deletions(-) diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h index 40492cdc4e53..1a8c683f949b 100644 --- a/tools/bpf/bpftool/main.h +++ b/tools/bpf/bpftool/main.h @@ -46,6 +46,11 @@ #include "json_writer.h" +#define ERR_CANNOT_LOOKUP \ + "can't lookup element" +#define ERR_LOOKUP_NOT_SUPPORTED \ + "lookup not supported for this map" + #define ptr_to_u64(ptr)((__u64)(unsigned long)(ptr)) #define NEXT_ARG() ({ argc--; argv++; if (argc < 0) usage(); }) diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c index 284e12a289c0..2faccd2098c9 100644 --- a/tools/bpf/bpftool/map.c +++ b/tools/bpf/bpftool/map.c @@ -333,6 +333,25 @@ static void print_entry_json(struct bpf_map_info *info, unsigned char *key, jsonw_end_object(json_wtr); } +static void print_entry_error(struct bpf_map_info *info, unsigned char *key, + char *value) +{ + bool single_line, break_names; + int value_size = strlen(value); + + break_names = info->key_size > 16 || value_size > 16; + single_line = info->key_size + value_size <= 24 && !break_names; + + printf("key:%c", break_names ? '\n' : ' '); + fprint_hex(stdout, key, info->key_size, " "); + + printf(single_line ? " " : "\n"); + + printf("value:%c%s", break_names ? '\n' : ' ', value); + + printf("\n"); +} + static void print_entry_plain(struct bpf_map_info *info, unsigned char *key, unsigned char *value) { @@ -660,6 +679,8 @@ static int dump_map_elem(int fd, void *key, void *value, json_writer_t *btf_wtr) { int num_elems = 0; + int lookup_errno; + char *errstr; if (!bpf_map_lookup_elem(fd, key, value)) { if (json_output) { @@ -682,22 +703,26 @@ static int dump_map_elem(int fd, void *key, void *value, } /* lookup error handling */ + lookup_errno = errno; + if (map_is_map_of_maps(map_info->type) || map_is_map_of_progs(map_info->type)) goto out; + if (lookup_errno == EOPNOTSUPP) + errstr = ERR_LOOKUP_NOT_SUPPORTED; + else + errstr = ERR_CANNOT_LOOKUP; + if (json_output) { jsonw_name(json_wtr, "key"); print_hex_data_json(key, map_info->key_size); jsonw_name(json_wtr, "value"); jsonw_start_object(json_wtr); - jsonw_string_field(json_wtr, "error", - "can't lookup element"); + jsonw_string_field(json_wtr, "error", errstr); jsonw_end_object(json_wtr); } else { - p_info("can't lookup element with key: "); - fprint_hex(stderr, key, map_info->key_size, " "); - fprintf(stderr, "\n"); + print_entry_error(map_info, key, errstr); } out: return num_elems; -- 2.17.1
[RFC bpf-next 2/4] bpf: return EOPNOTSUPP when map lookup isn't supported
Return ERR_PTR(-EOPNOTSUPP) from map_lookup_elem() methods of below map types: - BPF_MAP_TYPE_PROG_ARRAY - BPF_MAP_TYPE_STACK_TRACE - BPF_MAP_TYPE_XSKMAP - BPF_MAP_TYPE_SOCKMAP/BPF_MAP_TYPE_SOCKHASH Signed-off-by: Prashant Bhole --- kernel/bpf/arraymap.c | 2 +- kernel/bpf/sockmap.c | 2 +- kernel/bpf/stackmap.c | 2 +- kernel/bpf/xskmap.c | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c index dded84cbe814..24583da9ffd1 100644 --- a/kernel/bpf/arraymap.c +++ b/kernel/bpf/arraymap.c @@ -449,7 +449,7 @@ static void fd_array_map_free(struct bpf_map *map) static void *fd_array_map_lookup_elem(struct bpf_map *map, void *key) { - return NULL; + return ERR_PTR(-EOPNOTSUPP); } /* only called from syscall */ diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c index 488ef9663c01..e50922a802f7 100644 --- a/kernel/bpf/sockmap.c +++ b/kernel/bpf/sockmap.c @@ -2076,7 +2076,7 @@ int sockmap_get_from_fd(const union bpf_attr *attr, int type, static void *sock_map_lookup(struct bpf_map *map, void *key) { - return NULL; + return ERR_PTR(-EOPNOTSUPP); } static int sock_map_update_elem(struct bpf_map *map, diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c index 8061a439ef18..b2ade10f7ec3 100644 --- a/kernel/bpf/stackmap.c +++ b/kernel/bpf/stackmap.c @@ -505,7 +505,7 @@ const struct bpf_func_proto bpf_get_stack_proto = { /* Called from eBPF program */ static void *stack_map_lookup_elem(struct bpf_map *map, void *key) { - return NULL; + return ERR_PTR(-EOPNOTSUPP); } /* Called from syscall */ diff --git a/kernel/bpf/xskmap.c b/kernel/bpf/xskmap.c index 9f8463afda9c..ef0b7b6ef8a5 100644 --- a/kernel/bpf/xskmap.c +++ b/kernel/bpf/xskmap.c @@ -154,7 +154,7 @@ void __xsk_map_flush(struct bpf_map *map) static void *xsk_map_lookup_elem(struct bpf_map *map, void *key) { - return NULL; + return ERR_PTR(-EOPNOTSUPP); } static int xsk_map_update_elem(struct bpf_map *map, void *key, void *value, -- 2.17.1
[RFC bpf-next 0/4] Error handling when map lookup isn't supported
Currently when map a lookup is failed, user space API can not make any distinction whether given key was not found or lookup is not supported by particular map. In this series we modify return value of maps which do not support lookup. Lookup on such map implementation will return -EOPNOTSUPP. bpf() syscall with BPF_MAP_LOOKUP_ELEM command will set EOPNOTSUPP errno. We also handle this error in bpftool to print appropriate message. Patch 1: adds handling of BPF_MAP_LOOKUP ELEM command of bpf syscall such that errno will set to EOPNOTSUPP when map doesn't support lookup Patch 2: Modifies the return value of map_lookup_elem() to EOPNOTSUPP for maps which do not support lookup Patch 3: Splits do_dump() in bpftool/map.c. Element printing code is moved out into new function dump_map_elem(). This was done in order to reduce deep indentation and accomodate further changes. Patch 4: Changes to bpftool to do additional checking for errno when map lookup is failed. In case of EOPNOTSUPP errno, it prints message "lookup not supported for this map" Prashant Bhole (4): bpf: error handling when map_lookup_elem isn't supported bpf: return EOPNOTSUPP when map lookup isn't supported tools/bpf: bpftool, split the function do_dump() tools/bpf: handle EOPNOTSUPP when map lookup is failed kernel/bpf/arraymap.c| 2 +- kernel/bpf/sockmap.c | 2 +- kernel/bpf/stackmap.c| 2 +- kernel/bpf/syscall.c | 9 +++- kernel/bpf/xskmap.c | 2 +- tools/bpf/bpftool/main.h | 5 ++ tools/bpf/bpftool/map.c | 108 +++ 7 files changed, 90 insertions(+), 40 deletions(-) -- 2.17.1
[PATCH bpf-next] tools/bpf: bpftool, add xskmap in map types
When listed all maps, bpftool currently shows (null) for xskmap. Added xskmap type in map_type_name[] to show correct type. Signed-off-by: Prashant Bhole --- tools/bpf/bpftool/map.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c index df175bc33c5d..9c55077ca5dd 100644 --- a/tools/bpf/bpftool/map.c +++ b/tools/bpf/bpftool/map.c @@ -68,6 +68,7 @@ static const char * const map_type_name[] = { [BPF_MAP_TYPE_DEVMAP] = "devmap", [BPF_MAP_TYPE_SOCKMAP] = "sockmap", [BPF_MAP_TYPE_CPUMAP] = "cpumap", + [BPF_MAP_TYPE_XSKMAP] = "xskmap", [BPF_MAP_TYPE_SOCKHASH] = "sockhash", [BPF_MAP_TYPE_CGROUP_STORAGE] = "cgroup_storage", }; -- 2.17.1
[PATCH bpf-next] samples/bpf: xdpsock, minor fixes
- xsks_map size was fixed to 4, changed it MAX_SOCKS - Remove redundant definition of MAX_SOCKS in xdpsock_user.c - In dump_stats(), add NULL check for xsks[i] Signed-off-by: Prashant Bhole --- samples/bpf/xdpsock_kern.c | 2 +- samples/bpf/xdpsock_user.c | 3 +-- 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/samples/bpf/xdpsock_kern.c b/samples/bpf/xdpsock_kern.c index d8806c41362e..b8ccd0802b3f 100644 --- a/samples/bpf/xdpsock_kern.c +++ b/samples/bpf/xdpsock_kern.c @@ -16,7 +16,7 @@ struct bpf_map_def SEC("maps") xsks_map = { .type = BPF_MAP_TYPE_XSKMAP, .key_size = sizeof(int), .value_size = sizeof(int), - .max_entries = 4, + .max_entries = MAX_SOCKS, }; struct bpf_map_def SEC("maps") rr_map = { diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c index b3906111bedb..57ecadc58403 100644 --- a/samples/bpf/xdpsock_user.c +++ b/samples/bpf/xdpsock_user.c @@ -118,7 +118,6 @@ struct xdpsock { unsigned long prev_tx_npkts; }; -#define MAX_SOCKS 4 static int num_socks; struct xdpsock *xsks[MAX_SOCKS]; @@ -596,7 +595,7 @@ static void dump_stats(void) prev_time = now; - for (i = 0; i < num_socks; i++) { + for (i = 0; i < num_socks && xsks[i]; i++) { char *fmt = "%-15s %'-11.0f %'-11lu\n"; double rx_pps, tx_pps; -- 2.17.1
[PATCH bpf-next] xsk: remove unnecessary assignment
Since xdp_umem_query() was added one assignment of bpf.command was missed from cleanup. Removing the assignment statement. Fixes: 84c6b86875e01a0 ("xsk: don't allow umem replace at stack level") Signed-off-by: Prashant Bhole --- net/xdp/xdp_umem.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c index bfe2dbea480b..d179732617dc 100644 --- a/net/xdp/xdp_umem.c +++ b/net/xdp/xdp_umem.c @@ -76,8 +76,6 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev, if (!dev->netdev_ops->ndo_bpf || !dev->netdev_ops->ndo_xsk_async_xmit) return force_zc ? -EOPNOTSUPP : 0; /* fail or fallback */ - bpf.command = XDP_QUERY_XSK_UMEM; - rtnl_lock(); err = xdp_umem_query(dev, queue_id); if (err) { -- 2.17.1
[PATCH bpf] xsk: fix return value of xdp_umem_assign_dev()
s/ENOTSUPP/EOPNOTSUPP/ in function umem_assign_dev(). This function's return value is directly returned by xsk_bind(). EOPNOTSUPP is bind()'s possible return value. Fixes: f734607e819b ("xsk: refactor xdp_umem_assign_dev()") Signed-off-by: Prashant Bhole --- net/xdp/xdp_umem.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c index 911ca6d3cb5a..bfe2dbea480b 100644 --- a/net/xdp/xdp_umem.c +++ b/net/xdp/xdp_umem.c @@ -74,14 +74,14 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev, return 0; if (!dev->netdev_ops->ndo_bpf || !dev->netdev_ops->ndo_xsk_async_xmit) - return force_zc ? -ENOTSUPP : 0; /* fail or fallback */ + return force_zc ? -EOPNOTSUPP : 0; /* fail or fallback */ bpf.command = XDP_QUERY_XSK_UMEM; rtnl_lock(); err = xdp_umem_query(dev, queue_id); if (err) { - err = err < 0 ? -ENOTSUPP : -EBUSY; + err = err < 0 ? -EOPNOTSUPP : -EBUSY; goto err_rtnl_unlock; } -- 2.17.1
Re: selftests/bpf test_sockmap failure
On 7/25/2018 8:02 AM, Yonghong Song wrote: On 7/24/18 3:40 PM, John Fastabend wrote: On 07/24/2018 08:45 AM, Yonghong Song wrote: In one of our production machines, tools/testing/selftests/bpf test_sockmap failed randomly like below: ... [TEST 78]: (512, 1, 1, sendmsg, pass,apply 1,): rx thread exited with err 1. FAILED ... ... [TEST 80]: (2, 1024, 256, sendmsg, pass,apply 1,): rx thread exited with err 1. FAILED ... ... [TEST 83]: (100, 1, 5, sendpage, pass,apply 1,): rx thread exited with err 1. FAILED ... ... [TEST 79]: (512, 1, 1, sendpage, pass,apply 1,): rx thread exited with err 1. FAILED ... The command line is just `test_sockmap`. The machine has 80 cpus, 256G memory. The kernel is based on 4.16 but backported with latest bpf-next bpf changes. The failed test number (78, 79, 80, or 83) is random. But they all share similar characteristics: . the option rate is greater than one, i.e., more than one sendmsg/sendpage in the sender forked process. . The txmsg_apply is not 0 I debugged a little bit. It happens in msg_loop() function below "unexpected timeout" path. ... slct = select(max_fd + 1, , NULL, NULL, ); if (slct == -1) { perror("select()"); clock_gettime(CLOCK_MONOTONIC, >end); goto out_errno; } else if (!slct) { if (opt->verbose) fprintf(stderr, "unexpected timeout\n"); errno = -EIO; clock_gettime(CLOCK_MONOTONIC, >end); goto out_errno; } ... It appears that when the error happens, the receive process does not receive all bytes sent from the send process and eventually times out. Has anybody seen this issue as well? John, any comments on this failure? Can you run the test with verbose enabled so we can determine if the tx side is even sending the message? Sample patch below. This will allow us to see the tx bytes and rx bytes, although it will be a bit noisy. I notice that the test program is not smart enough to (re)send bytes if the sendmsg call doesn't consume all bytes. This is a valid error if we get a enomem or other normal error on the tx side. With apply this is more likely because every byte (in apply = 1 case) is being sent through BPF prog. The following are some of logs for the failed tests: ... [TEST 78]: (512, 1, 1, sendmsg, pass,apply 1,): connected sockets: c1 <-> p1, c2 <-> p2 cgroups binding: c1(24) <-> s1(22) - - - c2(25) <-> s2(23) tx_sendmsg: TX: 512B 0.00B/s 0.00 GB/s RX: 0B 0.00B/s 0.00GB/s unexpected timeout msg_loop_rx: iov_count 1 iov_buf 1 cnt 512 err -5 rx_sendmsg: TX: 0B 0.00B/s 0.00GB/s RX: 147B 147.00B/s 0.00GB/s rx thread exited with err 1. FAILED ... [TEST 82]: (100, 1, 5, sendmsg, pass,apply 1,): connected sockets: c1 <-> p1, c2 <-> p2 cgroups binding: c1(24) <-> s1(22) - - - c2(25) <-> s2(23) tx_sendmsg: TX: 500B 0.00B/s 0.00 GB/s RX: 0B 0.00B/s 0.00GB/s unexpected timeout msg_loop_rx: iov_count 1 iov_buf 5 cnt 100 err -5 rx_sendmsg: TX: 0B 0.00B/s 0.00GB/s RX: 366B 366.00B/s 0.00GB/s rx thread exited with err 1. FAILED [TEST 402]: (512, 1, 1, sendmsg, pass,apply 1,): connected sockets: c1 <-> p1, c2 <-> p2 cgroups binding: c1(42) <-> s1(40) - - - c2(43) <-> s2(41) tx_sendmsg: TX: 512B 0.00B/s 0.00 GB/s RX: 0B 0.00B/s 0.00GB/s unexpected timeout msg_loop_rx: iov_count 1 iov_buf 1 cnt 512 err -5 rx_sendmsg: TX: 0B 0.00B/s 0.00GB/s RX: 147B 147.00B/s 0.00GB/s rx thread exited with err 1. FAILED [TEST 80]: (2, 1024, 256, sendmsg, pass,apply 1,): connected sockets: c1 <-> p1, c2 <-> p2 cgroups binding: c1(24) <-> s1(22) - - - c2(25) <-> s2(23) tx_sendmsg: TX: 524288B 0.00B/s 0.00 GB/s RX: 0B 0.00B/s 0.00GB/s unexpected timeout msg_loop_rx: iov_count 1024 iov_buf 256 cnt 2 err -5 rx_sendmsg: TX: 0B 0.00B/s 0.00GB/s RX: 458805B 458805.00B/s 0.000459GB/s rx thread exited with err 1. FAILED If this is not the case I can do some more digging but I've not seen this before. I cannot reproduce it in my FC27 VM (with latest bpf-next) either. The bug only shows up on our production service which is a lot busier and has more cpus/memorys and some sysctl configurations are different from my VM setup. In the past I have noticed that on busy machines sometimes data is delayed so much that select() timeout is triggered. Although it isn't a solution but you can try to increase the timeout value to check if that is the case. -Prashant
[PATCH net-next] net: ip6_gre: get ipv6hdr after skb_cow_head()
A KASAN:use-after-free bug was found related to ip6-erspan while running selftests/net/ip6_gre_headroom.sh It happens because of following sequence: - ipv6hdr pointer is obtained from skb - skb_cow_head() is called, skb->head memory is reallocated - old data is accessed using ipv6hdr pointer skb_cow_head() call was added in e41c7c68ea77 ("ip6erspan: make sure enough headroom at xmit."), but looking at the history there was a chance of similar bug because gre_handle_offloads() and pskb_trim() can also reallocate skb->head memory. Fixes tag points to commit which introduced possibility of this bug. This patch moves ipv6hdr pointer assignment after skb_cow_head() call. Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support") Signed-off-by: Prashant Bhole --- net/ipv6/ip6_gre.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c index 367177786e34..fc7dd3a04360 100644 --- a/net/ipv6/ip6_gre.c +++ b/net/ipv6/ip6_gre.c @@ -927,7 +927,6 @@ static netdev_tx_t ip6gre_tunnel_xmit(struct sk_buff *skb, static netdev_tx_t ip6erspan_tunnel_xmit(struct sk_buff *skb, struct net_device *dev) { - struct ipv6hdr *ipv6h = ipv6_hdr(skb); struct ip6_tnl *t = netdev_priv(dev); struct dst_entry *dst = skb_dst(skb); struct net_device_stats *stats; @@ -1012,6 +1011,8 @@ static netdev_tx_t ip6erspan_tunnel_xmit(struct sk_buff *skb, goto tx_err; } } else { + struct ipv6hdr *ipv6h = ipv6_hdr(skb); + switch (skb->protocol) { case htons(ETH_P_IP): memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt)); -- 2.17.1
Re: [BUG net-next] BUG triggered with GRO SKB list_head changes
On 7/12/2018 7:39 AM, Tyler Hicks wrote: Starting with the following net-next commit, I see a BUG when starting a LXD container inside of a KVM guest using virtio-net: d4546c2509b1 net: Convert GRO SKB handling to list_head. Recently I encountered KASAN:use-after-free BUG and git bisect pointed to above commit. Looks like this is the same issue without KASAN enabled. I have submitted a bugfix for this BUG with Tyler Hicks in Reported-by tag. -Prashant Here's what the kernel spits out: kernel BUG at /var/scm/kernel/linux/include/linux/skbuff.h:2080! invalid opcode: [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI CPU: 0 PID: 1362 Comm: libvirtd Not tainted 4.18.0-rc2+ #69 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 RIP: 0010:skb_pull+0x36/0x40 Code: c6 77 24 29 f0 3b 87 84 00 00 00 89 87 80 00 00 00 72 17 89 f6 48 89 f0 48 03 87 d8 00 00 00 48 89 87 d8 00 00 00 c3 31 c0 c3 <0f> 0b 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 39 b7 80 00 00 00 76 RSP: :96737f6039f0 EFLAGS: 00010297 RAX: 9c66e2f2 RBX: RCX: 0501 RDX: 0001 RSI: 000e RDI: 96737f7e3938 RBP: 967379f40020 R08: R09: R10: 96737f603988 R11: c0461335 R12: 967379f409e0 R13: 96737f7e3938 R14: R15: 967379e96ac0 FS: 7fc96087e640() GS:96737f60() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7fc913608aa0 CR3: 5dacc001 CR4: 001606f0 Call Trace: br_dev_xmit+0xe1/0x3d0 [bridge] dev_hard_start_xmit+0xbc/0x3b0 __dev_queue_xmit+0xb98/0xc30 ip_finish_output2+0x3e5/0x670 ? ip_output+0x7f/0x250 ip_output+0x7f/0x250 ? ip_fragment.constprop.5+0x80/0x80 ip_forward+0x3e2/0x650 ? ipv4_frags_init_net+0x130/0x130 ip_rcv+0x2be/0x500 ? ip_local_deliver_finish+0x3b0/0x3b0 __netif_receive_skb_core+0x6a8/0xb30 ? lock_acquire+0xab/0x200 ? netif_receive_skb_internal+0x2a/0x380 netif_receive_skb_internal+0x73/0x380 ? napi_gro_complete+0xcf/0x1b0 dev_gro_receive+0x374/0x730 napi_gro_receive+0x4f/0x1d0 receive_buf+0x4b6/0x1930 [virtio_net] ? detach_buf+0x69/0x120 [virtio_ring] virtnet_poll+0x122/0x2e0 [virtio_net] net_rx_action+0x207/0x450 __do_softirq+0x149/0x4ea irq_exit+0xbf/0xd0 do_IRQ+0x6c/0x130 common_interrupt+0xf/0xf RIP: 0010:__radix_tree_lookup+0x28/0xe0 Code: 00 00 53 49 89 ca 41 bb 40 00 00 00 4c 8b 47 50 4c 89 c0 83 e0 03 48 83 f8 01 0f 85 a8 00 00 00 4c 89 c0 48 83 e0 fe 0f b6 08 <4c> 89 d8 48 d3 e0 48 83 e8 01 48 39 c6 76 11 e9 9f 00 00 00 4c 89 RSP: :ae150048fcc0 EFLAGS: 0282 ORIG_RAX: ffd9 RAX: 96735d2ef908 RBX: 001f RCX: 0006 RDX: RSI: 02e2 RDI: 96735d10b788 RBP: 02e2 R08: 96735d2ef909 R09: R10: R11: 0040 R12: 001f R13: ec01c15f3a80 R14: 001f R15: ae150048fd18 __do_page_cache_readahead+0x11f/0x2e0 filemap_fault+0x408/0x660 ext4_filemap_fault+0x2f/0x40 __do_fault+0x1f/0xd0 __handle_mm_fault+0x915/0xfa0 handle_mm_fault+0x1c2/0x390 __do_page_fault+0x2f6/0x580 ? async_page_fault+0x5/0x20 async_page_fault+0x1b/0x20 RIP: 0033:0x7fc913608aa0 Code: Bad RIP value. RSP: 002b:7ffcfa9c7f08 EFLAGS: 00010206 RAX: RBX: 0003 RCX: 0080 RDX: 0006 RSI: 7fc913a74bf8 RDI: 7fc913df9720 RBP: 0001 R08: 55df45795700 R09: R10: 55df4574c010 R11: 0001 R12: 7ffcfa9c8c38 R13: 7ffcfa9c8c48 R14: 7fc913dc3d70 R15: 55df4578ab30 Modules linked in: veth ebtable_filter ebtables ipt_MASQUERADE xt_CHECKSUM xt_comment xt_tcpudp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_filter bpfilter bridge stp llc fuse kvm_intel kvm irqbypass 9pnet_virtio 9pnet virtio_balloon ib_iser rdma_cm configfs iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables virtio_net net_failover virtio_blk failover crc32_pclmul crc32c_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper virtio_pci psmouse virtio_ring virtio I'm not very familiar with the GRO or IP fragmentation code but I was able to identify that this change "fixes" the issue: diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 7ccc601b55d9..a5cea572a7f1 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -666,6 +666,7 @@ struct sk_buff { /* These two members must be first. */ struct sk_buff *next; struct sk_buff *prev; + struct list_head
[PATCH net-next] net: gro: properly remove skb from list
Following crash occurs in validate_xmit_skb_list() when same skb is iterated multiple times in the loop and consume_skb() is called. The root cause is calling list_del_init(>list) and not clearing skb->next in d4546c2509b1. list_del_init(>list) sets skb->next to point to skb itself. skb->next needs to be cleared because other parts of network stack uses another kind of SKB lists. validate_xmit_skb_list() uses such list. A similar type of bugfix was reported by Jesper Dangaard Brouer. https://patchwork.ozlabs.org/patch/942541/ This patch clears skb->next and changes list_del_init() to list_del() so that list->prev will maintain the list poison. [ 148.185511] == [ 148.187865] BUG: KASAN: use-after-free in validate_xmit_skb_list+0x4b/0xa0 [ 148.190158] Read of size 8 at addr 8801e52eefc0 by task swapper/1/0 [ 148.192940] [ 148.193642] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.18.0-rc3+ #25 [ 148.195423] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 04/01/2014 [ 148.199129] Call Trace: [ 148.200565] [ 148.201911] dump_stack+0xc6/0x14c [ 148.203572] ? dump_stack_print_info.cold.1+0x2f/0x2f [ 148.205083] ? kmsg_dump_rewind_nolock+0x59/0x59 [ 148.206307] ? validate_xmit_skb+0x2c6/0x560 [ 148.207432] ? debug_show_held_locks+0x30/0x30 [ 148.208571] ? validate_xmit_skb_list+0x4b/0xa0 [ 148.211144] print_address_description+0x6c/0x23c [ 148.212601] ? validate_xmit_skb_list+0x4b/0xa0 [ 148.213782] kasan_report.cold.6+0x241/0x2fd [ 148.214958] validate_xmit_skb_list+0x4b/0xa0 [ 148.216494] sch_direct_xmit+0x1b0/0x680 [ 148.217601] ? dev_watchdog+0x4e0/0x4e0 [ 148.218675] ? do_raw_spin_trylock+0x10/0x120 [ 148.219818] ? do_raw_spin_lock+0xe0/0xe0 [ 148.221032] __dev_queue_xmit+0x1167/0x1810 [ 148.222155] ? sched_clock+0x5/0x10 [...] [ 148.474257] Allocated by task 0: [ 148.475363] kasan_kmalloc+0xbf/0xe0 [ 148.476503] kmem_cache_alloc+0xb4/0x1b0 [ 148.477654] __build_skb+0x91/0x250 [ 148.478677] build_skb+0x67/0x180 [ 148.479657] e1000_clean_rx_irq+0x542/0x8a0 [ 148.480757] e1000_clean+0x652/0xd10 [ 148.481772] net_rx_action+0x4ea/0xc20 [ 148.482808] __do_softirq+0x1f9/0x574 [ 148.483831] [ 148.484575] Freed by task 0: [ 148.485504] __kasan_slab_free+0x12e/0x180 [ 148.486589] kmem_cache_free+0xb4/0x240 [ 148.487634] kfree_skbmem+0xed/0x150 [ 148.488648] consume_skb+0x146/0x250 [ 148.489665] validate_xmit_skb+0x2b7/0x560 [ 148.490754] validate_xmit_skb_list+0x70/0xa0 [ 148.491897] sch_direct_xmit+0x1b0/0x680 [ 148.493949] __dev_queue_xmit+0x1167/0x1810 [ 148.495103] br_dev_queue_push_xmit+0xce/0x250 [ 148.496196] br_forward_finish+0x276/0x280 [ 148.497234] __br_forward+0x44f/0x520 [ 148.498260] br_forward+0x19f/0x1b0 [ 148.499264] br_handle_frame_finish+0x65e/0x980 [ 148.500398] NF_HOOK.constprop.10+0x290/0x2a0 [ 148.501522] br_handle_frame+0x417/0x640 [ 148.502582] __netif_receive_skb_core+0xaac/0x18f0 [ 148.503753] __netif_receive_skb_one_core+0x98/0x120 [ 148.504958] netif_receive_skb_internal+0xe3/0x330 [ 148.506154] napi_gro_complete+0x190/0x2a0 [ 148.507243] dev_gro_receive+0x9f7/0x1100 [ 148.508316] napi_gro_receive+0xcb/0x260 [ 148.509387] e1000_clean_rx_irq+0x2fc/0x8a0 [ 148.510501] e1000_clean+0x652/0xd10 [ 148.511523] net_rx_action+0x4ea/0xc20 [ 148.512566] __do_softirq+0x1f9/0x574 [ 148.513598] [ 148.514346] The buggy address belongs to the object at 8801e52eefc0 [ 148.514346] which belongs to the cache skbuff_head_cache of size 232 [ 148.517047] The buggy address is located 0 bytes inside of [ 148.517047] 232-byte region [8801e52eefc0, 8801e52ef0a8) [ 148.519549] The buggy address belongs to the page: [ 148.520726] page:ea000794bb00 count:1 mapcount:0 mapping:880106f4dfc0 index:0x8801e52ee840 compound_mapcount: 0 [ 148.524325] flags: 0x17c0008100(slab|head) [ 148.525481] raw: 0017c0008100 880106b938d0 880106b938d0 880106f4dfc0 [ 148.527503] raw: 8801e52ee840 00190011 0001 [ 148.529547] page dumped because: kasan: bad access detected Fixes: d4546c2509b1 ("net: Convert GRO SKB handling to list_head.") Signed-off-by: Prashant Bhole Reported-by: Tyler Hicks --- net/core/dev.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index d13cddcac41f..08c41941f912 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -5169,7 +5169,8 @@ static void __napi_gro_flush_chain(struct napi_struct *napi, u32 index, list_for_each_entry_safe_reverse(skb, p, head, list) { if (flush_old && NAPI_GRO_CB(skb)->age == jiffies) return; - list_del_init(>list); + list_del(>list); + skb->
[PATCH net-next] rtnetlink: Fix null-ptr-deref in rtnl_newlink
In rtnl_newlink(), NULL check is performed on m_ops however member of ops is accessed. Fixed by accessing member of m_ops instead of ops. [ 345.432629] BUG: KASAN: null-ptr-deref in rtnl_newlink+0x400/0x1110 [ 345.432629] Read of size 4 at addr 0088 by task ip/986 [ 345.432629] [ 345.432629] CPU: 1 PID: 986 Comm: ip Not tainted 4.17.0-rc6+ #9 [ 345.432629] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 345.432629] Call Trace: [ 345.432629] dump_stack+0xc6/0x150 [ 345.432629] ? dump_stack_print_info.cold.0+0x1b/0x1b [ 345.432629] ? kasan_report+0xb4/0x410 [ 345.432629] kasan_report.cold.4+0x8f/0x91 [ 345.432629] ? rtnl_newlink+0x400/0x1110 [ 345.432629] rtnl_newlink+0x400/0x1110 [...] Fixes: ccf8dbcd062a ("rtnetlink: Remove VLA usage") Signed-off-by: Prashant Bhole --- net/core/rtnetlink.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 8ca49a0e13fb..9a1ba2015ad8 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -2936,7 +2936,7 @@ static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh, } if (m_ops) { - if (ops->slave_maxtype > RTNL_SLAVE_MAX_TYPE) + if (m_ops->slave_maxtype > RTNL_SLAVE_MAX_TYPE) return -EINVAL; if (m_ops->slave_maxtype && -- 2.17.0
[PATCH bpf-next v4 5/5] selftests/bpf: test_sockmap, print additional test options
Print values of test options like apply, cork, start, end so that individual failed tests can be identified for manual run Acked-by: John Fastabend Signed-off-by: Prashant Bhole --- tools/testing/selftests/bpf/test_sockmap.c | 28 +++--- 1 file changed, 19 insertions(+), 9 deletions(-) diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c index 2bc70e1ab2eb..05c8cb71724a 100644 --- a/tools/testing/selftests/bpf/test_sockmap.c +++ b/tools/testing/selftests/bpf/test_sockmap.c @@ -876,6 +876,8 @@ static char *test_to_str(int test) #define OPTSTRING 60 static void test_options(char *options) { + char tstr[OPTSTRING]; + memset(options, 0, OPTSTRING); if (txmsg_pass) @@ -888,14 +890,22 @@ static void test_options(char *options) strncat(options, "redir_noisy,", OPTSTRING); if (txmsg_drop) strncat(options, "drop,", OPTSTRING); - if (txmsg_apply) - strncat(options, "apply,", OPTSTRING); - if (txmsg_cork) - strncat(options, "cork,", OPTSTRING); - if (txmsg_start) - strncat(options, "start,", OPTSTRING); - if (txmsg_end) - strncat(options, "end,", OPTSTRING); + if (txmsg_apply) { + snprintf(tstr, OPTSTRING, "apply %d,", txmsg_apply); + strncat(options, tstr, OPTSTRING); + } + if (txmsg_cork) { + snprintf(tstr, OPTSTRING, "cork %d,", txmsg_cork); + strncat(options, tstr, OPTSTRING); + } + if (txmsg_start) { + snprintf(tstr, OPTSTRING, "start %d,", txmsg_start); + strncat(options, tstr, OPTSTRING); + } + if (txmsg_end) { + snprintf(tstr, OPTSTRING, "end %d,", txmsg_end); + strncat(options, tstr, OPTSTRING); + } if (txmsg_ingress) strncat(options, "ingress,", OPTSTRING); if (txmsg_skb) @@ -904,7 +914,7 @@ static void test_options(char *options) static int __test_exec(int cgrp, int test, struct sockmap_options *opt) { - char *options = calloc(60, sizeof(char)); + char *options = calloc(OPTSTRING, sizeof(char)); int err; if (test == SENDPAGE) -- 2.17.0
[PATCH bpf-next v4 4/5] selftests/bpf: test_sockmap, fix data verification
When data verification is enabled, some tests fail because verification is done incorrectly. Following changes fix it. - Identify the size of data block to be verified - Reset verification counter when data block size is reached - Fixed the value printed in case of verfication failure Fixes: 16962b2404ac ("bpf: sockmap, add selftests") Acked-by: John Fastabend Signed-off-by: Prashant Bhole --- tools/testing/selftests/bpf/test_sockmap.c | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c index 5cd0550af595..2bc70e1ab2eb 100644 --- a/tools/testing/selftests/bpf/test_sockmap.c +++ b/tools/testing/selftests/bpf/test_sockmap.c @@ -337,8 +337,15 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt, int fd_flags = O_NONBLOCK; struct timeval timeout; float total_bytes; + int bytes_cnt = 0; + int chunk_sz; fd_set w; + if (opt->sendpage) + chunk_sz = iov_length * cnt; + else + chunk_sz = iov_length * iov_count; + fcntl(fd, fd_flags); total_bytes = (float)iov_count * (float)iov_length * (float)cnt; err = clock_gettime(CLOCK_MONOTONIC, >start); @@ -393,9 +400,14 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt, errno = -EIO; fprintf(stderr, "detected data corruption @iov[%i]:%i %02x != %02x, %02x ?= %02x\n", - i, j, d[j], k - 1, d[j+1], k + 1); + i, j, d[j], k - 1, d[j+1], k); goto out_errno; } + bytes_cnt++; + if (bytes_cnt == chunk_sz) { + k = 0; + bytes_cnt = 0; + } recv--; } } -- 2.17.0
[PATCH bpf-next v4 1/5] selftests/bpf: test_sockmap, check test failure
Test failures are not identified because exit code of RX/TX threads is not checked. Also threads are not returning correct exit code. - Return exit code from threads depending on test execution status - In main thread, check the exit code of RX/TX threads - Skip error checking for corked tests as they are expected to timeout Fixes: 16962b2404ac ("bpf: sockmap, add selftests") Signed-off-by: Prashant Bhole --- tools/testing/selftests/bpf/test_sockmap.c | 27 +- 1 file changed, 21 insertions(+), 6 deletions(-) diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c index eb17fae458e6..7b2008a144cb 100644 --- a/tools/testing/selftests/bpf/test_sockmap.c +++ b/tools/testing/selftests/bpf/test_sockmap.c @@ -429,8 +429,8 @@ static int sendmsg_test(struct sockmap_options *opt) struct msg_stats s = {0}; int iov_count = opt->iov_count; int iov_buf = opt->iov_length; + int rx_status, tx_status; int cnt = opt->rate; - int status; errno = 0; @@ -442,7 +442,7 @@ static int sendmsg_test(struct sockmap_options *opt) rxpid = fork(); if (rxpid == 0) { if (opt->drop_expected) - exit(1); + exit(0); if (opt->sendpage) iov_count = 1; @@ -463,7 +463,9 @@ static int sendmsg_test(struct sockmap_options *opt) "rx_sendmsg: TX: %zuB %fB/s %fGB/s RX: %zuB %fB/s %fGB/s\n", s.bytes_sent, sent_Bps, sent_Bps/giga, s.bytes_recvd, recvd_Bps, recvd_Bps/giga); - exit(1); + if (err && txmsg_cork) + err = 0; + exit(err ? 1 : 0); } else if (rxpid == -1) { perror("msg_loop_rx: "); return errno; @@ -491,14 +493,27 @@ static int sendmsg_test(struct sockmap_options *opt) "tx_sendmsg: TX: %zuB %fB/s %f GB/s RX: %zuB %fB/s %fGB/s\n", s.bytes_sent, sent_Bps, sent_Bps/giga, s.bytes_recvd, recvd_Bps, recvd_Bps/giga); - exit(1); + exit(err ? 1 : 0); } else if (txpid == -1) { perror("msg_loop_tx: "); return errno; } - assert(waitpid(rxpid, , 0) == rxpid); - assert(waitpid(txpid, , 0) == txpid); + assert(waitpid(rxpid, _status, 0) == rxpid); + assert(waitpid(txpid, _status, 0) == txpid); + if (WIFEXITED(rx_status)) { + err = WEXITSTATUS(rx_status); + if (err) { + fprintf(stderr, "rx thread exited with err %d. ", err); + goto out; + } + } + if (WIFEXITED(tx_status)) { + err = WEXITSTATUS(tx_status); + if (err) + fprintf(stderr, "tx thread exited with err %d. ", err); + } +out: return err; } -- 2.17.0
[PATCH bpf-next v4 0/5] fix test_sockmap
test_sockmap was originally written only to exercise kernel code paths, so there was no strict checking of errors. When the code was modified to run as selftests, due to lack of error handling it was not able to detect test failures. In order to improve, this series fixes error handling, test run time and data verification. Also slightly improved test output by printing parameter values (cork, apply, start, end) so that parameters for all tests are displayed. Changes in v4: - patch1: Ignore RX timoute error only for corked tests - patch3: Setting different timeout for corked tests and reduce run time by reducing number of iterations in some tests Changes in v3: - Skipped error checking for corked tests Prashant Bhole (5): selftests/bpf: test_sockmap, check test failure selftests/bpf: test_sockmap, join cgroup in selftest mode selftests/bpf: test_sockmap, timing improvements selftests/bpf: test_sockmap, fix data verification selftests/bpf: test_sockmap, print additional test options tools/testing/selftests/bpf/test_sockmap.c | 87 +- 1 file changed, 67 insertions(+), 20 deletions(-) -- 2.17.0
[PATCH bpf-next v4 3/5] selftests/bpf: test_sockmap, timing improvements
Currently 10us delay is too low for many tests to succeed. It needs to be increased. Also, many corked tests are expected to hit rx timeout irrespective of timeout value. - This patch sets 1000usec timeout value for corked tests because less than that causes broken-pipe error in tx thread. Also sets 1 second timeout for all other tests because less than that results in RX timeout - tests with apply=1 and higher number of iterations were taking lot of time. This patch reduces test run time by reducing iterations. real0m12.968s user0m0.219s sys 0m14.337s Fixes: a18fda1a62c3 ("bpf: reduce runtime of test_sockmap tests") Signed-off-by: Prashant Bhole --- tools/testing/selftests/bpf/test_sockmap.c | 13 + 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c index 7f9ca79aadbc..5cd0550af595 100644 --- a/tools/testing/selftests/bpf/test_sockmap.c +++ b/tools/testing/selftests/bpf/test_sockmap.c @@ -345,8 +345,13 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt, if (err < 0) perror("recv start time: "); while (s->bytes_recvd < total_bytes) { - timeout.tv_sec = 0; - timeout.tv_usec = 10; + if (txmsg_cork) { + timeout.tv_sec = 0; + timeout.tv_usec = 1000; + } else { + timeout.tv_sec = 1; + timeout.tv_usec = 0; + } /* FD sets */ FD_ZERO(); @@ -1025,14 +1030,14 @@ static int test_send(struct sockmap_options *opt, int cgrp) opt->iov_length = 1; opt->iov_count = 1; - opt->rate = 1024; + opt->rate = 512; err = test_exec(cgrp, opt); if (err) goto out; opt->iov_length = 256; opt->iov_count = 1024; - opt->rate = 10; + opt->rate = 2; err = test_exec(cgrp, opt); if (err) goto out; -- 2.17.0
[PATCH bpf-next v4 2/5] selftests/bpf: test_sockmap, join cgroup in selftest mode
In case of selftest mode, temporary cgroup environment is created but cgroup is not joined. It causes test failures. Fixed by joining the cgroup Fixes: 16962b2404ac ("bpf: sockmap, add selftests") Acked-by: John Fastabend Signed-off-by: Prashant Bhole --- tools/testing/selftests/bpf/test_sockmap.c | 5 + 1 file changed, 5 insertions(+) diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c index 7b2008a144cb..7f9ca79aadbc 100644 --- a/tools/testing/selftests/bpf/test_sockmap.c +++ b/tools/testing/selftests/bpf/test_sockmap.c @@ -1344,6 +1344,11 @@ static int __test_suite(char *bpf_file) return cg_fd; } + if (join_cgroup(CG_PATH)) { + fprintf(stderr, "ERROR: failed to join cgroup\n"); + return -EINVAL; + } + /* Tests basic commands and APIs with range of iov values */ txmsg_start = txmsg_end = 0; err = test_txmsg(cg_fd); -- 2.17.0
Re: [PATCH bpf v3 3/5] selftests/bpf: test_sockmap, fix test timeout
On 5/31/2018 4:59 AM, John Fastabend wrote: On 05/30/2018 12:29 PM, Alexei Starovoitov wrote: On Wed, May 30, 2018 at 02:56:09PM +0900, Prashant Bhole wrote: In order to reduce runtime of tests, recently timout for select() call was reduced from 1sec to 10usec. This was causing many tests failures. It was caught with failure handling commits in this series. Restoring the timeout from 10usec to 1sec Fixes: a18fda1a62c3 ("bpf: reduce runtime of test_sockmap tests") Signed-off-by: Prashant Bhole --- tools/testing/selftests/bpf/test_sockmap.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c index 64f9e25c451f..9d01f5c2abe2 100644 --- a/tools/testing/selftests/bpf/test_sockmap.c +++ b/tools/testing/selftests/bpf/test_sockmap.c @@ -345,8 +345,8 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt, if (err < 0) perror("recv start time: "); while (s->bytes_recvd < total_bytes) { - timeout.tv_sec = 0; - timeout.tv_usec = 10; + timeout.tv_sec = 1; + timeout.tv_usec = 0; I've applied the set, but had to revert it, since it takes too long. real1m40.124s user0m0.375s sys 0m14.521s Dang, I thought it would be a bit longer but not minutes. Myself and Daniel run the test semi-manually when we apply patches.> Adding 2 extra minutes of wait time is unnecessary. Yep. Especially since most of it is idle time. Please find a way to fix tests differently. btw I don't see any failures today. Not sure what is being fixed by incresing a timeout. Calling these fixes is a bit much, they are primarily improvements. The background is, when I originally wrote the tests my goal was to exercise the kernel code paths. Because of this I didn't really care if the tests actually sent/recv all bytes in the test. (I have long running tests using netperf/wrk/apached/etc. for that) But, the manual tests do have an option to verify the data if specified. The 'verify' option is a bit fragile in that with the right tests (e.g. drop) or the certain options (e.g. cork) it can fail which is expected. What Prashant added was support to actually verify the data correctly. And also fix a few cgroup handling and some pretty printing as well. He noticed the low timeout causing issue in these cases though so increased it. @Prashant, how about increasing this less dramatically because now all cork tests are going to stall for 1s unless perfectly aligned. How about 100us? Or even better we can conditionally set it based on if tx_cork is set. If tx_cork is set use 1us otherwise use 200us or something. (1s is really to high in any cases for lo) Also capturing some of the above in the cover letter would help folks understand the context a bit better. I did trial and error for timeout values. Currently 1000us for corked tests and 1 sec for other tests works fine. I observed broken-pipe error at tx side when timeout was < 1000us. Also tests with apply=1 and higher number of iterations were taking time, so reducing iterations reduces the test run time drastically. real0m12.968s user0m0.219s sys 0m14.337s Also I will try to explain background in the cover letter of next series. -Prashant
[PATCH bpf v3 3/5] selftests/bpf: test_sockmap, fix test timeout
In order to reduce runtime of tests, recently timout for select() call was reduced from 1sec to 10usec. This was causing many tests failures. It was caught with failure handling commits in this series. Restoring the timeout from 10usec to 1sec Fixes: a18fda1a62c3 ("bpf: reduce runtime of test_sockmap tests") Signed-off-by: Prashant Bhole --- tools/testing/selftests/bpf/test_sockmap.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c index 64f9e25c451f..9d01f5c2abe2 100644 --- a/tools/testing/selftests/bpf/test_sockmap.c +++ b/tools/testing/selftests/bpf/test_sockmap.c @@ -345,8 +345,8 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt, if (err < 0) perror("recv start time: "); while (s->bytes_recvd < total_bytes) { - timeout.tv_sec = 0; - timeout.tv_usec = 10; + timeout.tv_sec = 1; + timeout.tv_usec = 0; /* FD sets */ FD_ZERO(); -- 2.17.0
[PATCH bpf v3 4/5] selftests/bpf: test_sockmap, fix data verification
When data verification is enabled, some tests fail because verification is done incorrectly. Following changes fix it. - Identify the size of data block to be verified - Reset verification counter when data block size is reached - Fixed the value printed in case of verfication failure Fixes: 16962b2404ac ("bpf: sockmap, add selftests") Acked-by: John Fastabend Signed-off-by: Prashant Bhole --- tools/testing/selftests/bpf/test_sockmap.c | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c index 9d01f5c2abe2..664f268dc02a 100644 --- a/tools/testing/selftests/bpf/test_sockmap.c +++ b/tools/testing/selftests/bpf/test_sockmap.c @@ -337,8 +337,15 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt, int fd_flags = O_NONBLOCK; struct timeval timeout; float total_bytes; + int bytes_cnt = 0; + int chunk_sz; fd_set w; + if (opt->sendpage) + chunk_sz = iov_length * cnt; + else + chunk_sz = iov_length * iov_count; + fcntl(fd, fd_flags); total_bytes = (float)iov_count * (float)iov_length * (float)cnt; err = clock_gettime(CLOCK_MONOTONIC, >start); @@ -388,9 +395,14 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt, errno = -EIO; fprintf(stderr, "detected data corruption @iov[%i]:%i %02x != %02x, %02x ?= %02x\n", - i, j, d[j], k - 1, d[j+1], k + 1); + i, j, d[j], k - 1, d[j+1], k); goto out_errno; } + bytes_cnt++; + if (bytes_cnt == chunk_sz) { + k = 0; + bytes_cnt = 0; + } recv--; } } -- 2.17.0
[PATCH bpf v3 5/5] selftests/bpf: test_sockmap, print additional test options
Print values of test options like apply, cork, start, end so that individual failed tests can be identified for manual run Acked-by: John Fastabend Signed-off-by: Prashant Bhole --- tools/testing/selftests/bpf/test_sockmap.c | 28 +++--- 1 file changed, 19 insertions(+), 9 deletions(-) diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c index 664f268dc02a..637c6585ff80 100644 --- a/tools/testing/selftests/bpf/test_sockmap.c +++ b/tools/testing/selftests/bpf/test_sockmap.c @@ -869,6 +869,8 @@ static char *test_to_str(int test) #define OPTSTRING 60 static void test_options(char *options) { + char tstr[OPTSTRING]; + memset(options, 0, OPTSTRING); if (txmsg_pass) @@ -881,14 +883,22 @@ static void test_options(char *options) strncat(options, "redir_noisy,", OPTSTRING); if (txmsg_drop) strncat(options, "drop,", OPTSTRING); - if (txmsg_apply) - strncat(options, "apply,", OPTSTRING); - if (txmsg_cork) - strncat(options, "cork,", OPTSTRING); - if (txmsg_start) - strncat(options, "start,", OPTSTRING); - if (txmsg_end) - strncat(options, "end,", OPTSTRING); + if (txmsg_apply) { + snprintf(tstr, OPTSTRING, "apply %d,", txmsg_apply); + strncat(options, tstr, OPTSTRING); + } + if (txmsg_cork) { + snprintf(tstr, OPTSTRING, "cork %d,", txmsg_cork); + strncat(options, tstr, OPTSTRING); + } + if (txmsg_start) { + snprintf(tstr, OPTSTRING, "start %d,", txmsg_start); + strncat(options, tstr, OPTSTRING); + } + if (txmsg_end) { + snprintf(tstr, OPTSTRING, "end %d,", txmsg_end); + strncat(options, tstr, OPTSTRING); + } if (txmsg_ingress) strncat(options, "ingress,", OPTSTRING); if (txmsg_skb) @@ -897,7 +907,7 @@ static void test_options(char *options) static int __test_exec(int cgrp, int test, struct sockmap_options *opt) { - char *options = calloc(60, sizeof(char)); + char *options = calloc(OPTSTRING, sizeof(char)); int err; if (test == SENDPAGE) -- 2.17.0
[PATCH bpf v3 2/5] selftests/bpf: test_sockmap, join cgroup in selftest mode
In case of selftest mode, temporary cgroup environment is created but cgroup is not joined. It causes test failures. Fixed by joining the cgroup Fixes: 16962b2404ac ("bpf: sockmap, add selftests") Acked-by: John Fastabend Signed-off-by: Prashant Bhole --- tools/testing/selftests/bpf/test_sockmap.c | 5 + 1 file changed, 5 insertions(+) diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c index 01bc9c6745e8..64f9e25c451f 100644 --- a/tools/testing/selftests/bpf/test_sockmap.c +++ b/tools/testing/selftests/bpf/test_sockmap.c @@ -1342,6 +1342,11 @@ static int __test_suite(char *bpf_file) return cg_fd; } + if (join_cgroup(CG_PATH)) { + fprintf(stderr, "ERROR: failed to join cgroup\n"); + return -EINVAL; + } + /* Tests basic commands and APIs with range of iov values */ txmsg_start = txmsg_end = 0; err = test_txmsg(cg_fd); -- 2.17.0
[PATCH bpf v3 0/5] fix test_sockmap
This series fixes error handling, timeout and data verification in test_sockmap. Previously it was not able to detect failure/timeout in RX/TX thread because error was not notified to the main thread. Also slightly improved test output by printing parameter values (cork, apply, start, end) so that parameters for all tests are displayed. Changes in v3: - Skipped error checking for corked tests Prashant Bhole (5): selftests/bpf: test_sockmap, check test failure selftests/bpf: test_sockmap, join cgroup in selftest mode selftests/bpf: test_sockmap, fix test timeout selftests/bpf: test_sockmap, fix data verification selftests/bpf: test_sockmap, print additional test options tools/testing/selftests/bpf/test_sockmap.c | 76 +- 1 file changed, 58 insertions(+), 18 deletions(-) -- 2.17.0
[PATCH bpf v3 1/5] selftests/bpf: test_sockmap, check test failure
Test failures are not identified because exit code of RX/TX threads is not checked. Also threads are not returning correct exit code. - Return exit code from threads depending on test execution status - In main thread, check the exit code of RX/TX threads - Skip error checking for corked tests as they are expected to timeout Fixes: 16962b2404ac ("bpf: sockmap, add selftests") Signed-off-by: Prashant Bhole --- tools/testing/selftests/bpf/test_sockmap.c | 25 -- 1 file changed, 19 insertions(+), 6 deletions(-) diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c index eb17fae458e6..01bc9c6745e8 100644 --- a/tools/testing/selftests/bpf/test_sockmap.c +++ b/tools/testing/selftests/bpf/test_sockmap.c @@ -429,8 +429,8 @@ static int sendmsg_test(struct sockmap_options *opt) struct msg_stats s = {0}; int iov_count = opt->iov_count; int iov_buf = opt->iov_length; + int rx_status, tx_status; int cnt = opt->rate; - int status; errno = 0; @@ -442,7 +442,7 @@ static int sendmsg_test(struct sockmap_options *opt) rxpid = fork(); if (rxpid == 0) { if (opt->drop_expected) - exit(1); + exit(0); if (opt->sendpage) iov_count = 1; @@ -463,7 +463,7 @@ static int sendmsg_test(struct sockmap_options *opt) "rx_sendmsg: TX: %zuB %fB/s %fGB/s RX: %zuB %fB/s %fGB/s\n", s.bytes_sent, sent_Bps, sent_Bps/giga, s.bytes_recvd, recvd_Bps, recvd_Bps/giga); - exit(1); + exit(err ? 1 : 0); } else if (rxpid == -1) { perror("msg_loop_rx: "); return errno; @@ -491,14 +491,27 @@ static int sendmsg_test(struct sockmap_options *opt) "tx_sendmsg: TX: %zuB %fB/s %f GB/s RX: %zuB %fB/s %fGB/s\n", s.bytes_sent, sent_Bps, sent_Bps/giga, s.bytes_recvd, recvd_Bps, recvd_Bps/giga); - exit(1); + exit(err ? 1 : 0); } else if (txpid == -1) { perror("msg_loop_tx: "); return errno; } - assert(waitpid(rxpid, , 0) == rxpid); - assert(waitpid(txpid, , 0) == txpid); + assert(waitpid(rxpid, _status, 0) == rxpid); + assert(waitpid(txpid, _status, 0) == txpid); + if (WIFEXITED(rx_status)) { + err = WEXITSTATUS(rx_status); + if (err && !txmsg_cork) { + fprintf(stderr, "rx thread exited with err %d. ", err); + goto out; + } + } + if (WIFEXITED(tx_status)) { + err = WEXITSTATUS(tx_status); + if (err) + fprintf(stderr, "tx thread exited with err %d. ", err); + } +out: return err; } -- 2.17.0
Re: [PATCH bpf v2 0/5] fix test_sockmap
On 5/30/2018 2:12 PM, John Fastabend wrote: On 05/29/2018 05:44 PM, Prashant Bhole wrote: On 5/30/2018 12:48 AM, John Fastabend wrote: On 05/27/2018 09:37 PM, Prashant Bhole wrote: This series fixes error handling, timeout and data verification in test_sockmap. Previously it was not able to detect failure/timeout in RX/TX thread because error was not notified to the main thread. Also slightly improved test output by printing parameter values (cork, apply, start, end) so that parameters for all tests are displayed. Prashant Bhole (5): selftests/bpf: test_sockmap, check test failure selftests/bpf: test_sockmap, join cgroup in selftest mode selftests/bpf: test_sockmap, fix test timeout selftests/bpf: test_sockmap, fix data verification selftests/bpf: test_sockmap, print additional test options tools/testing/selftests/bpf/test_sockmap.c | 76 +- 1 file changed, 58 insertions(+), 18 deletions(-) After first patch "check test failure" how do we handle the case where test is known to cause timeouts because we are specifically testing these cases. This is the 'cork' parameter we discussed in the last series. It looks like with this series the test may still throw an error? Sorry. In your comment in last series, did you mean to skip error checking only for all cork tests (for now)? -Prashant Hi, After this is applied are any errors returned from test_sockmap? When I read the first patch it looked like timeouts from the cork tests may result in errors "FAILED" tests. If this is the case then yes we need skip error checking on all tests or just the corked tests. Yes errors returned after applying this series. I will skip error checking on just corked tests. -Prashant
Re: [PATCH bpf v2 0/5] fix test_sockmap
On 5/30/2018 12:48 AM, John Fastabend wrote: On 05/27/2018 09:37 PM, Prashant Bhole wrote: This series fixes error handling, timeout and data verification in test_sockmap. Previously it was not able to detect failure/timeout in RX/TX thread because error was not notified to the main thread. Also slightly improved test output by printing parameter values (cork, apply, start, end) so that parameters for all tests are displayed. Prashant Bhole (5): selftests/bpf: test_sockmap, check test failure selftests/bpf: test_sockmap, join cgroup in selftest mode selftests/bpf: test_sockmap, fix test timeout selftests/bpf: test_sockmap, fix data verification selftests/bpf: test_sockmap, print additional test options tools/testing/selftests/bpf/test_sockmap.c | 76 +- 1 file changed, 58 insertions(+), 18 deletions(-) After first patch "check test failure" how do we handle the case where test is known to cause timeouts because we are specifically testing these cases. This is the 'cork' parameter we discussed in the last series. It looks like with this series the test may still throw an error? Sorry. In your comment in last series, did you mean to skip error checking only for all cork tests (for now)? -Prashant
[PATCH net-next] netfilter: fix null-ptr-deref in nf_nat_decode_session
Add null check for nat_hook in nf_nat_decode_session() [ 195.648098] UBSAN: Undefined behaviour in ./include/linux/netfilter.h:348:14 [ 195.651366] BUG: KASAN: null-ptr-deref in __xfrm_policy_check+0x208/0x1d70 [ 195.653888] member access within null pointer of type 'struct nf_nat_hook' [ 195.653896] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.17.0-rc6+ #5 [ 195.656320] Read of size 8 at addr 0008 by task ping/2469 [ 195.658715] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 195.658721] Call Trace: [ 195.661087] [ 195.669341] [ 195.670574] dump_stack+0xc6/0x150 [ 195.672156] ? dump_stack_print_info.cold.0+0x1b/0x1b [ 195.674121] ? ubsan_prologue+0x31/0x92 [ 195.676546] ubsan_epilogue+0x9/0x49 [ 195.678159] handle_null_ptr_deref+0x11a/0x130 [ 195.679800] ? sprint_OID+0x1a0/0x1a0 [ 195.681322] __ubsan_handle_type_mismatch_v1+0xd5/0x11d [ 195.683146] ? ubsan_prologue+0x92/0x92 [ 195.684642] __xfrm_policy_check+0x18ef/0x1d70 [ 195.686294] ? rt_cache_valid+0x118/0x180 [ 195.687804] ? __xfrm_route_forward+0x410/0x410 [ 195.689463] ? fib_multipath_hash+0x700/0x700 [ 195.691109] ? kvm_sched_clock_read+0x23/0x40 [ 195.692805] ? pvclock_clocksource_read+0xf6/0x280 [ 195.694409] ? graph_lock+0xa0/0xa0 [ 195.695824] ? pvclock_clocksource_read+0xf6/0x280 [ 195.697508] ? pvclock_read_flags+0x80/0x80 [ 195.698981] ? kvm_sched_clock_read+0x23/0x40 [ 195.700347] ? sched_clock+0x5/0x10 [ 195.701525] ? sched_clock_cpu+0x18/0x1a0 [ 195.702846] tcp_v4_rcv+0x1d32/0x1de0 [ 195.704115] ? lock_repin_lock+0x70/0x270 [ 195.707072] ? pvclock_read_flags+0x80/0x80 [ 195.709302] ? tcp_v4_early_demux+0x4b0/0x4b0 [ 195.711833] ? lock_acquire+0x195/0x380 [ 195.714222] ? ip_local_deliver_finish+0xfc/0x770 [ 195.716967] ? raw_rcv+0x2b0/0x2b0 [ 195.718856] ? lock_release+0xa00/0xa00 [ 195.720938] ip_local_deliver_finish+0x1b9/0x770 [...] Fixes: 2c205dd3981f ("netfilter: add struct nf_nat_hook and use it") Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp> --- include/linux/netfilter.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h index 04551af2ff23..dd2052f0efb7 100644 --- a/include/linux/netfilter.h +++ b/include/linux/netfilter.h @@ -345,7 +345,7 @@ nf_nat_decode_session(struct sk_buff *skb, struct flowi *fl, u_int8_t family) rcu_read_lock(); nat_hook = rcu_dereference(nf_nat_hook); - if (nat_hook->decode_session) + if (nat_hook && nat_hook->decode_session) nat_hook->decode_session(skb, fl); rcu_read_unlock(); #endif -- 2.17.0
[PATCH bpf v2 5/5] selftests/bpf: test_sockmap, print additional test options
Print values of test options like apply, cork, start, end so that individual failed tests can be identified for manual run Acked-by: John Fastabend <john.fastab...@gmail.com> Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp> --- tools/testing/selftests/bpf/test_sockmap.c | 28 +++--- 1 file changed, 19 insertions(+), 9 deletions(-) diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c index e8faf4596dac..7970d48c4acb 100644 --- a/tools/testing/selftests/bpf/test_sockmap.c +++ b/tools/testing/selftests/bpf/test_sockmap.c @@ -869,6 +869,8 @@ static char *test_to_str(int test) #define OPTSTRING 60 static void test_options(char *options) { + char tstr[OPTSTRING]; + memset(options, 0, OPTSTRING); if (txmsg_pass) @@ -881,14 +883,22 @@ static void test_options(char *options) strncat(options, "redir_noisy,", OPTSTRING); if (txmsg_drop) strncat(options, "drop,", OPTSTRING); - if (txmsg_apply) - strncat(options, "apply,", OPTSTRING); - if (txmsg_cork) - strncat(options, "cork,", OPTSTRING); - if (txmsg_start) - strncat(options, "start,", OPTSTRING); - if (txmsg_end) - strncat(options, "end,", OPTSTRING); + if (txmsg_apply) { + snprintf(tstr, OPTSTRING, "apply %d,", txmsg_apply); + strncat(options, tstr, OPTSTRING); + } + if (txmsg_cork) { + snprintf(tstr, OPTSTRING, "cork %d,", txmsg_cork); + strncat(options, tstr, OPTSTRING); + } + if (txmsg_start) { + snprintf(tstr, OPTSTRING, "start %d,", txmsg_start); + strncat(options, tstr, OPTSTRING); + } + if (txmsg_end) { + snprintf(tstr, OPTSTRING, "end %d,", txmsg_end); + strncat(options, tstr, OPTSTRING); + } if (txmsg_ingress) strncat(options, "ingress,", OPTSTRING); if (txmsg_skb) @@ -897,7 +907,7 @@ static void test_options(char *options) static int __test_exec(int cgrp, int test, struct sockmap_options *opt) { - char *options = calloc(60, sizeof(char)); + char *options = calloc(OPTSTRING, sizeof(char)); int err; if (test == SENDPAGE) -- 2.17.0
[PATCH bpf v2 4/5] selftests/bpf: test_sockmap, fix data verification
When data verification is enabled, some tests fail because verification is done incorrectly. Following changes fix it. - Identify the size of data block to be verified - Reset verification counter when data block size is reached - Fixed the value printed in case of verfication failure Fixes: 16962b2404ac ("bpf: sockmap, add selftests") Acked-by: John Fastabend <john.fastab...@gmail.com> Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp> --- tools/testing/selftests/bpf/test_sockmap.c | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c index f79397513362..e8faf4596dac 100644 --- a/tools/testing/selftests/bpf/test_sockmap.c +++ b/tools/testing/selftests/bpf/test_sockmap.c @@ -337,8 +337,15 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt, int fd_flags = O_NONBLOCK; struct timeval timeout; float total_bytes; + int bytes_cnt = 0; + int chunk_sz; fd_set w; + if (opt->sendpage) + chunk_sz = iov_length * cnt; + else + chunk_sz = iov_length * iov_count; + fcntl(fd, fd_flags); total_bytes = (float)iov_count * (float)iov_length * (float)cnt; err = clock_gettime(CLOCK_MONOTONIC, >start); @@ -388,9 +395,14 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt, errno = -EIO; fprintf(stderr, "detected data corruption @iov[%i]:%i %02x != %02x, %02x ?= %02x\n", - i, j, d[j], k - 1, d[j+1], k + 1); + i, j, d[j], k - 1, d[j+1], k); goto out_errno; } + bytes_cnt++; + if (bytes_cnt == chunk_sz) { + k = 0; + bytes_cnt = 0; + } recv--; } } -- 2.17.0
[PATCH bpf v2 3/5] selftests/bpf: test_sockmap, fix test timeout
In order to reduce runtime of tests, recently timout for select() call was reduced from 1sec to 10usec. This was causing many tests failures. It was caught with failure handling commits in this series. Restoring the timeout from 10usec to 1sec Fixes: a18fda1a62c3 ("bpf: reduce runtime of test_sockmap tests") Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp> --- tools/testing/selftests/bpf/test_sockmap.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c index 8a81ea0e9fb6..f79397513362 100644 --- a/tools/testing/selftests/bpf/test_sockmap.c +++ b/tools/testing/selftests/bpf/test_sockmap.c @@ -345,8 +345,8 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt, if (err < 0) perror("recv start time: "); while (s->bytes_recvd < total_bytes) { - timeout.tv_sec = 0; - timeout.tv_usec = 10; + timeout.tv_sec = 1; + timeout.tv_usec = 0; /* FD sets */ FD_ZERO(); -- 2.17.0
[PATCH bpf v2 2/5] selftests/bpf: test_sockmap, join cgroup in selftest mode
In case of selftest mode, temporary cgroup environment is created but cgroup is not joined. It causes test failures. Fixed by joining the cgroup Fixes: 16962b2404ac ("bpf: sockmap, add selftests") Acked-by: John Fastabend <john.fastab...@gmail.com> Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp> --- tools/testing/selftests/bpf/test_sockmap.c | 5 + 1 file changed, 5 insertions(+) diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c index 34feb74c95c4..8a81ea0e9fb6 100644 --- a/tools/testing/selftests/bpf/test_sockmap.c +++ b/tools/testing/selftests/bpf/test_sockmap.c @@ -1342,6 +1342,11 @@ static int __test_suite(char *bpf_file) return cg_fd; } + if (join_cgroup(CG_PATH)) { + fprintf(stderr, "ERROR: failed to join cgroup\n"); + return -EINVAL; + } + /* Tests basic commands and APIs with range of iov values */ txmsg_start = txmsg_end = 0; err = test_txmsg(cg_fd); -- 2.17.0
[PATCH bpf v2 1/5] selftests/bpf: test_sockmap, check test failure
Test failures are not identified because exit code of RX/TX threads is not checked. Also threads are not returning correct exit code. - Return exit code from threads depending on test execution status - In main thread, check the exit code of RX/TX threads Fixes: 16962b2404ac ("bpf: sockmap, add selftests") Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp> --- tools/testing/selftests/bpf/test_sockmap.c | 25 -- 1 file changed, 19 insertions(+), 6 deletions(-) diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c index eb17fae458e6..34feb74c95c4 100644 --- a/tools/testing/selftests/bpf/test_sockmap.c +++ b/tools/testing/selftests/bpf/test_sockmap.c @@ -429,8 +429,8 @@ static int sendmsg_test(struct sockmap_options *opt) struct msg_stats s = {0}; int iov_count = opt->iov_count; int iov_buf = opt->iov_length; + int rx_status, tx_status; int cnt = opt->rate; - int status; errno = 0; @@ -442,7 +442,7 @@ static int sendmsg_test(struct sockmap_options *opt) rxpid = fork(); if (rxpid == 0) { if (opt->drop_expected) - exit(1); + exit(0); if (opt->sendpage) iov_count = 1; @@ -463,7 +463,7 @@ static int sendmsg_test(struct sockmap_options *opt) "rx_sendmsg: TX: %zuB %fB/s %fGB/s RX: %zuB %fB/s %fGB/s\n", s.bytes_sent, sent_Bps, sent_Bps/giga, s.bytes_recvd, recvd_Bps, recvd_Bps/giga); - exit(1); + exit(err ? 1 : 0); } else if (rxpid == -1) { perror("msg_loop_rx: "); return errno; @@ -491,14 +491,27 @@ static int sendmsg_test(struct sockmap_options *opt) "tx_sendmsg: TX: %zuB %fB/s %f GB/s RX: %zuB %fB/s %fGB/s\n", s.bytes_sent, sent_Bps, sent_Bps/giga, s.bytes_recvd, recvd_Bps, recvd_Bps/giga); - exit(1); + exit(err ? 1 : 0); } else if (txpid == -1) { perror("msg_loop_tx: "); return errno; } - assert(waitpid(rxpid, , 0) == rxpid); - assert(waitpid(txpid, , 0) == txpid); + assert(waitpid(rxpid, _status, 0) == rxpid); + assert(waitpid(txpid, _status, 0) == txpid); + if (WIFEXITED(rx_status)) { + err = WEXITSTATUS(rx_status); + if (err) { + fprintf(stderr, "rx thread exited with err %d. ", err); + goto out; + } + } + if (WIFEXITED(tx_status)) { + err = WEXITSTATUS(tx_status); + if (err) + fprintf(stderr, "tx thread exited with err %d. ", err); + } +out: return err; } -- 2.17.0
[PATCH bpf v2 0/5] fix test_sockmap
This series fixes error handling, timeout and data verification in test_sockmap. Previously it was not able to detect failure/timeout in RX/TX thread because error was not notified to the main thread. Also slightly improved test output by printing parameter values (cork, apply, start, end) so that parameters for all tests are displayed. Prashant Bhole (5): selftests/bpf: test_sockmap, check test failure selftests/bpf: test_sockmap, join cgroup in selftest mode selftests/bpf: test_sockmap, fix test timeout selftests/bpf: test_sockmap, fix data verification selftests/bpf: test_sockmap, print additional test options tools/testing/selftests/bpf/test_sockmap.c | 76 +- 1 file changed, 58 insertions(+), 18 deletions(-) -- 2.17.0
Re: [PATCH bpf-next 0/5] fix test_sockmap
On 5/25/2018 11:01 PM, John Fastabend wrote: On 05/25/2018 01:28 AM, Prashant Bhole wrote: On 5/24/2018 1:58 PM, John Fastabend wrote: On 05/23/2018 09:47 PM, Prashant Bhole wrote: On 5/23/2018 6:44 PM, Prashant Bhole wrote: On 5/22/2018 2:08 AM, John Fastabend wrote: On 05/20/2018 10:13 PM, Prashant Bhole wrote: On 5/19/2018 1:42 AM, John Fastabend wrote: On 05/18/2018 12:17 AM, Prashant Bhole wrote: This series fixes bugs in test_sockmap code. They weren't caught previously because failure in RX/TX thread was not notified to the main thread. Also fixed data verification logic and slightly improved test output such that parameters values (cork, apply, start, end) of failed test can be easily seen. Great, this was on my list so thanks for taking care of it. Note: Even after fixing above problems there are issues with tests which set cork parameter. Tests fail (RX thread timeout) when cork value is non-zero and overall data sent by TX thread isn't multiples of cork value. This is expected. When 'cork' is set the sender should only xmit the data when 'cork' bytes are available. If the user doesn't provide the N bytes the data is cork'ed waiting for the bytes and if the socket is closed the state is cleaned up. What these tests are testing is the cleanup path when a user doesn't provide the N bytes. In practice this is used to validate headers and prevent users from sending partial headers. We want to keep these tests because they verify a tear-down path in the code. Ok. After your changes do these get reported as failures? If so we need to account for the above in the calculations. Yes, cork related test are reported as failures because of RX thread timeout. So with your above description, I think we need to differentiate cork tests with partial data and full data. In partial data test we can have something like "timeout_expected" flag. Any other way to fix it? Adding a flag seems reasonable to me. Lets do this for now. Also I plan to add more negative tests so we can either use the same flag or a new one for those cases as well. John, I worked on this for some time and noticed that the RX-timeout of tests with cork parameter is dependent on various parameters. So we can not set a flag like the way 'drop_expected' flag is set before executing the test. So I decided to write a function which judges all parameters before each test and decides whether a test with cork parameter will timeout or not. Then the conditions in the function became complicated. For example some tests fail if opt->rate < 17 (with some other conditions). Here is 17 is related to FRAGS_PER_SKB. Consider following two examples. I'm sorry. Correction: s/FRAGS_PER_SKB/MAX_SKB_FRAGS/ ./test_sockmap --cgroup /mnt/cgroup2 -r 16 -i 1 -l 30 -t sendpage --txmsg --txmsg_cork 1024 # RX timeout occurs ./test_sockmap --cgroup /mnt/cgroup2 -r 17 -i 1 -l 30 -t sendpage --txmsg --txmsg_cork 1024 # Success! Ah yes this hits the buffer limit and flushes the queue. The kernel side doesn't know how to merge those specific sendpage requests so it gives each request its own buffer and when the limit is reached we flush it. Do we need to keep such tests? if yes, then I will continue with adding such conditions in the function. Yes, these tests are needed because they are testing the edge cases. These are probably the most important tests because my normal usage will catch any issues in the "good" cases its these types of things that can go unnoticed (at least for a short while) if we don't have specific tests for them. I tried but it is difficult to come up with a right set of conditions which lead to test failure. Agreed, it can be yes. How about adding your logic for all tests except "cork" cases. If there is a flag to set if the timeout is expected we can always manually set it in the test invocation. Might not be as nice as automatically learning the expected results but possibly easier than building some complicated logic to figure it out. Would you mind submitting your series again without the "cork" tests being tracked? And if you want add a bit to tell if the "cork" tests are going to timeout or not setting it per test manually. But I think your series can just omit the cork test for now and still be useful. Ok. I will submit the series again. Without any change in actual patches, but cover letter reorganized. Thanks. -Prashant
Re: [PATCH bpf-next 0/5] fix test_sockmap
On 5/24/2018 1:58 PM, John Fastabend wrote: On 05/23/2018 09:47 PM, Prashant Bhole wrote: On 5/23/2018 6:44 PM, Prashant Bhole wrote: On 5/22/2018 2:08 AM, John Fastabend wrote: On 05/20/2018 10:13 PM, Prashant Bhole wrote: On 5/19/2018 1:42 AM, John Fastabend wrote: On 05/18/2018 12:17 AM, Prashant Bhole wrote: This series fixes bugs in test_sockmap code. They weren't caught previously because failure in RX/TX thread was not notified to the main thread. Also fixed data verification logic and slightly improved test output such that parameters values (cork, apply, start, end) of failed test can be easily seen. Great, this was on my list so thanks for taking care of it. Note: Even after fixing above problems there are issues with tests which set cork parameter. Tests fail (RX thread timeout) when cork value is non-zero and overall data sent by TX thread isn't multiples of cork value. This is expected. When 'cork' is set the sender should only xmit the data when 'cork' bytes are available. If the user doesn't provide the N bytes the data is cork'ed waiting for the bytes and if the socket is closed the state is cleaned up. What these tests are testing is the cleanup path when a user doesn't provide the N bytes. In practice this is used to validate headers and prevent users from sending partial headers. We want to keep these tests because they verify a tear-down path in the code. Ok. After your changes do these get reported as failures? If so we need to account for the above in the calculations. Yes, cork related test are reported as failures because of RX thread timeout. So with your above description, I think we need to differentiate cork tests with partial data and full data. In partial data test we can have something like "timeout_expected" flag. Any other way to fix it? Adding a flag seems reasonable to me. Lets do this for now. Also I plan to add more negative tests so we can either use the same flag or a new one for those cases as well. John, I worked on this for some time and noticed that the RX-timeout of tests with cork parameter is dependent on various parameters. So we can not set a flag like the way 'drop_expected' flag is set before executing the test. So I decided to write a function which judges all parameters before each test and decides whether a test with cork parameter will timeout or not. Then the conditions in the function became complicated. For example some tests fail if opt->rate < 17 (with some other conditions). Here is 17 is related to FRAGS_PER_SKB. Consider following two examples. I'm sorry. Correction: s/FRAGS_PER_SKB/MAX_SKB_FRAGS/ ./test_sockmap --cgroup /mnt/cgroup2 -r 16 -i 1 -l 30 -t sendpage --txmsg --txmsg_cork 1024 # RX timeout occurs ./test_sockmap --cgroup /mnt/cgroup2 -r 17 -i 1 -l 30 -t sendpage --txmsg --txmsg_cork 1024 # Success! Ah yes this hits the buffer limit and flushes the queue. The kernel side doesn't know how to merge those specific sendpage requests so it gives each request its own buffer and when the limit is reached we flush it. Do we need to keep such tests? if yes, then I will continue with adding such conditions in the function. Yes, these tests are needed because they are testing the edge cases. These are probably the most important tests because my normal usage will catch any issues in the "good" cases its these types of things that can go unnoticed (at least for a short while) if we don't have specific tests for them. I tried but it is difficult to come up with a right set of conditions which lead to test failure. -Prashant Thanks for doing this. John
Re: [PATCH bpf-next 0/5] fix test_sockmap
On 5/23/2018 6:44 PM, Prashant Bhole wrote: On 5/22/2018 2:08 AM, John Fastabend wrote: On 05/20/2018 10:13 PM, Prashant Bhole wrote: On 5/19/2018 1:42 AM, John Fastabend wrote: On 05/18/2018 12:17 AM, Prashant Bhole wrote: This series fixes bugs in test_sockmap code. They weren't caught previously because failure in RX/TX thread was not notified to the main thread. Also fixed data verification logic and slightly improved test output such that parameters values (cork, apply, start, end) of failed test can be easily seen. Great, this was on my list so thanks for taking care of it. Note: Even after fixing above problems there are issues with tests which set cork parameter. Tests fail (RX thread timeout) when cork value is non-zero and overall data sent by TX thread isn't multiples of cork value. This is expected. When 'cork' is set the sender should only xmit the data when 'cork' bytes are available. If the user doesn't provide the N bytes the data is cork'ed waiting for the bytes and if the socket is closed the state is cleaned up. What these tests are testing is the cleanup path when a user doesn't provide the N bytes. In practice this is used to validate headers and prevent users from sending partial headers. We want to keep these tests because they verify a tear-down path in the code. Ok. After your changes do these get reported as failures? If so we need to account for the above in the calculations. Yes, cork related test are reported as failures because of RX thread timeout. So with your above description, I think we need to differentiate cork tests with partial data and full data. In partial data test we can have something like "timeout_expected" flag. Any other way to fix it? Adding a flag seems reasonable to me. Lets do this for now. Also I plan to add more negative tests so we can either use the same flag or a new one for those cases as well. John, I worked on this for some time and noticed that the RX-timeout of tests with cork parameter is dependent on various parameters. So we can not set a flag like the way 'drop_expected' flag is set before executing the test. So I decided to write a function which judges all parameters before each test and decides whether a test with cork parameter will timeout or not. Then the conditions in the function became complicated. For example some tests fail if opt->rate < 17 (with some other conditions). Here is 17 is related to FRAGS_PER_SKB. Consider following two examples. I'm sorry. Correction: s/FRAGS_PER_SKB/MAX_SKB_FRAGS/ ./test_sockmap --cgroup /mnt/cgroup2 -r 16 -i 1 -l 30 -t sendpage --txmsg --txmsg_cork 1024 # RX timeout occurs ./test_sockmap --cgroup /mnt/cgroup2 -r 17 -i 1 -l 30 -t sendpage --txmsg --txmsg_cork 1024 # Success! Do we need to keep such tests? if yes, then I will continue with adding such conditions in the function. -Prashant
Re: [PATCH bpf-next 0/5] fix test_sockmap
On 5/22/2018 2:08 AM, John Fastabend wrote: On 05/20/2018 10:13 PM, Prashant Bhole wrote: On 5/19/2018 1:42 AM, John Fastabend wrote: On 05/18/2018 12:17 AM, Prashant Bhole wrote: This series fixes bugs in test_sockmap code. They weren't caught previously because failure in RX/TX thread was not notified to the main thread. Also fixed data verification logic and slightly improved test output such that parameters values (cork, apply, start, end) of failed test can be easily seen. Great, this was on my list so thanks for taking care of it. Note: Even after fixing above problems there are issues with tests which set cork parameter. Tests fail (RX thread timeout) when cork value is non-zero and overall data sent by TX thread isn't multiples of cork value. This is expected. When 'cork' is set the sender should only xmit the data when 'cork' bytes are available. If the user doesn't provide the N bytes the data is cork'ed waiting for the bytes and if the socket is closed the state is cleaned up. What these tests are testing is the cleanup path when a user doesn't provide the N bytes. In practice this is used to validate headers and prevent users from sending partial headers. We want to keep these tests because they verify a tear-down path in the code. Ok. After your changes do these get reported as failures? If so we need to account for the above in the calculations. Yes, cork related test are reported as failures because of RX thread timeout. So with your above description, I think we need to differentiate cork tests with partial data and full data. In partial data test we can have something like "timeout_expected" flag. Any other way to fix it? Adding a flag seems reasonable to me. Lets do this for now. Also I plan to add more negative tests so we can either use the same flag or a new one for those cases as well. John, I worked on this for some time and noticed that the RX-timeout of tests with cork parameter is dependent on various parameters. So we can not set a flag like the way 'drop_expected' flag is set before executing the test. So I decided to write a function which judges all parameters before each test and decides whether a test with cork parameter will timeout or not. Then the conditions in the function became complicated. For example some tests fail if opt->rate < 17 (with some other conditions). Here is 17 is related to FRAGS_PER_SKB. Consider following two examples. ./test_sockmap --cgroup /mnt/cgroup2 -r 16 -i 1 -l 30 -t sendpage --txmsg --txmsg_cork 1024 # RX timeout occurs ./test_sockmap --cgroup /mnt/cgroup2 -r 17 -i 1 -l 30 -t sendpage --txmsg --txmsg_cork 1024 # Success! Do we need to keep such tests? if yes, then I will continue with adding such conditions in the function. -Prashant
Re: [PATCH bpf-next 0/5] fix test_sockmap
On 5/19/2018 1:54 AM, Shuah Khan wrote: On 05/18/2018 01:17 AM, Prashant Bhole wrote: This series fixes bugs in test_sockmap code. They weren't caught previously because failure in RX/TX thread was not notified to the main thread. Also fixed data verification logic and slightly improved test output such that parameters values (cork, apply, start, end) of failed test can be easily seen. Note: Even after fixing above problems there are issues with tests which set cork parameter. Tests fail (RX thread timeout) when cork value is non-zero and overall data sent by TX thread isn't multiples of cork value. Prashant Bhole (5): selftests/bpf: test_sockmap, check test failure selftests/bpf: test_sockmap, join cgroup in selftest mode selftests/bpf: test_sockmap, fix test timeout selftests/bpf: test_sockmap, fix data verification selftests/bpf: test_sockmap, print additional test options tools/testing/selftests/bpf/test_sockmap.c | 76 +++--- 1 file changed, 58 insertions(+), 18 deletions(-) Please remember to cc linux-kselftest mailing list as well. I would like to see all the test patches cc'ed to it. Linaro and other test users watch the kselftest mailing list. I also have patchwork project now to manage the patch volume. > I am okay with patches going through net/bpf trees - there are always test dependencies on net/bpf trees. Ok. I will remember this. Thanks. -Prashant
Re: [PATCH bpf-next 0/5] fix test_sockmap
On 5/19/2018 1:42 AM, John Fastabend wrote: On 05/18/2018 12:17 AM, Prashant Bhole wrote: This series fixes bugs in test_sockmap code. They weren't caught previously because failure in RX/TX thread was not notified to the main thread. Also fixed data verification logic and slightly improved test output such that parameters values (cork, apply, start, end) of failed test can be easily seen. Great, this was on my list so thanks for taking care of it. Note: Even after fixing above problems there are issues with tests which set cork parameter. Tests fail (RX thread timeout) when cork value is non-zero and overall data sent by TX thread isn't multiples of cork value. This is expected. When 'cork' is set the sender should only xmit the data when 'cork' bytes are available. If the user doesn't provide the N bytes the data is cork'ed waiting for the bytes and if the socket is closed the state is cleaned up. What these tests are testing is the cleanup path when a user doesn't provide the N bytes. In practice this is used to validate headers and prevent users from sending partial headers. We want to keep these tests because they verify a tear-down path in the code. Ok. After your changes do these get reported as failures? If so we need to account for the above in the calculations. Yes, cork related test are reported as failures because of RX thread timeout. So with your above description, I think we need to differentiate cork tests with partial data and full data. In partial data test we can have something like "timeout_expected" flag. Any other way to fix it? -Prashant Prashant Bhole (5): selftests/bpf: test_sockmap, check test failure selftests/bpf: test_sockmap, join cgroup in selftest mode selftests/bpf: test_sockmap, fix test timeout selftests/bpf: test_sockmap, fix data verification selftests/bpf: test_sockmap, print additional test options tools/testing/selftests/bpf/test_sockmap.c | 76 +++--- 1 file changed, 58 insertions(+), 18 deletions(-)
Re: [PATCH bpf-next 2/5] selftests/bpf: test_sockmap, join cgroup in selftest mode
On 5/19/2018 1:45 AM, John Fastabend wrote: On 05/18/2018 12:17 AM, Prashant Bhole wrote: In case of selftest mode, temporary cgroup environment is created but cgroup is not joined. It causes test failures. Fixed by joining the cgroup Fixes: 16962b2404ac ("bpf: sockmap, add selftests") Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp> -- Thanks, LGTM. Should this be the first patch in the series though? I wonder if after patch 1 if you would get failures without this patch. Patch 1 fixes selftest mode as well as manual mode. This patch 2 is specifically for selftest mode, hence the sequence. - Prashant
Re: [PATCH bpf-next 3/5] selftests/bpf: test_sockmap, fix test timeout
On 5/19/2018 1:47 AM, John Fastabend wrote: On 05/18/2018 12:17 AM, Prashant Bhole wrote: In order to reduce runtime of tests, recently timout for select() call was reduced from 1sec to 10usec. This was causing many tests failures. It was caught with failure handling commits in this series. Restoring the timeout from 10usec to 1sec Fixes: a18fda1a62c3 ("bpf: reduce runtime of test_sockmap tests") Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp> --- Whats the runtime for the entire test suite after this? I agree I was probably to aggressive in setting this but on the other hand I was trying to avoid letting the test run for minutes. Currently any failure stops further tests. So I made local change to force cork=0. With this change, it takes around 60 seconds for all 648 tests. It will change after fixing cork tests. -Prashant
[PATCH bpf-next 5/5] selftests/bpf: test_sockmap, print additional test options
Print values of test options like apply, cork, start, end so that individual failed tests can be identified for manual run Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp> --- tools/testing/selftests/bpf/test_sockmap.c | 28 +++- 1 file changed, 19 insertions(+), 9 deletions(-) diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c index e8faf4596dac..7970d48c4acb 100644 --- a/tools/testing/selftests/bpf/test_sockmap.c +++ b/tools/testing/selftests/bpf/test_sockmap.c @@ -869,6 +869,8 @@ static char *test_to_str(int test) #define OPTSTRING 60 static void test_options(char *options) { + char tstr[OPTSTRING]; + memset(options, 0, OPTSTRING); if (txmsg_pass) @@ -881,14 +883,22 @@ static void test_options(char *options) strncat(options, "redir_noisy,", OPTSTRING); if (txmsg_drop) strncat(options, "drop,", OPTSTRING); - if (txmsg_apply) - strncat(options, "apply,", OPTSTRING); - if (txmsg_cork) - strncat(options, "cork,", OPTSTRING); - if (txmsg_start) - strncat(options, "start,", OPTSTRING); - if (txmsg_end) - strncat(options, "end,", OPTSTRING); + if (txmsg_apply) { + snprintf(tstr, OPTSTRING, "apply %d,", txmsg_apply); + strncat(options, tstr, OPTSTRING); + } + if (txmsg_cork) { + snprintf(tstr, OPTSTRING, "cork %d,", txmsg_cork); + strncat(options, tstr, OPTSTRING); + } + if (txmsg_start) { + snprintf(tstr, OPTSTRING, "start %d,", txmsg_start); + strncat(options, tstr, OPTSTRING); + } + if (txmsg_end) { + snprintf(tstr, OPTSTRING, "end %d,", txmsg_end); + strncat(options, tstr, OPTSTRING); + } if (txmsg_ingress) strncat(options, "ingress,", OPTSTRING); if (txmsg_skb) @@ -897,7 +907,7 @@ static void test_options(char *options) static int __test_exec(int cgrp, int test, struct sockmap_options *opt) { - char *options = calloc(60, sizeof(char)); + char *options = calloc(OPTSTRING, sizeof(char)); int err; if (test == SENDPAGE) -- 2.14.3
[PATCH bpf-next 4/5] selftests/bpf: test_sockmap, fix data verification
When data verification is enabled, some tests fail because verification is done incorrectly. Following changes fix it. - Identify the size of data block to be verified - Reset verification counter when data block size is reached - Fixed the value printed in case of verification failure Fixes: 16962b2404ac ("bpf: sockmap, add selftests") Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp> --- tools/testing/selftests/bpf/test_sockmap.c | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c index f79397513362..e8faf4596dac 100644 --- a/tools/testing/selftests/bpf/test_sockmap.c +++ b/tools/testing/selftests/bpf/test_sockmap.c @@ -337,8 +337,15 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt, int fd_flags = O_NONBLOCK; struct timeval timeout; float total_bytes; + int bytes_cnt = 0; + int chunk_sz; fd_set w; + if (opt->sendpage) + chunk_sz = iov_length * cnt; + else + chunk_sz = iov_length * iov_count; + fcntl(fd, fd_flags); total_bytes = (float)iov_count * (float)iov_length * (float)cnt; err = clock_gettime(CLOCK_MONOTONIC, >start); @@ -388,9 +395,14 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt, errno = -EIO; fprintf(stderr, "detected data corruption @iov[%i]:%i %02x != %02x, %02x ?= %02x\n", - i, j, d[j], k - 1, d[j+1], k + 1); + i, j, d[j], k - 1, d[j+1], k); goto out_errno; } + bytes_cnt++; + if (bytes_cnt == chunk_sz) { + k = 0; + bytes_cnt = 0; + } recv--; } } -- 2.14.3
[PATCH bpf-next 1/5] selftests/bpf: test_sockmap, check test failure
Test failures are not identified because exit code of RX/TX threads is not checked. Also threads are not returning correct exit code. - Return exit code from threads depending on test execution status - In main thread, check the exit code of RX/TX threads Fixes: 16962b2404ac ("bpf: sockmap, add selftests") Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp> --- tools/testing/selftests/bpf/test_sockmap.c | 25 +++-- 1 file changed, 19 insertions(+), 6 deletions(-) diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c index eb17fae458e6..34feb74c95c4 100644 --- a/tools/testing/selftests/bpf/test_sockmap.c +++ b/tools/testing/selftests/bpf/test_sockmap.c @@ -429,8 +429,8 @@ static int sendmsg_test(struct sockmap_options *opt) struct msg_stats s = {0}; int iov_count = opt->iov_count; int iov_buf = opt->iov_length; + int rx_status, tx_status; int cnt = opt->rate; - int status; errno = 0; @@ -442,7 +442,7 @@ static int sendmsg_test(struct sockmap_options *opt) rxpid = fork(); if (rxpid == 0) { if (opt->drop_expected) - exit(1); + exit(0); if (opt->sendpage) iov_count = 1; @@ -463,7 +463,7 @@ static int sendmsg_test(struct sockmap_options *opt) "rx_sendmsg: TX: %zuB %fB/s %fGB/s RX: %zuB %fB/s %fGB/s\n", s.bytes_sent, sent_Bps, sent_Bps/giga, s.bytes_recvd, recvd_Bps, recvd_Bps/giga); - exit(1); + exit(err ? 1 : 0); } else if (rxpid == -1) { perror("msg_loop_rx: "); return errno; @@ -491,14 +491,27 @@ static int sendmsg_test(struct sockmap_options *opt) "tx_sendmsg: TX: %zuB %fB/s %f GB/s RX: %zuB %fB/s %fGB/s\n", s.bytes_sent, sent_Bps, sent_Bps/giga, s.bytes_recvd, recvd_Bps, recvd_Bps/giga); - exit(1); + exit(err ? 1 : 0); } else if (txpid == -1) { perror("msg_loop_tx: "); return errno; } - assert(waitpid(rxpid, , 0) == rxpid); - assert(waitpid(txpid, , 0) == txpid); + assert(waitpid(rxpid, _status, 0) == rxpid); + assert(waitpid(txpid, _status, 0) == txpid); + if (WIFEXITED(rx_status)) { + err = WEXITSTATUS(rx_status); + if (err) { + fprintf(stderr, "rx thread exited with err %d. ", err); + goto out; + } + } + if (WIFEXITED(tx_status)) { + err = WEXITSTATUS(tx_status); + if (err) + fprintf(stderr, "tx thread exited with err %d. ", err); + } +out: return err; } -- 2.14.3
[PATCH bpf-next 2/5] selftests/bpf: test_sockmap, join cgroup in selftest mode
In case of selftest mode, temporary cgroup environment is created but cgroup is not joined. It causes test failures. Fixed by joining the cgroup Fixes: 16962b2404ac ("bpf: sockmap, add selftests") Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp> --- tools/testing/selftests/bpf/test_sockmap.c | 5 + 1 file changed, 5 insertions(+) diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c index 34feb74c95c4..8a81ea0e9fb6 100644 --- a/tools/testing/selftests/bpf/test_sockmap.c +++ b/tools/testing/selftests/bpf/test_sockmap.c @@ -1342,6 +1342,11 @@ static int __test_suite(char *bpf_file) return cg_fd; } + if (join_cgroup(CG_PATH)) { + fprintf(stderr, "ERROR: failed to join cgroup\n"); + return -EINVAL; + } + /* Tests basic commands and APIs with range of iov values */ txmsg_start = txmsg_end = 0; err = test_txmsg(cg_fd); -- 2.14.3
[PATCH bpf-next 3/5] selftests/bpf: test_sockmap, fix test timeout
In order to reduce runtime of tests, recently timout for select() call was reduced from 1sec to 10usec. This was causing many tests failures. It was caught with failure handling commits in this series. Restoring the timeout from 10usec to 1sec Fixes: a18fda1a62c3 ("bpf: reduce runtime of test_sockmap tests") Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp> --- tools/testing/selftests/bpf/test_sockmap.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c index 8a81ea0e9fb6..f79397513362 100644 --- a/tools/testing/selftests/bpf/test_sockmap.c +++ b/tools/testing/selftests/bpf/test_sockmap.c @@ -345,8 +345,8 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt, if (err < 0) perror("recv start time: "); while (s->bytes_recvd < total_bytes) { - timeout.tv_sec = 0; - timeout.tv_usec = 10; + timeout.tv_sec = 1; + timeout.tv_usec = 0; /* FD sets */ FD_ZERO(); -- 2.14.3
[PATCH bpf-next 0/5] fix test_sockmap
This series fixes bugs in test_sockmap code. They weren't caught previously because failure in RX/TX thread was not notified to the main thread. Also fixed data verification logic and slightly improved test output such that parameters values (cork, apply, start, end) of failed test can be easily seen. Note: Even after fixing above problems there are issues with tests which set cork parameter. Tests fail (RX thread timeout) when cork value is non-zero and overall data sent by TX thread isn't multiples of cork value. Prashant Bhole (5): selftests/bpf: test_sockmap, check test failure selftests/bpf: test_sockmap, join cgroup in selftest mode selftests/bpf: test_sockmap, fix test timeout selftests/bpf: test_sockmap, fix data verification selftests/bpf: test_sockmap, print additional test options tools/testing/selftests/bpf/test_sockmap.c | 76 +++--- 1 file changed, 58 insertions(+), 18 deletions(-) -- 2.14.3
[PATCH bpf-next v2] samples/bpf: xdp_monitor, accept short options
Updated optstring parameter for getopt_long() to accept short options. Also updated usage() function. Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp> --- samples/bpf/xdp_monitor_user.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/samples/bpf/xdp_monitor_user.c b/samples/bpf/xdp_monitor_user.c index 894bc64c2cac..05ad3f590c91 100644 --- a/samples/bpf/xdp_monitor_user.c +++ b/samples/bpf/xdp_monitor_user.c @@ -58,7 +58,7 @@ static void usage(char *argv[]) printf(" flag (internal value:%d)", *long_options[i].flag); else - printf("(internal short-option: -%c)", + printf("short-option: -%c", long_options[i].val); printf("\n"); } @@ -594,7 +594,7 @@ int main(int argc, char **argv) snprintf(bpf_obj_file, sizeof(bpf_obj_file), "%s_kern.o", argv[0]); /* Parse commands line args */ - while ((opt = getopt_long(argc, argv, "h", + while ((opt = getopt_long(argc, argv, "hDSs:", long_options, )) != -1) { switch (opt) { case 'D': -- 2.13.6
Re: [PATCH bpf-next] samples/bpf: xdp_monitor, accept short options
On 5/12/2018 1:31 AM, Jesper Dangaard Brouer wrote: On Fri, 11 May 2018 10:37:51 +0900 Prashant Bhole <bhole_prashant...@lab.ntt.co.jp> wrote: updated optstring accept short options Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp> --- samples/bpf/xdp_monitor_user.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/samples/bpf/xdp_monitor_user.c b/samples/bpf/xdp_monitor_user.c index 894bc64c2cac..668511c77aaf 100644 --- a/samples/bpf/xdp_monitor_user.c +++ b/samples/bpf/xdp_monitor_user.c @@ -594,7 +594,7 @@ int main(int argc, char **argv) snprintf(bpf_obj_file, sizeof(bpf_obj_file), "%s_kern.o", argv[0]); /* Parse commands line args */ - while ((opt = getopt_long(argc, argv, "h", + while ((opt = getopt_long(argc, argv, "hDSs:", long_options, )) != -1) { switch (opt) { case 'D': It was actually on purpose that I didn't add the short options, in-order to force people use those "self-documenting" long-options when they show the usage on public mailing lists or in blog-posts. Got it. If you want these short options, you also have to correct the "usage" function that state these are "internal" short-options. I am submitting v2 with "usage" updated, because with usability point of view it is nice to have short options. Thanks. -Prashant Notice the long options parsing done by getopt_long() allow you to only specify part of the string. Al-through, I can see --s is ambiguous. $ sudo ./xdp_monitor --s ./xdp_monitor: option '--s' is ambiguous; possibilities: '--stats' '--sec'
[PATCH bpf-next] samples/bpf: xdp_monitor, accept short options
updated optstring accept short options Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp> --- samples/bpf/xdp_monitor_user.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/samples/bpf/xdp_monitor_user.c b/samples/bpf/xdp_monitor_user.c index 894bc64c2cac..668511c77aaf 100644 --- a/samples/bpf/xdp_monitor_user.c +++ b/samples/bpf/xdp_monitor_user.c @@ -594,7 +594,7 @@ int main(int argc, char **argv) snprintf(bpf_obj_file, sizeof(bpf_obj_file), "%s_kern.o", argv[0]); /* Parse commands line args */ - while ((opt = getopt_long(argc, argv, "h", + while ((opt = getopt_long(argc, argv, "hDSs:", long_options, )) != -1) { switch (opt) { case 'D': -- 2.13.6
[PATCH bpf-next] bpf: sync tools bpf.h uapi header
sync the header from include/uapi/linux/bpf.h which was updated to add fib lookup helper function. This fixes selftests/bpf build failure Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp> --- tools/include/uapi/linux/bpf.h | 84 +- 1 file changed, 83 insertions(+), 1 deletion(-) diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 83a95ae388dd..ddc566cb7492 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -10,6 +10,8 @@ #include #include +#include +#include /* Extended instruction set based on top of classic BPF */ @@ -116,6 +118,7 @@ enum bpf_map_type { BPF_MAP_TYPE_DEVMAP, BPF_MAP_TYPE_SOCKMAP, BPF_MAP_TYPE_CPUMAP, + BPF_MAP_TYPE_XSKMAP, }; enum bpf_prog_type { @@ -1825,6 +1828,33 @@ union bpf_attr { * Return * 0 on success, or a negative error in case of failure. * + * + * int bpf_fib_lookup(void *ctx, struct bpf_fib_lookup *params, int plen, u32 flags) + * Description + * Do FIB lookup in kernel tables using parameters in *params*. + * If lookup is successful and result shows packet is to be + * forwarded, the neighbor tables are searched for the nexthop. + * If successful (ie., FIB lookup shows forwarding and nexthop + * is resolved), the nexthop address is returned in ipv4_dst, + * ipv6_dst or mpls_out based on family, smac is set to mac + * address of egress device, dmac is set to nexthop mac address, + * rt_metric is set to metric from route. + * + * *plen* argument is the size of the passed in struct. + * *flags* argument can be one or more BPF_FIB_LOOKUP_ flags: + * + * **BPF_FIB_LOOKUP_DIRECT** means do a direct table lookup vs + * full lookup using FIB rules + * **BPF_FIB_LOOKUP_OUTPUT** means do lookup from an egress + * perspective (default is ingress) + * + * *ctx* is either **struct xdp_md** for XDP programs or + * **struct sk_buff** tc cls_act programs. + * + * Return + * Egress device index on success, 0 if packet needs to continue + * up the stack for further processing or a negative error in case + * of failure. */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -1895,7 +1925,8 @@ union bpf_attr { FN(xdp_adjust_tail),\ FN(skb_get_xfrm_state), \ FN(get_stack), \ - FN(skb_load_bytes_relative), + FN(skb_load_bytes_relative),\ + FN(fib_lookup), /* integer value in 'imm' field of BPF_CALL instruction selects which helper * function eBPF program intends to call @@ -2309,4 +2340,55 @@ struct bpf_raw_tracepoint_args { __u64 args[0]; }; +/* DIRECT: Skip the FIB rules and go to FIB table associated with device + * OUTPUT: Do lookup from egress perspective; default is ingress + */ +#define BPF_FIB_LOOKUP_DIRECT BIT(0) +#define BPF_FIB_LOOKUP_OUTPUT BIT(1) + +struct bpf_fib_lookup { + /* input */ + __u8family; /* network family, AF_INET, AF_INET6, AF_MPLS */ + + /* set if lookup is to consider L4 data - e.g., FIB rules */ + __u8l4_protocol; + __be16 sport; + __be16 dport; + + /* total length of packet from network header - used for MTU check */ + __u16 tot_len; + __u32 ifindex; /* L3 device index for lookup */ + + union { + /* inputs to lookup */ + __u8tos;/* AF_INET */ + __be32 flowlabel; /* AF_INET6 */ + + /* output: metric of fib result */ + __u32 rt_metric; + }; + + union { + __be32 mpls_in; + __be32 ipv4_src; + struct in6_addr ipv6_src; + }; + + /* input to bpf_fib_lookup, *dst is destination address. +* output: bpf_fib_lookup sets to gateway address +*/ + union { + /* return for MPLS lookups */ + __be32 mpls_out[4]; /* support up to 4 labels */ + __be32 ipv4_dst; + struct in6_addr ipv6_dst; + }; + + /* output */ + __be16 h_vlan_proto; + __be16 h_vlan_TCI; + __u8smac[ETH_ALEN]; + __u8dmac[ETH_ALEN]; +}; + #endif /* _UAPI__LINUX_BPF_H__ */ -- 2.14.3
[PATCH v2 bpf-next 1/2] lib/scatterlist: add sg_init_marker() helper
sg_init_marker initializes sg_magic in the sg table and calls sg_mark_end() on the last entry of the table. This can be useful to avoid memset in sg_init_table() when scatterlist is already zeroed out For example: when scatterlist is embedded inside other struct and that container struct is zeroed out Suggested-by: Daniel Borkmann <dan...@iogearbox.net> Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp> --- include/linux/scatterlist.h | 18 ++ lib/scatterlist.c | 9 + 2 files changed, 19 insertions(+), 8 deletions(-) diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h index 22b2131bcdcd..aa5d4eb725f5 100644 --- a/include/linux/scatterlist.h +++ b/include/linux/scatterlist.h @@ -248,6 +248,24 @@ static inline void *sg_virt(struct scatterlist *sg) return page_address(sg_page(sg)) + sg->offset; } +/** + * sg_init_marker - Initialize markers in sg table + * @sgl: The SG table + * @nents:Number of entries in table + * + **/ +static inline void sg_init_marker(struct scatterlist *sgl, + unsigned int nents) +{ +#ifdef CONFIG_DEBUG_SG + unsigned int i; + + for (i = 0; i < nents; i++) + sgl[i].sg_magic = SG_MAGIC; +#endif + sg_mark_end([nents - 1]); +} + int sg_nents(struct scatterlist *sg); int sg_nents_for_len(struct scatterlist *sg, u64 len); struct scatterlist *sg_next(struct scatterlist *); diff --git a/lib/scatterlist.c b/lib/scatterlist.c index 53728d391d3a..06dad7a072fd 100644 --- a/lib/scatterlist.c +++ b/lib/scatterlist.c @@ -132,14 +132,7 @@ EXPORT_SYMBOL(sg_last); void sg_init_table(struct scatterlist *sgl, unsigned int nents) { memset(sgl, 0, sizeof(*sgl) * nents); -#ifdef CONFIG_DEBUG_SG - { - unsigned int i; - for (i = 0; i < nents; i++) - sgl[i].sg_magic = SG_MAGIC; - } -#endif - sg_mark_end([nents - 1]); + sg_init_marker(sgl, nents); } EXPORT_SYMBOL(sg_init_table); -- 2.14.3
[PATCH v2 bpf-next 0/2] sockmap: fix sg api usage
These patches fix sg api usage in sockmap. Previously sockmap didn't use sg_init_table(), which caused hitting BUG_ON in sg api, when CONFIG_DEBUG_SG is enabled v1: added sg_init_table() calls wherever needed. v2: - Patch1 adds new helper function in sg api. sg_init_marker() - Patch2 sg_init_marker() and sg_init_table() in appropriate places Backgroud: While reviewing v1, John Fastabend raised a valid point about unnecessary memset in sg_init_table() because sockmap uses sg table which embedded in a struct. As enclosing struct is zeroed out, there is unnecessary memset in sg_init_table. So Daniel Borkmann suggested to define another static inline function in scatterlist.h which only initializes sg_magic. Also this function will be called from sg_init_table. From this suggestion I defined a function sg_init_marker() which sets sg_magic and calls sg_mark_end() Prashant Bhole (2): lib/scatterlist: add sg_init_marker() helper bpf: sockmap: initialize sg table entries properly include/linux/scatterlist.h | 18 ++ kernel/bpf/sockmap.c| 13 - lib/scatterlist.c | 9 + 3 files changed, 27 insertions(+), 13 deletions(-) -- 2.14.3
[PATCH v2 bpf-next 2/2] bpf: sockmap: initialize sg table entries properly
When CONFIG_DEBUG_SG is set, sg->sg_magic is initialized in sg_init_table() and it is verified in sg api while navigating. We hit BUG_ON when magic check is failed. In functions sg_tcp_sendpage and sg_tcp_sendmsg, the struct containing the scatterlist is already zeroed out. So to avoid extra memset, we use sg_init_marker() to initialize sg_magic. Fixed following things: - In bpf_tcp_sendpage: initialize sg using sg_init_marker - In bpf_tcp_sendmsg: Replace sg_init_table with sg_init_marker - In bpf_tcp_push: Replace memset with sg_init_table where consumed sg entry needs to be re-initialized. Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp> --- kernel/bpf/sockmap.c | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c index 69c5bccabd22..b4f01656c452 100644 --- a/kernel/bpf/sockmap.c +++ b/kernel/bpf/sockmap.c @@ -312,7 +312,7 @@ static int bpf_tcp_push(struct sock *sk, int apply_bytes, md->sg_start++; if (md->sg_start == MAX_SKB_FRAGS) md->sg_start = 0; - memset(sg, 0, sizeof(*sg)); + sg_init_table(sg, 1); if (md->sg_start == md->sg_end) break; @@ -656,7 +656,7 @@ static int bpf_tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size) } sg = md.sg_data; - sg_init_table(sg, MAX_SKB_FRAGS); + sg_init_marker(sg, MAX_SKB_FRAGS); rcu_read_unlock(); lock_sock(sk); @@ -763,10 +763,14 @@ static int bpf_tcp_sendpage(struct sock *sk, struct page *page, lock_sock(sk); - if (psock->cork_bytes) + if (psock->cork_bytes) { m = psock->cork; - else + sg = >sg_data[m->sg_end]; + } else { m = + sg = m->sg_data; + sg_init_marker(sg, MAX_SKB_FRAGS); + } /* Catch case where ring is full and sendpage is stalled. */ if (unlikely(m->sg_end == m->sg_start && @@ -774,7 +778,6 @@ static int bpf_tcp_sendpage(struct sock *sk, struct page *page, goto out_err; psock->sg_size += size; - sg = >sg_data[m->sg_end]; sg_set_page(sg, page, size, offset); get_page(page); m->sg_copy[m->sg_end] = true; -- 2.14.3
Re: [PATCH bpf-next] bpf: sockmap: initialize sg table entries properly
On 3/28/2018 5:51 PM, Daniel Borkmann wrote: On 03/28/2018 08:18 AM, Prashant Bhole wrote: On 3/27/2018 6:05 PM, Daniel Borkmann wrote: On 03/27/2018 10:41 AM, Prashant Bhole wrote: On 3/27/2018 12:15 PM, John Fastabend wrote: On 03/25/2018 11:54 PM, Prashant Bhole wrote: When CONFIG_DEBUG_SG is set, sg->sg_magic is initialized to SG_MAGIC, when sg table is initialized using sg_init_table(). Magic is checked while navigating the scatterlist. We hit BUG_ON when magic check is failed. Fixed following things: - Initialization of sg table in bpf_tcp_sendpage() was missing, initialized it using sg_init_table() - bpf_tcp_sendmsg() initializes sg table using sg_init_table() before entering the loop, but further consumed sg entries are initialized using memset. Fixed it by replacing memset with sg_init_table() in function bpf_tcp_push() Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp> --- kernel/bpf/sockmap.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c index 69c5bccabd22..8a848a99d768 100644 --- a/kernel/bpf/sockmap.c +++ b/kernel/bpf/sockmap.c @@ -312,7 +312,7 @@ static int bpf_tcp_push(struct sock *sk, int apply_bytes, md->sg_start++; if (md->sg_start == MAX_SKB_FRAGS) md->sg_start = 0; - memset(sg, 0, sizeof(*sg)); + sg_init_table(sg, 1); Looks OK here. if (md->sg_start == md->sg_end) break; @@ -763,10 +763,14 @@ static int bpf_tcp_sendpage(struct sock *sk, struct page *page, lock_sock(sk); - if (psock->cork_bytes) + if (psock->cork_bytes) { m = psock->cork; - else + sg = >sg_data[m->sg_end]; + } else { m = + sg = m->sg_data; + sg_init_table(sg, MAX_SKB_FRAGS); sg_init_table() does an unnecessary memset() though. We probably either want a new scatterlist API or just open code this, #ifdef CONFIG_DEBUG_SG { unsigned int i; for (i = 0; i < nents; i++) sgl[i].sg_magic = SG_MAGIC; } Similar sg_init_table() is present in bpf_tcp_sendmsg(). I agree that it causes unnecessary memset, but I don't agree with open coded fix. But then lets fix is properly and add a static inline helper to the include/linux/scatterlist.h header like ... static inline void sg_init_debug_marker(struct scatterlist *sgl, unsigned int nents) { #ifdef CONFIG_DEBUG_SG unsigned int i; for (i = 0; i < nents; i++) sgl[i].sg_magic = SG_MAGIC; #endif } ... and reuse it in all the places that would otherwise open-code this, as well as sg_init_table(): void sg_init_table(struct scatterlist *sgl, unsigned int nents) { memset(sgl, 0, sizeof(*sgl) * nents); sg_init_debug_marker(sgl, nents); sg_mark_end([nents - 1]); } This would be a lot cleaner than having this duplicated in various places. Daniel, This is a good suggestion. Is it ok if I submit both changes in a patch series? Sure, that's fine. How scatterlist related changes will be picked up by other subsystems? Once this gets applied into bpf-next, this will be pushed to net-next tree, and during the merge window net-next will be pulled into Linus' tree if this is what you are asking. Then also other subsystems outside of bpf/networking can make use of the sg_init_debug_marker() helper if suitable for their situation. Thanks. I am submitting V2 soon. -Prashant
Re: [PATCH bpf-next] bpf: sockmap: initialize sg table entries properly
On 3/27/2018 6:05 PM, Daniel Borkmann wrote: On 03/27/2018 10:41 AM, Prashant Bhole wrote: On 3/27/2018 12:15 PM, John Fastabend wrote: On 03/25/2018 11:54 PM, Prashant Bhole wrote: When CONFIG_DEBUG_SG is set, sg->sg_magic is initialized to SG_MAGIC, when sg table is initialized using sg_init_table(). Magic is checked while navigating the scatterlist. We hit BUG_ON when magic check is failed. Fixed following things: - Initialization of sg table in bpf_tcp_sendpage() was missing, initialized it using sg_init_table() - bpf_tcp_sendmsg() initializes sg table using sg_init_table() before entering the loop, but further consumed sg entries are initialized using memset. Fixed it by replacing memset with sg_init_table() in function bpf_tcp_push() Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp> --- kernel/bpf/sockmap.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c index 69c5bccabd22..8a848a99d768 100644 --- a/kernel/bpf/sockmap.c +++ b/kernel/bpf/sockmap.c @@ -312,7 +312,7 @@ static int bpf_tcp_push(struct sock *sk, int apply_bytes, md->sg_start++; if (md->sg_start == MAX_SKB_FRAGS) md->sg_start = 0; - memset(sg, 0, sizeof(*sg)); + sg_init_table(sg, 1); Looks OK here. if (md->sg_start == md->sg_end) break; @@ -763,10 +763,14 @@ static int bpf_tcp_sendpage(struct sock *sk, struct page *page, lock_sock(sk); - if (psock->cork_bytes) + if (psock->cork_bytes) { m = psock->cork; - else + sg = >sg_data[m->sg_end]; + } else { m = + sg = m->sg_data; + sg_init_table(sg, MAX_SKB_FRAGS); sg_init_table() does an unnecessary memset() though. We probably either want a new scatterlist API or just open code this, #ifdef CONFIG_DEBUG_SG { unsigned int i; for (i = 0; i < nents; i++) sgl[i].sg_magic = SG_MAGIC; } Similar sg_init_table() is present in bpf_tcp_sendmsg(). I agree that it causes unnecessary memset, but I don't agree with open coded fix. But then lets fix is properly and add a static inline helper to the include/linux/scatterlist.h header like ... static inline void sg_init_debug_marker(struct scatterlist *sgl, unsigned int nents) { #ifdef CONFIG_DEBUG_SG unsigned int i; for (i = 0; i < nents; i++) sgl[i].sg_magic = SG_MAGIC; #endif } ... and reuse it in all the places that would otherwise open-code this, as well as sg_init_table(): void sg_init_table(struct scatterlist *sgl, unsigned int nents) { memset(sgl, 0, sizeof(*sgl) * nents); sg_init_debug_marker(sgl, nents); sg_mark_end([nents - 1]); } This would be a lot cleaner than having this duplicated in various places. Daniel, This is a good suggestion. Is it ok if I submit both changes in a patch series? How scatterlist related changes will be picked up by other subsystems? -Prashant
Re: [PATCH bpf-next] bpf: sockmap: initialize sg table entries properly
On 3/27/2018 12:15 PM, John Fastabend wrote: On 03/25/2018 11:54 PM, Prashant Bhole wrote: When CONFIG_DEBUG_SG is set, sg->sg_magic is initialized to SG_MAGIC, when sg table is initialized using sg_init_table(). Magic is checked while navigating the scatterlist. We hit BUG_ON when magic check is failed. Fixed following things: - Initialization of sg table in bpf_tcp_sendpage() was missing, initialized it using sg_init_table() - bpf_tcp_sendmsg() initializes sg table using sg_init_table() before entering the loop, but further consumed sg entries are initialized using memset. Fixed it by replacing memset with sg_init_table() in function bpf_tcp_push() Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp> --- kernel/bpf/sockmap.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c index 69c5bccabd22..8a848a99d768 100644 --- a/kernel/bpf/sockmap.c +++ b/kernel/bpf/sockmap.c @@ -312,7 +312,7 @@ static int bpf_tcp_push(struct sock *sk, int apply_bytes, md->sg_start++; if (md->sg_start == MAX_SKB_FRAGS) md->sg_start = 0; - memset(sg, 0, sizeof(*sg)); + sg_init_table(sg, 1); Looks OK here. if (md->sg_start == md->sg_end) break; @@ -763,10 +763,14 @@ static int bpf_tcp_sendpage(struct sock *sk, struct page *page, lock_sock(sk); - if (psock->cork_bytes) + if (psock->cork_bytes) { m = psock->cork; - else + sg = >sg_data[m->sg_end]; + } else { m = + sg = m->sg_data; + sg_init_table(sg, MAX_SKB_FRAGS); sg_init_table() does an unnecessary memset() though. We probably either want a new scatterlist API or just open code this, #ifdef CONFIG_DEBUG_SG { unsigned int i; for (i = 0; i < nents; i++) sgl[i].sg_magic = SG_MAGIC; } Similar sg_init_table() is present in bpf_tcp_sendmsg(). I agree that it causes unnecessary memset, but I don't agree with open coded fix. I am still with V1. Thanks. -Prashant
[PATCH bpf-next] bpf: sockmap: initialize sg table entries properly
When CONFIG_DEBUG_SG is set, sg->sg_magic is initialized to SG_MAGIC, when sg table is initialized using sg_init_table(). Magic is checked while navigating the scatterlist. We hit BUG_ON when magic check is failed. Fixed following things: - Initialization of sg table in bpf_tcp_sendpage() was missing, initialized it using sg_init_table() - bpf_tcp_sendmsg() initializes sg table using sg_init_table() before entering the loop, but further consumed sg entries are initialized using memset. Fixed it by replacing memset with sg_init_table() in function bpf_tcp_push() Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp> --- kernel/bpf/sockmap.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c index 69c5bccabd22..8a848a99d768 100644 --- a/kernel/bpf/sockmap.c +++ b/kernel/bpf/sockmap.c @@ -312,7 +312,7 @@ static int bpf_tcp_push(struct sock *sk, int apply_bytes, md->sg_start++; if (md->sg_start == MAX_SKB_FRAGS) md->sg_start = 0; - memset(sg, 0, sizeof(*sg)); + sg_init_table(sg, 1); if (md->sg_start == md->sg_end) break; @@ -763,10 +763,14 @@ static int bpf_tcp_sendpage(struct sock *sk, struct page *page, lock_sock(sk); - if (psock->cork_bytes) + if (psock->cork_bytes) { m = psock->cork; - else + sg = >sg_data[m->sg_end]; + } else { m = + sg = m->sg_data; + sg_init_table(sg, MAX_SKB_FRAGS); + } /* Catch case where ring is full and sendpage is stalled. */ if (unlikely(m->sg_end == m->sg_start && @@ -774,7 +778,6 @@ static int bpf_tcp_sendpage(struct sock *sk, struct page *page, goto out_err; psock->sg_size += size; - sg = >sg_data[m->sg_end]; sg_set_page(sg, page, size, offset); get_page(page); m->sg_copy[m->sg_end] = true; -- 2.14.3
[PATCH net-next] selftests/net: fix in_netns.sh script
execute the subprocess in netns using 'ip netns exec' Fixes: cc30c93fa020 ("selftests/net: ignore background traffic in psock_fanout") Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp> --- tools/testing/selftests/net/in_netns.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/in_netns.sh b/tools/testing/selftests/net/in_netns.sh index f57a2eaad069..88795b510b32 100755 --- a/tools/testing/selftests/net/in_netns.sh +++ b/tools/testing/selftests/net/in_netns.sh @@ -19,5 +19,5 @@ cleanup() { trap cleanup EXIT setup -"$@" +ip netns exec "${NETNS}" "$@" exit "$?" -- 2.14.3
[PATCH net-next] selftests: rtnetlink: remove testns on test fail
This patch removes testns after test failure so that next test can continue with clean ns Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp> --- tools/testing/selftests/net/rtnetlink.sh | 6 ++ 1 file changed, 6 insertions(+) diff --git a/tools/testing/selftests/net/rtnetlink.sh b/tools/testing/selftests/net/rtnetlink.sh index a622eeecc3a6..e6f485235435 100755 --- a/tools/testing/selftests/net/rtnetlink.sh +++ b/tools/testing/selftests/net/rtnetlink.sh @@ -517,6 +517,7 @@ kci_test_gretap() ip link help gretap 2>&1 | grep -q "^Usage:" if [ $? -ne 0 ];then echo "SKIP: gretap: iproute2 too old" + ip netns del "$testns" return 1 fi @@ -543,6 +544,7 @@ kci_test_gretap() if [ $ret -ne 0 ]; then echo "FAIL: gretap" + ip netns del "$testns" return 1 fi echo "PASS: gretap" @@ -565,6 +567,7 @@ kci_test_ip6gretap() ip link help ip6gretap 2>&1 | grep -q "^Usage:" if [ $? -ne 0 ];then echo "SKIP: ip6gretap: iproute2 too old" + ip netns del "$testns" return 1 fi @@ -591,6 +594,7 @@ kci_test_ip6gretap() if [ $ret -ne 0 ]; then echo "FAIL: ip6gretap" + ip netns del "$testns" return 1 fi echo "PASS: ip6gretap" @@ -655,6 +659,7 @@ kci_test_erspan() if [ $ret -ne 0 ]; then echo "FAIL: erspan" + ip netns del "$testns" return 1 fi echo "PASS: erspan" @@ -720,6 +725,7 @@ kci_test_ip6erspan() if [ $ret -ne 0 ]; then echo "FAIL: ip6erspan" + ip netns del "$testns" return 1 fi echo "PASS: ip6erspan" -- 2.13.6
[PATCH bpf-next] samples/bpf: detach prog from cgroup
test_cgrp2_sock.sh and test_cgrp2_sock2.sh tests keep the program attached to cgroup even after completion. Using detach functionality of test_cgrp2_sock in both scripts. Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp> --- samples/bpf/test_cgrp2_sock.sh | 1 + samples/bpf/test_cgrp2_sock2.sh | 3 +++ 2 files changed, 4 insertions(+) diff --git a/samples/bpf/test_cgrp2_sock.sh b/samples/bpf/test_cgrp2_sock.sh index 8ee0371a100a..9f6174236856 100755 --- a/samples/bpf/test_cgrp2_sock.sh +++ b/samples/bpf/test_cgrp2_sock.sh @@ -61,6 +61,7 @@ cleanup_and_exit() [ -n "$msg" ] && echo "ERROR: $msg" + test_cgrp2_sock -d ${CGRP_MNT}/sockopts ip li del cgrp2_sock umount ${CGRP_MNT} diff --git a/samples/bpf/test_cgrp2_sock2.sh b/samples/bpf/test_cgrp2_sock2.sh index fc4e64d00cb3..0f396a86e0cb 100755 --- a/samples/bpf/test_cgrp2_sock2.sh +++ b/samples/bpf/test_cgrp2_sock2.sh @@ -28,6 +28,9 @@ function attach_bpf { } function cleanup { + if [ -d /tmp/cgroupv2/foo ]; then + test_cgrp2_sock -d /tmp/cgroupv2/foo + fi ip link del veth0b ip netns delete at_ns0 umount /tmp/cgroupv2 -- 2.13.6
[PATCH net-next v2] selftests/net: fixes psock_fanout eBPF test case
eBPF test fails due to verifier failure because log_buf is too small. Fixed by increasing log_buf size Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp> --- v2: log_buf is statically allocated tools/testing/selftests/net/psock_fanout.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/psock_fanout.c b/tools/testing/selftests/net/psock_fanout.c index 989f917068d1..d4346b16b2c1 100644 --- a/tools/testing/selftests/net/psock_fanout.c +++ b/tools/testing/selftests/net/psock_fanout.c @@ -128,6 +128,8 @@ static void sock_fanout_getopts(int fd, uint16_t *typeflags, uint16_t *group_id) static void sock_fanout_set_ebpf(int fd) { + static char log_buf[65536]; + const int len_off = __builtin_offsetof(struct __sk_buff, len); struct bpf_insn prog[] = { { BPF_ALU64 | BPF_MOV | BPF_X, 6, 1, 0, 0 }, @@ -140,7 +142,6 @@ static void sock_fanout_set_ebpf(int fd) { BPF_ALU | BPF_MOV | BPF_K, 0, 0, 0, 0 }, { BPF_JMP | BPF_EXIT, 0, 0, 0, 0 } }; - char log_buf[512]; union bpf_attr attr; int pfd; -- 2.13.6
[PATCH net] selftests/net: fixes psock_fanout eBPF test case
eBPF test fails due to verifier failure because log_buf is too small Fixed by increasing log_buf size Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp> --- tools/testing/selftests/net/psock_fanout.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/psock_fanout.c b/tools/testing/selftests/net/psock_fanout.c index 989f917068d1..898278a41322 100644 --- a/tools/testing/selftests/net/psock_fanout.c +++ b/tools/testing/selftests/net/psock_fanout.c @@ -140,7 +140,7 @@ static void sock_fanout_set_ebpf(int fd) { BPF_ALU | BPF_MOV | BPF_K, 0, 0, 0, 0 }, { BPF_JMP | BPF_EXIT, 0, 0, 0, 0 } }; - char log_buf[512]; + char log_buf[65536]; union bpf_attr attr; int pfd; -- 2.13.6
[PATCH bpf] bpf: samples/sockmap detach sock ops program
samples/sockops program keeps the sock_ops program attached to cgroup. Fixed this by detaching program before exit. Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp> --- samples/sockmap/sockmap_user.c | 1 + 1 file changed, 1 insertion(+) diff --git a/samples/sockmap/sockmap_user.c b/samples/sockmap/sockmap_user.c index 7c25c0c112bc..95a54a89a532 100644 --- a/samples/sockmap/sockmap_user.c +++ b/samples/sockmap/sockmap_user.c @@ -566,6 +566,7 @@ int main(int argc, char **argv) else fprintf(stderr, "unknown test\n"); out: + bpf_prog_detach2(prog_fd[2], cg_fd, BPF_CGROUP_SOCK_OPS); close(s1); close(s2); close(p1); -- 2.13.6
[PATCH bpf] bpf: samples/sockmap fix Makefile for build error
While building samples/sockmap, undefined reference error is thrown for `nla_dump_errormsg'. Linking tools/lib/bpf/nlattr.o as a fix Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp> --- samples/sockmap/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/samples/sockmap/Makefile b/samples/sockmap/Makefile index 73f1da4d116c..9bf2881bd11b 100644 --- a/samples/sockmap/Makefile +++ b/samples/sockmap/Makefile @@ -2,7 +2,7 @@ hostprogs-y := sockmap # Libbpf dependencies -LIBBPF := ../../tools/lib/bpf/bpf.o +LIBBPF := ../../tools/lib/bpf/bpf.o ../../tools/lib/bpf/nlattr.o HOSTCFLAGS += -I$(objtree)/usr/include HOSTCFLAGS += -I$(srctree)/tools/lib/ -- 2.13.6