Re: [PATCH bpf] selftests/bpf: add missing pointer dereference for map stacktrace fixup

2018-12-06 Thread Prashant Bhole




On 12/7/2018 1:14 PM, Stanislav Fomichev wrote:

I get a segfault without it, other fixups always do dereference, and
without dereference I don't understand how it can ever work.

Fixes: 7c85c448e7d74 ("selftests/bpf: test_verifier, check
bpf_map_lookup_elem access in bpf prog")

Signed-off-by: Stanislav Fomichev 
---
  tools/testing/selftests/bpf/test_verifier.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/test_verifier.c 
b/tools/testing/selftests/bpf/test_verifier.c
index df6f751cc1e8..d23929a1985d 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -14166,7 +14166,7 @@ static void do_test_fixup(struct bpf_test *test, enum 
bpf_map_type prog_type,
do {
prog[*fixup_map_stacktrace].imm = map_fds[12];
fixup_map_stacktrace++;
-   } while (fixup_map_stacktrace);
+   } while (*fixup_map_stacktrace);
}
  }


It was my mistake. Thanks for the fix!

-Prashant



[PATCH v2 net] tun: remove skb access after netif_receive_skb

2018-12-03 Thread Prashant Bhole
In tun.c skb->len was accessed while doing stats accounting after a
call to netif_receive_skb. We can not access skb after this call
because buffers may be dropped.

The fix for this bug would be to store skb->len in local variable and
then use it after netif_receive_skb(). IMO using xdp data size for
accounting bytes will be better because input for tun_xdp_one() is
xdp_buff.

Hence this patch:
- fixes a bug by removing skb access after netif_receive_skb()
- uses xdp data size for accounting bytes

[613.019057] BUG: KASAN: use-after-free in tun_sendmsg+0x77c/0xc50 [tun]
[613.021062] Read of size 4 at addr 8881da9ab7c0 by task vhost-1115/1155
[613.023073]
[613.024003] CPU: 0 PID: 1155 Comm: vhost-1115 Not tainted 4.20.0-rc3-vm+ #232
[613.026029] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.10.2-1ubuntu1 04/01/2014
[613.029116] Call Trace:
[613.031145]  dump_stack+0x5b/0x90
[613.032219]  print_address_description+0x6c/0x23c
[613.034156]  ? tun_sendmsg+0x77c/0xc50 [tun]
[613.036141]  kasan_report.cold.5+0x241/0x308
[613.038125]  tun_sendmsg+0x77c/0xc50 [tun]
[613.040109]  ? tun_get_user+0x1960/0x1960 [tun]
[613.042094]  ? __isolate_free_page+0x270/0x270
[613.045173]  vhost_tx_batch.isra.14+0xeb/0x1f0 [vhost_net]
[613.047127]  ? peek_head_len.part.13+0x90/0x90 [vhost_net]
[613.049096]  ? get_tx_bufs+0x5a/0x2c0 [vhost_net]
[613.051106]  ? vhost_enable_notify+0x2d8/0x420 [vhost]
[613.053139]  handle_tx_copy+0x2d0/0x8f0 [vhost_net]
[613.053139]  ? vhost_net_buf_peek+0x340/0x340 [vhost_net]
[613.053139]  ? __mutex_lock+0x8d9/0xb30
[613.053139]  ? finish_task_switch+0x8f/0x3f0
[613.053139]  ? handle_tx+0x32/0x120 [vhost_net]
[613.053139]  ? mutex_trylock+0x110/0x110
[613.053139]  ? finish_task_switch+0xcf/0x3f0
[613.053139]  ? finish_task_switch+0x240/0x3f0
[613.053139]  ? __switch_to_asm+0x34/0x70
[613.053139]  ? __switch_to_asm+0x40/0x70
[613.053139]  ? __schedule+0x506/0xf10
[613.053139]  handle_tx+0xc7/0x120 [vhost_net]
[613.053139]  vhost_worker+0x166/0x200 [vhost]
[613.053139]  ? vhost_dev_init+0x580/0x580 [vhost]
[613.053139]  ? __kthread_parkme+0x77/0x90
[613.053139]  ? vhost_dev_init+0x580/0x580 [vhost]
[613.053139]  kthread+0x1b1/0x1d0
[613.053139]  ? kthread_park+0xb0/0xb0
[613.053139]  ret_from_fork+0x35/0x40
[613.088705]
[613.088705] Allocated by task 1155:
[613.088705]  kasan_kmalloc+0xbf/0xe0
[613.088705]  kmem_cache_alloc+0xdc/0x220
[613.088705]  __build_skb+0x2a/0x160
[613.088705]  build_skb+0x14/0xc0
[613.088705]  tun_sendmsg+0x4f0/0xc50 [tun]
[613.088705]  vhost_tx_batch.isra.14+0xeb/0x1f0 [vhost_net]
[613.088705]  handle_tx_copy+0x2d0/0x8f0 [vhost_net]
[613.088705]  handle_tx+0xc7/0x120 [vhost_net]
[613.088705]  vhost_worker+0x166/0x200 [vhost]
[613.088705]  kthread+0x1b1/0x1d0
[613.088705]  ret_from_fork+0x35/0x40
[613.088705]
[613.088705] Freed by task 1155:
[613.088705]  __kasan_slab_free+0x12e/0x180
[613.088705]  kmem_cache_free+0xa0/0x230
[613.088705]  ip6_mc_input+0x40f/0x5a0
[613.088705]  ipv6_rcv+0xc9/0x1e0
[613.088705]  __netif_receive_skb_one_core+0xc1/0x100
[613.088705]  netif_receive_skb_internal+0xc4/0x270
[613.088705]  br_pass_frame_up+0x2b9/0x2e0
[613.088705]  br_handle_frame_finish+0x2fb/0x7a0
[613.088705]  br_handle_frame+0x30f/0x6c0
[613.088705]  __netif_receive_skb_core+0x61a/0x15b0
[613.088705]  __netif_receive_skb_one_core+0x8e/0x100
[613.088705]  netif_receive_skb_internal+0xc4/0x270
[613.088705]  tun_sendmsg+0x738/0xc50 [tun]
[613.088705]  vhost_tx_batch.isra.14+0xeb/0x1f0 [vhost_net]
[613.088705]  handle_tx_copy+0x2d0/0x8f0 [vhost_net]
[613.088705]  handle_tx+0xc7/0x120 [vhost_net]
[613.088705]  vhost_worker+0x166/0x200 [vhost]
[613.088705]  kthread+0x1b1/0x1d0
[613.088705]  ret_from_fork+0x35/0x40
[613.088705]
[613.088705] The buggy address belongs to the object at 8881da9ab740
[613.088705]  which belongs to the cache skbuff_head_cache of size 232

Fixes: 043d222f93ab ("tuntap: accept an array of XDP buffs through sendmsg()")
Reviewed-by: Toshiaki Makita 
Signed-off-by: Prashant Bhole 
Acked-by: Jason Wang 
---
v1 -> v2:
No change. Reposted due to patchwork status.

 drivers/net/tun.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index e244f5d7512a..6e388792c0a8 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2385,6 +2385,7 @@ static int tun_xdp_one(struct tun_struct *tun,
   struct tun_file *tfile,
   struct xdp_buff *xdp, int *flush)
 {
+   unsigned int datasize = xdp->data_end - xdp->data;
struct tun_xdp_hdr *hdr = xdp->data_hard_start;
struct virtio_net_hdr *gso = >gso;
struct tun_pcpu_stats *stats;
@@ -2461,7 +2462,7 @@ static int tun_xdp_one(struct tun_struct *tun,
stats = get_cpu_ptr(tun->pcpu_stats);
u64_stats_update_begin(>syncp);
stats->rx_packets++;
-   stats->rx_bytes += skb->len;
+   stats->rx_bytes

[PATCH net] tun: remove skb access after netif_receive_skb

2018-11-29 Thread Prashant Bhole
In tun.c skb->len was accessed while doing stats accounting after a
call to netif_receive_skb. We can not access skb after this call
because buffers may be dropped.

The fix for this bug would be to store skb->len in local variable and
then use it after netif_receive_skb(). IMO using xdp data size for
accounting bytes will be better because input for tun_xdp_one() is
xdp_buff.

Hence this patch:
- fixes a bug by removing skb access after netif_receive_skb()
- uses xdp data size for accounting bytes

[613.019057] BUG: KASAN: use-after-free in tun_sendmsg+0x77c/0xc50 [tun]
[613.021062] Read of size 4 at addr 8881da9ab7c0 by task vhost-1115/1155
[613.023073]
[613.024003] CPU: 0 PID: 1155 Comm: vhost-1115 Not tainted 4.20.0-rc3-vm+ #232
[613.026029] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.10.2-1ubuntu1 04/01/2014
[613.029116] Call Trace:
[613.031145]  dump_stack+0x5b/0x90
[613.032219]  print_address_description+0x6c/0x23c
[613.034156]  ? tun_sendmsg+0x77c/0xc50 [tun]
[613.036141]  kasan_report.cold.5+0x241/0x308
[613.038125]  tun_sendmsg+0x77c/0xc50 [tun]
[613.040109]  ? tun_get_user+0x1960/0x1960 [tun]
[613.042094]  ? __isolate_free_page+0x270/0x270
[613.045173]  vhost_tx_batch.isra.14+0xeb/0x1f0 [vhost_net]
[613.047127]  ? peek_head_len.part.13+0x90/0x90 [vhost_net]
[613.049096]  ? get_tx_bufs+0x5a/0x2c0 [vhost_net]
[613.051106]  ? vhost_enable_notify+0x2d8/0x420 [vhost]
[613.053139]  handle_tx_copy+0x2d0/0x8f0 [vhost_net]
[613.053139]  ? vhost_net_buf_peek+0x340/0x340 [vhost_net]
[613.053139]  ? __mutex_lock+0x8d9/0xb30
[613.053139]  ? finish_task_switch+0x8f/0x3f0
[613.053139]  ? handle_tx+0x32/0x120 [vhost_net]
[613.053139]  ? mutex_trylock+0x110/0x110
[613.053139]  ? finish_task_switch+0xcf/0x3f0
[613.053139]  ? finish_task_switch+0x240/0x3f0
[613.053139]  ? __switch_to_asm+0x34/0x70
[613.053139]  ? __switch_to_asm+0x40/0x70
[613.053139]  ? __schedule+0x506/0xf10
[613.053139]  handle_tx+0xc7/0x120 [vhost_net]
[613.053139]  vhost_worker+0x166/0x200 [vhost]
[613.053139]  ? vhost_dev_init+0x580/0x580 [vhost]
[613.053139]  ? __kthread_parkme+0x77/0x90
[613.053139]  ? vhost_dev_init+0x580/0x580 [vhost]
[613.053139]  kthread+0x1b1/0x1d0
[613.053139]  ? kthread_park+0xb0/0xb0
[613.053139]  ret_from_fork+0x35/0x40
[613.088705]
[613.088705] Allocated by task 1155:
[613.088705]  kasan_kmalloc+0xbf/0xe0
[613.088705]  kmem_cache_alloc+0xdc/0x220
[613.088705]  __build_skb+0x2a/0x160
[613.088705]  build_skb+0x14/0xc0
[613.088705]  tun_sendmsg+0x4f0/0xc50 [tun]
[613.088705]  vhost_tx_batch.isra.14+0xeb/0x1f0 [vhost_net]
[613.088705]  handle_tx_copy+0x2d0/0x8f0 [vhost_net]
[613.088705]  handle_tx+0xc7/0x120 [vhost_net]
[613.088705]  vhost_worker+0x166/0x200 [vhost]
[613.088705]  kthread+0x1b1/0x1d0
[613.088705]  ret_from_fork+0x35/0x40
[613.088705]
[613.088705] Freed by task 1155:
[613.088705]  __kasan_slab_free+0x12e/0x180
[613.088705]  kmem_cache_free+0xa0/0x230
[613.088705]  ip6_mc_input+0x40f/0x5a0
[613.088705]  ipv6_rcv+0xc9/0x1e0
[613.088705]  __netif_receive_skb_one_core+0xc1/0x100
[613.088705]  netif_receive_skb_internal+0xc4/0x270
[613.088705]  br_pass_frame_up+0x2b9/0x2e0
[613.088705]  br_handle_frame_finish+0x2fb/0x7a0
[613.088705]  br_handle_frame+0x30f/0x6c0
[613.088705]  __netif_receive_skb_core+0x61a/0x15b0
[613.088705]  __netif_receive_skb_one_core+0x8e/0x100
[613.088705]  netif_receive_skb_internal+0xc4/0x270
[613.088705]  tun_sendmsg+0x738/0xc50 [tun]
[613.088705]  vhost_tx_batch.isra.14+0xeb/0x1f0 [vhost_net]
[613.088705]  handle_tx_copy+0x2d0/0x8f0 [vhost_net]
[613.088705]  handle_tx+0xc7/0x120 [vhost_net]
[613.088705]  vhost_worker+0x166/0x200 [vhost]
[613.088705]  kthread+0x1b1/0x1d0
[613.088705]  ret_from_fork+0x35/0x40
[613.088705]
[613.088705] The buggy address belongs to the object at 8881da9ab740
[613.088705]  which belongs to the cache skbuff_head_cache of size 232

Fixes: 043d222f93ab ("tuntap: accept an array of XDP buffs through sendmsg()")
Reviewed-by: Toshiaki Makita 
Signed-off-by: Prashant Bhole 
---
 drivers/net/tun.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index e244f5d7512a..6e388792c0a8 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2385,6 +2385,7 @@ static int tun_xdp_one(struct tun_struct *tun,
   struct tun_file *tfile,
   struct xdp_buff *xdp, int *flush)
 {
+   unsigned int datasize = xdp->data_end - xdp->data;
struct tun_xdp_hdr *hdr = xdp->data_hard_start;
struct virtio_net_hdr *gso = >gso;
struct tun_pcpu_stats *stats;
@@ -2461,7 +2462,7 @@ static int tun_xdp_one(struct tun_struct *tun,
stats = get_cpu_ptr(tun->pcpu_stats);
u64_stats_update_begin(>syncp);
stats->rx_packets++;
-   stats->rx_bytes += skb->len;
+   stats->rx_bytes += datasize;
u64_stats_update_end(>syncp);
put_cpu_ptr(stats);
 
-- 
2.17.2




Re: [PATCH bpf 2/2] bpf: test_verifier, test for lookup on queue/stack maps

2018-11-28 Thread Prashant Bhole




On 11/28/2018 10:06 PM, Mauricio Vasquez wrote:


On 11/28/18 3:45 AM, Daniel Borkmann wrote:

On 11/28/2018 08:51 AM, Prashant Bhole wrote:

This patch adds tests to check whether bpf verifier prevents lookup
on queue/stack maps

Signed-off-by: Prashant Bhole 
---
  tools/testing/selftests/bpf/test_verifier.c | 52 +
  1 file changed, 52 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_verifier.c 
b/tools/testing/selftests/bpf/test_verifier.c

index 550b7e46bf4a..becd9f4f3980 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -74,6 +74,8 @@ struct bpf_test {
  int fixup_map_in_map[MAX_FIXUPS];
  int fixup_cgroup_storage[MAX_FIXUPS];
  int fixup_percpu_cgroup_storage[MAX_FIXUPS];
+    int fixup_map_queue[MAX_FIXUPS];
+    int fixup_map_stack[MAX_FIXUPS];
  const char *errstr;
  const char *errstr_unpriv;
  uint32_t retval, retval_unpriv;
@@ -4611,6 +4613,38 @@ static struct bpf_test tests[] = {
  .errstr = "cannot pass map_type 7 into func 
bpf_map_lookup_elem",

  .prog_type = BPF_PROG_TYPE_PERF_EVENT,
  },
+    {
+    "prevent map lookup in queue map",
+    .insns = {
+    BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+    BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+    BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+    BPF_LD_MAP_FD(BPF_REG_1, 0),
+    BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+ BPF_FUNC_map_lookup_elem),
+    BPF_EXIT_INSN(),
+    },
+    .fixup_map_queue = { 3 },
+    .result = REJECT,
+    .errstr = "invalid stack type R2 off=-8 access_size=0",
+    .prog_type = BPF_PROG_TYPE_XDP,
Hmm, the approach in patch 1 is very fragile, and we're lucky in this 
case
that the verifier bailed out with 'invalid stack type R2 off=-8 
access_size=0'

because of key size being zero. If this would have not been the case then
the added ERR_PTR(-EOPNOTSUPP) would basically be seen as a valid 
pointer and
the program could read/write into it. Instead, this needs to be 
prevented much

earlier like check_map_func_compatibility(),


Actually it is prevented in check_map_func_compatibility(), but stack 
boundary check is done before in the verifier.



and I would like to have a split
on these approaches to make verifier more robust. While you want 
ERR_PTR(-EOPNOTSUPP)

for user space syscall side,


In the case of QUEUE and STACK maps this is not relevant because the 
lookup syscall is mapped into peek operation.


In fact queue_stack_map_lookup_elem() & queue_stack_map_update_elem() 
should be never called, I think we can remove them safely.


Got it. Shall we keep these verifier tests (patch 2)?

Thanks,
Prashant





[PATCH bpf 1/2] bpf: queue/stack map, fix return value of map_lookup_elem

2018-11-27 Thread Prashant Bhole
Since commit 509db2833e0d ("bpf: error handling when map_lookup_elem
isn't supported") when map lookup isn't supported, the map_lookup_elem
function should return ERR_PTR(-EOPNOTSUPP).

Fixes: f1a2e44a3aec ("bpf: add queue and stack maps")
Signed-off-by: Prashant Bhole 
---
 kernel/bpf/queue_stack_maps.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/queue_stack_maps.c b/kernel/bpf/queue_stack_maps.c
index b384ea9f3254..8fd19b06da66 100644
--- a/kernel/bpf/queue_stack_maps.c
+++ b/kernel/bpf/queue_stack_maps.c
@@ -240,7 +240,7 @@ static int queue_stack_map_push_elem(struct bpf_map *map, 
void *value,
 /* Called from syscall or from eBPF program */
 static void *queue_stack_map_lookup_elem(struct bpf_map *map, void *key)
 {
-   return NULL;
+   return ERR_PTR(-EOPNOTSUPP);
 }
 
 /* Called from syscall or from eBPF program */
-- 
2.17.2




[PATCH bpf 0/2] fix map_lookup_elem return value for queue/stack map

2018-11-27 Thread Prashant Bhole
This set fixes map_lookup_elem return value for queue/stack map.
Also adds verifier tests to check whether verifier prevents lookup on
these maps.

Note that patch 2 isn't dependant on patch 1. The verifier prevents
lookup on queue/stack map because key size is zero.

Prashant Bhole (2):
  bpf: queue/stack map, fix return value of map_lookup_elem
  bpf: test_verifier, test for lookup on queue/stack maps

 kernel/bpf/queue_stack_maps.c   |  2 +-
 tools/testing/selftests/bpf/test_verifier.c | 52 +
 2 files changed, 53 insertions(+), 1 deletion(-)

-- 
2.17.2




[PATCH bpf 2/2] bpf: test_verifier, test for lookup on queue/stack maps

2018-11-27 Thread Prashant Bhole
This patch adds tests to check whether bpf verifier prevents lookup
on queue/stack maps

Signed-off-by: Prashant Bhole 
---
 tools/testing/selftests/bpf/test_verifier.c | 52 +
 1 file changed, 52 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_verifier.c 
b/tools/testing/selftests/bpf/test_verifier.c
index 550b7e46bf4a..becd9f4f3980 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -74,6 +74,8 @@ struct bpf_test {
int fixup_map_in_map[MAX_FIXUPS];
int fixup_cgroup_storage[MAX_FIXUPS];
int fixup_percpu_cgroup_storage[MAX_FIXUPS];
+   int fixup_map_queue[MAX_FIXUPS];
+   int fixup_map_stack[MAX_FIXUPS];
const char *errstr;
const char *errstr_unpriv;
uint32_t retval, retval_unpriv;
@@ -4611,6 +4613,38 @@ static struct bpf_test tests[] = {
.errstr = "cannot pass map_type 7 into func 
bpf_map_lookup_elem",
.prog_type = BPF_PROG_TYPE_PERF_EVENT,
},
+   {
+   "prevent map lookup in queue map",
+   .insns = {
+   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+   BPF_LD_MAP_FD(BPF_REG_1, 0),
+   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+BPF_FUNC_map_lookup_elem),
+   BPF_EXIT_INSN(),
+   },
+   .fixup_map_queue = { 3 },
+   .result = REJECT,
+   .errstr = "invalid stack type R2 off=-8 access_size=0",
+   .prog_type = BPF_PROG_TYPE_XDP,
+   },
+   {
+   "prevent map lookup in stack map",
+   .insns = {
+   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+   BPF_LD_MAP_FD(BPF_REG_1, 0),
+   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+BPF_FUNC_map_lookup_elem),
+   BPF_EXIT_INSN(),
+   },
+   .fixup_map_stack = { 3 },
+   .result = REJECT,
+   .errstr = "invalid stack type R2 off=-8 access_size=0",
+   .prog_type = BPF_PROG_TYPE_XDP,
+   },
{
"prevent map lookup in prog array",
.insns = {
@@ -14048,6 +14082,8 @@ static void do_test_fixup(struct bpf_test *test, enum 
bpf_map_type prog_type,
int *fixup_map_sockhash = test->fixup_map_sockhash;
int *fixup_map_xskmap = test->fixup_map_xskmap;
int *fixup_map_stacktrace = test->fixup_map_stacktrace;
+   int *fixup_map_queue = test->fixup_map_queue;
+   int *fixup_map_stack = test->fixup_map_stack;
int *fixup_prog1 = test->fixup_prog1;
int *fixup_prog2 = test->fixup_prog2;
int *fixup_map_in_map = test->fixup_map_in_map;
@@ -14168,6 +14204,22 @@ static void do_test_fixup(struct bpf_test *test, enum 
bpf_map_type prog_type,
fixup_map_stacktrace++;
} while (fixup_map_stacktrace);
}
+   if (*fixup_map_queue) {
+   map_fds[13] = create_map(BPF_MAP_TYPE_QUEUE, 0,
+sizeof(u32), 1);
+   do {
+   prog[*fixup_map_queue].imm = map_fds[13];
+   fixup_map_queue++;
+   } while (*fixup_map_queue);
+   }
+   if (*fixup_map_stack) {
+   map_fds[14] = create_map(BPF_MAP_TYPE_STACK, 0,
+sizeof(u32), 1);
+   do {
+   prog[*fixup_map_stack].imm = map_fds[14];
+   fixup_map_stack++;
+   } while (*fixup_map_stack);
+   }
 }
 
 static int set_admin(bool admin)
-- 
2.17.2




Re: [PATCH bpf-next 6/6] selftests/bpf: test_verifier, check bpf_map_lookup_elem access in bpf prog

2018-10-25 Thread Prashant Bhole




On 10/25/2018 5:54 PM, Naresh Kamboju wrote:

On Tue, 9 Oct 2018 at 12:32, Song Liu  wrote:


On Mon, Oct 8, 2018 at 6:07 PM Prashant Bhole
 wrote:


map_lookup_elem isn't supported by certain map types like:
- BPF_MAP_TYPE_PROG_ARRAY
- BPF_MAP_TYPE_STACK_TRACE
- BPF_MAP_TYPE_XSKMAP
- BPF_MAP_TYPE_SOCKMAP/BPF_MAP_TYPE_SOCKHASH
Let's add verfier tests to check whether verifier prevents
bpf_map_lookup_elem call on above programs from bpf program.

Signed-off-by: Prashant Bhole 
Acked-by: Alexei Starovoitov 

Acked-by: Song Liu 


---
  tools/testing/selftests/bpf/test_verifier.c | 121 +++-
  1 file changed, 120 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/test_verifier.c 
b/tools/testing/selftests/bpf/test_verifier.c
index 65ae44c85d27..cf4cd32b6772 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -48,7 +48,7 @@

  #define MAX_INSNS  BPF_MAXINSNS
  #define MAX_FIXUPS 8
-#define MAX_NR_MAPS8
+#define MAX_NR_MAPS13
  #define POINTER_VALUE  0xcafe4all
  #define TEST_DATA_LEN  64

@@ -65,6 +65,10 @@ struct bpf_test {
 int fixup_map_hash_48b[MAX_FIXUPS];
 int fixup_map_hash_16b[MAX_FIXUPS];
 int fixup_map_array_48b[MAX_FIXUPS];
+   int fixup_map_sockmap[MAX_FIXUPS];
+   int fixup_map_sockhash[MAX_FIXUPS];
+   int fixup_map_xskmap[MAX_FIXUPS];
+   int fixup_map_stacktrace[MAX_FIXUPS];
 int fixup_prog1[MAX_FIXUPS];
 int fixup_prog2[MAX_FIXUPS];
 int fixup_map_in_map[MAX_FIXUPS];
@@ -4541,6 +4545,85 @@ static struct bpf_test tests[] = {
 .errstr = "invalid access to packet",
 .prog_type = BPF_PROG_TYPE_SCHED_CLS,
 },
+   {
+   "prevent map lookup in sockmap",
+   .insns = {
+   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+   BPF_LD_MAP_FD(BPF_REG_1, 0),
+   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+BPF_FUNC_map_lookup_elem),
+   BPF_EXIT_INSN(),
+   },
+   .fixup_map_sockmap = { 3 },
+   .result = REJECT,
+   .errstr = "cannot pass map_type 15 into func 
bpf_map_lookup_elem",
+   .prog_type = BPF_PROG_TYPE_SOCK_OPS,
+   },
+   {
+   "prevent map lookup in sockhash",
+   .insns = {
+   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+   BPF_LD_MAP_FD(BPF_REG_1, 0),
+   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+BPF_FUNC_map_lookup_elem),
+   BPF_EXIT_INSN(),
+   },
+   .fixup_map_sockhash = { 3 },
+   .result = REJECT,
+   .errstr = "cannot pass map_type 18 into func 
bpf_map_lookup_elem",
+   .prog_type = BPF_PROG_TYPE_SOCK_OPS,
+   },
+   {
+   "prevent map lookup in xskmap",
+   .insns = {
+   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+   BPF_LD_MAP_FD(BPF_REG_1, 0),
+   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+BPF_FUNC_map_lookup_elem),
+   BPF_EXIT_INSN(),
+   },
+   .fixup_map_xskmap = { 3 },
+   .result = REJECT,
+   .errstr = "cannot pass map_type 17 into func 
bpf_map_lookup_elem",
+   .prog_type = BPF_PROG_TYPE_XDP,
+   },
+   {
+   "prevent map lookup in stack trace",
+   .insns = {
+   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+   BPF_LD_MAP_FD(BPF_REG_1, 0),
+   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+BPF_FUNC_map_lookup_elem),
+   BPF_EXIT_INSN(),
+   },
+   .fixup_map_stacktrace = { 3 },
+   .result = REJECT,
+   .errstr = "cannot pass map_type 7 into func 
bpf_map_lookup_elem",
+   .prog_type = BPF_PROG_TYPE_PERF_EVENT,
+   },
+   {
+   "prevent map lookup in prog array",
+   .insns = {
+   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+  

[PATCH bpf-next 6/6] selftests/bpf: test_verifier, check bpf_map_lookup_elem access in bpf prog

2018-10-08 Thread Prashant Bhole
map_lookup_elem isn't supported by certain map types like:
- BPF_MAP_TYPE_PROG_ARRAY
- BPF_MAP_TYPE_STACK_TRACE
- BPF_MAP_TYPE_XSKMAP
- BPF_MAP_TYPE_SOCKMAP/BPF_MAP_TYPE_SOCKHASH
Let's add verfier tests to check whether verifier prevents
bpf_map_lookup_elem call on above programs from bpf program.

Signed-off-by: Prashant Bhole 
Acked-by: Alexei Starovoitov 
---
 tools/testing/selftests/bpf/test_verifier.c | 121 +++-
 1 file changed, 120 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/test_verifier.c 
b/tools/testing/selftests/bpf/test_verifier.c
index 65ae44c85d27..cf4cd32b6772 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -48,7 +48,7 @@
 
 #define MAX_INSNS  BPF_MAXINSNS
 #define MAX_FIXUPS 8
-#define MAX_NR_MAPS8
+#define MAX_NR_MAPS13
 #define POINTER_VALUE  0xcafe4all
 #define TEST_DATA_LEN  64
 
@@ -65,6 +65,10 @@ struct bpf_test {
int fixup_map_hash_48b[MAX_FIXUPS];
int fixup_map_hash_16b[MAX_FIXUPS];
int fixup_map_array_48b[MAX_FIXUPS];
+   int fixup_map_sockmap[MAX_FIXUPS];
+   int fixup_map_sockhash[MAX_FIXUPS];
+   int fixup_map_xskmap[MAX_FIXUPS];
+   int fixup_map_stacktrace[MAX_FIXUPS];
int fixup_prog1[MAX_FIXUPS];
int fixup_prog2[MAX_FIXUPS];
int fixup_map_in_map[MAX_FIXUPS];
@@ -4541,6 +4545,85 @@ static struct bpf_test tests[] = {
.errstr = "invalid access to packet",
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
},
+   {
+   "prevent map lookup in sockmap",
+   .insns = {
+   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+   BPF_LD_MAP_FD(BPF_REG_1, 0),
+   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+BPF_FUNC_map_lookup_elem),
+   BPF_EXIT_INSN(),
+   },
+   .fixup_map_sockmap = { 3 },
+   .result = REJECT,
+   .errstr = "cannot pass map_type 15 into func 
bpf_map_lookup_elem",
+   .prog_type = BPF_PROG_TYPE_SOCK_OPS,
+   },
+   {
+   "prevent map lookup in sockhash",
+   .insns = {
+   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+   BPF_LD_MAP_FD(BPF_REG_1, 0),
+   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+BPF_FUNC_map_lookup_elem),
+   BPF_EXIT_INSN(),
+   },
+   .fixup_map_sockhash = { 3 },
+   .result = REJECT,
+   .errstr = "cannot pass map_type 18 into func 
bpf_map_lookup_elem",
+   .prog_type = BPF_PROG_TYPE_SOCK_OPS,
+   },
+   {
+   "prevent map lookup in xskmap",
+   .insns = {
+   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+   BPF_LD_MAP_FD(BPF_REG_1, 0),
+   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+BPF_FUNC_map_lookup_elem),
+   BPF_EXIT_INSN(),
+   },
+   .fixup_map_xskmap = { 3 },
+   .result = REJECT,
+   .errstr = "cannot pass map_type 17 into func 
bpf_map_lookup_elem",
+   .prog_type = BPF_PROG_TYPE_XDP,
+   },
+   {
+   "prevent map lookup in stack trace",
+   .insns = {
+   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+   BPF_LD_MAP_FD(BPF_REG_1, 0),
+   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+BPF_FUNC_map_lookup_elem),
+   BPF_EXIT_INSN(),
+   },
+   .fixup_map_stacktrace = { 3 },
+   .result = REJECT,
+   .errstr = "cannot pass map_type 7 into func 
bpf_map_lookup_elem",
+   .prog_type = BPF_PROG_TYPE_PERF_EVENT,
+   },
+   {
+   "prevent map lookup in prog array",
+   .insns = {
+   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+

[PATCH bpf-next 4/6] tools/bpf: bpftool, print strerror when map lookup error occurs

2018-10-08 Thread Prashant Bhole
Since map lookup error can be ENOENT or EOPNOTSUPP, let's print
strerror() as error message in normal and JSON output.

This patch adds helper function print_entry_error() to print
entry from lookup error occurs

Example: Following example dumps a map which does not support lookup.

Output before:
root# bpftool map -jp dump id 40
[
"key": ["0x0a","0x00","0x00","0x00"
],
"value": {
"error": "can\'t lookup element"
},
"key": ["0x0b","0x00","0x00","0x00"
],
"value": {
"error": "can\'t lookup element"
}
]

root# bpftool map dump id 40
can't lookup element with key:
0a 00 00 00
can't lookup element with key:
0b 00 00 00
Found 0 elements

Output after changes:
root# bpftool map dump -jp  id 45
[
"key": ["0x0a","0x00","0x00","0x00"
],
"value": {
"error": "Operation not supported"
},
"key": ["0x0b","0x00","0x00","0x00"
],
"value": {
"error": "Operation not supported"
}
]

root# bpftool map dump id 45
key:
0a 00 00 00
value:
Operation not supported
key:
0b 00 00 00
value:
Operation not supported
Found 0 elements

Signed-off-by: Prashant Bhole 
Acked-by: Jakub Kicinski 
Acked-by: Alexei Starovoitov 
---
 tools/bpf/bpftool/map.c | 29 -
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index 28d365435fea..9f5de48f8a99 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -336,6 +336,25 @@ static void print_entry_json(struct bpf_map_info *info, 
unsigned char *key,
jsonw_end_object(json_wtr);
 }
 
+static void print_entry_error(struct bpf_map_info *info, unsigned char *key,
+ const char *value)
+{
+   int value_size = strlen(value);
+   bool single_line, break_names;
+
+   break_names = info->key_size > 16 || value_size > 16;
+   single_line = info->key_size + value_size <= 24 && !break_names;
+
+   printf("key:%c", break_names ? '\n' : ' ');
+   fprint_hex(stdout, key, info->key_size, " ");
+
+   printf(single_line ? "  " : "\n");
+
+   printf("value:%c%s", break_names ? '\n' : ' ', value);
+
+   printf("\n");
+}
+
 static void print_entry_plain(struct bpf_map_info *info, unsigned char *key,
  unsigned char *value)
 {
@@ -663,6 +682,7 @@ static int dump_map_elem(int fd, void *key, void *value,
 json_writer_t *btf_wtr)
 {
int num_elems = 0;
+   int lookup_errno;
 
if (!bpf_map_lookup_elem(fd, key, value)) {
if (json_output) {
@@ -685,6 +705,8 @@ static int dump_map_elem(int fd, void *key, void *value,
}
 
/* lookup error handling */
+   lookup_errno = errno;
+
if (map_is_map_of_maps(map_info->type) ||
map_is_map_of_progs(map_info->type))
return 0;
@@ -694,13 +716,10 @@ static int dump_map_elem(int fd, void *key, void *value,
print_hex_data_json(key, map_info->key_size);
jsonw_name(json_wtr, "value");
jsonw_start_object(json_wtr);
-   jsonw_string_field(json_wtr, "error",
-  "can't lookup element");
+   jsonw_string_field(json_wtr, "error", strerror(lookup_errno));
jsonw_end_object(json_wtr);
} else {
-   p_info("can't lookup element with key: ");
-   fprint_hex(stderr, key, map_info->key_size, " ");
-   fprintf(stderr, "\n");
+   print_entry_error(map_info, key, strerror(lookup_errno));
}
 
return 0;
-- 
2.17.1




[PATCH bpf-next 3/6] tools/bpf: bpftool, split the function do_dump()

2018-10-08 Thread Prashant Bhole
do_dump() function in bpftool/map.c has deep indentations. In order
to reduce deep indent, let's move element printing code out of
do_dump() into dump_map_elem() function.

Signed-off-by: Prashant Bhole 
Acked-by: Jakub Kicinski 
Acked-by: Alexei Starovoitov 
---
 tools/bpf/bpftool/map.c | 83 -
 1 file changed, 49 insertions(+), 34 deletions(-)

diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index 6003e9598973..28d365435fea 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -658,6 +658,54 @@ static int do_show(int argc, char **argv)
return errno == ENOENT ? 0 : -1;
 }
 
+static int dump_map_elem(int fd, void *key, void *value,
+struct bpf_map_info *map_info, struct btf *btf,
+json_writer_t *btf_wtr)
+{
+   int num_elems = 0;
+
+   if (!bpf_map_lookup_elem(fd, key, value)) {
+   if (json_output) {
+   print_entry_json(map_info, key, value, btf);
+   } else {
+   if (btf) {
+   struct btf_dumper d = {
+   .btf = btf,
+   .jw = btf_wtr,
+   .is_plain_text = true,
+   };
+
+   do_dump_btf(, map_info, key, value);
+   } else {
+   print_entry_plain(map_info, key, value);
+   }
+   num_elems++;
+   }
+   return num_elems;
+   }
+
+   /* lookup error handling */
+   if (map_is_map_of_maps(map_info->type) ||
+   map_is_map_of_progs(map_info->type))
+   return 0;
+
+   if (json_output) {
+   jsonw_name(json_wtr, "key");
+   print_hex_data_json(key, map_info->key_size);
+   jsonw_name(json_wtr, "value");
+   jsonw_start_object(json_wtr);
+   jsonw_string_field(json_wtr, "error",
+  "can't lookup element");
+   jsonw_end_object(json_wtr);
+   } else {
+   p_info("can't lookup element with key: ");
+   fprint_hex(stderr, key, map_info->key_size, " ");
+   fprintf(stderr, "\n");
+   }
+
+   return 0;
+}
+
 static int do_dump(int argc, char **argv)
 {
struct bpf_map_info info = {};
@@ -713,40 +761,7 @@ static int do_dump(int argc, char **argv)
err = 0;
break;
}
-
-   if (!bpf_map_lookup_elem(fd, key, value)) {
-   if (json_output)
-   print_entry_json(, key, value, btf);
-   else
-   if (btf) {
-   struct btf_dumper d = {
-   .btf = btf,
-   .jw = btf_wtr,
-   .is_plain_text = true,
-   };
-
-   do_dump_btf(, , key, value);
-   } else {
-   print_entry_plain(, key, value);
-   }
-   num_elems++;
-   } else if (!map_is_map_of_maps(info.type) &&
-  !map_is_map_of_progs(info.type)) {
-   if (json_output) {
-   jsonw_name(json_wtr, "key");
-   print_hex_data_json(key, info.key_size);
-   jsonw_name(json_wtr, "value");
-   jsonw_start_object(json_wtr);
-   jsonw_string_field(json_wtr, "error",
-  "can't lookup element");
-   jsonw_end_object(json_wtr);
-   } else {
-   p_info("can't lookup element with key: ");
-   fprint_hex(stderr, key, info.key_size, " ");
-   fprintf(stderr, "\n");
-   }
-   }
-
+   num_elems += dump_map_elem(fd, key, value, , btf, btf_wtr);
prev_key = key;
}
 
-- 
2.17.1




[PATCH bpf-next 2/6] bpf: return EOPNOTSUPP when map lookup isn't supported

2018-10-08 Thread Prashant Bhole
Return ERR_PTR(-EOPNOTSUPP) from map_lookup_elem() methods of below
map types:
- BPF_MAP_TYPE_PROG_ARRAY
- BPF_MAP_TYPE_STACK_TRACE
- BPF_MAP_TYPE_XSKMAP
- BPF_MAP_TYPE_SOCKMAP/BPF_MAP_TYPE_SOCKHASH

Signed-off-by: Prashant Bhole 
Acked-by: Alexei Starovoitov 
---
 kernel/bpf/arraymap.c | 2 +-
 kernel/bpf/sockmap.c  | 2 +-
 kernel/bpf/stackmap.c | 2 +-
 kernel/bpf/xskmap.c   | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index dded84cbe814..24583da9ffd1 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -449,7 +449,7 @@ static void fd_array_map_free(struct bpf_map *map)
 
 static void *fd_array_map_lookup_elem(struct bpf_map *map, void *key)
 {
-   return NULL;
+   return ERR_PTR(-EOPNOTSUPP);
 }
 
 /* only called from syscall */
diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index d37a1a0a6e1e..5d0677d808ae 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -2096,7 +2096,7 @@ int sockmap_get_from_fd(const union bpf_attr *attr, int 
type,
 
 static void *sock_map_lookup(struct bpf_map *map, void *key)
 {
-   return NULL;
+   return ERR_PTR(-EOPNOTSUPP);
 }
 
 static int sock_map_update_elem(struct bpf_map *map,
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 8061a439ef18..b2ade10f7ec3 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -505,7 +505,7 @@ const struct bpf_func_proto bpf_get_stack_proto = {
 /* Called from eBPF program */
 static void *stack_map_lookup_elem(struct bpf_map *map, void *key)
 {
-   return NULL;
+   return ERR_PTR(-EOPNOTSUPP);
 }
 
 /* Called from syscall */
diff --git a/kernel/bpf/xskmap.c b/kernel/bpf/xskmap.c
index 9f8463afda9c..ef0b7b6ef8a5 100644
--- a/kernel/bpf/xskmap.c
+++ b/kernel/bpf/xskmap.c
@@ -154,7 +154,7 @@ void __xsk_map_flush(struct bpf_map *map)
 
 static void *xsk_map_lookup_elem(struct bpf_map *map, void *key)
 {
-   return NULL;
+   return ERR_PTR(-EOPNOTSUPP);
 }
 
 static int xsk_map_update_elem(struct bpf_map *map, void *key, void *value,
-- 
2.17.1




[PATCH bpf-next 5/6] selftests/bpf: test_verifier, change names of fixup maps

2018-10-08 Thread Prashant Bhole
Currently fixup map are named like fixup_map1, fixup_map2, and so on.
As suggested by Alexei let's change change map names such that we can
identify map type by looking at the name.

This patch is basically a find and replace change:
fixup_map1  ->  fixup_map_hash_8b
fixup_map2  ->  fixup_map_hash_48b
fixup_map3  ->  fixup_map_hash_16b
fixup_map4  ->  fixup_map_array_48b

Suggested-by: Alexei Starovoitov 
Signed-off-by: Prashant Bhole 
Acked-by: Alexei Starovoitov 
---
 tools/testing/selftests/bpf/test_verifier.c | 380 ++--
 1 file changed, 190 insertions(+), 190 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_verifier.c 
b/tools/testing/selftests/bpf/test_verifier.c
index bc9cd8537467..65ae44c85d27 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -61,10 +61,10 @@ static bool unpriv_disabled = false;
 struct bpf_test {
const char *descr;
struct bpf_insn insns[MAX_INSNS];
-   int fixup_map1[MAX_FIXUPS];
-   int fixup_map2[MAX_FIXUPS];
-   int fixup_map3[MAX_FIXUPS];
-   int fixup_map4[MAX_FIXUPS];
+   int fixup_map_hash_8b[MAX_FIXUPS];
+   int fixup_map_hash_48b[MAX_FIXUPS];
+   int fixup_map_hash_16b[MAX_FIXUPS];
+   int fixup_map_array_48b[MAX_FIXUPS];
int fixup_prog1[MAX_FIXUPS];
int fixup_prog2[MAX_FIXUPS];
int fixup_map_in_map[MAX_FIXUPS];
@@ -876,7 +876,7 @@ static struct bpf_test tests[] = {
 BPF_FUNC_map_lookup_elem),
BPF_EXIT_INSN(),
},
-   .fixup_map1 = { 2 },
+   .fixup_map_hash_8b = { 2 },
.errstr = "invalid indirect read from stack",
.result = REJECT,
},
@@ -1110,7 +1110,7 @@ static struct bpf_test tests[] = {
BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0),
BPF_EXIT_INSN(),
},
-   .fixup_map1 = { 3 },
+   .fixup_map_hash_8b = { 3 },
.errstr = "R0 invalid mem access 'map_value_or_null'",
.result = REJECT,
},
@@ -1127,7 +1127,7 @@ static struct bpf_test tests[] = {
BPF_ST_MEM(BPF_DW, BPF_REG_0, 4, 0),
BPF_EXIT_INSN(),
},
-   .fixup_map1 = { 3 },
+   .fixup_map_hash_8b = { 3 },
.errstr = "misaligned value access",
.result = REJECT,
.flags = F_LOAD_WITH_STRICT_ALIGNMENT,
@@ -1147,7 +1147,7 @@ static struct bpf_test tests[] = {
BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 1),
BPF_EXIT_INSN(),
},
-   .fixup_map1 = { 3 },
+   .fixup_map_hash_8b = { 3 },
.errstr = "R0 invalid mem access",
.errstr_unpriv = "R0 leaks addr",
.result = REJECT,
@@ -1237,7 +1237,7 @@ static struct bpf_test tests[] = {
 BPF_FUNC_map_delete_elem),
BPF_EXIT_INSN(),
},
-   .fixup_map1 = { 24 },
+   .fixup_map_hash_8b = { 24 },
.errstr_unpriv = "R1 pointer comparison",
.result_unpriv = REJECT,
.result = ACCEPT,
@@ -1391,7 +1391,7 @@ static struct bpf_test tests[] = {
offsetof(struct __sk_buff, pkt_type)),
BPF_EXIT_INSN(),
},
-   .fixup_map1 = { 4 },
+   .fixup_map_hash_8b = { 4 },
.errstr = "different pointers",
.errstr_unpriv = "R1 pointer comparison",
.result = REJECT,
@@ -1414,7 +1414,7 @@ static struct bpf_test tests[] = {
BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
BPF_JMP_IMM(BPF_JA, 0, 0, -12),
},
-   .fixup_map1 = { 6 },
+   .fixup_map_hash_8b = { 6 },
.errstr = "different pointers",
.errstr_unpriv = "R1 pointer comparison",
.result = REJECT,
@@ -1438,7 +1438,7 @@ static struct bpf_test tests[] = {
BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
BPF_JMP_IMM(BPF_JA, 0, 0, -13),
},
-   .fixup_map1 = { 7 },
+   .fixup_map_hash_8b = { 7 },
.errstr = "different pointers",
.errstr_unpriv = "R1 pointer comparison",
.result = REJECT,
@@ -2575,7 +2575,7 @@ static struct bpf_test tests[] = {
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
},
-   .fixup_map1 = { 3 },
+   .fixup_map_hash_8b = { 3 },
   

[PATCH bpf-next 1/6] bpf: error handling when map_lookup_elem isn't supported

2018-10-08 Thread Prashant Bhole
The error value returned by map_lookup_elem doesn't differentiate
whether lookup was failed because of invalid key or lookup is not
supported.

Lets add handling for -EOPNOTSUPP return value of map_lookup_elem()
method of map, with expectation from map's implementation that it
should return -EOPNOTSUPP if lookup is not supported.

The errno for bpf syscall for BPF_MAP_LOOKUP_ELEM command will be set
to EOPNOTSUPP if map lookup is not supported.

Signed-off-by: Prashant Bhole 
Acked-by: Alexei Starovoitov 
---
 kernel/bpf/syscall.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 5742df21598c..4f416234251f 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -719,10 +719,15 @@ static int map_lookup_elem(union bpf_attr *attr)
} else {
rcu_read_lock();
ptr = map->ops->map_lookup_elem(map, key);
-   if (ptr)
+   if (IS_ERR(ptr)) {
+   err = PTR_ERR(ptr);
+   } else if (!ptr) {
+   err = -ENOENT;
+   } else {
+   err = 0;
memcpy(value, ptr, value_size);
+   }
rcu_read_unlock();
-   err = ptr ? 0 : -ENOENT;
}
 
if (err)
-- 
2.17.1




[PATCH bpf-next 0/6] Error handling when map lookup isn't supported

2018-10-08 Thread Prashant Bhole
Currently when map a lookup fails, user space API can not make any
distinction whether given key was not found or lookup is not supported
by particular map.

In this series we modify return value of maps which do not support
lookup. Lookup on such map implementation will return -EOPNOTSUPP.
bpf() syscall with BPF_MAP_LOOKUP_ELEM command will set EOPNOTSUPP
errno. We also handle this error in bpftool to print appropriate
message.

Patch 1: adds handling of BPF_MAP_LOOKUP ELEM command of bpf syscall
such that errno will set to EOPNOTSUPP when map doesn't support lookup

Patch 2: Modifies the return value of map_lookup_elem() to EOPNOTSUPP
for maps which do not support lookup

Patch 3: Splits do_dump() in bpftool/map.c. Element printing code is
moved out into new function dump_map_elem(). This was done in order to
reduce deep indentation and accomodate further changes.

Patch 4: Changes in bpftool to print strerror() message when lookup
error is occured. This will result in appropriate message like
"Operation not supported" when map doesn't support lookup.

Patch 5: test_verifier: change fixup map naming convention as
suggested by Alexei

Patch 6: Added verifier tests to check whether verifier rejects call 
to bpf_map_lookup_elem from bpf program. For all map types those
do not support map lookup.

Prashant Bhole (6):
  bpf: error handling when map_lookup_elem isn't supported
  bpf: return EOPNOTSUPP when map lookup isn't supported
  tools/bpf: bpftool, split the function do_dump()
  tools/bpf: bpftool, print strerror when map lookup error occurs
  selftests/bpf: test_verifier, change names of fixup maps
  selftests/bpf: test_verifier, check bpf_map_lookup_elem access in bpf
prog

 kernel/bpf/arraymap.c   |   2 +-
 kernel/bpf/sockmap.c|   2 +-
 kernel/bpf/stackmap.c   |   2 +-
 kernel/bpf/syscall.c|   9 +-
 kernel/bpf/xskmap.c |   2 +-
 tools/bpf/bpftool/map.c | 102 ++--
 tools/testing/selftests/bpf/test_verifier.c | 501 
 7 files changed, 389 insertions(+), 231 deletions(-)

-- 
2.17.1




Re: [PATCH bpf-next 0/6] Error handling when map lookup isn't supported

2018-10-08 Thread Prashant Bhole




On 10/9/2018 9:43 AM, Daniel Borkmann wrote:

On 10/09/2018 02:02 AM, Prashant Bhole wrote:

On 10/6/2018 3:35 AM, Alexei Starovoitov wrote:

On Fri, Oct 05, 2018 at 12:35:55PM +0900, Prashant Bhole wrote:

Currently when map a lookup fails, user space API can not make any
distinction whether given key was not found or lookup is not supported
by particular map.

In this series we modify return value of maps which do not support
lookup. Lookup on such map implementation will return -EOPNOTSUPP.
bpf() syscall with BPF_MAP_LOOKUP_ELEM command will set EOPNOTSUPP
errno. We also handle this error in bpftool to print appropriate
message.

Patch 1: adds handling of BPF_MAP_LOOKUP ELEM command of bpf syscall
such that errno will set to EOPNOTSUPP when map doesn't support lookup

Patch 2: Modifies the return value of map_lookup_elem() to EOPNOTSUPP
for maps which do not support lookup

Patch 3: Splits do_dump() in bpftool/map.c. Element printing code is
moved out into new function dump_map_elem(). This was done in order to
reduce deep indentation and accomodate further changes.

Patch 4: Changes in bpftool to print strerror() message when lookup
error is occured. This will result in appropriate message like
"Operation not supported" when map doesn't support lookup.

Patch 5: test_verifier: change fixup map naming convention as
suggested by Alexei

Patch 6: Added verifier tests to check whether verifier rejects call
to bpf_map_lookup_elem from bpf program. For all map types those
do not support map lookup.


for the set:
Acked-by: Alexei Starovoitov 


Thanks. Is there any reason this series did not get posted on netdev-list and 
can not be seen in the patchwork?


Hmm, could you repost to netdev? Perhaps a netdev or patchwork issue that
it did not land there. I just double-checked and it's indeed not present.



Shall I repost with the same version and Alexei's Acked-by for the series?

-Prashant




Re: [PATCH bpf-next 0/6] Error handling when map lookup isn't supported

2018-10-08 Thread Prashant Bhole




On 10/6/2018 3:35 AM, Alexei Starovoitov wrote:

On Fri, Oct 05, 2018 at 12:35:55PM +0900, Prashant Bhole wrote:

Currently when map a lookup fails, user space API can not make any
distinction whether given key was not found or lookup is not supported
by particular map.

In this series we modify return value of maps which do not support
lookup. Lookup on such map implementation will return -EOPNOTSUPP.
bpf() syscall with BPF_MAP_LOOKUP_ELEM command will set EOPNOTSUPP
errno. We also handle this error in bpftool to print appropriate
message.

Patch 1: adds handling of BPF_MAP_LOOKUP ELEM command of bpf syscall
such that errno will set to EOPNOTSUPP when map doesn't support lookup

Patch 2: Modifies the return value of map_lookup_elem() to EOPNOTSUPP
for maps which do not support lookup

Patch 3: Splits do_dump() in bpftool/map.c. Element printing code is
moved out into new function dump_map_elem(). This was done in order to
reduce deep indentation and accomodate further changes.

Patch 4: Changes in bpftool to print strerror() message when lookup
error is occured. This will result in appropriate message like
"Operation not supported" when map doesn't support lookup.

Patch 5: test_verifier: change fixup map naming convention as
suggested by Alexei

Patch 6: Added verifier tests to check whether verifier rejects call
to bpf_map_lookup_elem from bpf program. For all map types those
do not support map lookup.


for the set:
Acked-by: Alexei Starovoitov 


Thanks. Is there any reason this series did not get posted on 
netdev-list and can not be seen in the patchwork?




Re: [RFC v2 bpf-next 5/5] selftests/bpf: verifier, check bpf_map_lookup_elem access in bpf prog

2018-10-04 Thread Prashant Bhole




On 10/5/2018 10:51 AM, Alexei Starovoitov wrote:

On Tue, Oct 02, 2018 at 02:35:19PM +0900, Prashant Bhole wrote:

map_lookup_elem isn't supported by certain map types like:
- BPF_MAP_TYPE_PROG_ARRAY
- BPF_MAP_TYPE_STACK_TRACE
- BPF_MAP_TYPE_XSKMAP
- BPF_MAP_TYPE_SOCKMAP/BPF_MAP_TYPE_SOCKHASH
Let's add verfier tests to check whether verifier prevents
bpf_map_lookup_elem call on above programs from bpf program.

Signed-off-by: Prashant Bhole 
---
  tools/testing/selftests/bpf/test_verifier.c | 121 +++-
  1 file changed, 120 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/test_verifier.c 
b/tools/testing/selftests/bpf/test_verifier.c
index c7d25f23baf9..afa7e67f66e4 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -47,7 +47,7 @@
  
  #define MAX_INSNS	BPF_MAXINSNS

  #define MAX_FIXUPS8
-#define MAX_NR_MAPS8
+#define MAX_NR_MAPS13
  #define POINTER_VALUE 0xcafe4all
  #define TEST_DATA_LEN 64
  
@@ -64,6 +64,10 @@ struct bpf_test {

int fixup_map2[MAX_FIXUPS];
int fixup_map3[MAX_FIXUPS];
int fixup_map4[MAX_FIXUPS];
+   int fixup_map5[MAX_FIXUPS];
+   int fixup_map6[MAX_FIXUPS];
+   int fixup_map7[MAX_FIXUPS];
+   int fixup_map8[MAX_FIXUPS];
int fixup_prog1[MAX_FIXUPS];
int fixup_prog2[MAX_FIXUPS];
int fixup_map_in_map[MAX_FIXUPS];
@@ -4391,6 +4395,85 @@ static struct bpf_test tests[] = {
.errstr = "invalid access to packet",
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
},
+   {
+   "prevent map lookup in sockmap",
+   .insns = {
+   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+   BPF_LD_MAP_FD(BPF_REG_1, 0),
+   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+BPF_FUNC_map_lookup_elem),
+   BPF_EXIT_INSN(),
+   },
+   .fixup_map5 = { 3 },
+   .result = REJECT,
+   .errstr = "cannot pass map_type 15 into func 
bpf_map_lookup_elem",
+   .prog_type = BPF_PROG_TYPE_SOCK_OPS,
+   },
+   {
+   "prevent map lookup in sockhash",
+   .insns = {
+   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+   BPF_LD_MAP_FD(BPF_REG_1, 0),
+   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+BPF_FUNC_map_lookup_elem),
+   BPF_EXIT_INSN(),
+   },
+   .fixup_map6 = { 3 },
+   .result = REJECT,
+   .errstr = "cannot pass map_type 18 into func 
bpf_map_lookup_elem",
+   .prog_type = BPF_PROG_TYPE_SOCK_OPS,
+   },
+   {
+   "prevent map lookup in xskmap",
+   .insns = {
+   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+   BPF_LD_MAP_FD(BPF_REG_1, 0),
+   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+BPF_FUNC_map_lookup_elem),
+   BPF_EXIT_INSN(),
+   },
+   .fixup_map7 = { 3 },
+   .result = REJECT,
+   .errstr = "cannot pass map_type 17 into func 
bpf_map_lookup_elem",
+   .prog_type = BPF_PROG_TYPE_XDP,
+   },
+   {
+   "prevent map lookup in stack trace",
+   .insns = {
+   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+   BPF_LD_MAP_FD(BPF_REG_1, 0),
+   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+BPF_FUNC_map_lookup_elem),
+   BPF_EXIT_INSN(),
+   },
+   .fixup_map8 = { 3 },
+   .result = REJECT,
+   .errstr = "cannot pass map_type 7 into func 
bpf_map_lookup_elem",
+   .prog_type = BPF_PROG_TYPE_PERF_EVENT,
+   },
+   {
+   "prevent map lookup in prog array",
+   .insns = {
+   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+

Re: [RFC v2 bpf-next 2/5] bpf: return EOPNOTSUPP when map lookup isn't supported

2018-10-04 Thread Prashant Bhole




On 10/5/2018 9:10 AM, Alexei Starovoitov wrote:

On Tue, Oct 02, 2018 at 02:35:16PM +0900, Prashant Bhole wrote:

Return ERR_PTR(-EOPNOTSUPP) from map_lookup_elem() methods of below
map types:
- BPF_MAP_TYPE_PROG_ARRAY
- BPF_MAP_TYPE_STACK_TRACE
- BPF_MAP_TYPE_XSKMAP
- BPF_MAP_TYPE_SOCKMAP/BPF_MAP_TYPE_SOCKHASH

Signed-off-by: Prashant Bhole 
---
  kernel/bpf/arraymap.c | 2 +-
  kernel/bpf/sockmap.c  | 2 +-
  kernel/bpf/stackmap.c | 2 +-
  kernel/bpf/xskmap.c   | 2 +-
  4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index dded84cbe814..24583da9ffd1 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -449,7 +449,7 @@ static void fd_array_map_free(struct bpf_map *map)
  
  static void *fd_array_map_lookup_elem(struct bpf_map *map, void *key)

  {
-   return NULL;
+   return ERR_PTR(-EOPNOTSUPP);


last time we discussed that the verifier should make sure that
lookup is not called from bpf program for these map types.
I'd like to see test cases in test_verifier.c for these map types
to make sure we don't introduce crashes.


Hi Alexei,
Patch 05/05 adds such tests in test_verifier.c. Please review those 
changes. Thank you.


-Prashant



[RFC v2 bpf-next 2/5] bpf: return EOPNOTSUPP when map lookup isn't supported

2018-10-01 Thread Prashant Bhole
Return ERR_PTR(-EOPNOTSUPP) from map_lookup_elem() methods of below
map types:
- BPF_MAP_TYPE_PROG_ARRAY
- BPF_MAP_TYPE_STACK_TRACE
- BPF_MAP_TYPE_XSKMAP
- BPF_MAP_TYPE_SOCKMAP/BPF_MAP_TYPE_SOCKHASH

Signed-off-by: Prashant Bhole 
---
 kernel/bpf/arraymap.c | 2 +-
 kernel/bpf/sockmap.c  | 2 +-
 kernel/bpf/stackmap.c | 2 +-
 kernel/bpf/xskmap.c   | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index dded84cbe814..24583da9ffd1 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -449,7 +449,7 @@ static void fd_array_map_free(struct bpf_map *map)
 
 static void *fd_array_map_lookup_elem(struct bpf_map *map, void *key)
 {
-   return NULL;
+   return ERR_PTR(-EOPNOTSUPP);
 }
 
 /* only called from syscall */
diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index d37a1a0a6e1e..5d0677d808ae 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -2096,7 +2096,7 @@ int sockmap_get_from_fd(const union bpf_attr *attr, int 
type,
 
 static void *sock_map_lookup(struct bpf_map *map, void *key)
 {
-   return NULL;
+   return ERR_PTR(-EOPNOTSUPP);
 }
 
 static int sock_map_update_elem(struct bpf_map *map,
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 8061a439ef18..b2ade10f7ec3 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -505,7 +505,7 @@ const struct bpf_func_proto bpf_get_stack_proto = {
 /* Called from eBPF program */
 static void *stack_map_lookup_elem(struct bpf_map *map, void *key)
 {
-   return NULL;
+   return ERR_PTR(-EOPNOTSUPP);
 }
 
 /* Called from syscall */
diff --git a/kernel/bpf/xskmap.c b/kernel/bpf/xskmap.c
index 9f8463afda9c..ef0b7b6ef8a5 100644
--- a/kernel/bpf/xskmap.c
+++ b/kernel/bpf/xskmap.c
@@ -154,7 +154,7 @@ void __xsk_map_flush(struct bpf_map *map)
 
 static void *xsk_map_lookup_elem(struct bpf_map *map, void *key)
 {
-   return NULL;
+   return ERR_PTR(-EOPNOTSUPP);
 }
 
 static int xsk_map_update_elem(struct bpf_map *map, void *key, void *value,
-- 
2.17.1




[RFC v2 bpf-next 5/5] selftests/bpf: verifier, check bpf_map_lookup_elem access in bpf prog

2018-10-01 Thread Prashant Bhole
map_lookup_elem isn't supported by certain map types like:
- BPF_MAP_TYPE_PROG_ARRAY
- BPF_MAP_TYPE_STACK_TRACE
- BPF_MAP_TYPE_XSKMAP
- BPF_MAP_TYPE_SOCKMAP/BPF_MAP_TYPE_SOCKHASH
Let's add verfier tests to check whether verifier prevents
bpf_map_lookup_elem call on above programs from bpf program.

Signed-off-by: Prashant Bhole 
---
 tools/testing/selftests/bpf/test_verifier.c | 121 +++-
 1 file changed, 120 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/test_verifier.c 
b/tools/testing/selftests/bpf/test_verifier.c
index c7d25f23baf9..afa7e67f66e4 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -47,7 +47,7 @@
 
 #define MAX_INSNS  BPF_MAXINSNS
 #define MAX_FIXUPS 8
-#define MAX_NR_MAPS8
+#define MAX_NR_MAPS13
 #define POINTER_VALUE  0xcafe4all
 #define TEST_DATA_LEN  64
 
@@ -64,6 +64,10 @@ struct bpf_test {
int fixup_map2[MAX_FIXUPS];
int fixup_map3[MAX_FIXUPS];
int fixup_map4[MAX_FIXUPS];
+   int fixup_map5[MAX_FIXUPS];
+   int fixup_map6[MAX_FIXUPS];
+   int fixup_map7[MAX_FIXUPS];
+   int fixup_map8[MAX_FIXUPS];
int fixup_prog1[MAX_FIXUPS];
int fixup_prog2[MAX_FIXUPS];
int fixup_map_in_map[MAX_FIXUPS];
@@ -4391,6 +4395,85 @@ static struct bpf_test tests[] = {
.errstr = "invalid access to packet",
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
},
+   {
+   "prevent map lookup in sockmap",
+   .insns = {
+   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+   BPF_LD_MAP_FD(BPF_REG_1, 0),
+   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+BPF_FUNC_map_lookup_elem),
+   BPF_EXIT_INSN(),
+   },
+   .fixup_map5 = { 3 },
+   .result = REJECT,
+   .errstr = "cannot pass map_type 15 into func 
bpf_map_lookup_elem",
+   .prog_type = BPF_PROG_TYPE_SOCK_OPS,
+   },
+   {
+   "prevent map lookup in sockhash",
+   .insns = {
+   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+   BPF_LD_MAP_FD(BPF_REG_1, 0),
+   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+BPF_FUNC_map_lookup_elem),
+   BPF_EXIT_INSN(),
+   },
+   .fixup_map6 = { 3 },
+   .result = REJECT,
+   .errstr = "cannot pass map_type 18 into func 
bpf_map_lookup_elem",
+   .prog_type = BPF_PROG_TYPE_SOCK_OPS,
+   },
+   {
+   "prevent map lookup in xskmap",
+   .insns = {
+   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+   BPF_LD_MAP_FD(BPF_REG_1, 0),
+   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+BPF_FUNC_map_lookup_elem),
+   BPF_EXIT_INSN(),
+   },
+   .fixup_map7 = { 3 },
+   .result = REJECT,
+   .errstr = "cannot pass map_type 17 into func 
bpf_map_lookup_elem",
+   .prog_type = BPF_PROG_TYPE_XDP,
+   },
+   {
+   "prevent map lookup in stack trace",
+   .insns = {
+   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+   BPF_LD_MAP_FD(BPF_REG_1, 0),
+   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+BPF_FUNC_map_lookup_elem),
+   BPF_EXIT_INSN(),
+   },
+   .fixup_map8 = { 3 },
+   .result = REJECT,
+   .errstr = "cannot pass map_type 7 into func 
bpf_map_lookup_elem",
+   .prog_type = BPF_PROG_TYPE_PERF_EVENT,
+   },
+   {
+   "prevent map lookup in prog array",
+   .insns = {
+   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+   BPF_LD_MAP_FD(BPF_REG_1, 0),
+   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+BPF_FUNC_map_lookup_e

[RFC v2 bpf-next 4/5] tools/bpf: bpftool, print strerror when map lookup error occurs

2018-10-01 Thread Prashant Bhole
Since map lookup error can be ENOENT or EOPNOTSUPP, let's print
strerror() as error message in normal and JSON output.

This patch adds helper function print_entry_error() to print
entry from lookup error occurs

Example: Following example dumps a map which does not support lookup.

Output before:
root# bpftool map -jp dump id 40
[
"key": ["0x0a","0x00","0x00","0x00"
],
"value": {
"error": "can\'t lookup element"
},
"key": ["0x0b","0x00","0x00","0x00"
],
"value": {
"error": "can\'t lookup element"
}
]

root# bpftool map dump id 40
can't lookup element with key:
0a 00 00 00
can't lookup element with key:
0b 00 00 00
Found 0 elements

Output after changes:
root# bpftool map dump -jp  id 45
[
"key": ["0x0a","0x00","0x00","0x00"
],
"value": {
"error": "Operation not supported"
},
"key": ["0x0b","0x00","0x00","0x00"
],
"value": {
"error": "Operation not supported"
}
]

root# bpftool map dump id 45
key:
0a 00 00 00
value:
Operation not supported
key:
0b 00 00 00
value:
Operation not supported
Found 0 elements

Signed-off-by: Prashant Bhole 
---
 tools/bpf/bpftool/map.c | 29 -
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index 28d365435fea..9f5de48f8a99 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -336,6 +336,25 @@ static void print_entry_json(struct bpf_map_info *info, 
unsigned char *key,
jsonw_end_object(json_wtr);
 }
 
+static void print_entry_error(struct bpf_map_info *info, unsigned char *key,
+ const char *value)
+{
+   int value_size = strlen(value);
+   bool single_line, break_names;
+
+   break_names = info->key_size > 16 || value_size > 16;
+   single_line = info->key_size + value_size <= 24 && !break_names;
+
+   printf("key:%c", break_names ? '\n' : ' ');
+   fprint_hex(stdout, key, info->key_size, " ");
+
+   printf(single_line ? "  " : "\n");
+
+   printf("value:%c%s", break_names ? '\n' : ' ', value);
+
+   printf("\n");
+}
+
 static void print_entry_plain(struct bpf_map_info *info, unsigned char *key,
  unsigned char *value)
 {
@@ -663,6 +682,7 @@ static int dump_map_elem(int fd, void *key, void *value,
 json_writer_t *btf_wtr)
 {
int num_elems = 0;
+   int lookup_errno;
 
if (!bpf_map_lookup_elem(fd, key, value)) {
if (json_output) {
@@ -685,6 +705,8 @@ static int dump_map_elem(int fd, void *key, void *value,
}
 
/* lookup error handling */
+   lookup_errno = errno;
+
if (map_is_map_of_maps(map_info->type) ||
map_is_map_of_progs(map_info->type))
return 0;
@@ -694,13 +716,10 @@ static int dump_map_elem(int fd, void *key, void *value,
print_hex_data_json(key, map_info->key_size);
jsonw_name(json_wtr, "value");
jsonw_start_object(json_wtr);
-   jsonw_string_field(json_wtr, "error",
-  "can't lookup element");
+   jsonw_string_field(json_wtr, "error", strerror(lookup_errno));
jsonw_end_object(json_wtr);
} else {
-   p_info("can't lookup element with key: ");
-   fprint_hex(stderr, key, map_info->key_size, " ");
-   fprintf(stderr, "\n");
+   print_entry_error(map_info, key, strerror(lookup_errno));
}
 
return 0;
-- 
2.17.1




[RFC v2 bpf-next 3/5] tools/bpf: bpftool, split the function do_dump()

2018-10-01 Thread Prashant Bhole
do_dump() function in bpftool/map.c has deep indentations. In order
to reduce deep indent, let's move element printing code out of
do_dump() into dump_map_elem() function.

Signed-off-by: Prashant Bhole 
---
 tools/bpf/bpftool/map.c | 83 -
 1 file changed, 49 insertions(+), 34 deletions(-)

diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index 6003e9598973..28d365435fea 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -658,6 +658,54 @@ static int do_show(int argc, char **argv)
return errno == ENOENT ? 0 : -1;
 }
 
+static int dump_map_elem(int fd, void *key, void *value,
+struct bpf_map_info *map_info, struct btf *btf,
+json_writer_t *btf_wtr)
+{
+   int num_elems = 0;
+
+   if (!bpf_map_lookup_elem(fd, key, value)) {
+   if (json_output) {
+   print_entry_json(map_info, key, value, btf);
+   } else {
+   if (btf) {
+   struct btf_dumper d = {
+   .btf = btf,
+   .jw = btf_wtr,
+   .is_plain_text = true,
+   };
+
+   do_dump_btf(, map_info, key, value);
+   } else {
+   print_entry_plain(map_info, key, value);
+   }
+   num_elems++;
+   }
+   return num_elems;
+   }
+
+   /* lookup error handling */
+   if (map_is_map_of_maps(map_info->type) ||
+   map_is_map_of_progs(map_info->type))
+   return 0;
+
+   if (json_output) {
+   jsonw_name(json_wtr, "key");
+   print_hex_data_json(key, map_info->key_size);
+   jsonw_name(json_wtr, "value");
+   jsonw_start_object(json_wtr);
+   jsonw_string_field(json_wtr, "error",
+  "can't lookup element");
+   jsonw_end_object(json_wtr);
+   } else {
+   p_info("can't lookup element with key: ");
+   fprint_hex(stderr, key, map_info->key_size, " ");
+   fprintf(stderr, "\n");
+   }
+
+   return 0;
+}
+
 static int do_dump(int argc, char **argv)
 {
struct bpf_map_info info = {};
@@ -713,40 +761,7 @@ static int do_dump(int argc, char **argv)
err = 0;
break;
}
-
-   if (!bpf_map_lookup_elem(fd, key, value)) {
-   if (json_output)
-   print_entry_json(, key, value, btf);
-   else
-   if (btf) {
-   struct btf_dumper d = {
-   .btf = btf,
-   .jw = btf_wtr,
-   .is_plain_text = true,
-   };
-
-   do_dump_btf(, , key, value);
-   } else {
-   print_entry_plain(, key, value);
-   }
-   num_elems++;
-   } else if (!map_is_map_of_maps(info.type) &&
-  !map_is_map_of_progs(info.type)) {
-   if (json_output) {
-   jsonw_name(json_wtr, "key");
-   print_hex_data_json(key, info.key_size);
-   jsonw_name(json_wtr, "value");
-   jsonw_start_object(json_wtr);
-   jsonw_string_field(json_wtr, "error",
-  "can't lookup element");
-   jsonw_end_object(json_wtr);
-   } else {
-   p_info("can't lookup element with key: ");
-   fprint_hex(stderr, key, info.key_size, " ");
-   fprintf(stderr, "\n");
-   }
-   }
-
+   num_elems += dump_map_elem(fd, key, value, , btf, btf_wtr);
prev_key = key;
}
 
-- 
2.17.1




[RFC v2 bpf-next 1/5] bpf: error handling when map_lookup_elem isn't supported

2018-10-01 Thread Prashant Bhole
The error value returned by map_lookup_elem doesn't differentiate
whether lookup was failed because of invalid key or lookup is not
supported.

Lets add handling for -EOPNOTSUPP return value of map_lookup_elem()
method of map, with expectation from map's implementation that it
should return -EOPNOTSUPP if lookup is not supported.

The errno for bpf syscall for BPF_MAP_LOOKUP_ELEM command will be set
to EOPNOTSUPP if map lookup is not supported.

Signed-off-by: Prashant Bhole 
---
 kernel/bpf/syscall.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 5742df21598c..4f416234251f 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -719,10 +719,15 @@ static int map_lookup_elem(union bpf_attr *attr)
} else {
rcu_read_lock();
ptr = map->ops->map_lookup_elem(map, key);
-   if (ptr)
+   if (IS_ERR(ptr)) {
+   err = PTR_ERR(ptr);
+   } else if (!ptr) {
+   err = -ENOENT;
+   } else {
+   err = 0;
memcpy(value, ptr, value_size);
+   }
rcu_read_unlock();
-   err = ptr ? 0 : -ENOENT;
}
 
if (err)
-- 
2.17.1




[RFC v2 bpf-next 0/5] Error handling when map lookup isn't supported

2018-10-01 Thread Prashant Bhole
Currently when map a lookup fails, user space API can not make any
distinction whether given key was not found or lookup is not supported
by particular map.

In this series we modify return value of maps which do not support
lookup. Lookup on such map implementation will return -EOPNOTSUPP.
bpf() syscall with BPF_MAP_LOOKUP_ELEM command will set EOPNOTSUPP
errno. We also handle this error in bpftool to print appropriate
message.

Patch 1: adds handling of BPF_MAP_LOOKUP ELEM command of bpf syscall
such that errno will set to EOPNOTSUPP when map doesn't support lookup

Patch 2: Modifies the return value of map_lookup_elem() to EOPNOTSUPP
for maps which do not support lookup

Patch 3: Splits do_dump() in bpftool/map.c. Element printing code is
moved out into new function dump_map_elem(). This was done in order to
reduce deep indentation and accomodate further changes.

Patch 4: Changes in bpftool to print strerror() message when lookup
error is occured. This will result in appropriate message like
"Operation not supported" when map doesn't support lookup.

Patch 5: Added verifier tests to check whether verifier rejects call 
to bpf_map_lookup_elem from bpf program. For all map types those
do not support map lookup.

v2: 
- bpftool: all nit-pick fixes pointed out by Jakub
- bpftool: removed usage of error strings. Now using strerror(),
  suggested by Jakub
- added tests in verifier_tests, suggested by Alexei


Prashant Bhole (5):
  bpf: error handling when map_lookup_elem isn't supported
  bpf: return EOPNOTSUPP when map lookup isn't supported
  tools/bpf: bpftool, split the function do_dump()
  tools/bpf: bpftool, print strerror when map lookup error occurs
  selftests/bpf: verifier, check bpf_map_lookup_elem access in bpf prog

 kernel/bpf/arraymap.c   |   2 +-
 kernel/bpf/sockmap.c|   2 +-
 kernel/bpf/stackmap.c   |   2 +-
 kernel/bpf/syscall.c|   9 +-
 kernel/bpf/xskmap.c |   2 +-
 tools/bpf/bpftool/map.c | 102 +++--
 tools/testing/selftests/bpf/test_verifier.c | 121 +++-
 7 files changed, 199 insertions(+), 41 deletions(-)

-- 
2.17.1




Re: [RFC bpf-next 4/4] tools/bpf: handle EOPNOTSUPP when map lookup is failed

2018-10-01 Thread Prashant Bhole




On 9/21/2018 12:59 AM, Jakub Kicinski wrote:

On Thu, 20 Sep 2018 14:04:19 +0900, Prashant Bhole wrote:

On 9/20/2018 12:29 AM, Jakub Kicinski wrote:

On Wed, 19 Sep 2018 16:51:43 +0900, Prashant Bhole wrote:

Let's add a check for EOPNOTSUPP error when map lookup is failed.
Also in case map doesn't support lookup, the output of map dump is
changed from "can't lookup element" to "lookup not supported for
this map".

Patch adds function print_entry_error() function to print the error
value.

Following example dumps a map which does not support lookup.

Output before:
root# bpftool map -jp dump id 40
[
  "key": ["0x0a","0x00","0x00","0x00"
  ],
  "value": {
  "error": "can\'t lookup element"
  },
  "key": ["0x0b","0x00","0x00","0x00"
  ],
  "value": {
  "error": "can\'t lookup element"
  }
]

root# bpftool map dump id 40
can't lookup element with key:
0a 00 00 00
can't lookup element with key:
0b 00 00 00
Found 0 elements

Output after changes:
root# bpftool map dump -jp  id 45
[
  "key": ["0x0a","0x00","0x00","0x00"
  ],
  "value": {
  "error": "lookup not supported for this map"
  },
  "key": ["0x0b","0x00","0x00","0x00"
  ],
  "value": {
  "error": "lookup not supported for this map"
  }
]

root# bpftool map dump id 45
key:
0a 00 00 00
value:
lookup not supported for this map
key:
0b 00 00 00
value:
lookup not supported for this map
Found 0 elements


Nice improvement, thanks for the changes!  I wonder what your thoughts
would be on just printing some form of "lookup not supported for this
map" only once?  It seems slightly like repeated information - if
lookup is not supported for one key it likely won't be for other keys
too, so we could shorten the output.  Would that make sense?
   

Signed-off-by: Prashant Bhole 
---
   tools/bpf/bpftool/main.h |  5 +
   tools/bpf/bpftool/map.c  | 35 ++-
   2 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index 40492cdc4e53..1a8c683f949b 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -46,6 +46,11 @@
   
   #include "json_writer.h"
   
+#define ERR_CANNOT_LOOKUP \

+   "can't lookup element"
+#define ERR_LOOKUP_NOT_SUPPORTED \
+   "lookup not supported for this map"


Do we need these?  Are we going to reused them in more parts of the
code?


These are used only once. These can be used in do_lookup(). Currently
do_lookup() prints strerror(errno) when lookup is failed. Shall I change
that do_lookup() output?


I actually prefer to stick to strerror(), the standard errors more
clearly correlate with what happened in my mind (i.e. "Operation not
supported" == kernel sent EOPNOTSUPP).   strerror() may also print in
local language if translation/localization matters.

We could even use strerr() in dump_map_elem() but up to you.  The one
in do_lookup() I'd prefer to leave be ;)


Sorry for the late reply.
In v2 I have removed the error strings altogether. As you suggested 
output will be strerror(). Also added verifier tests as Alexei 
suggested. I am sending the RFC-v2 series soon.

Thanks.

-Prashant



[PATCH bpf-next] samples/bpf: fix compilation failure

2018-09-20 Thread Prashant Bhole
following commit:
commit d58e468b1112 ("flow_dissector: implements flow dissector BPF hook")
added struct bpf_flow_keys which conflicts with the struct with
same name in sockex2_kern.c and sockex3_kern.c

similar to commit:
commit 534e0e52bc23 ("samples/bpf: fix a compilation failure")
we tried the rename it "flow_keys" but it also conflicted with struct
having same name in include/net/flow_dissector.h. Hence renaming the
struct to "flow_key_record". Also, this commit doesn't fix the
compilation error completely because the similar struct is present in
sockex3_kern.c. Hence renaming it in both files sockex3_user.c and
sockex3_kern.c

Signed-off-by: Prashant Bhole 
---
 samples/bpf/sockex2_kern.c | 11 ++-
 samples/bpf/sockex3_kern.c |  8 
 samples/bpf/sockex3_user.c |  4 ++--
 3 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/samples/bpf/sockex2_kern.c b/samples/bpf/sockex2_kern.c
index f58acfc92556..f2f9dbc021b0 100644
--- a/samples/bpf/sockex2_kern.c
+++ b/samples/bpf/sockex2_kern.c
@@ -14,7 +14,7 @@ struct vlan_hdr {
__be16 h_vlan_encapsulated_proto;
 };
 
-struct bpf_flow_keys {
+struct flow_key_record {
__be32 src;
__be32 dst;
union {
@@ -59,7 +59,7 @@ static inline __u32 ipv6_addr_hash(struct __sk_buff *ctx, 
__u64 off)
 }
 
 static inline __u64 parse_ip(struct __sk_buff *skb, __u64 nhoff, __u64 
*ip_proto,
-struct bpf_flow_keys *flow)
+struct flow_key_record *flow)
 {
__u64 verlen;
 
@@ -83,7 +83,7 @@ static inline __u64 parse_ip(struct __sk_buff *skb, __u64 
nhoff, __u64 *ip_proto
 }
 
 static inline __u64 parse_ipv6(struct __sk_buff *skb, __u64 nhoff, __u64 
*ip_proto,
-  struct bpf_flow_keys *flow)
+  struct flow_key_record *flow)
 {
*ip_proto = load_byte(skb,
  nhoff + offsetof(struct ipv6hdr, nexthdr));
@@ -96,7 +96,8 @@ static inline __u64 parse_ipv6(struct __sk_buff *skb, __u64 
nhoff, __u64 *ip_pro
return nhoff;
 }
 
-static inline bool flow_dissector(struct __sk_buff *skb, struct bpf_flow_keys 
*flow)
+static inline bool flow_dissector(struct __sk_buff *skb,
+ struct flow_key_record *flow)
 {
__u64 nhoff = ETH_HLEN;
__u64 ip_proto;
@@ -198,7 +199,7 @@ struct bpf_map_def SEC("maps") hash_map = {
 SEC("socket2")
 int bpf_prog2(struct __sk_buff *skb)
 {
-   struct bpf_flow_keys flow = {};
+   struct flow_key_record flow = {};
struct pair *value;
u32 key;
 
diff --git a/samples/bpf/sockex3_kern.c b/samples/bpf/sockex3_kern.c
index 95907f8d2b17..c527b57d3ec8 100644
--- a/samples/bpf/sockex3_kern.c
+++ b/samples/bpf/sockex3_kern.c
@@ -61,7 +61,7 @@ struct vlan_hdr {
__be16 h_vlan_encapsulated_proto;
 };
 
-struct bpf_flow_keys {
+struct flow_key_record {
__be32 src;
__be32 dst;
union {
@@ -88,7 +88,7 @@ static inline __u32 ipv6_addr_hash(struct __sk_buff *ctx, 
__u64 off)
 }
 
 struct globals {
-   struct bpf_flow_keys flow;
+   struct flow_key_record flow;
 };
 
 struct bpf_map_def SEC("maps") percpu_map = {
@@ -114,14 +114,14 @@ struct pair {
 
 struct bpf_map_def SEC("maps") hash_map = {
.type = BPF_MAP_TYPE_HASH,
-   .key_size = sizeof(struct bpf_flow_keys),
+   .key_size = sizeof(struct flow_key_record),
.value_size = sizeof(struct pair),
.max_entries = 1024,
 };
 
 static void update_stats(struct __sk_buff *skb, struct globals *g)
 {
-   struct bpf_flow_keys key = g->flow;
+   struct flow_key_record key = g->flow;
struct pair *value;
 
value = bpf_map_lookup_elem(_map, );
diff --git a/samples/bpf/sockex3_user.c b/samples/bpf/sockex3_user.c
index 22f74d0e1493..9d02e0404719 100644
--- a/samples/bpf/sockex3_user.c
+++ b/samples/bpf/sockex3_user.c
@@ -13,7 +13,7 @@
 #define PARSE_IP_PROG_FD (prog_fd[0])
 #define PROG_ARRAY_FD (map_fd[0])
 
-struct flow_keys {
+struct flow_key_record {
__be32 src;
__be32 dst;
union {
@@ -64,7 +64,7 @@ int main(int argc, char **argv)
(void) f;
 
for (i = 0; i < 5; i++) {
-   struct flow_keys key = {}, next_key;
+   struct flow_key_record key = {}, next_key;
struct pair value;
 
sleep(1);
-- 
2.17.1




Re: [RFC bpf-next 4/4] tools/bpf: handle EOPNOTSUPP when map lookup is failed

2018-09-19 Thread Prashant Bhole




On 9/20/2018 12:29 AM, Jakub Kicinski wrote:

On Wed, 19 Sep 2018 16:51:43 +0900, Prashant Bhole wrote:

Let's add a check for EOPNOTSUPP error when map lookup is failed.
Also in case map doesn't support lookup, the output of map dump is
changed from "can't lookup element" to "lookup not supported for
this map".

Patch adds function print_entry_error() function to print the error
value.

Following example dumps a map which does not support lookup.

Output before:
root# bpftool map -jp dump id 40
[
 "key": ["0x0a","0x00","0x00","0x00"
 ],
 "value": {
 "error": "can\'t lookup element"
 },
 "key": ["0x0b","0x00","0x00","0x00"
 ],
 "value": {
 "error": "can\'t lookup element"
 }
]

root# bpftool map dump id 40
can't lookup element with key:
0a 00 00 00
can't lookup element with key:
0b 00 00 00
Found 0 elements

Output after changes:
root# bpftool map dump -jp  id 45
[
 "key": ["0x0a","0x00","0x00","0x00"
 ],
 "value": {
 "error": "lookup not supported for this map"
 },
 "key": ["0x0b","0x00","0x00","0x00"
 ],
 "value": {
 "error": "lookup not supported for this map"
 }
]

root# bpftool map dump id 45
key:
0a 00 00 00
value:
lookup not supported for this map
key:
0b 00 00 00
value:
lookup not supported for this map
Found 0 elements


Nice improvement, thanks for the changes!  I wonder what your thoughts
would be on just printing some form of "lookup not supported for this
map" only once?  It seems slightly like repeated information - if
lookup is not supported for one key it likely won't be for other keys
too, so we could shorten the output.  Would that make sense?


Signed-off-by: Prashant Bhole 
---
  tools/bpf/bpftool/main.h |  5 +
  tools/bpf/bpftool/map.c  | 35 ++-
  2 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index 40492cdc4e53..1a8c683f949b 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -46,6 +46,11 @@
  
  #include "json_writer.h"
  
+#define ERR_CANNOT_LOOKUP \

+   "can't lookup element"
+#define ERR_LOOKUP_NOT_SUPPORTED \
+   "lookup not supported for this map"


Do we need these?  Are we going to reused them in more parts of the
code?


These are used only once. These can be used in do_lookup(). Currently 
do_lookup() prints strerror(errno) when lookup is failed. Shall I change 
that do_lookup() output?


-Prashant



Re: [RFC bpf-next 4/4] tools/bpf: handle EOPNOTSUPP when map lookup is failed

2018-09-19 Thread Prashant Bhole




On 9/20/2018 12:29 AM, Jakub Kicinski wrote:

On Wed, 19 Sep 2018 16:51:43 +0900, Prashant Bhole wrote:

Let's add a check for EOPNOTSUPP error when map lookup is failed.
Also in case map doesn't support lookup, the output of map dump is
changed from "can't lookup element" to "lookup not supported for
this map".

Patch adds function print_entry_error() function to print the error
value.

Following example dumps a map which does not support lookup.

Output before:
root# bpftool map -jp dump id 40
[
 "key": ["0x0a","0x00","0x00","0x00"
 ],
 "value": {
 "error": "can\'t lookup element"
 },
 "key": ["0x0b","0x00","0x00","0x00"
 ],
 "value": {
 "error": "can\'t lookup element"
 }
]

root# bpftool map dump id 40
can't lookup element with key:
0a 00 00 00
can't lookup element with key:
0b 00 00 00
Found 0 elements

Output after changes:
root# bpftool map dump -jp  id 45
[
 "key": ["0x0a","0x00","0x00","0x00"
 ],
 "value": {
 "error": "lookup not supported for this map"
 },
 "key": ["0x0b","0x00","0x00","0x00"
 ],
 "value": {
 "error": "lookup not supported for this map"
 }
]

root# bpftool map dump id 45
key:
0a 00 00 00
value:
lookup not supported for this map
key:
0b 00 00 00
value:
lookup not supported for this map
Found 0 elements


Nice improvement, thanks for the changes!  I wonder what your thoughts
would be on just printing some form of "lookup not supported for this
map" only once?  It seems slightly like repeated information - if
lookup is not supported for one key it likely won't be for other keys
too, so we could shorten the output.  Would that make sense?


I too thought that the message is repeated information. One idea was as 
you said, stop iterating once we get EOPNOTSUPP. But only reason for 
keeping this way is that user will at least see dump of keys along with 
it. Also it will be consistent with Json output. What is your opinion on it?


I will fix other things that you pointed out. Thanks!

-Prashant




Signed-off-by: Prashant Bhole 
---
  tools/bpf/bpftool/main.h |  5 +
  tools/bpf/bpftool/map.c  | 35 ++-
  2 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index 40492cdc4e53..1a8c683f949b 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -46,6 +46,11 @@
  
  #include "json_writer.h"
  
+#define ERR_CANNOT_LOOKUP \

+   "can't lookup element"
+#define ERR_LOOKUP_NOT_SUPPORTED \
+   "lookup not supported for this map"


Do we need these?  Are we going to reused them in more parts of the
code?


  #define ptr_to_u64(ptr)   ((__u64)(unsigned long)(ptr))
  
  #define NEXT_ARG()	({ argc--; argv++; if (argc < 0) usage(); })

diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index 284e12a289c0..2faccd2098c9 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -333,6 +333,25 @@ static void print_entry_json(struct bpf_map_info *info, 
unsigned char *key,
jsonw_end_object(json_wtr);
  }
  
+static void print_entry_error(struct bpf_map_info *info, unsigned char *key,

+ char *value)
+{
+   bool single_line, break_names;
+   int value_size = strlen(value);


nit: order variables declaration lines to shortest, please.


+
+   break_names = info->key_size > 16 || value_size > 16;
+   single_line = info->key_size + value_size <= 24 && !break_names;
+
+   printf("key:%c", break_names ? '\n' : ' ');
+   fprint_hex(stdout, key, info->key_size, " ");
+
+   printf(single_line ? "  " : "\n");
+
+   printf("value:%c%s", break_names ? '\n' : ' ', value);
+
+   printf("\n");
+}
+
  static void print_entry_plain(struct bpf_map_info *info, unsigned char *key,
  unsigned char *value)
  {
@@ -660,6 +679,8 @@ static int dump_map_elem(int fd, void *key, void *value,
 json_writer_t *btf_wtr)
  {
int num_elems = 0;
+   int lookup_errno;
+   char *errstr;


nit: const char?

  
  	if (!bpf_map_lookup_elem(fd, key, value)) {

if (json_output) {








Re: [RFC bpf-next 3/4] tools/bpf: bpftool, split the function do_dump()

2018-09-19 Thread Prashant Bhole




On 9/20/2018 12:26 AM, Jakub Kicinski wrote:

On Wed, 19 Sep 2018 16:51:42 +0900, Prashant Bhole wrote:

+static int dump_map_elem(int fd, void *key, void *value,
+struct bpf_map_info *map_info, struct btf *btf,
+json_writer_t *btf_wtr)
+{
+   int num_elems = 0;
+
+   if (!bpf_map_lookup_elem(fd, key, value)) {
+   if (json_output) {
+   print_entry_json(map_info, key, value, btf);
+   } else {
+   if (btf) {
+   struct btf_dumper d = {
+   .btf = btf,
+   .jw = btf_wtr,
+   .is_plain_text = true,
+   };
+
+   do_dump_btf(, map_info, key, value);
+   } else {
+   print_entry_plain(map_info, key, value);
+   }
+   num_elems++;
+   }
+   goto out;
+   }
+
+   /* lookup error handling */
+   if (map_is_map_of_maps(map_info->type) ||
+   map_is_map_of_progs(map_info->type))
+   goto out;
+


nit: why not just return?  the goto seems to only do a return anyway,
  is this suggested by some coding style?  Is it to help the
  compiler?  I see people do this from time to time..


Thanks for reviewing. I agree, goto and the label isn't needed. I will 
fix it.


-Prashant


[...]


+out:
+   return num_elems;







Re: [RFC bpf-next 2/4] bpf: return EOPNOTSUPP when map lookup isn't supported

2018-09-19 Thread Prashant Bhole




On 9/20/2018 12:14 AM, Alexei Starovoitov wrote:

On Wed, Sep 19, 2018 at 04:51:41PM +0900, Prashant Bhole wrote:

Return ERR_PTR(-EOPNOTSUPP) from map_lookup_elem() methods of below
map types:
- BPF_MAP_TYPE_PROG_ARRAY
- BPF_MAP_TYPE_STACK_TRACE
- BPF_MAP_TYPE_XSKMAP
- BPF_MAP_TYPE_SOCKMAP/BPF_MAP_TYPE_SOCKHASH

Signed-off-by: Prashant Bhole 
---
  kernel/bpf/arraymap.c | 2 +-
  kernel/bpf/sockmap.c  | 2 +-
  kernel/bpf/stackmap.c | 2 +-
  kernel/bpf/xskmap.c   | 2 +-
  4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index dded84cbe814..24583da9ffd1 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -449,7 +449,7 @@ static void fd_array_map_free(struct bpf_map *map)
  
  static void *fd_array_map_lookup_elem(struct bpf_map *map, void *key)

  {
-   return NULL;
+   return ERR_PTR(-EOPNOTSUPP);
  }


conceptually the set looks good to me.
Please add a test to test_verifier.c to make sure
that these lookup helpers cannot be called from BPF program.
Otherwise this diff may cause crashes.


Thanks for reviewing.
Is the verifier change below sufficient?

--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2128,10 +2128,18 @@ static int check_map_func_compatibility(struct 
bpf_verifier_env *env,

if (env->subprog_cnt > 1) {
 			verbose(env, "tail_calls are not allowed in programs with 
bpf-to-bpf calls\n");

return -EINVAL;
}
break;
+   case BPF_FUNC_map_lookup_elem:
+   if (map->map_type == BPF_MAP_TYPE_PROG_ARRAY ||
+   map->map_type == BPF_MAP_TYPE_STACK_TRACE ||
+   map->map_type == BPF_MAP_TYPE_XSKMAP ||
+   map->map_type == BPF_MAP_TYPE_SOCKMAP ||
+   map->map_type == BPF_MAP_TYPE_SOCKHASH)
+   goto error;
+   break;
case BPF_FUNC_perf_event_read:
case BPF_FUNC_perf_event_output:
case BPF_FUNC_perf_event_read_value:
if (map->map_type != BPF_MAP_TYPE_PERF_EVENT_ARRAY)
goto error;

-Prashant



Re: [RFC bpf-next 2/4] bpf: return EOPNOTSUPP when map lookup isn't supported

2018-09-19 Thread Prashant Bhole




On 9/20/2018 3:40 AM, Mauricio Vasquez wrote:



On 09/19/2018 10:14 AM, Alexei Starovoitov wrote:

On Wed, Sep 19, 2018 at 04:51:41PM +0900, Prashant Bhole wrote:

Return ERR_PTR(-EOPNOTSUPP) from map_lookup_elem() methods of below
map types:
- BPF_MAP_TYPE_PROG_ARRAY
- BPF_MAP_TYPE_STACK_TRACE
- BPF_MAP_TYPE_XSKMAP
- BPF_MAP_TYPE_SOCKMAP/BPF_MAP_TYPE_SOCKHASH

Signed-off-by: Prashant Bhole 
---
  kernel/bpf/arraymap.c | 2 +-
  kernel/bpf/sockmap.c  | 2 +-
  kernel/bpf/stackmap.c | 2 +-
  kernel/bpf/xskmap.c   | 2 +-
  4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index dded84cbe814..24583da9ffd1 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -449,7 +449,7 @@ static void fd_array_map_free(struct bpf_map *map)
  static void *fd_array_map_lookup_elem(struct bpf_map *map, void *key)
  {
-    return NULL;
+    return ERR_PTR(-EOPNOTSUPP);
  }

conceptually the set looks good to me.
Please add a test to test_verifier.c to make sure
that these lookup helpers cannot be called from BPF program.
Otherwise this diff may cause crashes.


I think we can remove all these stub functions that return EOPNOTSUPP or 
EINVAL and return the error in syscall.c if ops->map_[update, delete, 
lookup, ...] is null.
This will require to extend (and test) the verifier to guarantee that 
those helpers are not called in wrong maps, for example 
map_delete_element in array like maps.


Would it make sense?


Thanks for review and suggestion.

I had thought about this way too (except adding restrictions in the 
verifier). There is no strong reason for choosing current implementation.


I thought there must be some reason that those methods are implemented 
and just returning NULL. Also there are no NULL checks for 
map_lookup_elem stub. So I decided to simply change the return value.

If some more people agree with removing stub function idea, I will do it.

-Prashant





[RFC bpf-next 1/4] bpf: error handling when map_lookup_elem isn't supported

2018-09-19 Thread Prashant Bhole
The error value returned by map_lookup_elem doesn't differentiate
whether lookup was failed because of invalid key or lookup is not
supported.

Lets add handling for -EOPNOTSUPP return value of map_lookup_elem()
method of map, with expectation from map's implementation that it
should return -EOPNOTSUPP if lookup is not supported.

The errno for bpf syscall for BPF_MAP_LOOKUP_ELEM command will be set
to EOPNOTSUPP if map lookup is not supported.

Signed-off-by: Prashant Bhole 
---
 kernel/bpf/syscall.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index b3c2d09bcf7a..ecb06352b5a0 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -716,10 +716,15 @@ static int map_lookup_elem(union bpf_attr *attr)
} else {
rcu_read_lock();
ptr = map->ops->map_lookup_elem(map, key);
-   if (ptr)
+   if (IS_ERR(ptr)) {
+   err = PTR_ERR(ptr);
+   } else if (!ptr) {
+   err = -ENOENT;
+   } else {
+   err = 0;
memcpy(value, ptr, value_size);
+   }
rcu_read_unlock();
-   err = ptr ? 0 : -ENOENT;
}
 
if (err)
-- 
2.17.1




[RFC bpf-next 3/4] tools/bpf: bpftool, split the function do_dump()

2018-09-19 Thread Prashant Bhole
do_dump() function in bpftool/map.c has deep indentations. In order
to reduce deep indent, let's move element printing code out of
do_dump() into dump_map_elem() function.

Signed-off-by: Prashant Bhole 
---
 tools/bpf/bpftool/map.c | 83 -
 1 file changed, 49 insertions(+), 34 deletions(-)

diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index af8ad32fa6e9..284e12a289c0 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -655,6 +655,54 @@ static int do_show(int argc, char **argv)
return errno == ENOENT ? 0 : -1;
 }
 
+static int dump_map_elem(int fd, void *key, void *value,
+struct bpf_map_info *map_info, struct btf *btf,
+json_writer_t *btf_wtr)
+{
+   int num_elems = 0;
+
+   if (!bpf_map_lookup_elem(fd, key, value)) {
+   if (json_output) {
+   print_entry_json(map_info, key, value, btf);
+   } else {
+   if (btf) {
+   struct btf_dumper d = {
+   .btf = btf,
+   .jw = btf_wtr,
+   .is_plain_text = true,
+   };
+
+   do_dump_btf(, map_info, key, value);
+   } else {
+   print_entry_plain(map_info, key, value);
+   }
+   num_elems++;
+   }
+   goto out;
+   }
+
+   /* lookup error handling */
+   if (map_is_map_of_maps(map_info->type) ||
+   map_is_map_of_progs(map_info->type))
+   goto out;
+
+   if (json_output) {
+   jsonw_name(json_wtr, "key");
+   print_hex_data_json(key, map_info->key_size);
+   jsonw_name(json_wtr, "value");
+   jsonw_start_object(json_wtr);
+   jsonw_string_field(json_wtr, "error",
+  "can't lookup element");
+   jsonw_end_object(json_wtr);
+   } else {
+   p_info("can't lookup element with key: ");
+   fprint_hex(stderr, key, map_info->key_size, " ");
+   fprintf(stderr, "\n");
+   }
+out:
+   return num_elems;
+}
+
 static int do_dump(int argc, char **argv)
 {
struct bpf_map_info info = {};
@@ -710,40 +758,7 @@ static int do_dump(int argc, char **argv)
err = 0;
break;
}
-
-   if (!bpf_map_lookup_elem(fd, key, value)) {
-   if (json_output)
-   print_entry_json(, key, value, btf);
-   else
-   if (btf) {
-   struct btf_dumper d = {
-   .btf = btf,
-   .jw = btf_wtr,
-   .is_plain_text = true,
-   };
-
-   do_dump_btf(, , key, value);
-   } else {
-   print_entry_plain(, key, value);
-   }
-   num_elems++;
-   } else if (!map_is_map_of_maps(info.type) &&
-  !map_is_map_of_progs(info.type)) {
-   if (json_output) {
-   jsonw_name(json_wtr, "key");
-   print_hex_data_json(key, info.key_size);
-   jsonw_name(json_wtr, "value");
-   jsonw_start_object(json_wtr);
-   jsonw_string_field(json_wtr, "error",
-  "can't lookup element");
-   jsonw_end_object(json_wtr);
-   } else {
-   p_info("can't lookup element with key: ");
-   fprint_hex(stderr, key, info.key_size, " ");
-   fprintf(stderr, "\n");
-   }
-   }
-
+   num_elems += dump_map_elem(fd, key, value, , btf, btf_wtr);
prev_key = key;
}
 
-- 
2.17.1




[RFC bpf-next 4/4] tools/bpf: handle EOPNOTSUPP when map lookup is failed

2018-09-19 Thread Prashant Bhole
Let's add a check for EOPNOTSUPP error when map lookup is failed.
Also in case map doesn't support lookup, the output of map dump is
changed from "can't lookup element" to "lookup not supported for
this map".

Patch adds function print_entry_error() function to print the error
value.

Following example dumps a map which does not support lookup.

Output before:
root# bpftool map -jp dump id 40
[
"key": ["0x0a","0x00","0x00","0x00"
],
"value": {
"error": "can\'t lookup element"
},
"key": ["0x0b","0x00","0x00","0x00"
],
"value": {
"error": "can\'t lookup element"
}
]

root# bpftool map dump id 40
can't lookup element with key:
0a 00 00 00
can't lookup element with key:
0b 00 00 00
Found 0 elements

Output after changes:
root# bpftool map dump -jp  id 45
[
"key": ["0x0a","0x00","0x00","0x00"
],
"value": {
"error": "lookup not supported for this map"
},
"key": ["0x0b","0x00","0x00","0x00"
],
"value": {
"error": "lookup not supported for this map"
}
]

root# bpftool map dump id 45
key:
0a 00 00 00
value:
lookup not supported for this map
key:
0b 00 00 00
value:
lookup not supported for this map
Found 0 elements

Signed-off-by: Prashant Bhole 
---
 tools/bpf/bpftool/main.h |  5 +
 tools/bpf/bpftool/map.c  | 35 ++-
 2 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index 40492cdc4e53..1a8c683f949b 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -46,6 +46,11 @@
 
 #include "json_writer.h"
 
+#define ERR_CANNOT_LOOKUP \
+   "can't lookup element"
+#define ERR_LOOKUP_NOT_SUPPORTED \
+   "lookup not supported for this map"
+
 #define ptr_to_u64(ptr)((__u64)(unsigned long)(ptr))
 
 #define NEXT_ARG() ({ argc--; argv++; if (argc < 0) usage(); })
diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index 284e12a289c0..2faccd2098c9 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -333,6 +333,25 @@ static void print_entry_json(struct bpf_map_info *info, 
unsigned char *key,
jsonw_end_object(json_wtr);
 }
 
+static void print_entry_error(struct bpf_map_info *info, unsigned char *key,
+ char *value)
+{
+   bool single_line, break_names;
+   int value_size = strlen(value);
+
+   break_names = info->key_size > 16 || value_size > 16;
+   single_line = info->key_size + value_size <= 24 && !break_names;
+
+   printf("key:%c", break_names ? '\n' : ' ');
+   fprint_hex(stdout, key, info->key_size, " ");
+
+   printf(single_line ? "  " : "\n");
+
+   printf("value:%c%s", break_names ? '\n' : ' ', value);
+
+   printf("\n");
+}
+
 static void print_entry_plain(struct bpf_map_info *info, unsigned char *key,
  unsigned char *value)
 {
@@ -660,6 +679,8 @@ static int dump_map_elem(int fd, void *key, void *value,
 json_writer_t *btf_wtr)
 {
int num_elems = 0;
+   int lookup_errno;
+   char *errstr;
 
if (!bpf_map_lookup_elem(fd, key, value)) {
if (json_output) {
@@ -682,22 +703,26 @@ static int dump_map_elem(int fd, void *key, void *value,
}
 
/* lookup error handling */
+   lookup_errno = errno;
+
if (map_is_map_of_maps(map_info->type) ||
map_is_map_of_progs(map_info->type))
goto out;
 
+   if (lookup_errno == EOPNOTSUPP)
+   errstr = ERR_LOOKUP_NOT_SUPPORTED;
+   else
+   errstr = ERR_CANNOT_LOOKUP;
+
if (json_output) {
jsonw_name(json_wtr, "key");
print_hex_data_json(key, map_info->key_size);
jsonw_name(json_wtr, "value");
jsonw_start_object(json_wtr);
-   jsonw_string_field(json_wtr, "error",
-  "can't lookup element");
+   jsonw_string_field(json_wtr, "error", errstr);
jsonw_end_object(json_wtr);
} else {
-   p_info("can't lookup element with key: ");
-   fprint_hex(stderr, key, map_info->key_size, " ");
-   fprintf(stderr, "\n");
+   print_entry_error(map_info, key, errstr);
}
 out:
return num_elems;
-- 
2.17.1




[RFC bpf-next 2/4] bpf: return EOPNOTSUPP when map lookup isn't supported

2018-09-19 Thread Prashant Bhole
Return ERR_PTR(-EOPNOTSUPP) from map_lookup_elem() methods of below
map types:
- BPF_MAP_TYPE_PROG_ARRAY
- BPF_MAP_TYPE_STACK_TRACE
- BPF_MAP_TYPE_XSKMAP
- BPF_MAP_TYPE_SOCKMAP/BPF_MAP_TYPE_SOCKHASH

Signed-off-by: Prashant Bhole 
---
 kernel/bpf/arraymap.c | 2 +-
 kernel/bpf/sockmap.c  | 2 +-
 kernel/bpf/stackmap.c | 2 +-
 kernel/bpf/xskmap.c   | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index dded84cbe814..24583da9ffd1 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -449,7 +449,7 @@ static void fd_array_map_free(struct bpf_map *map)
 
 static void *fd_array_map_lookup_elem(struct bpf_map *map, void *key)
 {
-   return NULL;
+   return ERR_PTR(-EOPNOTSUPP);
 }
 
 /* only called from syscall */
diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 488ef9663c01..e50922a802f7 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -2076,7 +2076,7 @@ int sockmap_get_from_fd(const union bpf_attr *attr, int 
type,
 
 static void *sock_map_lookup(struct bpf_map *map, void *key)
 {
-   return NULL;
+   return ERR_PTR(-EOPNOTSUPP);
 }
 
 static int sock_map_update_elem(struct bpf_map *map,
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 8061a439ef18..b2ade10f7ec3 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -505,7 +505,7 @@ const struct bpf_func_proto bpf_get_stack_proto = {
 /* Called from eBPF program */
 static void *stack_map_lookup_elem(struct bpf_map *map, void *key)
 {
-   return NULL;
+   return ERR_PTR(-EOPNOTSUPP);
 }
 
 /* Called from syscall */
diff --git a/kernel/bpf/xskmap.c b/kernel/bpf/xskmap.c
index 9f8463afda9c..ef0b7b6ef8a5 100644
--- a/kernel/bpf/xskmap.c
+++ b/kernel/bpf/xskmap.c
@@ -154,7 +154,7 @@ void __xsk_map_flush(struct bpf_map *map)
 
 static void *xsk_map_lookup_elem(struct bpf_map *map, void *key)
 {
-   return NULL;
+   return ERR_PTR(-EOPNOTSUPP);
 }
 
 static int xsk_map_update_elem(struct bpf_map *map, void *key, void *value,
-- 
2.17.1




[RFC bpf-next 0/4] Error handling when map lookup isn't supported

2018-09-19 Thread Prashant Bhole
Currently when map a lookup is failed, user space API can not make any
distinction whether given key was not found or lookup is not supported
by particular map.

In this series we modify return value of maps which do not support
lookup. Lookup on such map implementation will return -EOPNOTSUPP.
bpf() syscall with BPF_MAP_LOOKUP_ELEM command will set EOPNOTSUPP
errno. We also handle this error in bpftool to print appropriate
message.

Patch 1: adds handling of BPF_MAP_LOOKUP ELEM command of bpf syscall
such that errno will set to EOPNOTSUPP when map doesn't support lookup

Patch 2: Modifies the return value of map_lookup_elem() to EOPNOTSUPP
for maps which do not support lookup

Patch 3: Splits do_dump() in bpftool/map.c. Element printing code is
moved out into new function dump_map_elem(). This was done in order to
reduce deep indentation and accomodate further changes.

Patch 4: Changes to bpftool to do additional checking for errno when
map lookup is failed. In case of EOPNOTSUPP errno, it prints message
"lookup not supported for this map"

Prashant Bhole (4):
  bpf: error handling when map_lookup_elem isn't supported
  bpf: return EOPNOTSUPP when map lookup isn't supported
  tools/bpf: bpftool, split the function do_dump()
  tools/bpf: handle EOPNOTSUPP when map lookup is failed

 kernel/bpf/arraymap.c|   2 +-
 kernel/bpf/sockmap.c |   2 +-
 kernel/bpf/stackmap.c|   2 +-
 kernel/bpf/syscall.c |   9 +++-
 kernel/bpf/xskmap.c  |   2 +-
 tools/bpf/bpftool/main.h |   5 ++
 tools/bpf/bpftool/map.c  | 108 +++
 7 files changed, 90 insertions(+), 40 deletions(-)

-- 
2.17.1




[PATCH bpf-next] tools/bpf: bpftool, add xskmap in map types

2018-08-31 Thread Prashant Bhole
When listed all maps, bpftool currently shows (null) for xskmap.
Added xskmap type in map_type_name[] to show correct type.

Signed-off-by: Prashant Bhole 
---
 tools/bpf/bpftool/map.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index df175bc33c5d..9c55077ca5dd 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -68,6 +68,7 @@ static const char * const map_type_name[] = {
[BPF_MAP_TYPE_DEVMAP]   = "devmap",
[BPF_MAP_TYPE_SOCKMAP]  = "sockmap",
[BPF_MAP_TYPE_CPUMAP]   = "cpumap",
+   [BPF_MAP_TYPE_XSKMAP]   = "xskmap",
[BPF_MAP_TYPE_SOCKHASH] = "sockhash",
[BPF_MAP_TYPE_CGROUP_STORAGE]   = "cgroup_storage",
 };
-- 
2.17.1




[PATCH bpf-next] samples/bpf: xdpsock, minor fixes

2018-08-30 Thread Prashant Bhole
- xsks_map size was fixed to 4, changed it MAX_SOCKS
- Remove redundant definition of MAX_SOCKS in xdpsock_user.c
- In dump_stats(), add NULL check for xsks[i]

Signed-off-by: Prashant Bhole 
---
 samples/bpf/xdpsock_kern.c | 2 +-
 samples/bpf/xdpsock_user.c | 3 +--
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/samples/bpf/xdpsock_kern.c b/samples/bpf/xdpsock_kern.c
index d8806c41362e..b8ccd0802b3f 100644
--- a/samples/bpf/xdpsock_kern.c
+++ b/samples/bpf/xdpsock_kern.c
@@ -16,7 +16,7 @@ struct bpf_map_def SEC("maps") xsks_map = {
.type = BPF_MAP_TYPE_XSKMAP,
.key_size = sizeof(int),
.value_size = sizeof(int),
-   .max_entries = 4,
+   .max_entries = MAX_SOCKS,
 };
 
 struct bpf_map_def SEC("maps") rr_map = {
diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index b3906111bedb..57ecadc58403 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -118,7 +118,6 @@ struct xdpsock {
unsigned long prev_tx_npkts;
 };
 
-#define MAX_SOCKS 4
 static int num_socks;
 struct xdpsock *xsks[MAX_SOCKS];
 
@@ -596,7 +595,7 @@ static void dump_stats(void)
 
prev_time = now;
 
-   for (i = 0; i < num_socks; i++) {
+   for (i = 0; i < num_socks && xsks[i]; i++) {
char *fmt = "%-15s %'-11.0f %'-11lu\n";
double rx_pps, tx_pps;
 
-- 
2.17.1




[PATCH bpf-next] xsk: remove unnecessary assignment

2018-08-30 Thread Prashant Bhole
Since xdp_umem_query() was added one assignment of bpf.command was
missed from cleanup. Removing the assignment statement.

Fixes: 84c6b86875e01a0 ("xsk: don't allow umem replace at stack level")
Signed-off-by: Prashant Bhole 
---
 net/xdp/xdp_umem.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index bfe2dbea480b..d179732617dc 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -76,8 +76,6 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct 
net_device *dev,
if (!dev->netdev_ops->ndo_bpf || !dev->netdev_ops->ndo_xsk_async_xmit)
return force_zc ? -EOPNOTSUPP : 0; /* fail or fallback */
 
-   bpf.command = XDP_QUERY_XSK_UMEM;
-
rtnl_lock();
err = xdp_umem_query(dev, queue_id);
if (err) {
-- 
2.17.1




[PATCH bpf] xsk: fix return value of xdp_umem_assign_dev()

2018-08-19 Thread Prashant Bhole
s/ENOTSUPP/EOPNOTSUPP/ in function umem_assign_dev().
This function's return value is directly returned by xsk_bind().
EOPNOTSUPP is bind()'s possible return value.

Fixes: f734607e819b ("xsk: refactor xdp_umem_assign_dev()")
Signed-off-by: Prashant Bhole 
---
 net/xdp/xdp_umem.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 911ca6d3cb5a..bfe2dbea480b 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -74,14 +74,14 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct 
net_device *dev,
return 0;
 
if (!dev->netdev_ops->ndo_bpf || !dev->netdev_ops->ndo_xsk_async_xmit)
-   return force_zc ? -ENOTSUPP : 0; /* fail or fallback */
+   return force_zc ? -EOPNOTSUPP : 0; /* fail or fallback */
 
bpf.command = XDP_QUERY_XSK_UMEM;
 
rtnl_lock();
err = xdp_umem_query(dev, queue_id);
if (err) {
-   err = err < 0 ? -ENOTSUPP : -EBUSY;
+   err = err < 0 ? -EOPNOTSUPP : -EBUSY;
goto err_rtnl_unlock;
}
 
-- 
2.17.1




Re: selftests/bpf test_sockmap failure

2018-07-24 Thread Prashant Bhole




On 7/25/2018 8:02 AM, Yonghong Song wrote:



On 7/24/18 3:40 PM, John Fastabend wrote:

On 07/24/2018 08:45 AM, Yonghong Song wrote:

In one of our production machines, tools/testing/selftests/bpf
test_sockmap failed randomly like below:

...
[TEST 78]: (512, 1, 1, sendmsg, pass,apply 1,): rx thread exited with
err 1. FAILED
...

...
[TEST 80]: (2, 1024, 256, sendmsg, pass,apply 1,): rx thread exited with
err 1. FAILED
...

...
[TEST 83]: (100, 1, 5, sendpage, pass,apply 1,): rx thread exited with
err 1. FAILED
...

...
[TEST 79]: (512, 1, 1, sendpage, pass,apply 1,): rx thread exited with
err 1. FAILED
...

The command line is just `test_sockmap`. The machine has 80 cpus, 256G
memory. The kernel is based on 4.16 but backported with latest bpf-next
bpf changes.

The failed test number (78, 79, 80, or 83) is random. But they all share
similar characteristics:
    . the option rate is greater than one, i.e., more than one
  sendmsg/sendpage in the sender forked process.
    . The txmsg_apply is not 0

I debugged a little bit. It happens in msg_loop() function below
"unexpected timeout" path.

...
 slct = select(max_fd + 1, , NULL, NULL,
);
 if (slct == -1) {
 perror("select()");
 clock_gettime(CLOCK_MONOTONIC, 
>end);

 goto out_errno;
 } else if (!slct) {
 if (opt->verbose)
 fprintf(stderr, "unexpected
timeout\n");
 errno = -EIO;
 clock_gettime(CLOCK_MONOTONIC, 
>end);

 goto out_errno;
 }
...

It appears that when the error happens, the receive process does not
receive all bytes sent from the send process and eventually times out.

Has anybody seen this issue as well?
John, any comments on this failure?


Can you run the test with verbose enabled so we can determine if the tx
side is even sending the message? Sample patch below. This will allow
us to see the tx bytes and rx bytes, although it will be a bit noisy.

I notice that the test program is not smart enough to (re)send bytes if
the sendmsg call doesn't consume all bytes. This is a valid error if
we get a enomem or other normal error on the tx side. With apply this
is more likely because every byte (in apply = 1 case) is being sent
through BPF prog.


The following are some of logs for the failed tests:

...
[TEST 78]: (512, 1, 1, sendmsg, pass,apply 1,): connected sockets: c1 
<-> p1, c2 <-> p2

cgroups binding: c1(24) <-> s1(22) - - - c2(25) <-> s2(23)
tx_sendmsg: TX: 512B 0.00B/s 0.00 GB/s RX: 0B 0.00B/s 
0.00GB/s

unexpected timeout
msg_loop_rx: iov_count 1 iov_buf 1 cnt 512 err -5
rx_sendmsg: TX: 0B 0.00B/s 0.00GB/s RX: 147B 147.00B/s 
0.00GB/s

rx thread exited with err 1. FAILED
...

[TEST 82]: (100, 1, 5, sendmsg, pass,apply 1,): connected sockets: c1 
<-> p1, c2 <-> p2

cgroups binding: c1(24) <-> s1(22) - - - c2(25) <-> s2(23)
tx_sendmsg: TX: 500B 0.00B/s 0.00 GB/s RX: 0B 0.00B/s 
0.00GB/s

unexpected timeout
msg_loop_rx: iov_count 1 iov_buf 5 cnt 100 err -5
rx_sendmsg: TX: 0B 0.00B/s 0.00GB/s RX: 366B 366.00B/s 
0.00GB/s

rx thread exited with err 1. FAILED

[TEST 402]: (512, 1, 1, sendmsg, pass,apply 1,): connected sockets: c1 
<-> p1, c2 <-> p2

cgroups binding: c1(42) <-> s1(40) - - - c2(43) <-> s2(41)
tx_sendmsg: TX: 512B 0.00B/s 0.00 GB/s RX: 0B 0.00B/s 
0.00GB/s

unexpected timeout
msg_loop_rx: iov_count 1 iov_buf 1 cnt 512 err -5
rx_sendmsg: TX: 0B 0.00B/s 0.00GB/s RX: 147B 147.00B/s 
0.00GB/s

rx thread exited with err 1. FAILED

[TEST 80]: (2, 1024, 256, sendmsg, pass,apply 1,): connected sockets: c1 
<-> p1, c2 <-> p2

cgroups binding: c1(24) <-> s1(22) - - - c2(25) <-> s2(23)
tx_sendmsg: TX: 524288B 0.00B/s 0.00 GB/s RX: 0B 0.00B/s 
0.00GB/s

unexpected timeout
msg_loop_rx: iov_count 1024 iov_buf 256 cnt 2 err -5
rx_sendmsg: TX: 0B 0.00B/s 0.00GB/s RX: 458805B 458805.00B/s 
0.000459GB/s

rx thread exited with err 1. FAILED




If this is not the case I can do some more digging but I've not seen
this before.


I cannot reproduce it in my FC27 VM (with latest bpf-next) either.
The bug only shows up on our production service which is a lot busier 
and has more cpus/memorys and some sysctl configurations are different 
from my VM setup.


In the past I have noticed that on busy machines sometimes data is 
delayed so much that select() timeout is triggered. Although it isn't a 
solution but you can try to increase the timeout value to check if that 
is the case.


-Prashant



[PATCH net-next] net: ip6_gre: get ipv6hdr after skb_cow_head()

2018-07-12 Thread Prashant Bhole
A KASAN:use-after-free bug was found related to ip6-erspan
while running selftests/net/ip6_gre_headroom.sh

It happens because of following sequence:
- ipv6hdr pointer is obtained from skb
- skb_cow_head() is called, skb->head memory is reallocated
- old data is accessed using ipv6hdr pointer

skb_cow_head() call was added in e41c7c68ea77 ("ip6erspan: make sure
enough headroom at xmit."), but looking at the history there was a
chance of similar bug because gre_handle_offloads() and pskb_trim()
can also reallocate skb->head memory. Fixes tag points to commit
which introduced possibility of this bug.

This patch moves ipv6hdr pointer assignment after skb_cow_head() call.

Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support")
Signed-off-by: Prashant Bhole 
---
 net/ipv6/ip6_gre.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 367177786e34..fc7dd3a04360 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -927,7 +927,6 @@ static netdev_tx_t ip6gre_tunnel_xmit(struct sk_buff *skb,
 static netdev_tx_t ip6erspan_tunnel_xmit(struct sk_buff *skb,
 struct net_device *dev)
 {
-   struct ipv6hdr *ipv6h = ipv6_hdr(skb);
struct ip6_tnl *t = netdev_priv(dev);
struct dst_entry *dst = skb_dst(skb);
struct net_device_stats *stats;
@@ -1012,6 +1011,8 @@ static netdev_tx_t ip6erspan_tunnel_xmit(struct sk_buff 
*skb,
goto tx_err;
}
} else {
+   struct ipv6hdr *ipv6h = ipv6_hdr(skb);
+
switch (skb->protocol) {
case htons(ETH_P_IP):
memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
-- 
2.17.1




Re: [BUG net-next] BUG triggered with GRO SKB list_head changes

2018-07-12 Thread Prashant Bhole




On 7/12/2018 7:39 AM, Tyler Hicks wrote:

Starting with the following net-next commit, I see a BUG when starting a
LXD container inside of a KVM guest using virtio-net:

   d4546c2509b1 net: Convert GRO SKB handling to list_head.


Recently I encountered KASAN:use-after-free BUG and git bisect pointed 
to above commit. Looks like this is the same issue without KASAN 
enabled. I have submitted a bugfix for this BUG with Tyler Hicks in 
Reported-by tag.


-Prashant




Here's what the kernel spits out:

  kernel BUG at /var/scm/kernel/linux/include/linux/skbuff.h:2080!
  invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
  CPU: 0 PID: 1362 Comm: libvirtd Not tainted 4.18.0-rc2+ #69
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
Ubuntu-1.8.2-1ubuntu1 04/01/2014
  RIP: 0010:skb_pull+0x36/0x40
  Code: c6 77 24 29 f0 3b 87 84 00 00 00 89 87 80 00 00 00 72 17 89 f6 48 89 f0 48 03 
87 d8 00 00 00 48 89 87 d8 00 00 00 c3 31 c0 c3 <0f> 0b 0f 1f 84 00 00 00
00 00 0f 1f 44 00 00 39 b7 80 00 00 00 76
  RSP: :96737f6039f0 EFLAGS: 00010297
  RAX: 9c66e2f2 RBX:  RCX: 0501
  RDX: 0001 RSI: 000e RDI: 96737f7e3938
  RBP: 967379f40020 R08:  R09: 
  R10: 96737f603988 R11: c0461335 R12: 967379f409e0
  R13: 96737f7e3938 R14:  R15: 967379e96ac0
  FS:  7fc96087e640() GS:96737f60() knlGS:
  CS:  0010 DS:  ES:  CR0: 80050033
  CR2: 7fc913608aa0 CR3: 5dacc001 CR4: 001606f0
  Call Trace:
   
   br_dev_xmit+0xe1/0x3d0 [bridge]
   dev_hard_start_xmit+0xbc/0x3b0
   __dev_queue_xmit+0xb98/0xc30
   ip_finish_output2+0x3e5/0x670
   ? ip_output+0x7f/0x250
   ip_output+0x7f/0x250
   ? ip_fragment.constprop.5+0x80/0x80
   ip_forward+0x3e2/0x650
   ? ipv4_frags_init_net+0x130/0x130
   ip_rcv+0x2be/0x500
   ? ip_local_deliver_finish+0x3b0/0x3b0
   __netif_receive_skb_core+0x6a8/0xb30
   ? lock_acquire+0xab/0x200
   ? netif_receive_skb_internal+0x2a/0x380
   netif_receive_skb_internal+0x73/0x380
   ? napi_gro_complete+0xcf/0x1b0
   dev_gro_receive+0x374/0x730
   napi_gro_receive+0x4f/0x1d0
   receive_buf+0x4b6/0x1930 [virtio_net]
   ? detach_buf+0x69/0x120 [virtio_ring]
   virtnet_poll+0x122/0x2e0 [virtio_net]
   net_rx_action+0x207/0x450
   __do_softirq+0x149/0x4ea
   irq_exit+0xbf/0xd0
   do_IRQ+0x6c/0x130
   common_interrupt+0xf/0xf
   
  RIP: 0010:__radix_tree_lookup+0x28/0xe0
  Code: 00 00 53 49 89 ca 41 bb 40 00 00 00 4c 8b 47 50 4c 89 c0 83 e0 03 48 83 f8 01 
0f 85 a8 00 00 00 4c 89 c0 48 83 e0 fe 0f b6 08 <4c> 89 d8 48 d3 e0 48 83
e8 01 48 39 c6 76 11 e9 9f 00 00 00 4c 89
  RSP: :ae150048fcc0 EFLAGS: 0282 ORIG_RAX: ffd9
  RAX: 96735d2ef908 RBX: 001f RCX: 0006
  RDX:  RSI: 02e2 RDI: 96735d10b788
  RBP: 02e2 R08: 96735d2ef909 R09: 
  R10:  R11: 0040 R12: 001f
  R13: ec01c15f3a80 R14: 001f R15: ae150048fd18
   __do_page_cache_readahead+0x11f/0x2e0
   filemap_fault+0x408/0x660
   ext4_filemap_fault+0x2f/0x40
   __do_fault+0x1f/0xd0
   __handle_mm_fault+0x915/0xfa0
   handle_mm_fault+0x1c2/0x390
   __do_page_fault+0x2f6/0x580
   ? async_page_fault+0x5/0x20
   async_page_fault+0x1b/0x20
  RIP: 0033:0x7fc913608aa0
  Code: Bad RIP value.
  RSP: 002b:7ffcfa9c7f08 EFLAGS: 00010206
  RAX:  RBX: 0003 RCX: 0080
  RDX: 0006 RSI: 7fc913a74bf8 RDI: 7fc913df9720
  RBP: 0001 R08: 55df45795700 R09: 
  R10: 55df4574c010 R11: 0001 R12: 7ffcfa9c8c38
  R13: 7ffcfa9c8c48 R14: 7fc913dc3d70 R15: 55df4578ab30
  Modules linked in: veth ebtable_filter ebtables ipt_MASQUERADE xt_CHECKSUM 
xt_comment xt_tcpudp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 
nf_nat nf_conntrack libcrc32c iptable_mangle iptable_filter bpfilter bridge stp 
llc fuse kvm_intel kvm irqbypass 9pnet_virtio 9pnet virtio_balloon ib_iser 
rdma_cm configfs iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi 
scsi_transport_iscsi ip_tables x_tables virtio_net net_failover virtio_blk 
failover crc32_pclmul crc32c_intel pcbc aesni_intel aes_x86_64 crypto_simd 
cryptd glue_helper virtio_pci psmouse virtio_ring virtio

I'm not very familiar with the GRO or IP fragmentation code but I was
able to identify that this change "fixes" the issue:

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 7ccc601b55d9..a5cea572a7f1 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -666,6 +666,7 @@ struct sk_buff {
/* These two members must be first. */
struct sk_buff  *next;
struct sk_buff  *prev;
+   struct list_head

[PATCH net-next] net: gro: properly remove skb from list

2018-07-12 Thread Prashant Bhole
Following crash occurs in validate_xmit_skb_list() when same skb is
iterated multiple times in the loop and consume_skb() is called.

The root cause is calling list_del_init(>list) and not clearing
skb->next in d4546c2509b1. list_del_init(>list) sets skb->next
to point to skb itself. skb->next needs to be cleared because other
parts of network stack uses another kind of SKB lists.
validate_xmit_skb_list() uses such list.

A similar type of bugfix was reported by Jesper Dangaard Brouer.
https://patchwork.ozlabs.org/patch/942541/

This patch clears skb->next and changes list_del_init() to list_del()
so that list->prev will maintain the list poison.

[  148.185511] 
==
[  148.187865] BUG: KASAN: use-after-free in validate_xmit_skb_list+0x4b/0xa0
[  148.190158] Read of size 8 at addr 8801e52eefc0 by task swapper/1/0
[  148.192940]
[  148.193642] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.18.0-rc3+ #25
[  148.195423] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 04/01/2014
[  148.199129] Call Trace:
[  148.200565]  
[  148.201911]  dump_stack+0xc6/0x14c
[  148.203572]  ? dump_stack_print_info.cold.1+0x2f/0x2f
[  148.205083]  ? kmsg_dump_rewind_nolock+0x59/0x59
[  148.206307]  ? validate_xmit_skb+0x2c6/0x560
[  148.207432]  ? debug_show_held_locks+0x30/0x30
[  148.208571]  ? validate_xmit_skb_list+0x4b/0xa0
[  148.211144]  print_address_description+0x6c/0x23c
[  148.212601]  ? validate_xmit_skb_list+0x4b/0xa0
[  148.213782]  kasan_report.cold.6+0x241/0x2fd
[  148.214958]  validate_xmit_skb_list+0x4b/0xa0
[  148.216494]  sch_direct_xmit+0x1b0/0x680
[  148.217601]  ? dev_watchdog+0x4e0/0x4e0
[  148.218675]  ? do_raw_spin_trylock+0x10/0x120
[  148.219818]  ? do_raw_spin_lock+0xe0/0xe0
[  148.221032]  __dev_queue_xmit+0x1167/0x1810
[  148.222155]  ? sched_clock+0x5/0x10
[...]

[  148.474257] Allocated by task 0:
[  148.475363]  kasan_kmalloc+0xbf/0xe0
[  148.476503]  kmem_cache_alloc+0xb4/0x1b0
[  148.477654]  __build_skb+0x91/0x250
[  148.478677]  build_skb+0x67/0x180
[  148.479657]  e1000_clean_rx_irq+0x542/0x8a0
[  148.480757]  e1000_clean+0x652/0xd10
[  148.481772]  net_rx_action+0x4ea/0xc20
[  148.482808]  __do_softirq+0x1f9/0x574
[  148.483831]
[  148.484575] Freed by task 0:
[  148.485504]  __kasan_slab_free+0x12e/0x180
[  148.486589]  kmem_cache_free+0xb4/0x240
[  148.487634]  kfree_skbmem+0xed/0x150
[  148.488648]  consume_skb+0x146/0x250
[  148.489665]  validate_xmit_skb+0x2b7/0x560
[  148.490754]  validate_xmit_skb_list+0x70/0xa0
[  148.491897]  sch_direct_xmit+0x1b0/0x680
[  148.493949]  __dev_queue_xmit+0x1167/0x1810
[  148.495103]  br_dev_queue_push_xmit+0xce/0x250
[  148.496196]  br_forward_finish+0x276/0x280
[  148.497234]  __br_forward+0x44f/0x520
[  148.498260]  br_forward+0x19f/0x1b0
[  148.499264]  br_handle_frame_finish+0x65e/0x980
[  148.500398]  NF_HOOK.constprop.10+0x290/0x2a0
[  148.501522]  br_handle_frame+0x417/0x640
[  148.502582]  __netif_receive_skb_core+0xaac/0x18f0
[  148.503753]  __netif_receive_skb_one_core+0x98/0x120
[  148.504958]  netif_receive_skb_internal+0xe3/0x330
[  148.506154]  napi_gro_complete+0x190/0x2a0
[  148.507243]  dev_gro_receive+0x9f7/0x1100
[  148.508316]  napi_gro_receive+0xcb/0x260
[  148.509387]  e1000_clean_rx_irq+0x2fc/0x8a0
[  148.510501]  e1000_clean+0x652/0xd10
[  148.511523]  net_rx_action+0x4ea/0xc20
[  148.512566]  __do_softirq+0x1f9/0x574
[  148.513598]
[  148.514346] The buggy address belongs to the object at 8801e52eefc0
[  148.514346]  which belongs to the cache skbuff_head_cache of size 232
[  148.517047] The buggy address is located 0 bytes inside of
[  148.517047]  232-byte region [8801e52eefc0, 8801e52ef0a8)
[  148.519549] The buggy address belongs to the page:
[  148.520726] page:ea000794bb00 count:1 mapcount:0 
mapping:880106f4dfc0 index:0x8801e52ee840 compound_mapcount: 0
[  148.524325] flags: 0x17c0008100(slab|head)
[  148.525481] raw: 0017c0008100 880106b938d0 880106b938d0 
880106f4dfc0
[  148.527503] raw: 8801e52ee840 00190011 0001 

[  148.529547] page dumped because: kasan: bad access detected

Fixes: d4546c2509b1 ("net: Convert GRO SKB handling to list_head.")
Signed-off-by: Prashant Bhole 
Reported-by: Tyler Hicks 
---
 net/core/dev.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index d13cddcac41f..08c41941f912 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5169,7 +5169,8 @@ static void __napi_gro_flush_chain(struct napi_struct 
*napi, u32 index,
list_for_each_entry_safe_reverse(skb, p, head, list) {
if (flush_old && NAPI_GRO_CB(skb)->age == jiffies)
return;
-   list_del_init(>list);
+   list_del(>list);
+   skb->

[PATCH net-next] rtnetlink: Fix null-ptr-deref in rtnl_newlink

2018-06-01 Thread Prashant Bhole
In rtnl_newlink(), NULL check is performed on m_ops however member of
ops is accessed. Fixed by accessing member of m_ops instead of ops.

[  345.432629] BUG: KASAN: null-ptr-deref in rtnl_newlink+0x400/0x1110
[  345.432629] Read of size 4 at addr 0088 by task ip/986
[  345.432629]
[  345.432629] CPU: 1 PID: 986 Comm: ip Not tainted 4.17.0-rc6+ #9
[  345.432629] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.10.2-1ubuntu1 04/01/2014
[  345.432629] Call Trace:
[  345.432629]  dump_stack+0xc6/0x150
[  345.432629]  ? dump_stack_print_info.cold.0+0x1b/0x1b
[  345.432629]  ? kasan_report+0xb4/0x410
[  345.432629]  kasan_report.cold.4+0x8f/0x91
[  345.432629]  ? rtnl_newlink+0x400/0x1110
[  345.432629]  rtnl_newlink+0x400/0x1110
[...]

Fixes: ccf8dbcd062a ("rtnetlink: Remove VLA usage")
Signed-off-by: Prashant Bhole 
---
 net/core/rtnetlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 8ca49a0e13fb..9a1ba2015ad8 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2936,7 +2936,7 @@ static int rtnl_newlink(struct sk_buff *skb, struct 
nlmsghdr *nlh,
}
 
if (m_ops) {
-   if (ops->slave_maxtype > RTNL_SLAVE_MAX_TYPE)
+   if (m_ops->slave_maxtype > RTNL_SLAVE_MAX_TYPE)
return -EINVAL;
 
if (m_ops->slave_maxtype &&
-- 
2.17.0




[PATCH bpf-next v4 5/5] selftests/bpf: test_sockmap, print additional test options

2018-05-30 Thread Prashant Bhole
Print values of test options like apply, cork, start, end so that
individual failed tests can be identified for manual run

Acked-by: John Fastabend 
Signed-off-by: Prashant Bhole 
---
 tools/testing/selftests/bpf/test_sockmap.c | 28 +++---
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c 
b/tools/testing/selftests/bpf/test_sockmap.c
index 2bc70e1ab2eb..05c8cb71724a 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -876,6 +876,8 @@ static char *test_to_str(int test)
 #define OPTSTRING 60
 static void test_options(char *options)
 {
+   char tstr[OPTSTRING];
+
memset(options, 0, OPTSTRING);
 
if (txmsg_pass)
@@ -888,14 +890,22 @@ static void test_options(char *options)
strncat(options, "redir_noisy,", OPTSTRING);
if (txmsg_drop)
strncat(options, "drop,", OPTSTRING);
-   if (txmsg_apply)
-   strncat(options, "apply,", OPTSTRING);
-   if (txmsg_cork)
-   strncat(options, "cork,", OPTSTRING);
-   if (txmsg_start)
-   strncat(options, "start,", OPTSTRING);
-   if (txmsg_end)
-   strncat(options, "end,", OPTSTRING);
+   if (txmsg_apply) {
+   snprintf(tstr, OPTSTRING, "apply %d,", txmsg_apply);
+   strncat(options, tstr, OPTSTRING);
+   }
+   if (txmsg_cork) {
+   snprintf(tstr, OPTSTRING, "cork %d,", txmsg_cork);
+   strncat(options, tstr, OPTSTRING);
+   }
+   if (txmsg_start) {
+   snprintf(tstr, OPTSTRING, "start %d,", txmsg_start);
+   strncat(options, tstr, OPTSTRING);
+   }
+   if (txmsg_end) {
+   snprintf(tstr, OPTSTRING, "end %d,", txmsg_end);
+   strncat(options, tstr, OPTSTRING);
+   }
if (txmsg_ingress)
strncat(options, "ingress,", OPTSTRING);
if (txmsg_skb)
@@ -904,7 +914,7 @@ static void test_options(char *options)
 
 static int __test_exec(int cgrp, int test, struct sockmap_options *opt)
 {
-   char *options = calloc(60, sizeof(char));
+   char *options = calloc(OPTSTRING, sizeof(char));
int err;
 
if (test == SENDPAGE)
-- 
2.17.0




[PATCH bpf-next v4 4/5] selftests/bpf: test_sockmap, fix data verification

2018-05-30 Thread Prashant Bhole
When data verification is enabled, some tests fail because verification is done
incorrectly. Following changes fix it.

- Identify the size of data block to be verified
- Reset verification counter when data block size is reached
- Fixed the value printed in case of verfication failure

Fixes: 16962b2404ac ("bpf: sockmap, add selftests")
Acked-by: John Fastabend 
Signed-off-by: Prashant Bhole 
---
 tools/testing/selftests/bpf/test_sockmap.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c 
b/tools/testing/selftests/bpf/test_sockmap.c
index 5cd0550af595..2bc70e1ab2eb 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -337,8 +337,15 @@ static int msg_loop(int fd, int iov_count, int iov_length, 
int cnt,
int fd_flags = O_NONBLOCK;
struct timeval timeout;
float total_bytes;
+   int bytes_cnt = 0;
+   int chunk_sz;
fd_set w;
 
+   if (opt->sendpage)
+   chunk_sz = iov_length * cnt;
+   else
+   chunk_sz = iov_length * iov_count;
+
fcntl(fd, fd_flags);
total_bytes = (float)iov_count * (float)iov_length * (float)cnt;
err = clock_gettime(CLOCK_MONOTONIC, >start);
@@ -393,9 +400,14 @@ static int msg_loop(int fd, int iov_count, int iov_length, 
int cnt,
errno = -EIO;
fprintf(stderr,
"detected data 
corruption @iov[%i]:%i %02x != %02x, %02x ?= %02x\n",
-   i, j, d[j], k - 
1, d[j+1], k + 1);
+   i, j, d[j], k - 
1, d[j+1], k);
goto out_errno;
}
+   bytes_cnt++;
+   if (bytes_cnt == chunk_sz) {
+   k = 0;
+   bytes_cnt = 0;
+   }
recv--;
}
}
-- 
2.17.0




[PATCH bpf-next v4 1/5] selftests/bpf: test_sockmap, check test failure

2018-05-30 Thread Prashant Bhole
Test failures are not identified because exit code of RX/TX threads
is not checked. Also threads are not returning correct exit code.

- Return exit code from threads depending on test execution status
- In main thread, check the exit code of RX/TX threads
- Skip error checking for corked tests as they are expected to timeout

Fixes: 16962b2404ac ("bpf: sockmap, add selftests")
Signed-off-by: Prashant Bhole 
---
 tools/testing/selftests/bpf/test_sockmap.c | 27 +-
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c 
b/tools/testing/selftests/bpf/test_sockmap.c
index eb17fae458e6..7b2008a144cb 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -429,8 +429,8 @@ static int sendmsg_test(struct sockmap_options *opt)
struct msg_stats s = {0};
int iov_count = opt->iov_count;
int iov_buf = opt->iov_length;
+   int rx_status, tx_status;
int cnt = opt->rate;
-   int status;
 
errno = 0;
 
@@ -442,7 +442,7 @@ static int sendmsg_test(struct sockmap_options *opt)
rxpid = fork();
if (rxpid == 0) {
if (opt->drop_expected)
-   exit(1);
+   exit(0);
 
if (opt->sendpage)
iov_count = 1;
@@ -463,7 +463,9 @@ static int sendmsg_test(struct sockmap_options *opt)
"rx_sendmsg: TX: %zuB %fB/s %fGB/s RX: %zuB 
%fB/s %fGB/s\n",
s.bytes_sent, sent_Bps, sent_Bps/giga,
s.bytes_recvd, recvd_Bps, recvd_Bps/giga);
-   exit(1);
+   if (err && txmsg_cork)
+   err = 0;
+   exit(err ? 1 : 0);
} else if (rxpid == -1) {
perror("msg_loop_rx: ");
return errno;
@@ -491,14 +493,27 @@ static int sendmsg_test(struct sockmap_options *opt)
"tx_sendmsg: TX: %zuB %fB/s %f GB/s RX: %zuB 
%fB/s %fGB/s\n",
s.bytes_sent, sent_Bps, sent_Bps/giga,
s.bytes_recvd, recvd_Bps, recvd_Bps/giga);
-   exit(1);
+   exit(err ? 1 : 0);
} else if (txpid == -1) {
perror("msg_loop_tx: ");
return errno;
}
 
-   assert(waitpid(rxpid, , 0) == rxpid);
-   assert(waitpid(txpid, , 0) == txpid);
+   assert(waitpid(rxpid, _status, 0) == rxpid);
+   assert(waitpid(txpid, _status, 0) == txpid);
+   if (WIFEXITED(rx_status)) {
+   err = WEXITSTATUS(rx_status);
+   if (err) {
+   fprintf(stderr, "rx thread exited with err %d. ", err);
+   goto out;
+   }
+   }
+   if (WIFEXITED(tx_status)) {
+   err = WEXITSTATUS(tx_status);
+   if (err)
+   fprintf(stderr, "tx thread exited with err %d. ", err);
+   }
+out:
return err;
 }
 
-- 
2.17.0




[PATCH bpf-next v4 0/5] fix test_sockmap

2018-05-30 Thread Prashant Bhole
test_sockmap was originally written only to exercise kernel code
paths, so there was no strict checking of errors. When the code was
modified to run as selftests, due to lack of error handling it was not
able to detect test failures.

In order to improve, this series fixes error handling, test run time
and data verification.

Also slightly improved test output by printing parameter values (cork,
apply, start, end) so that parameters for all tests are displayed.

Changes in v4:
  - patch1: Ignore RX timoute error only for corked tests
  - patch3: Setting different timeout for corked tests and reduce
  run time by reducing number of iterations in some tests

Changes in v3:
  - Skipped error checking for corked tests

Prashant Bhole (5):
  selftests/bpf: test_sockmap, check test failure
  selftests/bpf: test_sockmap, join cgroup in selftest mode
  selftests/bpf: test_sockmap, timing improvements
  selftests/bpf: test_sockmap, fix data verification
  selftests/bpf: test_sockmap, print additional test options

 tools/testing/selftests/bpf/test_sockmap.c | 87 +-
 1 file changed, 67 insertions(+), 20 deletions(-)

-- 
2.17.0




[PATCH bpf-next v4 3/5] selftests/bpf: test_sockmap, timing improvements

2018-05-30 Thread Prashant Bhole
Currently 10us delay is too low for many tests to succeed. It needs to
be increased. Also, many corked tests are expected to hit rx timeout
irrespective of timeout value.

- This patch sets 1000usec timeout value for corked tests because less
than that causes broken-pipe error in tx thread. Also sets 1 second
timeout for all other tests because less than that results in RX
timeout
- tests with apply=1 and higher number of iterations were taking lot
of time. This patch reduces test run time by reducing iterations.

real0m12.968s
user0m0.219s
sys 0m14.337s

Fixes: a18fda1a62c3 ("bpf: reduce runtime of test_sockmap tests")
Signed-off-by: Prashant Bhole 
---
 tools/testing/selftests/bpf/test_sockmap.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c 
b/tools/testing/selftests/bpf/test_sockmap.c
index 7f9ca79aadbc..5cd0550af595 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -345,8 +345,13 @@ static int msg_loop(int fd, int iov_count, int iov_length, 
int cnt,
if (err < 0)
perror("recv start time: ");
while (s->bytes_recvd < total_bytes) {
-   timeout.tv_sec = 0;
-   timeout.tv_usec = 10;
+   if (txmsg_cork) {
+   timeout.tv_sec = 0;
+   timeout.tv_usec = 1000;
+   } else {
+   timeout.tv_sec = 1;
+   timeout.tv_usec = 0;
+   }
 
/* FD sets */
FD_ZERO();
@@ -1025,14 +1030,14 @@ static int test_send(struct sockmap_options *opt, int 
cgrp)
 
opt->iov_length = 1;
opt->iov_count = 1;
-   opt->rate = 1024;
+   opt->rate = 512;
err = test_exec(cgrp, opt);
if (err)
goto out;
 
opt->iov_length = 256;
opt->iov_count = 1024;
-   opt->rate = 10;
+   opt->rate = 2;
err = test_exec(cgrp, opt);
if (err)
goto out;
-- 
2.17.0




[PATCH bpf-next v4 2/5] selftests/bpf: test_sockmap, join cgroup in selftest mode

2018-05-30 Thread Prashant Bhole
In case of selftest mode, temporary cgroup environment is created but
cgroup is not joined. It causes test failures. Fixed by joining the
cgroup

Fixes: 16962b2404ac ("bpf: sockmap, add selftests")
Acked-by: John Fastabend 
Signed-off-by: Prashant Bhole 
---
 tools/testing/selftests/bpf/test_sockmap.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c 
b/tools/testing/selftests/bpf/test_sockmap.c
index 7b2008a144cb..7f9ca79aadbc 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -1344,6 +1344,11 @@ static int __test_suite(char *bpf_file)
return cg_fd;
}
 
+   if (join_cgroup(CG_PATH)) {
+   fprintf(stderr, "ERROR: failed to join cgroup\n");
+   return -EINVAL;
+   }
+
/* Tests basic commands and APIs with range of iov values */
txmsg_start = txmsg_end = 0;
err = test_txmsg(cg_fd);
-- 
2.17.0




Re: [PATCH bpf v3 3/5] selftests/bpf: test_sockmap, fix test timeout

2018-05-30 Thread Prashant Bhole




On 5/31/2018 4:59 AM, John Fastabend wrote:

On 05/30/2018 12:29 PM, Alexei Starovoitov wrote:

On Wed, May 30, 2018 at 02:56:09PM +0900, Prashant Bhole wrote:

In order to reduce runtime of tests, recently timout for select() call
was reduced from 1sec to 10usec. This was causing many tests failures.
It was caught with failure handling commits in this series.

Restoring the timeout from 10usec to 1sec

Fixes: a18fda1a62c3 ("bpf: reduce runtime of test_sockmap tests")
Signed-off-by: Prashant Bhole 
---
  tools/testing/selftests/bpf/test_sockmap.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c 
b/tools/testing/selftests/bpf/test_sockmap.c
index 64f9e25c451f..9d01f5c2abe2 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -345,8 +345,8 @@ static int msg_loop(int fd, int iov_count, int iov_length, 
int cnt,
if (err < 0)
perror("recv start time: ");
while (s->bytes_recvd < total_bytes) {
-   timeout.tv_sec = 0;
-   timeout.tv_usec = 10;
+   timeout.tv_sec = 1;
+   timeout.tv_usec = 0;


I've applied the set, but had to revert it, since it takes too long.

real1m40.124s
user0m0.375s
sys 0m14.521s



Dang, I thought it would be a bit longer but not minutes.


Myself and Daniel run the test semi-manually when we apply patches.> Adding 2 
extra minutes of wait time is unnecessary.


Yep.


Especially since most of it is idle time.
Please find a way to fix tests differently.
btw I don't see any failures today. Not sure what is being fixed
by incresing a timeout.



Calling these fixes is a bit much, they are primarily improvements.

The background is, when I originally wrote the tests my goal was to
exercise the kernel code paths. Because of this I didn't really care if
the tests actually sent/recv all bytes in the test. (I have long
running tests using netperf/wrk/apached/etc. for that) But, the manual
tests do have an option to verify the data if specified. The 'verify'
option is a bit fragile in that with the right tests (e.g. drop)
or the certain options (e.g. cork) it can fail which is expected.

What Prashant added was support to actually verify the data correctly.
And also fix a few cgroup handling and some pretty printing as well.
He noticed the low timeout causing issue in these cases though so
increased it.

@Prashant, how about increasing this less dramatically because now
all cork tests are going to stall for 1s unless perfectly aligned.
How about 100us? Or even better we can conditionally set it based
on if tx_cork is set. If tx_cork is set use 1us otherwise use 200us
or something. (1s is really to high in any cases for lo)

Also capturing some of the above in the cover letter would help
folks understand the context a bit better.



I did trial and error for timeout values. Currently 1000us for corked 
tests and 1 sec for other tests works fine. I observed broken-pipe error 
at tx side when timeout was < 1000us.


Also tests with apply=1 and higher number of iterations were taking 
time, so reducing iterations reduces the test run time drastically.


real0m12.968s
user0m0.219s
sys 0m14.337s

Also I will try to explain background in the cover letter of next series.

-Prashant




[PATCH bpf v3 3/5] selftests/bpf: test_sockmap, fix test timeout

2018-05-29 Thread Prashant Bhole
In order to reduce runtime of tests, recently timout for select() call
was reduced from 1sec to 10usec. This was causing many tests failures.
It was caught with failure handling commits in this series.

Restoring the timeout from 10usec to 1sec

Fixes: a18fda1a62c3 ("bpf: reduce runtime of test_sockmap tests")
Signed-off-by: Prashant Bhole 
---
 tools/testing/selftests/bpf/test_sockmap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c 
b/tools/testing/selftests/bpf/test_sockmap.c
index 64f9e25c451f..9d01f5c2abe2 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -345,8 +345,8 @@ static int msg_loop(int fd, int iov_count, int iov_length, 
int cnt,
if (err < 0)
perror("recv start time: ");
while (s->bytes_recvd < total_bytes) {
-   timeout.tv_sec = 0;
-   timeout.tv_usec = 10;
+   timeout.tv_sec = 1;
+   timeout.tv_usec = 0;
 
/* FD sets */
FD_ZERO();
-- 
2.17.0




[PATCH bpf v3 4/5] selftests/bpf: test_sockmap, fix data verification

2018-05-29 Thread Prashant Bhole
When data verification is enabled, some tests fail because verification is done
incorrectly. Following changes fix it.

- Identify the size of data block to be verified
- Reset verification counter when data block size is reached
- Fixed the value printed in case of verfication failure

Fixes: 16962b2404ac ("bpf: sockmap, add selftests")
Acked-by: John Fastabend 
Signed-off-by: Prashant Bhole 
---
 tools/testing/selftests/bpf/test_sockmap.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c 
b/tools/testing/selftests/bpf/test_sockmap.c
index 9d01f5c2abe2..664f268dc02a 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -337,8 +337,15 @@ static int msg_loop(int fd, int iov_count, int iov_length, 
int cnt,
int fd_flags = O_NONBLOCK;
struct timeval timeout;
float total_bytes;
+   int bytes_cnt = 0;
+   int chunk_sz;
fd_set w;
 
+   if (opt->sendpage)
+   chunk_sz = iov_length * cnt;
+   else
+   chunk_sz = iov_length * iov_count;
+
fcntl(fd, fd_flags);
total_bytes = (float)iov_count * (float)iov_length * (float)cnt;
err = clock_gettime(CLOCK_MONOTONIC, >start);
@@ -388,9 +395,14 @@ static int msg_loop(int fd, int iov_count, int iov_length, 
int cnt,
errno = -EIO;
fprintf(stderr,
"detected data 
corruption @iov[%i]:%i %02x != %02x, %02x ?= %02x\n",
-   i, j, d[j], k - 
1, d[j+1], k + 1);
+   i, j, d[j], k - 
1, d[j+1], k);
goto out_errno;
}
+   bytes_cnt++;
+   if (bytes_cnt == chunk_sz) {
+   k = 0;
+   bytes_cnt = 0;
+   }
recv--;
}
}
-- 
2.17.0




[PATCH bpf v3 5/5] selftests/bpf: test_sockmap, print additional test options

2018-05-29 Thread Prashant Bhole
Print values of test options like apply, cork, start, end so that
individual failed tests can be identified for manual run

Acked-by: John Fastabend 
Signed-off-by: Prashant Bhole 
---
 tools/testing/selftests/bpf/test_sockmap.c | 28 +++---
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c 
b/tools/testing/selftests/bpf/test_sockmap.c
index 664f268dc02a..637c6585ff80 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -869,6 +869,8 @@ static char *test_to_str(int test)
 #define OPTSTRING 60
 static void test_options(char *options)
 {
+   char tstr[OPTSTRING];
+
memset(options, 0, OPTSTRING);
 
if (txmsg_pass)
@@ -881,14 +883,22 @@ static void test_options(char *options)
strncat(options, "redir_noisy,", OPTSTRING);
if (txmsg_drop)
strncat(options, "drop,", OPTSTRING);
-   if (txmsg_apply)
-   strncat(options, "apply,", OPTSTRING);
-   if (txmsg_cork)
-   strncat(options, "cork,", OPTSTRING);
-   if (txmsg_start)
-   strncat(options, "start,", OPTSTRING);
-   if (txmsg_end)
-   strncat(options, "end,", OPTSTRING);
+   if (txmsg_apply) {
+   snprintf(tstr, OPTSTRING, "apply %d,", txmsg_apply);
+   strncat(options, tstr, OPTSTRING);
+   }
+   if (txmsg_cork) {
+   snprintf(tstr, OPTSTRING, "cork %d,", txmsg_cork);
+   strncat(options, tstr, OPTSTRING);
+   }
+   if (txmsg_start) {
+   snprintf(tstr, OPTSTRING, "start %d,", txmsg_start);
+   strncat(options, tstr, OPTSTRING);
+   }
+   if (txmsg_end) {
+   snprintf(tstr, OPTSTRING, "end %d,", txmsg_end);
+   strncat(options, tstr, OPTSTRING);
+   }
if (txmsg_ingress)
strncat(options, "ingress,", OPTSTRING);
if (txmsg_skb)
@@ -897,7 +907,7 @@ static void test_options(char *options)
 
 static int __test_exec(int cgrp, int test, struct sockmap_options *opt)
 {
-   char *options = calloc(60, sizeof(char));
+   char *options = calloc(OPTSTRING, sizeof(char));
int err;
 
if (test == SENDPAGE)
-- 
2.17.0




[PATCH bpf v3 2/5] selftests/bpf: test_sockmap, join cgroup in selftest mode

2018-05-29 Thread Prashant Bhole
In case of selftest mode, temporary cgroup environment is created but
cgroup is not joined. It causes test failures. Fixed by joining the
cgroup

Fixes: 16962b2404ac ("bpf: sockmap, add selftests")
Acked-by: John Fastabend 
Signed-off-by: Prashant Bhole 
---
 tools/testing/selftests/bpf/test_sockmap.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c 
b/tools/testing/selftests/bpf/test_sockmap.c
index 01bc9c6745e8..64f9e25c451f 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -1342,6 +1342,11 @@ static int __test_suite(char *bpf_file)
return cg_fd;
}
 
+   if (join_cgroup(CG_PATH)) {
+   fprintf(stderr, "ERROR: failed to join cgroup\n");
+   return -EINVAL;
+   }
+
/* Tests basic commands and APIs with range of iov values */
txmsg_start = txmsg_end = 0;
err = test_txmsg(cg_fd);
-- 
2.17.0




[PATCH bpf v3 0/5] fix test_sockmap

2018-05-29 Thread Prashant Bhole
This series fixes error handling, timeout and data verification in
test_sockmap. Previously it was not able to detect failure/timeout in
RX/TX thread because error was not notified to the main thread.

Also slightly improved test output by printing parameter values (cork,
apply, start, end) so that parameters for all tests are displayed.

Changes in v3:
  - Skipped error checking for corked tests

Prashant Bhole (5):
  selftests/bpf: test_sockmap, check test failure
  selftests/bpf: test_sockmap, join cgroup in selftest mode
  selftests/bpf: test_sockmap, fix test timeout
  selftests/bpf: test_sockmap, fix data verification
  selftests/bpf: test_sockmap, print additional test options

 tools/testing/selftests/bpf/test_sockmap.c | 76 +-
 1 file changed, 58 insertions(+), 18 deletions(-)

-- 
2.17.0




[PATCH bpf v3 1/5] selftests/bpf: test_sockmap, check test failure

2018-05-29 Thread Prashant Bhole
Test failures are not identified because exit code of RX/TX threads
is not checked. Also threads are not returning correct exit code.

- Return exit code from threads depending on test execution status
- In main thread, check the exit code of RX/TX threads
- Skip error checking for corked tests as they are expected to timeout

Fixes: 16962b2404ac ("bpf: sockmap, add selftests")
Signed-off-by: Prashant Bhole 
---
 tools/testing/selftests/bpf/test_sockmap.c | 25 --
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c 
b/tools/testing/selftests/bpf/test_sockmap.c
index eb17fae458e6..01bc9c6745e8 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -429,8 +429,8 @@ static int sendmsg_test(struct sockmap_options *opt)
struct msg_stats s = {0};
int iov_count = opt->iov_count;
int iov_buf = opt->iov_length;
+   int rx_status, tx_status;
int cnt = opt->rate;
-   int status;
 
errno = 0;
 
@@ -442,7 +442,7 @@ static int sendmsg_test(struct sockmap_options *opt)
rxpid = fork();
if (rxpid == 0) {
if (opt->drop_expected)
-   exit(1);
+   exit(0);
 
if (opt->sendpage)
iov_count = 1;
@@ -463,7 +463,7 @@ static int sendmsg_test(struct sockmap_options *opt)
"rx_sendmsg: TX: %zuB %fB/s %fGB/s RX: %zuB 
%fB/s %fGB/s\n",
s.bytes_sent, sent_Bps, sent_Bps/giga,
s.bytes_recvd, recvd_Bps, recvd_Bps/giga);
-   exit(1);
+   exit(err ? 1 : 0);
} else if (rxpid == -1) {
perror("msg_loop_rx: ");
return errno;
@@ -491,14 +491,27 @@ static int sendmsg_test(struct sockmap_options *opt)
"tx_sendmsg: TX: %zuB %fB/s %f GB/s RX: %zuB 
%fB/s %fGB/s\n",
s.bytes_sent, sent_Bps, sent_Bps/giga,
s.bytes_recvd, recvd_Bps, recvd_Bps/giga);
-   exit(1);
+   exit(err ? 1 : 0);
} else if (txpid == -1) {
perror("msg_loop_tx: ");
return errno;
}
 
-   assert(waitpid(rxpid, , 0) == rxpid);
-   assert(waitpid(txpid, , 0) == txpid);
+   assert(waitpid(rxpid, _status, 0) == rxpid);
+   assert(waitpid(txpid, _status, 0) == txpid);
+   if (WIFEXITED(rx_status)) {
+   err = WEXITSTATUS(rx_status);
+   if (err && !txmsg_cork) {
+   fprintf(stderr, "rx thread exited with err %d. ", err);
+   goto out;
+   }
+   }
+   if (WIFEXITED(tx_status)) {
+   err = WEXITSTATUS(tx_status);
+   if (err)
+   fprintf(stderr, "tx thread exited with err %d. ", err);
+   }
+out:
return err;
 }
 
-- 
2.17.0




Re: [PATCH bpf v2 0/5] fix test_sockmap

2018-05-29 Thread Prashant Bhole




On 5/30/2018 2:12 PM, John Fastabend wrote:

On 05/29/2018 05:44 PM, Prashant Bhole wrote:



On 5/30/2018 12:48 AM, John Fastabend wrote:

On 05/27/2018 09:37 PM, Prashant Bhole wrote:

This series fixes error handling, timeout and data verification in
test_sockmap. Previously it was not able to detect failure/timeout in
RX/TX thread because error was not notified to the main thread.

Also slightly improved test output by printing parameter values (cork,
apply, start, end) so that parameters for all tests are displayed.

Prashant Bhole (5):
    selftests/bpf: test_sockmap, check test failure
    selftests/bpf: test_sockmap, join cgroup in selftest mode
    selftests/bpf: test_sockmap, fix test timeout
    selftests/bpf: test_sockmap, fix data verification
    selftests/bpf: test_sockmap, print additional test options

   tools/testing/selftests/bpf/test_sockmap.c | 76 +-
   1 file changed, 58 insertions(+), 18 deletions(-)



After first patch "check test failure" how do we handle the case
where test is known to cause timeouts because we are specifically testing
these cases. This is the 'cork' parameter we discussed in the last
series. It looks like with this series the test may still throw an
error?


Sorry. In your comment in last series, did you mean to skip error
checking only for all cork tests (for now)?

-Prashant



Hi, After this is applied are any errors returned from test_sockmap?
When I read the first patch it looked like timeouts from the cork
tests may result in errors "FAILED" tests. If this is the case then
yes we need skip error checking on all tests or just the corked
tests.


Yes errors returned after applying this series. I will skip error 
checking on just corked tests.


-Prashant



Re: [PATCH bpf v2 0/5] fix test_sockmap

2018-05-29 Thread Prashant Bhole




On 5/30/2018 12:48 AM, John Fastabend wrote:

On 05/27/2018 09:37 PM, Prashant Bhole wrote:

This series fixes error handling, timeout and data verification in
test_sockmap. Previously it was not able to detect failure/timeout in
RX/TX thread because error was not notified to the main thread.

Also slightly improved test output by printing parameter values (cork,
apply, start, end) so that parameters for all tests are displayed.

Prashant Bhole (5):
   selftests/bpf: test_sockmap, check test failure
   selftests/bpf: test_sockmap, join cgroup in selftest mode
   selftests/bpf: test_sockmap, fix test timeout
   selftests/bpf: test_sockmap, fix data verification
   selftests/bpf: test_sockmap, print additional test options

  tools/testing/selftests/bpf/test_sockmap.c | 76 +-
  1 file changed, 58 insertions(+), 18 deletions(-)



After first patch "check test failure" how do we handle the case
where test is known to cause timeouts because we are specifically testing
these cases. This is the 'cork' parameter we discussed in the last
series. It looks like with this series the test may still throw an
error?


Sorry. In your comment in last series, did you mean to skip error 
checking only for all cork tests (for now)?


-Prashant



[PATCH net-next] netfilter: fix null-ptr-deref in nf_nat_decode_session

2018-05-28 Thread Prashant Bhole
Add null check for nat_hook in nf_nat_decode_session()

[  195.648098] UBSAN: Undefined behaviour in ./include/linux/netfilter.h:348:14
[  195.651366] BUG: KASAN: null-ptr-deref in __xfrm_policy_check+0x208/0x1d70
[  195.653888] member access within null pointer of type 'struct nf_nat_hook'
[  195.653896] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.17.0-rc6+ #5
[  195.656320] Read of size 8 at addr 0008 by task ping/2469
[  195.658715] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.10.2-1ubuntu1 04/01/2014
[  195.658721] Call Trace:
[  195.661087]
[  195.669341]  
[  195.670574]  dump_stack+0xc6/0x150
[  195.672156]  ? dump_stack_print_info.cold.0+0x1b/0x1b
[  195.674121]  ? ubsan_prologue+0x31/0x92
[  195.676546]  ubsan_epilogue+0x9/0x49
[  195.678159]  handle_null_ptr_deref+0x11a/0x130
[  195.679800]  ? sprint_OID+0x1a0/0x1a0
[  195.681322]  __ubsan_handle_type_mismatch_v1+0xd5/0x11d
[  195.683146]  ? ubsan_prologue+0x92/0x92
[  195.684642]  __xfrm_policy_check+0x18ef/0x1d70
[  195.686294]  ? rt_cache_valid+0x118/0x180
[  195.687804]  ? __xfrm_route_forward+0x410/0x410
[  195.689463]  ? fib_multipath_hash+0x700/0x700
[  195.691109]  ? kvm_sched_clock_read+0x23/0x40
[  195.692805]  ? pvclock_clocksource_read+0xf6/0x280
[  195.694409]  ? graph_lock+0xa0/0xa0
[  195.695824]  ? pvclock_clocksource_read+0xf6/0x280
[  195.697508]  ? pvclock_read_flags+0x80/0x80
[  195.698981]  ? kvm_sched_clock_read+0x23/0x40
[  195.700347]  ? sched_clock+0x5/0x10
[  195.701525]  ? sched_clock_cpu+0x18/0x1a0
[  195.702846]  tcp_v4_rcv+0x1d32/0x1de0
[  195.704115]  ? lock_repin_lock+0x70/0x270
[  195.707072]  ? pvclock_read_flags+0x80/0x80
[  195.709302]  ? tcp_v4_early_demux+0x4b0/0x4b0
[  195.711833]  ? lock_acquire+0x195/0x380
[  195.714222]  ? ip_local_deliver_finish+0xfc/0x770
[  195.716967]  ? raw_rcv+0x2b0/0x2b0
[  195.718856]  ? lock_release+0xa00/0xa00
[  195.720938]  ip_local_deliver_finish+0x1b9/0x770
[...]

Fixes: 2c205dd3981f ("netfilter: add struct nf_nat_hook and use it")
Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp>
---
 include/linux/netfilter.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index 04551af2ff23..dd2052f0efb7 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -345,7 +345,7 @@ nf_nat_decode_session(struct sk_buff *skb, struct flowi 
*fl, u_int8_t family)
 
rcu_read_lock();
nat_hook = rcu_dereference(nf_nat_hook);
-   if (nat_hook->decode_session)
+   if (nat_hook && nat_hook->decode_session)
nat_hook->decode_session(skb, fl);
rcu_read_unlock();
 #endif
-- 
2.17.0




[PATCH bpf v2 5/5] selftests/bpf: test_sockmap, print additional test options

2018-05-27 Thread Prashant Bhole
Print values of test options like apply, cork, start, end so that
individual failed tests can be identified for manual run

Acked-by: John Fastabend <john.fastab...@gmail.com>
Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp>
---
 tools/testing/selftests/bpf/test_sockmap.c | 28 +++---
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c 
b/tools/testing/selftests/bpf/test_sockmap.c
index e8faf4596dac..7970d48c4acb 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -869,6 +869,8 @@ static char *test_to_str(int test)
 #define OPTSTRING 60
 static void test_options(char *options)
 {
+   char tstr[OPTSTRING];
+
memset(options, 0, OPTSTRING);
 
if (txmsg_pass)
@@ -881,14 +883,22 @@ static void test_options(char *options)
strncat(options, "redir_noisy,", OPTSTRING);
if (txmsg_drop)
strncat(options, "drop,", OPTSTRING);
-   if (txmsg_apply)
-   strncat(options, "apply,", OPTSTRING);
-   if (txmsg_cork)
-   strncat(options, "cork,", OPTSTRING);
-   if (txmsg_start)
-   strncat(options, "start,", OPTSTRING);
-   if (txmsg_end)
-   strncat(options, "end,", OPTSTRING);
+   if (txmsg_apply) {
+   snprintf(tstr, OPTSTRING, "apply %d,", txmsg_apply);
+   strncat(options, tstr, OPTSTRING);
+   }
+   if (txmsg_cork) {
+   snprintf(tstr, OPTSTRING, "cork %d,", txmsg_cork);
+   strncat(options, tstr, OPTSTRING);
+   }
+   if (txmsg_start) {
+   snprintf(tstr, OPTSTRING, "start %d,", txmsg_start);
+   strncat(options, tstr, OPTSTRING);
+   }
+   if (txmsg_end) {
+   snprintf(tstr, OPTSTRING, "end %d,", txmsg_end);
+   strncat(options, tstr, OPTSTRING);
+   }
if (txmsg_ingress)
strncat(options, "ingress,", OPTSTRING);
if (txmsg_skb)
@@ -897,7 +907,7 @@ static void test_options(char *options)
 
 static int __test_exec(int cgrp, int test, struct sockmap_options *opt)
 {
-   char *options = calloc(60, sizeof(char));
+   char *options = calloc(OPTSTRING, sizeof(char));
int err;
 
if (test == SENDPAGE)
-- 
2.17.0




[PATCH bpf v2 4/5] selftests/bpf: test_sockmap, fix data verification

2018-05-27 Thread Prashant Bhole
When data verification is enabled, some tests fail because verification is done
incorrectly. Following changes fix it.

- Identify the size of data block to be verified
- Reset verification counter when data block size is reached
- Fixed the value printed in case of verfication failure

Fixes: 16962b2404ac ("bpf: sockmap, add selftests")
Acked-by: John Fastabend <john.fastab...@gmail.com>
Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp>
---
 tools/testing/selftests/bpf/test_sockmap.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c 
b/tools/testing/selftests/bpf/test_sockmap.c
index f79397513362..e8faf4596dac 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -337,8 +337,15 @@ static int msg_loop(int fd, int iov_count, int iov_length, 
int cnt,
int fd_flags = O_NONBLOCK;
struct timeval timeout;
float total_bytes;
+   int bytes_cnt = 0;
+   int chunk_sz;
fd_set w;
 
+   if (opt->sendpage)
+   chunk_sz = iov_length * cnt;
+   else
+   chunk_sz = iov_length * iov_count;
+
fcntl(fd, fd_flags);
total_bytes = (float)iov_count * (float)iov_length * (float)cnt;
err = clock_gettime(CLOCK_MONOTONIC, >start);
@@ -388,9 +395,14 @@ static int msg_loop(int fd, int iov_count, int iov_length, 
int cnt,
errno = -EIO;
fprintf(stderr,
"detected data 
corruption @iov[%i]:%i %02x != %02x, %02x ?= %02x\n",
-   i, j, d[j], k - 
1, d[j+1], k + 1);
+   i, j, d[j], k - 
1, d[j+1], k);
goto out_errno;
}
+   bytes_cnt++;
+   if (bytes_cnt == chunk_sz) {
+   k = 0;
+   bytes_cnt = 0;
+   }
recv--;
}
}
-- 
2.17.0




[PATCH bpf v2 3/5] selftests/bpf: test_sockmap, fix test timeout

2018-05-27 Thread Prashant Bhole
In order to reduce runtime of tests, recently timout for select() call
was reduced from 1sec to 10usec. This was causing many tests failures.
It was caught with failure handling commits in this series.

Restoring the timeout from 10usec to 1sec

Fixes: a18fda1a62c3 ("bpf: reduce runtime of test_sockmap tests")
Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp>
---
 tools/testing/selftests/bpf/test_sockmap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c 
b/tools/testing/selftests/bpf/test_sockmap.c
index 8a81ea0e9fb6..f79397513362 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -345,8 +345,8 @@ static int msg_loop(int fd, int iov_count, int iov_length, 
int cnt,
if (err < 0)
perror("recv start time: ");
while (s->bytes_recvd < total_bytes) {
-   timeout.tv_sec = 0;
-   timeout.tv_usec = 10;
+   timeout.tv_sec = 1;
+   timeout.tv_usec = 0;
 
/* FD sets */
FD_ZERO();
-- 
2.17.0




[PATCH bpf v2 2/5] selftests/bpf: test_sockmap, join cgroup in selftest mode

2018-05-27 Thread Prashant Bhole
In case of selftest mode, temporary cgroup environment is created but
cgroup is not joined. It causes test failures. Fixed by joining the
cgroup

Fixes: 16962b2404ac ("bpf: sockmap, add selftests")
Acked-by: John Fastabend <john.fastab...@gmail.com>
Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp>
---
 tools/testing/selftests/bpf/test_sockmap.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c 
b/tools/testing/selftests/bpf/test_sockmap.c
index 34feb74c95c4..8a81ea0e9fb6 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -1342,6 +1342,11 @@ static int __test_suite(char *bpf_file)
return cg_fd;
}
 
+   if (join_cgroup(CG_PATH)) {
+   fprintf(stderr, "ERROR: failed to join cgroup\n");
+   return -EINVAL;
+   }
+
/* Tests basic commands and APIs with range of iov values */
txmsg_start = txmsg_end = 0;
err = test_txmsg(cg_fd);
-- 
2.17.0




[PATCH bpf v2 1/5] selftests/bpf: test_sockmap, check test failure

2018-05-27 Thread Prashant Bhole
Test failures are not identified because exit code of RX/TX threads
is not checked. Also threads are not returning correct exit code.

- Return exit code from threads depending on test execution status
- In main thread, check the exit code of RX/TX threads

Fixes: 16962b2404ac ("bpf: sockmap, add selftests")
Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp>
---
 tools/testing/selftests/bpf/test_sockmap.c | 25 --
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c 
b/tools/testing/selftests/bpf/test_sockmap.c
index eb17fae458e6..34feb74c95c4 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -429,8 +429,8 @@ static int sendmsg_test(struct sockmap_options *opt)
struct msg_stats s = {0};
int iov_count = opt->iov_count;
int iov_buf = opt->iov_length;
+   int rx_status, tx_status;
int cnt = opt->rate;
-   int status;
 
errno = 0;
 
@@ -442,7 +442,7 @@ static int sendmsg_test(struct sockmap_options *opt)
rxpid = fork();
if (rxpid == 0) {
if (opt->drop_expected)
-   exit(1);
+   exit(0);
 
if (opt->sendpage)
iov_count = 1;
@@ -463,7 +463,7 @@ static int sendmsg_test(struct sockmap_options *opt)
"rx_sendmsg: TX: %zuB %fB/s %fGB/s RX: %zuB 
%fB/s %fGB/s\n",
s.bytes_sent, sent_Bps, sent_Bps/giga,
s.bytes_recvd, recvd_Bps, recvd_Bps/giga);
-   exit(1);
+   exit(err ? 1 : 0);
} else if (rxpid == -1) {
perror("msg_loop_rx: ");
return errno;
@@ -491,14 +491,27 @@ static int sendmsg_test(struct sockmap_options *opt)
"tx_sendmsg: TX: %zuB %fB/s %f GB/s RX: %zuB 
%fB/s %fGB/s\n",
s.bytes_sent, sent_Bps, sent_Bps/giga,
s.bytes_recvd, recvd_Bps, recvd_Bps/giga);
-   exit(1);
+   exit(err ? 1 : 0);
} else if (txpid == -1) {
perror("msg_loop_tx: ");
return errno;
}
 
-   assert(waitpid(rxpid, , 0) == rxpid);
-   assert(waitpid(txpid, , 0) == txpid);
+   assert(waitpid(rxpid, _status, 0) == rxpid);
+   assert(waitpid(txpid, _status, 0) == txpid);
+   if (WIFEXITED(rx_status)) {
+   err = WEXITSTATUS(rx_status);
+   if (err) {
+   fprintf(stderr, "rx thread exited with err %d. ", err);
+   goto out;
+   }
+   }
+   if (WIFEXITED(tx_status)) {
+   err = WEXITSTATUS(tx_status);
+   if (err)
+   fprintf(stderr, "tx thread exited with err %d. ", err);
+   }
+out:
return err;
 }
 
-- 
2.17.0




[PATCH bpf v2 0/5] fix test_sockmap

2018-05-27 Thread Prashant Bhole
This series fixes error handling, timeout and data verification in
test_sockmap. Previously it was not able to detect failure/timeout in
RX/TX thread because error was not notified to the main thread.

Also slightly improved test output by printing parameter values (cork,
apply, start, end) so that parameters for all tests are displayed.

Prashant Bhole (5):
  selftests/bpf: test_sockmap, check test failure
  selftests/bpf: test_sockmap, join cgroup in selftest mode
  selftests/bpf: test_sockmap, fix test timeout
  selftests/bpf: test_sockmap, fix data verification
  selftests/bpf: test_sockmap, print additional test options

 tools/testing/selftests/bpf/test_sockmap.c | 76 +-
 1 file changed, 58 insertions(+), 18 deletions(-)

-- 
2.17.0




Re: [PATCH bpf-next 0/5] fix test_sockmap

2018-05-27 Thread Prashant Bhole

On 5/25/2018 11:01 PM, John Fastabend wrote:

On 05/25/2018 01:28 AM, Prashant Bhole wrote:



On 5/24/2018 1:58 PM, John Fastabend wrote:

On 05/23/2018 09:47 PM, Prashant Bhole wrote:



On 5/23/2018 6:44 PM, Prashant Bhole wrote:



On 5/22/2018 2:08 AM, John Fastabend wrote:

On 05/20/2018 10:13 PM, Prashant Bhole wrote:



On 5/19/2018 1:42 AM, John Fastabend wrote:

On 05/18/2018 12:17 AM, Prashant Bhole wrote:

This series fixes bugs in test_sockmap code. They weren't caught
previously because failure in RX/TX thread was not notified to the
main thread.

Also fixed data verification logic and slightly improved test
output
such that parameters values (cork, apply, start, end) of failed
test
can be easily seen.



Great, this was on my list so thanks for taking care of it.


Note: Even after fixing above problems there are issues with tests
which set cork parameter. Tests fail (RX thread timeout) when cork
value is non-zero and overall data sent by TX thread isn't
multiples
of cork value.



This is expected. When 'cork' is set the sender should only xmit
the data when 'cork' bytes are available. If the user doesn't
provide the N bytes the data is cork'ed waiting for the bytes and
if the socket is closed the state is cleaned up. What these tests
are testing is the cleanup path when a user doesn't provide the
N bytes. In practice this is used to validate headers and prevent
users from sending partial headers. We want to keep these tests
because
they verify a tear-down path in the code.


Ok.



After your changes do these get reported as failures? If so we
need to account for the above in the calculations.


Yes, cork related test are reported as failures because of RX thread
timeout.

So with your above description, I think we need to differentiate cork
tests with partial data and full data. In partial data test we can
have
something like "timeout_expected" flag. Any other way to fix it?



Adding a flag seems reasonable to me. Lets do this for now. Also I
plan to add more negative tests so we can either use the same
flag or a new one for those cases as well.



John,
I worked on this for some time and noticed that the RX-timeout of
tests with cork parameter is dependent on various parameters. So we
can not set a flag like the way 'drop_expected' flag is set before
executing the test.

So I decided to write a function which judges all parameters before
each test and decides whether a test with cork parameter will
timeout or not. Then the conditions in the function became
complicated. For example some tests fail if opt->rate < 17 (with
some other conditions). Here is 17 is related to FRAGS_PER_SKB.
Consider following two examples.

I'm sorry. Correction: s/FRAGS_PER_SKB/MAX_SKB_FRAGS/



./test_sockmap --cgroup /mnt/cgroup2 -r 16 -i 1 -l 30 -t sendpage
--txmsg --txmsg_cork 1024   # RX timeout occurs

./test_sockmap --cgroup /mnt/cgroup2 -r 17 -i 1 -l 30 -t sendpage
--txmsg --txmsg_cork 1024   # Success!



Ah yes this hits the buffer limit and flushes the queue. The kernel
side doesn't know how to merge those specific sendpage requests so
it gives each request its own buffer and when the limit is reached
we flush it.


Do we need to keep such tests? if yes, then I will continue with
adding such conditions in the function.



Yes, these tests are needed because they are testing the edge cases.
These are probably the most important tests because my normal usage
will catch any issues in the "good" cases its these types of things
that can go unnoticed (at least for a short while) if we don't have
specific tests for them.


I tried but it is difficult to come up with a right set of conditions
which lead to test failure.



Agreed, it can be yes. How about adding your logic for all tests except
"cork" cases. If there is a flag to set if the timeout is expected we
can always manually set it in the test invocation. Might not be as
nice as automatically learning the expected results but possibly easier
than building some complicated logic to figure it out.

Would you mind submitting your series again without the "cork" tests
being tracked? And if you want add a bit to tell if the "cork" tests are
going to timeout or not setting it per test manually. But I think
your series can just omit the cork test for now and still be useful.


Ok. I will submit the series again. Without any change in actual 
patches, but cover letter reorganized. Thanks.


-Prashant



Re: [PATCH bpf-next 0/5] fix test_sockmap

2018-05-25 Thread Prashant Bhole



On 5/24/2018 1:58 PM, John Fastabend wrote:

On 05/23/2018 09:47 PM, Prashant Bhole wrote:



On 5/23/2018 6:44 PM, Prashant Bhole wrote:



On 5/22/2018 2:08 AM, John Fastabend wrote:

On 05/20/2018 10:13 PM, Prashant Bhole wrote:



On 5/19/2018 1:42 AM, John Fastabend wrote:

On 05/18/2018 12:17 AM, Prashant Bhole wrote:

This series fixes bugs in test_sockmap code. They weren't caught
previously because failure in RX/TX thread was not notified to the
main thread.

Also fixed data verification logic and slightly improved test output
such that parameters values (cork, apply, start, end) of failed test
can be easily seen.



Great, this was on my list so thanks for taking care of it.


Note: Even after fixing above problems there are issues with tests
which set cork parameter. Tests fail (RX thread timeout) when cork
value is non-zero and overall data sent by TX thread isn't multiples
of cork value.



This is expected. When 'cork' is set the sender should only xmit
the data when 'cork' bytes are available. If the user doesn't
provide the N bytes the data is cork'ed waiting for the bytes and
if the socket is closed the state is cleaned up. What these tests
are testing is the cleanup path when a user doesn't provide the
N bytes. In practice this is used to validate headers and prevent
users from sending partial headers. We want to keep these tests because
they verify a tear-down path in the code.


Ok.



After your changes do these get reported as failures? If so we
need to account for the above in the calculations.


Yes, cork related test are reported as failures because of RX thread
timeout.

So with your above description, I think we need to differentiate cork
tests with partial data and full data. In partial data test we can have
something like "timeout_expected" flag. Any other way to fix it?



Adding a flag seems reasonable to me. Lets do this for now. Also I
plan to add more negative tests so we can either use the same
flag or a new one for those cases as well.



John,
I worked on this for some time and noticed that the RX-timeout of
tests with cork parameter is dependent on various parameters. So we
can not set a flag like the way 'drop_expected' flag is set before
executing the test.

So I decided to write a function which judges all parameters before
each test and decides whether a test with cork parameter will
timeout or not. Then the conditions in the function became
complicated. For example some tests fail if opt->rate < 17 (with
some other conditions). Here is 17 is related to FRAGS_PER_SKB.
Consider following two examples.

I'm sorry. Correction: s/FRAGS_PER_SKB/MAX_SKB_FRAGS/



./test_sockmap --cgroup /mnt/cgroup2 -r 16 -i 1 -l 30 -t sendpage
--txmsg --txmsg_cork 1024   # RX timeout occurs

./test_sockmap --cgroup /mnt/cgroup2 -r 17 -i 1 -l 30 -t sendpage
--txmsg --txmsg_cork 1024   # Success!



Ah yes this hits the buffer limit and flushes the queue. The kernel
side doesn't know how to merge those specific sendpage requests so
it gives each request its own buffer and when the limit is reached
we flush it.


Do we need to keep such tests? if yes, then I will continue with
adding such conditions in the function.



Yes, these tests are needed because they are testing the edge cases.
These are probably the most important tests because my normal usage
will catch any issues in the "good" cases its these types of things
that can go unnoticed (at least for a short while) if we don't have
specific tests for them.


I tried but it is difficult to come up with a right set of conditions 
which lead to test failure.


-Prashant


Thanks for doing this.
John





Re: [PATCH bpf-next 0/5] fix test_sockmap

2018-05-23 Thread Prashant Bhole



On 5/23/2018 6:44 PM, Prashant Bhole wrote:



On 5/22/2018 2:08 AM, John Fastabend wrote:

On 05/20/2018 10:13 PM, Prashant Bhole wrote:



On 5/19/2018 1:42 AM, John Fastabend wrote:

On 05/18/2018 12:17 AM, Prashant Bhole wrote:

This series fixes bugs in test_sockmap code. They weren't caught
previously because failure in RX/TX thread was not notified to the
main thread.

Also fixed data verification logic and slightly improved test output
such that parameters values (cork, apply, start, end) of failed test
can be easily seen.



Great, this was on my list so thanks for taking care of it.


Note: Even after fixing above problems there are issues with tests
which set cork parameter. Tests fail (RX thread timeout) when cork
value is non-zero and overall data sent by TX thread isn't multiples
of cork value.



This is expected. When 'cork' is set the sender should only xmit
the data when 'cork' bytes are available. If the user doesn't
provide the N bytes the data is cork'ed waiting for the bytes and
if the socket is closed the state is cleaned up. What these tests
are testing is the cleanup path when a user doesn't provide the
N bytes. In practice this is used to validate headers and prevent
users from sending partial headers. We want to keep these tests because
they verify a tear-down path in the code.


Ok.



After your changes do these get reported as failures? If so we
need to account for the above in the calculations.


Yes, cork related test are reported as failures because of RX thread
timeout.

So with your above description, I think we need to differentiate cork
tests with partial data and full data. In partial data test we can have
something like "timeout_expected" flag. Any other way to fix it?



Adding a flag seems reasonable to me. Lets do this for now. Also I
plan to add more negative tests so we can either use the same
flag or a new one for those cases as well.



John,
I worked on this for some time and noticed that the RX-timeout of tests 
with cork parameter is dependent on various parameters. So we can not 
set a flag like the way 'drop_expected' flag is set before executing the 
test.


So I decided to write a function which judges all parameters before each 
test and decides whether a test with cork parameter will timeout or not. 
Then the conditions in the function became complicated. For example some 
tests fail if opt->rate < 17 (with some other conditions). Here is 17 is 
related to FRAGS_PER_SKB. Consider following two examples.


I'm sorry. Correction: s/FRAGS_PER_SKB/MAX_SKB_FRAGS/



./test_sockmap --cgroup /mnt/cgroup2 -r 16 -i 1 -l 30 -t sendpage 
--txmsg --txmsg_cork 1024   # RX timeout occurs


./test_sockmap --cgroup /mnt/cgroup2 -r 17 -i 1 -l 30 -t sendpage 
--txmsg --txmsg_cork 1024   # Success!


Do we need to keep such tests? if yes, then I will continue with adding 
such conditions in the function.



-Prashant







Re: [PATCH bpf-next 0/5] fix test_sockmap

2018-05-23 Thread Prashant Bhole



On 5/22/2018 2:08 AM, John Fastabend wrote:

On 05/20/2018 10:13 PM, Prashant Bhole wrote:



On 5/19/2018 1:42 AM, John Fastabend wrote:

On 05/18/2018 12:17 AM, Prashant Bhole wrote:

This series fixes bugs in test_sockmap code. They weren't caught
previously because failure in RX/TX thread was not notified to the
main thread.

Also fixed data verification logic and slightly improved test output
such that parameters values (cork, apply, start, end) of failed test
can be easily seen.



Great, this was on my list so thanks for taking care of it.


Note: Even after fixing above problems there are issues with tests
which set cork parameter. Tests fail (RX thread timeout) when cork
value is non-zero and overall data sent by TX thread isn't multiples
of cork value.



This is expected. When 'cork' is set the sender should only xmit
the data when 'cork' bytes are available. If the user doesn't
provide the N bytes the data is cork'ed waiting for the bytes and
if the socket is closed the state is cleaned up. What these tests
are testing is the cleanup path when a user doesn't provide the
N bytes. In practice this is used to validate headers and prevent
users from sending partial headers. We want to keep these tests because
they verify a tear-down path in the code.


Ok.



After your changes do these get reported as failures? If so we
need to account for the above in the calculations.


Yes, cork related test are reported as failures because of RX thread
timeout.

So with your above description, I think we need to differentiate cork
tests with partial data and full data. In partial data test we can have
something like "timeout_expected" flag. Any other way to fix it?



Adding a flag seems reasonable to me. Lets do this for now. Also I
plan to add more negative tests so we can either use the same
flag or a new one for those cases as well.



John,
I worked on this for some time and noticed that the RX-timeout of tests 
with cork parameter is dependent on various parameters. So we can not 
set a flag like the way 'drop_expected' flag is set before executing the 
test.


So I decided to write a function which judges all parameters before each 
test and decides whether a test with cork parameter will timeout or not. 
Then the conditions in the function became complicated. For example some 
tests fail if opt->rate < 17 (with some other conditions). Here is 17 is 
related to FRAGS_PER_SKB. Consider following two examples.


./test_sockmap --cgroup /mnt/cgroup2 -r 16 -i 1 -l 30 -t sendpage 
--txmsg --txmsg_cork 1024   # RX timeout occurs


./test_sockmap --cgroup /mnt/cgroup2 -r 17 -i 1 -l 30 -t sendpage 
--txmsg --txmsg_cork 1024   # Success!


Do we need to keep such tests? if yes, then I will continue with adding 
such conditions in the function.



-Prashant



Re: [PATCH bpf-next 0/5] fix test_sockmap

2018-05-20 Thread Prashant Bhole



On 5/19/2018 1:54 AM, Shuah Khan wrote:

On 05/18/2018 01:17 AM, Prashant Bhole wrote:

This series fixes bugs in test_sockmap code. They weren't caught
previously because failure in RX/TX thread was not notified to the
main thread.

Also fixed data verification logic and slightly improved test output
such that parameters values (cork, apply, start, end) of failed test
can be easily seen.

Note: Even after fixing above problems there are issues with tests
which set cork parameter. Tests fail (RX thread timeout) when cork
value is non-zero and overall data sent by TX thread isn't multiples
of cork value.

Prashant Bhole (5):
   selftests/bpf: test_sockmap, check test failure
   selftests/bpf: test_sockmap, join cgroup in selftest mode
   selftests/bpf: test_sockmap, fix test timeout
   selftests/bpf: test_sockmap, fix data verification
   selftests/bpf: test_sockmap, print additional test options

  tools/testing/selftests/bpf/test_sockmap.c | 76 +++---
  1 file changed, 58 insertions(+), 18 deletions(-)



Please remember to cc linux-kselftest mailing list as well. I would like to see
all the test patches cc'ed to it. Linaro and other test users watch the 
kselftest
mailing list. I also have patchwork project now to manage the patch volume. >
I am okay with patches going through net/bpf trees - there are always test
dependencies on net/bpf trees.


Ok. I will remember this. Thanks.

-Prashant




Re: [PATCH bpf-next 0/5] fix test_sockmap

2018-05-20 Thread Prashant Bhole



On 5/19/2018 1:42 AM, John Fastabend wrote:

On 05/18/2018 12:17 AM, Prashant Bhole wrote:

This series fixes bugs in test_sockmap code. They weren't caught
previously because failure in RX/TX thread was not notified to the
main thread.

Also fixed data verification logic and slightly improved test output
such that parameters values (cork, apply, start, end) of failed test
can be easily seen.



Great, this was on my list so thanks for taking care of it.


Note: Even after fixing above problems there are issues with tests
which set cork parameter. Tests fail (RX thread timeout) when cork
value is non-zero and overall data sent by TX thread isn't multiples
of cork value.



This is expected. When 'cork' is set the sender should only xmit
the data when 'cork' bytes are available. If the user doesn't
provide the N bytes the data is cork'ed waiting for the bytes and
if the socket is closed the state is cleaned up. What these tests
are testing is the cleanup path when a user doesn't provide the
N bytes. In practice this is used to validate headers and prevent
users from sending partial headers. We want to keep these tests because
they verify a tear-down path in the code.


Ok.



After your changes do these get reported as failures? If so we
need to account for the above in the calculations.


Yes, cork related test are reported as failures because of RX thread 
timeout.


So with your above description, I think we need to differentiate cork 
tests with partial data and full data. In partial data test we can have 
something like "timeout_expected" flag. Any other way to fix it?


-Prashant




Prashant Bhole (5):
   selftests/bpf: test_sockmap, check test failure
   selftests/bpf: test_sockmap, join cgroup in selftest mode
   selftests/bpf: test_sockmap, fix test timeout
   selftests/bpf: test_sockmap, fix data verification
   selftests/bpf: test_sockmap, print additional test options

  tools/testing/selftests/bpf/test_sockmap.c | 76 +++---
  1 file changed, 58 insertions(+), 18 deletions(-)









Re: [PATCH bpf-next 2/5] selftests/bpf: test_sockmap, join cgroup in selftest mode

2018-05-20 Thread Prashant Bhole



On 5/19/2018 1:45 AM, John Fastabend wrote:

On 05/18/2018 12:17 AM, Prashant Bhole wrote:

In case of selftest mode, temporary cgroup environment is created but
cgroup is not joined. It causes test failures. Fixed by joining the
cgroup

Fixes: 16962b2404ac ("bpf: sockmap, add selftests")
Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp>
--


Thanks, LGTM. Should this be the first patch in the series though?
I wonder if after patch 1 if you would get failures without this
patch.


Patch 1 fixes selftest mode as well as manual mode. This patch 2 is 
specifically for selftest mode, hence the sequence.


- Prashant




Re: [PATCH bpf-next 3/5] selftests/bpf: test_sockmap, fix test timeout

2018-05-20 Thread Prashant Bhole



On 5/19/2018 1:47 AM, John Fastabend wrote:

On 05/18/2018 12:17 AM, Prashant Bhole wrote:

In order to reduce runtime of tests, recently timout for select() call
was reduced from 1sec to 10usec. This was causing many tests failures.
It was caught with failure handling commits in this series.

Restoring the timeout from 10usec to 1sec

Fixes: a18fda1a62c3 ("bpf: reduce runtime of test_sockmap tests")
Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp>
---


Whats the runtime for the entire test suite after this? I agree
I was probably to aggressive in setting this but on the other
hand I was trying to avoid letting the test run for minutes.



Currently any failure stops further tests.
So I made local change to force cork=0. With this change, it takes 
around 60 seconds for all 648 tests. It will change after fixing cork tests.


-Prashant



[PATCH bpf-next 5/5] selftests/bpf: test_sockmap, print additional test options

2018-05-18 Thread Prashant Bhole
Print values of test options like apply, cork, start, end so that
individual failed tests can be identified for manual run

Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp>
---
 tools/testing/selftests/bpf/test_sockmap.c | 28 +++-
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c 
b/tools/testing/selftests/bpf/test_sockmap.c
index e8faf4596dac..7970d48c4acb 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -869,6 +869,8 @@ static char *test_to_str(int test)
 #define OPTSTRING 60
 static void test_options(char *options)
 {
+   char tstr[OPTSTRING];
+
memset(options, 0, OPTSTRING);
 
if (txmsg_pass)
@@ -881,14 +883,22 @@ static void test_options(char *options)
strncat(options, "redir_noisy,", OPTSTRING);
if (txmsg_drop)
strncat(options, "drop,", OPTSTRING);
-   if (txmsg_apply)
-   strncat(options, "apply,", OPTSTRING);
-   if (txmsg_cork)
-   strncat(options, "cork,", OPTSTRING);
-   if (txmsg_start)
-   strncat(options, "start,", OPTSTRING);
-   if (txmsg_end)
-   strncat(options, "end,", OPTSTRING);
+   if (txmsg_apply) {
+   snprintf(tstr, OPTSTRING, "apply %d,", txmsg_apply);
+   strncat(options, tstr, OPTSTRING);
+   }
+   if (txmsg_cork) {
+   snprintf(tstr, OPTSTRING, "cork %d,", txmsg_cork);
+   strncat(options, tstr, OPTSTRING);
+   }
+   if (txmsg_start) {
+   snprintf(tstr, OPTSTRING, "start %d,", txmsg_start);
+   strncat(options, tstr, OPTSTRING);
+   }
+   if (txmsg_end) {
+   snprintf(tstr, OPTSTRING, "end %d,", txmsg_end);
+   strncat(options, tstr, OPTSTRING);
+   }
if (txmsg_ingress)
strncat(options, "ingress,", OPTSTRING);
if (txmsg_skb)
@@ -897,7 +907,7 @@ static void test_options(char *options)
 
 static int __test_exec(int cgrp, int test, struct sockmap_options *opt)
 {
-   char *options = calloc(60, sizeof(char));
+   char *options = calloc(OPTSTRING, sizeof(char));
int err;
 
if (test == SENDPAGE)
-- 
2.14.3




[PATCH bpf-next 4/5] selftests/bpf: test_sockmap, fix data verification

2018-05-18 Thread Prashant Bhole
When data verification is enabled, some tests fail because verification is done
incorrectly. Following changes fix it.

- Identify the size of data block to be verified
- Reset verification counter when data block size is reached
- Fixed the value printed in case of verification failure

Fixes: 16962b2404ac ("bpf: sockmap, add selftests")
Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp>
---
 tools/testing/selftests/bpf/test_sockmap.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c 
b/tools/testing/selftests/bpf/test_sockmap.c
index f79397513362..e8faf4596dac 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -337,8 +337,15 @@ static int msg_loop(int fd, int iov_count, int iov_length, 
int cnt,
int fd_flags = O_NONBLOCK;
struct timeval timeout;
float total_bytes;
+   int bytes_cnt = 0;
+   int chunk_sz;
fd_set w;
 
+   if (opt->sendpage)
+   chunk_sz = iov_length * cnt;
+   else
+   chunk_sz = iov_length * iov_count;
+
fcntl(fd, fd_flags);
total_bytes = (float)iov_count * (float)iov_length * (float)cnt;
err = clock_gettime(CLOCK_MONOTONIC, >start);
@@ -388,9 +395,14 @@ static int msg_loop(int fd, int iov_count, int iov_length, 
int cnt,
errno = -EIO;
fprintf(stderr,
"detected data 
corruption @iov[%i]:%i %02x != %02x, %02x ?= %02x\n",
-   i, j, d[j], k - 
1, d[j+1], k + 1);
+   i, j, d[j], k - 
1, d[j+1], k);
goto out_errno;
}
+   bytes_cnt++;
+   if (bytes_cnt == chunk_sz) {
+   k = 0;
+   bytes_cnt = 0;
+   }
recv--;
}
}
-- 
2.14.3




[PATCH bpf-next 1/5] selftests/bpf: test_sockmap, check test failure

2018-05-18 Thread Prashant Bhole
Test failures are not identified because exit code of RX/TX threads
is not checked. Also threads are not returning correct exit code.

- Return exit code from threads depending on test execution status
- In main thread, check the exit code of RX/TX threads

Fixes: 16962b2404ac ("bpf: sockmap, add selftests")
Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp>
---
 tools/testing/selftests/bpf/test_sockmap.c | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c 
b/tools/testing/selftests/bpf/test_sockmap.c
index eb17fae458e6..34feb74c95c4 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -429,8 +429,8 @@ static int sendmsg_test(struct sockmap_options *opt)
struct msg_stats s = {0};
int iov_count = opt->iov_count;
int iov_buf = opt->iov_length;
+   int rx_status, tx_status;
int cnt = opt->rate;
-   int status;
 
errno = 0;
 
@@ -442,7 +442,7 @@ static int sendmsg_test(struct sockmap_options *opt)
rxpid = fork();
if (rxpid == 0) {
if (opt->drop_expected)
-   exit(1);
+   exit(0);
 
if (opt->sendpage)
iov_count = 1;
@@ -463,7 +463,7 @@ static int sendmsg_test(struct sockmap_options *opt)
"rx_sendmsg: TX: %zuB %fB/s %fGB/s RX: %zuB 
%fB/s %fGB/s\n",
s.bytes_sent, sent_Bps, sent_Bps/giga,
s.bytes_recvd, recvd_Bps, recvd_Bps/giga);
-   exit(1);
+   exit(err ? 1 : 0);
} else if (rxpid == -1) {
perror("msg_loop_rx: ");
return errno;
@@ -491,14 +491,27 @@ static int sendmsg_test(struct sockmap_options *opt)
"tx_sendmsg: TX: %zuB %fB/s %f GB/s RX: %zuB 
%fB/s %fGB/s\n",
s.bytes_sent, sent_Bps, sent_Bps/giga,
s.bytes_recvd, recvd_Bps, recvd_Bps/giga);
-   exit(1);
+   exit(err ? 1 : 0);
} else if (txpid == -1) {
perror("msg_loop_tx: ");
return errno;
}
 
-   assert(waitpid(rxpid, , 0) == rxpid);
-   assert(waitpid(txpid, , 0) == txpid);
+   assert(waitpid(rxpid, _status, 0) == rxpid);
+   assert(waitpid(txpid, _status, 0) == txpid);
+   if (WIFEXITED(rx_status)) {
+   err = WEXITSTATUS(rx_status);
+   if (err) {
+   fprintf(stderr, "rx thread exited with err %d. ", err);
+   goto out;
+   }
+   }
+   if (WIFEXITED(tx_status)) {
+   err = WEXITSTATUS(tx_status);
+   if (err)
+   fprintf(stderr, "tx thread exited with err %d. ", err);
+   }
+out:
return err;
 }
 
-- 
2.14.3




[PATCH bpf-next 2/5] selftests/bpf: test_sockmap, join cgroup in selftest mode

2018-05-18 Thread Prashant Bhole
In case of selftest mode, temporary cgroup environment is created but
cgroup is not joined. It causes test failures. Fixed by joining the
cgroup

Fixes: 16962b2404ac ("bpf: sockmap, add selftests")
Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp>
---
 tools/testing/selftests/bpf/test_sockmap.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c 
b/tools/testing/selftests/bpf/test_sockmap.c
index 34feb74c95c4..8a81ea0e9fb6 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -1342,6 +1342,11 @@ static int __test_suite(char *bpf_file)
return cg_fd;
}
 
+   if (join_cgroup(CG_PATH)) {
+   fprintf(stderr, "ERROR: failed to join cgroup\n");
+   return -EINVAL;
+   }
+
/* Tests basic commands and APIs with range of iov values */
txmsg_start = txmsg_end = 0;
err = test_txmsg(cg_fd);
-- 
2.14.3




[PATCH bpf-next 3/5] selftests/bpf: test_sockmap, fix test timeout

2018-05-18 Thread Prashant Bhole
In order to reduce runtime of tests, recently timout for select() call
was reduced from 1sec to 10usec. This was causing many tests failures.
It was caught with failure handling commits in this series.

Restoring the timeout from 10usec to 1sec

Fixes: a18fda1a62c3 ("bpf: reduce runtime of test_sockmap tests")
Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp>
---
 tools/testing/selftests/bpf/test_sockmap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c 
b/tools/testing/selftests/bpf/test_sockmap.c
index 8a81ea0e9fb6..f79397513362 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -345,8 +345,8 @@ static int msg_loop(int fd, int iov_count, int iov_length, 
int cnt,
if (err < 0)
perror("recv start time: ");
while (s->bytes_recvd < total_bytes) {
-   timeout.tv_sec = 0;
-   timeout.tv_usec = 10;
+   timeout.tv_sec = 1;
+   timeout.tv_usec = 0;
 
/* FD sets */
FD_ZERO();
-- 
2.14.3




[PATCH bpf-next 0/5] fix test_sockmap

2018-05-18 Thread Prashant Bhole
This series fixes bugs in test_sockmap code. They weren't caught
previously because failure in RX/TX thread was not notified to the
main thread.

Also fixed data verification logic and slightly improved test output
such that parameters values (cork, apply, start, end) of failed test
can be easily seen.

Note: Even after fixing above problems there are issues with tests
which set cork parameter. Tests fail (RX thread timeout) when cork
value is non-zero and overall data sent by TX thread isn't multiples
of cork value.

Prashant Bhole (5):
  selftests/bpf: test_sockmap, check test failure
  selftests/bpf: test_sockmap, join cgroup in selftest mode
  selftests/bpf: test_sockmap, fix test timeout
  selftests/bpf: test_sockmap, fix data verification
  selftests/bpf: test_sockmap, print additional test options

 tools/testing/selftests/bpf/test_sockmap.c | 76 +++---
 1 file changed, 58 insertions(+), 18 deletions(-)

-- 
2.14.3




[PATCH bpf-next v2] samples/bpf: xdp_monitor, accept short options

2018-05-14 Thread Prashant Bhole
Updated optstring parameter for getopt_long() to accept short options.
Also updated usage() function.

Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp>
---
 samples/bpf/xdp_monitor_user.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/samples/bpf/xdp_monitor_user.c b/samples/bpf/xdp_monitor_user.c
index 894bc64c2cac..05ad3f590c91 100644
--- a/samples/bpf/xdp_monitor_user.c
+++ b/samples/bpf/xdp_monitor_user.c
@@ -58,7 +58,7 @@ static void usage(char *argv[])
printf(" flag (internal value:%d)",
   *long_options[i].flag);
else
-   printf("(internal short-option: -%c)",
+   printf("short-option: -%c",
   long_options[i].val);
printf("\n");
}
@@ -594,7 +594,7 @@ int main(int argc, char **argv)
snprintf(bpf_obj_file, sizeof(bpf_obj_file), "%s_kern.o", argv[0]);
 
/* Parse commands line args */
-   while ((opt = getopt_long(argc, argv, "h",
+   while ((opt = getopt_long(argc, argv, "hDSs:",
  long_options, )) != -1) {
switch (opt) {
case 'D':
-- 
2.13.6




Re: [PATCH bpf-next] samples/bpf: xdp_monitor, accept short options

2018-05-14 Thread Prashant Bhole



On 5/12/2018 1:31 AM, Jesper Dangaard Brouer wrote:

On Fri, 11 May 2018 10:37:51 +0900
Prashant Bhole <bhole_prashant...@lab.ntt.co.jp> wrote:


updated optstring accept short options

Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp>
---
  samples/bpf/xdp_monitor_user.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/samples/bpf/xdp_monitor_user.c b/samples/bpf/xdp_monitor_user.c
index 894bc64c2cac..668511c77aaf 100644
--- a/samples/bpf/xdp_monitor_user.c
+++ b/samples/bpf/xdp_monitor_user.c
@@ -594,7 +594,7 @@ int main(int argc, char **argv)
snprintf(bpf_obj_file, sizeof(bpf_obj_file), "%s_kern.o", argv[0]);
  
  	/* Parse commands line args */

-   while ((opt = getopt_long(argc, argv, "h",
+   while ((opt = getopt_long(argc, argv, "hDSs:",
  long_options, )) != -1) {
switch (opt) {
case 'D':


It was actually on purpose that I didn't add the short options,
in-order to force people use those "self-documenting" long-options when
they show the usage on public mailing lists or in blog-posts.


Got it.



If you want these short options, you also have to correct the "usage"
function that state these are "internal" short-options.


I am submitting v2 with "usage" updated, because with usability point of 
view it is nice to have short options. Thanks.


-Prashant



Notice the long options parsing done by getopt_long() allow you to only
specify part of the string.  Al-through, I can see --s is ambiguous.

$ sudo ./xdp_monitor --s
./xdp_monitor: option '--s' is ambiguous; possibilities: '--stats' '--sec'





[PATCH bpf-next] samples/bpf: xdp_monitor, accept short options

2018-05-10 Thread Prashant Bhole
updated optstring accept short options

Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp>
---
 samples/bpf/xdp_monitor_user.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/samples/bpf/xdp_monitor_user.c b/samples/bpf/xdp_monitor_user.c
index 894bc64c2cac..668511c77aaf 100644
--- a/samples/bpf/xdp_monitor_user.c
+++ b/samples/bpf/xdp_monitor_user.c
@@ -594,7 +594,7 @@ int main(int argc, char **argv)
snprintf(bpf_obj_file, sizeof(bpf_obj_file), "%s_kern.o", argv[0]);
 
/* Parse commands line args */
-   while ((opt = getopt_long(argc, argv, "h",
+   while ((opt = getopt_long(argc, argv, "hDSs:",
  long_options, )) != -1) {
switch (opt) {
case 'D':
-- 
2.13.6




[PATCH bpf-next] bpf: sync tools bpf.h uapi header

2018-05-08 Thread Prashant Bhole
sync the header from include/uapi/linux/bpf.h which was updated to add
fib lookup helper function. This fixes selftests/bpf build failure

Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp>
---
 tools/include/uapi/linux/bpf.h | 84 +-
 1 file changed, 83 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 83a95ae388dd..ddc566cb7492 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -10,6 +10,8 @@
 
 #include 
 #include 
+#include 
+#include 
 
 /* Extended instruction set based on top of classic BPF */
 
@@ -116,6 +118,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_DEVMAP,
BPF_MAP_TYPE_SOCKMAP,
BPF_MAP_TYPE_CPUMAP,
+   BPF_MAP_TYPE_XSKMAP,
 };
 
 enum bpf_prog_type {
@@ -1825,6 +1828,33 @@ union bpf_attr {
  * Return
  * 0 on success, or a negative error in case of failure.
  *
+ *
+ * int bpf_fib_lookup(void *ctx, struct bpf_fib_lookup *params, int plen, u32 
flags)
+ * Description
+ * Do FIB lookup in kernel tables using parameters in *params*.
+ * If lookup is successful and result shows packet is to be
+ * forwarded, the neighbor tables are searched for the nexthop.
+ * If successful (ie., FIB lookup shows forwarding and nexthop
+ * is resolved), the nexthop address is returned in ipv4_dst,
+ * ipv6_dst or mpls_out based on family, smac is set to mac
+ * address of egress device, dmac is set to nexthop mac address,
+ * rt_metric is set to metric from route.
+ *
+ * *plen* argument is the size of the passed in struct.
+ * *flags* argument can be one or more BPF_FIB_LOOKUP_ flags:
+ *
+ * **BPF_FIB_LOOKUP_DIRECT** means do a direct table lookup vs
+ * full lookup using FIB rules
+ * **BPF_FIB_LOOKUP_OUTPUT** means do lookup from an egress
+ * perspective (default is ingress)
+ *
+ * *ctx* is either **struct xdp_md** for XDP programs or
+ * **struct sk_buff** tc cls_act programs.
+ *
+ * Return
+ * Egress device index on success, 0 if packet needs to continue
+ * up the stack for further processing or a negative error in case
+ * of failure.
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -1895,7 +1925,8 @@ union bpf_attr {
FN(xdp_adjust_tail),\
FN(skb_get_xfrm_state), \
FN(get_stack),  \
-   FN(skb_load_bytes_relative),
+   FN(skb_load_bytes_relative),\
+   FN(fib_lookup),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -2309,4 +2340,55 @@ struct bpf_raw_tracepoint_args {
__u64 args[0];
 };
 
+/* DIRECT:  Skip the FIB rules and go to FIB table associated with device
+ * OUTPUT:  Do lookup from egress perspective; default is ingress
+ */
+#define BPF_FIB_LOOKUP_DIRECT  BIT(0)
+#define BPF_FIB_LOOKUP_OUTPUT  BIT(1)
+
+struct bpf_fib_lookup {
+   /* input */
+   __u8family;   /* network family, AF_INET, AF_INET6, AF_MPLS */
+
+   /* set if lookup is to consider L4 data - e.g., FIB rules */
+   __u8l4_protocol;
+   __be16  sport;
+   __be16  dport;
+
+   /* total length of packet from network header - used for MTU check */
+   __u16   tot_len;
+   __u32   ifindex;  /* L3 device index for lookup */
+
+   union {
+   /* inputs to lookup */
+   __u8tos;/* AF_INET  */
+   __be32  flowlabel;  /* AF_INET6 */
+
+   /* output: metric of fib result */
+   __u32 rt_metric;
+   };
+
+   union {
+   __be32  mpls_in;
+   __be32  ipv4_src;
+   struct in6_addr ipv6_src;
+   };
+
+   /* input to bpf_fib_lookup, *dst is destination address.
+* output: bpf_fib_lookup sets to gateway address
+*/
+   union {
+   /* return for MPLS lookups */
+   __be32  mpls_out[4];  /* support up to 4 labels */
+   __be32  ipv4_dst;
+   struct in6_addr ipv6_dst;
+   };
+
+   /* output */
+   __be16  h_vlan_proto;
+   __be16  h_vlan_TCI;
+   __u8smac[ETH_ALEN];
+   __u8dmac[ETH_ALEN];
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
-- 
2.14.3




[PATCH v2 bpf-next 1/2] lib/scatterlist: add sg_init_marker() helper

2018-03-29 Thread Prashant Bhole
sg_init_marker initializes sg_magic in the sg table and calls
sg_mark_end() on the last entry of the table. This can be useful to
avoid memset in sg_init_table() when scatterlist is already zeroed out

For example: when scatterlist is embedded inside other struct and that
container struct is zeroed out

Suggested-by: Daniel Borkmann <dan...@iogearbox.net>
Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp>
---
 include/linux/scatterlist.h | 18 ++
 lib/scatterlist.c   |  9 +
 2 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 22b2131bcdcd..aa5d4eb725f5 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -248,6 +248,24 @@ static inline void *sg_virt(struct scatterlist *sg)
return page_address(sg_page(sg)) + sg->offset;
 }
 
+/**
+ * sg_init_marker - Initialize markers in sg table
+ * @sgl:  The SG table
+ * @nents:Number of entries in table
+ *
+ **/
+static inline void sg_init_marker(struct scatterlist *sgl,
+ unsigned int nents)
+{
+#ifdef CONFIG_DEBUG_SG
+   unsigned int i;
+
+   for (i = 0; i < nents; i++)
+   sgl[i].sg_magic = SG_MAGIC;
+#endif
+   sg_mark_end([nents - 1]);
+}
+
 int sg_nents(struct scatterlist *sg);
 int sg_nents_for_len(struct scatterlist *sg, u64 len);
 struct scatterlist *sg_next(struct scatterlist *);
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index 53728d391d3a..06dad7a072fd 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -132,14 +132,7 @@ EXPORT_SYMBOL(sg_last);
 void sg_init_table(struct scatterlist *sgl, unsigned int nents)
 {
memset(sgl, 0, sizeof(*sgl) * nents);
-#ifdef CONFIG_DEBUG_SG
-   {
-   unsigned int i;
-   for (i = 0; i < nents; i++)
-   sgl[i].sg_magic = SG_MAGIC;
-   }
-#endif
-   sg_mark_end([nents - 1]);
+   sg_init_marker(sgl, nents);
 }
 EXPORT_SYMBOL(sg_init_table);
 
-- 
2.14.3




[PATCH v2 bpf-next 0/2] sockmap: fix sg api usage

2018-03-29 Thread Prashant Bhole
These patches fix sg api usage in sockmap. Previously sockmap didn't
use sg_init_table(), which caused hitting BUG_ON in sg api, when
CONFIG_DEBUG_SG is enabled

v1: added sg_init_table() calls wherever needed.

v2:
- Patch1 adds new helper function in sg api. sg_init_marker()
- Patch2 sg_init_marker() and sg_init_table() in appropriate places

Backgroud:
While reviewing v1, John Fastabend raised a valid point about
unnecessary memset in sg_init_table() because sockmap uses sg table
which embedded in a struct. As enclosing struct is zeroed out, there
is unnecessary memset in sg_init_table.

So Daniel Borkmann suggested to define another static inline function
in scatterlist.h which only initializes sg_magic. Also this function 
will be called from sg_init_table. From this suggestion I defined a
function sg_init_marker() which sets sg_magic and calls sg_mark_end()

Prashant Bhole (2):
  lib/scatterlist: add sg_init_marker() helper
  bpf: sockmap: initialize sg table entries properly

 include/linux/scatterlist.h | 18 ++
 kernel/bpf/sockmap.c| 13 -
 lib/scatterlist.c   |  9 +
 3 files changed, 27 insertions(+), 13 deletions(-)

-- 
2.14.3




[PATCH v2 bpf-next 2/2] bpf: sockmap: initialize sg table entries properly

2018-03-29 Thread Prashant Bhole
When CONFIG_DEBUG_SG is set, sg->sg_magic is initialized in
sg_init_table() and it is verified in sg api while navigating. We hit
BUG_ON when magic check is failed.

In functions sg_tcp_sendpage and sg_tcp_sendmsg, the struct containing
the scatterlist is already zeroed out. So to avoid extra memset, we
use sg_init_marker() to initialize sg_magic.

Fixed following things:
- In bpf_tcp_sendpage: initialize sg using sg_init_marker
- In bpf_tcp_sendmsg: Replace sg_init_table with sg_init_marker
- In bpf_tcp_push: Replace memset with sg_init_table where consumed
  sg entry needs to be re-initialized.

Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp>
---
 kernel/bpf/sockmap.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 69c5bccabd22..b4f01656c452 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -312,7 +312,7 @@ static int bpf_tcp_push(struct sock *sk, int apply_bytes,
md->sg_start++;
if (md->sg_start == MAX_SKB_FRAGS)
md->sg_start = 0;
-   memset(sg, 0, sizeof(*sg));
+   sg_init_table(sg, 1);
 
if (md->sg_start == md->sg_end)
break;
@@ -656,7 +656,7 @@ static int bpf_tcp_sendmsg(struct sock *sk, struct msghdr 
*msg, size_t size)
}
 
sg = md.sg_data;
-   sg_init_table(sg, MAX_SKB_FRAGS);
+   sg_init_marker(sg, MAX_SKB_FRAGS);
rcu_read_unlock();
 
lock_sock(sk);
@@ -763,10 +763,14 @@ static int bpf_tcp_sendpage(struct sock *sk, struct page 
*page,
 
lock_sock(sk);
 
-   if (psock->cork_bytes)
+   if (psock->cork_bytes) {
m = psock->cork;
-   else
+   sg = >sg_data[m->sg_end];
+   } else {
m = 
+   sg = m->sg_data;
+   sg_init_marker(sg, MAX_SKB_FRAGS);
+   }
 
/* Catch case where ring is full and sendpage is stalled. */
if (unlikely(m->sg_end == m->sg_start &&
@@ -774,7 +778,6 @@ static int bpf_tcp_sendpage(struct sock *sk, struct page 
*page,
goto out_err;
 
psock->sg_size += size;
-   sg = >sg_data[m->sg_end];
sg_set_page(sg, page, size, offset);
get_page(page);
m->sg_copy[m->sg_end] = true;
-- 
2.14.3




Re: [PATCH bpf-next] bpf: sockmap: initialize sg table entries properly

2018-03-29 Thread Prashant Bhole



On 3/28/2018 5:51 PM, Daniel Borkmann wrote:

On 03/28/2018 08:18 AM, Prashant Bhole wrote:

On 3/27/2018 6:05 PM, Daniel Borkmann wrote:

On 03/27/2018 10:41 AM, Prashant Bhole wrote:

On 3/27/2018 12:15 PM, John Fastabend wrote:

On 03/25/2018 11:54 PM, Prashant Bhole wrote:

When CONFIG_DEBUG_SG is set, sg->sg_magic is initialized to SG_MAGIC,
when sg table is initialized using sg_init_table(). Magic is checked
while navigating the scatterlist. We hit BUG_ON when magic check is
failed.

Fixed following things:
- Initialization of sg table in bpf_tcp_sendpage() was missing,
     initialized it using sg_init_table()

- bpf_tcp_sendmsg() initializes sg table using sg_init_table() before
     entering the loop, but further consumed sg entries are initialized
     using memset. Fixed it by replacing memset with sg_init_table() in
     function bpf_tcp_push()

Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp>
---
    kernel/bpf/sockmap.c | 11 +++
    1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 69c5bccabd22..8a848a99d768 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -312,7 +312,7 @@ static int bpf_tcp_push(struct sock *sk, int apply_bytes,
    md->sg_start++;
    if (md->sg_start == MAX_SKB_FRAGS)
    md->sg_start = 0;
-    memset(sg, 0, sizeof(*sg));
+    sg_init_table(sg, 1);


Looks OK here.


      if (md->sg_start == md->sg_end)
    break;
@@ -763,10 +763,14 @@ static int bpf_tcp_sendpage(struct sock *sk, struct page 
*page,
      lock_sock(sk);
    -    if (psock->cork_bytes)
+    if (psock->cork_bytes) {
    m = psock->cork;
-    else
+    sg = >sg_data[m->sg_end];
+    } else {
    m = 
+    sg = m->sg_data;
+    sg_init_table(sg, MAX_SKB_FRAGS);


sg_init_table() does an unnecessary memset() though. We
probably either want a new scatterlist API or just open
code this,

#ifdef CONFIG_DEBUG_SG
{
  unsigned int i;
  for (i = 0; i < nents; i++)
  sgl[i].sg_magic = SG_MAGIC;
}


Similar sg_init_table() is present in bpf_tcp_sendmsg().
I agree that it causes unnecessary memset, but I don't agree with open coded 
fix.


But then lets fix is properly and add a static inline helper to the
include/linux/scatterlist.h header like ...

static inline void sg_init_debug_marker(struct scatterlist *sgl,
     unsigned int nents)
{
#ifdef CONFIG_DEBUG_SG
 unsigned int i;

 for (i = 0; i < nents; i++)
     sgl[i].sg_magic = SG_MAGIC;
#endif
}

... and reuse it in all the places that would otherwise open-code this,
as well as sg_init_table():

void sg_init_table(struct scatterlist *sgl, unsigned int nents)
{
  memset(sgl, 0, sizeof(*sgl) * nents);
 sg_init_debug_marker(sgl, nents);
  sg_mark_end([nents - 1]);
}

This would be a lot cleaner than having this duplicated in various places.


Daniel, This is a good suggestion. Is it ok if I submit both changes in
a patch series?


Sure, that's fine.


How scatterlist related changes will be picked up by other subsystems?


Once this gets applied into bpf-next, this will be pushed to net-next tree,
and during the merge window net-next will be pulled into Linus' tree if this
is what you are asking. Then also other subsystems outside of bpf/networking
can make use of the sg_init_debug_marker() helper if suitable for their
situation.


Thanks. I am submitting V2 soon.

-Prashant



Re: [PATCH bpf-next] bpf: sockmap: initialize sg table entries properly

2018-03-28 Thread Prashant Bhole



On 3/27/2018 6:05 PM, Daniel Borkmann wrote:

On 03/27/2018 10:41 AM, Prashant Bhole wrote:

On 3/27/2018 12:15 PM, John Fastabend wrote:

On 03/25/2018 11:54 PM, Prashant Bhole wrote:

When CONFIG_DEBUG_SG is set, sg->sg_magic is initialized to SG_MAGIC,
when sg table is initialized using sg_init_table(). Magic is checked
while navigating the scatterlist. We hit BUG_ON when magic check is
failed.

Fixed following things:
- Initialization of sg table in bpf_tcp_sendpage() was missing,
    initialized it using sg_init_table()

- bpf_tcp_sendmsg() initializes sg table using sg_init_table() before
    entering the loop, but further consumed sg entries are initialized
    using memset. Fixed it by replacing memset with sg_init_table() in
    function bpf_tcp_push()

Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp>
---
   kernel/bpf/sockmap.c | 11 +++
   1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 69c5bccabd22..8a848a99d768 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -312,7 +312,7 @@ static int bpf_tcp_push(struct sock *sk, int apply_bytes,
   md->sg_start++;
   if (md->sg_start == MAX_SKB_FRAGS)
   md->sg_start = 0;
-    memset(sg, 0, sizeof(*sg));
+    sg_init_table(sg, 1);


Looks OK here.


     if (md->sg_start == md->sg_end)
   break;
@@ -763,10 +763,14 @@ static int bpf_tcp_sendpage(struct sock *sk, struct page 
*page,
     lock_sock(sk);
   -    if (psock->cork_bytes)
+    if (psock->cork_bytes) {
   m = psock->cork;
-    else
+    sg = >sg_data[m->sg_end];
+    } else {
   m = 
+    sg = m->sg_data;
+    sg_init_table(sg, MAX_SKB_FRAGS);


sg_init_table() does an unnecessary memset() though. We
probably either want a new scatterlist API or just open
code this,

#ifdef CONFIG_DEBUG_SG
{
 unsigned int i;
 for (i = 0; i < nents; i++)
     sgl[i].sg_magic = SG_MAGIC;
}


Similar sg_init_table() is present in bpf_tcp_sendmsg().
I agree that it causes unnecessary memset, but I don't agree with open coded 
fix.


But then lets fix is properly and add a static inline helper to the
include/linux/scatterlist.h header like ...

static inline void sg_init_debug_marker(struct scatterlist *sgl,
unsigned int nents)
{
#ifdef CONFIG_DEBUG_SG
unsigned int i;

for (i = 0; i < nents; i++)
sgl[i].sg_magic = SG_MAGIC;
#endif
}

... and reuse it in all the places that would otherwise open-code this,
as well as sg_init_table():

void sg_init_table(struct scatterlist *sgl, unsigned int nents)
{
 memset(sgl, 0, sizeof(*sgl) * nents);
sg_init_debug_marker(sgl, nents);
 sg_mark_end([nents - 1]);
}

This would be a lot cleaner than having this duplicated in various places.


Daniel, This is a good suggestion. Is it ok if I submit both changes in
a patch series? How scatterlist related changes will be picked up by 
other subsystems?


-Prashant



Re: [PATCH bpf-next] bpf: sockmap: initialize sg table entries properly

2018-03-27 Thread Prashant Bhole



On 3/27/2018 12:15 PM, John Fastabend wrote:

On 03/25/2018 11:54 PM, Prashant Bhole wrote:

When CONFIG_DEBUG_SG is set, sg->sg_magic is initialized to SG_MAGIC,
when sg table is initialized using sg_init_table(). Magic is checked
while navigating the scatterlist. We hit BUG_ON when magic check is
failed.

Fixed following things:
- Initialization of sg table in bpf_tcp_sendpage() was missing,
   initialized it using sg_init_table()

- bpf_tcp_sendmsg() initializes sg table using sg_init_table() before
   entering the loop, but further consumed sg entries are initialized
   using memset. Fixed it by replacing memset with sg_init_table() in
   function bpf_tcp_push()

Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp>
---
  kernel/bpf/sockmap.c | 11 +++
  1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 69c5bccabd22..8a848a99d768 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -312,7 +312,7 @@ static int bpf_tcp_push(struct sock *sk, int apply_bytes,
md->sg_start++;
if (md->sg_start == MAX_SKB_FRAGS)
md->sg_start = 0;
-   memset(sg, 0, sizeof(*sg));
+   sg_init_table(sg, 1);


Looks OK here.

  
  			if (md->sg_start == md->sg_end)

break;
@@ -763,10 +763,14 @@ static int bpf_tcp_sendpage(struct sock *sk, struct page 
*page,
  
  	lock_sock(sk);
  
-	if (psock->cork_bytes)

+   if (psock->cork_bytes) {
m = psock->cork;
-   else
+   sg = >sg_data[m->sg_end];
+   } else {
m = 
+   sg = m->sg_data;
+   sg_init_table(sg, MAX_SKB_FRAGS);


sg_init_table() does an unnecessary memset() though. We
probably either want a new scatterlist API or just open
code this,

#ifdef CONFIG_DEBUG_SG
{
unsigned int i;
for (i = 0; i < nents; i++)
sgl[i].sg_magic = SG_MAGIC;
}


Similar sg_init_table() is present in bpf_tcp_sendmsg().
I agree that it causes unnecessary memset, but I don't agree with open 
coded fix.

I am still with V1. Thanks.

-Prashant




[PATCH bpf-next] bpf: sockmap: initialize sg table entries properly

2018-03-26 Thread Prashant Bhole
When CONFIG_DEBUG_SG is set, sg->sg_magic is initialized to SG_MAGIC,
when sg table is initialized using sg_init_table(). Magic is checked
while navigating the scatterlist. We hit BUG_ON when magic check is
failed.

Fixed following things:
- Initialization of sg table in bpf_tcp_sendpage() was missing,
  initialized it using sg_init_table()

- bpf_tcp_sendmsg() initializes sg table using sg_init_table() before
  entering the loop, but further consumed sg entries are initialized
  using memset. Fixed it by replacing memset with sg_init_table() in
  function bpf_tcp_push()

Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp>
---
 kernel/bpf/sockmap.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 69c5bccabd22..8a848a99d768 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -312,7 +312,7 @@ static int bpf_tcp_push(struct sock *sk, int apply_bytes,
md->sg_start++;
if (md->sg_start == MAX_SKB_FRAGS)
md->sg_start = 0;
-   memset(sg, 0, sizeof(*sg));
+   sg_init_table(sg, 1);
 
if (md->sg_start == md->sg_end)
break;
@@ -763,10 +763,14 @@ static int bpf_tcp_sendpage(struct sock *sk, struct page 
*page,
 
lock_sock(sk);
 
-   if (psock->cork_bytes)
+   if (psock->cork_bytes) {
m = psock->cork;
-   else
+   sg = >sg_data[m->sg_end];
+   } else {
m = 
+   sg = m->sg_data;
+   sg_init_table(sg, MAX_SKB_FRAGS);
+   }
 
/* Catch case where ring is full and sendpage is stalled. */
if (unlikely(m->sg_end == m->sg_start &&
@@ -774,7 +778,6 @@ static int bpf_tcp_sendpage(struct sock *sk, struct page 
*page,
goto out_err;
 
psock->sg_size += size;
-   sg = >sg_data[m->sg_end];
sg_set_page(sg, page, size, offset);
get_page(page);
m->sg_copy[m->sg_end] = true;
-- 
2.14.3




[PATCH net-next] selftests/net: fix in_netns.sh script

2018-03-06 Thread Prashant Bhole
execute the subprocess in netns using 'ip netns exec'

Fixes: cc30c93fa020 ("selftests/net: ignore background traffic in psock_fanout")
Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp>
---
 tools/testing/selftests/net/in_netns.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/in_netns.sh 
b/tools/testing/selftests/net/in_netns.sh
index f57a2eaad069..88795b510b32 100755
--- a/tools/testing/selftests/net/in_netns.sh
+++ b/tools/testing/selftests/net/in_netns.sh
@@ -19,5 +19,5 @@ cleanup() {
 trap cleanup EXIT
 setup
 
-"$@"
+ip netns exec "${NETNS}" "$@"
 exit "$?"
-- 
2.14.3




[PATCH net-next] selftests: rtnetlink: remove testns on test fail

2018-03-01 Thread Prashant Bhole
This patch removes testns after test failure so that next test can
continue with clean ns

Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp>
---
 tools/testing/selftests/net/rtnetlink.sh | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/tools/testing/selftests/net/rtnetlink.sh 
b/tools/testing/selftests/net/rtnetlink.sh
index a622eeecc3a6..e6f485235435 100755
--- a/tools/testing/selftests/net/rtnetlink.sh
+++ b/tools/testing/selftests/net/rtnetlink.sh
@@ -517,6 +517,7 @@ kci_test_gretap()
ip link help gretap 2>&1 | grep -q "^Usage:"
if [ $? -ne 0 ];then
echo "SKIP: gretap: iproute2 too old"
+   ip netns del "$testns"
return 1
fi
 
@@ -543,6 +544,7 @@ kci_test_gretap()
 
if [ $ret -ne 0 ]; then
echo "FAIL: gretap"
+   ip netns del "$testns"
return 1
fi
echo "PASS: gretap"
@@ -565,6 +567,7 @@ kci_test_ip6gretap()
ip link help ip6gretap 2>&1 | grep -q "^Usage:"
if [ $? -ne 0 ];then
echo "SKIP: ip6gretap: iproute2 too old"
+   ip netns del "$testns"
return 1
fi
 
@@ -591,6 +594,7 @@ kci_test_ip6gretap()
 
if [ $ret -ne 0 ]; then
echo "FAIL: ip6gretap"
+   ip netns del "$testns"
return 1
fi
echo "PASS: ip6gretap"
@@ -655,6 +659,7 @@ kci_test_erspan()
 
if [ $ret -ne 0 ]; then
echo "FAIL: erspan"
+   ip netns del "$testns"
return 1
fi
echo "PASS: erspan"
@@ -720,6 +725,7 @@ kci_test_ip6erspan()
 
if [ $ret -ne 0 ]; then
echo "FAIL: ip6erspan"
+   ip netns del "$testns"
return 1
fi
echo "PASS: ip6erspan"
-- 
2.13.6




[PATCH bpf-next] samples/bpf: detach prog from cgroup

2018-02-28 Thread Prashant Bhole
test_cgrp2_sock.sh and test_cgrp2_sock2.sh tests keep the program
attached to cgroup even after completion.
Using detach functionality of test_cgrp2_sock in both scripts.

Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp>
---
 samples/bpf/test_cgrp2_sock.sh  | 1 +
 samples/bpf/test_cgrp2_sock2.sh | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/samples/bpf/test_cgrp2_sock.sh b/samples/bpf/test_cgrp2_sock.sh
index 8ee0371a100a..9f6174236856 100755
--- a/samples/bpf/test_cgrp2_sock.sh
+++ b/samples/bpf/test_cgrp2_sock.sh
@@ -61,6 +61,7 @@ cleanup_and_exit()
 
[ -n "$msg" ] && echo "ERROR: $msg"
 
+   test_cgrp2_sock -d ${CGRP_MNT}/sockopts
ip li del cgrp2_sock
umount ${CGRP_MNT}
 
diff --git a/samples/bpf/test_cgrp2_sock2.sh b/samples/bpf/test_cgrp2_sock2.sh
index fc4e64d00cb3..0f396a86e0cb 100755
--- a/samples/bpf/test_cgrp2_sock2.sh
+++ b/samples/bpf/test_cgrp2_sock2.sh
@@ -28,6 +28,9 @@ function attach_bpf {
 }
 
 function cleanup {
+   if [ -d /tmp/cgroupv2/foo ]; then
+   test_cgrp2_sock -d /tmp/cgroupv2/foo
+   fi
ip link del veth0b
ip netns delete at_ns0
umount /tmp/cgroupv2
-- 
2.13.6




[PATCH net-next v2] selftests/net: fixes psock_fanout eBPF test case

2018-02-14 Thread Prashant Bhole
eBPF test fails due to verifier failure because log_buf is too small.
Fixed by increasing log_buf size

Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp>
---
v2: log_buf is statically allocated

 tools/testing/selftests/net/psock_fanout.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/psock_fanout.c 
b/tools/testing/selftests/net/psock_fanout.c
index 989f917068d1..d4346b16b2c1 100644
--- a/tools/testing/selftests/net/psock_fanout.c
+++ b/tools/testing/selftests/net/psock_fanout.c
@@ -128,6 +128,8 @@ static void sock_fanout_getopts(int fd, uint16_t 
*typeflags, uint16_t *group_id)
 
 static void sock_fanout_set_ebpf(int fd)
 {
+   static char log_buf[65536];
+
const int len_off = __builtin_offsetof(struct __sk_buff, len);
struct bpf_insn prog[] = {
{ BPF_ALU64 | BPF_MOV | BPF_X,   6, 1, 0, 0 },
@@ -140,7 +142,6 @@ static void sock_fanout_set_ebpf(int fd)
{ BPF_ALU   | BPF_MOV | BPF_K,   0, 0, 0, 0 },
{ BPF_JMP   | BPF_EXIT,  0, 0, 0, 0 }
};
-   char log_buf[512];
union bpf_attr attr;
int pfd;
 
-- 
2.13.6




[PATCH net] selftests/net: fixes psock_fanout eBPF test case

2018-02-13 Thread Prashant Bhole
eBPF test fails due to verifier failure because log_buf is too small

Fixed by increasing log_buf size

Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp>
---
 tools/testing/selftests/net/psock_fanout.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/psock_fanout.c 
b/tools/testing/selftests/net/psock_fanout.c
index 989f917068d1..898278a41322 100644
--- a/tools/testing/selftests/net/psock_fanout.c
+++ b/tools/testing/selftests/net/psock_fanout.c
@@ -140,7 +140,7 @@ static void sock_fanout_set_ebpf(int fd)
{ BPF_ALU   | BPF_MOV | BPF_K,   0, 0, 0, 0 },
{ BPF_JMP   | BPF_EXIT,  0, 0, 0, 0 }
};
-   char log_buf[512];
+   char log_buf[65536];
union bpf_attr attr;
int pfd;
 
-- 
2.13.6




[PATCH bpf] bpf: samples/sockmap detach sock ops program

2018-02-12 Thread Prashant Bhole
samples/sockops program keeps the sock_ops program attached to cgroup.
Fixed this by detaching program before exit.

Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp>
---
 samples/sockmap/sockmap_user.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/samples/sockmap/sockmap_user.c b/samples/sockmap/sockmap_user.c
index 7c25c0c112bc..95a54a89a532 100644
--- a/samples/sockmap/sockmap_user.c
+++ b/samples/sockmap/sockmap_user.c
@@ -566,6 +566,7 @@ int main(int argc, char **argv)
else
fprintf(stderr, "unknown test\n");
 out:
+   bpf_prog_detach2(prog_fd[2], cg_fd, BPF_CGROUP_SOCK_OPS);
close(s1);
close(s2);
close(p1);
-- 
2.13.6




[PATCH bpf] bpf: samples/sockmap fix Makefile for build error

2018-02-12 Thread Prashant Bhole
While building samples/sockmap, undefined reference error is thrown
for `nla_dump_errormsg'.
Linking tools/lib/bpf/nlattr.o as a fix 

Signed-off-by: Prashant Bhole <bhole_prashant...@lab.ntt.co.jp>
---
 samples/sockmap/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/samples/sockmap/Makefile b/samples/sockmap/Makefile
index 73f1da4d116c..9bf2881bd11b 100644
--- a/samples/sockmap/Makefile
+++ b/samples/sockmap/Makefile
@@ -2,7 +2,7 @@
 hostprogs-y := sockmap
 
 # Libbpf dependencies
-LIBBPF := ../../tools/lib/bpf/bpf.o
+LIBBPF := ../../tools/lib/bpf/bpf.o ../../tools/lib/bpf/nlattr.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 HOSTCFLAGS += -I$(srctree)/tools/lib/
-- 
2.13.6




  1   2   >