Re: [PATCH bpf-next v2 1/2] bpf: verifier: Support eliding map lookup nullness

2024-09-22 Thread Daniel Xu
On Fri, Sep 20, 2024 at 03:05:35PM GMT, Eduard Zingerman wrote:
> On Sun, 2024-09-15 at 21:45 -0600, Daniel Xu wrote:
> > This commit allows progs to elide a null check on statically known map
> > lookup keys. In other words, if the verifier can statically prove that
> > the lookup will be in-bounds, allow the prog to drop the null check.
> > 
> > This is useful for two reasons:
> > 
> > 1. Large numbers of nullness checks (especially when they cannot fail)
> >    unnecessarily push the prog towards BPF_COMPLEXITY_LIMIT_JMP_SEQ.
> > 2. It forms a tighter contract between programmer and verifier.
> > 
> > For (1), bpftrace is starting to make heavier use of percpu scratch
> > maps. As a result, for user scripts with a large number of unrolled loops,
> > we are starting to hit jump complexity verification errors. These
> > percpu lookups cannot fail anyway, as we only use static key values.
> > Eliding nullness probably results in less work for the verifier as well.
> > 
> > For (2), percpu scratch maps are often used as a larger stack, as the
> > current stack is limited to 512 bytes. In these situations, it is
> > desirable for the programmer to express: "this lookup should never fail,
> > and if it does, it means I messed up the code". By omitting the null
> > check, the programmer can "ask" the verifier to double check the logic.
> 
> Nit: maybe add a few lines on why tools/testing/selftests/bpf/progs/iters.c
>  has to be changed.

Ack.

> 
> [...]
> 
> > +/* Returns constant key value if possible, else -1 */
> > +static long get_constant_map_key(struct bpf_verifier_env *env,
> > +struct bpf_reg_state *key)
> > +{
> > +   struct bpf_func_state *state = func(env, key);
> > +   struct bpf_reg_state *reg;
> > +   int stack_off;
> > +   int slot;
> > +   int spi;
> > +
> > +   if (key->type != PTR_TO_STACK)
> > +   return -1;
> > +   if (!tnum_is_const(key->var_off))
> > +   return -1;
> > +
> > +   stack_off = key->off + key->var_off.value;
> > +   slot = -stack_off - 1;
> > +   if (slot >= state->allocated_stack)
> > +   /* Stack uninitialized */
> > +   return -1;
> 
> I'm not sure the verifier guarantees that key->off is negative.
> E.g. the following simple program:
> 
> 0: (b7) r1 = 16   ; R1_w=16
> 1: (bf) r2 = r10  ; R2_w=fp0 R10=fp0
> 2: (0f) r2 += r1
> mark_precise: frame0: last_idx 2 first_idx 0 subseq_idx -1 
> mark_precise: frame0: regs=r1 stack= before 1: (bf) r2 = r10
> mark_precise: frame0: regs=r1 stack= before 0: (b7) r1 = 16
> 3: R1_w=16 R2_w=fp16
> 
> => I think 'slot' should be checked to be >= 0.

Ah, so in the case where the stack grows "up", right? That seems invalid,
but it's probably good to check.

> 
> > +
> > +   spi = slot / BPF_REG_SIZE;
> > +   reg = &state->stack[spi].spilled_ptr;
> > +   if (!tnum_is_const(reg->var_off))
> > +   /* Stack value not statically known */
> > +   return -1;
> > +
> > +   return reg->var_off.value;
> > +}
> > +
> >  static int get_helper_proto(struct bpf_verifier_env *env, int func_id,
> > const struct bpf_func_proto **ptr)
> >  {
> > @@ -10511,6 +10557,15 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
> > env->insn_aux_data[insn_idx].storage_get_func_atomic = true;
> > }
> >  
> > +   /* Logically we are trying to check on key register state before
> > +* the helper is called, so process here. Otherwise argument processing
> > +* may clobber the spilled key values.
> > +*/
> > +   regs = cur_regs(env);
> > +   if (func_id == BPF_FUNC_map_lookup_elem)
> > +   meta.const_map_key = get_constant_map_key(env, &regs[BPF_REG_2]);
> 
> Nit: there is a long 'switch (func_id)' slightly below this point,
>  maybe move this check there?

I had that initially, but discovered that the verifier marks the stack value
as unknown as part of check_func_arg(). I _think_ it was:

if (is_spilled_reg(&state->stack[spi]) &&
    (state->stack[spi].spilled_ptr.type == SCALAR_VALUE ||
     env->allow_ptr_leaks)) {
	if (clobber) {
		__mark_reg_unknown(env, &state->stack[spi].spilled_ptr);
		for (j = 0; j < BPF_REG_SIZE; j++)
			scrub_spilled_slot(&state->stack[spi].slot_type[j]);
	}
	goto mark;
}

I remember spending some time debugging it, which is why I left that
comment above this code.

Thanks for reviewing!



Re: [PATCH bpf-next v2 1/2] bpf: verifier: Support eliding map lookup nullness

2024-09-20 Thread Eduard Zingerman
On Sun, 2024-09-15 at 21:45 -0600, Daniel Xu wrote:
> This commit allows progs to elide a null check on statically known map
> lookup keys. In other words, if the verifier can statically prove that
> the lookup will be in-bounds, allow the prog to drop the null check.
> 
> This is useful for two reasons:
> 
> 1. Large numbers of nullness checks (especially when they cannot fail)
>    unnecessarily push the prog towards BPF_COMPLEXITY_LIMIT_JMP_SEQ.
> 2. It forms a tighter contract between programmer and verifier.
> 
> For (1), bpftrace is starting to make heavier use of percpu scratch
> maps. As a result, for user scripts with a large number of unrolled loops,
> we are starting to hit jump complexity verification errors. These
> percpu lookups cannot fail anyway, as we only use static key values.
> Eliding nullness probably results in less work for the verifier as well.
> 
> For (2), percpu scratch maps are often used as a larger stack, as the
> current stack is limited to 512 bytes. In these situations, it is
> desirable for the programmer to express: "this lookup should never fail,
> and if it does, it means I messed up the code". By omitting the null
> check, the programmer can "ask" the verifier to double check the logic.

Nit: maybe add a few lines on why tools/testing/selftests/bpf/progs/iters.c
 has to be changed.

[...]

> +/* Returns constant key value if possible, else -1 */
> +static long get_constant_map_key(struct bpf_verifier_env *env,
> +  struct bpf_reg_state *key)
> +{
> + struct bpf_func_state *state = func(env, key);
> + struct bpf_reg_state *reg;
> + int stack_off;
> + int slot;
> + int spi;
> +
> + if (key->type != PTR_TO_STACK)
> + return -1;
> + if (!tnum_is_const(key->var_off))
> + return -1;
> +
> + stack_off = key->off + key->var_off.value;
> + slot = -stack_off - 1;
> + if (slot >= state->allocated_stack)
> + /* Stack uninitialized */
> + return -1;

I'm not sure the verifier guarantees that key->off is negative.
E.g. the following simple program:

0: (b7) r1 = 16   ; R1_w=16
1: (bf) r2 = r10  ; R2_w=fp0 R10=fp0
2: (0f) r2 += r1
mark_precise: frame0: last_idx 2 first_idx 0 subseq_idx -1 
mark_precise: frame0: regs=r1 stack= before 1: (bf) r2 = r10
mark_precise: frame0: regs=r1 stack= before 0: (b7) r1 = 16
3: R1_w=16 R2_w=fp16

=> I think 'slot' should be checked to be >= 0.
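
I.e., something along these lines (sketch only, on top of the posted hunk):

	slot = -stack_off - 1;
	if (slot < 0)
		/* Key points at or above fp0 */
		return -1;
	if (slot >= state->allocated_stack)
		/* Stack uninitialized */
		return -1;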

> +
> + spi = slot / BPF_REG_SIZE;
> + reg = &state->stack[spi].spilled_ptr;
> + if (!tnum_is_const(reg->var_off))
> + /* Stack value not statically known */
> + return -1;
> +
> + return reg->var_off.value;
> +}
> +
>  static int get_helper_proto(struct bpf_verifier_env *env, int func_id,
>   const struct bpf_func_proto **ptr)
>  {
> @@ -10511,6 +10557,15 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
>   env->insn_aux_data[insn_idx].storage_get_func_atomic = true;
>   }
>  
> + /* Logically we are trying to check on key register state before
> +  * the helper is called, so process here. Otherwise argument processing
> +  * may clobber the spilled key values.
> +  */
> + regs = cur_regs(env);
> + if (func_id == BPF_FUNC_map_lookup_elem)
> + meta.const_map_key = get_constant_map_key(env, &regs[BPF_REG_2]);

Nit: there is a long 'switch (func_id)' slightly below this point,
 maybe move this check there?

> +
> +
>   meta.func_id = func_id;
>   /* check args */
>   for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) {

[...]




[PATCH bpf-next v2 1/2] bpf: verifier: Support eliding map lookup nullness

2024-09-15 Thread Daniel Xu
This commit allows progs to elide a null check on statically known map
lookup keys. In other words, if the verifier can statically prove that
the lookup will be in-bounds, allow the prog to drop the null check.

This is useful for two reasons:

1. Large numbers of nullness checks (especially when they cannot fail)
   unnecessarily push the prog towards BPF_COMPLEXITY_LIMIT_JMP_SEQ.
2. It forms a tighter contract between programmer and verifier.

For (1), bpftrace is starting to make heavier use of percpu scratch
maps. As a result, for user scripts with a large number of unrolled loops,
we are starting to hit jump complexity verification errors. These
percpu lookups cannot fail anyway, as we only use static key values.
Eliding nullness probably results in less work for the verifier as well.

For (2), percpu scratch maps are often used as a larger stack, as the
current stack is limited to 512 bytes. In these situations, it is
desirable for the programmer to express: "this lookup should never fail,
and if it does, it means I messed up the code". By omitting the null
check, the programmer can "ask" the verifier to double check the logic.
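
As an illustration, a minimal sketch of the pattern this enables
(hypothetical map/struct/section names; only passes verification without
the null check once this patch is applied):

	#include <linux/bpf.h>
	#include <bpf/bpf_helpers.h>

	struct scratch {
		__u64 data[128];	/* larger than the 512 byte BPF stack */
	};

	struct {
		__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
		__uint(max_entries, 1);
		__type(key, __u32);
		__type(value, struct scratch);
	} scratch_map SEC(".maps");

	SEC("tp/syscalls/sys_enter_nanosleep")
	int use_scratch(void *ctx)
	{
		const __u32 key = 0;	/* statically known, in-bounds key */
		struct scratch *s;

		s = bpf_map_lookup_elem(&scratch_map, &key);
		/* No `if (!s) return 0;` needed: the verifier can prove
		 * key 0 < max_entries and drops PTR_MAYBE_NULL from the
		 * returned pointer.
		 */
		s->data[0] = 1;
		return 0;
	}

	char _license[] SEC("license") = "GPL";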

Signed-off-by: Daniel Xu 
---
 kernel/bpf/verifier.c | 64 ++-
 tools/testing/selftests/bpf/progs/iters.c | 14 ++--
 .../selftests/bpf/progs/map_kptr_fail.c   |  2 +-
 .../selftests/bpf/progs/verifier_map_in_map.c |  2 +-
 .../testing/selftests/bpf/verifier/map_kptr.c |  2 +-
 5 files changed, 73 insertions(+), 11 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 7df5c29293a4..e0c9c53ce9c0 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -282,6 +282,7 @@ struct bpf_call_arg_meta {
u32 ret_btf_id;
u32 subprogno;
struct btf_field *kptr_field;
+   long const_map_key;
 };
 
 struct bpf_kfunc_call_arg_meta {
@@ -10414,6 +10415,51 @@ static void update_loop_inline_state(struct bpf_verifier_env *env, u32 subprogno
 state->callback_subprogno == subprogno);
 }
 
+/* Returns whether or not the given map type can potentially elide
+ * lookup return value nullness check. This is possible if the key
+ * is statically known.
+ */
+static bool can_elide_value_nullness(enum bpf_map_type type)
+{
+   switch (type) {
+   case BPF_MAP_TYPE_ARRAY:
+   case BPF_MAP_TYPE_PERCPU_ARRAY:
+   return true;
+   default:
+   return false;
+   }
+}
+
+/* Returns constant key value if possible, else -1 */
+static long get_constant_map_key(struct bpf_verifier_env *env,
+struct bpf_reg_state *key)
+{
+   struct bpf_func_state *state = func(env, key);
+   struct bpf_reg_state *reg;
+   int stack_off;
+   int slot;
+   int spi;
+
+   if (key->type != PTR_TO_STACK)
+   return -1;
+   if (!tnum_is_const(key->var_off))
+   return -1;
+
+   stack_off = key->off + key->var_off.value;
+   slot = -stack_off - 1;
+   if (slot >= state->allocated_stack)
+   /* Stack uninitialized */
+   return -1;
+
+   spi = slot / BPF_REG_SIZE;
+   reg = &state->stack[spi].spilled_ptr;
+   if (!tnum_is_const(reg->var_off))
+   /* Stack value not statically known */
+   return -1;
+
+   return reg->var_off.value;
+}
+
 static int get_helper_proto(struct bpf_verifier_env *env, int func_id,
const struct bpf_func_proto **ptr)
 {
@@ -10511,6 +10557,15 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
	env->insn_aux_data[insn_idx].storage_get_func_atomic = true;
}
 
+   /* Logically we are trying to check on key register state before
+* the helper is called, so process here. Otherwise argument processing
+* may clobber the spilled key values.
+*/
+   regs = cur_regs(env);
+   if (func_id == BPF_FUNC_map_lookup_elem)
+   meta.const_map_key = get_constant_map_key(env, &regs[BPF_REG_2]);
+
+
meta.func_id = func_id;
/* check args */
for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) {
@@ -10771,10 +10826,17 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
"kernel subsystem misconfigured verifier\n");
return -EINVAL;
}
+
+   if (func_id == BPF_FUNC_map_lookup_elem &&
+   can_elide_value_nullness(meta.map_ptr->map_type) &&
+   meta.const_map_key >= 0 &&
+   meta.const_map_key < meta.map_ptr->max_entries)
+   ret_flag &= ~PTR_MAYBE_NULL;
+
regs[BPF_REG_0].map_ptr = meta.map_ptr;
regs[BPF_REG_0].map_uid = meta.map_uid;
regs[BPF_REG_0].type = PTR_TO_MAP_VALUE | ret_flag;
-   if (!type_may_be_null(ret_typ