Re: [iovisor-dev] [PATCH v3 net-next] bpf/verifier: track liveness for pruning
On 8/21/17 2:00 PM, Daniel Borkmann wrote:
> On 08/21/2017 10:44 PM, Edward Cree wrote:
>> On 21/08/17 21:27, Daniel Borkmann wrote:
>>> On 08/21/2017 08:36 PM, Edward Cree wrote:
>>>> On 19/08/17 00:37, Alexei Starovoitov wrote:
>>>> [...]
>>>> I'm tempted to just rip out env->varlen_map_value_access and always
>>>> check the whole thing, because honestly I don't know what it was
>>>> meant to do originally or how it can ever do any useful pruning.
>>>> While drastic, it does cause your test case to pass.
>>>
>>> Original intention from 484611357c19 ("bpf: allow access into map
>>> value arrays") was that it wouldn't potentially make pruning worse if
>>> PTR_TO_MAP_VALUE_ADJ was not used, meaning that we wouldn't need to
>>> take reg state's min_value and max_value into account for state
>>> checking; this was basically due to min_value / max_value being
>>> adjusted/tracked on every alu/jmp ops for involved regs (e.g.
>>> adjust_reg_min_max_vals() and others that mangle them) even if we
>>> have the case that no actual dynamic map access is used throughout
>>> the program.  To give an example on net tree, the bpf_lxc.o prog's
>>> section increases from 36,386 to 68,226 when
>>> env->varlen_map_value_access is always true, so it does have an
>>> effect.  Did you do some checks on this on net-next?
>>
>> I tested with the cilium progs and saw no change in insn count.  I
>> suspect that for the normal case I already killed this optimisation
>> when I did my unification patch; it was previously about ignoring
>> min/max values on all regs (including scalars), whereas on net-next
>> it only ignores them on map_value pointers; in practice this is
>> useless because we tend to still have the offset scalar sitting in a
>> register somewhere.  (Come to think of it, this may have been behind
>> a large chunk of the #insn increase that my patches caused.)
>
> Yeah, this would seem plausible.

>> Since we use umax_value in find_good_pkt_pointers() now (to check
>> against MAX_PACKET_OFF and ensure our reg->range is really ok), we
>> can't just stop caring about all min/max values just because we
>> haven't done any variable map accesses.  I don't see a way around
>> this.
>
> Agree, was thinking the same.  If there's not really a regression in
> terms of complexity, then let's kill the flag.

+1

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2489e67b65f6..908d13b2a2aa 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3582,7 +3582,7 @@ static int do_check(struct bpf_verifier_env *env)
 	init_reg_state(regs);
 	state->parent = NULL;
 	insn_idx = 0;
-	env->varlen_map_value_access = false;
+	env->varlen_map_value_access = true;

makes _zero_ difference on cilium*.o tests, so let's just kill that
workaround.

___
iovisor-dev mailing list
iovisor-dev@lists.iovisor.org
https://lists.iovisor.org/mailman/listinfo/iovisor-dev
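To sketch the interaction being discussed, here is a toy model (illustrative names such as `toy_good_pkt_range` and `TOY_MAX_PACKET_OFF`, not the actual kernel code) of why umax_value now matters even without variable map accesses: before a packet range can be trusted, the unsigned maximum of the offset must be known not to run past the maximum packet offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the check discussed above: before find_good_pkt_pointers()
 * grants a packet range, the variable part of the offset -- bounded by
 * umax_value -- must not be able to push an access past MAX_PACKET_OFF,
 * otherwise the computed range would be meaningless. */
#define TOY_MAX_PACKET_OFF 0xffff

struct toy_reg {
	uint64_t umax_value;	/* max possible unsigned offset value */
	int range;		/* verified-safe packet range in bytes */
};

static bool toy_good_pkt_range(struct toy_reg *reg, int len)
{
	if (reg->umax_value > TOY_MAX_PACKET_OFF)
		return false;	/* offset could run past the packet */
	reg->range = len;
	return true;
}
```

Since this bound is consulted on every packet-pointer comparison, min/max tracking cannot be switched off just because no variable map access occurred, which is the argument for killing the flag.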
Re: [iovisor-dev] [PATCH v3 net-next] bpf/verifier: track liveness for pruning
On 08/21/2017 10:44 PM, Edward Cree wrote:
> On 21/08/17 21:27, Daniel Borkmann wrote:
>> On 08/21/2017 08:36 PM, Edward Cree wrote:
>>> On 19/08/17 00:37, Alexei Starovoitov wrote:
>>> [...]
>>> I'm tempted to just rip out env->varlen_map_value_access and always
>>> check the whole thing, because honestly I don't know what it was
>>> meant to do originally or how it can ever do any useful pruning.
>>> While drastic, it does cause your test case to pass.
>>
>> Original intention from 484611357c19 ("bpf: allow access into map
>> value arrays") was that it wouldn't potentially make pruning worse if
>> PTR_TO_MAP_VALUE_ADJ was not used, meaning that we wouldn't need to
>> take reg state's min_value and max_value into account for state
>> checking; this was basically due to min_value / max_value being
>> adjusted/tracked on every alu/jmp ops for involved regs (e.g.
>> adjust_reg_min_max_vals() and others that mangle them) even if we
>> have the case that no actual dynamic map access is used throughout
>> the program.  To give an example on net tree, the bpf_lxc.o prog's
>> section increases from 36,386 to 68,226 when
>> env->varlen_map_value_access is always true, so it does have an
>> effect.  Did you do some checks on this on net-next?
>
> I tested with the cilium progs and saw no change in insn count.  I
> suspect that for the normal case I already killed this optimisation
> when I did my unification patch; it was previously about ignoring
> min/max values on all regs (including scalars), whereas on net-next
> it only ignores them on map_value pointers; in practice this is
> useless because we tend to still have the offset scalar sitting in a
> register somewhere.  (Come to think of it, this may have been behind
> a large chunk of the #insn increase that my patches caused.)

Yeah, this would seem plausible.

> Since we use umax_value in find_good_pkt_pointers() now (to check
> against MAX_PACKET_OFF and ensure our reg->range is really ok), we
> can't just stop caring about all min/max values just because we
> haven't done any variable map accesses.  I don't see a way around
> this.

Agree, was thinking the same.  If there's not really a regression in
terms of complexity, then let's kill the flag.
Re: [iovisor-dev] [PATCH v3 net-next] bpf/verifier: track liveness for pruning
On 18/08/17 15:16, Edward Cree wrote:
> On 18/08/17 04:21, Alexei Starovoitov wrote:
>> It seems you're trying to sort-of do per-fake-basic block liveness
>> analysis, but our state_list_marks are not correct if we go with
>> canonical basic block definition, since we mark the jump insn and
>> not insn after the branch and not every basic block boundary is
>> properly detected.
> I think the reason this works is that jump insns can't do writes.
> [snip]
> the sl->state will never have any write marks and it'll all just work.
> But I should really test that!

I tested this, and found that, no, sl->state can have write marks, and
the algorithm will get the wrong answer in that case.  So I've got a
patch to make the first iteration ignore write marks, as part of a
series which I will post shortly.  When I do so, please re-do your
tests with adding state_list_marks in strange and exciting places; it
should work wherever you put them.  Like you say, it "magically doesn't
depend on proper basic block boundaries", and that's because really
pruning is just a kind of checkpointing that just happens to be most
effective when done just after a jump (pop_stack).

Can I have a SOB for your "grr" test program, so I can include it in
the series?

-Ed
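The "make the first iteration ignore write marks" fix can be sketched with a toy model (illustrative names like `toy_propagate`, not the kernel code): when a branch is pruned against an explored state `sl`, reads made by sl's continuation are imported into the current state's parent chain, and sl's own write marks are deliberately not consulted, since those writes happened on the already-explored path, not ours.

```c
#include <assert.h>
#include <stddef.h>

/* Toy liveness model: one register per state, marks as bit flags. */
#define REG_LIVE_NONE    0
#define REG_LIVE_READ    1
#define REG_LIVE_WRITTEN 2

struct toy_state {
	int live;		/* liveness marks for one register */
	struct toy_state *parent;
};

/* "Reads go up": forward a read through the parent chain until some
 * block's own write screens it off. */
static void toy_mark_read(struct toy_state *st)
{
	struct toy_state *parent = st->parent;

	while (parent) {
		if (st->live & REG_LIVE_WRITTEN)
			break;	/* this block wrote the reg itself */
		parent->live |= REG_LIVE_READ;
		st = parent;
		parent = st->parent;
	}
}

/* First-iteration import at a pruning point: sl's write marks are
 * deliberately ignored -- only its read marks matter to us. */
static void toy_propagate(const struct toy_state *sl, struct toy_state *cur)
{
	if (sl->live & REG_LIVE_READ)
		toy_mark_read(cur);
}
```

For example, a state `sl` carrying both READ and WRITTEN still propagates its read into `cur`'s chain; only writes made by `cur` or its ancestors can screen that read off.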
Re: [iovisor-dev] [PATCH v3 net-next] bpf/verifier: track liveness for pruning
On 19/08/17 00:37, Alexei Starovoitov wrote:
> that '14: safe' above is not correct.
>
> Disabling liveness as:
> @@ -3282,7 +3288,7 @@ static bool regsafe(struct bpf_reg_state *rold,
> struct bpf_reg_state *rcur,
>  		    bool varlen_map_access, struct idpair *idmap)
>  {
> -	if (!(rold->live & REG_LIVE_READ))
> +	if (0 && !(rold->live & REG_LIVE_READ))
>
> makes the test work properly and proper verifier output is:
> from 9 to 11: R0=map_value(id=0,off=0,ks=8,vs=48,imm=0)
> R1=inv(id=0,smax_value=10) R2=inv11 R10=fp0
> 11: (64) (u32) r1 <<= (u32) 2
> 12: (0f) r0 += r1
> 13: (05) goto pc+0
> 14: (7a) *(u64 *)(r0 +0) = 4
> R0=map_value(id=0,off=0,ks=8,vs=48,umax_value=17179869180,var_off=(0x0; 0x3fffc))
> R1=inv(id=0,umax_value=17179869180,var_off=(0x0; 0x3fffc)) R2=inv11 R10=fp0
> R0 unbounded memory access, make sure to bounds check any array access
> into a map
>
> I don't yet understand the underlying reason. R0 should have been
> marked as LIVE_READ by ST_MEM...
> Please help debug it further.

Having added a bunch of debugging, I found out that indeed R0 _had_
been marked as LIVE_READ.  The problem was that
env->varlen_map_value_access wasn't set, because the access was at a
constant offset (imm=0), but then when we compare register states we
just say "oh yeah, it's a map_value, we don't need to look at the
var_off".  This probably results from my unifying PTR_TO_MAP_VALUE
with PTR_TO_MAP_VALUE_ADJ; before that the old and new R0 would have
different reg->type so wouldn't match.

I'm tempted to just rip out env->varlen_map_value_access and always
check the whole thing, because honestly I don't know what it was meant
to do originally or how it can ever do any useful pruning.  While
drastic, it does cause your test case to pass.

I'm not quite sure why your test passed when you disabled liveness,
though; that I can't explain.

-Ed
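The mismatch being described can be shown with a toy model of the state comparison (illustrative names like `toy_regsafe`, not the real verifier code): when varlen_map_access is false, map_value pointers compare equal by type alone, so an old state with a constant offset wrongly "matches" a new state whose offset is variable, and the store at insn 14 gets pruned as safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy register state: just a type and the offset bits that may vary
 * (a stand-in for var_off tracking). */
enum toy_type { TOY_MAP_VALUE };

struct toy_reg {
	enum toy_type type;
	uint64_t var_off_mask;	/* offset bits that may vary */
};

static bool toy_regsafe(const struct toy_reg *rold,
			const struct toy_reg *rcur, bool varlen_map_access)
{
	if (rold->type != rcur->type)
		return false;
	if (!varlen_map_access)
		return true;	/* the buggy shortcut: type-only match */
	/* old state must be at least as conservative as the new one:
	 * every bit that can vary now must also have varied before */
	return (rcur->var_off_mask & ~rold->var_off_mask) == 0;
}
```

Always performing the variable-offset comparison (i.e. ripping out the flag) makes the constant-offset old state refuse to match the variable-offset new state, so the new path gets verified instead of pruned.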
Re: [iovisor-dev] [PATCH v3 net-next] bpf/verifier: track liveness for pruning
On 8/18/17 7:16 AM, Edward Cree wrote:
> On 18/08/17 04:21, Alexei Starovoitov wrote:
>> On 8/15/17 12:34 PM, Edward Cree wrote:
>>> State of a register doesn't matter if it wasn't read in reaching an
>>> exit; a write screens off all reads downstream of it from all
>>> explored_states upstream of it.
>>> This allows us to prune many more branches; here are some processed
>>> insn counts for some Cilium programs:
>>>         Program                  before   after
>>>         bpf_lb_opt_-DLB_L3.o       6515    3361
>>>         bpf_lb_opt_-DLB_L4.o       8976    5176
>>>         bpf_lb_opt_-DUNKNOWN.o     2960    1137
>>>         bpf_lxc_opt_-DDROP_ALL.o  95412   48537
>>>         bpf_lxc_opt_-DUNKNOWN.o  141706   78718
>>>         bpf_netdev.o              24251   17995
>>>         bpf_overlay.o             10999    9385
>>>
>>> The runtime is also improved; here are 'time' results in ms:
>>>         Program                  before   after
>>>         bpf_lb_opt_-DLB_L3.o         24       6
>>>         bpf_lb_opt_-DLB_L4.o         26      11
>>>         bpf_lb_opt_-DUNKNOWN.o       11       2
>>>         bpf_lxc_opt_-DDROP_ALL.o   1288     139
>>>         bpf_lxc_opt_-DUNKNOWN.o    1768     234
>>>         bpf_netdev.o                 62      31
>>>         bpf_overlay.o                15      13
>>>
>>> Signed-off-by: Edward Cree
>>
>> this is one ingenious hack. Love it!
>> It took me a whole day to understand most of it, but I still have a
>> few questions:
>>
>>> +
>>> +static void propagate_liveness(const struct bpf_verifier_state *state,
>>> +			       struct bpf_verifier_state *parent)
>>
>> here the name 'parent' is very confusing, since for the first
>> iteration of the loop below it transfers lives from the 'neighbor'
>> state to the current state and only then traverses the link of
>> parents in the current.
>> Would be good to document it, since I was struggling the most with
>> this name until I realized that the way you build the parent link
>> list in is_state_visited() is the actual sequence of roughly basic
>> blocks and the name 'parent' applies there, but not for the first
>> iteration of this function.
>
> In the way I think about it, it really is a parent, by which I mean
> "a block whose output registers are our inputs".  When we call
> propagate_liveness(), we're saying "here's another block whose
> outputs could be our inputs".  So while the 'state->parent'
> datastructure is a linked list, the 'parentage' relationship is
> really a DAG.
> While 'state->parent' always has to point at an explored_state (which
> are roughly immutable), the 'parent' we pass into propagate_liveness
> is just env->cur_state which is mutable.  The weird "it's not
> actually state->parent" comes from (a) there's only room for one
> state->parent and (b) we didn't create a new_sl for it because we're
> pruning.
> I agree this is not at all explained in the code or comments, except
> for the glib "Registers read by the continuation are read by us".  I
> will try to write some comments and/or documentation explaining how
> and why the liveness tracking works, because it _is_ subtle and a
> week from now _I_ probably won't understand it either.

a bit twisted, but yeah agree with this angle of thinking about it :)

>>> @@ -3407,6 +3501,14 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
>>>  	memcpy(&new_sl->state, &env->cur_state, sizeof(env->cur_state));
>>>  	new_sl->next = env->explored_states[insn_idx];
>>>  	env->explored_states[insn_idx] = new_sl;
>>> +	/* connect new state to parentage chain */
>>> +	env->cur_state.parent = &new_sl->state;
>>> +	/* clear liveness marks in current state */
>>> +	for (i = 0; i < BPF_REG_FP; i++)
>>> +		env->cur_state.regs[i].live = REG_LIVE_NONE;
>>> +	for (i = 0; i < MAX_BPF_STACK / BPF_REG_SIZE; i++)
>>> +		if (env->cur_state.stack_slot_type[i * BPF_REG_SIZE] == STACK_SPILL)
>>> +			env->cur_state.spilled_regs[i].live = REG_LIVE_NONE;
>>
>> and this part I don't get at all.
>
> The idea behind the liveness marks is that 'writes go down and reads
> go up and up'.  That is, each state 'tells' its parent state 'I read
> from you' (which can then end up recursing), but it remembers for
> itself that it wrote to a register and therefore should swallow
> subsequent reads rather than forwarding them to its parent.
> While a block is being walked, its liveness marks _only_ record
> writes, and then only writes _made in that block_.  The read marks go
> to the parent, which is "some block that has been walked", but whose
> continuations haven't all been walked yet so (by looplessness) won't
> show up as a pruning opportunity.
> An explored_state's liveness marks record the writes _done in
> reaching that state_ from its parent, but the reads _done by the
> state's children_.  A cur_state's liveness marks do the same, but it
> doesn't have any children yet so it never gets read-marks (until it
> gets turned into an explored_state, of course).
> We clear our liveness marks because the writes our parent block did
> are not writes we did, so they don't screen off our reads from our
> parent; they only screen off our (or our parent's) reads from our
> grandparent.

got it. makes sense. would be good to document it.

>> It seems you're trying to sort-of do per-fake-basic block liveness
>> analysis, but our state_list_marks are not correct if we go with
>> canonical
Re: [iovisor-dev] [PATCH v3 net-next] bpf/verifier: track liveness for pruning
On 18/08/17 04:21, Alexei Starovoitov wrote:
> On 8/15/17 12:34 PM, Edward Cree wrote:
>> State of a register doesn't matter if it wasn't read in reaching an
>> exit; a write screens off all reads downstream of it from all
>> explored_states upstream of it.
>> This allows us to prune many more branches; here are some processed
>> insn counts for some Cilium programs:
>>         Program                  before   after
>>         bpf_lb_opt_-DLB_L3.o       6515    3361
>>         bpf_lb_opt_-DLB_L4.o       8976    5176
>>         bpf_lb_opt_-DUNKNOWN.o     2960    1137
>>         bpf_lxc_opt_-DDROP_ALL.o  95412   48537
>>         bpf_lxc_opt_-DUNKNOWN.o  141706   78718
>>         bpf_netdev.o              24251   17995
>>         bpf_overlay.o             10999    9385
>>
>> The runtime is also improved; here are 'time' results in ms:
>>         Program                  before   after
>>         bpf_lb_opt_-DLB_L3.o         24       6
>>         bpf_lb_opt_-DLB_L4.o         26      11
>>         bpf_lb_opt_-DUNKNOWN.o       11       2
>>         bpf_lxc_opt_-DDROP_ALL.o   1288     139
>>         bpf_lxc_opt_-DUNKNOWN.o    1768     234
>>         bpf_netdev.o                 62      31
>>         bpf_overlay.o                15      13
>>
>> Signed-off-by: Edward Cree
>
> this is one ingenious hack. Love it!
> It took me a whole day to understand most of it, but I still have a
> few questions:
>
>> +
>> +static void propagate_liveness(const struct bpf_verifier_state *state,
>> +			       struct bpf_verifier_state *parent)
>
> here the name 'parent' is very confusing, since for the first
> iteration of the loop below it transfers lives from the 'neighbor'
> state to the current state and only then traverses the link of
> parents in the current.
> Would be good to document it, since I was struggling the most with
> this name until I realized that the way you build the parent link
> list in is_state_visited() is the actual sequence of roughly basic
> blocks and the name 'parent' applies there, but not for the first
> iteration of this function.

In the way I think about it, it really is a parent, by which I mean "a
block whose output registers are our inputs".  When we call
propagate_liveness(), we're saying "here's another block whose outputs
could be our inputs".  So while the 'state->parent' datastructure is a
linked list, the 'parentage' relationship is really a DAG.

While 'state->parent' always has to point at an explored_state (which
are roughly immutable), the 'parent' we pass into propagate_liveness
is just env->cur_state which is mutable.  The weird "it's not actually
state->parent" comes from (a) there's only room for one state->parent
and (b) we didn't create a new_sl for it because we're pruning.

I agree this is not at all explained in the code or comments, except
for the glib "Registers read by the continuation are read by us".  I
will try to write some comments and/or documentation explaining how
and why the liveness tracking works, because it _is_ subtle and a week
from now _I_ probably won't understand it either.

>> @@ -3407,6 +3501,14 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
>>  	memcpy(&new_sl->state, &env->cur_state, sizeof(env->cur_state));
>>  	new_sl->next = env->explored_states[insn_idx];
>>  	env->explored_states[insn_idx] = new_sl;
>> +	/* connect new state to parentage chain */
>> +	env->cur_state.parent = &new_sl->state;
>> +	/* clear liveness marks in current state */
>> +	for (i = 0; i < BPF_REG_FP; i++)
>> +		env->cur_state.regs[i].live = REG_LIVE_NONE;
>> +	for (i = 0; i < MAX_BPF_STACK / BPF_REG_SIZE; i++)
>> +		if (env->cur_state.stack_slot_type[i * BPF_REG_SIZE] == STACK_SPILL)
>> +			env->cur_state.spilled_regs[i].live = REG_LIVE_NONE;
>
> and this part I don't get at all.

The idea behind the liveness marks is that 'writes go down and reads
go up and up'.  That is, each state 'tells' its parent state 'I read
from you' (which can then end up recursing), but it remembers for
itself that it wrote to a register and therefore should swallow
subsequent reads rather than forwarding them to its parent.

While a block is being walked, its liveness marks _only_ record
writes, and then only writes _made in that block_.  The read marks go
to the parent, which is "some block that has been walked", but whose
continuations haven't all been walked yet so (by looplessness) won't
show up as a pruning opportunity.

An explored_state's liveness marks record the writes _done in reaching
that state_ from its parent, but the reads _done by the state's
children_.  A cur_state's liveness marks do the same, but it doesn't
have any children yet so it never gets read-marks (until it gets
turned into an explored_state, of course).

We clear our liveness marks because the writes our parent block did
are not writes we did, so they don't screen off our reads from our
parent; they only screen off our (or our parent's) reads from our
grandparent.

> It seems you're trying to sort-of do per-fake-basic block liveness
> analysis, but our state_list_marks are not correct if we go with
> canonical basic block definition, since we mark the jump insn and
> not insn after the branch and not every basic block boundary is
> properly detected.
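The checkpoint step quoted above can be sketched as a toy model (illustrative names like `toy_checkpoint`, not the kernel code): snapshot the current state as an explored state, make the snapshot the walker's parent, and clear the walker's marks so they record only writes made in the new block. The parent's old write mark then screens reads from the grandparent but not from the parent itself, exactly as described.

```c
#include <assert.h>
#include <stddef.h>

/* Toy liveness model: one register per state, marks as bit flags. */
#define REG_LIVE_NONE    0
#define REG_LIVE_READ    1
#define REG_LIVE_WRITTEN 2

struct toy_state {
	int live;		/* liveness marks for one register */
	struct toy_state *parent;
};

/* "Reads go up": mark each ancestor we depend on, stopping once a
 * block's own write satisfies the read. */
static void toy_mark_read(struct toy_state *st)
{
	struct toy_state *parent = st->parent;

	while (parent) {
		if (st->live & REG_LIVE_WRITTEN)
			break;	/* this block's write screens the read */
		parent->live |= REG_LIVE_READ;
		st = parent;
		parent = st->parent;
	}
}

/* Checkpoint: 'snap' becomes the explored state, 'cur' walks on. */
static void toy_checkpoint(struct toy_state *cur, struct toy_state *snap)
{
	*snap = *cur;		/* snapshot keeps the old block's marks  */
	cur->parent = snap;	/* connect new state to parentage chain  */
	cur->live = REG_LIVE_NONE; /* parent's writes are not our writes */
}
```

For example, if the old block wrote a register and the new block then reads it, the read mark lands on the snapshot (we depend on its output) but stops there, never reaching the grandparent.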
Re: [iovisor-dev] [PATCH v3 net-next] bpf/verifier: track liveness for pruning
From: Daniel Borkmann
Date: Wed, 16 Aug 2017 00:12:58 +0200

> On 08/15/2017 09:34 PM, Edward Cree wrote:
>> State of a register doesn't matter if it wasn't read in reaching an
>> exit; a write screens off all reads downstream of it from all
>> explored_states upstream of it.
>> This allows us to prune many more branches; here are some processed
>> insn counts for some Cilium programs:
>>         Program                  before   after
>>         bpf_lb_opt_-DLB_L3.o       6515    3361
>>         bpf_lb_opt_-DLB_L4.o       8976    5176
>>         bpf_lb_opt_-DUNKNOWN.o     2960    1137
>>         bpf_lxc_opt_-DDROP_ALL.o  95412   48537
>>         bpf_lxc_opt_-DUNKNOWN.o  141706   78718
>>         bpf_netdev.o              24251   17995
>>         bpf_overlay.o             10999    9385
>>
>> The runtime is also improved; here are 'time' results in ms:
>>         Program                  before   after
>>         bpf_lb_opt_-DLB_L3.o         24       6
>>         bpf_lb_opt_-DLB_L4.o         26      11
>>         bpf_lb_opt_-DUNKNOWN.o       11       2
>>         bpf_lxc_opt_-DDROP_ALL.o   1288     139
>>         bpf_lxc_opt_-DUNKNOWN.o    1768     234
>>         bpf_netdev.o                 62      31
>>         bpf_overlay.o                15      13
>>
>> Signed-off-by: Edward Cree
>
> Acked-by: Daniel Borkmann

Applied, nice work Edward.
Re: [iovisor-dev] [PATCH v3 net-next] bpf/verifier: track liveness for pruning
On 08/15/2017 09:34 PM, Edward Cree wrote:
> State of a register doesn't matter if it wasn't read in reaching an
> exit; a write screens off all reads downstream of it from all
> explored_states upstream of it.
> This allows us to prune many more branches; here are some processed
> insn counts for some Cilium programs:
>         Program                  before   after
>         bpf_lb_opt_-DLB_L3.o       6515    3361
>         bpf_lb_opt_-DLB_L4.o       8976    5176
>         bpf_lb_opt_-DUNKNOWN.o     2960    1137
>         bpf_lxc_opt_-DDROP_ALL.o  95412   48537
>         bpf_lxc_opt_-DUNKNOWN.o  141706   78718
>         bpf_netdev.o              24251   17995
>         bpf_overlay.o             10999    9385
>
> The runtime is also improved; here are 'time' results in ms:
>         Program                  before   after
>         bpf_lb_opt_-DLB_L3.o         24       6
>         bpf_lb_opt_-DLB_L4.o         26      11
>         bpf_lb_opt_-DUNKNOWN.o       11       2
>         bpf_lxc_opt_-DDROP_ALL.o   1288     139
>         bpf_lxc_opt_-DUNKNOWN.o    1768     234
>         bpf_netdev.o                 62      31
>         bpf_overlay.o                15      13
>
> Signed-off-by: Edward Cree

Acked-by: Daniel Borkmann