Re: [iovisor-dev] [PATCH v3 net-next] bpf/verifier: track liveness for pruning

2017-08-21 Thread Alexei Starovoitov via iovisor-dev

On 8/21/17 2:00 PM, Daniel Borkmann wrote:

On 08/21/2017 10:44 PM, Edward Cree wrote:

On 21/08/17 21:27, Daniel Borkmann wrote:

On 08/21/2017 08:36 PM, Edward Cree wrote:

On 19/08/17 00:37, Alexei Starovoitov wrote:

[...]

I'm tempted to just rip out env->varlen_map_value_access and always check
   the whole thing, because honestly I don't know what it was meant to do
   originally or how it can ever do any useful pruning.  While drastic, it
   does cause your test case to pass.


Original intention from 484611357c19 ("bpf: allow access into map
value arrays") was that it wouldn't potentially make pruning worse
if PTR_TO_MAP_VALUE_ADJ was not used, meaning that we wouldn't need
to take reg state's min_value and max_value into account for state
checking; this was basically because min_value / max_value are
adjusted/tracked on every alu/jmp op for the involved regs (e.g.
adjust_reg_min_max_vals() and others that mangle them) even if no
actual dynamic map access is used anywhere in the program. To give
an example on net tree, the bpf_lxc.o prog's section increases from
36,386 to 68,226 when env->varlen_map_value_access is always true,
so it does have an effect. Did you do some checks on this on
net-next?

I tested with the cilium progs and saw no change in insn count.  I
  suspect that for the normal case I already killed this optimisation
  when I did my unification patch: it was previously about ignoring
  min/max values on all regs (including scalars), whereas on net-next
  it only ignores them on map_value pointers; in practice this is
  useless because we tend to still have the offset scalar sitting in
  a register somewhere.  (Come to think of it, this may have been
  behind a large chunk of the #insn increase that my patches caused.)


Yeah, this would seem plausible.


Since we use umax_value in find_good_pkt_pointers() now (to check
  against MAX_PACKET_OFF and ensure our reg->range is really ok), we
  can't just stop caring about all min/max values just because we
  haven't done any variable map accesses.
I don't see a way around this.


Agree, was thinking the same. If there's not really a regression in
terms of complexity, then let's kill the flag.


+1

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2489e67b65f6..908d13b2a2aa 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3582,7 +3582,7 @@ static int do_check(struct bpf_verifier_env *env)
init_reg_state(regs);
state->parent = NULL;
insn_idx = 0;
-   env->varlen_map_value_access = false;
+   env->varlen_map_value_access = true;

makes _zero_ difference on cilium*.o tests, so let's just kill
that workaround.
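The pruning-soundness point behind killing the flag can be sketched with a toy model. All names below (toy_bounds, toy_regsafe, TOY_MAX_PACKET_OFF) are illustrative stand-ins, not the verifier's actual code: an already-verified state may only prune the current path if the current register's bounds are no looser than the verified ones, which is exactly why min/max values can't be ignored once umax_value feeds the packet-range check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy register state: only the bound this thread is concerned with. */
struct toy_bounds {
    uint64_t umax_value;
};

/* Illustrative stand-in for MAX_PACKET_OFF; not the kernel's value. */
#define TOY_MAX_PACKET_OFF 0xffffULL

/* An old (already verified) state covers the current one only if the
 * current unsigned maximum is no larger: everything proven for rold
 * (e.g. that umax_value <= TOY_MAX_PACKET_OFF held at a packet access)
 * then also holds for rcur.  Skipping this comparison, as the old
 * varlen_map_value_access shortcut effectively did, could prune a path
 * whose larger offset would have failed that check. */
static bool toy_regsafe(const struct toy_bounds *rold,
                        const struct toy_bounds *rcur)
{
    return rcur->umax_value <= rold->umax_value;
}
```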

___
iovisor-dev mailing list
iovisor-dev@lists.iovisor.org
https://lists.iovisor.org/mailman/listinfo/iovisor-dev


Re: [iovisor-dev] [PATCH v3 net-next] bpf/verifier: track liveness for pruning

2017-08-21 Thread Daniel Borkmann via iovisor-dev

On 08/21/2017 10:44 PM, Edward Cree wrote:

On 21/08/17 21:27, Daniel Borkmann wrote:

On 08/21/2017 08:36 PM, Edward Cree wrote:

On 19/08/17 00:37, Alexei Starovoitov wrote:

[...]

I'm tempted to just rip out env->varlen_map_value_access and always check
   the whole thing, because honestly I don't know what it was meant to do
   originally or how it can ever do any useful pruning.  While drastic, it
   does cause your test case to pass.


Original intention from 484611357c19 ("bpf: allow access into map
value arrays") was that it wouldn't potentially make pruning worse
if PTR_TO_MAP_VALUE_ADJ was not used, meaning that we wouldn't need
to take reg state's min_value and max_value into account for state
checking; this was basically because min_value / max_value are
adjusted/tracked on every alu/jmp op for the involved regs (e.g.
adjust_reg_min_max_vals() and others that mangle them) even if no
actual dynamic map access is used anywhere in the program. To give
an example on net tree, the bpf_lxc.o prog's section increases from
36,386 to 68,226 when env->varlen_map_value_access is always true,
so it does have an effect. Did you do some checks on this on
net-next?

I tested with the cilium progs and saw no change in insn count.  I
  suspect that for the normal case I already killed this optimisation
  when I did my unification patch: it was previously about ignoring
  min/max values on all regs (including scalars), whereas on net-next
  it only ignores them on map_value pointers; in practice this is
  useless because we tend to still have the offset scalar sitting in
  a register somewhere.  (Come to think of it, this may have been
  behind a large chunk of the #insn increase that my patches caused.)


Yeah, this would seem plausible.


Since we use umax_value in find_good_pkt_pointers() now (to check
  against MAX_PACKET_OFF and ensure our reg->range is really ok), we
  can't just stop caring about all min/max values just because we
  haven't done any variable map accesses.
I don't see a way around this.


Agree, was thinking the same. If there's not really a regression in
terms of complexity, then let's kill the flag.


Re: [iovisor-dev] [PATCH v3 net-next] bpf/verifier: track liveness for pruning

2017-08-21 Thread Edward Cree via iovisor-dev
On 18/08/17 15:16, Edward Cree wrote:
> On 18/08/17 04:21, Alexei Starovoitov wrote:
>> It seems you're trying to sort-of do per-fake-basic block liveness
>> analysis, but our state_list_marks are not correct if we go with
>> canonical basic block definition, since we mark the jump insn and
>> not insn after the branch and not every basic block boundary is
>> properly detected.
> I think the reason this works is that jump insns can't do writes.
> [snip]
> the sl->state will never have any write marks and it'll all just work.
> But I should really test that!
I tested this, and found that, no, sl->state can have write marks, and the
 algorithm will get the wrong answer in that case.  So I've got a patch to
 make the first iteration ignore write marks, as part of a series which I
 will post shortly.  When I do so, please re-do your tests with adding
 state_list_marks in strange and exciting places; it should work wherever
 you put them.  Like you say, it "magically doesn't depend on proper basic
 block boundaries", and that's because pruning is really just a kind of
 checkpointing, one that happens to be most effective when done just after
 a jump (pop_stack).
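The "pruning is just checkpointing" view can be sketched as follows. This is a toy model with illustrative names (toy_is_state_visited, a single int standing in for full register state), not the verifier's data structures:

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NINSNS 16

/* One checkpoint per instruction index; real code keeps a list per insn. */
struct toy_state {
    int r0;        /* stand-in for the full register/stack state */
    bool valid;
};

static struct toy_state toy_checkpoints[TOY_NINSNS];

/* Returns true (prune this path) if an equivalent state was already
 * verified at this insn; otherwise checkpoints the state and tells the
 * caller to keep walking.  Checkpointing just after a jump is merely
 * where it pays off most -- nothing requires canonical basic-block
 * boundaries. */
static bool toy_is_state_visited(int insn_idx, const struct toy_state *cur)
{
    struct toy_state *old = &toy_checkpoints[insn_idx];

    if (old->valid && old->r0 == cur->r0)
        return true;
    *old = *cur;
    old->valid = true;
    return false;
}
```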

Can I have a SOB for your "grr" test program, so I can include it in the
 series?

-Ed


Re: [iovisor-dev] [PATCH v3 net-next] bpf/verifier: track liveness for pruning

2017-08-21 Thread Edward Cree via iovisor-dev
On 19/08/17 00:37, Alexei Starovoitov wrote:
> that '14: safe' above is not correct.
>
> Disabling liveness as:
> @@ -3282,7 +3288,7 @@ static bool regsafe(struct bpf_reg_state *rold,
> struct bpf_reg_state *rcur,
> bool varlen_map_access, struct idpair *idmap)
>  {
> -   if (!(rold->live & REG_LIVE_READ))
> +   if (0 && !(rold->live & REG_LIVE_READ))
>
> makes the test work properly and proper verifier output is:
> from 9 to 11: R0=map_value(id=0,off=0,ks=8,vs=48,imm=0) 
> R1=inv(id=0,smax_value=10) R2=inv11 R10=fp0
> 11: (64) (u32) r1 <<= (u32) 2
> 12: (0f) r0 += r1
> 13: (05) goto pc+0
> 14: (7a) *(u64 *)(r0 +0) = 4
>
> R0=map_value(id=0,off=0,ks=8,vs=48,umax_value=17179869180,var_off=(0x0; 0x3fffc)) R1=inv(id=0,umax_value=17179869180,var_off=(0x0; 0x3fffc)) R2=inv11 R10=fp0
> R0 unbounded memory access, make sure to bounds check any array access into a map
>
> I don't yet understand the underlying reason. R0 should have been
> marked as LIVE_READ by ST_MEM...
> Please help debug it further.
>
Having added a bunch of debugging, I found out that indeed R0 _had_ been
 marked as LIVE_READ.  The problem was that env->varlen_map_value_access
 wasn't set, because the access was at a constant offset (imm=0), but then
 when we compare register states we just say "oh yeah, it's a map_value,
 we don't need to look at the var_off".
This probably results from my unifying PTR_TO_MAP_VALUE with
 PTR_TO_MAP_VALUE_ADJ; before that the old and new R0 would have different
 reg->type so wouldn't match.
I'm tempted to just rip out env->varlen_map_value_access and always check
 the whole thing, because honestly I don't know what it was meant to do
 originally or how it can ever do any useful pruning.  While drastic, it
 does cause your test case to pass.
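The bug described above, where matching reg->type is taken as "safe" while var_off is ignored, can be sketched with a toy comparison. Names and the mask-of-variable-bits model (a crude stand-in for the kernel's tnum var_off) are illustrative, not the actual code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum toy_type { TOY_MAP_VALUE, TOY_SCALAR };

/* Simplified register state: pointer kind plus a mask of offset bits
 * that may vary. */
struct toy_ptr {
    enum toy_type type;
    uint64_t var_off_mask;
};

/* The buggy shortcut: once the types match, declare the states
 * equivalent without looking at the variable offset. */
static bool toy_regsafe_buggy(const struct toy_ptr *rold,
                              const struct toy_ptr *rcur)
{
    return rold->type == rcur->type;
}

/* Checking the whole thing: the current offset may not vary in any bit
 * the already-verified old state did not allow. */
static bool toy_regsafe_full(const struct toy_ptr *rold,
                             const struct toy_ptr *rcur)
{
    return rold->type == rcur->type &&
           (rcur->var_off_mask & ~rold->var_off_mask) == 0;
}
```

With rold a constant-offset map_value (mask 0) and rcur a variable-offset one (mask 0x3fffc, as in the dump above), the buggy check prunes while the full check correctly refuses to.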

I'm not quite sure why your test passed when you disabled liveness, though;
 that I can't explain.

-Ed


Re: [iovisor-dev] [PATCH v3 net-next] bpf/verifier: track liveness for pruning

2017-08-18 Thread Alexei Starovoitov via iovisor-dev

On 8/18/17 7:16 AM, Edward Cree wrote:

On 18/08/17 04:21, Alexei Starovoitov wrote:

On 8/15/17 12:34 PM, Edward Cree wrote:

State of a register doesn't matter if it wasn't read in reaching an exit;
 a write screens off all reads downstream of it from all explored_states
 upstream of it.
This allows us to prune many more branches; here are some processed insn
 counts for some Cilium programs:
Program  before  after
bpf_lb_opt_-DLB_L3.o   6515   3361
bpf_lb_opt_-DLB_L4.o   8976   5176
bpf_lb_opt_-DUNKNOWN.o 2960   1137
bpf_lxc_opt_-DDROP_ALL.o  95412  48537
bpf_lxc_opt_-DUNKNOWN.o  141706  78718
bpf_netdev.o  24251  17995
bpf_overlay.o 10999   9385

The runtime is also improved; here are 'time' results in ms:
Program  before  after
bpf_lb_opt_-DLB_L3.o 24  6
bpf_lb_opt_-DLB_L4.o 26 11
bpf_lb_opt_-DUNKNOWN.o   11  2
bpf_lxc_opt_-DDROP_ALL.o   1288    139
bpf_lxc_opt_-DUNKNOWN.o    1768    234
bpf_netdev.o 62 31
bpf_overlay.o15 13

Signed-off-by: Edward Cree 


this is one ingenious hack. Love it!
It took me a whole day to understand most of it, but I still have
a few questions:


+
+static void propagate_liveness(const struct bpf_verifier_state *state,
+   struct bpf_verifier_state *parent)


here the name 'parent' is very confusing, since for the first
iteration of the loop below it transfers lives from 'neighbor'
state to the current state and only then traverses the link
of parents in the current.
Would be good to document it, since I was struggling the most
with this name until I realized that the way you build parent link list
in is_state_visited() is actual sequence of roughly basic blocks and
the name 'parent' applies there, but not for the first iteration
of this function.

In the way I think about it, it really is a parent, by which I mean "a
 block whose output registers are our inputs".  When we call
 propagate_liveness(), we're saying "here's another block whose outputs
 could be our inputs".  So while the 'state->parent' datastructure is a
 linked list, the 'parentage' relationship is really a DAG.
While 'state->parent' always has to point at an explored_state (which are
 roughly immutable), the 'parent' we pass into propagate_liveness is just
 env->cur_state which is mutable.  The weird "it's not actually
 state->parent" comes from (a) there's only room for one state->parent and
 (b) we didn't create a new_sl for it because we're pruning.
I agree this is not at all explained in the code or comments, except for
 the glib "Registers read by the continuation are read by us".  I will try
 to write some comments and/or documentation explaining how and why the
 liveness tracking works, because it _is_ subtle and a week from now _I_
 probably won't understand it either.


a bit twisted, but yeah agree with this angle of thinking about it :)




@@ -3407,6 +3501,14 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 memcpy(&new_sl->state, &env->cur_state, sizeof(env->cur_state));
 new_sl->next = env->explored_states[insn_idx];
 env->explored_states[insn_idx] = new_sl;
+/* connect new state to parentage chain */
+env->cur_state.parent = &new_sl->state;
+/* clear liveness marks in current state */
+for (i = 0; i < BPF_REG_FP; i++)
+    env->cur_state.regs[i].live = REG_LIVE_NONE;
+for (i = 0; i < MAX_BPF_STACK / BPF_REG_SIZE; i++)
+    if (env->cur_state.stack_slot_type[i * BPF_REG_SIZE] == STACK_SPILL)
+        env->cur_state.spilled_regs[i].live = REG_LIVE_NONE;


and this part I don't get at all.

The idea behind the liveness marks is that 'writes go down and reads go up
 and up'.  That is, each state 'tells' its parent state 'I read from you'
 (which can then end up recursing), but it remembers for itself that it
 wrote to a register and therefore should swallow subsequent reads rather
 than forwarding them to its parent.
While a block is being walked, its liveness marks _only_ record writes, and
 then only writes _made in that block_.  The read marks go to the parent,
 which is "some block that has been walked", but whose continuations haven't
 all been walked yet so (by looplessness) won't show up as a pruning
 opportunity.  An explored_state's liveness marks record the writes _done in
 reaching that state_ from its parent, but the reads _done by the state's
 children_.  A cur_state's liveness marks do the same, but it doesn't have
 any children yet so it never gets read-marks (until it gets turned into an
 explored_state, of course).
We clear our liveness marks because the writes our parent block did are not
 writes we did, so they don't screen off our reads from our parent;


got it. makes sense. would be good to document it.


It seems you're trying to sort-of do per-fake-basic block liveness
analysis, but our state_list_marks are not correct if we go with
canonical basic block definition, since we mark the jump insn and
not insn after the branch and not every basic block boundary is
properly detected.

Re: [iovisor-dev] [PATCH v3 net-next] bpf/verifier: track liveness for pruning

2017-08-18 Thread Edward Cree via iovisor-dev
On 18/08/17 04:21, Alexei Starovoitov wrote:
> On 8/15/17 12:34 PM, Edward Cree wrote:
>> State of a register doesn't matter if it wasn't read in reaching an exit;
>>  a write screens off all reads downstream of it from all explored_states
>>  upstream of it.
>> This allows us to prune many more branches; here are some processed insn
>>  counts for some Cilium programs:
>> Program  before  after
>> bpf_lb_opt_-DLB_L3.o   6515   3361
>> bpf_lb_opt_-DLB_L4.o   8976   5176
>> bpf_lb_opt_-DUNKNOWN.o 2960   1137
>> bpf_lxc_opt_-DDROP_ALL.o  95412  48537
>> bpf_lxc_opt_-DUNKNOWN.o  141706  78718
>> bpf_netdev.o  24251  17995
>> bpf_overlay.o 10999   9385
>>
>> The runtime is also improved; here are 'time' results in ms:
>> Program  before  after
>> bpf_lb_opt_-DLB_L3.o 24  6
>> bpf_lb_opt_-DLB_L4.o 26 11
>> bpf_lb_opt_-DUNKNOWN.o   11  2
>> bpf_lxc_opt_-DDROP_ALL.o   1288    139
>> bpf_lxc_opt_-DUNKNOWN.o    1768    234
>> bpf_netdev.o 62 31
>> bpf_overlay.o15 13
>>
>> Signed-off-by: Edward Cree 
>
> this is one ingenious hack. Love it!
> It took me a whole day to understand most of it, but I still have
> a few questions:
>
>> +
>> +static void propagate_liveness(const struct bpf_verifier_state *state,
>> +   struct bpf_verifier_state *parent)
>
> here the name 'parent' is very confusing, since for the first
> iteration of the loop below it transfers lives from 'neighbor'
> state to the current state and only then traverses the link
> of parents in the current.
> Would be good to document it, since I was struggling the most
> with this name until I realized that the way you build parent link list
> in is_state_visited() is actual sequence of roughly basic blocks and
> the name 'parent' applies there, but not for the first iteration
> of this function.
In the way I think about it, it really is a parent, by which I mean "a
 block whose output registers are our inputs".  When we call
 propagate_liveness(), we're saying "here's another block whose outputs
 could be our inputs".  So while the 'state->parent' datastructure is a
 linked list, the 'parentage' relationship is really a DAG.
While 'state->parent' always has to point at an explored_state (which are
 roughly immutable), the 'parent' we pass into propagate_liveness is just
 env->cur_state which is mutable.  The weird "it's not actually
 state->parent" comes from (a) there's only room for one state->parent and
 (b) we didn't create a new_sl for it because we're pruning.
I agree this is not at all explained in the code or comments, except for
 the glib "Registers read by the continuation are read by us".  I will try
 to write some comments and/or documentation explaining how and why the
 liveness tracking works, because it _is_ subtle and a week from now _I_
 probably won't understand it either.
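Edward's "parentage is really a DAG" point can be sketched with a one-register toy model (illustrative names; it also omits write screening for brevity, so it is simplified relative to the patch): the explored state we prune against behaves as a parent in the "its outputs are our inputs" sense, so the reads its continuations made must be replayed into the current path's own chain.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_LIVE_NONE = 0, TOY_LIVE_READ = 1 };

/* One register's liveness plus the parentage link. */
struct toy_state {
    int live;
    struct toy_state *parent;
};

/* "Registers read by the continuation are read by us": if the explored
 * state's children read the register, the pruned current path must be
 * treated as reading it too, all the way up its own parent chain. */
static void toy_propagate_liveness(const struct toy_state *explored,
                                   struct toy_state *cur)
{
    if (!(explored->live & TOY_LIVE_READ))
        return;
    for (; cur; cur = cur->parent)
        cur->live |= TOY_LIVE_READ;
}
```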

>> @@ -3407,6 +3501,14 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
>>  memcpy(&new_sl->state, &env->cur_state, sizeof(env->cur_state));
>>  new_sl->next = env->explored_states[insn_idx];
>>  env->explored_states[insn_idx] = new_sl;
>> +/* connect new state to parentage chain */
>> +env->cur_state.parent = &new_sl->state;
>> +/* clear liveness marks in current state */
>> +for (i = 0; i < BPF_REG_FP; i++)
>> +    env->cur_state.regs[i].live = REG_LIVE_NONE;
>> +for (i = 0; i < MAX_BPF_STACK / BPF_REG_SIZE; i++)
>> +    if (env->cur_state.stack_slot_type[i * BPF_REG_SIZE] == STACK_SPILL)
>> +        env->cur_state.spilled_regs[i].live = REG_LIVE_NONE;
>
> and this part I don't get at all.
The idea behind the liveness marks is that 'writes go down and reads go up
 and up'.  That is, each state 'tells' its parent state 'I read from you'
 (which can then end up recursing), but it remembers for itself that it
 wrote to a register and therefore should swallow subsequent reads rather
 than forwarding them to its parent.
While a block is being walked, its liveness marks _only_ record writes, and
 then only writes _made in that block_.  The read marks go to the parent,
 which is "some block that has been walked", but whose continuations haven't
 all been walked yet so (by looplessness) won't show up as a pruning
 opportunity.  An explored_state's liveness marks record the writes _done in
 reaching that state_ from its parent, but the reads _done by the state's
 children_.  A cur_state's liveness marks do the same, but it doesn't have
 any children yet so it never gets read-marks (until it gets turned into an
 explored_state, of course).
We clear our liveness marks because the writes our parent block did are not
 writes we did, so they don't screen off our reads from our parent; they
 only screen off our (or our parent's) reads from our grandparent.
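The screening rule in the last paragraph can be sketched for a single register (illustrative names, simplified from the shape of the patch's read marking; not the kernel's exact code):

```c
#include <assert.h>
#include <stddef.h>

enum { TL_NONE = 0, TL_READ = 1, TL_WRITTEN = 2 };

struct tl_state {
    int live;
    struct tl_state *parent;
};

/* A read climbs the parent chain marking TL_READ until some state on
 * the way already wrote the register; that write screens the read off
 * from all older ancestors.  This is why clearing marks on a fresh
 * state matters: a stale TL_WRITTEN inherited from the parent would
 * wrongly screen reads the new block did not itself precede with a
 * write. */
static void tl_mark_reg_read(struct tl_state *state)
{
    struct tl_state *parent = state->parent;

    while (parent) {
        if (state->live & TL_WRITTEN)
            break;              /* screened: stop climbing */
        parent->live |= TL_READ;
        state = parent;
        parent = state->parent;
    }
}
```

In this model a read from a clean chain marks every ancestor, while a write sitting in the parent stops the mark from reaching the grandparent, matching the "writes screen off downstream reads" description above.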
> It seems you're trying to sort-of do per-fake-basic block liveness
> analysis, but our state_list_marks are not correct if we go with
> canonical basic block definition, since we mark the jump insn and
> not insn after the branch and not every basic block boundary is
> properly detected.

Re: [iovisor-dev] [PATCH v3 net-next] bpf/verifier: track liveness for pruning

2017-08-15 Thread David Miller via iovisor-dev
From: Daniel Borkmann 
Date: Wed, 16 Aug 2017 00:12:58 +0200

> On 08/15/2017 09:34 PM, Edward Cree wrote:
>> State of a register doesn't matter if it wasn't read in reaching an
>> exit;
>>   a write screens off all reads downstream of it from all
>>   explored_states
>>   upstream of it.
>> This allows us to prune many more branches; here are some processed
>> insn
>>   counts for some Cilium programs:
>> Program  before  after
>> bpf_lb_opt_-DLB_L3.o   6515   3361
>> bpf_lb_opt_-DLB_L4.o   8976   5176
>> bpf_lb_opt_-DUNKNOWN.o 2960   1137
>> bpf_lxc_opt_-DDROP_ALL.o  95412  48537
>> bpf_lxc_opt_-DUNKNOWN.o  141706  78718
>> bpf_netdev.o  24251  17995
>> bpf_overlay.o 10999   9385
>>
>> The runtime is also improved; here are 'time' results in ms:
>> Program  before  after
>> bpf_lb_opt_-DLB_L3.o 24  6
>> bpf_lb_opt_-DLB_L4.o 26 11
>> bpf_lb_opt_-DUNKNOWN.o   11  2
>> bpf_lxc_opt_-DDROP_ALL.o   1288    139
>> bpf_lxc_opt_-DUNKNOWN.o    1768    234
>> bpf_netdev.o 62 31
>> bpf_overlay.o15 13
>>
>> Signed-off-by: Edward Cree 
> 
> Acked-by: Daniel Borkmann 

Applied, nice work Edward.


Re: [iovisor-dev] [PATCH v3 net-next] bpf/verifier: track liveness for pruning

2017-08-15 Thread Daniel Borkmann via iovisor-dev

On 08/15/2017 09:34 PM, Edward Cree wrote:

State of a register doesn't matter if it wasn't read in reaching an exit;
  a write screens off all reads downstream of it from all explored_states
  upstream of it.
This allows us to prune many more branches; here are some processed insn
  counts for some Cilium programs:
Program  before  after
bpf_lb_opt_-DLB_L3.o   6515   3361
bpf_lb_opt_-DLB_L4.o   8976   5176
bpf_lb_opt_-DUNKNOWN.o 2960   1137
bpf_lxc_opt_-DDROP_ALL.o  95412  48537
bpf_lxc_opt_-DUNKNOWN.o  141706  78718
bpf_netdev.o  24251  17995
bpf_overlay.o 10999   9385

The runtime is also improved; here are 'time' results in ms:
Program  before  after
bpf_lb_opt_-DLB_L3.o 24  6
bpf_lb_opt_-DLB_L4.o 26 11
bpf_lb_opt_-DUNKNOWN.o   11  2
bpf_lxc_opt_-DDROP_ALL.o   1288    139
bpf_lxc_opt_-DUNKNOWN.o    1768    234
bpf_netdev.o 62 31
bpf_overlay.o15 13

Signed-off-by: Edward Cree 


Acked-by: Daniel Borkmann 