Re: Alignment in BPF verifier

2017-05-25 Thread David Miller
From: Daniel Borkmann 
Date: Tue, 23 May 2017 23:27:20 +0200

> On 05/23/2017 09:45 PM, Alexei Starovoitov wrote:
>> On 5/23/17 7:41 AM, Edward Cree wrote:
>>> Hmm, that means that we can't do arithmetic on a
>>>  PTR_TO_MAP_VALUE_OR_NULL, we have to convert it to a PTR_TO_MAP_VALUE
>>>  first by NULL-checking it.  That's probably fine, but I can just about
>>>  imagine some compiler optimisation reordering them.  Any reason not to
>>>  split this out into a different reg->field, rather than overloading
>>>  id?
>>
>> 'id' is sort of like 'version' of a pointer and has the same meaning
>> in
>> both cases. How exactly do you see this split?
> 
> Also, same id is never reused once generated and later propagated
> through regs. So far we haven't run into this kind of optimization
> from llvm side yet, but others which led to requiring the id marker
> (see 57a09bf0a416). I could imagine it might be needed at some point,
> though where we later transition directly to PTR_TO_MAP_VALUE_ADJ
> after NULL check. Out of curiosity, did you run into it with llvm?

We could handle this issue in find_good_pkt_pointers(), nothing prevents
us from advancing state there for cases like Edward notes above.


Re: Alignment in BPF verifier

2017-05-24 Thread Alexei Starovoitov

On 5/24/17 6:46 AM, Edward Cree wrote:

On 23/05/17 22:27, Daniel Borkmann wrote:

On 05/23/2017 09:45 PM, Alexei Starovoitov wrote:

On 5/23/17 7:41 AM, Edward Cree wrote:

Hmm, that means that we can't do arithmetic on a
 PTR_TO_MAP_VALUE_OR_NULL, we have to convert it to a PTR_TO_MAP_VALUE
 first by NULL-checking it.  That's probably fine, but I can just about
 imagine some compiler optimisation reordering them.  Any reason not to
 split this out into a different reg->field, rather than overloading id?


'id' is sort of like 'version' of a pointer and has the same meaning in
both cases. How exactly do you see this split?

I was thinking there would be reg->id and reg->map_id.  Both could share the
 env->id_gen, since that's not likely to run out, but they'd be separate
 fields so that a PTR_TO_MAP_VALUE_OR_NULL could say "this is either map_value
 plus a 4-byte-aligned offset less than 24, or NULL plus that same offset",
 and then if another pointer with the same map_id and no variable-offset part
 was NULL-checked, we could convert both pointers to PTR_TO_MAP_VALUE.  (I'm
 getting rid of PTR_TO_MAP_VALUE_ADJ in my patch, along with several other
 types, by taking the 'we have an offset' part out of the bpf_reg_type.)


got it. makes sense.


So far we haven't run into this kind of optimization
from llvm side yet[...] Out of curiosity, did you run into it with llvm?

No, purely theoretical.  I haven't even built/installed llvm yet, I'm just
 working with the bytecode in test_verifier.c for now.  I'm merely trying to
 not have restrictions that are unnecessary; but since allowing this kind of
 construct would take a non-zero amount of work, I'll file it for later.


modern fedora/ubuntu come with llvm that has bpf backend by default.



Re: Alignment in BPF verifier

2017-05-24 Thread Edward Cree
On 23/05/17 22:27, Daniel Borkmann wrote:
> On 05/23/2017 09:45 PM, Alexei Starovoitov wrote:
>> On 5/23/17 7:41 AM, Edward Cree wrote:
>>> Hmm, that means that we can't do arithmetic on a
>>>  PTR_TO_MAP_VALUE_OR_NULL, we have to convert it to a PTR_TO_MAP_VALUE
>>>  first by NULL-checking it.  That's probably fine, but I can just about
>>>  imagine some compiler optimisation reordering them.  Any reason not to
>>>  split this out into a different reg->field, rather than overloading id?
>>
>> 'id' is sort of like 'version' of a pointer and has the same meaning in
>> both cases. How exactly do you see this split?
I was thinking there would be reg->id and reg->map_id.  Both could share the
 env->id_gen, since that's not likely to run out, but they'd be separate
 fields so that a PTR_TO_MAP_VALUE_OR_NULL could say "this is either map_value
 plus a 4-byte-aligned offset less than 24, or NULL plus that same offset",
 and then if another pointer with the same map_id and no variable-offset part
 was NULL-checked, we could convert both pointers to PTR_TO_MAP_VALUE.  (I'm
 getting rid of PTR_TO_MAP_VALUE_ADJ in my patch, along with several other
 types, by taking the 'we have an offset' part out of the bpf_reg_type.)
> So far we haven't run into this kind of optimization
> from llvm side yet[...] Out of curiosity, did you run into it with llvm?
No, purely theoretical.  I haven't even built/installed llvm yet, I'm just
 working with the bytecode in test_verifier.c for now.  I'm merely trying to
 not have restrictions that are unnecessary; but since allowing this kind of
 construct would take a non-zero amount of work, I'll file it for later.

-Ed


Re: Alignment in BPF verifier

2017-05-23 Thread Alexei Starovoitov

On 5/23/17 10:43 AM, Edward Cree wrote:

Another issue: it looks like the min/max_value handling for subtraction is
 bogus.  In adjust_reg_min_max_vals() we have
if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)
dst_reg->min_value -= min_val;
if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
dst_reg->max_value -= max_val;
 where min_val and max_val refer to the src_reg.
But surely they should be used the other way round; if (say) 2 <= R1 <= 6
 and 1 <= R2 <= 4, then this will claim 1 <= (R1 - R2) <= 2, whereas really
 (R1 - R2) could be anything from -2 to 5.
This also means that the code just above the switch,
if (min_val == BPF_REGISTER_MIN_RANGE)
dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
if (max_val == BPF_REGISTER_MAX_RANGE)
dst_reg->max_value = BPF_REGISTER_MAX_RANGE;
 is wrong, since e.g. subtracting MAX_RANGE needs to blow our min_value,
 not our max_value.


right. good catch. I have a feeling we discussed similar thing before.
May be some patch felt through the cracks.
That's the reason the fancy verifier analysis is root only.
I'm assuming you're going to send a fix?
Thanks!


Re: Alignment in BPF verifier

2017-05-23 Thread Daniel Borkmann

On 05/23/2017 09:45 PM, Alexei Starovoitov wrote:

On 5/23/17 7:41 AM, Edward Cree wrote:

I'm still plugging away at this... it's going to be quite a big patch and
 rewrite a lot of stuff (and I'm not sure I'll be able to break it into
 smaller bisectable patches).
And of course I have more questions.  In check_packet_ptr_add(), we
 forbid adding a negative constant to a packet ptr.  Is there some
 principled reason for that, or is it just because the bounds checking is
 hard?  It seems like, if imm + reg->off > 0 (suitably carefully checked
 to avoid overflow etc.), then the subtraction should be legal.  Indeed,
 even if the reg->off (fixed part of offset) is zero, if the variable part
 is known (min_value) to be >= -imm, the subtraction should be safe.


adding negative imm to pkt_ptr is ok, but what is the use case?
Do you see llvm generating such code?


Btw, currently, you kind of have it in a limited way via BPF_STX_MEM()
and BPF_LDX_MEM() when accessing pkt data through the offset, which
can be negative on the insn level.


I think if we try to track everything with the current shape of
state pruning, the verifier will stop accepting old programs
because it reaches complexity limit.

I think we need to rearchitect the whole thing.
I was thinking of doing it compiler-style. Convert to ssa and
do traditional data flow analysis, use-def chains, register liveness
then pruning heuristics won't be necessary and verifier should be
able to check everything in more or less single pass.
Things like register liveness can be done without ssa. It can
be used to augment existing pruning, since it will know which
registers are dead, so they don't have to be compared, but it
feels half-way. I'd rather go all the way.


On 20/05/17 00:05, Daniel Borkmann wrote:

Besides PTR_TO_PACKET also PTR_TO_MAP_VALUE_OR_NULL uses it to
track all registers (incl. spilled ones) with the same reg->id
that originated from the same map lookup. After the reg type is
then migrated to either PTR_TO_MAP_VALUE (resp. CONST_PTR_TO_MAP
for map in map) or UNKNOWN_VALUE depending on the branch, the
reg->id is then reset to 0 again. Whole reason for this is that
LLVM generates code where it can move and/or spill a reg of type
PTR_TO_MAP_VALUE_OR_NULL to other regs before we do the NULL
test on it, and later on it expects that the spilled or moved
regs work wrt access. So they're marked with an id and then all
of them are type migrated. So here meaning of reg->id is different
than in PTR_TO_PACKET case.

Hmm, that means that we can't do arithmetic on a
 PTR_TO_MAP_VALUE_OR_NULL, we have to convert it to a PTR_TO_MAP_VALUE
 first by NULL-checking it.  That's probably fine, but I can just about
 imagine some compiler optimisation reordering them.  Any reason not to
 split this out into a different reg->field, rather than overloading id?


'id' is sort of like 'version' of a pointer and has the same meaning in
both cases. How exactly do you see this split?


Also, same id is never reused once generated and later propagated
through regs. So far we haven't run into this kind of optimization
from llvm side yet, but others which led to requiring the id marker
(see 57a09bf0a416). I could imagine it might be needed at some point,
though where we later transition directly to PTR_TO_MAP_VALUE_ADJ
after NULL check. Out of curiosity, did you run into it with llvm?


Of course that would need (more) caution wrt. states_equal(), but it
 looks like I'll be mangling that a lot anyway - for instance, we don't
 want to just use memcmp() to compare alignments, we want to check that
 our alignment is stricter than the old alignment.  (Of course memcmp()
 is a conservative check, so the "memcmp() the whole reg_state" fast
 path can remain.)


yes. that would be good improvement. Not sure how much it will help
the pruning though.


Re: Alignment in BPF verifier

2017-05-23 Thread Alexei Starovoitov

On 5/23/17 7:41 AM, Edward Cree wrote:

I'm still plugging away at this... it's going to be quite a big patch and
 rewrite a lot of stuff (and I'm not sure I'll be able to break it into
 smaller bisectable patches).
And of course I have more questions.  In check_packet_ptr_add(), we
 forbid adding a negative constant to a packet ptr.  Is there some
 principled reason for that, or is it just because the bounds checking is
 hard?  It seems like, if imm + reg->off > 0 (suitably carefully checked
 to avoid overflow etc.), then the subtraction should be legal.  Indeed,
 even if the reg->off (fixed part of offset) is zero, if the variable part
 is known (min_value) to be >= -imm, the subtraction should be safe.


adding negative imm to pkt_ptr is ok, but what is the use case?
Do you see llvm generating such code?
I think if we try to track everything with the current shape of
state pruning, the verifier will stop accepting old programs
because it reaches complexity limit.

I think we need to rearchitect the whole thing.
I was thinking of doing it compiler-style. Convert to ssa and
do traditional data flow analysis, use-def chains, register liveness
then pruning heuristics won't be necessary and verifier should be
able to check everything in more or less single pass.
Things like register liveness can be done without ssa. It can
be used to augment existing pruning, since it will know which
registers are dead, so they don't have to be compared, but it
feels half-way. I'd rather go all the way.


On 20/05/17 00:05, Daniel Borkmann wrote:

Besides PTR_TO_PACKET also PTR_TO_MAP_VALUE_OR_NULL uses it to
track all registers (incl. spilled ones) with the same reg->id
that originated from the same map lookup. After the reg type is
then migrated to either PTR_TO_MAP_VALUE (resp. CONST_PTR_TO_MAP
for map in map) or UNKNOWN_VALUE depending on the branch, the
reg->id is then reset to 0 again. Whole reason for this is that
LLVM generates code where it can move and/or spill a reg of type
PTR_TO_MAP_VALUE_OR_NULL to other regs before we do the NULL
test on it, and later on it expects that the spilled or moved
regs work wrt access. So they're marked with an id and then all
of them are type migrated. So here meaning of reg->id is different
than in PTR_TO_PACKET case.

Hmm, that means that we can't do arithmetic on a
 PTR_TO_MAP_VALUE_OR_NULL, we have to convert it to a PTR_TO_MAP_VALUE
 first by NULL-checking it.  That's probably fine, but I can just about
 imagine some compiler optimisation reordering them.  Any reason not to
 split this out into a different reg->field, rather than overloading id?


'id' is sort of like 'version' of a pointer and has the same meaning in
both cases. How exactly do you see this split?


Of course that would need (more) caution wrt. states_equal(), but it
 looks like I'll be mangling that a lot anyway - for instance, we don't
 want to just use memcmp() to compare alignments, we want to check that
 our alignment is stricter than the old alignment.  (Of course memcmp()
 is a conservative check, so the "memcmp() the whole reg_state" fast
 path can remain.)


yes. that would be good improvement. Not sure how much it will help
the pruning though.



Re: Alignment in BPF verifier

2017-05-23 Thread Edward Cree
Another issue: it looks like the min/max_value handling for subtraction is
 bogus.  In adjust_reg_min_max_vals() we have
if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)
dst_reg->min_value -= min_val;
if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
dst_reg->max_value -= max_val;
 where min_val and max_val refer to the src_reg.
But surely they should be used the other way round; if (say) 2 <= R1 <= 6
 and 1 <= R2 <= 4, then this will claim 1 <= (R1 - R2) <= 2, whereas really
 (R1 - R2) could be anything from -2 to 5.
This also means that the code just above the switch,
if (min_val == BPF_REGISTER_MIN_RANGE)
dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
if (max_val == BPF_REGISTER_MAX_RANGE)
dst_reg->max_value = BPF_REGISTER_MAX_RANGE;
 is wrong, since e.g. subtracting MAX_RANGE needs to blow our min_value,
 not our max_value.

-Ed


Re: Alignment in BPF verifier

2017-05-23 Thread Edward Cree
I'm still plugging away at this... it's going to be quite a big patch and
 rewrite a lot of stuff (and I'm not sure I'll be able to break it into
 smaller bisectable patches).
And of course I have more questions.  In check_packet_ptr_add(), we
 forbid adding a negative constant to a packet ptr.  Is there some
 principled reason for that, or is it just because the bounds checking is
 hard?  It seems like, if imm + reg->off > 0 (suitably carefully checked
 to avoid overflow etc.), then the subtraction should be legal.  Indeed,
 even if the reg->off (fixed part of offset) is zero, if the variable part
 is known (min_value) to be >= -imm, the subtraction should be safe.

On 20/05/17 00:05, Daniel Borkmann wrote:
> Besides PTR_TO_PACKET also PTR_TO_MAP_VALUE_OR_NULL uses it to
> track all registers (incl. spilled ones) with the same reg->id
> that originated from the same map lookup. After the reg type is
> then migrated to either PTR_TO_MAP_VALUE (resp. CONST_PTR_TO_MAP
> for map in map) or UNKNOWN_VALUE depending on the branch, the
> reg->id is then reset to 0 again. Whole reason for this is that
> LLVM generates code where it can move and/or spill a reg of type
> PTR_TO_MAP_VALUE_OR_NULL to other regs before we do the NULL
> test on it, and later on it expects that the spilled or moved
> regs work wrt access. So they're marked with an id and then all
> of them are type migrated. So here meaning of reg->id is different
> than in PTR_TO_PACKET case.
Hmm, that means that we can't do arithmetic on a
 PTR_TO_MAP_VALUE_OR_NULL, we have to convert it to a PTR_TO_MAP_VALUE
 first by NULL-checking it.  That's probably fine, but I can just about
 imagine some compiler optimisation reordering them.  Any reason not to
 split this out into a different reg->field, rather than overloading id?
Of course that would need (more) caution wrt. states_equal(), but it
 looks like I'll be mangling that a lot anyway - for instance, we don't
 want to just use memcmp() to compare alignments, we want to check that
 our alignment is stricter than the old alignment.  (Of course memcmp()
 is a conservative check, so the "memcmp() the whole reg_state" fast
 path can remain.)

-Ed


Re: Alignment in BPF verifier

2017-05-19 Thread Daniel Borkmann

On 05/19/2017 10:39 PM, David Miller wrote:

From: Edward Cree 
Date: Fri, 19 May 2017 21:00:13 +0100


Well, I've managed to get somewhat confused by reg->id.
In particular, I'm unsure which bpf_reg_types can have an id, and what
  exactly it means.  There seems to be some code that checks around map value
  pointers, which seems strange as maps have fixed sizes (and the comments in
  enum bpf_reg_type make it seem like id is a PTR_TO_PACKET thing) - is this


Besides PTR_TO_PACKET also PTR_TO_MAP_VALUE_OR_NULL uses it to
track all registers (incl. spilled ones) with the same reg->id
that originated from the same map lookup. After the reg type is
then migrated to either PTR_TO_MAP_VALUE (resp. CONST_PTR_TO_MAP
for map in map) or UNKNOWN_VALUE depending on the branch, the
reg->id is then reset to 0 again. Whole reason for this is that
LLVM generates code where it can move and/or spill a reg of type
PTR_TO_MAP_VALUE_OR_NULL to other regs before we do the NULL
test on it, and later on it expects that the spilled or moved
regs work wrt access. So they're marked with an id and then all
of them are type migrated. So here meaning of reg->id is different
than in PTR_TO_PACKET case. Example:

0: (b7) r1 = 10
1: (7b) *(u64 *)(r10 -8) = r1
2: (bf) r2 = r10
3: (07) r2 += -8
4: (18) r1 = 0x59c0
6: (85) call 1 //map lookup
7: (bf) r4 = r0
8: (15) if r0 == 0x0 goto pc+1
 R0=map_value(ks=8,vs=8) R4=map_value_or_null(ks=8,vs=8) R10=fp
9: (7a) *(u64 *)(r4 +0) = 0


  maybe because of map-of-maps support, can the contained maps have differing
  element sizes?  Or do we allow *(map_value + var + imm), if map_value + var
  was appropriately bounds-checked?

Does the 'id' identify the variable that was added to an object pointer, or
  the object itself?  Or does it blur these and identify (what the comment in
  enum bpf_reg_type calls) "skb->data + (u16) var"?


The reg->id value changes any time a variable gets added to a packet
pointer.

You will also notice right now that only packet pointers have their
alignment tracked.

I have changes pending that will do that for MAP pointers too, but
it needs more work.





Re: Alignment in BPF verifier

2017-05-19 Thread David Miller
From: Edward Cree 
Date: Fri, 19 May 2017 21:00:13 +0100

> Here's what I'm thinking of doing:
> struct bpf_reg_state {
> enum bpf_reg_type type;
> union {
> /* valid when type == PTR_TO_PACKET */
> u16 range;
> 
> /* valid when type == CONST_PTR_TO_MAP | PTR_TO_MAP_VALUE |
>  *   PTR_TO_MAP_VALUE_OR_NULL
>  */
> struct bpf_map *map_ptr;
> };
> /* Used to find other pointers with the same variable base, so they
>  * can share range and align knowledge.
>  */
> u32 id;
> u32 off; /* fixed part of pointer offset */
> /* For scalar types (CONST_IMM | UNKNOWN_VALUE), this represents our
>  * knowledge of the actual value.
>  * For pointer types, this represents the variable part of the offset
>  * from the pointed-to object, and is shared with all bpf_reg_states
>  * with the same id as us.
>  */
> struct tnum align;
> /* Used to determine if any memory access using this register will
>  * result in a bad access. These two fields must be last.
>  * See states_equal()
>  * These refer to the same value as align, not necessarily the actual
>  * contents of the register.
>  */
> s64 min_value;
> u64 max_value;
> };

Be very careful with the layout of bpf_reg_state.

There are layout dependencies in the state pruning.  Please take a look
at states_equal().  It is walking the set of registers at two snapshot
locations and trying to see if they are "equivalent".

What's happening here is that the verifier makes a stack of all branch
points in the program.  On the first pass it analyzes the register
values taking one of the two paths a branch takes.  Then when it hits
the end of the program, on that path, to BPF_EXIT it starts popping
the entries on the stack.

The naive implementation would pop each stack entry, and then traverse
the other arm of the branch.  But for programs with lots of branches
this gets very expensive.

So at each stack pop, the verifier tries to determine if it can skip
traversing the branch's other path.  And it does this by analyzing
register state.

The tests are basically:

if (memcmp(rold, rcur, sizeof(*rold)) == 0)
continue;

exact equivalent, then we're fine.

/* If the ranges were not the same, but everything else was and
 * we didn't do a variable access into a map then we are a-ok.
 */
if (!varlen_map_access &&
memcmp(rold, rcur, offsetofend(struct bpf_reg_state, id)) 
== 0)
continue;

We didn't do any variable MAP accesses, and everything in the register
"up to and including member ID" is the same, we're fine.

And then we drop down into some packet pointer specific tests to try
and optimize things further.

So you have to be careful what you place before and/or after 'id'.

Probably we need to put the alignment stuff before 'id' so that it
is considered by the offsetofend() length memcmp().

Hope that helps.


Re: Alignment in BPF verifier

2017-05-19 Thread David Miller
From: Edward Cree 
Date: Fri, 19 May 2017 21:00:13 +0100

> Well, I've managed to get somewhat confused by reg->id.
> In particular, I'm unsure which bpf_reg_types can have an id, and what
>  exactly it means.  There seems to be some code that checks around map value
>  pointers, which seems strange as maps have fixed sizes (and the comments in
>  enum bpf_reg_type make it seem like id is a PTR_TO_PACKET thing) - is this
>  maybe because of map-of-maps support, can the contained maps have differing
>  element sizes?  Or do we allow *(map_value + var + imm), if map_value + var
>  was appropriately bounds-checked?
> 
> Does the 'id' identify the variable that was added to an object pointer, or
>  the object itself?  Or does it blur these and identify (what the comment in
>  enum bpf_reg_type calls) "skb->data + (u16) var"?

The reg->id value changes any time a variable gets added to a packet
pointer.

You will also notice right now that only packet pointers have their
alignment tracked.

I have changes pending that will do that for MAP pointers too, but
it needs more work.