Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.

2017-05-19 Thread Alexei Starovoitov

On 5/19/17 4:16 PM, David Miller wrote:

From: Alexei Starovoitov 
Date: Fri, 19 May 2017 14:37:56 -0700


On 5/19/17 1:41 PM, David Miller wrote:

From: Edward Cree 
Date: Fri, 19 May 2017 18:17:42 +0100


One question: is there a way to build the verifier as userland code
 (or at least as a module), or will I have to reboot every time I
 want to test a change?


There currently is no such mechanism, you will have to reboot every
time.

I have considered working on making the code buildable outside of the
kernel.  It shouldn't be too hard.


it's not hard.
We did it twice and both times abandoned it.
The first time was to have a 'user space verifier' to check programs
before loading, and the second time was for fuzzing via llvm.
We abandoned it since it diverges very quickly from the kernel.



Well, my idea was to create an environment in which the kernel's
verifier.c could be built as-is.

Maybe there would be some small compromises in verifier.c such as an
ifdef test or two, but that should be it.


that's exactly what we did the first time: added a few ifdefs to
verifier.c. The second time we went even further, compiling
kernel/bpf/verifier.c as-is and linking everything via user space
hooks, to the point that test_verifier.c runs as-is but calls a
bpf_check() function compiled for user space via clang.
That code is here:
https://github.com/iovisor/bpf-fuzzer
It's definitely possible to refresh it and make it work again.

My point is that unless we put such a 'let's build verifier.c for user
space' setup into tools/testing/selftests/ or something, the project is
destined to bit rot.



Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.

2017-05-19 Thread David Miller
From: Alexei Starovoitov 
Date: Fri, 19 May 2017 14:37:56 -0700

> On 5/19/17 1:41 PM, David Miller wrote:
>> From: Edward Cree 
>> Date: Fri, 19 May 2017 18:17:42 +0100
>>
>>> One question: is there a way to build the verifier as userland code
>>>  (or at least as a module), or will I have to reboot every time I
>>>  want to test a change?
>>
>> There currently is no such mechanism, you will have to reboot every
>> time.
>>
>> I have considered working on making the code buildable outside of the
>> kernel.  It shouldn't be too hard.
> 
> it's not hard.
> We did it twice and both times abandoned it.
> The first time was to have a 'user space verifier' to check programs
> before loading, and the second time was for fuzzing via llvm.
> We abandoned it since it diverges very quickly from the kernel.
> 

Well, my idea was to create an environment in which the kernel's
verifier.c could be built as-is.

Maybe there would be some small compromises in verifier.c such as an
ifdef test or two, but that should be it.

It really is just a piece of what amounts to compiler infrastructure
and not very kernel specific.


Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.

2017-05-19 Thread Alexei Starovoitov

On 5/19/17 1:41 PM, David Miller wrote:

From: Edward Cree 
Date: Fri, 19 May 2017 18:17:42 +0100


One question: is there a way to build the verifier as userland code
 (or at least as a module), or will I have to reboot every time I
 want to test a change?


There currently is no such mechanism, you will have to reboot every
time.

I have considered working on making the code buildable outside of the
kernel.  It shouldn't be too hard.


it's not hard.
We did it twice and both times abandoned it.
The first time was to have a 'user space verifier' to check programs
before loading, and the second time was for fuzzing via llvm.
We abandoned it since it diverges very quickly from the kernel.




Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.

2017-05-19 Thread David Miller
From: Edward Cree 
Date: Fri, 19 May 2017 18:17:42 +0100

> One question: is there a way to build the verifier as userland code
>  (or at least as a module), or will I have to reboot every time I
>  want to test a change?

There currently is no such mechanism, you will have to reboot every
time.

I have considered working on making the code buildable outside of the
kernel.  It shouldn't be too hard.


Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.

2017-05-19 Thread Edward Cree
On 19/05/17 15:55, Alexei Starovoitov wrote:
> On 5/19/17 7:21 AM, Edward Cree wrote:
>> I'm currently translating the algos to C.  But for the kernel patch,
>>  I'll need to read & understand the existing verifier code, so it
>>  might take a while :)  (I don't suppose there's any design document
>>  or hacking-notes you could point me at?)
>
> Dave just gave a good overview of the verifier:
> https://www.spinics.net/lists/xdp-newbies/msg00185.html
>
> Few more details in
> Documentation/networking/filter.txt
> and in top comments in kernel/bpf/verifier.c
>
> General bpf arch overview:
> http://docs.cilium.io/en/latest/bpf/#instruction-set 
Thanks for the links!  I'm digging into implementation now.
One question: is there a way to build the verifier as userland code
 (or at least as a module), or will I have to reboot every time I
 want to test a change?
-Ed


Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.

2017-05-19 Thread Alexei Starovoitov

On 5/19/17 7:21 AM, Edward Cree wrote:

I'm currently translating the algos to C.  But for the kernel patch,
 I'll need to read & understand the existing verifier code, so it
 might take a while :)  (I don't suppose there's any design document
 or hacking-notes you could point me at?)


Dave just gave a good overview of the verifier:
https://www.spinics.net/lists/xdp-newbies/msg00185.html

Few more details in
Documentation/networking/filter.txt
and in top comments in kernel/bpf/verifier.c

General bpf arch overview:
http://docs.cilium.io/en/latest/bpf/#instruction-set


But I'll give it a go for sure.


Thanks!



Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.

2017-05-19 Thread Edward Cree
On 19/05/17 02:22, Alexei Starovoitov wrote:
> In your .py I'd only change __str__(self) to print them in mask,value
> order, as that's the order they're passed into the constructor, to make
> it easier to read.
Actually I was going to go the other way and change the ctor to take
 value,mask.  But I agree they're inconsistent right now.

> this mul algo I don't completely understand. It feels correct,
> but I'm not sure we really need it for the kernel. 
You're probably right; I was just driven by a completionist desire to
 cover everything I could.

> What I love about the whole thing that it works for access into
> packet, access into map values and in the future for any other
> variable length access.
Sure, but don't start thinking it subsumes all the other checks.  We
 will still need e.g. max/min tracking, because packet length isn't
 always a power of 2.
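
For a concrete illustration of why masks alone aren't enough, here is a
small Python sketch (the helper name is mine): the tightest known-bits
form covering the lengths 0..5 already admits 6 and 7, so a "<= 5"
bound simply cannot be expressed.

def tnum_of_set(vals):
    # tightest (value; mask) covering every value in vals:
    # a bit stays known only if it agrees across all of them
    base = vals[0]
    mask = 0
    for v in vals[1:]:
        mask |= base ^ v
    return base & ~mask, mask

print(tnum_of_set(list(range(6))))   # (0, 7): admits 0..7, not just 0..5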

> Are you planning to work on the kernel patch for this algo?
> Once we have it the verifier will be smarter regarding
> alignment tracking than any compiler i know :) 
I'm currently translating the algos to C.  But for the kernel patch,
 I'll need to read & understand the existing verifier code, so it
 might take a while :)  (I don't suppose there's any design document
 or hacking-notes you could point me at?)
But I'll give it a go for sure.

-Ed


Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.

2017-05-18 Thread Alexei Starovoitov

On 5/18/17 9:38 AM, Edward Cree wrote:

On 18/05/17 15:49, Edward Cree wrote:

Here's one idea that seemed to work when I did a couple of experiments:
let A = (a;am), B = (b;bm) where the m are the masks
Σ = am + bm + a + b
χ = Σ ^ (a + b) /* unknown carries */
μ = χ | am | bm /* mask of result */
then A + B = ((a + b) & ~μ; μ)

The idea is that we find which bits change between the case "all x are
 1" and "all x are 0", and those become xs too.


> https://gist.github.com/ecree-solarflare/0665d5b46c2d8d08de2377fbd527de8d

I played with it quite a bit trying to break it and have to
agree that the above algorithm works.
At least for add and sub I think it's solid.
Still feels a bit magical, since it gave me better results
than I could envision for my test vectors.
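
For reference, a brute-force check along those lines fits in a short
Python sketch (function names and the 4-bit width are mine, mirroring
the formulas quoted above):

def tnum_add(a, am, b, bm, M):
    # the quoted rule: sigma, chi, mu, truncated to the width mask M
    sigma = (am + bm + a + b) & M
    chi = sigma ^ ((a + b) & M)
    mu = chi | am | bm
    return ((a + b) & ~mu) & M, mu

def check_add(width=4):
    M = (1 << width) - 1
    for am in range(M + 1):
        for a in range(M + 1):
            if a & am:
                continue              # known bits must be clear in the mask
            for bm in range(M + 1):
                for b in range(M + 1):
                    if b & bm:
                        continue
                    rv, rm = tnum_add(a, am, b, bm, M)
                    for x in range(M + 1):
                        if (x & ~am) != a:
                            continue  # x is not represented by (a;am)
                        for y in range(M + 1):
                            if (y & ~bm) != b:
                                continue
                            s = (x + y) & M
                            # every concrete sum must fit the result
                            assert (s & ~rm) == rv, (a, am, b, bm, x, y)

check_add()
print("add rule is sound for all 4-bit tnums")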

In your .py I'd only change __str__(self) to print them in mask,value
order, as that's the order they're passed into the constructor, to make
it easier to read.
The bin(self) output is the most useful, of course.
We should carry it into the kernel too for debugging.


And now I've found a similar algorithm for subtraction, which (again) I
 can't prove but it seems to work.
α = a + am - b
β = a - b - bm
χ = α ^ β
μ = χ | am | bm
then A - B = ((a - b) & ~μ; μ)
Again we're effectively finding the max. and min. values, and XORing
 them to find unknown carries.

Bitwise operations are easy, of course;
/* By assumption, a & am == b & bm == 0 */
A & B = (a & b; (a | am) & (b | bm) & ~(a & b))
A | B = (a | b; (am | bm) & ~(a | b))
/* It bothers me that & and | aren't symmetric, but I can't fix it */
A ^ B = (a ^ b; am | bm)

as are shifts by a constant (just shift 0s into both number and mask).

Multiplication by a constant can be done by decomposing into shifts
 and adds; but it can also be done directly; here we find (a;am) * k.
π = a * k
γ = am * k
then A * k = (π; 0) + (0; γ), for which we use our addition algo.

Multiplication of two unknown values is a nightmare, as unknown bits
 can propagate all over the place.  We can do a shift-add
 decomposition where the adds for unknown bits have all the 1s in
 the addend replaced with xs.  A few experiments suggest that this
 works, regardless of the order of operands.  For instance
 110x * x01 comes out as either
     110x
+  xx0x00
= xxxxx0x
 or
      x0x
    x0100
+  x01000
= xxxxx0x
We can slightly optimise this by handling all the 1 bits in one go;
 that is, for (a;am) * (b;bm) we first find (a;am) * b using our
 multiplication-by-a-constant algo above, then for each bit in bm
 we find (a;am) * bit and force all its nonzero bits to unknown;
 finally we add all our components.


this mul algo I don't completely understand. It feels correct,
but I'm not sure we really need it for the kernel.
For all practical cases llvm will likely emit shifts or a sequence
of adds and shifts, so multiplies by a crazy non-optimizable constant
or variable are rare, and the end result is likely going to be
outside the packet boundary anyway, so it will be rejected and
precise alignment tracking doesn't matter much.
What I love about the whole thing that it works for access into
packet, access into map values and in the future for any other
variable length access.


Don't even ask about division; that scrambles bits so hard that the
only thing you can say for sure is that the leading 0s in the
numerator stay 0 in the result.

yeah, screw div and mod. We have the option to disable div/mod
altogether under some new 'prog_flags', since it has this ugly
'div by 0' exception path. We don't even have a 'signed division'
instruction, and llvm errors out like:
errs() << "Unsupport signed division for DAG: ";
errs() << "Please convert to unsigned div/mod.\n";
and no one has complained. It just means that division is extremely rare.

Are you planning to work on the kernel patch for this algo?
Once we have it the verifier will be smarter regarding
alignment tracking than any compiler i know :)



Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.

2017-05-18 Thread Edward Cree
Implementations (still in Python for now) at
https://gist.github.com/ecree-solarflare/0665d5b46c2d8d08de2377fbd527de8d
(I left out division, because it's so weak.)

I still can't prove + and - are correct, but they've passed every test
 case I've come up with so far.  * seems pretty obviously correct as long
 as the + it uses is.  Bitwise ops and shifts are trivial to prove.

-Ed


Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.

2017-05-18 Thread Edward Cree
On 18/05/17 15:49, Edward Cree wrote:
> Here's one idea that seemed to work when I did a couple of experiments:
> let A = (a;am), B = (b;bm) where the m are the masks
> Σ = am + bm + a + b
> χ = Σ ^ (a + b) /* unknown carries */
> μ = χ | am | bm /* mask of result */
> then A + B = ((a + b) & ~μ; μ)
>
> The idea is that we find which bits change between the case "all x are
>  1" and "all x are 0", and those become xs too.
And now I've found a similar algorithm for subtraction, which (again) I
 can't prove but it seems to work.
α = a + am - b
β = a - b - bm
χ = α ^ β
μ = χ | am | bm
then A - B = ((a - b) & ~μ; μ)
Again we're effectively finding the max. and min. values, and XORing
 them to find unknown carries.
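
A direct Python transcription of this subtraction rule, for anyone who
wants to fuzz it the same way as the addition rule (names and the 8-bit
width are illustrative):

def tnum_sub(a, am, b, bm, width=8):
    M = (1 << width) - 1
    alpha = (a + am - b) & M   # max(A) - min(B)
    beta = (a - b - bm) & M    # min(A) - max(B)
    chi = alpha ^ beta         # carries that differ between the extremes
    mu = chi | am | bm
    return ((a - b) & ~mu) & M, mu

# x0 - 1: the concrete results are 1 (2 - 1) and 0xff (0 - 1);
# the rule returns value 1 with only the low bit known, covering both
print(tnum_sub(0, 2, 1, 0))   # (1, 254)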

Bitwise operations are easy, of course;
/* By assumption, a & am == b & bm == 0 */
A & B = (a & b; (a | am) & (b | bm) & ~(a & b))
A | B = (a | b; (am | bm) & ~(a | b))
/* It bothers me that & and | aren't symmetric, but I can't fix it */
A ^ B = (a ^ b; am | bm)

as are shifts by a constant (just shift 0s into both number and mask).

Multiplication by a constant can be done by decomposing into shifts
 and adds; but it can also be done directly; here we find (a;am) * k.
π = a * k
γ = am * k
then A * k = (π; 0) + (0; γ), for which we use our addition algo.
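
In Python form, the constant-multiply step leans directly on the
addition rule (a sketch; the naming is mine):

def tnum_add(a, am, b, bm, M):
    sigma = (am + bm + a + b) & M
    chi = sigma ^ ((a + b) & M)
    mu = chi | am | bm
    return ((a + b) & ~mu) & M, mu

def tnum_mul_const(a, am, k, width=8):
    M = (1 << width) - 1
    pi = (a * k) & M        # known part scaled by k
    gamma = (am * k) & M    # unknown part scaled by k
    return tnum_add(pi, 0, 0, gamma, M)   # (pi; 0) + (0; gamma)

# x01 * 12 -> xx1100: both known-1 bits survive
print(tnum_mul_const(0b001, 0b100, 12))   # (12, 48)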

Multiplication of two unknown values is a nightmare, as unknown bits
 can propagate all over the place.  We can do a shift-add
 decomposition where the adds for unknown bits have all the 1s in
 the addend replaced with xs.  A few experiments suggest that this
 works, regardless of the order of operands.  For instance
 110x * x01 comes out as either
     110x
+  xx0x00
= xxxxx0x
 or
      x0x
    x0100
+  x01000
= xxxxx0x
We can slightly optimise this by handling all the 1 bits in one go;
 that is, for (a;am) * (b;bm) we first find (a;am) * b using our
 multiplication-by-a-constant algo above, then for each bit in bm
 we find (a;am) * bit and force all its nonzero bits to unknown;
 finally we add all our components.
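
Putting the pieces together, a sketch of the whole multiply in Python
(again illustrative naming; tnum_add and tnum_mul_const as above):

def tnum_add(a, am, b, bm, M):
    sigma = (am + bm + a + b) & M
    chi = sigma ^ ((a + b) & M)
    mu = chi | am | bm
    return ((a + b) & ~mu) & M, mu

def tnum_mul_const(a, am, k, M):
    return tnum_add((a * k) & M, 0, 0, (am * k) & M, M)

def tnum_mul(a, am, b, bm, width=8):
    M = (1 << width) - 1
    # the known-1 bits of B in one go, via the constant algorithm
    rv, rm = tnum_mul_const(a, am, b, M)
    i = 0
    while (1 << i) <= bm:
        if bm & (1 << i):
            # (a;am) * this unknown bit, with every nonzero bit
            # forced to unknown, accumulated with the addition rule
            part = ((a | am) << i) & M
            rv, rm = tnum_add(rv, rm, 0, part, M)
        i += 1
    return rv, rm

# 110x * x01 = xxxxx0x, matching the worked example above
print(tnum_mul(0b1100, 0b0001, 0b001, 0b100))   # (0, 125)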

Don't even ask about division; that scrambles bits so hard that the
 only thing you can say for sure is that the leading 0s in the
 numerator stay 0 in the result.  The only exception is divisions
 by a constant which can be converted into a shift, or divisions
 of a constant by another constant; if the numerator has any xs and
 the denominator has more than one 1, everything to the right of the
 first x is totally unknown in general.

-Ed


Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.

2017-05-18 Thread Edward Cree
On 18/05/17 03:48, Alexei Starovoitov wrote:
> Would it be easier to represent this logic via (mask_of_unknown, value)
> instead of (mask0, mask1) ?
Yes, I like this.
> As far as upper bits we can tweak the algorithm to eat into
> one or more bits of known bits due to carry.
> Like
> 00xx11 + 00xx11 = 0xxx10
> we will eat only one bit (second from left) and the highest bit
> is known to stay zero, since carry can only compromise 2nd from left.
> Such logic should work for sparse representation of unknown bits too
> Like:
> 10xx01xx10 +
> 01xx01xx00 =
> xxxx1xxx10
> both upper two bits would be unknown, but only one middle bit becomes
> unknown.
Yes, that is the behaviour we want.  But it's unclear how to efficiently
 compute it, without just iterating over the bits and computing carry
 possibilities.
Here's one idea that seemed to work when I did a couple of experiments:
let A = (a;am), B = (b;bm) where the m are the masks
Σ = am + bm + a + b
χ = Σ ^ (a + b) /* unknown carries */
μ = χ | am | bm /* mask of result */
then A + B = ((a + b) & ~μ; μ)

The idea is that we find which bits change between the case "all x are
 1" and "all x are 0", and those become xs too.  But I'm not certain
 that that's always going to cover all possible values in between.
It worked on the tests I came up with, and also your example above, but
 I can't quite prove it'll always work.
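
As a sanity check, a minimal Python sketch of the rule run against the
two examples quoted above (the function name and widths are mine):

def tnum_add(a, am, b, bm, width):
    M = (1 << width) - 1
    sigma = (am + bm + a + b) & M
    chi = sigma ^ ((a + b) & M)
    mu = chi | am | bm
    return ((a + b) & ~mu) & M, mu

# 00xx11 + 00xx11 = 0xxx10: only one known bit is eaten by carries
print(tnum_add(0b000011, 0b001100, 0b000011, 0b001100, 6))   # (2, 28)

# 10xx01xx10 + 01xx01xx00 = xxxx1xxx10: the guaranteed carry out of
# the 1+1 column turns the known 0 above it into a known 1
print(tnum_add(0b1000010010, 0b0011001100,
               0b0100010000, 0b0011001100, 10))              # (34, 988)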

-Ed


Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.

2017-05-18 Thread Edward Cree
On 18/05/17 01:16, David Miller wrote:
> So, in C, addition (a += b) is something like:
>
> struct bpf_reg_bits {
> u64 zero_bits;
> u64 one_bits;
> };
>
> static void add_update_bits(struct bpf_reg_bits *a, struct bpf_reg_bits *b)
> {
> u64 m_zeros, m_ones, m_all;
>
> m_zeros = a->zero_bits ^ b->zero_bits;
> m_ones = a->one_bits ^ b->one_bits;
> m_all = m_zeros | m_ones;
No, this should be
u64 m_a, m_b, m_all;

m_a = a->zero_bits ^ a->one_bits; /* unknown bits in a */
m_b = b->zero_bits ^ b->one_bits; /* unknown bits in b */
m_all = m_a | m_b; /* unknown bits in result */
> a->zero_bits = (a->zero_bits + b->zero_bits) | m_all;
> a->one_bits = (a->one_bits + b->zero_bits) & ~m_all;
> }
>
> Then, is subtraction merely:
>
> static void sub_update_bits(struct bpf_reg_bits *a, struct bpf_reg_bits *b)
> {
> u64 m_zeros, m_ones, m_all;
>
> m_zeros = a->zero_bits ^ b->zero_bits;
> m_ones = a->one_bits ^ b->one_bits;
> m_all = m_zeros | m_ones;
>
> a->zero_bits = (a->zero_bits - b->zero_bits) | m_all;
> a->one_bits = (a->one_bits - b->zero_bits) & ~m_all;
> }
>
> Or is something different needed?
I suspect it's something different, just because I worry about what
 carries will do.

But I think Alexei's idea (mask and value) is better anyway; at the
 least it's easier to think about.
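
For what it's worth, the two representations carry exactly the same
information; a Python sketch of the conversion (names are mine):

def to_value_mask(zero_bits, one_bits):
    # zero_bits/one_bits scheme from above: unknown bits are set in
    # zero_bits and clear in one_bits; known-1 bits are set in both
    return one_bits, zero_bits ^ one_bits   # (value, unknown-bits mask)

def from_value_mask(value, mask):
    return value | mask, value              # (zero_bits, one_bits)

print(to_value_mask(0xff, 0x00))   # fully unknown -> (0, 255)
print(to_value_mask(0x06, 0x06))   # constant 6    -> (6, 0)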

-Ed


Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.

2017-05-17 Thread Alexei Starovoitov

On 5/17/17 8:33 AM, Edward Cree wrote:

On 17/05/17 15:00, Edward Cree wrote:

OTOH the 'track known 1s as well' might work in a nice generic way
 and cover all bases, I'll have to experiment a bit with that.

-Ed


So I did some experiments (in Python, script follows) and found that
 indeed this does appear to work, at least for addition and shifts.
The idea is that we have a 0s mask and a 1s mask; for bits that are
 unknown, the 0s mask is set and the 1s mask is cleared.  So a
 completely unknown variable has masks (~0, 0), then if you shift it
 left 2 you get (~3, 0) - just shift both masks.  A constant x has
 masks (x, x) - all the 0s are known 0s and all the 1s are known 1s.
Addition is a bit more complicated: we compute the 'unknown bits'
 mask, by XORing the 0s and 1s masks together, of each addend.  Then
 we add the corresponding masks from each addend together, and force
 the 'unknown' bits to the appropriate values in each mask.
So given (a1, b1) and (a2, b2), we compute m1 = a1 ^ b1,
 m2 = a2 ^ b2, and m = m1 | m2.  Then a = (a1 + a2) | m, and
 b = (b1 + b2) & ~m.
As a worked example, 2 + (x << 2) + 14:
 2 => (2, 2) constant
 x => (~0, 0) unknown
 x << 2 => (~3, 0)
 2 + (x << 2): add (2, 2) with (~3, 0)
m1 = 0, m2 = ~3, m = ~3
a = (2 + ~3) | ~3 = ~1 | ~3 = ~1
b = (2 + 0) & ~~3 = 2 & 3 = 2
so (~1, 2), which means "...xx10"
now add 14: add (~1, 2) with (14, 14)
m1 = ~3, m2 = 0, m = ~3
a = (~1 + 14) | ~3 = 12 | ~3 = ~3
b = (2 + 14) & ~~3 = 16 & 3 = 0
so (~3, 0), which means "...xx00"
and the result is 4-byte aligned.

-Ed

PS. Beware of bugs in the following code; I have only tested it, not
 proved it correct.

--8<--

#!/usr/bin/python2

def cpl(x):
    return (~x)&0xff

class AlignedNumber(object):
    def __init__(self, mask0=0xff, mask1=0):
        """mask0 has 0s for bits known to be 0, 1s otherwise.
        mask1 has 1s for bits known to be 1, 0s otherwise.
        Thus a bit which is set in mask0 and cleared in mask1 is an 'unknown'
        bit, while a bit which is cleared in mask0 and set in mask1 is a bug.
        """
        self.masks = (mask0 & 0xff, mask1 & 0xff)
        self.validate()
    @classmethod
    def const(cls, value):
        """All bits are known, so both masks equal value."""
        return cls(value, value)
    def validate(self):
        # Check for bits 'known' to be both 0 and 1
        assert not (cpl(self.masks[0]) & self.masks[1]), self.masks
        # Check unknown bits don't follow known bits
        assert self.mx | ((self.mx - 1) & 0xff) == 0xff, self.masks
    def __str__(self):
        return ':'.join(map(bin, self.masks))
    def __lshift__(self, sh):
        """Shift both masks left, low bits become 'known 0'"""
        return self.__class__(self.masks[0] << sh, self.masks[1] << sh)
    def __rshift__(self, sh):
        """Shift 1s into mask0; high bits become 'unknown'.
        While strictly speaking they may be known 0 if we're tracking the full
        word and doing unsigned shifts, having known bits before unknown bits
        breaks the addition code."""
        return self.__class__(cpl(cpl(self.masks[0]) >> sh), self.masks[1] >> sh)
    @property
    def mx(self):
        """Mask of unknown bits"""
        return self.masks[0] ^ self.masks[1]
    def __add__(self, other):
        """OR the mx values together.  Unknown bits could cause carries, so we
        just assume that they can carry all the way to the left (thus we keep
        our mx masks in the form 1...10...0).
        Then, add our 0- and 1-masks, and force the bits of the combined mx
        mask to the unknown state."""
        if isinstance(other, int):
            return self + AlignedNumber.const(other)
        assert isinstance(other, AlignedNumber), other
        m = self.mx | other.mx
        return self.__class__((self.masks[0] + other.masks[0]) | m,
                              (self.masks[1] + other.masks[1]) & cpl(m))
    def is_aligned(self, bits):
        """We are 2^n-aligned iff the bottom n bits are known-0."""
        mask = (1 << bits) - 1
        return not (self.masks[0] & mask)

if __name__ == '__main__':
    a = AlignedNumber.const(2)
    b = AlignedNumber() << 2
    c = AlignedNumber.const(14)
    print a, b, c
    print a + b, a + b + c
    assert (a + b + c).is_aligned(2)


that bit was confusing to me.
is_aligned(2) checks that the number is 4-byte aligned :)

Would it be easier to represent this logic via (mask_of_unknown, value)
instead of (mask0, mask1) ?
Where mask_of_unknown has 1s for unknown bits and 0 for known bits
in the value. Then arithmetic operations will be easier to visualize.
Like:
(mask1, val1) + (mask2, val2) = (mask1 | mask2, val1 + val2)

Here is your modified script:
#!/usr/bin/python2

class AlignedNumber(object):
def __init__(self, mask=0xff, value=0):
# mask has 1s for unknown bits, 0s for known bits in value
self.masks = (mask & 0xff, value & 0xff)
@classmethod
def const(cls, 
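
(The script is cut off above. As a sketch, the (mask, value) version
plausibly continued along these lines; this is a reconstruction from
the description, not the actual code, and it keeps the caveat that the
simple mask1 | mask2 / val1 + val2 rule is only sound while the masks
stay in the 1...10...0 form:)

class AlignedNumber(object):
    def __init__(self, mask=0xff, value=0):
        # mask has 1s for unknown bits, 0s for known bits in value
        self.masks = (mask & 0xff, value & 0xff)
        assert not (self.masks[0] & self.masks[1]), self.masks
        # unknown bits must form a single 1...10...0 run up to the MSB
        m = self.masks[0]
        assert (m | ((m - 1) & 0xff)) == 0xff, self.masks
    @classmethod
    def const(cls, value):
        return cls(0, value)   # a constant has no unknown bits
    def __str__(self):
        return ':'.join(map(bin, self.masks))
    def __lshift__(self, sh):
        return self.__class__(self.masks[0] << sh, self.masks[1] << sh)
    def __add__(self, other):
        if isinstance(other, int):
            other = AlignedNumber.const(other)
        m = self.masks[0] | other.masks[0]                # mask1 | mask2
        v = (self.masks[1] + other.masks[1]) & ~m & 0xff  # val1 + val2
        return self.__class__(m, v)
    def is_aligned(self, bits):
        low = (1 << bits) - 1
        # aligned iff the low bits are known zeros
        return not ((self.masks[0] | self.masks[1]) & low)

# same worked example as before: 2 + (x << 2) + 14 is 4-byte aligned
assert (AlignedNumber.const(2) + (AlignedNumber() << 2) + 14).is_aligned(2)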

Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.

2017-05-17 Thread Alexei Starovoitov

On 5/17/17 5:16 PM, David Miller wrote:

BTW, we track something similar already in reg->imm which tracks how
many high order bits are known to be cleared in the register.  It is
used to avoid potential overflow for packet pointer accesses.  I bet
we can subsume that into this facility as well.


yeah, reg->imm tracks zero upper bits in a very simplistic way.
For alignment checking it seems we need to track lower zeros
and ones. If Edward's algorithm can be adapted to track both,
that would be a double win indeed.



Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.

2017-05-17 Thread David Miller
From: Edward Cree 
Date: Wed, 17 May 2017 16:33:27 +0100

> So I did some experiments (in Python, script follows) and found that
>  indeed this does appear to work, at least for addition and shifts.
> The idea is that we have a 0s mask and a 1s mask; for bits that are
>  unknown, the 0s mask is set and the 1s mask is cleared.  So a
>  completely unknown variable has masks (~0, 0), then if you shift it
>  left 2 you get (~3, 0) - just shift both masks.  A constant x has
>  masks (x, x) - all the 0s are known 0s and all the 1s are known 1s.
> Addition is a bit more complicated: we compute the 'unknown bits'
>  mask, by XORing the 0s and 1s masks together, of each addend.  Then
>  we add the corresponding masks from each addend together, and force
>  the 'unknown' bits to the appropriate values in each mask.
> So given (a1, b1) and (a2, b2), we compute m1 = a1 ^ b1,
>  m2 = a2 ^ b2, and m = m1 | m2.  Then a = (a1 + a2) | m, and
>  b = (b1 + b2) & ~m.
> As a worked example, 2 + (x << 2) + 14:
>  2 => (2, 2) constant
>  x => (~0, 0) unknown
>  x << 2 => (~3, 0)
>  2 + (x << 2): add (2, 2) with (~3, 0)
> m1 = 0, m2 = ~3, m = ~3
> a = (2 + ~3) | ~3 = ~1 | ~3 = ~1
> b = (2 + 0) & ~~3 = 2 & 3 = 2
> so (~1, 2), which means "...xx10"
> now add 14: add (~1, 2) with (14, 14)
> m1 = ~3, m2 = 0, m = ~3
> a = (~1 + 14) | ~3 = 12 | ~3 = ~3
> b = (2 + 14) & ~~3 = 16 & 3 = 0
> so (~3, 0), which means "...xx00"
> and the result is 4-byte aligned.

Ok, I'm starting to digest this.  If it works it's a nice universal
way to handle alignment tracking, that's for sure.

So, in C, addition (a += b) is something like:

struct bpf_reg_bits {
u64 zero_bits;
u64 one_bits;
};

static void add_update_bits(struct bpf_reg_bits *a, struct bpf_reg_bits *b)
{
u64 m_zeros, m_ones, m_all;

m_zeros = a->zero_bits ^ b->zero_bits;
m_ones = a->one_bits ^ b->one_bits;
m_all = m_zeros | m_ones;

a->zero_bits = (a->zero_bits + b->zero_bits) | m_all;
a->one_bits = (a->one_bits + b->zero_bits) & ~m_all;
}

Then, is subtraction merely:

static void sub_update_bits(struct bpf_reg_bits *a, struct bpf_reg_bits *b)
{
u64 m_zeros, m_ones, m_all;

m_zeros = a->zero_bits ^ b->zero_bits;
m_ones = a->one_bits ^ b->one_bits;
m_all = m_zeros | m_ones;

a->zero_bits = (a->zero_bits - b->zero_bits) | m_all;
a->one_bits = (a->one_bits - b->zero_bits) & ~m_all;
}

Or is something different needed?

What about multiplication?  AND should be easy too.

BTW, we track something similar already in reg->imm which tracks how
many high order bits are known to be cleared in the register.  It is
used to avoid potential overflow for packet pointer accesses.  I bet
we can subsume that into this facility as well.

Thanks for all of your suggestions so far.


Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.

2017-05-17 Thread David Miller
From: Edward Cree 
Date: Wed, 17 May 2017 18:00:03 +0100

> Did you see the algorithms I posted with two masks?

I'll take a look.


Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.

2017-05-17 Thread Edward Cree
On 17/05/17 17:13, David Miller wrote:
> Both cases are common in real BPF programs.  The offsets really are
> necessary.  It's funny because initially I tried to implement this
> without the auxiliary offset and it simply doesn't work. :-)
> 
> We always have to track whether we've seen the offset that cancels out
> the NET_IP_ALIGN.  And as stated it can occur both before and after
> variable offsets have been introduced.
> 
> You have to catch both:
> 
>   ptr += variable;
>   ptr += 14;
> 
> and:
> 
>   ptr += 14;
>   ptr += variable; /* align = 4 */
> 
> And always see at the end that "NET_IP_ALIGN + offsets" will
> be properly 4-byte aligned.

Did you see the algorithms I posted with two masks?  Effectively they
 are tracking the 'quotient' and 'remainder', but in a way that might
 be easier to manipulate (bit-twiddling etc.) than the aux offset.  I
 think they handle both the above cases, in a nicely general way.
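
As a quick Python sketch of that claim (names are mine, using the
value;mask form of the addition rule): NET_IP_ALIGN + 14 + (x << 2)
lands on 4-byte alignment whichever side the constant arrives from.

def tnum_add(a, am, b, bm, width=8):
    M = (1 << width) - 1
    sigma = (am + bm + a + b) & M
    chi = sigma ^ ((a + b) & M)
    mu = chi | am | bm
    return ((a + b) & ~mu) & M, mu

v1, m1 = tnum_add(2, 0, 14, 0)       # NET_IP_ALIGN + 14 first...
v1, m1 = tnum_add(v1, m1, 0, 0xfc)   # ...then the variable (x << 2)
v2, m2 = tnum_add(2, 0, 0, 0xfc)     # variable first...
v2, m2 = tnum_add(v2, m2, 14, 0)     # ...constant after
assert (v1, m1) == (v2, m2) == (0, 0xfc)   # low two bits known 0 both ways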

-Ed


Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.

2017-05-17 Thread David Miller
From: Edward Cree 
Date: Wed, 17 May 2017 15:00:04 +0100

> On 16/05/17 23:53, Alexei Starovoitov wrote:
>> following this line of thinking it feels that it should be possible
>> to get rid of 'aux_off' and 'aux_off_align' and simplify the code.
>> I mean we can always do
>> dst_reg->min_align = min(dst_reg->min_align, src_reg->min_align);
>>
>> and don't use 'off' as part of alignment checks at all.
> Problem with this approach, of course, is that (say) NET_IP_ALIGN +
>  sizeof(ethhdr) = 16 is 16-byte aligned, whereas if you turn all
>  constants into alignments you think you're only 2-byte aligned.
> I think you have to track exact offsets when you can, and only turn
>  into an alignment when you introduce a variable.
> Of course it can still be fooled by e.g. 2 + (x << 2) + 14, which it
>  will think is only 2-aligned when really it's 4-aligned, but unless
>  you want to start tracking 'bits known to be 1' as well as 'bits
>  known to be 0', I think you just accept that alignment tracking
>  isn't commutative.  The obvious cases (ihl << 2 and so) will work
>  when written the obvious way, unless the compiler does something
>  perverse.
> OTOH the 'track known 1s as well' might work in a nice generic way
>  and cover all bases, I'll have to experiment a bit with that.

Both cases are common in real BPF programs.  The offsets really are
necessary.  It's funny because initially I tried to implement this
without the auxiliary offset and it simply doesn't work. :-)

We always have to track whether we've seen the offset that cancels out
the NET_IP_ALIGN.  And as stated it can occur both before and after
variable offsets have been introduced.

You have to catch both:

ptr += variable;
ptr += 14;

and:

ptr += 14;
ptr += variable; /* align = 4 */

And always see at the end that "NET_IP_ALIGN + offsets" will
be properly 4-byte aligned.


Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.

2017-05-17 Thread Edward Cree
On 17/05/17 15:00, Edward Cree wrote:
> OTOH the 'track known 1s as well' might work in a nice generic way
>  and cover all bases, I'll have to experiment a bit with that.
> 
> -Ed

So I did some experiments (in Python, script follows) and found that
 indeed this does appear to work, at least for addition and shifts.
The idea is that we have a 0s mask and a 1s mask; for bits that are
 unknown, the 0s mask is set and the 1s mask is cleared.  So a
 completely unknown variable has masks (~0, 0), then if you shift it
 left 2 you get (~3, 0) - just shift both masks.  A constant x has
 masks (x, x) - all the 0s are known 0s and all the 1s are known 1s.
Addition is a bit more complicated: we compute the 'unknown bits'
 mask, by XORing the 0s and 1s masks together, of each addend.  Then
 we add the corresponding masks from each addend together, and force
 the 'unknown' bits to the appropriate values in each mask.
So given (a1, b1) and (a2, b2), we compute m1 = a1 ^ b1,
 m2 = a2 ^ b2, and m = m1 | m2.  Then a = (a1 + a2) | m, and
 b = (b1 + b2) & ~m.
As a worked example, 2 + (x << 2) + 14:
 2 => (2, 2) constant
 x => (~0, 0) unknown
 x << 2 => (~3, 0)
 2 + (x << 2): add (2, 2) with (~3, 0)
m1 = 0, m2 = ~3, m = ~3
a = (2 + ~3) | ~3 = ~1 | ~3 = ~1
b = (2 + 0) & ~~3 = 2 & 3 = 2
so (~1, 2), which means "...xx10"
now add 14: add (~1, 2) with (14, 14)
m1 = ~3, m2 = 0, m = ~3
a = (~1 + 14) | ~3 = 12 | ~3 = ~3
b = (2 + 14) & ~~3 = 16 & 3 = 0
so (~3, 0), which means "...xx00"
and the result is 4-byte aligned.

-Ed

PS. Beware of bugs in the following code; I have only tested it, not
 proved it correct.

--8<--

#!/usr/bin/python2

def cpl(x):
    return (~x)&0xff

class AlignedNumber(object):
    def __init__(self, mask0=0xff, mask1=0):
        """mask0 has 0s for bits known to be 0, 1s otherwise.
        mask1 has 1s for bits known to be 1, 0s otherwise.
        Thus a bit which is set in mask0 and cleared in mask1 is an 'unknown'
        bit, while a bit which is cleared in mask0 and set in mask1 is a bug.
        """
        self.masks = (mask0 & 0xff, mask1 & 0xff)
        self.validate()
    @classmethod
    def const(cls, value):
        """All bits are known, so both masks equal value."""
        return cls(value, value)
    def validate(self):
        # Check for bits 'known' to be both 0 and 1
        assert not (cpl(self.masks[0]) & self.masks[1]), self.masks
        # Check unknown bits don't follow known bits
        assert self.mx | ((self.mx - 1) & 0xff) == 0xff, self.masks
    def __str__(self):
        return ':'.join(map(bin, self.masks))
    def __lshift__(self, sh):
        """Shift both masks left, low bits become 'known 0'"""
        return self.__class__(self.masks[0] << sh, self.masks[1] << sh)
    def __rshift__(self, sh):
        """Shift 1s into mask0; high bits become 'unknown'.
        While strictly speaking they may be known 0 if we're tracking the full
        word and doing unsigned shifts, having known bits before unknown bits
        breaks the addition code."""
        return self.__class__(cpl(cpl(self.masks[0]) >> sh), self.masks[1] >> sh)
    @property
    def mx(self):
        """Mask of unknown bits"""
        return self.masks[0] ^ self.masks[1]
    def __add__(self, other):
        """OR the mx values together.  Unknown bits could cause carries, so we
        just assume that they can carry all the way to the left (thus we keep
        our mx masks in the form 1...10...0).
        Then, add our 0- and 1-masks, and force the bits of the combined mx
        mask to the unknown state."""
        if isinstance(other, int):
            return self + AlignedNumber.const(other)
        assert isinstance(other, AlignedNumber), other
        m = self.mx | other.mx
        return self.__class__((self.masks[0] + other.masks[0]) | m,
                              (self.masks[1] + other.masks[1]) & cpl(m))
    def is_aligned(self, bits):
        """We are 2^n-aligned iff the bottom n bits are known-0."""
        mask = (1 << bits) - 1
        return not (self.masks[0] & mask)

if __name__ == '__main__':
    a = AlignedNumber.const(2)
    b = AlignedNumber() << 2
    c = AlignedNumber.const(14)
    print a, b, c
    print a + b, a + b + c
    assert (a + b + c).is_aligned(2)
    d = (AlignedNumber() << 4) >> 2
    print d
    assert d.is_aligned(2)


Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.

2017-05-17 Thread Edward Cree
On 16/05/17 23:53, Alexei Starovoitov wrote:
> following this line of thinking it feels that it should be possible
> to get rid of 'aux_off' and 'aux_off_align' and simplify the code.
> I mean we can always do
> dst_reg->min_align = min(dst_reg->min_align, src_reg->min_align);
>
> and don't use 'off' as part of alignment checks at all.
Problem with this approach, of course, is that (say) NET_IP_ALIGN +
 sizeof(ethhdr) = 16 is 16-byte aligned, whereas if you turn all
 constants into alignments you think you're only 2-byte aligned.
I think you have to track exact offsets when you can, and only turn
 into an alignment when you introduce a variable.
Of course it can still be fooled by e.g. 2 + (x << 2) + 14, which it
 will think is only 2-aligned when really it's 4-aligned, but unless
 you want to start tracking 'bits known to be 1' as well as 'bits
 known to be 0', I think you just accept that alignment tracking
 isn't commutative.  The obvious cases (ihl << 2 and so) will work
 when written the obvious way, unless the compiler does something
 perverse.
OTOH the 'track known 1s as well' might work in a nice generic way
 and cover all bases, I'll have to experiment a bit with that.

-Ed


Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.

2017-05-16 Thread Alexei Starovoitov

On 5/16/17 5:37 AM, Edward Cree wrote:

On 15/05/17 17:04, David Miller wrote:

If we use 1<<31, then sequences like:

R1 = 0
R1 <<= 2

do silly things.

Hmm.  It might be a bit late for this, but I wonder if, instead of handling
 alignments as (1 << align), you could store them as -(1 << align), i.e.
 leading 1s followed by 'align' 0s.
Now the alignment of 0 is 0 (really 1 << 32), which doesn't change when
 left-shifted some more.  Shifts of other numbers' alignments also do the
 right thing, e.g. align(6) << 2 = (-2) << 2 = -8 = align(6 << 2).  Of
 course you do all this in unsigned, to make sure right shifts work.
This also makes other arithmetic simple to track; for instance, align(a + b)
 is at worst align(a) | align(b).  (Of course, this bound isn't tight.)
A number is 2^(n+1)-aligned if the 2^n bit of its alignment is cleared.
Considered as unsigned numbers, smaller values are stricter alignments.


following this line of thinking it feels that it should be possible
to get rid of 'aux_off' and 'aux_off_align' and simplify the code.
I mean we can always do
dst_reg->min_align = min(dst_reg->min_align, src_reg->min_align);

and don't use 'off' as part of alignment checks at all.
So this bit:
if ((ip_align + reg_off + off) % size != 0) {
can be removed
and replaced with
a = alignof(ip_align)
a = min(a, reg->align)
if (a % size != 0)
and do this check always and not only after if (reg->id)

In check_packet_ptr_add():
- if (had_id)
-  dst_reg->aux_off_align = min(dst_reg->aux_off_align,
-   src_reg->min_align);
- else
-  dst_reg->aux_off_align = src_reg->min_align;

+ if (had_id)
+  dst_reg->min_align = min(dst_reg->min_align, src_reg->min_align);
+ else
+  dst_reg->min_align = src_reg->min_align;

in that sense check_packet_ptr_add() will be no different from the
align logic we do in adjust_reg_min_max_vals()

Thoughts?



Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.

2017-05-16 Thread David Miller
From: Edward Cree 
Date: Tue, 16 May 2017 13:37:42 +0100

> On 15/05/17 17:04, David Miller wrote:
>> If we use 1<<31, then sequences like:
>>
>>  R1 = 0
>>  R1 <<= 2
>>
>> do silly things.
> Hmm.  It might be a bit late for this, but I wonder if, instead of handling
>  alignments as (1 << align), you could store them as -(1 << align), i.e.
>  leading 1s followed by 'align' 0s.
> Now the alignment of 0 is 0 (really 1 << 32), which doesn't change when
>  left-shifted some more.  Shifts of other numbers' alignments also do the
>  right thing, e.g. align(6) << 2 = (-2) << 2 = -8 = align(6 << 2).  Of
>  course you do all this in unsigned, to make sure right shifts work.
> This also makes other arithmetic simple to track; for instance, align(a + b)
>  is at worst align(a) | align(b).  (Of course, this bound isn't tight.)
> A number is 2^(n+1)-aligned if the 2^n bit of its alignment is cleared.
> Considered as unsigned numbers, smaller values are stricter alignments.

Thanks for the bit twiddling suggestion, I'll take a look!


Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.

2017-05-16 Thread Edward Cree
On 15/05/17 17:04, David Miller wrote:
> If we use 1<<31, then sequences like:
>
>   R1 = 0
>   R1 <<= 2
>
> do silly things.
Hmm.  It might be a bit late for this, but I wonder if, instead of handling
 alignments as (1 << align), you could store them as -(1 << align), i.e.
 leading 1s followed by 'align' 0s.
Now the alignment of 0 is 0 (really 1 << 32), which doesn't change when
 left-shifted some more.  Shifts of other numbers' alignments also do the
 right thing, e.g. align(6) << 2 = (-2) << 2 = -8 = align(6 << 2).  Of
 course you do all this in unsigned, to make sure right shifts work.
This also makes other arithmetic simple to track; for instance, align(a + b)
 is at worst align(a) | align(b).  (Of course, this bound isn't tight.)
A number is 2^(n+1)-aligned if the 2^n bit of its alignment is cleared.
Considered as unsigned numbers, smaller values are stricter alignments.
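
A Python sketch of this encoding (helper names are mine, with the
32-bit wrap made explicit):

M32 = 0xffffffff

def align_enc(x):
    # alignment of a known value: -(lowest set bit); 0 encodes 1 << 32
    if x == 0:
        return 0
    return (-(x & -x)) & M32

def is_aligned(enc, n):
    # 2^n-aligned iff bit (n-1) of the encoding is clear
    return not (enc & (1 << (n - 1)))

six = align_enc(6)                           # -2: 6 is 2-byte aligned
assert is_aligned(six, 1)
assert ((six << 2) & M32) == align_enc(24)   # align(6 << 2) = align(24)
# align(a + b) is at worst align(a) | align(b): 8-aligned + 4-aligned
assert is_aligned(align_enc(8) | align_enc(4), 2)   # sum is 4-aligned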

-Ed


[PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.

2017-05-15 Thread David Miller

If we use 1<<31, then sequences like:

R1 = 0
R1 <<= 2

do silly things.  Examples of this actually exist in
the MAP tests of test_verifier.c

Update test_align.c expectation strings.

Signed-off-by: David S. Miller 
---
 kernel/bpf/verifier.c|  2 +-
 tools/testing/selftests/bpf/test_align.c | 12 ++--
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 39f2dcb..0f33db0 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1715,7 +1715,7 @@ static void check_reg_overflow(struct bpf_reg_state *reg)
 static u32 calc_align(u32 imm)
 {
if (!imm)
-   return 1U << 31;
+   return 1U << 16;
return imm - ((imm - 1) & imm);
 }
 
diff --git a/tools/testing/selftests/bpf/test_align.c b/tools/testing/selftests/bpf/test_align.c
index 9644d4e..afaa297 100644
--- a/tools/testing/selftests/bpf/test_align.c
+++ b/tools/testing/selftests/bpf/test_align.c
@@ -235,12 +235,12 @@ static struct bpf_align_test tests[] = {
},
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
.matches = {
-   "4: 
R0=imm0,min_value=0,max_value=0,min_align=2147483648 R1=ctx 
R2=pkt(id=0,off=0,r=0) R3=pkt_end R5=pkt(id=0,off=0,r=0) R10=fp",
-   "5: 
R0=imm0,min_value=0,max_value=0,min_align=2147483648 R1=ctx 
R2=pkt(id=0,off=0,r=0) R3=pkt_end R5=pkt(id=0,off=14,r=0) R10=fp",
-   "6: 
R0=imm0,min_value=0,max_value=0,min_align=2147483648 R1=ctx 
R2=pkt(id=0,off=0,r=0) R3=pkt_end R4=pkt(id=0,off=14,r=0) 
R5=pkt(id=0,off=14,r=0) R10=fp",
-   "10: 
R0=imm0,min_value=0,max_value=0,min_align=2147483648 R1=ctx 
R2=pkt(id=0,off=0,r=18) R3=pkt_end R4=inv56 R5=pkt(id=0,off=14,r=18) R10=fp",
-   "14: 
R0=imm0,min_value=0,max_value=0,min_align=2147483648 R1=ctx 
R2=pkt(id=0,off=0,r=18) R3=pkt_end R4=inv48 R5=pkt(id=0,off=14,r=18) R10=fp",
-   "15: 
R0=imm0,min_value=0,max_value=0,min_align=2147483648 R1=ctx 
R2=pkt(id=0,off=0,r=18) R3=pkt_end R4=inv48 R5=pkt(id=0,off=14,r=18) R10=fp",
+   "4: R0=imm0,min_value=0,max_value=0,min_align=65536 
R1=ctx R2=pkt(id=0,off=0,r=0) R3=pkt_end R5=pkt(id=0,off=0,r=0) R10=fp",
+   "5: R0=imm0,min_value=0,max_value=0,min_align=65536 
R1=ctx R2=pkt(id=0,off=0,r=0) R3=pkt_end R5=pkt(id=0,off=14,r=0) R10=fp",
+   "6: R0=imm0,min_value=0,max_value=0,min_align=65536 
R1=ctx R2=pkt(id=0,off=0,r=0) R3=pkt_end R4=pkt(id=0,off=14,r=0) 
R5=pkt(id=0,off=14,r=0) R10=fp",
+   "10: R0=imm0,min_value=0,max_value=0,min_align=65536 
R1=ctx R2=pkt(id=0,off=0,r=18) R3=pkt_end R4=inv56 R5=pkt(id=0,off=14,r=18) 
R10=fp",
+   "14: R0=imm0,min_value=0,max_value=0,min_align=65536 
R1=ctx R2=pkt(id=0,off=0,r=18) R3=pkt_end R4=inv48 R5=pkt(id=0,off=14,r=18) 
R10=fp",
+   "15: R0=imm0,min_value=0,max_value=0,min_align=65536 
R1=ctx R2=pkt(id=0,off=0,r=18) R3=pkt_end R4=inv48 R5=pkt(id=0,off=14,r=18) 
R10=fp",
},
},
{
-- 
2.1.2.532.g19b5d50
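
For reference, a Python restatement of the patched helper (a sketch of
what the C above computes): calc_align(imm) isolates the lowest set bit
of imm, i.e. the largest power of two dividing it, with 1 << 16 now
standing in for the 'infinitely aligned' zero.

def calc_align(imm):
    if not imm:
        return 1 << 16              # the new ceiling for imm == 0
    # (imm - 1) & imm clears the lowest set bit of imm; subtracting
    # that from imm leaves exactly the lowest set bit
    return imm - ((imm - 1) & imm)

assert calc_align(12) == 4          # 12 = 0b1100
assert calc_align(8) == 8
assert calc_align(0) == 1 << 16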