Re: BPF memstore and bpf_validate_ext()

2014-01-01 Thread Mindaugas Rasiukevicius
Alexander Nasonov al...@yandex.ru wrote:
 Mindaugas Rasiukevicius wrote:
  Moreover, the usual byte-code produced by tcpdump/pcap does not
  even use the memory store so you optimisations would most of the
  time be applicable anyway!
 
 This is not always the case. For instance,
 
 # tcpdump -y IEEE802_11 -i urtwn0 -d not tcp
 tcpdump: data link type IEEE802_11
 ...

I am aware of these cases, but I optimise for 80%, not 20%.

-- 
Mindaugas


Re: BPF memstore and bpf_validate_ext()

2013-12-29 Thread Alexander Nasonov
Mindaugas Rasiukevicius wrote:
 Moreover, the usual byte-code produced by tcpdump/pcap does not
 even use the memory store so you optimisations would most of the
 time be applicable anyway!

This is not always the case. For instance,

# tcpdump -y IEEE802_11 -i urtwn0 -d not tcp
tcpdump: data link type IEEE802_11
(000) ldx  #0x0
(001) txa
(002) add  #24
(003) st   M[0]
(004) ldb  [x + 0]
(005) jset #0x8 jt 6jf 11
(006) jset #0x4 jt 11   jf 7
(007) jset #0x80jt 8jf 11
(008) ld   M[0]
(009) add  #2
(010) st   M[0]
(011) ldb  [0]
(012) jset #0x4 jt 27   jf 13
(013) ldb  [0]
(014) jset #0x8 jt 15   jf 27
(015) ldx  M[0]
(016) ldh  [x + 6]
(017) jeq  #0x86dd  jt 18   jf 27
(018) ldx  M[0]
(019) ldb  [x + 14]
(020) jeq  #0x6 jt 37   jf 21
(021) ldx  M[0]
(022) ldb  [x + 14]
(023) jeq  #0x2cjt 24   jf 27
(024) ldx  M[0]
(025) ldb  [x + 48]
(026) jeq  #0x6 jt 37   jf 27
(027) ldb  [0]
(028) jset #0x4 jt 38   jf 29
(029) ldb  [0]
(030) jset #0x8 jt 31   jf 38
(031) ldx  M[0]
(032) ldh  [x + 6]
(033) jeq  #0x800   jt 34   jf 38
(034) ldx  M[0]
(035) ldb  [x + 17]
(036) jeq  #0x6 jt 37   jf 38
(037) ret  #0
(038) ret  #65535

Alex


Re: BPF memstore and bpf_validate_ext()

2013-12-20 Thread Alexander Nasonov
Sorry for top-posting. I'm replying from my phone.

I've not looked at linux bpf before. I remember taking a quick look at 
bpf_jit_compile function but I didn't like emitting binary machine code with 
macro commands.

I spent few minutes today looking at linux code and I noticed few interesting 
things:

- They use negative offsets to access auxiliary data. So, there is a clear 
distinction between local memory store and external data. I don't think it's a 
new addition, though.
- They have a big enum of commands. Many of them translate to bpf commands but 
there are also special commands like load protocol number into A. There is a 
decoder from bpf but I have no clue how it works.
- Those commands are adapted to work with skbuf data.

Alex


20.12.13, 04:16, David Laight da...@l8s.co.uk:
 
 On Fri, Dec 20, 2013 at 01:28:12AM +0200, Mindaugas Rasiukevicius wrote:
  Alexander Nasonov al...@yandex.ru wrote:
   
   Well, if it wasn't needed for many year in bpf, why do we need it now? ;-)
   
  
  Because it was decided to use BPF byte-code for more applications and that
  meant there is a need for improvements.  It is called evolution. :)
 
 Has anyone here looked closely at the changes linux is making to bpf?
 
   David
 
 -- 
 David Laight: da...@l8s.co.uk


-- 
Alex


Re: BPF memstore and bpf_validate_ext()

2013-12-20 Thread Justin Cormack
On Fri, Dec 20, 2013 at 1:13 PM, Alexander Nasonov al...@yandex.ru wrote:
 Sorry for top-posting. I'm replying from my phone.

 I've not looked at linux bpf before. I remember taking a quick look at 
 bpf_jit_compile function but I didn't like emitting binary machine code with 
 macro commands.

 I spent few minutes today looking at linux code and I noticed few interesting 
 things:

 - They use negative offsets to access auxiliary data. So, there is a clear 
 distinction between local memory store and external data. I don't think it's 
 a new addition, though.
 - They have a big enum of commands. Many of them translate to bpf commands 
 but there are also special commands like load protocol number into A. There 
 is a decoder from bpf but I have no clue how it works.
 - Those commands are adapted to work with skbuf data.

I have used some of the non traditional uses of bpf in Linux, in
particularly the syscall filtering code, which is designed to be a bit
like the packet filtering code. But I don't think it is a great model,
and I think the jit compiler is rather different, as it compiles to
asm. And the current route they are going seems to be validation
rather than in kernel jitting, see https://lwn.net/Articles/575531/

Justin


 Alex


 20.12.13, 04:16, David Laight da...@l8s.co.uk:

 On Fri, Dec 20, 2013 at 01:28:12AM +0200, Mindaugas Rasiukevicius wrote:
  Alexander Nasonov al...@yandex.ru wrote:
  
   Well, if it wasn't needed for many year in bpf, why do we need it now? 
   ;-)
  
 
  Because it was decided to use BPF byte-code for more applications and that
  meant there is a need for improvements.  It is called evolution. :)

 Has anyone here looked closely at the changes linux is making to bpf?

   David

 --
 David Laight: da...@l8s.co.uk


 --
 Alex


Re: BPF memstore and bpf_validate_ext()

2013-12-19 Thread Mindaugas Rasiukevicius
Alexander Nasonov al...@yandex.ru wrote:
  The external memory store can be used as an argument (and the initial
  values determined as proposed in this thread).  If you want, we can pass
  it as a third argument, I just think it is a pointless indirection
  level. Would even need extra wrapping i.e. more work in bpfjit, but if
  you want that separated - fine.
 
 I'll tell you more. Mixing mem with arguments will not only save
 me one register/one indirection level but it will allow to use
 memory addressing. It's very attractive solution from a performance
 perspective. But it's clearly a hack and I don't want to have hacks
 in public API.

We have discussed this before and I wrote quite long paragraph trying to
explain why I think such split does not really have real merits.  In a
nutshell: such separation does not solve any problem (does not reduce
complexity, does not help to honour the information hiding principle)
and the logical justification to have bpf_stack_t is a bit weak.

I am fine/neutral about adding a third uint32_t *mem argument.

  That is great, but we are going circles here.  If a program just needs
  some values stored in the memory store - why would you create a COP to
  get few integers instead of simply letting the caller to pass them.  It
  is the thing which actually has those numbers.
 
 What if your program just needs something else? Does it mean we have
 to add a new feature to bpf? No. We have to stop somewhere. ...

We should not stop because of the quantity.  We should consider whether it
*makes sense* to add such feature, looking at its virtues and implications.

I have already explained the benefits of the external memory store in this
thread (simple, straightforward way to pass values and have their cache).
Your main argument seems to be that the external memstore makes it more
difficult for bpfjit to optimise certain corner cases (which, as been
pointed out, are not common).

Given that BPF did not have JIT compiler for many years and even a very
simplistic JIT compilation is a huge win - I do not understand why do you
consider corner case optimisations as a higher value than the benefits
provided by the external memstore.

 BPF_COP
 is powerful enough for getting external data. It's a bit slower but I
 can give you SLJIT_FAST_CALL interface. If it's still slow for you,
 you should stop using sljit and consider other alternatives.

That is wonderful, but not the point..

   It's easy to
   suggest to have a flag but it's actually a lot or work. You need to
   write several lines of C code to generate a single instruction.
  
  The flag would basically say treat the memstore as internal i.e. just
  do all the optimisations, because I assure there are no side effects.
  It is a green light for what you said you already want to implement.
  That would be one-liner check or I miss something?
 
 Ok, with my recent change on github, it's indeed one-line change
 but having two different modes can introduce subtle bugs. If you
 forget to set the mode properly, you may have a situation when your
 memory is in a bad state only for packets of length 131 (or whatever)
 and it's consistent for all other lengths.
 I'm as a maintainer of bpfjit is responsible for a robust public
 interface.

Just keep the flag off by default?  Really, we have only few BPF users
in the NetBSD tree.  It is not rocket science to correctly set a flag.

-- 
Mindaugas


Re: BPF memstore and bpf_validate_ext()

2013-12-19 Thread Alexander Nasonov
Mindaugas Rasiukevicius wrote:
   That is great, but we are going circles here.  If a program just needs
   some values stored in the memory store - why would you create a COP to
   get few integers instead of simply letting the caller to pass them.  It
   is the thing which actually has those numbers.

Well, if it wasn't needed for many year in bpf, why do we need it now? ;-)

To answer your question about a program that needs some values stored,
you want a protocol to pass some values from the host C environment to
embedded BPF environment. You do it in a very ad-hoc manner. It's
not possible to say by looking at bpf program which memwords are passed
from outside and which are really internal.

BPF is a language and all its extensions should be designed with
this in mind.

 I have already explained the benefits of the external memory store in this
 thread (simple, straightforward way to pass values and have their cache).
 Your main argument seems to be that the external memstore makes it more
 difficult for bpfjit to optimise certain corner cases (which, as been
 pointed out, are not common).

Well, I was arguing about performance because you were concerned about
performance of your bpf programs.

Your solution lacks a concept. You're mixing things together without
bothering much about how will they interact to achieve a common
goal. I already pointed out that your COP is powerful enough to
copy external data to a memory store. To add to my other point
above, you're mixing local and global memory. Or, if you use an
analogy with Lua (that uses a byte code to interpret program),
there are local and global variables with a very clear distinction
between them. With your proposal, there is no clear distinction
inside bpf program which memory words are internal and which are
external.

Though, I don't think I want to see global memory in bpf.

 Given that BPF did not have JIT compiler for many years and even a very
 simplistic JIT compilation is a huge win - I do not understand why do you
 consider corner case optimisations as a higher value than the benefits
 provided by the external memstore.

I personally see no benefits in external memstore at all. You can copy
external data with a special copfunc.

  BPF_COP
  is powerful enough for getting external data. It's a bit slower but I
  can give you SLJIT_FAST_CALL interface. If it's still slow for you,
  you should stop using sljit and consider other alternatives.
 
 That is wonderful, but not the point..

Ok, let me make a point them. If you want to make significant changes to
bpf, you'd better start from scrach and design whatever you want. I'd be
easier than arguing about bpf changes forever.

Alex


Re: BPF memstore and bpf_validate_ext()

2013-12-19 Thread Mindaugas Rasiukevicius
Alexander Nasonov al...@yandex.ru wrote:
 
 Well, if it wasn't needed for many year in bpf, why do we need it now? ;-)
 

Because it was decided to use BPF byte-code for more applications and that
meant there is a need for improvements.  It is called evolution. :)

 To answer your question about a program that needs some values stored,
 you want a protocol to pass some values from the host C environment to
 embedded BPF environment. You do it in a very ad-hoc manner.

We are talking about byte-code i.e. quite low level environment.  The
protocol (contract) is specified at the caller level.  Can you explain
what kind of generic protocol do you expect to see?  Let me put it this
way - what *problem* are you trying to solve?

 It's
 not possible to say by looking at bpf program which memwords are passed
 from outside and which are really internal.

How is it different when using BPF coprocessor?  It is as clear (or not)
as using the external memory store.  It is a contract between the caller
and the BPF program.

 BPF is a language and all its extensions should be designed with
 this in mind.
 
  I have already explained the benefits of the external memory store in
  this thread (simple, straightforward way to pass values and have their
  cache). Your main argument seems to be that the external memstore makes
  it more difficult for bpfjit to optimise certain corner cases (which,
  as been pointed out, are not common).
 
 Well, I was arguing about performance because you were concerned about
 performance of your bpf programs.
 
 Your solution lacks a concept. You're mixing things together without
 bothering much about how will they interact to achieve a common
 goal. I already pointed out that your COP is powerful enough to
 copy external data to a memory store. To add to my other point
 above, you're mixing local and global memory. Or, if you use an
 analogy with Lua (that uses a byte code to interpret program),
 there are local and global variables with a very clear distinction
 between them. With your proposal, there is no clear distinction
 inside bpf program which memory words are internal and which are
 external.

Huh?  What does it have to do with local and global variables?  We barely
have a concept of variable.  We are discussing very simple byte-code which
is not even Turing-complete.  It is not Lua, it is not Haskell, it is tiny
assembly language.  I am all for well-thought abstractions and interfacing,
but how far do you want to go in abstracting 16 words?

I have been trying to figure out what existing or future practical problems
you are trying to solve?  Not metaphysical use cases or fantasy languages
based on BPF byte-code.. they do not exist, they make no sense.  Sometimes
array is just an array, not an object.  The sword is double-edged and the
other edge is called over-engineering.

It is disappointing that we are going ad infinitum..

-- 
Mindaugas


Re: BPF memstore and bpf_validate_ext()

2013-12-19 Thread David Laight
On Fri, Dec 20, 2013 at 01:28:12AM +0200, Mindaugas Rasiukevicius wrote:
 Alexander Nasonov al...@yandex.ru wrote:
  
  Well, if it wasn't needed for many year in bpf, why do we need it now? ;-)
  
 
 Because it was decided to use BPF byte-code for more applications and that
 meant there is a need for improvements.  It is called evolution. :)

Has anyone here looked closely at the changes linux is making to bpf?

David

-- 
David Laight: da...@l8s.co.uk


Re: BPF memstore and bpf_validate_ext()

2013-12-18 Thread Alexander Nasonov
Mindaugas Rasiukevicius wrote:
 Alexander Nasonov al...@yandex.ru wrote:
  My point is that you mix argument pack with something else. They
  should be separeted.
 
 The external memory store can be used as an argument (and the initial
 values determined as proposed in this thread).  If you want, we can pass
 it as a third argument, I just think it is a pointless indirection level.
 Would even need extra wrapping i.e. more work in bpfjit, but if you want
 that separated - fine.

I'll tell you more. Mixing mem with arguments will not only save
me one register/one indirection level but it will allow to use
memory addressing. It's very attractive solution from a performance
perspective. But it's clearly a hack and I don't want to have hacks
in public API.

  I already offered to support SLJIT_FAST_CALL copfuncs in bpfjit.
  They're much faster than regular copfuncs.  But that's mean you
  will need to emit sljit code and you will have a limited number of
  sljit registers and all other limitations of sljit. You still should
  be able to copy data from auxiliary argument to memstore and you
  can do it quite fast. ...
 
 That is great, but we are going circles here.  If a program just needs
 some values stored in the memory store - why would you create a COP to
 get few integers instead of simply letting the caller to pass them.  It
 is the thing which actually has those numbers.

What if your program just needs something else? Does it mean we have
to add a new feature to bpf? No. We have to stop somewhere. BPF_COP
is powerful enough for getting external data. It's a bit slower but I
can give you SLJIT_FAST_CALL interface. If it's still slow for you,
you should stop using sljit and consider other alternatives.

  It's easy to
  suggest to have a flag but it's actually a lot or work. You need to
  write several lines of C code to generate a single instruction.
 
 The flag would basically say treat the memstore as internal i.e. just
 do all the optimisations, because I assure there are no side effects.
 It is a green light for what you said you already want to implement.
 That would be one-liner check or I miss something?

Ok, with my recent change on github, it's indeed one-line change
but having two different modes can introduce subtle bugs. If you
forget to set the mode properly, you may have a situation when your
memory is in a bad state only for packets of length 131 (or whatever)
and it's consistent for all other lengths.
I'm as a maintainer of bpfjit is responsible for a robust public
interface.

Alex


Re: BPF memstore and bpf_validate_ext()

2013-12-16 Thread Darren Reed
On 16/12/2013 3:20 AM, Alexander Nasonov wrote:

 ..
 This discussion is going to nowhere. At this point, someone from
 outside should review our opinions and help us to make a decision.

It is one hack after another to support NPF using tables in BPF.

Darren



Re: BPF memstore and bpf_validate_ext()

2013-12-16 Thread Mindaugas Rasiukevicius
Alexander Nasonov al...@yandex.ru wrote:
 Mindaugas Rasiukevicius wrote:
  Also, it was you who proposed sljit.
 
 Proposed for what? I implemented bpfjit using sljit if that's what
 you mean. I offered you a help with implementing jit compiler for
 npfcode. It was your idea to add COP/COPX and I agreed to implement
 a support for it in bpfjit. I never agreed on implementing external
 memory.

Fair enough.  I still lack clear understanding why are you unhappy with
the external memory store..

  It can optimise *most* practical
  cases (80-20 rule) and I am happy with that.  I do not understand why
  are you concerned about those rare/unusual cases.  Do you have some
  particular application in mind?  Something else than in our tree?
 
 I don't have any application in mind but I don't understand why are you
 pushing two extentions to bpf solely to get performance benefit for
 your cases and you don't care that bpf looses performance even if there
 are no cop instructions in a program at all.

I *do* care about performance of regular BPF usage i.e. tcpdump/libpcap.
However, I have been trying to explain that it is trivial to avoid the
performance penalty.. perhaps I miss something, but you did not explain
why is it problematic.

  We can pass the memstore pointer as a separate argument (it would be
  three arguments, fine for sljit), but what's the point..
 
 My point is that you mix argument pack with something else. They
 should be separeted.

The external memory store can be used as an argument (and the initial
values determined as proposed in this thread).  If you want, we can pass
it as a third argument, I just think it is a pointless indirection level.
Would even need extra wrapping i.e. more work in bpfjit, but if you want
that separated - fine.

  Why are you ignoring the fact that your optimisations can still be added
  and be effective?  I already suggested - we can add a flag to indicate
  that the caller does not care about the result in the memory store.
 
 I already offered to support SLJIT_FAST_CALL copfuncs in bpfjit.
 They're much faster than regular copfuncs.  But that's mean you
 will need to emit sljit code and you will have a limited number of
 sljit registers and all other limitations of sljit. You still should
 be able to copy data from auxiliary argument to memstore and you
 can do it quite fast. ...

That is great, but we are going circles here.  If a program just needs
some values stored in the memory store - why would you create a COP to
get few integers instead of simply letting the caller to pass them.  It
is the thing which actually has those numbers.

It straightforward, it is simpler, it is faster, and it does not make
the performance suffer for other programs.  BPF_COP optimisations are
brilliant, but.. we are talking about different issue.

 It's easy to
 suggest to have a flag but it's actually a lot or work. You need to
 write several lines of C code to generate a single instruction.

The flag would basically say treat the memstore as internal i.e. just
do all the optimisations, because I assure there are no side effects.
It is a green light for what you said you already want to implement.
That would be one-liner check or I miss something?

 I don't want to maintain two different modes of code generation.
 If you want this flag, go ahead, write the code, write the tests
 and everyone will be happy.

I was simply trying to make suggestions which would make your life
easier, certainly not harder.  Alternatively, I can write patches.

-- 
Mindaugas


Re: BPF memstore and bpf_validate_ext()

2013-12-16 Thread Mindaugas Rasiukevicius
Darren Reed darr...@netbsd.org wrote:
 On 16/12/2013 3:20 AM, Alexander Nasonov wrote:
 
  ..
  This discussion is going to nowhere. At this point, someone from
  outside should review our opinions and help us to make a decision.
 
 It is one hack after another to support NPF using tables in BPF.
 

Nice try, but you missed it.  NPF tables are already supported and have
been happily working for a good while.  The discussion is not really about
them.  Try again.

-- 
Mindaugas


Re: BPF memstore and bpf_validate_ext()

2013-12-15 Thread Mindaugas Rasiukevicius
Alexander Nasonov al...@yandex.ru wrote:
 All your recent changes to adapt bpfjit for npf show that you're hitting
 sljit limitation all the time. Your expectations of sljit are probably
 too high.

We are discussing the external memory store which is useful regardless
whether BPF program is JIT-compiled or not.  It is a win in both cases.

Also, it was you who proposed sljit.  It can optimise *most* practical
cases (80-20 rule) and I am happy with that.  I do not understand why
are you concerned about those rare/unusual cases.  Do you have some
particular application in mind?  Something else than in our tree?

 ...
 
 When I later was implementing copfunc calls in bpfjit, I din't know
 how you gonna use external memory and I felt that it wasn't necessary
 (and I still do), so I changed your structs.

We can pass the memstore pointer as a separate argument (it would be
three arguments, fine for sljit), but what's the point..

 ...
 
 By not making memory external I avoided doing a lot of changes in
 bpfjit to adapt to a new execution model.
 
 Now you want to have *both* external memory and copfuncs for a faster
 execution of your code without considering other uses of bpfjit.
 
 To be honest, I'm not very intersted in implementing external memory
 in bpfjit. Quite the opposite, after all these discussions I want
 to improve my index check optimization and make filters like 'host xxx'
 run a bit faster. Needless to say that this optimization is the most
 effective if bpf memory isn't visible from outside.

Why are you ignoring the fact that your optimisations can still be added
and be effective?  I already suggested - we can add a flag to indicate that
the caller does not care about the result in the memory store.  Moreover,
the usual byte-code produced by tcpdump/pcap does not even use the memory
store so you optimisations would most of the time be applicable anyway!

 This discussion is going to nowhere. At this point, someone from
 outside should review our opinions and help us to make a decision.
 
 Alex

-- 
Mindaugas


Re: BPF memstore and bpf_validate_ext()

2013-12-15 Thread Alexander Nasonov
Mindaugas Rasiukevicius wrote:
 Also, it was you who proposed sljit.

Proposed for what? I implemented bpfjit using sljit if that's what
you mean. I offered you a help with implementing jit compiler for
npfcode. It was your idea to add COP/COPX and I agreed to implement
a support for it in bpfjit. I never agreed on implementing external
memory.

 It can optimise *most* practical
 cases (80-20 rule) and I am happy with that.  I do not understand why
 are you concerned about those rare/unusual cases.  Do you have some
 particular application in mind?  Something else than in our tree?

I don't have any application in mind but I don't understand why are you
pushing two extentions to bpf solely to get performance benefit for
your cases and you don't care that bpf looses performance even if there
are no cop instructions in a program at all.

 We can pass the memstore pointer as a separate argument (it would be
 three arguments, fine for sljit), but what's the point..

My point is that you mix argument pack with something else. They
should be separeted.

 Why are you ignoring the fact that your optimisations can still be added
 and be effective?  I already suggested - we can add a flag to indicate that
 the caller does not care about the result in the memory store.

I already offered to support SLJIT_FAST_CALL copfuncs in bpfjit.
They're much faster than regular copfuncs.  But that's mean you
will need to emit sljit code and you will have a limited number of
sljit registers and all other limitations of sljit. You still should
be able to copy data from auxiliary argument to memstore and you
can do it quite fast.

You didn't respond to me about it. If you ingored it because you don't
want to deal with sljit than you're pulling a blanket. It's easy to
suggest to have a flag but it's actually a lot or work. You need to
write several lines of C code to generate a single instruction.

I don't want to maintain two different modes of code generation.
If you want this flag, go ahead, write the code, write the tests
and everyone will be happy.


 Moreover,
 the usual byte-code produced by tcpdump/pcap does not even use the memory
 store so you optimisations would most of the time be applicable anyway!

Maybe in this case. I don't know all use cases. There are some
IDS/IPS that use bpf but I never looked at them.

In any case, this functionality will have to be tested.

Alex


Re: BPF memstore and bpf_validate_ext()

2013-12-11 Thread Mindaugas Rasiukevicius
Alexander Nasonov al...@yandex.ru wrote:
   
   If this is the case, you don't need BPF_COP at all.
  
  For this particular case - yes, correct.
  
   Except of course you may run out or memwords one day and you want
   to have BPF_COP as a fallback.
  
  I still need BPF_COP for other tasks (e.g. NPF_COP_TABLE).  Running out
  of words is unlikely, but COP can certainly be used to handle that too.
 
 I wonder why do you need two different features when one is a
 superset of the other if you could use BPF_COP?

The fact that you can abuse one feature to achieve the result of another
does not mean that it is a good solution.  Using the external memory store
and initialising/passing some values is just simple and straightforward,
besides being faster.

 ... If the only reason
 is performance, external memory is not a win-win proposition.

You mean because using EREG penalises the access of the memstore?  I doubt
it is a big deal, but can we just benchmark and see?

-- 
Mindaugas


Re: BPF memstore and bpf_validate_ext()

2013-12-10 Thread Alexander Nasonov
Mindaugas Rasiukevicius wrote:
 Alexander Nasonov al...@yandex.ru wrote:
  In your case BPF_COP is always the first instruction (or in the
  first linear block). Because it's the first and it's a function
  call, it can be moved outside of bpf program and inlined.
  
  If this is the case, you don't need BPF_COP at all.
 
 For this particular case - yes, correct.
 
  Except of course you may run out or memwords one day and you want
  to have BPF_COP as a fallback.
 
 I still need BPF_COP for other tasks (e.g. NPF_COP_TABLE).  Running out
 of words is unlikely, but COP can certainly be used to handle that too.

I wonder why do you need two different features when one is a
superset of the other if you could use BPF_COP? If the only reason
is performance, external memory is not a win-win proposition.

PS sljit has a non-standard fast call mechanism. If you're really
concerned about performance, you can generate copfuncs and I can
call them from bpfjit.

Alex


BPF memstore and bpf_validate_ext()

2013-12-09 Thread Mindaugas Rasiukevicius
Hello,

Now that the BPF memory store can be external and can be provided by the
caller of the BPF program, we can use to it pass some values or reuse it
as a cache.  However, this has to be supported by bpf_validate() - we have
to tell it which words are going to be initialised by the caller, since it
currently checks and prevents loads of the uninitialised memstore words.

Hence, here is the proposed API function for this:

int bpf_set_initwords(bpf_ctx_t *bc, uint8_t *wordmask, size_t len);

Note that we already have an extended bpf_validate() variation:

int bpf_validate_ext(bpf_ctx_t *bc, const struct bpf_insn *f, int len);

The original filter/validate API stays as is, only the extended routines
support the external memory store and its initialisation.

Thanks.

-- 
Mindaugas


Re: BPF memstore and bpf_validate_ext()

2013-12-09 Thread Alexander Nasonov
Mindaugas Rasiukevicius wrote:
 Now that the BPF memory store can be external and can be provided by the
 caller of the BPF program, we can use to it pass some values or reuse it
 as a cache.

I don't think that external memory is needed. You added an auxiliary
agrument, what is it for if it's not for passing some values?

External memory disables several optimizations in bpfjit for most
filter programs even if they don't use BPF_COP.

1. sljit has a limited number of registers especially memory addressable
   registers. I will have to allocate a register to a memstore pointer.
   Because all registers are already assigned, I will have to start
   moving data. The best I can think of is assigning BPFJIT_TMP2 to a
   new pointer but I will need to switch to EREG register in 32bit
   BPF_LD. This register is emulated on some arches, for instance on
   i386.
2. I have (or plan to have) some simple optimizations which aren't
   possible by rewriting a bpf program. For instance, if there are
   multiple loads in a linear block, I will only generate one index
   check.
   Here is an example: LD+W(0) ST(0) LD+W(4) ST(1) LD+W(8) ST(2) ...
   It loads packet words with increasing offsets and stores them
   in memwords for later processing. I generate only only one index
   check for the first LD instruction (return 0 is packet is shorter
   than 12 bytes). I can exit early without storing any word because
   I know that memory will not be available after that bpf program
   returns.
   If you make it external, I will have to generate three index
   checks to make sure that stores are visible to a caller for all
   possible packet lengths.

 However, this has to be supported by bpf_validate() - we have
 to tell it which words are going to be initialised by the caller, since it
 currently checks and prevents loads of the uninitialised memstore words.
 
 Hence, here is the proposed API function for this:
 
 int bpf_set_initwords(bpf_ctx_t *bc, uint8_t *wordmask, size_t len);

Your description of the new API is too terse to be absolutely
certain but it doesn't look like my proposal I sent to you privately.

I think you can add a mask to each copfunc which indicates which
of the words it loads and which words are stored by the copfunc.

For instance, your npf_cop_l3 will look like

{ .copfunc = npf_cop_l3,
  .loads  = BPF_COP_LOAD_NONE,
  .stores = BPF_COP_STORE(BPF_MW_IPVER) |
BPF_COP_STORE(BPF_MW_L4OFF) |
BPF_COP_STORE(BPF_MW_L4PROTO)
},
/* ... other copfuncs ... */

Validation of BPF_COP instruction will look very natural:

/* inside BPF_COP in bpf_validate_ext */
if (bc-copfuncs) {
if (bc-copfuncs[i].inwords  invalid)
goto out;
invalid = ~bc-copfuncs[i].outwords;
}

It's a bit more trickier for BPF_COPX because you need to pre-calculate
loads/stores masks but it's doable.

At the moment, bpf_filter_ext() resets the invalid mask. This
translates to .stores = BPF_COP_STORE_ALL.

Alex