Re: BPF memstore and bpf_validate_ext()
Alexander Nasonov al...@yandex.ru wrote: Mindaugas Rasiukevicius wrote: Moreover, the usual byte-code produced by tcpdump/pcap does not even use the memory store so you optimisations would most of the time be applicable anyway! This is not always the case. For instance, # tcpdump -y IEEE802_11 -i urtwn0 -d not tcp tcpdump: data link type IEEE802_11 ... I am aware of these cases, but I optimise for 80%, not 20%. -- Mindaugas
Re: BPF memstore and bpf_validate_ext()
Mindaugas Rasiukevicius wrote: Moreover, the usual byte-code produced by tcpdump/pcap does not even use the memory store so you optimisations would most of the time be applicable anyway! This is not always the case. For instance, # tcpdump -y IEEE802_11 -i urtwn0 -d not tcp tcpdump: data link type IEEE802_11 (000) ldx #0x0 (001) txa (002) add #24 (003) st M[0] (004) ldb [x + 0] (005) jset #0x8 jt 6jf 11 (006) jset #0x4 jt 11 jf 7 (007) jset #0x80jt 8jf 11 (008) ld M[0] (009) add #2 (010) st M[0] (011) ldb [0] (012) jset #0x4 jt 27 jf 13 (013) ldb [0] (014) jset #0x8 jt 15 jf 27 (015) ldx M[0] (016) ldh [x + 6] (017) jeq #0x86dd jt 18 jf 27 (018) ldx M[0] (019) ldb [x + 14] (020) jeq #0x6 jt 37 jf 21 (021) ldx M[0] (022) ldb [x + 14] (023) jeq #0x2cjt 24 jf 27 (024) ldx M[0] (025) ldb [x + 48] (026) jeq #0x6 jt 37 jf 27 (027) ldb [0] (028) jset #0x4 jt 38 jf 29 (029) ldb [0] (030) jset #0x8 jt 31 jf 38 (031) ldx M[0] (032) ldh [x + 6] (033) jeq #0x800 jt 34 jf 38 (034) ldx M[0] (035) ldb [x + 17] (036) jeq #0x6 jt 37 jf 38 (037) ret #0 (038) ret #65535 Alex
Re: BPF memstore and bpf_validate_ext()
Sorry for top-posting. I'm replying from my phone. I've not looked at linux bpf before. I remember taking a quick look at bpf_jit_compile function but I didn't like emitting binary machine code with macro commands. I spent few minutes today looking at linux code and I noticed few interesting things: - They use negative offsets to access auxiliary data. So, there is a clear distinction between local memory store and external data. I don't think it's a new addition, though. - They have a big enum of commands. Many of them translate to bpf commands but there are also special commands like load protocol number into A. There is a decoder from bpf but I have no clue how it works. - Those commands are adapted to work with skbuf data. Alex 20.12.13, 04:16, David Laight da...@l8s.co.uk: On Fri, Dec 20, 2013 at 01:28:12AM +0200, Mindaugas Rasiukevicius wrote: Alexander Nasonov al...@yandex.ru wrote: Well, if it wasn't needed for many year in bpf, why do we need it now? ;-) Because it was decided to use BPF byte-code for more applications and that meant there is a need for improvements. It is called evolution. :) Has anyone here looked closely at the changes linux is making to bpf? David -- David Laight: da...@l8s.co.uk -- Alex
Re: BPF memstore and bpf_validate_ext()
On Fri, Dec 20, 2013 at 1:13 PM, Alexander Nasonov al...@yandex.ru wrote: Sorry for top-posting. I'm replying from my phone. I've not looked at linux bpf before. I remember taking a quick look at bpf_jit_compile function but I didn't like emitting binary machine code with macro commands. I spent few minutes today looking at linux code and I noticed few interesting things: - They use negative offsets to access auxiliary data. So, there is a clear distinction between local memory store and external data. I don't think it's a new addition, though. - They have a big enum of commands. Many of them translate to bpf commands but there are also special commands like load protocol number into A. There is a decoder from bpf but I have no clue how it works. - Those commands are adapted to work with skbuf data. I have used some of the non traditional uses of bpf in Linux, in particularly the syscall filtering code, which is designed to be a bit like the packet filtering code. But I don't think it is a great model, and I think the jit compiler is rather different, as it compiles to asm. And the current route they are going seems to be validation rather than in kernel jitting, see https://lwn.net/Articles/575531/ Justin Alex 20.12.13, 04:16, David Laight da...@l8s.co.uk: On Fri, Dec 20, 2013 at 01:28:12AM +0200, Mindaugas Rasiukevicius wrote: Alexander Nasonov al...@yandex.ru wrote: Well, if it wasn't needed for many year in bpf, why do we need it now? ;-) Because it was decided to use BPF byte-code for more applications and that meant there is a need for improvements. It is called evolution. :) Has anyone here looked closely at the changes linux is making to bpf? David -- David Laight: da...@l8s.co.uk -- Alex
Re: BPF memstore and bpf_validate_ext()
Alexander Nasonov al...@yandex.ru wrote: The external memory store can be used as an argument (and the initial values determined as proposed in this thread). If you want, we can pass it as a third argument, I just think it is a pointless indirection level. Would even need extra wrapping i.e. more work in bpfjit, but if you want that separated - fine. I'll tell you more. Mixing mem with arguments will not only save me one register/one indirection level but it will allow to use memory addressing. It's very attractive solution from a performance perspective. But it's clearly a hack and I don't want to have hacks in public API. We have discussed this before and I wrote quite long paragraph trying to explain why I think such split does not really have real merits. In a nutshell: such separation does not solve any problem (does not reduce complexity, does not help to honour the information hiding principle) and the logical justification to have bpf_stack_t is a bit weak. I am fine/neutral about adding a third uint32_t *mem argument. That is great, but we are going circles here. If a program just needs some values stored in the memory store - why would you create a COP to get few integers instead of simply letting the caller to pass them. It is the thing which actually has those numbers. What if your program just needs something else? Does it mean we have to add a new feature to bpf? No. We have to stop somewhere. ... We should not stop because of the quantity. We should consider whether it *makes sense* to add such feature, looking at its virtues and implications. I have already explained the benefits of the external memory store in this thread (simple, straightforward way to pass values and have their cache). Your main argument seems to be that the external memstore makes it more difficult for bpfjit to optimise certain corner cases (which, as been pointed out, are not common). Given that BPF did not have JIT compiler for many years and even a very simplistic JIT compilation is a huge win - I do not understand why do you consider corner case optimisations as a higher value than the benefits provided by the external memstore. BPF_COP is powerful enough for getting external data. It's a bit slower but I can give you SLJIT_FAST_CALL interface. If it's still slow for you, you should stop using sljit and consider other alternatives. That is wonderful, but not the point.. It's easy to suggest to have a flag but it's actually a lot or work. You need to write several lines of C code to generate a single instruction. The flag would basically say treat the memstore as internal i.e. just do all the optimisations, because I assure there are no side effects. It is a green light for what you said you already want to implement. That would be one-liner check or I miss something? Ok, with my recent change on github, it's indeed one-line change but having two different modes can introduce subtle bugs. If you forget to set the mode properly, you may have a situation when your memory is in a bad state only for packets of length 131 (or whatever) and it's consistent for all other lengths. I'm as a maintainer of bpfjit is responsible for a robust public interface. Just keep the flag off by default? Really, we have only few BPF users in the NetBSD tree. It is not rocket science to correctly set a flag. -- Mindaugas
Re: BPF memstore and bpf_validate_ext()
Mindaugas Rasiukevicius wrote: That is great, but we are going circles here. If a program just needs some values stored in the memory store - why would you create a COP to get few integers instead of simply letting the caller to pass them. It is the thing which actually has those numbers. Well, if it wasn't needed for many year in bpf, why do we need it now? ;-) To answer your question about a program that needs some values stored, you want a protocol to pass some values from the host C environment to embedded BPF environment. You do it in a very ad-hoc manner. It's not possible to say by looking at bpf program which memwords are passed from outside and which are really internal. BPF is a language and all its extensions should be designed with this in mind. I have already explained the benefits of the external memory store in this thread (simple, straightforward way to pass values and have their cache). Your main argument seems to be that the external memstore makes it more difficult for bpfjit to optimise certain corner cases (which, as been pointed out, are not common). Well, I was arguing about performance because you were concerned about performance of your bpf programs. Your solution lacks a concept. You're mixing things together without bothering much about how will they interact to achieve a common goal. I already pointed out that your COP is powerful enough to copy external data to a memory store. To add to my other point above, you're mixing local and global memory. Or, if you use an analogy with Lua (that uses a byte code to interpret program), there are local and global variables with a very clear distinction between them. With your proposal, there is no clear distinction inside bpf program which memory words are internal and which are external. Though, I don't think I want to see global memory in bpf. Given that BPF did not have JIT compiler for many years and even a very simplistic JIT compilation is a huge win - I do not understand why do you consider corner case optimisations as a higher value than the benefits provided by the external memstore. I personally see no benefits in external memstore at all. You can copy external data with a special copfunc. BPF_COP is powerful enough for getting external data. It's a bit slower but I can give you SLJIT_FAST_CALL interface. If it's still slow for you, you should stop using sljit and consider other alternatives. That is wonderful, but not the point.. Ok, let me make a point them. If you want to make significant changes to bpf, you'd better start from scrach and design whatever you want. I'd be easier than arguing about bpf changes forever. Alex
Re: BPF memstore and bpf_validate_ext()
Alexander Nasonov al...@yandex.ru wrote: Well, if it wasn't needed for many year in bpf, why do we need it now? ;-) Because it was decided to use BPF byte-code for more applications and that meant there is a need for improvements. It is called evolution. :) To answer your question about a program that needs some values stored, you want a protocol to pass some values from the host C environment to embedded BPF environment. You do it in a very ad-hoc manner. We are talking about byte-code i.e. quite low level environment. The protocol (contract) is specified at the caller level. Can you explain what kind of generic protocol do you expect to see? Let me put it this way - what *problem* are you trying to solve? It's not possible to say by looking at bpf program which memwords are passed from outside and which are really internal. How is it different when using BPF coprocessor? It is as clear (or not) as using the external memory store. It is a contract between the caller and the BPF program. BPF is a language and all its extensions should be designed with this in mind. I have already explained the benefits of the external memory store in this thread (simple, straightforward way to pass values and have their cache). Your main argument seems to be that the external memstore makes it more difficult for bpfjit to optimise certain corner cases (which, as been pointed out, are not common). Well, I was arguing about performance because you were concerned about performance of your bpf programs. Your solution lacks a concept. You're mixing things together without bothering much about how will they interact to achieve a common goal. I already pointed out that your COP is powerful enough to copy external data to a memory store. To add to my other point above, you're mixing local and global memory. Or, if you use an analogy with Lua (that uses a byte code to interpret program), there are local and global variables with a very clear distinction between them. With your proposal, there is no clear distinction inside bpf program which memory words are internal and which are external. Huh? What does it have to do with local and global variables? We barely have a concept of variable. We are discussing very simple byte-code which is not even Turing-complete. It is not Lua, it is not Haskell, it is tiny assembly language. I am all for well-thought abstractions and interfacing, but how far do you want to go in abstracting 16 words? I have been trying to figure out what existing or future practical problems you are trying to solve? Not metaphysical use cases or fantasy languages based on BPF byte-code.. they do not exist, they make no sense. Sometimes array is just an array, not an object. The sword is double-edged and the other edge is called over-engineering. It is disappointing that we are going ad infinitum.. -- Mindaugas
Re: BPF memstore and bpf_validate_ext()
On Fri, Dec 20, 2013 at 01:28:12AM +0200, Mindaugas Rasiukevicius wrote: Alexander Nasonov al...@yandex.ru wrote: Well, if it wasn't needed for many year in bpf, why do we need it now? ;-) Because it was decided to use BPF byte-code for more applications and that meant there is a need for improvements. It is called evolution. :) Has anyone here looked closely at the changes linux is making to bpf? David -- David Laight: da...@l8s.co.uk
Re: BPF memstore and bpf_validate_ext()
Mindaugas Rasiukevicius wrote: Alexander Nasonov al...@yandex.ru wrote: My point is that you mix argument pack with something else. They should be separeted. The external memory store can be used as an argument (and the initial values determined as proposed in this thread). If you want, we can pass it as a third argument, I just think it is a pointless indirection level. Would even need extra wrapping i.e. more work in bpfjit, but if you want that separated - fine. I'll tell you more. Mixing mem with arguments will not only save me one register/one indirection level but it will allow to use memory addressing. It's very attractive solution from a performance perspective. But it's clearly a hack and I don't want to have hacks in public API. I already offered to support SLJIT_FAST_CALL copfuncs in bpfjit. They're much faster than regular copfuncs. But that's mean you will need to emit sljit code and you will have a limited number of sljit registers and all other limitations of sljit. You still should be able to copy data from auxiliary argument to memstore and you can do it quite fast. ... That is great, but we are going circles here. If a program just needs some values stored in the memory store - why would you create a COP to get few integers instead of simply letting the caller to pass them. It is the thing which actually has those numbers. What if your program just needs something else? Does it mean we have to add a new feature to bpf? No. We have to stop somewhere. BPF_COP is powerful enough for getting external data. It's a bit slower but I can give you SLJIT_FAST_CALL interface. If it's still slow for you, you should stop using sljit and consider other alternatives. It's easy to suggest to have a flag but it's actually a lot or work. You need to write several lines of C code to generate a single instruction. The flag would basically say treat the memstore as internal i.e. just do all the optimisations, because I assure there are no side effects. It is a green light for what you said you already want to implement. That would be one-liner check or I miss something? Ok, with my recent change on github, it's indeed one-line change but having two different modes can introduce subtle bugs. If you forget to set the mode properly, you may have a situation when your memory is in a bad state only for packets of length 131 (or whatever) and it's consistent for all other lengths. I'm as a maintainer of bpfjit is responsible for a robust public interface. Alex
Re: BPF memstore and bpf_validate_ext()
On 16/12/2013 3:20 AM, Alexander Nasonov wrote: .. This discussion is going to nowhere. At this point, someone from outside should review our opinions and help us to make a decision. It is one hack after another to support NPF using tables in BPF. Darren
Re: BPF memstore and bpf_validate_ext()
Alexander Nasonov al...@yandex.ru wrote: Mindaugas Rasiukevicius wrote: Also, it was you who proposed sljit. Proposed for what? I implemented bpfjit using sljit if that's what you mean. I offered you a help with implementing jit compiler for npfcode. It was your idea to add COP/COPX and I agreed to implement a support for it in bpfjit. I never agreed on implementing external memory. Fair enough. I still lack clear understanding why are you unhappy with the external memory store.. It can optimise *most* practical cases (80-20 rule) and I am happy with that. I do not understand why are you concerned about those rare/unusual cases. Do you have some particular application in mind? Something else than in our tree? I don't have any application in mind but I don't understand why are you pushing two extentions to bpf solely to get performance benefit for your cases and you don't care that bpf looses performance even if there are no cop instructions in a program at all. I *do* care about performance of regular BPF usage i.e. tcpdump/libpcap. However, I have been trying to explain that it is trivial to avoid the performance penalty.. perhaps I miss something, but you did not explain why is it problematic. We can pass the memstore pointer as a separate argument (it would be three arguments, fine for sljit), but what's the point.. My point is that you mix argument pack with something else. They should be separeted. The external memory store can be used as an argument (and the initial values determined as proposed in this thread). If you want, we can pass it as a third argument, I just think it is a pointless indirection level. Would even need extra wrapping i.e. more work in bpfjit, but if you want that separated - fine. Why are you ignoring the fact that your optimisations can still be added and be effective? I already suggested - we can add a flag to indicate that the caller does not care about the result in the memory store. I already offered to support SLJIT_FAST_CALL copfuncs in bpfjit. They're much faster than regular copfuncs. But that's mean you will need to emit sljit code and you will have a limited number of sljit registers and all other limitations of sljit. You still should be able to copy data from auxiliary argument to memstore and you can do it quite fast. ... That is great, but we are going circles here. If a program just needs some values stored in the memory store - why would you create a COP to get few integers instead of simply letting the caller to pass them. It is the thing which actually has those numbers. It straightforward, it is simpler, it is faster, and it does not make the performance suffer for other programs. BPF_COP optimisations are brilliant, but.. we are talking about different issue. It's easy to suggest to have a flag but it's actually a lot or work. You need to write several lines of C code to generate a single instruction. The flag would basically say treat the memstore as internal i.e. just do all the optimisations, because I assure there are no side effects. It is a green light for what you said you already want to implement. That would be one-liner check or I miss something? I don't want to maintain two different modes of code generation. If you want this flag, go ahead, write the code, write the tests and everyone will be happy. I was simply trying to make suggestions which would make your life easier, certainly not harder. Alternatively, I can write patches. -- Mindaugas
Re: BPF memstore and bpf_validate_ext()
Darren Reed darr...@netbsd.org wrote: On 16/12/2013 3:20 AM, Alexander Nasonov wrote: .. This discussion is going to nowhere. At this point, someone from outside should review our opinions and help us to make a decision. It is one hack after another to support NPF using tables in BPF. Nice try, but you missed it. NPF tables are already supported and have been happily working for a good while. The discussion is not really about them. Try again. -- Mindaugas
Re: BPF memstore and bpf_validate_ext()
Alexander Nasonov al...@yandex.ru wrote: All your recent changes to adapt bpfjit for npf show that you're hitting sljit limitation all the time. Your expectations of sljit are probably too high. We are discussing the external memory store which is useful regardless whether BPF program is JIT-compiled or not. It is a win in both cases. Also, it was you who proposed sljit. It can optimise *most* practical cases (80-20 rule) and I am happy with that. I do not understand why are you concerned about those rare/unusual cases. Do you have some particular application in mind? Something else than in our tree? ... When I later was implementing copfunc calls in bpfjit, I din't know how you gonna use external memory and I felt that it wasn't necessary (and I still do), so I changed your structs. We can pass the memstore pointer as a separate argument (it would be three arguments, fine for sljit), but what's the point.. ... By not making memory external I avoided doing a lot of changes in bpfjit to adapt to a new execution model. Now you want to have *both* external memory and copfuncs for a faster execution of your code without considering other uses of bpfjit. To be honest, I'm not very intersted in implementing external memory in bpfjit. Quite the opposite, after all these discussions I want to improve my index check optimization and make filters like 'host xxx' run a bit faster. Needless to say that this optimization is the most effective if bpf memory isn't visible from outside. Why are you ignoring the fact that your optimisations can still be added and be effective? I already suggested - we can add a flag to indicate that the caller does not care about the result in the memory store. Moreover, the usual byte-code produced by tcpdump/pcap does not even use the memory store so you optimisations would most of the time be applicable anyway! This discussion is going to nowhere. At this point, someone from outside should review our opinions and help us to make a decision. Alex -- Mindaugas
Re: BPF memstore and bpf_validate_ext()
Mindaugas Rasiukevicius wrote: Also, it was you who proposed sljit. Proposed for what? I implemented bpfjit using sljit if that's what you mean. I offered you a help with implementing jit compiler for npfcode. It was your idea to add COP/COPX and I agreed to implement a support for it in bpfjit. I never agreed on implementing external memory. It can optimise *most* practical cases (80-20 rule) and I am happy with that. I do not understand why are you concerned about those rare/unusual cases. Do you have some particular application in mind? Something else than in our tree? I don't have any application in mind but I don't understand why are you pushing two extentions to bpf solely to get performance benefit for your cases and you don't care that bpf looses performance even if there are no cop instructions in a program at all. We can pass the memstore pointer as a separate argument (it would be three arguments, fine for sljit), but what's the point.. My point is that you mix argument pack with something else. They should be separeted. Why are you ignoring the fact that your optimisations can still be added and be effective? I already suggested - we can add a flag to indicate that the caller does not care about the result in the memory store. I already offered to support SLJIT_FAST_CALL copfuncs in bpfjit. They're much faster than regular copfuncs. But that's mean you will need to emit sljit code and you will have a limited number of sljit registers and all other limitations of sljit. You still should be able to copy data from auxiliary argument to memstore and you can do it quite fast. You didn't respond to me about it. If you ingored it because you don't want to deal with sljit than you're pulling a blanket. It's easy to suggest to have a flag but it's actually a lot or work. You need to write several lines of C code to generate a single instruction. I don't want to maintain two different modes of code generation. If you want this flag, go ahead, write the code, write the tests and everyone will be happy. Moreover, the usual byte-code produced by tcpdump/pcap does not even use the memory store so you optimisations would most of the time be applicable anyway! Maybe in this case. I don't know all use cases. There are some IDS/IPS that use bpf but I never looked at them. In any case, this functionality will have to be tested. Alex
Re: BPF memstore and bpf_validate_ext()
Alexander Nasonov al...@yandex.ru wrote: If this is the case, you don't need BPF_COP at all. For this particular case - yes, correct. Except of course you may run out or memwords one day and you want to have BPF_COP as a fallback. I still need BPF_COP for other tasks (e.g. NPF_COP_TABLE). Running out of words is unlikely, but COP can certainly be used to handle that too. I wonder why do you need two different features when one is a superset of the other if you could use BPF_COP? The fact that you can abuse one feature to achieve the result of another does not mean that it is a good solution. Using the external memory store and initialising/passing some values is just simple and straightforward, besides being faster. ... If the only reason is performance, external memory is not a win-win proposition. You mean because using EREG penalises the access of the memstore? I doubt it is a big deal, but can we just benchmark and see? -- Mindaugas
Re: BPF memstore and bpf_validate_ext()
Mindaugas Rasiukevicius wrote: Alexander Nasonov al...@yandex.ru wrote: In your case BPF_COP is always the first instruction (or in the first linear block). Because it's the first and it's a function call, it can be moved outside of bpf program and inlined. If this is the case, you don't need BPF_COP at all. For this particular case - yes, correct. Except of course you may run out or memwords one day and you want to have BPF_COP as a fallback. I still need BPF_COP for other tasks (e.g. NPF_COP_TABLE). Running out of words is unlikely, but COP can certainly be used to handle that too. I wonder why do you need two different features when one is a superset of the other if you could use BPF_COP? If the only reason is performance, external memory is not a win-win proposition. PS sljit has a non-standard fast call mechanism. If you're really concerned about performance, you can generate copfuncs and I can call them from bpfjit. Alex
BPF memstore and bpf_validate_ext()
Hello, Now that the BPF memory store can be external and can be provided by the caller of the BPF program, we can use to it pass some values or reuse it as a cache. However, this has to be supported by bpf_validate() - we have to tell it which words are going to be initialised by the caller, since it currently checks and prevents loads of the uninitialised memstore words. Hence, here is the proposed API function for this: int bpf_set_initwords(bpf_ctx_t *bc, uint8_t *wordmask, size_t len); Note that we already have an extended bpf_validate() variation: int bpf_validate_ext(bpf_ctx_t *bc, const struct bpf_insn *f, int len); The original filter/validate API stays as is, only the extended routines support the external memory store and its initialisation. Thanks. -- Mindaugas
Re: BPF memstore and bpf_validate_ext()
Mindaugas Rasiukevicius wrote: Now that the BPF memory store can be external and can be provided by the caller of the BPF program, we can use to it pass some values or reuse it as a cache. I don't think that external memory is needed. You added an auxiliary agrument, what is it for if it's not for passing some values? External memory disables several optimizations in bpfjit for most filter programs even if they don't use BPF_COP. 1. sljit has a limited number of registers especially memory addressable registers. I will have to allocate a register to a memstore pointer. Because all registers are already assigned, I will have to start moving data. The best I can think of is assigning BPFJIT_TMP2 to a new pointer but I will need to switch to EREG register in 32bit BPF_LD. This register is emulated on some arches, for instance on i386. 2. I have (or plan to have) some simple optimizations which aren't possible by rewriting a bpf program. For instance, if there are multiple loads in a linear block, I will only generate one index check. Here is an example: LD+W(0) ST(0) LD+W(4) ST(1) LD+W(8) ST(2) ... It loads packet words with increasing offsets and stores them in memwords for later processing. I generate only only one index check for the first LD instruction (return 0 is packet is shorter than 12 bytes). I can exit early without storing any word because I know that memory will not be available after that bpf program returns. If you make it external, I will have to generate three index checks to make sure that stores are visible to a caller for all possible packet lengths. However, this has to be supported by bpf_validate() - we have to tell it which words are going to be initialised by the caller, since it currently checks and prevents loads of the uninitialised memstore words. Hence, here is the proposed API function for this: int bpf_set_initwords(bpf_ctx_t *bc, uint8_t *wordmask, size_t len); Your description of the new API is too terse to be absolutely certain but it doesn't look like my proposal I sent to you privately. I think you can add a mask to each copfunc which indicates which of the words it loads and which words are stored by the copfunc. For instance, your npf_cop_l3 will look like { .copfunc = npf_cop_l3, .loads = BPF_COP_LOAD_NONE, .stores = BPF_COP_STORE(BPF_MW_IPVER) | BPF_COP_STORE(BPF_MW_L4OFF) | BPF_COP_STORE(BPF_MW_L4PROTO) }, /* ... other copfuncs ... */ Validation of BPF_COP instruction will look very natural: /* inside BPF_COP in bpf_validate_ext */ if (bc-copfuncs) { if (bc-copfuncs[i].inwords invalid) goto out; invalid = ~bc-copfuncs[i].outwords; } It's a bit more trickier for BPF_COPX because you need to pre-calculate loads/stores masks but it's doable. At the moment, bpf_filter_ext() resets the invalid mask. This translates to .stores = BPF_COP_STORE_ALL. Alex