Re: DPI for pf(4)
Hi all,

adhering to the basic rule of not reinventing the wheel has somewhat crippled the effort to come up with an elegant solution for the topic at hand. Two approaches have been proposed earlier, so let's go through them:

(1) Diverting traffic to userspace

That's generally a good idea, but it defeats the purpose of having zero-latency functionality in pf(4) itself, because going through the scheduler isn't optimal (scheduler people, don't hate me). Worse still, the TCP handshake makes loosely-coupled DPI worthless, because the divert cannot happen before the payload is seen. The only way around this is not diverting at all -- that can only happen with a pf(4) that's completely contained in userspace. I understand the requirement of not doing anything reckless in the kernel and I don't think it's a wise decision to try it anyway. Remember that the goal was to keep consistency and utilise the base functionality in the firewall code itself.

(2) bpf(4)-based filters

The BPF VM is neat and the idea of its filters is in accordance with the current requirements for the proposed code. However, the amount of work and infrastructure to be built around bpf(4) to avoid any kind of unwanted complexity inside the DPI code is -- at least for me -- not feasible.

Instead, the route to take at this point is a userspace library, which can grow, try different things, stumble, explode, adapt, and some day may even be the base of a firewall away from the restrictions of the kernel. Others can still implement (1). I don't think (2) will be of much interest in real-world applications.

Feel free to contact me on- and off-list if you have any further questions. :)

Thank you all for your participation,
Franco
Re: DPI for pf(4)
On Wed, 1 May 2013, Franco Fichtner wrote:

> Not sure if that's a fitting comparison; and I know too little OSPF to answer. Let me try another route. The logic consists of an array of application detection functions, which can be invoked via their respective IP types.

I don't like this approach at all - it leads to a proliferation (as demonstrated by your already long list) of kernel-side parsers that will be a maintenance, and possibly security, nightmare. The last thing we want is a rotting pile of protocol parsing code like wireshark.

> On May 1, 2013, at 1:14 AM, Ted Unangst t...@tedunangst.com wrote:
>
>> My thoughts on the matter have always been that it would be cool to integrate bpf into pf (though other developers surely have other opinions). Then you get filtering for as many protocols as you care to write bpf matchers for.
>
> You mean externalising the DPI? People[1] have tried to work on such ideas, but the general drift is that there are not enough interested individuals in the field to drive second tier development for application detections.

So if there is not enough interest to develop app protocol detectors/dissectors in bpf then why should C be any different?

> I find C to be quite flexible and empowering if one doesn't overcomplicate[2].
>
> [2] https://github.com/fichtner/OpenDPI/blob/master/src/lib/protocols/ssl.c

That's complicated and scary code for a kernel, e.g. multiple opportunities for unsigned overflow that don't seem to be checked for. I agree with tedu - it is safer and more flexible to write parsers in bpf. If that isn't desirable then maybe we could consider some other automata classifier, but I think it is a bad, bad idea to do it in C.

-d
Re: DPI for pf(4)
Hi Damien,

On May 2, 2013, at 10:03 AM, Damien Miller d...@mindrot.org wrote:

> On Wed, 1 May 2013, Franco Fichtner wrote:
>
>> Not sure if that's a fitting comparison; and I know too little OSPF to answer. Let me try another route. The logic consists of an array of application detection functions, which can be invoked via their respective IP types.
>
> I don't like this approach at all - it leads to a proliferation (as demonstrated by your already long list) of kernel-side parsers that will be a maintenance, and possibly security, nightmare.

As stated before, breaking down complexity to the bare minimum is my requirement for this to be happening at all. You all get to be the judges. I'm just trying to work on something worth doing.

> The last thing we want is a rotting pile of protocol parsing code like wireshark.

Case closed then? I don't know how to argue with that.

>> On May 1, 2013, at 1:14 AM, Ted Unangst t...@tedunangst.com wrote:
>>
>>> My thoughts on the matter have always been that it would be cool to integrate bpf into pf (though other developers surely have other opinions). Then you get filtering for as many protocols as you care to write bpf matchers for.
>>
>> You mean externalising the DPI? People[1] have tried to work on such ideas, but the general drift is that there are not enough interested individuals in the field to drive second tier development for application detections.
>
> So if there is not enough interest to develop app protocol detectors/dissectors in bpf then why should C be any different?

Because it takes complexity out of the system for one. Plus, pf(4) is at the core of OpenBSD. There's not much noise about bpf(4) here.

>> I find C to be quite flexible and empowering if one doesn't overcomplicate[2].
>>
>> [2] https://github.com/fichtner/OpenDPI/blob/master/src/lib/protocols/ssl.c
>
> That's complicated and scary code for a kernel, e.g. multiple opportunities for unsigned overflow that don't seem to be checked for.

You are absolutely right. And it's *not* my code, it was merely an example of how the TLS code can be broken down to the bare minimum[1].

> I agree with tedu - it is safer and more flexible to write parsers in bpf. If that isn't desirable then maybe we could consider some other automata classifier, but I think it is a bad, bad idea to do it in C.

Again, I don't know how to argue with that. :)

Kind regards,
Franco

[1] http://marc.info/?l=openbsd-tech&m=136739531914555&w=2
Re: DPI for pf(4)
On Thu, 2 May 2013, Franco Fichtner wrote: as stated before, breaking down complexity to the bare minimum is my requirement for this to be happening at all. You all get to be the judges. I'm just trying to work on something worth doing. Well, bare minimum complexity per-protocol * large_number_of_protocols = a lot of complexity. The incentive is always going to be to add more protocols and never retire them. Also, doesn't IPPROTO_DIVERT or SO_BINDANY+SO_SPLICE allow you to do near zero-overhead DPI completely in userspace? -d
Re: DPI for pf(4)
On 2013/05/02 18:03, Damien Miller wrote: I find C to be quite flexible and empowering if one doesn't overcomplicate[2]. [2] https://github.com/fichtner/OpenDPI/blob/master/src/lib/protocols/ssl.c That's complicated and scary code for a kernel, e.g. multiple opportunities for unsigned overflow that don't seem to be checked for. Here Franco is giving an example of the overcomplicated code that other dpi is using - this is not what he is proposing for PF...
Re: DPI for pf(4)
On Thu, May 02, 2013 at 10:35:19AM +0200, Franco Fichtner wrote:

> as stated before, breaking down complexity to the bare minimum is my requirement for this to be happening at all. You all get to be the judges. I'm just trying to work on something worth doing.
>
>> The last thing we want is a rotting pile of protocol parsing code like wireshark.
>
> Case closed then? I don't know how to argue with that.

IMHO, don't ask and don't argue. If you need DPI in pf (or whatever), write it *for you*, then use it for *your needs*. If one day you feel it could be useful to others, share the code and someone may like it.

Speaking of complexity, OpenBSD already has plenty of complicated kernel code that could run in user mode, but it's in the kernel because it was easier that way, or the author thought it's faster that way, or ports expect it to be that way.

-- Alexandre
Re: DPI for pf(4)
On May 2, 2013, at 10:45 AM, Damien Miller d...@mindrot.org wrote: On Thu, 2 May 2013, Franco Fichtner wrote: as stated before, breaking down complexity to the bare minimum is my requirement for this to be happening at all. You all get to be the judges. I'm just trying to work on something worth doing. Well, bare minimum complexity per-protocol * large_number_of_protocols = a lot of complexity. The incentive is always going to be to add more protocols and never retire them. I guess that's true for most software projects. Also, doesn't IPPROTO_DIVERT or SO_BINDANY+SO_SPLICE allow you to do near zero-overhead DPI completely in userspace? Wouldn't that mean pf.conf(5) syntax extensions cannot be implemented? It's not full-blown DPI analysis for extracting all kinds of events from a flow -- it's merely a tagging tool, and if that sits in user space, it's really not helpful except for logging / accounting. One could do that with a simple pcap(3) binding as well. Stuart made a good point for divert-packet being able to pick up applications without the need for any other information (ports, interfaces, addresses). I'm sorry for not being able to make it more clear at this time. Next step for me is to write a comprehensive description. In any case, the input on tech@ has been very helpful so far. Thanks guys! :) Franco
Re: DPI for pf(4)
On Thu, 2 May 2013, Franco Fichtner wrote: Well, bare minimum complexity per-protocol * large_number_of_protocols = a lot of complexity. The incentive is always going to be to add more protocols and never retire them. I guess that's true for most software projects. We try not to implement an effectively unbounded number of protocol parsers in the kernel. Also, doesn't IPPROTO_DIVERT or SO_BINDANY+SO_SPLICE allow you to do near zero-overhead DPI completely in userspace? Wouldn't that mean pf.conf(5) syntax extensions cannot be implemented? It doesn't mean that - you'd just need some way for userspace to signal information to pf. E.g add a SO_PF_TAG to set the pf tag. Then you could use some program that used SO_BINDANY to inspect the beginning of the session, set a pf tag using setsockopt, SO_SPLICE to avoid further need to copy the session in userspace and control the traffic in pf using the tagged keyword. It's not full-blown DPI analysis for extracting all kinds of events from a flow -- it's merely a tagging tool, and if that sits in user space, it's really not helpful except for logging / accounting. One could do that with a simple pcap(3) binding as well. Why not do the tagging in userspace using the existing facilities? -d
Re: DPI for pf(4)
On May 2, 2013, at 1:23 PM, Damien Miller d...@mindrot.org wrote: On Thu, 2 May 2013, Franco Fichtner wrote: Well, bare minimum complexity per-protocol * large_number_of_protocols = a lot of complexity. The incentive is always going to be to add more protocols and never retire them. I guess that's true for most software projects. We try not to implement an effectively unbounded number of protocol parsers in the kernel. Agreed. Let's put a hard limit on it. 5, 10, 20, 50? Also, doesn't IPPROTO_DIVERT or SO_BINDANY+SO_SPLICE allow you to do near zero-overhead DPI completely in userspace? Wouldn't that mean pf.conf(5) syntax extensions cannot be implemented? It doesn't mean that - you'd just need some way for userspace to signal information to pf. E.g add a SO_PF_TAG to set the pf tag. Then you could use some program that used SO_BINDANY to inspect the beginning of the session, set a pf tag using setsockopt, SO_SPLICE to avoid further need to copy the session in userspace and control the traffic in pf using the tagged keyword. That sounds a bit too complex as well, but would likely work. I'll read into this some more, thanks. It's not full-blown DPI analysis for extracting all kinds of events from a flow -- it's merely a tagging tool, and if that sits in user space, it's really not helpful except for logging / accounting. One could do that with a simple pcap(3) binding as well. Why not do the tagging in userspace using the existing facilities? Mainly to avoid any kind of introduction of latency, buffering, asynchronous behaviour, packet reordering, not invoking the scheduler, avoiding cache line bouncing, and being generally prone to multithreading issues in a perfect world where multiple CPUs could drive the networking stack. Also not having to reimplement certain packet parsing code, state tracking, and so on and so forth. 
Look, I have written all that stuff in user space, but redundancy and complicated architectures are not suitable for forwarding large loads of traffic. User space is that magical place that can do anything, even throw off your packet throughput by invoking a syscall to pull the current time stamp. Moving implementations to user space does not necessarily make them better or less of a problem. That's my concern. :) Franco
Re: DPI for pf(4)
On Thu, 2 May 2013, Franco Fichtner wrote:

> Moving implementations to user space does not necessarily make them better or less of a problem.

The big difference is that it's possible to sandbox a userspace implementation so that small integer overflow bugs or length checking failures don't become arbitrary kmem reads or, worse, RCE.

-d
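The kind of length-checking failure mentioned above can be illustrated with a minimal sketch. Both helper names below are hypothetical and do not come from pf(4) or from any code posted in this thread; the sketch only assumes 32-bit unsigned offsets and lengths taken from a packet.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helpers, not taken from pf(4) or any posted code.
 * Both try to answer: does a field of `len` bytes at offset `off`
 * fit inside a payload of `total` bytes?
 */

/* Overflow-prone: off + len wraps around modulo 2^32. */
static int
field_in_bounds_naive(uint32_t total, uint32_t off, uint32_t len)
{
    return (off + len <= total);
}

/* Safer: compare against the space that is left, without adding. */
static int
field_in_bounds(uint32_t total, uint32_t off, uint32_t len)
{
    if (off > total)
        return (0);
    return (len <= total - off);
}
```

With a 100-byte payload, an offset of 8 and an attacker-chosen length of 0xfffffffc, the naive sum wraps to 4 and the check passes, while the second form rejects the field. In a sandboxed userspace process such a slip is likely to end in a crash; in the kernel it can turn into an out-of-bounds read.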
Re: DPI for pf(4)
On Thu, 2 May 2013, Franco Fichtner wrote:

> OK, the implementation only pulls a couple of bytes from the packet's payload. It will never pull bytes that are not verified. It will never allocate anything. It will never test against something that's neither hard-coded nor available in the range of the approved payload. It will never return more than unsigned int with a number describing the actual application. It will never manipulate any input value, least of all the packet itself. It will never run into endless loops. And I'll gladly zap everything that could still be considered a potential risk.

You've just described bpf, right down to no endless loops and the amount of data it returns. For a little more code than it takes to write one packet parser (basically: loading bpf rules from pf and making the bpf_filter()'s return value available to it) you get everything you described above and more.

-d
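To make the comparison concrete, here is a minimal sketch of a forward-only filter machine in plain C. It is NOT bpf(4): the opcodes, `struct insn`, `mini_filter()` and `mini_classify_tls()` are hypothetical stand-ins invented for this illustration, and the real `struct bpf_insn` encoding differs. The sketch only shows the properties listed above: bounds-checked loads, no allocation, no writes to the packet, no backward jumps, and a single unsigned return value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes; not the real bpf(4) instruction encoding. */
enum op { OP_LDB, OP_LDH, OP_JEQ, OP_JGT, OP_RET };

struct insn {
    enum op  code;
    uint8_t  jt, jf;    /* forward-only jump offsets (true/false) */
    uint32_t k;         /* offset, constant, or return value */
};

/* Runs at most n steps: pc only ever moves forward. */
static unsigned int
mini_filter(const struct insn *prog, size_t n, const uint8_t *pkt, size_t len)
{
    uint32_t a = 0;     /* accumulator */
    size_t pc = 0;

    while (pc < n) {
        const struct insn *i = &prog[pc++];

        switch (i->code) {
        case OP_LDB:    /* load one byte, reject if out of range */
            if (i->k >= len)
                return (0);
            a = pkt[i->k];
            break;
        case OP_LDH:    /* load big-endian 16 bit, bounds-checked */
            if (len < 2 || i->k > len - 2)
                return (0);
            a = ((uint32_t)pkt[i->k] << 8) | pkt[i->k + 1];
            break;
        case OP_JEQ:
            pc += (a == i->k) ? i->jt : i->jf;
            break;
        case OP_JGT:
            pc += (a > i->k) ? i->jt : i->jf;
            break;
        case OP_RET:
            return (i->k);
        }
    }
    return (0);         /* falling off the end rejects */
}

/* TLS record header check: type 20..23, version 0x0300..0x0303,
 * length 1..0x4000.  Fields are { code, jt, jf, k }. */
static const struct insn tls_prog[] = {
    { OP_LDB, 0, 0, 0 },        /* A = record type */
    { OP_JGT, 8, 0, 23 },       /* type > 23: reject */
    { OP_JGT, 0, 7, 19 },       /* type <= 19: reject */
    { OP_LDH, 0, 0, 1 },        /* A = version */
    { OP_JGT, 5, 0, 0x0303 },   /* newer than TLS 1.2: reject */
    { OP_JGT, 0, 4, 0x02ff },   /* older than SSL 3.0: reject */
    { OP_LDH, 0, 0, 3 },        /* A = record length */
    { OP_JEQ, 2, 0, 0 },        /* empty record: reject */
    { OP_JGT, 1, 0, 0x4000 },   /* longer than 2^14: reject */
    { OP_RET, 0, 0, 1 },        /* accept, application number 1 */
    { OP_RET, 0, 0, 0 },        /* reject */
};

static unsigned int
mini_classify_tls(const uint8_t *pkt, size_t len)
{
    return (mini_filter(tls_prog, sizeof(tls_prog) / sizeof(tls_prog[0]),
        pkt, len));
}
```

The detector itself is data rather than kernel C, so adding a protocol means adding a table, while the interpreter stays the single piece of code that has to be audited.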
Re: DPI for pf(4)
On May 2, 2013, at 2:40 PM, Damien Miller d...@mindrot.org wrote:

> On Thu, 2 May 2013, Franco Fichtner wrote:
>
>> Moving implementations to user space does not necessarily make them better or less of a problem.
>
> The big difference is that it's possible to sandbox a userspace implementation so that small integer overflow bugs or length checking failures don't become arbitrary kmem reads or, worse, RCE.

OK, the implementation only pulls a couple of bytes from the packet's payload. It will never pull bytes that are not verified. It will never allocate anything. It will never test against something that's neither hard-coded nor available in the range of the approved payload. It will never return more than unsigned int with a number describing the actual application. It will never manipulate any input value, least of all the packet itself. It will never run into endless loops. And I'll gladly zap everything that could still be considered a potential risk.

Parsing TCP options is still more complex than what this particular DPI code is supposed to be doing. This comes from personal experience. ;)

IMHO, the only issue that remains is a potentially unlimited number of applications. That's a strong point against the idea.

Franco
Re: DPI for pf(4)
On May 2, 2013, at 3:20 PM, Damien Miller d...@mindrot.org wrote:

> On Thu, 2 May 2013, Franco Fichtner wrote:
>
>> OK, the implementation only pulls a couple of bytes from the packet's payload. It will never pull bytes that are not verified. It will never allocate anything. It will never test against something that's neither hard-coded nor available in the range of the approved payload. It will never return more than unsigned int with a number describing the actual application. It will never manipulate any input value, least of all the packet itself. It will never run into endless loops. And I'll gladly zap everything that could still be considered a potential risk.
>
> You've just described bpf, right down to no endless loops and the amount of data it returns. For a little more code than it takes to write one packet parser (basically: loading bpf rules from pf and making the bpf_filter()'s return value available to it) you get everything you described above and more.

I yield. I'm working on making DPI more human-readable and maintainable, and struct bpf_insn is not an option for me, personally.

Worse still, searching for bpf+dpi in google already brings up this mail thread as a top ten hit, which may be a good indicator of how successful this approach has been the last couple of years. ;)

Franco
Re: DPI for pf(4)
On Thu, May 02, 2013 at 04:03:05PM +0200, Franco Fichtner wrote:

> On May 2, 2013, at 3:20 PM, Damien Miller d...@mindrot.org wrote:
>
>> On Thu, 2 May 2013, Franco Fichtner wrote:
>>
>>> OK, the implementation only pulls a couple of bytes from the packet's payload. It will never pull bytes that are not verified. It will never allocate anything. It will never test against something that's neither hard-coded nor available in the range of the approved payload. It will never return more than unsigned int with a number describing the actual application. It will never manipulate any input value, least of all the packet itself. It will never run into endless loops. And I'll gladly zap everything that could still be considered a potential risk.
>>
>> You've just described bpf, right down to no endless loops and the amount of data it returns. For a little more code than it takes to write one packet parser (basically: loading bpf rules from pf and making the bpf_filter()'s return value available to it) you get everything you described above and more.
>
> I yield. I'm working on making DPI more human-readable and maintainable, and struct bpf_insn is not an option for me, personally.

libpcap has a fairly simple parser to turn expressions into bpf instructions. It is used by tcpdump.

> Worse still, searching for bpf+dpi in google already brings up this mail thread as a top ten hit, which may be a good indicator of how successful this approach has been the last couple of years. ;)
>
> Franco
Re: DPI for pf(4)
On Thu, 2 May 2013, Damien Miller wrote: You've just described bpf, right down to no endless loops and the amount of data it returns. For a little more code that it takes to write one packet parser (basically: loading bpf rules from pf and making the bpf_filter()'s return value available to it) you get everything you described above and more. Actually, you could even make the bpf inspection stateful and bi-directional if you preserved its scratch memory between packets. -d
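The stateful, bi-directional idea can be sketched in plain C as a stand-in for a filter program. Everything here is an assumption made for illustration: `struct flow_scratch`, `inspect_ssh()` and the constants are hypothetical, and nothing like this exists in pf(4) or bpf(4) as discussed in the thread. The scratch area is sized at 16 words, matching the number of scratch memory words in classic BPF.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SCRATCH_WORDS 16    /* same count as classic BPF scratch memory */

/* Hypothetical per-flow storage that survives between packets. */
struct flow_scratch {
    uint32_t mem[SCRATCH_WORDS];
};

enum { DIR_CLIENT = 0, DIR_SERVER = 1 };
enum { APP_UNKNOWN = 0, APP_SSH = 1 };

/*
 * Report SSH only after BOTH directions have opened with an "SSH-"
 * banner.  mem[DIR_CLIENT] and mem[DIR_SERVER] remember what each
 * side has shown so far; the caller zeroes the scratch area when
 * the flow is created.
 */
static unsigned int
inspect_ssh(struct flow_scratch *fs, int dir, const uint8_t *pkt, size_t len)
{
    if (dir != DIR_CLIENT && dir != DIR_SERVER)
        return (APP_UNKNOWN);

    if (len >= 4 && memcmp(pkt, "SSH-", 4) == 0)
        fs->mem[dir] = 1;

    if (fs->mem[DIR_CLIENT] && fs->mem[DIR_SERVER])
        return (APP_SSH);

    return (APP_UNKNOWN);
}
```

The first banner alone returns APP_UNKNOWN; the verdict only changes once the other side has been seen as well, which is the extra confidence that preserved scratch memory would buy over a single-packet match.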
Re: DPI for pf(4)
Hi Stuart,

On May 1, 2013, at 1:11 AM, Stuart Henderson st...@openbsd.org wrote:

> On 2013/05/01 00:16, Franco Fichtner wrote:
>
>> Yes, I am proposing a lightweight approach: hard-wired regex-like code, no allocations, no reassembly or state machines. I've seen far worse things being put into kernels and I assure you that I do refrain from putting in anything that could cause segmentation faults, sleeps, or other non-suitable behaviour.
>
> Would it be fair to describe it as a bit more complex than osfp, but not hugely so?

Not sure if that's a fitting comparison; and I know too little OSPF to answer. Let me try another route. The logic consists of an array of application detection functions, which can be invoked via their respective IP types. There's 32 bits of external state for the table and a single hook into the application detection. And the detection for TLS/SSL3.0 follows. I have really tried to condense it down to the bare minimum.

    LI_DESCRIBE_APP(tls)
    {
        struct tls {
            uint8_t record_type;
            uint16_t version;
            uint16_t data_length;
        } __packed *ptr = (void *)packet->app.raw;
        uint16_t decoded;

        if (packet->app_len < sizeof(struct tls)) {
            return (0);
        }

        decoded = be16dec(&ptr->data_length);
        if (!decoded || decoded > 0x4000) {
            /* no empty records possible, also <= 2^14 */
            return (0);
        }

        switch (ptr->record_type) {
        case 20:    /* change_cipher_spec */
        case 21:    /* alert */
        case 22:    /* handshake */
        case 23:    /* application_data */
            break;
        default:
            return (0);
        }

        switch (be16dec(&ptr->version)) {
        case 0x0300:    /* SSL 3.0 */
        case 0x0301:    /* TLS 1.0 */
        case 0x0302:    /* TLS 1.1 */
        case 0x0303:    /* TLS 1.2 */
            break;
        default:
            return (0);
        }

        return (1);
    }

>> Would a protocol like BGP have a bright future in relayd(8)? I don't know enough, maybe Reyk can clear this up? L7 filtering is cute, but ipfw-classifyd isn't maintained, DPI in Linux netfilter is not hitting it off, and there really is no BSD DPI. Frankly, I don't care which way to go, but I believe that pf(4) is a suitable candidate. I especially like the one-rule-to-rule-them-all approach. Adding a keyword app to pf.conf(5) seems like the simplest solution -- much like proto deals with IP types. And talking about complexity: 1000 LOC for 25 protocols. I'm afraid it can't be simplified any more than this.
>
> What sort of protocols do you think could be reasonably handled by this approach, and what would be too complicated?

Good question! Text protocols are easy, RFCs and open implementations are generally easy. Anything too commercial/proprietary, especially in binary, is more guessing than anything else and may not be worth the effort. I don't see world of warcraft happening as a supported application. This is what I have done so far (by no means free of errors, though):

-- BitTorrent
-- Gnutella
-- Network Basic Input Output System
-- Telecommunication Network
-- Hypertext Transfer Protocol
-- Post Office Protocol (Version 3)
-- Internet Message Access Protocol
-- Simple Mail Transfer Protocol
-- Session Traversal Utilities for NAT
-- Dynamic Host Configuration Protocol
-- Point-to-Point Tunneling Protocol
-- Lightweight Directory Access Protocol
-- Simple Network Management Protocol
-- Secure Shell
-- File Transfer Protocol
-- Session Initiation Protocol
-- Domain Name System
-- Real-time Transport Control Protocol
-- Real-time Transport Protocol
-- Routing Information Protocol
-- Border Gateway Protocol
-- Internet Key Exchange
-- Datagram Transport Layer Security
-- Transport Layer Security
-- Concurrent Versions System

> There is definitely something appealing about being able to say, for example, 'block proto tcp on port 443; pass proto tcp on port 443 app tls', or 'block app ssh; pass proto tcp from somehosts to port 22 app ssh' without a bunch more complexity involved in passing across to a separate proxy (which would then need to implement its own completely separate filtering and would, I think, not really be able to integrate with things like PF tags and queue assignment)...

Yes, that would be one scenario. I like to think of lightweight packet inspection as application tagging. That's the first stage. Second stage is a real parser/proxy/endpoint. It's not a security functionality per se, but it can help to break down the workload. It doesn't care about IP versions, ports (mostly ;) ), different flavours (netbios could be session, datagram, and name service as one for example), and so forth.

> Basically what I'm wondering if it's possible to go far enough to be useful whilst keeping the complexity down to a level which is sane and
Re: DPI for pf(4)
Hi Ted, On May 1, 2013, at 1:14 AM, Ted Unangst t...@tedunangst.com wrote: On Wed, May 01, 2013 at 00:16, Franco Fichtner wrote: Yes, I am proposing a lightweight approach: hard-wired regex-like code, no allocations, no reassembly or state machines. I've seen far worse things being put into Kernels and I assure you that I do refrain from putting in anything that could cause segmentation faults, sleeps, or other non-suitable behaviour. And talking about complexity: 1000 LOC for 25 protocols. I'm afraid it can't be simplified any more than this. Well, it's really hard to comment on code we can't see. I understand. The code is hooked up to a library feeding off of recorded network traces at the moment. The idea doesn't feel mature enough to me at this time, not knowing where to put it. So there's no point in releasing a half-done code blob that does nothing on its own, but I'm willing to share it off-list with OpenBSD developers. My thoughts on the matter have always been that it would be cool to integrate bpf into pf (though other developers surely have other opinions). Then you get filtering for as many protocols as you care to write bpf matchers for. You mean externalising the DPI? People[1] have tried to work on such ideas, but the general drift is that there are not enough interested individuals in the field to drive second tier development for application detections. I find C to be quite flexible and empowering if one doesn't overcomplicate[2]. Franco [1] https://code.google.com/p/appid/source/browse/trunk/apps/aim [2] https://github.com/fichtner/OpenDPI/blob/master/src/lib/protocols/ssl.c
Re: DPI for pf(4)
On 2013/05/01 09:01, Franco Fichtner wrote:

> Hi Stuart,
>
> On May 1, 2013, at 1:11 AM, Stuart Henderson st...@openbsd.org wrote:
>
>> On 2013/05/01 00:16, Franco Fichtner wrote:
>>
>>> Yes, I am proposing a lightweight approach: hard-wired regex-like code, no allocations, no reassembly or state machines. I've seen far worse things being put into kernels and I assure you that I do refrain from putting in anything that could cause segmentation faults, sleeps, or other non-suitable behaviour.
>>
>> Would it be fair to describe it as a bit more complex than osfp, but not hugely so?
>
> Not sure if that's a fitting comparison; and I know too little OSPF to answer.

I should have expanded the acronym to make it clear - osfp i.e. the OS fingerprinting code (pf_osfp.c).

> Let me try another route. The logic consists of an array of application detection functions, which can be invoked via their respective IP types. There's 32 bits of external state for the table and a single hook into the application detection. And the detection for TLS/SSL3.0 follows. I have really tried to condense it down to the bare minimum.
>
>     LI_DESCRIBE_APP(tls)
>     {
>         struct tls {
>             uint8_t record_type;
>             uint16_t version;
>             uint16_t data_length;
>         } __packed *ptr = (void *)packet->app.raw;
>         uint16_t decoded;
>
>         if (packet->app_len < sizeof(struct tls)) {
>             return (0);
>         }
>
>         decoded = be16dec(&ptr->data_length);
>         if (!decoded || decoded > 0x4000) {
>             /* no empty records possible, also <= 2^14 */
>             return (0);
>         }
>
>         switch (ptr->record_type) {
>         case 20:    /* change_cipher_spec */
>         case 21:    /* alert */
>         case 22:    /* handshake */
>         case 23:    /* application_data */
>             break;
>         default:
>             return (0);
>         }
>
>         switch (be16dec(&ptr->version)) {
>         case 0x0300:    /* SSL 3.0 */
>         case 0x0301:    /* TLS 1.0 */
>         case 0x0302:    /* TLS 1.1 */
>         case 0x0303:    /* TLS 1.2 */
>             break;
>         default:
>             return (0);
>         }
>
>         return (1);
>     }

This type of thing looks sane to me, but others will want to comment. (I'll point others at your posts at http://lastsummer.de/category/technology/ too :-)

>>> Would a protocol like BGP have a bright future in relayd(8)? I don't know enough, maybe Reyk can clear this up? L7 filtering is cute, but ipfw-classifyd isn't maintained, DPI in Linux netfilter is not hitting it off, and there really is no BSD DPI. Frankly, I don't care which way to go, but I believe that pf(4) is a suitable candidate. I especially like the one-rule-to-rule-them-all approach. Adding a keyword app to pf.conf(5) seems like the simplest solution -- much like proto deals with IP types. And talking about complexity: 1000 LOC for 25 protocols. I'm afraid it can't be simplified any more than this.
>>
>> What sort of protocols do you think could be reasonably handled by this approach, and what would be too complicated?
>
> Good question! Text protocols are easy, RFCs and open implementations are generally easy. Anything too commercial/proprietary, especially in binary, is more guessing than anything else and may not be worth the effort. I don't see world of warcraft happening as a supported application. This is what I have done so far (by no means free of errors, though):
>
> -- BitTorrent
> -- Gnutella
> -- Network Basic Input Output System
> -- Telecommunication Network
> -- Hypertext Transfer Protocol
> -- Post Office Protocol (Version 3)
> -- Internet Message Access Protocol
> -- Simple Mail Transfer Protocol
> -- Session Traversal Utilities for NAT
> -- Dynamic Host Configuration Protocol
> -- Point-to-Point Tunneling Protocol
> -- Lightweight Directory Access Protocol
> -- Simple Network Management Protocol
> -- Secure Shell
> -- File Transfer Protocol
> -- Session Initiation Protocol
> -- Domain Name System
> -- Real-time Transport Control Protocol
> -- Real-time Transport Protocol
> -- Routing Information Protocol
> -- Border Gateway Protocol
> -- Internet Key Exchange
> -- Datagram Transport Layer Security
> -- Transport Layer Security
> -- Concurrent Versions System
>
>> There is definitely something appealing about being able to say, for example, 'block proto tcp on port 443; pass proto tcp on port 443 app tls', or 'block app ssh; pass proto tcp from somehosts to port 22 app ssh' without a bunch more complexity involved in passing across to a separate proxy (which would then need to implement its own completely separate filtering and would, I think, not really be able to integrate with things like PF tags and queue assignment)...
>
> Yes, that would be one scenario. I like to think of lightweight packet inspection as application tagging. That's the first stage. Second
Re: DPI for pf(4)
On Tue, Apr 30, 2013 at 07:14:50PM -0400, Ted Unangst wrote:

> On Wed, May 01, 2013 at 00:16, Franco Fichtner wrote:
>
>> Yes, I am proposing a lightweight approach: hard-wired regex-like code, no allocations, no reassembly or state machines. I've seen far worse things being put into kernels and I assure you that I do refrain from putting in anything that could cause segmentation faults, sleeps, or other non-suitable behaviour.
>>
>> And talking about complexity: 1000 LOC for 25 protocols. I'm afraid it can't be simplified any more than this.
>
> Well, it's really hard to comment on code we can't see.
>
> My thoughts on the matter have always been that it would be cool to integrate bpf into pf (though other developers surely have other opinions). Then you get filtering for as many protocols as you care to write bpf matchers for.

My first thought was: why not have something like what squid does (ICAP)? You could forward some of the inspection to another application, which would return some agreed data (a tag) that you could then work with in pf rules.
Re: DPI for pf(4)
On May 1, 2013, at 9:41 AM, Stuart Henderson st...@openbsd.org wrote:

I should have expanded the acronym to make it clear - osfp i.e. the OS fingerprinting code (pf_osfp.c).

oh, sorry, my mistake. This I can comment on. :) The idea is the same. I'd say at this stage osfp has more complexity due to parsing the TCP header, splitting fields, pulling in external descriptions, etc. Looking beyond the headers is far less structured, because applications do the structuring on their own, which in turn makes external descriptions hard to, er, describe -- hence the hard-wired C approach. The only complexity is the growing number of application descriptions, but each application function is completely isolated. Here's the DPI hook function (a bit simplified for the context of this discussion):

    li_get(const struct li_packet *packet, const struct li_flow *flow)
    {
        unsigned int i;

        if (!packet->app_len) {
            return (LI_UNKNOWN);
        }

        for (i = 0; i < lengthof(apps); ++i) {
            if ((apps[i].p1 == flow->type) ||
                (apps[i].p2 == flow->type)) {
                if (apps[i].function(packet, flow)) {
                    return (apps[i].number);
                }
            }
        }

        /*
         * Set 'undefined' right away. Only one chance for
         * each side of the flow. This makes it easier for
         * a rules engine to do negation of policies.
         */
        return (LI_UNDEFINED);
    }

apps is an array of all of the available application functions. It looks something like this:

    static const struct li_apps apps[] = {
        LI_LIST_APP(LI_PPTP, pptp, IPPROTO_TCP, IPPROTO_GRE),
        LI_LIST_APP(LI_HTTP, http, IPPROTO_TCP, IPPROTO_MAX),
        /* more stuff here */
    };

Really, that's all there is to it.

So another example might be: pass proto tcp app $someapp divert-packet $someproxy, with $someproxy handling the second stage?

Yes, that looks reasonable. proto tcp may be zapped as well. If we are talking use cases the biggest ones would be traffic shaping and policy enforcement in general (no SMTP to the outside, blocking non-TLS stuff on port 443, etc.)

Yes, this is clearly a less messy approach than opendpi ;)

I probably shouldn't say I worked for these guys a few years ago. Nobody would believe me I never touched the DPI code, but it's the truth!

Franco
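For readers who want to try the dispatch idea outside the kernel, here is a minimal userspace sketch of the hook quoted above. It is not the proposed engine: the struct layouts, the LI_SSH constant, the PROTO_* values, the two toy prefix detectors, the unsigned int return type, and the li_classify() wrapper are all assumptions made for illustration. Only the shape of the li_get() loop follows the message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All names and layouts below are hypothetical stand-ins. */
enum { LI_UNKNOWN, LI_UNDEFINED, LI_HTTP, LI_SSH };
enum { PROTO_TCP = 6, PROTO_NONE = 255 };	/* 255: "no second protocol" */

struct li_packet {
	const unsigned char *app;	/* start of the application payload */
	size_t app_len;			/* payload length in bytes */
};

struct li_flow {
	unsigned int type;		/* IP protocol number of the flow */
};

typedef int (*li_fn)(const struct li_packet *, const struct li_flow *);

struct li_apps {
	unsigned int number;		/* application id to report */
	li_fn function;			/* isolated detector */
	unsigned int p1, p2;		/* IP protocols it applies to */
};

/* Bounds-checked prefix match: no allocation, no state. */
static int
li_is_prefix(const struct li_packet *p, const char *s)
{
	size_t n = strlen(s);

	return (p->app_len >= n && memcmp(p->app, s, n) == 0);
}

static int
li_http(const struct li_packet *p, const struct li_flow *f)
{
	(void)f;
	return (li_is_prefix(p, "GET ") || li_is_prefix(p, "POST "));
}

static int
li_ssh(const struct li_packet *p, const struct li_flow *f)
{
	(void)f;
	return (li_is_prefix(p, "SSH-"));
}

static const struct li_apps apps[] = {
	{ LI_HTTP, li_http, PROTO_TCP, PROTO_NONE },
	{ LI_SSH, li_ssh, PROTO_TCP, PROTO_NONE },
};

#define lengthof(a) (sizeof(a) / sizeof((a)[0]))

unsigned int
li_get(const struct li_packet *packet, const struct li_flow *flow)
{
	size_t i;

	assert(packet != NULL && flow != NULL);

	if (!packet->app_len)
		return (LI_UNKNOWN);	/* nothing to look at yet */

	for (i = 0; i < lengthof(apps); ++i) {
		if (apps[i].p1 != flow->type && apps[i].p2 != flow->type)
			continue;	/* detector not meant for this protocol */
		if (apps[i].function(packet, flow))
			return (apps[i].number);
	}

	return (LI_UNDEFINED);		/* one chance only, as in the original */
}

/* Convenience wrapper so the sketch can be fed a C string. */
unsigned int
li_classify(const char *payload, unsigned int proto)
{
	struct li_packet p = { (const unsigned char *)payload, strlen(payload) };
	struct li_flow f = { proto };

	return (li_get(&p, &f));
}
```

With these toy detectors, li_classify("SSH-2.0-OpenSSH_6.2\r\n", PROTO_TCP) reports LI_SSH, an empty payload reports LI_UNKNOWN, and a payload on a protocol that no table entry lists falls through to LI_UNDEFINED. Adding an application means adding one function and one table row, which is the isolation property claimed in the message.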
DPI for pf(4)
Hi misc@, so I have been working on a BSD licensed DPI engine. It's a very lightweight, non-intrusive approach and I know that teasers are boring, but I'd like to know if it's worth the time to work on inclusion for pf(4). So far I have about 25 supported applications and the necessary hooks for the pf.conf(5) parts. The idea is first packet on each side only, no content extraction. It's not meant to be completely accurate, but it might be a good addition to the feature set of pf(4) nonetheless. I have two blog posts with code, and more coming if anyone is interested. Regards, Franco
Re: DPI for pf(4)
Franco Fichtner slashy83 at gmail.com writes:

so I have been working on a BSD licensed DPI engine. It's a very lightweight, non-intrusive approach and I know that teasers are boring, but I'd like to know if it's worth the time to work on inclusion for pf(4). So far I have about 25 supported applications and the necessary hooks for the pf.conf(5) parts.

If DPI stands for Deep Packet Inspection, then (afaik) it was discussed before: this kind of inspection is too complex to put into a kernel. relayd already supports L7 filtering at least for http, so if something is to be improved in this area, relayd is a better place, imo.
Re: DPI for pf(4)
On 2013/05/01 00:16, Franco Fichtner wrote:

Hi Alexey,

On Apr 30, 2013, at 11:51 PM, Alexey E. Suslikov alexey.susli...@gmail.com wrote:

Franco Fichtner slashy83 at gmail.com writes:

so I have been working on a BSD licensed DPI engine. It's a very lightweight, non-intrusive approach and I know that teasers are boring, but I'd like to know if it's worth the time to work on inclusion for pf(4). So far I have about 25 supported applications and the necessary hooks for the pf.conf(5) parts.

If DPI stands for Deep Packet Inspection, then (afaik) it was discussed before: this kind of inspection is too complex to put into a kernel.

Yes, I am proposing a lightweight approach: hard-wired regex-like code, no allocations, no reassembly or state machines. I've seen far worse things being put into Kernels and I assure you that I do refrain from putting in anything that could cause segmentation faults, sleeps, or other non-suitable behaviour.

Would it be fair to describe it as a bit more complex than osfp, but not hugely so?

relayd already supports L7 filtering at least for http, so if something is to be improved in this area, relayd is a better place, imo.

Would a protocol like BGP have a bright future in relayd(8)? I don't know enough, maybe Reyk can clear this up? L7 filtering is cute, but ipfw-classifyd isn't maintained, DPI in Linux netfilter is not hitting it off, and there really is no BSD DPI. Frankly, I don't care which way to go, but I believe that pf(4) is a suitable candidate. I especially like the one-rule-to-rule-them-all approach. Adding a keyword app to pf.conf(5) seems like the simplest solution -- much like proto does deal with IP types. And talking about complexity: 1000 LOC for 25 protocols. I'm afraid it can't be simplified any more than this.

What sort of protocols do you think could be reasonably handled by this approach, and what would be too complicated?

There is definitely something appealing about being able to say, for example, 'block proto tcp on port 443; pass proto tcp on port 443 app tls', or 'block app ssh; pass proto tcp from somehosts to port 22 app ssh' without a bunch more complexity involved in passing across to a separate proxy (which would then need to implement its own completely separate filtering and would, I think, not really be able to integrate with things like PF tags and queue assignment)... Basically what I'm wondering is whether it's possible to go far enough to be useful whilst keeping the complexity down to a level which is sane and simple enough that it can be carefully audited.
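To make the audit question concrete, here is a rough sketch of what a "hard-wired, no allocations, no reassembly" check for the 'app tls' example could look like when it only sees the first packet of one side. It is an illustration written for this thread's argument, not code from the proposed engine: the function name dpi_looks_like_tls() and its interface are hypothetical. The byte values come from the TLS record format (content type 22 for handshake, major version 3, handshake types 1 and 2 for ClientHello and ServerHello).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical first-packet check: does the payload start like a
 * TLS handshake record? Every read is preceded by a length check,
 * nothing is allocated, and no state is kept between calls.
 */
int
dpi_looks_like_tls(const uint8_t *buf, size_t len)
{
	size_t rec_len;

	assert(buf != NULL || len == 0);

	/* 5-byte record header plus the 1-byte handshake type. */
	if (len < 6)
		return (0);

	if (buf[0] != 0x16)			/* content type: handshake */
		return (0);

	if (buf[1] != 0x03 || buf[2] > 0x03)	/* SSL 3.0 up to TLS 1.2 */
		return (0);

	/* Record length is big-endian; widen before shifting. */
	rec_len = ((size_t)buf[3] << 8) | (size_t)buf[4];
	if (rec_len < 4 || rec_len > 16384)	/* handshake header fits, within spec limit */
		return (0);

	/* First packet of either side: ClientHello or ServerHello. */
	return (buf[5] == 0x01 || buf[5] == 0x02);
}
```

A check of this size is easy to read in one sitting, which speaks to the auditing concern, but it also shows the limit of the approach: anything that needs data beyond the first segment, or that must follow a negotiation, does not fit this pattern.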
Re: DPI for pf(4)
On Wed, May 01, 2013 at 00:16, Franco Fichtner wrote:

Yes, I am proposing a lightweight approach: hard-wired regex-like code, no allocations, no reassembly or state machines. I've seen far worse things being put into Kernels and I assure you that I do refrain from putting in anything that could cause segmentation faults, sleeps, or other non-suitable behaviour. And talking about complexity: 1000 LOC for 25 protocols. I'm afraid it can't be simplified any more than this.

Well, it's really hard to comment on code we can't see. My thoughts on the matter have always been that it would be cool to integrate bpf into pf (though other developers surely have other opinions). Then you get filtering for as many protocols as you care to write bpf matchers for.