Re: DPI for pf(4)

2013-06-10 Thread Franco Fichtner
Hi all,

adhering to the basic rule of not reinventing the wheel has sort of
crippled the efforts to come up with an elegant solution for the
topic at hand.  Two approaches have been proposed earlier, so let's
go through them:

(1) Diverting traffic to userspace

That's generally a good idea, but defeats the purpose of having
zero-latency functionality in pf(4) itself, because going through
the scheduler isn't optimal (scheduler people, don't hate me).
Worse still, the way TCP incorporates handshakes makes loosely-
coupled DPI worthless, because the divert cannot happen before
the payload is seen.  The only way around this is not diverting
at all -- that can only happen with a pf(4) that's completely
contained in userspace.  I understand the requirement of not doing
anything reckless in the kernel and I don't think it's a wise
decision to try it anyway.  Remember that the goal was to keep
consistency and utilise the base functionality in the firewall
code itself.

(2) bpf(4)-based filters

The BPF-VM is neat and the idea of its filters is in accordance with
the current requirements for the proposed code.  However, the amount
of work and infrastructure to be built around bpf(4) to avoid any kind
of unwanted complexity inside the DPI code is -- at least for me -- not
feasible.

Instead, the route to take at this point is a userspace library, which
can grow, try different things, stumble, explode, adapt, and some day
may even be the base of a firewall away from the restriction of the kernel.
Others can still implement (1).  I don't think (2) will be of much interest
in real world applications.  Feel free to contact me on and off-list if you
have any further questions.  :)


Thank you all for your participation,

Franco



Re: DPI for pf(4)

2013-05-02 Thread Damien Miller
On Wed, 1 May 2013, Franco Fichtner wrote:

 Not sure if that's a fitting comparison; and I know too little OSPF
 to answer.  Let me try another route.  The logic consists of an array
 of application detection functions, which can be invoked via their
 respective IP types.

I don't like this approach at all - it leads to a proliferation (as
demonstrated by your already long list) of kernel-side parsers that
will be a maintenance, and possibly security, nightmare.

The last thing we want is a rotting pile of protocol parsing code like
wireshark.

 On May 1, 2013, at 1:14 AM, Ted Unangst t...@tedunangst.com wrote:

  My thoughts on the matter have always been that it would be cool to
  integrate bpf into pf (though other developers surely have other
  opinions). Then you get filtering for as many protocols as you care to
  write bpf matchers for.
 
 You mean externalising the DPI?  People[1] have tried to work on such
 ideas, but the general drift is that there are not enough interested
 individuals in the field to drive second tier development for
 application detections.

So if there is not enough interest to develop app protocol detectors/
dissectors in bpf then why should C be any different?

 I find C to be quite flexible and empowering
 if one doesn't overcomplicate[2].

 [2] https://github.com/fichtner/OpenDPI/blob/master/src/lib/protocols/ssl.c

That's complicated and scary code for a kernel, e.g. multiple opportunities
for unsigned overflow that don't seem to be checked for.

I agree with tedu - it is safer and more flexible to write parsers in bpf.
If that isn't desirable then maybe we could consider some other automata
classifier, but I think it is a bad, bad idea to do it in C.

-d



Re: DPI for pf(4)

2013-05-02 Thread Franco Fichtner
Hi Damien,

On May 2, 2013, at 10:03 AM, Damien Miller d...@mindrot.org wrote:

 On Wed, 1 May 2013, Franco Fichtner wrote:
 
 Not sure if that's a fitting comparison; and I know too little OSPF
 to answer.  Let me try another route.  The logic consists of an array
 of application detection functions, which can be invoked via their
 respective IP types.
 
 I don't like this approach at all - it leads to a proliferation (as
 demonstrated by your already long list) of kernel-side parsers that
 will be a maintenance, and possibly security, nightmare.

as stated before, breaking down complexity to the bare minimum is my
requirement for this to be happening at all.  You all get to be the
judges.  I'm just trying to work on something worth doing.

 The last thing we want is a rotting pile of protocol parsing code like
 wireshark.

Case closed then?  I don't know how to argue with that.

 On May 1, 2013, at 1:14 AM, Ted Unangst t...@tedunangst.com wrote:
 
 My thoughts on the matter have always been that it would be cool to
 integrate bpf into pf (though other developers surely have other
 opinions). Then you get filtering for as many protocols as you care to
 write bpf matchers for.
 
 You mean externalising the DPI?  People[1] have tried to work on such
 ideas, but the general drift is that there are not enough interested
 individuals in the field to drive second tier development for
 application detections.
 
 So if there is not enough interest to develop app protocol detectors/
 dissectors in bpf then why should C be any different?

Because it takes complexity out of the system for one.  Plus, pf(4) is
at the core of OpenBSD.  There's not much noise about bpf(4) here.

 I find C to be quite flexible and empowering
 if one doesn't overcomplicate[2].
 
 [2] https://github.com/fichtner/OpenDPI/blob/master/src/lib/protocols/ssl.c
 
 That's complicated and scary code for a kernel, e.g. multiple opportunities
 for unsigned overflow that don't seem to be checked for.

You are absolutely right.  And it's *not* my code, it was merely an example
of how the TLS code can be broken down to the bare minimum[1].

 I agree with tedu - it is safer and more flexible to write parsers in bpf.
 If that isn't desirable then maybe we could consider some other automata
 classifier, but I think it is a bad, bad idea to do it in C.

Again, I don't know how to argue with that.  :)


Kind regards,
Franco

[1] http://marc.info/?l=openbsd-tech&m=136739531914555&w=2



Re: DPI for pf(4)

2013-05-02 Thread Damien Miller
On Thu, 2 May 2013, Franco Fichtner wrote:

 as stated before, breaking down complexity to the bare minimum is my
 requirement for this to be happening at all.  You all get to be the
 judges.  I'm just trying to work on something worth doing.

Well, bare minimum complexity per-protocol * large_number_of_protocols =
a lot of complexity. The incentive is always going to be to add more
protocols and never retire them.

Also, doesn't IPPROTO_DIVERT or SO_BINDANY+SO_SPLICE allow you to do
near zero-overhead DPI completely in userspace?

-d



Re: DPI for pf(4)

2013-05-02 Thread Stuart Henderson
On 2013/05/02 18:03, Damien Miller wrote:
  I find C to be quite flexible and empowering
  if one doesn't overcomplicate[2].
 
  [2] https://github.com/fichtner/OpenDPI/blob/master/src/lib/protocols/ssl.c
 
 That's complicated and scary code for a kernel, e.g. multiple opportunities
 for unsigned overflow that don't seem to be checked for.

Here Franco is giving an example of the overcomplicated code that other
dpi is using - this is not what he is proposing for PF...




Re: DPI for pf(4)

2013-05-02 Thread Alexandre Ratchov
On Thu, May 02, 2013 at 10:35:19AM +0200, Franco Fichtner wrote:
 
 as stated before, breaking down complexity to the bare minimum is my
 requirement for this to be happening at all.  You all get to be the
 judges.  I'm just trying to work on something worth doing.
 
  The last thing we want is a rotting pile of protocol parsing code like
  wireshark.
 
 Case closed then?  I don't know how to argue with that.
 

IMHO, don't ask and don't argue. If you need DPI in pf (or
whatever), write it *for you*, then use it for *your needs*. If one
day you feel it could be useful to others, share the code and
someone may like it.

Speaking of complexity, OpenBSD already has plenty of complicated
kernel code that could run in user-mode but it's in the kernel
because it was easier that way, or the author thought it's faster
that way or ports expect it to be that way.

-- Alexandre



Re: DPI for pf(4)

2013-05-02 Thread Franco Fichtner
On May 2, 2013, at 10:45 AM, Damien Miller d...@mindrot.org wrote:

 On Thu, 2 May 2013, Franco Fichtner wrote:
 
 as stated before, breaking down complexity to the bare minimum is my
 requirement for this to be happening at all.  You all get to be the
 judges.  I'm just trying to work on something worth doing.
 
 Well, bare minimum complexity per-protocol * large_number_of_protocols =
 a lot of complexity. The incentive is always going to be to add more
 protocols and never retire them.

I guess that's true for most software projects.

 Also, doesn't IPPROTO_DIVERT or SO_BINDANY+SO_SPLICE allow you to do
 near zero-overhead DPI completely in userspace?

Wouldn't that mean pf.conf(5) syntax extensions cannot be implemented?

It's not full-blown DPI analysis for extracting all kinds of events
from a flow -- it's merely a tagging tool, and if that sits in user
space, it's really not helpful except for logging / accounting. One
could do that with a simple pcap(3) binding as well.

Stuart made a good point for divert-packet being able to pick up
applications without the need for any other information (ports,
interfaces, addresses).

I'm sorry for not being able to make it more clear at this time.
Next step for me is to write a comprehensive description. In any case,
the input on tech@ has been very helpful so far. Thanks guys!  :)


Franco



Re: DPI for pf(4)

2013-05-02 Thread Damien Miller
On Thu, 2 May 2013, Franco Fichtner wrote:

  Well, bare minimum complexity per-protocol * large_number_of_protocols =
  a lot of complexity. The incentive is always going to be to add more
  protocols and never retire them.
 
 I guess that's true for most software projects.

We try not to implement an effectively unbounded number of protocol
parsers in the kernel.

  Also, doesn't IPPROTO_DIVERT or SO_BINDANY+SO_SPLICE allow you to do
  near zero-overhead DPI completely in userspace?
 
 Wouldn't that mean pf.conf(5) syntax extensions cannot be implemented?

It doesn't mean that - you'd just need some way for userspace to signal
information to pf. E.g add a SO_PF_TAG to set the pf tag. Then you could
use some program that used SO_BINDANY to inspect the beginning of the
session, set a pf tag using setsockopt, SO_SPLICE to avoid further need
to copy the session in userspace and control the traffic in pf using the
tagged keyword.

 It's not full-blown DPI analysis for extracting all kinds of events
 from a flow -- it's merely a tagging tool, and if that sits in user
 space, it's really not helpful except for logging / accounting. One
 could do that with a simple pcap(3) binding as well.

Why not do the tagging in userspace using the existing facilities?

-d



Re: DPI for pf(4)

2013-05-02 Thread Franco Fichtner
On May 2, 2013, at 1:23 PM, Damien Miller d...@mindrot.org wrote:

 On Thu, 2 May 2013, Franco Fichtner wrote:
 
 Well, bare minimum complexity per-protocol * large_number_of_protocols =
 a lot of complexity. The incentive is always going to be to add more
 protocols and never retire them.
 
 I guess that's true for most software projects.
 
 We try not to implement an effectively unbounded number of protocol
 parsers in the kernel.

Agreed.  Let's put a hard limit on it.  5, 10, 20, 50?

 Also, doesn't IPPROTO_DIVERT or SO_BINDANY+SO_SPLICE allow you to do
 near zero-overhead DPI completely in userspace?
 
 Wouldn't that mean pf.conf(5) syntax extensions cannot be implemented?
 
 It doesn't mean that - you'd just need some way for userspace to signal
 information to pf. E.g add a SO_PF_TAG to set the pf tag. Then you could
 use some program that used SO_BINDANY to inspect the beginning of the
 session, set a pf tag using setsockopt, SO_SPLICE to avoid further need
 to copy the session in userspace and control the traffic in pf using the
 tagged keyword.

That sounds a bit too complex as well, but would likely work.  I'll read
into this some more, thanks.

 It's not full-blown DPI analysis for extracting all kinds of events
 from a flow -- it's merely a tagging tool, and if that sits in user
 space, it's really not helpful except for logging / accounting. One
 could do that with a simple pcap(3) binding as well.
 
 Why not do the tagging in userspace using the existing facilities?

Mainly to avoid introducing latency, buffering, asynchronous behaviour,
packet reordering, scheduler invocations, cache line bouncing, and the
multithreading issues that would show up in a perfect world where multiple
CPUs could drive the networking stack.  It also avoids reimplementing
certain packet parsing code, state tracking, and so on.  Look, I have
written all that stuff in
user space, but redundancy and complicated architectures are not suitable
for forwarding large loads of traffic.  User space is that magical place
that can do anything, even throw off your packet throughput by invoking
a syscall to pull the current time stamp.  Moving implementations to
user space does not necessarily make them better or less of a problem.

That's my concern.  :)


Franco



Re: DPI for pf(4)

2013-05-02 Thread Damien Miller
On Thu, 2 May 2013, Franco Fichtner wrote:

 Moving implementations to user space does not necessarily make them
 better or less of a problem.

The big difference is that it's possible to sandbox a userspace
implementation so that small integer overflow bugs or length checking
failures don't become arbitrary kmem reads or, worse, RCE.

-d



Re: DPI for pf(4)

2013-05-02 Thread Damien Miller
On Thu, 2 May 2013, Franco Fichtner wrote:

 OK, the implementation only pulls a couple of bytes from the packet's
 payload. It will never pull bytes that are not verified. It will never
 allocate anything. It will never test against something that's neither
 hard-coded nor available in the range of the approved payload. It will
 never return more than an unsigned int with a number describing the
 actual application. It will never manipulate any input value, least of
 all the packet itself. It will never run into endless loops. And I'll
 gladly zap everything that could still be considered a potential risk.

You've just described bpf, right down to no endless loops and the amount
of data it returns.

For a little more code than it takes to write one packet parser
(basically: loading bpf rules from pf and making the bpf_filter()'s
return value available to it) you get everything you described above and
more.

-d



Re: DPI for pf(4)

2013-05-02 Thread Franco Fichtner
On May 2, 2013, at 2:40 PM, Damien Miller d...@mindrot.org wrote:

 On Thu, 2 May 2013, Franco Fichtner wrote:
 
 Moving implementations to user space does not necessarily make them
 better or less of a problem.
 
 The big difference is that it's possible to sandbox a userspace
 implementation so that small integer overflow bugs or length checking
 failures don't become arbitrary kmem reads or, worse, RCE.

OK, the implementation only pulls a couple of bytes from the packet's
payload.  It will never pull bytes that are not verified.  It will never
allocate anything.  It will never test against something that's neither
hard-coded nor available in the range of the approved payload.  It will
never return more than an unsigned int with a number describing the
actual application.  It will never manipulate any input value, least of
all the packet itself.  It will never run into endless loops.  And I'll
gladly zap everything that could still be considered a potential risk.

Parsing TCP options is still more complex than what this particular DPI
code is supposed to be doing.  This comes from personal experience.  ;)

IMHO, the only issue that remains is a potentially unlimited number of
applications.  That's a strong point against the idea.


Franco



Re: DPI for pf(4)

2013-05-02 Thread Franco Fichtner
On May 2, 2013, at 3:20 PM, Damien Miller d...@mindrot.org wrote:

 On Thu, 2 May 2013, Franco Fichtner wrote:
 
 OK, the implementation only pulls a couple of bytes from the packet's
 payload. It will never pull bytes that are not verified. It will never
 allocate anything. It will never test against something that's neither
 hard-coded nor available in the range of the approved payload. It will
 never return more than an unsigned int with a number describing the
 actual application. It will never manipulate any input value, least of
 all the packet itself. It will never run into endless loops. And I'll
 gladly zap everything that could still be considered a potential risk.
 
 You've just described bpf, right down to no endless loops and the amount
 of data it returns.
 
 For a little more code than it takes to write one packet parser
 (basically: loading bpf rules from pf and making the bpf_filter()'s
 return value available to it) you get everything you described above and
 more.

I yield.  I'm working on making DPI more human-readable and maintainable,
and struct bpf_insn is not an option for me, personally.

Worse still, searching for bpf+dpi in google already brings up this mail
thread as a top ten hit, which may be a good indicator of how successful
this approach has been the last couple of years.  ;)


Franco



Re: DPI for pf(4)

2013-05-02 Thread Otto Moerbeek
On Thu, May 02, 2013 at 04:03:05PM +0200, Franco Fichtner wrote:

 On May 2, 2013, at 3:20 PM, Damien Miller d...@mindrot.org wrote:
 
  On Thu, 2 May 2013, Franco Fichtner wrote:
  
  OK, the implementation only pulls a couple of bytes from the packet's
  payload. It will never pull bytes that are not verified. It will never
  allocate anything. It will never test against something that's neither
  hard-coded nor available in the range of the approved payload. It will
  never return more than an unsigned int with a number describing the
  actual application. It will never manipulate any input value, least of
  all the packet itself. It will never run into endless loops. And I'll
  gladly zap everything that could still be considered a potential risk.
  
  You've just described bpf, right down to no endless loops and the amount
  of data it returns.
  
  For a little more code than it takes to write one packet parser
  (basically: loading bpf rules from pf and making the bpf_filter()'s
  return value available to it) you get everything you described above and
  more.
 
 I yield.  I'm working on making DPI more human-readable and maintainable,
 and struct bpf_insn is not an option for me, personally.

libpcap has a fairly simple parser to turn expressions into bpf instructions.
It is used by tcpdump.

 
 Worse still, searching for bpf+dpi in google already brings up this mail
 thread as a top ten hit, which may be a good indicator of how successful
 this approach has been the last couple of years.  ;)
 
 
 Franco



Re: DPI for pf(4)

2013-05-02 Thread Damien Miller
On Thu, 2 May 2013, Damien Miller wrote:

 You've just described bpf, right down to no endless loops and the amount
 of data it returns.
 
 For a little more code than it takes to write one packet parser
 (basically: loading bpf rules from pf and making the bpf_filter()'s
 return value available to it) you get everything you described above and
 more.

Actually, you could even make the bpf inspection stateful and bi-directional
if you preserved its scratch memory between packets.

-d
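
The idea of keeping scratch memory between packets can be illustrated in
plain C.  This is only a sketch under stated assumptions, not bpf(4) code
and not anything proposed in this thread: struct flow_scratch, inspect()
and the slot names are hypothetical, and the only detail borrowed from
classic BPF is the size of its scratch store (16 words, BPF_MEMWORDS).
The caller keeps one scratch area per flow and hands it back on every
packet, so a decision can depend on what was seen in both directions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCRATCH_WORDS	16	/* same count as BPF_MEMWORDS in classic BPF */

enum direction { DIR_CLIENT, DIR_SERVER };

/* Hypothetical per-flow state, preserved by the caller between packets. */
struct flow_scratch {
	uint32_t mem[SCRATCH_WORDS];
};

/* Scratch slots used by this sketch. */
#define M_SEEN_CLIENT	0
#define M_SEEN_SERVER	1

/* Crude check: handshake record type (22) with major version 3. */
static int
is_tls_handshake(const uint8_t *p, size_t len)
{
	return (len >= 3 && p[0] == 22 && p[1] == 0x03);
}

/*
 * Record what this packet shows, then return 1 once a TLS handshake
 * record has been seen from both sides of the flow, 0 otherwise.
 */
static int
inspect(struct flow_scratch *fs, enum direction dir,
    const uint8_t *p, size_t len)
{
	assert(fs != NULL);

	if (is_tls_handshake(p, len))
		fs->mem[dir == DIR_CLIENT ? M_SEEN_CLIENT : M_SEEN_SERVER] = 1;

	return (fs->mem[M_SEEN_CLIENT] && fs->mem[M_SEEN_SERVER]);
}
```

A caller would zero the scratch area when the flow is created and pass the
same structure to inspect() for every packet of that flow.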



Re: DPI for pf(4)

2013-05-01 Thread Franco Fichtner
Hi Stuart,

On May 1, 2013, at 1:11 AM, Stuart Henderson st...@openbsd.org wrote:

 On 2013/05/01 00:16, Franco Fichtner wrote:
 
 Yes, I am proposing a lightweight approach: hard-wired regex-like
 code, no allocations, no reassembly or state machines.  I've seen
 far worse things being put into Kernels and I assure you that I do
 refrain from putting in anything that could cause segmentation
 faults, sleeps, or other non-suitable behaviour.
 
 Would it be fair to describe it as a bit more complex than osfp,
 but not hugely so?

Not sure if that's a fitting comparison; and I know too little OSPF
to answer.  Let me try another route.  The logic consists of an array
of application detection functions, which can be invoked via their
respective IP types.  There's 32 bits of external state for the
table and a single hook into the application detection.  And the
detection for TLS/SSL3.0 follows.  I have really tried to condense
it down to the bare minimum.

LI_DESCRIBE_APP(tls)
{
	struct tls {
		uint8_t record_type;
		uint16_t version;
		uint16_t data_length;
	} __packed *ptr = (void *)packet->app.raw;
	uint16_t decoded;

	if (packet->app_len < sizeof(struct tls)) {
		return (0);
	}

	decoded = be16dec(&ptr->data_length);

	if (!decoded || decoded > 0x4000) {
		/* no empty records possible, also <= 2^14 */
		return (0);
	}

	switch (ptr->record_type) {
	case 20:	/* change_cipher_spec */
	case 21:	/* alert */
	case 22:	/* handshake */
	case 23:	/* application_data */
		break;
	default:
		return (0);
	}

	switch (be16dec(&ptr->version)) {
	case 0x0300:	/* SSL 3.0 */
	case 0x0301:	/* TLS 1.0 */
	case 0x0302:	/* TLS 1.1 */
	case 0x0303:	/* TLS 1.2 */
		break;
	default:
		return (0);
	}

	return (1);
}
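
The same header check can be tried outside the kernel.  What follows is a
minimal userspace sketch, not the code proposed in this thread:
looks_like_tls() and read_be16() are hypothetical names, and the sketch
reads the five header bytes one at a time instead of casting the payload
to a packed struct.  It accepts the same record types, versions and
length range as the function above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 16-bit value from two consecutive bytes. */
static uint16_t
read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Return 1 if the payload starts with a plausible SSL 3.0 / TLS 1.x
 * record header, 0 otherwise.  Header layout: record type (1 byte),
 * version (2 bytes), record length (2 bytes).
 */
static int
looks_like_tls(const uint8_t *payload, size_t len)
{
	uint16_t version, record_len;

	assert(payload != NULL);

	if (len < 5)
		return (0);

	/* 20..23: change_cipher_spec, alert, handshake, application_data */
	if (payload[0] < 20 || payload[0] > 23)
		return (0);

	/* 0x0300..0x0303: SSL 3.0 through TLS 1.2 */
	version = read_be16(payload + 1);
	if (version < 0x0300 || version > 0x0303)
		return (0);

	/* no empty records, and at most 2^14 bytes */
	record_len = read_be16(payload + 3);
	if (record_len == 0 || record_len > 0x4000)
		return (0);

	return (1);
}
```

Reading byte by byte sidesteps alignment and struct packing questions, at
the cost of spelling out the offsets by hand.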

 Would a protocol like BGP have a bright future in relayd(8)?
 I don't know enough, maybe Reyk can clear this up?
 
 L7 filtering is cute, but ipfw-classifyd isn't maintained, DPI in
 Linux netfilter is not hitting it off, and there really is no
 BSD DPI.  Frankly, I don't care which way to go, but I believe
 that pf(4) is a suitable candidate.  I especially like the one-
 rule-to-rule-them-all approach.  Adding a keyword app to
 pf.conf(5) seems like the simplest solution -- much like proto
 does deal with IP types.
 
 And talking about complexity: 1000 LOC for 25 protocols.  I'm afraid
 it can't be simplified any more than this.
 
 What sort of protocols do you think could be reasonably handled by
 this approach, and what would be too complicated?

Good question!  Text protocols are easy, RFCs and open implementations
are generally easy.  Anything too commercial/proprietary, especially
in binary, is more guessing than anything else and may not be worth
the effort.  I don't see world of warcraft happening as a supported
application.  This is what I have done so far (by no means free of
errors, though):

-- BitTorrent
-- Gnutella
-- Network Basic Input Output System
-- Telecommunication Network
-- Hypertext Transfer Protocol
-- Post Office Protocol (Version 3)
-- Internet Message Access Protocol
-- Simple Mail Transfer Protocol
-- Session Traversal Utilities for NAT
-- Dynamic Host Configuration Protocol
-- Point-to-Point Tunneling Protocol
-- Lightweight Directory Access Protocol
-- Simple Network Management Protocol
-- Secure Shell
-- File Transfer Protocol
-- Session Initiation Protocol
-- Domain Name System
-- Real-time Transport Control Protocol
-- Real-time Transport Protocol
-- Routing Information Protocol
-- Border Gateway Protocol
-- Internet Key Exchange
-- Datagram Transport Layer Security
-- Transport Layer Security
-- Concurrent Versions System

 There is definitely something appealing about being able to say, for
 example, 'block proto tcp on port 443; pass proto tcp on port 443 app tls',
 or 'block app ssh; pass proto tcp from somehosts to port 22 app ssh'
 without a bunch more complexity involved in passing across to a separate
 proxy (which would then need to implement its own completely separate
 filtering and would, I think, not really be able to integrate with
 things like PF tags and queue assignment)...

Yes, that would be one scenario.  I like to think of lightweight packet
inspection as application tagging.  That's the first stage.  Second
stage is a real parser/proxy/endpoint.  It's not a security functionality
per se, but it can help to break down the workload.  It doesn't care
about IP versions, ports (mostly ;) ), different flavours (netbios
could be session, datagram, and name service as one for example), and so
forth.

 Basically what I'm wondering if it's possible to go far enough to be
 useful whilst keeping the complexity down to a level which is sane
 and 

Re: DPI for pf(4)

2013-05-01 Thread Franco Fichtner
Hi Ted,

On May 1, 2013, at 1:14 AM, Ted Unangst t...@tedunangst.com wrote:

 On Wed, May 01, 2013 at 00:16, Franco Fichtner wrote:
 Yes, I am proposing a lightweight approach: hard-wired regex-like
 code, no allocations, no reassembly or state machines.  I've seen
 far worse things being put into Kernels and I assure you that I do
 refrain from putting in anything that could cause segmentation
 faults, sleeps, or other non-suitable behaviour.
 
 And talking about complexity: 1000 LOC for 25 protocols.  I'm afraid
 it can't be simplified any more than this.
 
 Well, it's really hard to comment on code we can't see.

I understand.  The code is hooked up to a library feeding off of
recorded network traces at the moment.  The idea doesn't feel mature
enough to me at this time, not knowing where to put it.  So there's
no point in releasing a half-done code blob that does nothing on its
own, but I'm willing to share it off-list with OpenBSD developers.

 My thoughts on the matter have always been that it would be cool to
 integrate bpf into pf (though other developers surely have other
 opinions). Then you get filtering for as many protocols as you care to
 write bpf matchers for.

You mean externalising the DPI?  People[1] have tried to work on such
ideas, but the general drift is that there are not enough interested
individuals in the field to drive second tier development for
application detections.  I find C to be quite flexible and empowering
if one doesn't overcomplicate[2].


Franco

[1] https://code.google.com/p/appid/source/browse/trunk/apps/aim
[2] https://github.com/fichtner/OpenDPI/blob/master/src/lib/protocols/ssl.c



Re: DPI for pf(4)

2013-05-01 Thread Stuart Henderson
On 2013/05/01 09:01, Franco Fichtner wrote:
 Hi Stuart,
 
 On May 1, 2013, at 1:11 AM, Stuart Henderson st...@openbsd.org wrote:
 
  On 2013/05/01 00:16, Franco Fichtner wrote:
  
  Yes, I am proposing a lightweight approach: hard-wired regex-like
  code, no allocations, no reassembly or state machines.  I've seen
  far worse things being put into Kernels and I assure you that I do
  refrain from putting in anything that could cause segmentation
  faults, sleeps, or other non-suitable behaviour.
  
  Would it be fair to describe it as a bit more complex than osfp,
  but not hugely so?
 
 Not sure if that's a fitting comparison; and I know too little OSPF
 to answer.

I should have expanded the acronym to make it clear - osfp i.e. the
OS fingerprinting code (pf_osfp.c).

 Let me try another route.  The logic consists of an array
 of application detection functions, which can be invoked via their
 respective IP types.  There's 32 bits of external state for the
 table and a single hook into the application detection.  And the
 detection for TLS/SSL3.0 follows.  I have really tried to condense
 it down to the bare minimum.
 
 LI_DESCRIBE_APP(tls)
 {
 	struct tls {
 		uint8_t record_type;
 		uint16_t version;
 		uint16_t data_length;
 	} __packed *ptr = (void *)packet->app.raw;
 	uint16_t decoded;
 
 	if (packet->app_len < sizeof(struct tls)) {
 		return (0);
 	}
 
 	decoded = be16dec(&ptr->data_length);
 
 	if (!decoded || decoded > 0x4000) {
 		/* no empty records possible, also <= 2^14 */
 		return (0);
 	}
 
 	switch (ptr->record_type) {
 	case 20:	/* change_cipher_spec */
 	case 21:	/* alert */
 	case 22:	/* handshake */
 	case 23:	/* application_data */
 		break;
 	default:
 		return (0);
 	}
 
 	switch (be16dec(&ptr->version)) {
 	case 0x0300:	/* SSL 3.0 */
 	case 0x0301:	/* TLS 1.0 */
 	case 0x0302:	/* TLS 1.1 */
 	case 0x0303:	/* TLS 1.2 */
 		break;
 	default:
 		return (0);
 	}
 
 	return (1);
 }

This type of thing looks sane to me, but others will want to comment. (I'll
point others at your posts at http://lastsummer.de/category/technology/
too :-)

  Would a protocol like BGP have a bright future in relayd(8)?
  I don't know enough, maybe Reyk can clear this up?
  
  L7 filtering is cute, but ipfw-classifyd isn't maintained, DPI in
  Linux netfilter is not hitting it off, and there really is no
  BSD DPI.  Frankly, I don't care which way to go, but I believe
  that pf(4) is a suitable candidate.  I especially like the one-
  rule-to-rule-them-all approach.  Adding a keyword app to
  pf.conf(5) seems like the simplest solution -- much like proto
  does deal with IP types.
  
  And talking about complexity: 1000 LOC for 25 protocols.  I'm afraid
  it can't be simplified any more than this.
  
  What sort of protocols do you think could be reasonably handled by
  this approach, and what would be too complicated?
 
 Good question!  Text protocols are easy, RFCs and open implementations
 are generally easy.  Anything too commercial/proprietary, especially
 in binary, is more guessing than anything else and may not be worth
 the effort.  I don't see world of warcraft happening as a supported
 application.  This is what I have done so far (by no means free of
 errors, though):
 
 -- BitTorrent
 -- Gnutella
 -- Network Basic Input Output System
 -- Telecommunication Network
 -- Hypertext Transfer Protocol
 -- Post Office Protocol (Version 3)
 -- Internet Message Access Protocol
 -- Simple Mail Transfer Protocol
 -- Session Traversal Utilities for NAT
 -- Dynamic Host Configuration Protocol
 -- Point-to-Point Tunneling Protocol
 -- Lightweight Directory Access Protocol
 -- Simple Network Management Protocol
 -- Secure Shell
 -- File Transfer Protocol
 -- Session Initiation Protocol
 -- Domain Name System
 -- Real-time Transport Control Protocol
 -- Real-time Transport Protocol
 -- Routing Information Protocol
 -- Border Gateway Protocol
 -- Internet Key Exchange
 -- Datagram Transport Layer Security
 -- Transport Layer Security
 -- Concurrent Versions System
 
  There is definitely something appealing about being able to say, for
  example, 'block proto tcp on port 443; pass proto tcp on port 443 app tls',
  or 'block app ssh; pass proto tcp from somehosts to port 22 app ssh'
  without a bunch more complexity involved in passing across to a separate
  proxy (which would then need to implement its own completely separate
  filtering and would, I think, not really be able to integrate with
  things like PF tags and queue assignment)...
 
 Yes, that would be one scenario.  I like to think of lightweight packet
 inspection as application tagging.  That's the first stage.  Second
 

Re: DPI for pf(4)

2013-05-01 Thread Jiri B
On Tue, Apr 30, 2013 at 07:14:50PM -0400, Ted Unangst wrote:
 On Wed, May 01, 2013 at 00:16, Franco Fichtner wrote:
  Yes, I am proposing a lightweight approach: hard-wired regex-like
  code, no allocations, no reassembly or state machines.  I've seen
  far worse things being put into Kernels and I assure you that I do
  refrain from putting in anything that could cause segmentation
  faults, sleeps, or other non-suitable behaviour.
 
  And talking about complexity: 1000 LOC for 25 protocols.  I'm afraid
  it can't be simplified any more than this.
 
 Well, it's really hard to comment on code we can't see.
 
 My thoughts on the matter have always been that it would be cool to
 integrate bpf into pf (though other developers surely have other
 opinions). Then you get filtering for as many protocols as you care to
 write bpf matchers for.

My first thought was: why not have something like what squid does with
ICAP?  You could forward some of the inspection to another application,
which would return some agreed data (a tag), and then you could work
with that tag in pf rules.



Re: DPI for pf(4)

2013-05-01 Thread Franco Fichtner
On May 1, 2013, at 9:41 AM, Stuart Henderson st...@openbsd.org wrote:

 I should have expanded the acronym to make it clear - osfp i.e. the
 OS fingerprinting code (pf_osfp.c).

oh, sorry, my mistake.  This I can comment on. :)

The idea is the same.  I'd say at this stage osfp has more complexity
due to parsing the TCP header, splitting fields, pulling in external
descriptions, etc.  Looking beyond the headers is far less structured,
because applications do the structuring on their own, which in turn
makes external descriptions hard to, er, describe -- hence the hard-
wired C approach.  The only complexity is the growing amount of
application descriptions, but each application function is completely
isolated.
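
A minimal sketch of what one such isolated, hard-wired application
function might look like.  Everything here is an assumption for
illustration: the function name li_match_ssh, the app payload pointer,
and the struct layouts are hypothetical (only app_len and flow type
appear in the code posted in this thread).  It checks for the "SSH-"
prefix that opens an SSH identification string, with no allocation and
no state:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical packet view; only app_len is named in the thread. */
struct li_packet {
	const unsigned char *app;	/* start of application payload (assumed) */
	size_t app_len;			/* payload length in bytes */
};

/* Hypothetical per-flow data; unused by this matcher. */
struct li_flow {
	int type;			/* IP protocol number of the flow */
};

/*
 * Returns 1 if the payload starts with "SSH-", the prefix of an SSH
 * identification string, and 0 otherwise.
 */
int
li_match_ssh(const struct li_packet *packet, const struct li_flow *flow)
{
	static const char magic[] = "SSH-";
	const size_t len = sizeof(magic) - 1;

	(void)flow;
	/* Never read past the end of the payload. */
	if (packet->app_len < len)
		return (0);
	return (memcmp(packet->app, magic, len) == 0);
}
```

The length check before the comparison is the only bounds handling the
function needs, which is what keeps each matcher easy to audit on its
own.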

Here's the DPI hook function (a bit simplified for the context of this
discussion):

li_get(const struct li_packet *packet, const struct li_flow *flow)
{
	unsigned int i;

	if (!packet->app_len) {
		return (LI_UNKNOWN);
	}

	for (i = 0; i < lengthof(apps); ++i) {
		if ((apps[i].p1 == flow->type) ||
		    (apps[i].p2 == flow->type)) {
			if (apps[i].function(packet, flow)) {
				return (apps[i].number);
			}
		}
	}

	/*
	 * Set 'undefined' right away. Only one chance for
	 * each side of the flow. This makes it easier for
	 * a rules engine to do negation of policies.
	 */
	return (LI_UNDEFINED);
}

apps is an array of all of the available application functions. It looks
something like this:

static const struct li_apps apps[] = {
	LI_LIST_APP(LI_PPTP, pptp, IPPROTO_TCP, IPPROTO_GRE),
	LI_LIST_APP(LI_HTTP, http, IPPROTO_TCP, IPPROTO_MAX),
	/* more stuff here */
};

Really, that's all there is to it.
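
For readers who want to compile the hook and the table together, here
is a self-contained sketch under stated assumptions.  The struct
layouts, the expansion of LI_LIST_APP, the lengthof macro, the enum
values, and both matcher bodies are hypothetical stand-ins and not the
code from the actual engine; only the shape of li_get and the apps
table follows what was posted above.

```c
#include <stddef.h>
#include <string.h>
#include <netinet/in.h>

#define lengthof(a)	(sizeof(a) / sizeof((a)[0]))

/* Assumed numbering; the real values are not shown in the thread. */
enum { LI_UNKNOWN, LI_UNDEFINED, LI_PPTP, LI_HTTP };

struct li_packet {
	const unsigned char *app;	/* payload start (assumed field) */
	size_t app_len;			/* payload length in bytes */
};

struct li_flow {
	int type;			/* IP protocol of the flow */
};

struct li_apps {
	unsigned int number;
	int (*function)(const struct li_packet *, const struct li_flow *);
	int p1;				/* first protocol the app may use */
	int p2;				/* second one, IPPROTO_MAX if none */
};

/* Assumed expansion: pastes the name onto a li_match_ prefix. */
#define LI_LIST_APP(num, name, proto1, proto2) \
	{ (num), li_match_##name, (proto1), (proto2) }

/* PPTP control messages carry the magic cookie 0x1a2b3c4d at offset 4. */
static int
li_match_pptp(const struct li_packet *packet, const struct li_flow *flow)
{
	static const unsigned char cookie[] = { 0x1a, 0x2b, 0x3c, 0x4d };

	(void)flow;
	return (packet->app_len >= 8 &&
	    memcmp(packet->app + 4, cookie, sizeof(cookie)) == 0);
}

/* Deliberately crude: a GET request line or an HTTP status line. */
static int
li_match_http(const struct li_packet *packet, const struct li_flow *flow)
{
	(void)flow;
	return (packet->app_len >= 4 &&
	    (memcmp(packet->app, "GET ", 4) == 0 ||
	    memcmp(packet->app, "HTTP", 4) == 0));
}

static const struct li_apps apps[] = {
	LI_LIST_APP(LI_PPTP, pptp, IPPROTO_TCP, IPPROTO_GRE),
	LI_LIST_APP(LI_HTTP, http, IPPROTO_TCP, IPPROTO_MAX),
};

unsigned int
li_get(const struct li_packet *packet, const struct li_flow *flow)
{
	unsigned int i;

	/* No payload yet (e.g. a bare handshake segment): keep waiting. */
	if (!packet->app_len)
		return (LI_UNKNOWN);

	for (i = 0; i < lengthof(apps); ++i) {
		if (apps[i].p1 == flow->type || apps[i].p2 == flow->type) {
			if (apps[i].function(packet, flow))
				return (apps[i].number);
		}
	}

	/* Payload seen but nothing matched: decide once, as above. */
	return (LI_UNDEFINED);
}
```

With these stand-ins, a TCP payload beginning with "GET " yields
LI_HTTP, an empty payload yields LI_UNKNOWN, and the same HTTP payload
on a UDP flow yields LI_UNDEFINED because no table entry lists UDP.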

 So another example might be: pass proto tcp app $someapp divert-packet
 $someproxy, with $someproxy handling the second stage?

Yes, that looks reasonable.  proto tcp may be zapped as well.
If we are talking use cases the biggest ones would be traffic shaping
and policy enforcement in general (no SMTP to the outside, blocking
non-TLS stuff on port 443, etc.)
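
To make the "non-TLS stuff on port 443" case concrete, here is a rough
sketch of the kind of check such a policy could rest on.  The function
name li_is_tls_handshake is hypothetical and this is not the engine's
actual TLS matcher; it only looks at the first six payload bytes for a
handshake record header followed by a ClientHello or ServerHello, and
it assumes record versions 3.0 through 3.3 (SSLv3 to TLS 1.2).

```c
#include <stddef.h>

/*
 * Hypothetical helper, not from the original post: returns 1 if buf
 * looks like the start of a TLS handshake record, 0 otherwise.
 */
int
li_is_tls_handshake(const unsigned char *buf, size_t len)
{
	/* 5-byte record header plus the first handshake byte */
	if (len < 6)
		return (0);
	/* content type 22 = handshake */
	if (buf[0] != 0x16)
		return (0);
	/* record version 3.0 (SSLv3) through 3.3 (TLS 1.2) */
	if (buf[1] != 0x03 || buf[2] > 0x03)
		return (0);
	/* handshake type 1 = ClientHello, 2 = ServerHello */
	return (buf[5] == 0x01 || buf[5] == 0x02);
}
```

An SSLv2-compatible hello would not match this sketch, so a real
matcher would have to decide whether to treat that as TLS or not.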

 Yes, this is clearly a less messy approach than opendpi ;)

I probably shouldn't say I worked for these guys a few years ago.
Nobody would believe me I never touched the DPI code, but it's the
truth!


Franco



Re: DPI for pf(4)

2013-04-30 Thread Alexey E. Suslikov
Franco Fichtner slashy83 at gmail.com writes:

 so I have been working on a BSD licensed DPI engine.  It's a
 very lightweight, non-intrusive approach and I know that teasers
 are boring, but I'd like to know if it's worth the time to
 work on inclusion for pf(4).  So far I have about 25 supported
 applications and the necessary hooks for the pf.conf(5) parts.

If DPI stands for Deep Packet Inspection, then (afaik)
it was discussed before: this kind of inspection is too
complex to put into a kernel.

relayd already supports L7 filtering at least for http,
so if something is to be improved in this area, relayd
is better place, imo.



Re: DPI for pf(4)

2013-04-30 Thread Stuart Henderson
On 2013/05/01 00:16, Franco Fichtner wrote:
 Hi Alexey,
 
 On Apr 30, 2013, at 11:51 PM, Alexey E. Suslikov 
 alexey.susli...@gmail.com wrote:
 
  Franco Fichtner slashy83 at gmail.com writes:
  
  so I have been working on a BSD licensed DPI engine.  It's a
  very lightweight, non-intrusive approach and I know that teasers
  are boring, but I'd like to know if it's worth the time to
  work on inclusion for pf(4).  So far I have about 25 supported
  applications and the necessary hooks for the pf.conf(5) parts.
  
  If DPI stands for Deep Packet Inspection, then (afaik)
  it was discussed before: this kind of inspection is too
  complex to put into a kernel.
 
 Yes, I am proposing a lightweight approach: hard-wired regex-like
 code, no allocations, no reassembly or state machines.  I've seen
 far worse things being put into Kernels and I assure you that I do
 refrain from putting in anything that could cause segmentation
 faults, sleeps, or other non-suitable behaviour.

Would it be fair to describe it as a bit more complex than osfp,
but not hugely so?

  relayd already supports L7 filtering at least for http,
  so if something is to be improved in this area, relayd
  is better place, imo.
 
 Would a protocol like BGP have a bright future in relayd(8)?
 I don't know enough, maybe Reyk can clear this up?
 
 L7 filtering is cute, but ipfw-classifyd isn't maintained, DPI in
 Linux netfilter is not hitting it off, and there really is no
 BSD DPI.  Frankly, I don't care which way to go, but I believe
 that pf(4) is a suitable candidate.  I especially like the one-
 rule-to-rule-them-all approach.  Adding a keyword app to
 pf.conf(5) seems like the simplest solution -- much like proto
 deals with IP protocols.
 
 And talking about complexity: 1000 LOC for 25 protocols.  I'm afraid
 it can't be simplified any more than this.

What sort of protocols do you think could be reasonably handled by
this approach, and what would be too complicated?

There is definitely something appealing about being able to say, for
example, 'block proto tcp on port 443; pass proto tcp on port 443 app tls',
or 'block app ssh; pass proto tcp from somehosts to port 22 app ssh'
without a bunch more complexity involved in passing across to a separate
proxy (which would then need to implement its own completely separate
filtering and would, I think, not really be able to integrate with
things like PF tags and queue assignment)...

Basically what I'm wondering is whether it's possible to go far enough to be
useful whilst keeping the complexity down to a level which is sane
and simple enough that it can be carefully audited.



Re: DPI for pf(4)

2013-04-30 Thread Ted Unangst
On Wed, May 01, 2013 at 00:16, Franco Fichtner wrote:
 Yes, I am proposing a lightweight approach: hard-wired regex-like
 code, no allocations, no reassembly or state machines.  I've seen
 far worse things being put into Kernels and I assure you that I do
 refrain from putting in anything that could cause segmentation
 faults, sleeps, or other non-suitable behaviour.

 And talking about complexity: 1000 LOC for 25 protocols.  I'm afraid
 it can't be simplified any more than this.

Well, it's really hard to comment on code we can't see.

My thoughts on the matter have always been that it would be cool to
integrate bpf into pf (though other developers surely have other
opinions). Then you get filtering for as many protocols as you care to
write bpf matchers for.