Re: [Zeek-Dev] Hi + LL Analyzer

2019-02-28 Thread Robin Sommer
On Thu, Feb 28, 2019 at 11:35 +0100, Jan Grashöfer wrote:

> The question here would be whether LL-analyzers have to be linked
> dynamically.

Well, the point of the plugin API is being able to add new
functionality externally through an independently compiled shared
library. Excluding link-layer analyzers from that would feel like a
gap to me. That said, we definitely need to benchmark performance to
make sure it's feasible. My hunch is that a lookup table should be
good enough, but we'll see.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com
___
zeek-dev mailing list
zeek-dev@zeek.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/zeek-dev


Re: [Zeek-Dev] Hi + LL Analyzer

2019-02-28 Thread Jan Grashöfer
On 27/02/2019 20:40, Robin Sommer wrote:
>> One question here would be whether it makes sense to assume that the set of
>> LL-analyzers that should be available is known at compile-time?
> 
> The built-in ones can be known, but any added through dynamic plugins
> can't really. We'll know only at runtime what the final set is. But we
> could precompute a lookup table in advance at startup that maps link
> types to analyzers.

The question here would be whether LL-analyzers have to be linked 
dynamically. Another option would be to require users to build Zeek if 
they need additional LL-analyzers. The analyzers would still be modular, 
but with some metaprogramming one might be able to generate efficient 
dispatching code at compile-time. If the focus is on performance we 
could benchmark both approaches and decide based on the results.
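
A minimal sketch of what compile-time dispatch could look like, assuming C++17 and a fixed list of analyzer types known when Zeek is built. All names here (StaticDispatcher, Ethernet, NullLoopback, HeaderLengthFor) are hypothetical and not part of Zeek; the link types 0 and 1 are the pcap values for BSD loopback and Ethernet.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical analyzers: each one declares the link type it handles and
// how many header bytes precede the next layer.
struct NullLoopback {
    static constexpr std::uint16_t link_type = 0;  // pcap DLT_NULL
    static std::size_t HeaderLength() { return 4; }
};

struct Ethernet {
    static constexpr std::uint16_t link_type = 1;  // pcap DLT_EN10MB
    static std::size_t HeaderLength() { return 14; }  // untagged frame
};

// The analyzer list is a template parameter pack, so the whole chain of
// comparisons is visible to the compiler and needs no virtual calls.
template <typename... Analyzers>
struct StaticDispatcher {
    // Returns true and sets header_len if one analyzer handles link_type.
    static bool Dispatch(std::uint16_t link_type, std::size_t& header_len) {
        return ((link_type == Analyzers::link_type
                     ? (header_len = Analyzers::HeaderLength(), true)
                     : false) || ...);
    }
};

// Adding an analyzer means editing this list and rebuilding.
using BuiltinDispatcher = StaticDispatcher<NullLoopback, Ethernet>;

// Convenience wrapper for callers that already validated the link type.
inline std::size_t HeaderLengthFor(std::uint16_t link_type) {
    std::size_t len = 0;
    [[maybe_unused]] const bool handled = BuiltinDispatcher::Dispatch(link_type, len);
    assert(handled && "link type not compiled in");
    return len;
}
```

The trade-off is the one raised above: the analyzer list is fixed at build time, so a shared-library plugin could not extend it.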

Jan


Re: [Zeek-Dev] Hi + LL Analyzer

2019-02-27 Thread Robin Sommer
On Wed, Feb 27, 2019 at 16:07 +0100, Jan Grashöfer wrote:

> At first glance it looks like IP-layer multiplexing is done in
> NetSessions::{NextPacket, DoNextPacket} and the Transport-layer is tackled
> in Manager::BuildInitialAnalyzerTree in context of initializing a
> connection.

Well, there, too. :) That's indeed doing the packet dispatching, while
DoNextPacket() sets up state mgmt. It's all not quite clear cut, which
is part of the problem.

> That is the central point. So a first step would be to rely on TCP/IP in the
> "middle" of the stack but allow pluggable Link-layer protocols. Those might
> feed their data to the TCP/IP pipeline or handle them on their own. The next
> step would be the IP-layer.

Yeah, that sounds good to me.

> One question here would be whether it makes sense to assume that the set of
> LL-analyzers that should be available is known at compile-time?

The built-in ones can be known, but any added through dynamic plugins
can't really. We'll know only at runtime what the final set is. But we
could precompute a lookup table in advance at startup that maps link
types to analyzers.
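
A minimal sketch of such a precomputed table, assuming plugins register analyzers while loading and the table is frozen once at startup. The class and method names (LinkLayerAnalyzer, LinkLayerDispatcher, Register, Freeze, Lookup) and the table size of 512 are hypothetical and not Zeek's plugin API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical analyzer interface.
class LinkLayerAnalyzer {
public:
    virtual ~LinkLayerAnalyzer() = default;
    // Number of header bytes to skip to reach the next layer.
    virtual std::size_t HeaderLength(const std::uint8_t* data, std::size_t len) const = 0;
};

class EthernetAnalyzer : public LinkLayerAnalyzer {
public:
    std::size_t HeaderLength(const std::uint8_t*, std::size_t) const override {
        return 14;  // untagged Ethernet frame
    }
};

class LinkLayerDispatcher {
public:
    // Called while plugins load; the final set is only known at runtime.
    void Register(std::uint16_t link_type, const LinkLayerAnalyzer* analyzer) {
        assert(analyzer != nullptr);
        registered_[link_type] = analyzer;
    }

    // Called once at startup, after all plugins have registered.
    // Link types beyond the table size would need a fallback, omitted here.
    void Freeze() {
        table_.fill(nullptr);
        for (const auto& entry : registered_)
            if (entry.first < table_.size())
                table_[entry.first] = entry.second;
    }

    // Per-packet path: a bounds check and one array load.
    const LinkLayerAnalyzer* Lookup(std::uint16_t link_type) const {
        return link_type < table_.size() ? table_[link_type] : nullptr;
    }

private:
    std::unordered_map<std::uint16_t, const LinkLayerAnalyzer*> registered_;
    std::array<const LinkLayerAnalyzer*, 512> table_{};
};
```

The per-packet cost is then one indexed load plus a virtual call, which is what a benchmark against a hardcoded switch would need to measure.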

> I think this would be part of the larger effort to re-think Zeek's notion of
> connections. This could be addressed together with implementing a flexible
> mechanism to make meta data like LL-addresses available in context of a
> connection.

Yep.

> If we allow plugging in new transport protocols, they might need
> their own PIA to support the analysis of known protocols like HTTP
> etc.

Yeah, or a more generic PIA that provides its own hook for plugins.
The main difference between TCP/UDP PIAs is packet vs stream
semantics, iirc. That might generalize sufficiently, but not sure.

Robin



Re: [Zeek-Dev] Hi + LL Analyzer

2019-02-27 Thread Jan Grashöfer
On 26/02/2019 02:36, Robin Sommer wrote:
> I see three pieces here overall that I think can be tackled
> independently:
> 
> (1) Link-layer: Currently hardcoded in Packet::ProcessLayer2()
> 
> (2) IP-Layer: Currently hardcoded in NetSessions::NextPacket()
> 
> (3) Transport-layer: Currently hardcoded in NetSessions::DoNextPacket().

At first glance it looks like IP-layer multiplexing is done in 
NetSessions::{NextPacket, DoNextPacket} and the Transport-layer is 
tackled in Manager::BuildInitialAnalyzerTree in context of initializing 
a connection.

> Case (1) is all about skipping the header to get to IP. There's some
> redundancy across cases, though, and MPLS makes it all more messy.

One thing that comes to my mind here is whether it might be possible to 
pass information such as VLAN tags, MPLS labels or link layer addresses 
to upper layers in a generic way without hardcoding. However, that might 
be out of scope for now.

> With (2), a plugin would be able to add support for non-IP protocols.
> However, due to Bro generally assuming that it is analyzing IP, the
> plugin would either need to take care of such packets completely (like
> ARP does), or eventually get to an IP packet that it can then feed
> back for further analysis (like if it is some kind of a tunnel).

The non-IP packet might also contain a Transport-layer PDU. I guess it 
should be possible to pass these on as well.

> There's also a more general version of (2) and (3) where we'd remove
> Bro's assumption of analyzing TCP/IP protocols. But that's a separate,
> large effort by itself.

That is the central point. So a first step would be to rely on TCP/IP in 
the "middle" of the stack but allow pluggable Link-layer protocols. 
Those might feed their data to the TCP/IP pipeline or handle them on 
their own. The next step would be the IP-layer.

> On a technical level, plugging in such low-level analyzers needs to be
> very efficient, in particular if we move the currently hardcoded cases
> into the plugins as well (as I think we should; similar to how
> application-layer analyzers have all moved into internal plugins).
> Then the lookup-the-analyzer-and-dispatch operation will happen
> multiple times for every packet.

One question here would be whether it makes sense to assume that the set 
of LL-analyzers that should be available is known at compile-time?

>> - What about the concept of connections? For some LL protocols the
>> concept might be counterintuitive.
> 
> Couple cases there:
> 
> - If there's really no sense of a connection, then the plugin will
>   need to take complete care of the packets, as the rest of Bro
>   assumes connection-semantics.

Maybe there is another general abstraction that is worth supporting 
as well. I was thinking of request-reply pairs that can be correlated. 
However, I haven't put much thought into this yet.

> - If it's just the definition of what defines a connection that is
>   different, then I think we could make that more flexible. I've been
>   hoping for a while that we can make Bro's notion of connection IDs
>   dynamic, so that it's not necessarily just the 5-tuple. There are
>   use cases outside of new protocols for this, too. For example, one
>   could include the VLAN ID to deal with overlapping IP ranges in
>   independent VLANs.

I think this would be part of the larger effort to re-think Zeek's 
notion of connections. This could be addressed together with 
implementing a flexible mechanism to make metadata like LL-addresses 
available in the context of a connection.

>> - The interface should support to pass payload to other analyzers. Does
>> it make sense to come up with a generalized DPD-mechanism?
> 
> Not quite sure what you're thinking here, but I believe that fully
> solving this would require addressing Bro's overall assumption of
> analyzing TCP/IP. For now, maybe the best way would be just having the
> analyzer call back into entry points corresponding to the various
> layers where analysis would then proceed as normal. I.e., some
> variation of: ProcessLinkLayer(...), ProcessIP(...),
> ProcessTransport(data), ProcessAppLayer(...). The caller would be
> responsible for providing all the right (meta-)data, like IP headers.
> Were you thinking something different / more general?

While I haven't looked into it, I noticed that there are distinct PIA 
implementations for TCP and UDP. If we allow plugging in new 
transport protocols, they might need their own PIA to support the 
analysis of known protocols like HTTP etc. However, if we keep a focus 
on TCP/IP as suggested that would be out of scope for now.

Jan


Re: [Zeek-Dev] Hi + LL Analyzer

2019-02-25 Thread Robin Sommer
(I realized this slipped through the cracks, sorry for the late
feedback, hope it still helps)

On Thu, Feb 07, 2019 at 11:32 +0100, Jan Grashöfer wrote:

> - What would be the lowest layer to build upon, or should everything be 
> pluggable down to the packet source?

I see three pieces here overall that I think can be tackled
independently:

(1) Link-layer: Currently hardcoded in Packet::ProcessLayer2()

(2) IP-Layer: Currently hardcoded in NetSessions::NextPacket()

(3) Transport-layer: Currently hardcoded in NetSessions::DoNextPacket().


Case (1) is all about skipping the header to get to IP. There's some
redundancy across cases, though, and MPLS makes it all more messy.

With (2), a plugin would be able to add support for non-IP protocols.
However, due to Bro generally assuming that it is analyzing IP, the
plugin would either need to take care of such packets completely (like
ARP does), or eventually get to an IP packet that it can then feed
back for further analysis (like if it is some kind of a tunnel).

Similar for (3): A plugin would be able to add support for further
transport layer protocols, but it'd be mostly about stripping
additional headers to eventually get to TCP/UDP/ICMP.

There's also a more general version of (2) and (3) where we'd remove
Bro's assumption of analyzing TCP/IP protocols. But that's a separate,
large effort by itself.

On a technical level, plugging in such low-level analyzers needs to be
very efficient, in particular if we move the currently hardcoded cases
into the plugins as well (as I think we should; similar to how
application-layer analyzers have all moved into internal plugins).
Then the lookup-the-analyzer-and-dispatch operation will happen
multiple times for every packet.

> - What about the concept of connections? For some LL protocols the 
> concept might be counterintuitive.

Couple cases there:

- If there's really no sense of a connection, then the plugin will
  need to take complete care of the packets, as the rest of Bro
  assumes connection-semantics.

- If it's just the definition of what defines a connection that is
  different, then I think we could make that more flexible. I've been
  hoping for a while that we can make Bro's notion of connection IDs
  dynamic, so that it's not necessarily just the 5-tuple. There are
  use cases outside of new protocols for this, too. For example, one
  could include the VLAN ID to deal with overlapping IP ranges in
  independent VLANs.
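
A minimal sketch of the VLAN example, assuming a connection key that carries the 5-tuple plus an optional VLAN component. The types and the MakeKey helper are hypothetical and do not model Zeek's actual connection ID; addresses are IPv4-only for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <tuple>

struct FiveTuple {
    std::uint32_t src_addr, dst_addr;  // IPv4 only, for brevity
    std::uint16_t src_port, dst_port;
    std::uint8_t proto;
};

// Hypothetical key: the 5-tuple plus an optional extra component.
struct ConnKey {
    FiveTuple t;
    std::optional<std::uint16_t> vlan;  // unset means plain 5-tuple

    // All components in comparison order; an unset VLAN sorts first.
    auto Tie() const {
        return std::tie(t.src_addr, t.dst_addr, t.src_port, t.dst_port,
                        t.proto, vlan);
    }
    bool operator==(const ConnKey& o) const { return Tie() == o.Tie(); }
    bool operator<(const ConnKey& o) const { return Tie() < o.Tie(); }
};

// include_vlan stands in for whatever configuration would make the
// definition of a connection dynamic.
inline ConnKey MakeKey(const FiveTuple& t, std::uint16_t vlan_id, bool include_vlan) {
    assert(vlan_id < 4096);  // VLAN IDs are 12 bits wide
    return ConnKey{t, include_vlan ? std::optional<std::uint16_t>(vlan_id)
                                   : std::nullopt};
}
```

With include_vlan set, two flows with identical 5-tuples in VLANs 10 and 20 produce different keys; without it they collapse into one connection, which is the overlapping-IP-range problem described above.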

> - The interface should support passing payload to other analyzers. Does 
> it make sense to come up with a generalized DPD-mechanism?

Not quite sure what you're thinking here, but I believe that fully
solving this would require addressing Bro's overall assumption of
analyzing TCP/IP. For now, maybe the best way would be just having the
analyzer call back into entry points corresponding to the various
layers where analysis would then proceed as normal. I.e., some
variation of: ProcessLinkLayer(...), ProcessIP(...),
ProcessTransport(data), ProcessAppLayer(...). The caller would be
responsible for providing all the right (meta-)data, like IP headers.
Were you thinking something different / more general?
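
A minimal sketch of the call-back idea, assuming a pipeline object that exposes per-layer entry points. ProcessIP and ProcessTransport are named after the entry points floated above, but their signatures, the Pipeline type, and ToyTunnelAnalyzer (a made-up encapsulation with a fixed 4-byte header) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical pipeline; it only records which entry points were reached.
struct Pipeline {
    std::vector<std::string> trace;

    void ProcessTransport(std::uint8_t proto, const std::uint8_t* data, std::size_t len) {
        (void)data;  // payload analysis is out of scope for this sketch
        trace.push_back("transport proto=" + std::to_string(proto) +
                        " len=" + std::to_string(len));
    }

    // Expects data to point at an IPv4 header; anything else is dropped.
    void ProcessIP(const std::uint8_t* data, std::size_t len) {
        assert(data != nullptr);
        if (len < 20 || (data[0] >> 4) != 4) {
            trace.push_back("ip dropped");
            return;
        }
        // Low nibble of byte 0 is the header length in 32-bit words.
        const std::size_t hdr = static_cast<std::size_t>(data[0] & 0x0f) * 4;
        if (hdr < 20 || hdr > len) {
            trace.push_back("ip dropped");
            return;
        }
        trace.push_back("ip len=" + std::to_string(len));
        ProcessTransport(data[9], data + hdr, len - hdr);  // byte 9: protocol
    }
};

// Hypothetical plugin analyzer: strips its own header, then feeds the
// inner packet back into the normal IP entry point.
struct ToyTunnelAnalyzer {
    static constexpr std::size_t kHeaderLen = 4;

    void Analyze(Pipeline& p, const std::uint8_t* data, std::size_t len) const {
        if (len <= kHeaderLen)
            return;  // nothing behind the header
        p.ProcessIP(data + kHeaderLen, len - kHeaderLen);
    }
};
```

Here the caller is responsible for handing ProcessIP a pointer to a valid IP header, which matches the division of responsibility described above.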

Robin



Re: [Zeek-Dev] Hi + LL Analyzer

2019-02-07 Thread Jan Grashöfer
To add a bit more context: The idea is to implement a plugin interface 
for low-level analyzers (see https://github.com/zeek/zeek/issues/248) 
and collect requirements on the list.

Some first thoughts and questions:
- What would be the lowest layer to build upon, or should everything be 
pluggable down to the packet source?
- What about the concept of connections? For some LL protocols the 
concept might be counterintuitive.
- The interface should support passing payload to other analyzers. Does 
it make sense to come up with a generalized DPD-mechanism?

Jan