Re: [Zeek-Dev] support for event handlers using a subset of parameters

2019-02-01 Thread Jon Siwek
On Fri, Feb 1, 2019 at 12:59 PM Vern Paxson wrote:

> I don't see how it helps with
> deprecating existing parameters (which seems would be better served with
> some sort of  attribute),

Support for  in parameters is part of the changes.

But if we don't allow the user to immediately remove the field, they
are then stuck doing 2 changes:

Step 1: we mark a field 
Step 2: the user sees that, so they remove uses of that parameter from
their body
Step 3: we actually remove the  field
Step 4: if the user was forced to still have the  param in
their handler's param list they now have to do a second change to
remove it instead of just removing it right away

With the proposed patch, we get rid of the need for Step 4 and
decrease burden on users.
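
As a rough illustration of why the second change goes away, here is a small Python model (not Zeek script, and not the patch itself) of dispatching by parameter name. The names `dispatch`, `subset_handler` and `full_handler` are made up for the example; the event is modeled as a plain dict of argument names to values.

```python
import inspect

def dispatch(handler, event_args):
    """Call `handler` with only the event arguments its parameter list names."""
    wanted = list(inspect.signature(handler).parameters)
    unknown = [name for name in wanted if name not in event_args]
    if unknown:
        raise TypeError(f"handler names parameters the event lacks: {unknown}")
    return handler(**{name: event_args[name] for name in wanted})

# Handler updated once: it stopped using 'a' and dropped it from its list.
def subset_handler(b):
    return f"my_event {b}"

# Handler that was forced to keep the full, old parameter list.
def full_handler(a, b):
    return f"my_event {a} {b}"

old_event = {"a": 1, "b": "hi"}  # event as generated before 'a' is removed
new_event = {"b": "hi"}          # event as generated after 'a' is removed

print(dispatch(subset_handler, old_event))
print(dispatch(subset_handler, new_event))
```

In this model the handler that lists only `b` works unchanged on both sides of the removal, while `full_handler` would need a second edit once `a` is gone.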

> and I don't see how it helps with
> changing the semantics of existing event parameters.

Step 1: we mark old field  and introduce a new parameter
Step 2: the user sees that, etc.; same as above.

> It actually makes sense to me to support overloading for events.  Then for
> example you could have two event signatures depending on what information
> the handler was going to leverage, which would allow the event engine to
> offload work if there isn't a handler for a signature that requires extra
> computation.

I think the same kind of offloading is possible with the "parameter
subset" approach.   We know exactly what parameters are being
consumed, so we might have optimizations that don't produce a
parameter if no one consumes it.  And if no one consumes any
parameters we also don't generate the call.

If you have two different event signatures, we just get the same type
of optimization we currently do, which only optimizes out the entire
call if there are no handlers, but doesn't know whether individual
parameters are being consumed or not.  E.g.:

http_request(a, b, c)
http_request(d, e, f)

If someone only consumes 'a' and 'e', you still have to produce both
function calls in their entirety (and also the other unused params),
but:

http_request(a, b, c, d, e, f)

You can potentially not do any work generating the unused parameters
and only have to do the one function call with 'a' and 'e'.

Technically, we can still require a matching signature and do such an
optimization by walking the AST and finding local parameter usages.  I
guess you have to do that ultimately, but it's an easy head start at
implementing such optimizations as a test/idea if we can simply see
someone isn't using a parameter because it's not in their handler
param list.
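
A minimal sketch of that optimization, written in Python as a model rather than as anything resembling the actual event engine: each parameter has a (possibly expensive) producer, and only parameters named by at least one handler get produced. `generate_event`, the producer table and the handler tuples are all hypothetical names for this example.

```python
def generate_event(producers, handlers):
    """Produce only the parameters some handler consumes, then call each handler.

    `producers` maps parameter name -> zero-argument function computing it.
    `handlers` is a list of (consumed_names, body) pairs.
    Returns the names of the parameters that were actually produced.
    """
    if not handlers:
        return []  # no handlers at all: skip the call entirely
    needed = set()
    for names, _body in handlers:
        needed.update(names)
    # Unused parameters are never computed.
    values = {name: make() for name, make in producers.items() if name in needed}
    for names, body in handlers:
        body(**{name: values[name] for name in names})
    return list(values)

# http_request(a, b, c, d, e, f) with a single handler consuming 'a' and 'e'.
producers = {name: (lambda n=name: n.upper()) for name in "abcdef"}
seen = []
handlers = [(("a", "e"), lambda a, e: seen.append((a, e)))]
print(generate_event(producers, handlers), seen)
```

With the two-signature variant, the model would have to run both calls and every producer behind them, since nothing in a matching signature says which parameters go unused.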

- Jon
___
zeek-dev mailing list
zeek-dev@zeek.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/zeek-dev


Re: [Zeek-Dev] support for event handlers using a subset of parameters

2019-02-01 Thread Vern Paxson
> The compelling use-case I'd say is the ability to change/deprecate
> event parameters without suddenly breaking people's code since that
> has come up many times already.

I see how it allows adding new parameters.  I don't see how it helps with
deprecating existing parameters (which seems would be better served with
some sort of  attribute), and I don't see how it helps with
changing the semantics of existing event parameters.

> Also this change only affects events and hooks, not functions.  The
> semantics are different enough that maybe we would only want
> overloading for functions anyway.

It actually makes sense to me to support overloading for events.  Then for
example you could have two event signatures depending on what information
the handler was going to leverage, which would allow the event engine to
offload work if there isn't a handler for a signature that requires extra
computation.

> Hooks and events have multiple implementations/bodies that are defined
> by the *user*.  The *author* is generally the one that generates
> (calls) the event/hook.

The big exception being the event engine (if I follow what you mean by
user/author).

> So if the event/hook name were overloaded, it's a bit confusing -- the
> user now has to decide between different signatures to handle, each
> containing different data sets and maybe neither contains the set they
> want (so now they handle two events of the same name instead of one).

Not really seeing this.  I'm picturing that a common idiom will be a
lightweight version of an event and a heavyweight version, or maybe a
spectrum from light-to-heavy.

> Really, an event is a unique name with some amount of data
> (parameters) associated with it and may always be generated with the
> full data set -- the user then chooses which data they are interested
> in by defining that explicitly in the handler's parameter list.

Yeah, I agree that this too would allow (most of) the sort of offloading
I sketch above.

Vern


Re: [Zeek-Dev] support for event handlers using a subset of parameters

2019-02-01 Thread Seth Hall



On 1 Feb 2019, at 11:24, Robin Sommer wrote:

> It's a nice idea to relax parameter passing to work by name, and
> allow subsets. However, I can't quite get myself to really like it in
> this form, because it *looks* like an error to not have matching
> argument lists. Is there some syntax that would make it more clear
> what's going on?

I think the change to using names does make things a bit more confusing 
for users, but it opens the door for us to greatly improve reliability 
of scripts in the long term and generally it feels like a nice way for 
analyzer authors to deprecate functionality without needing to create 
all new events.  In my opinion even though there are hairy side effects 
to this I think it's a net positive.  It would be great to get
case-sensitive versions of dns events and the http header event.  That has
been a very long standing deficit.

I guess if there is some more obvious way to do it, that could make sense, but
I haven't been able to come up with anything after thinking about this 
for quite a while.

   .Seth

--
Seth Hall * Corelight, Inc * www.corelight.com


Re: [Zeek-Dev] support for event handlers using a subset of parameters

2019-02-01 Thread Jon Siwek
On Fri, Feb 1, 2019 at 10:24 AM Robin Sommer wrote:

> On Thu, Jan 31, 2019 at 16:29 -0800, Vern Paxson wrote:
>
> > > global my_event: event(a: count, b: string);
> > > event my_event(b: string)
> > > { print "my_event", b; }
>
> it *looks* like an error to not have matching
> argument lists. Is there some syntax that would make it more clear
> what's going on?

Not sure.  If the syntax were different, that introduces a "one more
thing to remember" issue, so I might prefer consistency with other
function-like constructs.

Any other language we know that has multi-body functions we can
reference for ideas?

Did it look like an error in the sense of the user making a mistake, or
in the sense that functions in other languages like C/C++ traditionally
require matching signatures?

In the former case, I think the semantics/intentions are actually clearer
than before: the user didn't list a parameter because they don't care
about it, so why make them.  I know what event they want because they
use unique names and the parameters they listed do map in a valid way.
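
To make "map in a valid way" concrete, here is one possible check, sketched in Python under the assumption that a handler's parameters are matched to the declaration by name and must agree on type. `check_handler` and the (name, type) list representation are invented for this example and are not how the patch is implemented.

```python
def check_handler(declared, handler_params):
    """Match a handler's parameters to the declared event by name.

    Both arguments are lists of (name, type) pairs. Returns the position in
    the declaration of each handler parameter, or raises ValueError.
    """
    names = [name for name, _ in declared]
    types = dict(declared)
    positions = []
    for name, typ in handler_params:
        if name not in types:
            raise ValueError(f"unknown parameter '{name}'")
        if types[name] != typ:
            raise ValueError(f"parameter '{name}' is {types[name]}, not {typ}")
        positions.append(names.index(name))
    return positions

# global my_event: event(a: count, b: string);
my_event = [("a", "count"), ("b", "string")]

# event my_event(b: string)
print(check_handler(my_event, [("b", "string")]))
```

A handler listing a name the event never declared, or a declared name with a different type, is rejected, so a genuine mistake is still caught even though the lists no longer have to match in full.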

On the traditional side of things, overloading seems like maybe a
legit reason for requiring matching signatures, but I also explained
why I think overloading wouldn't make sense in the context of events.

- Jon


Re: [Zeek-Dev] support for event handlers using a subset of parameters

2019-02-01 Thread Robin Sommer



On Thu, Jan 31, 2019 at 16:29 -0800, Vern Paxson wrote:

> > global my_event: event(a: count, b: string);
> > event my_event(b: string)
> > { print "my_event", b; }

> Is there a compelling use-case that's motivating this change?

I'm sure the main use case is changing an existing event's parameters
without breaking existing scripts -- something we've been increasingly
running into as a major challenge.

It's a nice idea to relax parameter passing to work by name, and
allow subsets. However, I can't quite get myself to really like it in
this form, because it *looks* like an error to not have matching
argument lists. Is there some syntax that would make it more clear 
what's going on?

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Zeek-Dev] support for event handlers using a subset of parameters

2019-02-01 Thread Jon Siwek
On Thu, Jan 31, 2019 at 6:29 PM Vern Paxson wrote:

> > * user doesn't care about parameter 'a', so they shouldn't have to list it
> > * it makes it easier to deprecate/change event parameters
>
> This seems like a pretty niche pair of benefits.  Is there a compelling
> use-case that's motivating this change?

The compelling use-case I'd say is the ability to change/deprecate
event parameters without suddenly breaking people's code since that
has come up many times already.  I briefly skimmed NEWS for just the
last 2.6 release and count 5 times we broke an event signature where
this patch would have helped.

I think there are also some other higher-profile changes to event args
we haven't moved forward with because we didn't want to break user
code that this would help with.  Old example from unresolved ticket:

https://bro-tracker.atlassian.net/browse/BIT-1431

> One thing I initially wondered was whether this was going to tie our hands
> in the future if we want to introduce C++-style overloading.  However, it
> looks like you've implemented this based on matching the names in the
> declaration rather than the types, so that should be okay.

Also this change only affects events and hooks, not functions.  The
semantics are different enough that maybe we would only want
overloading for functions anyway.

That is, functions have a single, fixed implementation/body that is
defined by the *author*, so you may want to re-use the same name for
something implemented in different ways.  The *user* is the one that
calls the function.

Hooks and events have multiple implementations/bodies that are defined
by the *user*.  The *author* is generally the one that generates
(calls) the event/hook.

So if the event/hook name were overloaded, it's a bit confusing -- the
user now has to decide between different signatures to handle, each
containing different data sets and maybe neither contains the set they
want (so now they handle two events of the same name instead of one).
Really, an event is a unique name with some amount of data
(parameters) associated with it and may always be generated with the
full data set -- the user then chooses which data they are interested
in by defining that explicitly in the handler's parameter list.

- Jon