Re: [Zeek-Dev] support for event handlers using a subset of parameters

2019-02-20 Thread Jon Siwek
On Tue, Feb 19, 2019 at 10:55 PM Robin Sommer  wrote:
>
> On Tue, Feb 19, 2019 at 08:28 -0600, Jonathan Siwek wrote:
>
> > we pick points in time as an "event" and associate various bits of
> > data with them, then users declare their interest in some subset of
> > that data. Most natural way for them to do that seems to be by naming
> > each bit of data.
>
> I agree with all of that in principle, but for me it would need a
> different syntax to become practical. The current syntax for events
> implies certain semantics just by being similar to so many other
> languages that don't do named-based matching for parameters.

Granted, but not sure how problematic that is in reality.  E.g. you see this:

event foo(version: string, method: string);

event foo(method: string) { ... }

Seems intuitive that one understands the "method" parameter means the
same thing in both the definition and handler.  This is a fundamental
reason why, in human's general use of language, we name things in the
first place: things that share a common name share a common meaning.
Or what else is there to be confused about with name-based param
matching ?

A side point is that maybe some other languages are just following the
crowd and actually would benefit from name-based param matching in
some cases.  Maybe they just haven't thought about it (I have, that's
why I'm still trying to hang on to this :)).  Doesn't mean we
shouldn't evaluate Pros/Cons of breaking apart from the crowd.

I haven't seen anything so far that was a convincing Con to me, but
always possible I've missed some way name-based param matching could
conflict with an existing or future feature of the language -- those
would be particularly convincing Cons for me.

- Jon
___
zeek-dev mailing list
zeek-dev@zeek.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/zeek-dev


Re: [Zeek-Dev] support for event handlers using a subset of parameters

2019-02-20 Thread Jon Siwek
On Tue, Feb 19, 2019 at 1:27 PM Vern Paxson  wrote:

> > Which particular user errors were a concern?
>
> The main one being when there's an API change that's *not* backward
> compatible, and the user doesn't update their scripts to be in conformance
> with it as is required.  Clearly something we'll in general strive to avoid.

That's more of a developer error than a user error -- we can make
changes that are not backward compatible regardless of what mechanism
we go with, and if we do, that's on us (and we do want to always try
to make changes that are backward compatible for at least one release
cycle).

> > What about changing
> > semantics of a parameter though?
>
> That's indeed trickier.  Here's a thought for the example you provide:
>
> event foo%(is_server: bool%) 
> event foo%(is_client: bool%);
>
> This would mean "if the handler is defined using the name is_server as the
> parameter, that's the deprecated version".  Any other declaration (that's
> for a parameter of type bool, of course) would refer to the new version.

Now we are back to matching parameters on name, so again feels more
natural to me to instead just support that generally in the first
place.

Take another concrete example we've done in the past: change parameter order.

event foo(a: bool, b: string);

Changes to:

event foo(b: string, a: bool);

If we generally support matching params by name, then all existing
event handlers continue working, no user action needed.  Otherwise we
need to  and force users to do work to update their code.

> By the way, I didn't follow whether in Vlad's example, the semantics of
> the parameter changed, or if his fix was just to give it the correct name
> to match its actual semantics.  For the latter, existing handlers are
> presumably flat-out broken, and there's no benefit in trying to continue
> to support them.

In Vlad's example, the fix was to make "is_server" actually mean "is
server", but people already wrote code based on the real meaning being
"is client", so we broke code.  (It's a useful example both to learn
from and consider how we might do things differently if we had better
deprecation procedures/mechanisms).

> One concern I have with leaning on is-the-name-a-match is not-too-implausible
> user coding along the lines of:
>
> event foo(is_server_orig: bool)
> {
> ...
> local is_server = my_twisty_logic(is_server_orig, x, y, z);
> ...
> }
>
> i.e., the parameter being renamed by the user anyway because they want to
> use the original name for a local variable.  These users will be bitten
> by changes in parameter semantics regardless of which approach we use;
> name-based matching isn't a panacea.

They wouldn't be bitten if we just don't ever change parameter
semantics like that, but use a different deprecation process like I
mentioned.  E.g. we start with:

event foo(is_server: bool);

They use:

event foo(is_server_orig: bool);

We change to:

event foo(is_server: bool , is_client: bool);

Now they get a deprecation warning because we still fall back to
matching params by order+type if there's no valid name mappings.

The case where they *could* get silently bitten by changes in
parameter semantics is if we are actually swapping parameter order,
but they happen to swap params of the same type and the user defined
custom parameter names.

The fix there is either we just don't ever do such ambiguous parameter
order swapping or we say that the user is opting into that potential
pain for the purely cosmetic reason of custom param names.  (Or we
rely purely on name-based param matching without falling back to
traditional order+type matching.  I added the fallback just because it
seems generally worthwhile to still support that if people *really*
want that, or if they are using custom names already).

> > The other problem is that if a user is skipping a version, they may
> > end up with a handler that treats "is_server" in the original way, but
> > the meaning has been changed and we only match events based on type +
> > number of parameters.  With name-based parameter matching, we can
> > catch this scenario as well.
>
> With the tweak I outline above, this would only bite them once the 
> version is removed.  That could span several releases, depending on the
> particular situation.  I don't have a lot of sympathy for users who upgrade
> across a number of releases, for which the release notes flag backward
> compatibility issues, and they don't take heed to address those.

I don't sympathize much either, but still though -- if we've got a
solution that additionally makes this a non-issue, then that's a Good.

> I find an approach that changes the fundamental type-checking used for
> event handlers to be more heavy-handed than one that provides control to
> the developer to explicitly decide when they've made a change for which
> it makes sense to support a 

Re: [Zeek-Dev] support for event handlers using a subset of parameters

2019-02-19 Thread Robin Sommer



On Mon, Feb 18, 2019 at 16:32 -0800, Vern Paxson wrote:

>   event foo%(a: string%) 
>   event foo%(a: string, b: string%);

I like this. It addresses my main concern of making it clear what's
happening. If I see an implementation of the event and look up the
prototype, I'll find both and can understand what's going on.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com
___
zeek-dev mailing list
zeek-dev@zeek.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/zeek-dev


Re: [Zeek-Dev] support for event handlers using a subset of parameters

2019-02-19 Thread Vern Paxson
> > (4) avoid certain forms of user errors
> 
> Which particular user errors were a concern?

The main one being when there's an API change that's *not* backward
compatible, and the user doesn't update their scripts to be in conformance
with it as is required.  Clearly something we'll in general strive to avoid.

> I was still thinking
> name-based matching most clearly expressed the users intent, so I
> doubt that results in increased errors.

Agreed for those instances where it's compatible.

> Yeah, removing should work following this scheme.  What about changing
> semantics of a parameter though?

That's indeed trickier.  Here's a thought for the example you provide:

event foo%(is_server: bool%) 
event foo%(is_client: bool%);

This would mean "if the handler is defined using the name is_server as the
parameter, that's the deprecated version".  Any other declaration (that's
for a parameter of type bool, of course) would refer to the new version.

By the way, I didn't follow whether in Vlad's example, the semantics of
the parameter changed, or if his fix was just to give it the correct name
to match its actual semantics.  For the latter, existing handlers are
presumably flat-out broken, and there's no benefit in trying to continue
to support them.

One concern I have with leaning on is-the-name-a-match is not-too-implausible
user coding along the lines of:

event foo(is_server_orig: bool)
{
...
local is_server = my_twisty_logic(is_server_orig, x, y, z);
...
}

i.e., the parameter being renamed by the user anyway because they want to
use the original name for a local variable.  These users will be bitten
by changes in parameter semantics regardless of which approach we use;
name-based matching isn't a panacea.

> The other problem is that if a user is skipping a version, they may
> end up with a handler that treats "is_server" in the original way, but
> the meaning has been changed and we only match events based on type +
> number of parameters.  With name-based parameter matching, we can
> catch this scenario as well.

With the tweak I outline above, this would only bite them once the 
version is removed.  That could span several releases, depending on the
particular situation.  I don't have a lot of sympathy for users who upgrade
across a number of releases, for which the release notes flag backward
compatibility issues, and they don't take heed to address those.

> Can you maybe break down what you mean by "clean" further so we can
> better compare/contrast solutions ?  I don't recall seeing anything
> that was a showstopper for my original patch (but been a while).

I find an approach that changes the fundamental type-checking used for
event handlers to be more heavy-handed than one that provides control to
the developer to explicitly decide when they've made a change for which
it makes sense to support a deprecated version.

If Zeek had originally used name-based parameters, then I'd be fine with
this being the solution.  However, switching from positional to name-based
strikes me as a conceptually deep change for a relatively modest problem.

Vern
___
zeek-dev mailing list
zeek-dev@zeek.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/zeek-dev


Re: [Zeek-Dev] support for event handlers using a subset of parameters

2019-02-19 Thread Jon Siwek
On Mon, Feb 18, 2019 at 6:46 PM Vern Paxson  wrote:

> Suppose "event foo(a: string)" is an event to which we want to add
> a second parameter, b: string.  We could express this in event.bif as:
>
> event foo%(a: string%) 
> event foo%(a: string, b: string%);

Nice idea.  I also think that would work in this "want to add a new param" case.

> (4) avoid certain forms of user errors

Which particular user errors were a concern?

And you mean in contrast w/ my initial proposal?  I was still thinking
name-based matching most clearly expressed the users intent, so I
doubt that results in increased errors.

> and (5) allow us to consider other
> API changes such as removing parameters.

Yeah, removing should work following this scheme.  What about changing
semantics of a parameter though?  Take earlier example Vlad gave, I
think the process is:

Step 1:

event foo%(is_server: bool%) 
event foo%(is_server: bool, is_client: bool%);

This is ok so far: no code will break and users can update to the new
handler with extra parameter.  It is a bit odd to have the "is_server"
parameter available for use when the actual intent is to later get rid
of it.

Step 2:

event foo%(is_server: bool, is_client: bool%) 
event foo%(is_client: bool%);

Now this is really awkward: users have to update their code once again
(in my proposal, they update their code just once).

The other problem is that if a user is skipping a version, they may
end up with a handler that treats "is_server" in the original way, but
the meaning has been changed and we only match events based on type +
number of parameters.  With name-based parameter matching, we can
catch this scenario as well.

> It's not as easy to implement ... but strikes me as cleaner than where
> this discussion has wound up going.

I still prefer my original proposal from everything I've seen so far,
I think it best fit the notion of what event handlers actually are
(but maybe not how they've traditionally been thought about): we pick
points in time as an "event" and associate various bits of data with
them, then users declare their interest in some subset of that data.
Most natural way for them to do that seems to be by naming each bit of
data.

Can you maybe break down what you mean by "clean" further so we can
better compare/contrast solutions ?  I don't recall seeing anything
that was a showstopper for my original patch (but been a while).

- Jon
___
zeek-dev mailing list
zeek-dev@zeek.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/zeek-dev


Re: [Zeek-Dev] support for event handlers using a subset of parameters

2019-02-18 Thread Vern Paxson
This thread got backburnered for me due to other $dayjob stuff, but returning
to it now, one thing I wondered would be what if instead of just allowing
missing parameters as a way to support changing event interfaces, we
introduced an explicit notion of deprecation, as follows.

Suppose "event foo(a: string)" is an event to which we want to add
a second parameter, b: string.  We could express this in event.bif as:

event foo%(a: string%) 
event foo%(a: string, b: string%);

This would (1) allow the developer who's changing the API to decide up
front whether the change can be backward-compatible by leaving out parameters,
(2) put the maintenance burden just on the developer, (3) allow users to
(automatically) look for the uses of the deprecated API if they wish,
(4) avoid certain forms of user errors, and (5) allow us to consider other
API changes such as removing parameters.

It's not as easy to implement ... but strikes me as cleaner than where
this discussion has wound up going.

Vern
___
zeek-dev mailing list
zeek-dev@zeek.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/zeek-dev


Re: [Zeek-Dev] support for event handlers using a subset of parameters

2019-02-07 Thread Jan Grashöfer
On 06/02/2019 21:00, Jon Siwek wrote:
> On Wed, Feb 6, 2019 at 1:30 PM Vlad Grigorescu  wrote:
> 
>> I _think_ I like Seth's idea of records, but I'm still thinking it through. 
>> It would formalize a growing trend towards moving more parameters into 
>> records anyway. If we're worried about backwards compatibility, then maybe 
>> we have a built in version number in each record. Whenever fields are 
>> added/removed, or there are more subtle contextual changes, the version 
>> number could increase.
> 
> Explicit versioning is a neat idea to maybe try expanding on.  I'm not
> quite sure how it would look for the user to, when they write their
> code, make it explicit that they expect the semantics of the record to
> match version XYZ.

If I get that right, this would mean that every record type comes with 
an individual version number, independent from the package or zeek 
version. While I can see that this allows to achieve backwards 
compatibility I am quite sure it will end in a complete mess. I expect 
the resulting code to be harder to write and to read, assuming that most 
of the records come with versioning constraints.

Actually, I like Jon's solution most. From a coder's perspective, the 
only thing I have to keep in mind is that parameters are matched by 
name. If I want to rename I can fallback to matching by order. That 
said, I would guess renaming a parameter is not a critical use case. I 
cannot imagine a situation where I rely on that feature. For me, 
renaming parameters makes code harder to read if I am familiar with the 
original definition. One could even think about introducing 
syntactically explicit renaming of parameters (which would probably 
break backwards compatibility again).

On 06/02/2019 02:40, Robin Sommer wrote:
> The following would be even worst in terms of confusion:
> 
> global my_event: event(a: string, b: string);
> event my_event(b: string)
> 
> Now I need to know if the language goes by order of parameters or by
> parameter name.

I can see that this might be confusing at first but I would argue that 
this is a one-time confusion. Versioned records in contrast would 
require me to think about their history every time I try to use a type.

Jan
___
zeek-dev mailing list
zeek-dev@zeek.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/zeek-dev


Re: [Zeek-Dev] support for event handlers using a subset of parameters

2019-02-06 Thread Jon Siwek
On Wed, Feb 6, 2019 at 1:30 PM Vlad Grigorescu  wrote:

> I _think_ I like Seth's idea of records, but I'm still thinking it through. 
> It would formalize a growing trend towards moving more parameters into 
> records anyway. If we're worried about backwards compatibility, then maybe we 
> have a built in version number in each record. Whenever fields are 
> added/removed, or there are more subtle contextual changes, the version 
> number could increase.

Explicit versioning is a neat idea to maybe try expanding on.  I'm not
quite sure how it would look for the user to, when they write their
code, make it explicit that they expect the semantics of the record to
match version XYZ.  But yeah, on our end, we could ensure that if we
see someone wanted XYZ, we send them that version of the record, which
matches their semantic expectations.

> As a concrete example, the is_server parameter of the ssh_capabilities event 
> was being set backwards (effectively as "is_client.")

In this particular example, the way I would have wanted to solve it
with my proposed patch would be: deprecate "is_server", document it as
actually meaning "is client" (don't change semantics of that parameter
to avoid breaking user code that is already doing the inversion), but
do introduce a new "is_client" which actually means "is client", then
later remove "is_server".

- Jon

___
zeek-dev mailing list
zeek-dev@zeek.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/zeek-dev


Re: [Zeek-Dev] support for event handlers using a subset of parameters

2019-02-06 Thread Jon Siwek
On Wed, Feb 6, 2019 at 10:51 AM Seth Hall  wrote:

> One thing I've been thinking about a little bit is how much we're
> concerning ourselves with maintaining perfect backward compatibility and
> if there is some benefit to breaking a bit of backward compatibility for
> something truly nicer?

Sure, I wouldn't argue against taking that approach if there's a good
suggestion.

>  Like, should we have some specialized syntax for
> specifying named parameter use?  Should we have something like anonymous
> records where you specify a variant of record syntax for named
> parameters?

I think a user would prefer to always do whatever is most robust and
won't break in the face of upstream changes.  If I'm going to use the
alternative syntax everywhere because it's better, why have an
alternative in the first place.

- Jon
___
zeek-dev mailing list
zeek-dev@zeek.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/zeek-dev


Re: [Zeek-Dev] support for event handlers using a subset of parameters

2019-02-06 Thread Vlad Grigorescu
I think compatibility is a growing issue with scripts being released as
plugins. I'm already seeing some code shift to:

> @if (Version ...)
> new event
> @else
> old event

I _think_ I like Seth's idea of records, but I'm still thinking it through.
It would formalize a growing trend towards moving more parameters into
records anyway. If we're worried about backwards compatibility, then maybe
we have a built in version number in each record. Whenever fields are
added/removed, or there are more subtle contextual changes, the version
number could increase.

As a concrete example, the is_server parameter of the ssh_capabilities
event was being set backwards (effectively as "is_client.") I introduced a
change[1] where the is_server parameter was corrected, but now how to
interpret the value depended on the Bro version. This is a very subtle
case, where no field was added or deleted, but users were still expected to
change their code.

  --Vlad

[1] - 

On Wed, Feb 6, 2019 at 5:10 PM Seth Hall  wrote:

>
>
> On 6 Feb 2019, at 10:48, Jon Siwek wrote:
>
> > We can't magically change that user's code.  We have to, somehow, let
> > that work as it is, without user intervention.
>
> Yeah, I think that's right.  I was trying to come up with some proposed
> model for indicating that you're specifying named parameters but that
> would still force existing code to be updated so that it could keep
> using the existing model.
>
> One thing I've been thinking about a little bit is how much we're
> concerning ourselves with maintaining perfect backward compatibility and
> if there is some benefit to breaking a bit of backward compatibility for
> something truly nicer?  Like, should we have some specialized syntax for
> specifying named parameter use?  Should we have something like anonymous
> records where you specify a variant of record syntax for named
> parameters?  (ruby comes to mind for this, but I've seen people do
> similar things in c too)
>
> I'm just wondering if we should do a one-time source compatibility break
> to could get some tangible benefit in usability or understanding.  I
> think Robin's concern is about backing us into a corner with a syntax
> that makes code confusing.  Is that correct Robin?
>
>.Seth
>
> --
> Seth Hall * Corelight, Inc * www.corelight.com
> ___
> zeek-dev mailing list
> zeek-dev@zeek.org
> http://mailman.icsi.berkeley.edu/mailman/listinfo/zeek-dev
>
___
zeek-dev mailing list
zeek-dev@zeek.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/zeek-dev


Re: [Zeek-Dev] support for event handlers using a subset of parameters

2019-02-06 Thread Seth Hall



On 6 Feb 2019, at 10:48, Jon Siwek wrote:

> We can't magically change that user's code.  We have to, somehow, let
> that work as it is, without user intervention.

Yeah, I think that's right.  I was trying to come up with some proposed 
model for indicating that you're specifying named parameters but that 
would still force existing code to be updated so that it could keep 
using the existing model.

One thing I've been thinking about a little bit is how much we're 
concerning ourselves with maintaining perfect backward compatibility and 
if there is some benefit to breaking a bit of backward compatibility for 
something truly nicer?  Like, should we have some specialized syntax for 
specifying named parameter use?  Should we have something like anonymous 
records where you specify a variant of record syntax for named 
parameters?  (ruby comes to mind for this, but I've seen people do 
similar things in c too)

I'm just wondering if we should do a one-time source compatibility break 
to could get some tangible benefit in usability or understanding.  I 
think Robin's concern is about backing us into a corner with a syntax 
that makes code confusing.  Is that correct Robin?

   .Seth

--
Seth Hall * Corelight, Inc * www.corelight.com
___
zeek-dev mailing list
zeek-dev@zeek.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/zeek-dev


Re: [Zeek-Dev] support for event handlers using a subset of parameters

2019-02-06 Thread Jon Siwek
On Tue, Feb 5, 2019 at 7:40 PM Robin Sommer  wrote:

> The following would be even worst in terms of confusion:
>
> global my_event: event(a: string, b: string);
> event my_event(b: string)
>
> Now I need to know if the language goes by order of parameters or by
> parameter name.

The issue may be exaggerated because we're using contrived parameter
names.  If we make it a bit more real (still "shortened" for sake of
example):

http_request: event(method: string, version: string);
event http_request(version: string) { ... }

The "parameter order vs. name" issue doesn't even cross my mind here
because the intent is clear -- I understand the meaning of the
handler's parameter because the original author of the event chose a
meaningful and useful name for it (which will be the common case).
That's all the reader cares about -- understanding the meaning of any
given parameter.

> I do see the appeal of making things just work when event handlers
> change, but is there really no different way to support that?

If the goal is to avoid breaking user-code, I don't think we're in a
position to do anything else.  For example, we have:

global my_event: event(a: string);

And we want to add a parameter:

global my_event: event(a: string, b: string);

The user has already written their handler as:

event my_event(a: string) { ... }

We can't magically change that user's code.  We have to, somehow, let
that work as it is, without user intervention.

- Jon
___
zeek-dev mailing list
zeek-dev@zeek.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/zeek-dev


Re: [Zeek-Dev] support for event handlers using a subset of parameters

2019-02-05 Thread Robin Sommer



On Fri, Feb 01, 2019 at 11:17 -0600, Jonathan Siwek wrote:

> > > > global my_event: event(a: count, b: string);
> > > > event my_event(b: string)
> > > > { print "my_event", b; }

> Did it look like an error in the sense of the user making a mistake or
> in the sense of traditional way functions in other languages like
> C/C++ require matching signatures?

It's probably the latter, but I'm not sure that helps: when I see the
above code, I wonder: "Heck, why is that working ... Or, wait, maybe
it isn't?". It makes it confusing to me to read the code.

If at least the prototype told me somehow "it's ok if parameters are
left off", then I'd have a clue that everything is fine.

The following would be even worst in terms of confusion:

global my_event: event(a: string, b: string);
event my_event(b: string)

Now I need to know if the language goes by order of parameters or by
parameter name.

I do see the appeal of making things just work when event handlers
change, but is there really no different way to support that?

(I don't have an opinion on the event overloading discussion yet, need
to digest that more)

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com
___
zeek-dev mailing list
zeek-dev@zeek.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/zeek-dev


Re: [Zeek-Dev] support for event handlers using a subset of parameters

2019-02-01 Thread Jon Siwek
On Fri, Feb 1, 2019 at 12:59 PM Vern Paxson  wrote:

> I don't see how it helps with
> deprecating existing parameters (which seems would be better served with
> some sort of  attribute),

Support for  in parameters is part of the changes.

But if we don't allow the user to immediately remove the field, they
are then stuck doing 2 changes:

Step 1: we mark a field 
Step 2: the user sees that, so they remove uses of that parameter from
their body
Step 3: we actually remove the  field
Step 4: if the user was forced to still have the  param in
their handler's param list they now have to do a second change to
remove it instead of just removing it right away

With the proposed patch, we get rid of the need for Step 4 and
decrease burden on users.

> and I don't see how it helps with
> changing the semantics of existing event parameters.

Step 1: we mark old field  and introduce a new parameter
Step 2: the users sees that.. etc, etc. same as above.

> It actually makes sense to me to support overloading for events.  Then for
> example you could have two event signatures depending on what information
> the handler was going to leverage, which would allow the event engine to
> offload work if there isn't a handler for a signature that requires extra
> computation.

I think the same kind of offloading is possible with the "parameter
subset" approach.   We know exactly what parameters are being
consumed, so we might have optimizations that don't produce a
parameter if no one consumes it.  And if no one consumes any
parameters we also don't generate the call.

If you have two different event signatures, we just get the same type
of optimization we currently do, which only optimize out the entire
call if there's no handlers, but doesn't know if individual parameters
are being consumed or not.  e.g:

http_request(a, b, c)
http_request(d, e, f)

If someone only consumes 'a' and 'e', you still have to produce both
function calls in their entirety (and also the other unused params),
but:

http_request(a, b, c, d, e, f)

You can potentially not do any work generating the unused parameters
and only have to do the one function call with 'a' and 'e'.

Technically, we can still require a matching signature and do such an
optimization by walking the AST and finding local parameter usages.  I
guess you have to do that ultimately, but it's an easy head start at
implementing such optimizations as a test/idea if we can simply see
someone isn't using a parameter because it's not in their handler
param list.

- Jon
___
zeek-dev mailing list
zeek-dev@zeek.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/zeek-dev


Re: [Zeek-Dev] support for event handlers using a subset of parameters

2019-02-01 Thread Vern Paxson
> The compelling use-case I'd say is the ability to change/deprecate
> event parameters without suddenly breaking people's code since that
> has come up many times already.

I see how it allows adding new parameters.  I don't see how it helps with
deprecating existing parameters (which seems would be better served with
some sort of  attribute), and I don't see how it helps with
changing the semantics of existing event parameters.

> Also this change only effects events and hooks, not functions.  The
> semantics are different enough that maybe we would only want
> overloading for functions anyway.

It actually makes sense to me to support overloading for events.  Then for
example you could have two event signatures depending on what information
the handler was going to leverage, which would allow the event engine to
offload work if there isn't a handler for a signature that requires extra
computation.

> Hooks and events have multiple implementations/bodies that are defined
> by the *user*.  The *author* is generally the one the generates
> (calls) the event/hook.

The big exception being the event engine (if I follow what you mean by
user/author).

> So if the event/hook name were overloaded, it's a bit confusing -- the
> user now has to decide between different signatures to handle, each
> containing different data sets and maybe neither contains the set they
> want (so now they handle two events of the same name instead of one).

Not really seeing this.  I'm picturing that a common idiom will be a
lightweight version of an event and a heavyweight version, or maybe a
spectrum from light-to-heavy.

> Really, an event is a unique name with some amount of data
> (parameters) associated with it and may always be generated with the
> full data set -- the user then chooses which data they are interested
> in by defining that explicitly in the handler's parameter list.

Yeah, I agree that this too would allow (most of) the sort of offloading
I sketch above.

Vern
___
zeek-dev mailing list
zeek-dev@zeek.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/zeek-dev


Re: [Zeek-Dev] support for event handlers using a subset of parameters

2019-02-01 Thread Seth Hall



On 1 Feb 2019, at 11:24, Robin Sommer wrote:

> It's a nice a idea to relax parameter passing to work by name, and
> allow subsets. However, I can't quite get myself to really like it in
> this form, because it *looks* like an error to not have matching
> argument lists. Is there some syntax that would make it more clear
> what's going on?

I think the change to using names does make things a bit more confusing 
for users, but it opens the door for us to greatly improve reliability 
of scripts in the long term and generally it feels like a nice way for 
analyzer authors to deprecate functionality without needing to create 
all new events.  In my opinion even though there are hairy side effects 
to this I think it's a net positive.  It would be great to get case 
sensitive versions of dns events and the http header event.  That has 
been a very long standing deficit.

I guess if there is some more obvious way to do it could make sense, but 
I haven't been able to come up with anything after thinking about this 
for quite a while.

   .Seth

--
Seth Hall * Corelight, Inc * www.corelight.com
___
zeek-dev mailing list
zeek-dev@zeek.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/zeek-dev


Re: [Zeek-Dev] support for event handlers using a subset of parameters

2019-02-01 Thread Jon Siwek
On Fri, Feb 1, 2019 at 10:24 AM Robin Sommer  wrote:

> On Thu, Jan 31, 2019 at 16:29 -0800, Vern Paxson wrote:
>
> > > global my_event: event(a: count, b: string);
> > > event my_event(b: string)
> > > { print "my_event", b; }
>
> it *looks* like an error to not have matching
> argument lists. Is there some syntax that would make it more clear
> what's going on?

Not sure.  If the syntax were different, that introduces a "one more
thing to remember" issue, so I might prefer consistency with other
function-like constructs.

Any other language we know that has multi-body functions we can
reference for ideas?

Did it look like an error in the sense of the user making a mistake or
in the sense of traditional way functions in other languages like
C/C++ require matching signatures?

In the former, I think the semantics/intentions are actually clearer
than before: the user didn't list a parameter because they don't care
about it, so why make them.  I know what event they want because they
use unique names and the parameters they listed do map in a valid way.

On the traditional side of things, overloading seems it's maybe a
legit reason for requiring matching signatures, but I also explained
why I think overloading wouldn't make sense in the context of events.

- Jon
___
zeek-dev mailing list
zeek-dev@zeek.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/zeek-dev


Re: [Zeek-Dev] support for event handlers using a subset of parameters

2019-02-01 Thread Robin Sommer



On Thu, Jan 31, 2019 at 16:29 -0800, Vern Paxson wrote:

> > global my_event: event(a: count, b: string);
> > event my_event(b: string)
> > { print "my_event", b; }

> Is there a compelling use-case that's motivating this change?

I'm sure the main use case is changing an existing event's parameters
without breaking existing scripts -- someting we've been increasingly
running into as a major challenge.

It's a nice a idea to relax parameter passing to work by name, and
allow subsets. However, I can't quite get myself to really like it in
this form, because it *looks* like an error to not have matching
argument lists. Is there some syntax that would make it more clear 
what's going on?

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com
___
zeek-dev mailing list
zeek-dev@zeek.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/zeek-dev


Re: [Zeek-Dev] support for event handlers using a subset of parameters

2019-02-01 Thread Jon Siwek
On Thu, Jan 31, 2019 at 6:29 PM Vern Paxson  wrote:

> > * user doesn't care about parameter 'a', so they shouldn't have to list it
> > * it makes it easier for to deprecate/change event parameters
>
> This seems like a pretty niche pair of benefits.  Is there a compelling
> use-case that's motivating this change?

The compelling use-case I'd say is the ability to change/deprecate
event parameters without suddenly breaking people's code since that
has come up many times already.  I briefly skimmed NEWS for just the
last 2.6 release and count 5 times we broke an event signature where
this patch would have helped.

I think there's also some other higher-profile changes to event args
we haven't moved forward with because we didn't want to break user
code that this would help with.  Old example from unresolved ticket:

https://bro-tracker.atlassian.net/browse/BIT-1431

> One thing I initially wondered was whether this was going to tie our hands
> in the future if we want to introduce C++-style overloading.  However, it
> looks like you've implemented this based on matching the names in the
> declaration rather than the types, so that should be okay.

Also this change only effects events and hooks, not functions.  The
semantics are different enough that maybe we would only want
overloading for functions anyway.

That is, functions have a single, fixed implementation/body that is
defined by the *author*, so you may want to re-use the same name for
something implemented in different ways.  The *user* is the one that
calls the function.

Hooks and events have multiple implementations/bodies that are defined
by the *user*.  The *author* is generally the one the generates
(calls) the event/hook.

So if the event/hook name were overloaded, it's a bit confusing -- the
user now has to decide between different signatures to handle, each
containing different data sets and maybe neither contains the set they
want (so now they handle two events of the same name instead of one).
Really, an event is a unique name with some amount of data
(parameters) associated with it and may always be generated with the
full data set -- the user then chooses which data they are interested
in by defining that explicitly in the handler's parameter list.

- Jon
___
zeek-dev mailing list
zeek-dev@zeek.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/zeek-dev


Re: [Zeek-Dev] support for event handlers using a subset of parameters

2019-01-31 Thread Vern Paxson
> global my_event: event(a: count, b: string);
> 
> event my_event(b: string)
> { print "my_event", b; }
> 
> Which is good because:
> 
> * user doesn't care about parameter 'a', so they shouldn't have to list it
> * it makes it easier for to deprecate/change event parameters

This seems like a pretty niche pair of benefits.  Is there a compelling
use-case that's motivating this change?

One thing I initially wondered was whether this was going to tie our hands
in the future if we want to introduce C++-style overloading.  However, it
looks like you've implemented this based on matching the names in the
declaration rather than the types, so that should be okay.

Vern
___
zeek-dev mailing list
zeek-dev@zeek.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/zeek-dev


[Zeek-Dev] support for event handlers using a subset of parameters

2019-01-31 Thread Jon Siwek
Just wanted to draw more attention/feedback to a proof-of-concept
patch I just made:

https://github.com/zeek/zeek/pull/262

Basically lets us do this:

global my_event: event(a: count, b: string);

event my_event(b: string)
{ print "my_event", b; }

Which is good because:

* user doesn't care about parameter 'a', so they shouldn't have to list it
* it makes it easier for to deprecate/change event parameters

I didn't see any drawbacks to this change, but maybe I missed
something, let me know.

- Jon
___
zeek-dev mailing list
zeek-dev@zeek.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/zeek-dev