from:"Jon Siwek"

Re: [Bro-Dev] "delete" of entire tables/sets

2018-12-06 Thread Jon Siwek

On Thu, Dec 6, 2018 at 4:09 PM Vern Paxson  wrote:

> This means I have to first build up a *separate* vector of all the indexes,
> then iterate over that to remove them.

I guess re-assigning a new, empty table to the variable could be
analogous to deleting all entries and also avoids the iterator
invalidation problem ?

> Proposal: extend "delete" so that if applied to a table or a set, it clears
> out all of its elements.

But yeah, that also seems clearer to me at first thought.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Broker cluster discovery and load balancing

2018-12-06 Thread Jon Siwek

On Thu, Dec 6, 2018 at 11:16 AM Hosom, Stephen M  wrote:

> I have looked at the implementations of publish_hrw and publish_rr in bro. I 
> could easily implement those features in my application if that is the 
> recommended way to handle this issue.

There's been some ideas on pushing loading balancing mechanisms into
broker itself, but I doubt that's something that will get done in the
near term.  Explicitly implementing HRW or Round-Robin load balancing
in your application is definitely one way to handle things for now.

> Unfortunately, that leaves me with another unfortunate problem. I have been 
> unsuccessful in determining how to 'discover' members of a Bro cluster via 
> Broker. Is there a way to do discovery, or do I need to know who the cluster 
> members are and what port they are listening on via a broctl configuration 
> equivalent?

At the moment, the cluster configuration is defined by the
Cluster::nodes table that gets auto-generated by broctl into
cluster-layout.bro.  Does that table have all the info you need?

You still might need a well-known node to connect your broker
application (e.g. manager) to, but then you should be able to send
back the information you need from that table (e.g. here's the list of
worker ip/port pairs).

- Jon

___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Consistent error handling for scripting mistakes

2018-11-14 Thread Jon Siwek

On Tue, Nov 13, 2018 at 4:32 PM Vern Paxson  wrote:
>
> I like what you & Robin sketch.  FWIW, it's hard for me to get excited
> over the issue of leaks-in-the-face-of-error-recovery.

Yeah, it's not great, but intention would still be to eventually fix
the leakage problem, too.

Anyway, made a new GH issue to track the broader error handling enhancement:

https://github.com/bro/bro/issues/211

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Consistent error handling for scripting mistakes

2018-11-12 Thread Jon Siwek

On Mon, Nov 12, 2018 at 1:44 PM Jim Mellander  wrote:

> I was tinkering with the sumstats code, and inadvertantly deleted the final 
> "}" closing out the last function.  When running the code, the misleading 
> error message is received:

Yes, that's a bit of a different topic, but still tracked (at
low-normal priority):

https://github.com/bro/bro/issues/167

> There are also silent fails which probably should give a warning, such as 
> failing to include the fully-qualified event name silently preventing the 
> event from being triggered.

Also a bit different that what I was talking about, but also tracked
(at higher priority since it's a common mistake):

https://github.com/bro/bro/issues/163

> My idea on runtime scripting errors would be to apply a sensible default to 
> the offending expression (null or 0, as the case may be, might be 
> sufficient), log the error, and continue

In the following example (comments reflect current behavior) you'd
expect the "false" branch in foo() to be taken?

#
function foo()
{
local t: table[string] of string = table();

# Non-existing index access: (sub)expressions are not evaluated
if ( t["nope"] == "nope" )
# Unreachable
print "yes";
else
# Unreachable
print "no";

# Reachable
print "foo done";
}

event bro_init()
{
foo();
# Reachable
print "bro_init done";
}
#

My thought was that should behave more like a "key error" run-time
exception (e.g. like Python).  Bro scripting doesn't have exception
support, but internally we can use an exception to unwind the call
stack (additionally I was thinking that the unwind needs to proceed
further than what it does already in some cases, which is just the
current function body).  In any case, logging of the error would also
occur (as it already does).

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

[Bro-Dev] Consistent error handling for scripting mistakes

2018-11-12 Thread Jon Siwek

Trying to broaden the scope of:

https://github.com/bro/bro/issues/208

I recently noticed there's a range of behaviors in how various
scripting mistakes are treated.  They may (1) abort, as in case of bad
subnet mask or incompatible vector element assignment (2) skip over
evaluating (sub)expression(s), but otherwise continue current function
body, as in case of non-existing table index access or (3) exit the
current function body, as in the classic case of uninitialized record
field access.

1st question: should these be made more consistent? I'd say yes.

2nd question: what is the expected way for these to be handled?  I'd
argue that (3) is close to expected behavior, but it's still weird
that it's only the *current function body* (yes, *function*, not
event) that exits early -- hard to reason about what sort of arbitrary
code was depending on that function to be fully evaluated and what
other sort of inconsistent state is caused by exiting early.

I propose, for 2.7, to aim for consistent error handling for scripting
mistakes and that the expected behavior is to unwind all the way to
exiting the current event handler (all its function bodies).  That
makes it easier to explain how to write event handlers such that they
won't enter too wild/inconsistent of a state should a scripting error
occur: "always write an event handler such that it makes no
assumptions about order/priority of other events handlers".  That's
already close to current suggestions/approaches.

One exception may be within bro_init(), if an error happens at that
time, I'd say it's fine to completely abort -- it's unlikely or hard
to say whether Bro would operate well if it proceeded after an error
that early in initialization.

Thoughts?

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] attributes & named types

2018-11-07 Thread Jon Siwek

On Tue, Nov 6, 2018 at 3:00 PM Vern Paxson  wrote:

> I think what we need to do is rethink the basic architecture/structure of
> attributes.  In particular, types in general (not just named types) should
> be able to have attributes associated with them.  The attributes associated
> with an identifier are those associated with its type plus those directly
> associated with the identifier (like ).

Sounds worth pursuing.  I think this was also one of the routes
originally offered, but not sure if it actually got attempted or there
were other complications.  Hard to remember all the twists this issue
has taken, but you probably have the freshest view of things at the
moment to decide which way to try going.

> While doing this, we can also think about mechanisms for removing attributes.
> I don't think the "=F" approach mentioned earlier on this thread will do
> the trick, since it's syntactically/semantically quite weird for attributes
> that already take expressions as values, such as  or _expire.

Yeah, attr removal seems to warrant its own unique syntax.  But might
help to just review which attrs one may actually want to remove (or
even make sense to remove) -- seems like it's only  in the first
place so maybe doesn't warrant a generalized mechanism ?

A related idea I haven't thought through: how about providing a BIF
that does attr removal/modification?  Actually seems more powerful to
be able to change attributes at runtime rather than just parse-time.

Another thought/worry that may or may not be valid for generalized
attr remove/modification: seems there may be opportunity to create
non-sensical states.  e.g. the sequence of (1) create a value of the
type "foo" which initially has attr , (2) later remove  from
type foo, (3) are the existing values of type foo still coherent now
that they lack  ?  Obviously made up the type/attr, but probably
have to think that sequence through for each existing attribute to
make sure behavior is well-defined for each.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

[Bro-Dev] Any 2.6 release blockers?

2018-10-29 Thread Jon Siwek

Anyone have any last minute issues that would block the 2.6 release?

Or any remaining changes (bug fixes) they want to still try to get into it?

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] bro-pkg Bro version requirements

2018-10-16 Thread Jon Siwek

On Tue, Oct 16, 2018 at 2:23 PM Vlad Grigorescu  wrote:

>> @if ( Version::at_least("2.6") )
>> event ssl_client_hello(c: connection, version: count, record_version:count, 
>> possible_ts: time, client_random: string, session_id: string, ciphers: 
>> index_vec, comp_methods: vector of count) =1
>> @else
>> event ssl_client_hello(c: connection, version: count, possible_ts: time, 
>> client_random: string, session_id: string, ciphers: index_vec) =1
>> @endif
>
>
> That works, but I worry that the overhead of trying to maintain that will 
> grow out of hand.

I don't have the same worry at least at the moment.  The maintenance
overhead doesn't seem that great: those are fairly trivial lines of
code to understand.  The growth potential also seems low considering
we really don't have cycles to maintain past the last version of Bro
ourselves, so I wouldn't expect packagers to maintain their code to
work against all possible versions.  e.g. in the already-unlikely
chance the same event introduces a breaking change in 2.7, the branch
complexity would not grow if they just throw out the 2.5 branch.

> I'm wondering if there's a better mechanism for this. A naive approach might 
> be to include an option in the package metadata, which specifies 
> minimum/maximum Bro versions that it requires. The installer, then, would 
> simply install the latest version that supports your Bro version.

Interesting thought, but my hunch is it just makes the complexity
implicit instead of explicit.  e.g. if metadata claims it works for
2.6, it's tricky the answer why that is, but with the code above you
can easily search for those specific areas that are version-sensitive.

I'm thinking it may be best to wait and see if there is actually a
problem (lots of people complaining) before trying to decide how to
solve anything.

- Jon

___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Intermittent bro crashes

2018-10-16 Thread Jon Siwek

On Tue, Oct 16, 2018 at 9:38 AM McMullan, Tim  wrote:

> We have been seeing occasional core dumps from bro, currently running on 
> 2.5-870.

May be nice to try most recent master version to see if it still pops up.

>  We’ve tried a few things to reproduce it on-demand but haven’t been 
> successful.  We were wondering if you might have some insight into the crash. 
>  This is the backtrace we get:
>
> #0  TableEntryVal::ExpireAccessTime (this=0x9b349bd32ebb5614) at 
> /hostname/bro-devel-src/src/Val.h:741
>
> #1  TableVal::DoExpire (this=0x3955a40, t=1539447574.1403711) at 
> /hostname/bro-devel-src/src/Val.cc:2353

I don't see an immediate culprit there -- it's the table entry
expiration algorithm that I guess is well-traveled by most people
running Bro, so maybe need to work on narrowing down a way to
reproduce or what specific script is triggering it.  Do you have any
custom scripts that make use of _func or &{read, write,
create}_expire table attributes?  That could be a first place to
inspect.  I also fixed a bug [1] in related code just last week, but
I'd expect that to give a different stack trace if it was the same
problem here (still doesn't hurt to try to rule that out as a
contributing factor by testing w/ latest git/master).

- Jon

[1] https://github.com/bro/bro/commit/8792f5545cd5b7de433d0eee510fde94371fdee3

___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] SSH Capabilities Bug: Fix for 2.6?

2018-10-15 Thread Jon Siwek

On Mon, Oct 15, 2018 at 3:33 PM Vlad Grigorescu  wrote:

> The SSH Capabilities record has the following field, which is being set 
> incorrectly:
>
>> ## Are these the capabilities of the server?
>> is_server:  bool;
>
>> result->Assign(6, new Val(${msg.is_orig}, TYPE_BOOL));
>
> Obviously, I'd like to fix this. I'm curious to hear thoughts about getting 
> this into 2.6.

Yes, that seems like a bug fix that can be included in 2.6.  Do you
want to make a PR for that and verify/update any unit test baselines
that change?

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Building Bro with openssl 1.1.1

2018-10-12 Thread Jon Siwek

On Fri, Oct 12, 2018 at 6:33 PM Craig Leres  wrote:

> Is openssl 1.1.1 support on the roadmap?

Bro 2.6 will support OpenSSL 1.1.  Possibly only a couple more weeks
of beta remain before that release.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Event handling within script modules

2018-10-11 Thread Jon Siwek

On Thu, Oct 11, 2018 at 3:53 AM Steffen Haas
 wrote:

> # Problem 1: Event Handling within module
> The handler for "my_event" is not invoked as long as it is defined
> within the same module "module_test".
> - Workaround a): Do not use modules at all
> - Workaround b): Define the handler in a different module
> But why should not it be allowed to raise and handle events within the
> same module?

This is related to issues discussed at [1].  You can get the behavior
you want, but the short explanation/recommendation is to always use
the module prefix when raising/handling the event.  So, in your
example, when you raise and handle the event, explicitly add
"module_test::" even when inside the module.

> # Problem 2: Direct execution of scheduled events
> At least within this example, I expected the "raise_my_event" to be
> invoked after 10 seconds. However, it is done immediately. Am I using
> the schedule wrong?

When there's no input sources (not listening on an interface) and
you're just running a script, timers may get processed as fast as
possible, but you can add this and you'll then see timers behave based
on wall clock:

redef exit_only_after_terminate = T;

- Jon

[1] https://github.com/bro/bro/issues/163
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Module prefix in sending and receiving Broker events

2018-09-27 Thread Jon Siwek

On Thu, Sep 27, 2018 at 3:47 AM Matthias Vallentin  wrote:

> I hope this illustrates the issue a bit better.

Thanks, that helps.  Underlying issue/pitfall still seems related to
module interactions with event handler definition/declaration, so
added a link back from [#163].  Namely, seems you can try to reason
about events + modules by trying to remember that event declarations
will implicitly get prefixed by module names while event handlers
don't (i.e. a handler for something that hasn't been declared yet
isn't considered an implicit declaration... or at least not the same
kind of declaration).  It's not in the scope of Bro 2.6 to try
improving anything here, so I still suggest the simplest rule/approach
to remember is to always/everywhere prefix event names with an
intended module name.

- Jon

[#163] https://github.com/bro/bro/issues/163
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] 2.6-beta build error

2018-09-21 Thread Jon Siwek

On Fri, Sep 21, 2018 at 12:00 PM Azoff, Justin S  wrote:

> I think I remember the older geoip bifs logging this as well, but only once 
> at startup, not once per call to lookup_location.

Yes, looks like it used to be that way.  Changed via [1] now in master.

- Jon

[1] https://github.com/bro/bro/commit/2ede95422bab0c4853b8db7cc15fe5c6e77aa75a
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] 2.6-beta build error

2018-09-21 Thread Jon Siwek

On Thu, Sep 20, 2018 at 2:14 PM Michael Dopheide  wrote:
>
> bro.bif:3630:47: error: cannot convert ‘stat’ to ‘__dev_t {aka long unsigned 
> int}’ in initialization

Thanks, it's fixed by a simple patch [1] now in master.

- Jon

[1] https://github.com/bro/bro/commit/d7097635f4b1ec9adca548b81ee42968629987a5

___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

[Bro-Dev] 2.6-beta and 2.7 parallel development

2018-09-19 Thread Jon Siwek

Due to how our version tagging works and to prevent things destined
for 2.7 getting incorrectly labelled as part of 2.6-beta in CHANGES,
I'm going to start a dev/2.7 branch.

* Any new commits that should be included for final 2.6: merge to master
* Any new commits that are meant for 2.7: merge to dev/2.7

After a final 2.6 release is tagged, dev/2.7 can then be merged into master.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] JIRA to GitHub ticket migration plan

2018-09-18 Thread Jon Siwek

On Tue, Sep 18, 2018 at 7:35 AM Vlad Grigorescu  wrote:
>
> On Sat, Sep 15, 2018 at 1:28 AM Robin Sommer  wrote:
>>
>> Are Jenkins and Coverity already pulling from GitHub?
>
> No, I thought Jenkins was pushing to Coverity.

Travis is triggered by commits on GitHub and looks like it has a
monthly cron to upload to Coverity.

However, some submodule + testing repos that Travis clones are
actually using the bro.org repo instead of github.com URL, but that's
maybe not something to try to change now, but rather wait until
performing the actual move.

> Is the plan to have GitHub issues within each repo? That is, bro, binpac, 
> etc. I think we'd lose the easy way to see all issues,

Yeah, issues are better kept within the associated GH repo.  We rarely
saw JIRA tickets made for non-bro subcomponents and a good portion of
people were already using GitHub issues for those anyway.

> but if I recall, there was a way in GitHub to see issues across a few repos. 
> Maybe by organization?

Probably thinking of "project boards".  You can create an
organization-wide board that contains issues from separate repos.  I
haven't looked extensively at project boards, but I don't think it's
something we'll generally use because you have to manually add issues
to boards rather than treat them like an automated filter based on
issue labels/status/etc.  That could make it more useful when
organizing specific long-term projects, but still likely decide to use
it on a case-by-case basis rather than generally.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] JIRA to GitHub ticket migration plan

2018-09-14 Thread Jon Siwek

On Fri, Sep 14, 2018 at 9:54 AM Robin Sommer  wrote:
>
> real question here is if we want to switch repositories before or
> after 2.6?

Forgot to say that my sentiment is conditional -- if I find that it's
not disruptive/risky to the release process (e.g. whether I need to
make commits within bro itself), then changing whenever seems fine,
else wait until after 2.6.

It's maybe a week or two away before I have more complete
investigation/answer, but it's possibly that risky.  Here's the
general things I'll be looking into and fleshing out:

* GitHub "protected branches" for permissions and access control
* Which private repos can actually be moved to GitHub (we'll need to
update to Team Account)
* Where all in the docs uses the old bro.org git URL.
* How to handle commit email notifications.
* How decouple-able is the script-reference building process.  I'm
guessing I can just point to the github url and it will work, but
maybe there's some more we can offload from bro.org to Travis or AWS
at this point as well.

Anything else to worry about?

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] JIRA to GitHub ticket migration plan

2018-09-14 Thread Jon Siwek

On Fri, Sep 14, 2018 at 9:54 AM Robin Sommer  wrote:
>
> On Thu, Sep 13, 2018 at 19:39 -0500, Jon Siwek wrote:
>
> > A preview of what migrated issues will look like along with new labeling 
> > scheme:
>
> The only thing I noticed is that the labels are
> quite long, making the list of tickets appear somewhat crowded. Could
> we skip the prefixes ("Type:", "Component:") and instead use colors to
> encode them?

I did some label tweaking and reduced some prefix names: "Component"
-> "Area" and "Difficulty" -> "Pain".

It's possible the color design could be done more carefully, but the
basic idea is the colors are mostly encoding importance/difficulty.
Green = go, yellow = caution, red = stop.  That may help people scan
the list to find potential things to work on related to their
abilities or available time.

A problem with removing the prefixes totally is that labels are sorted
by name.  So when one is viewing issues, they can get inconsistent
label orders between issues: the priority label could be at the start,
end, or middle of the label set.  A more annoying thing is that when
someone is trying to add labels to a new issue, they have to jump
around a list to find which labels to add and there's not a clear end
point.  With prefixes, I can scan the list from top to bottom: pick an
area, pick a pain, pick a priority, pick a type, done.  Without
prefixes, it's also less intuitive whether one should be mixing
various labels together and accidentally pick two types that aren't
meant to go together.  Especially for those that are color blind,
having a secondary way (the prefix) to distinguish label categories is
a simple solution without having to do more color design and
verification effort.   I don't know enough to say how much a problem
the current colors actually would be for what number of people, but
trying to be thoughtful just in case.

> We are leaving switching to github as authoritative source for the
> repositories to later, right? Doing it all at the same time could
> avoid confusion ("everything is on github now" is an easier
> statement), but would also make the process more complex. Maybe the
> real question here is if we want to switch repositories before or
> after 2.6?

I'd like to do the steps separately.  Moving tickets shouldn't make
the location of the git repo any more confusing than it is already.
Some users are already using github instead of bro.org and devs are
likely less-confusable anyway (or if not, it's still not a big deal if
someone accidentally pushes commits to a non-master branch of github
instead of bro.org).

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

[Bro-Dev] JIRA to GitHub ticket migration plan

2018-09-13 Thread Jon Siwek

A preview of what migrated issues will look like along with new labeling scheme:

https://github.com/jsiwek/test/issues

The selection strategy of which tickets to migrate is something like
"anything that is a reproducible bug or a simple/uncontroversial
enhancement".  Any tickets not in that preview list will be left open
in JIRA, but frozen as read-only.  Going forward, JIRA will serve as a
historical archive or Good Idea Backlog that we can occasionally pull
from, but that can be done manually whenever the situation comes up
(just create a GH issue and link to old JIRA ticket).

Remaining tasks:

* Perform the actual migration
* Update website/docs with new directions/process for issues
* Email bro + bro-dev mailing lists once tickets are migration
* Adapt workflow permissions for JIRA "BIT" project so it's read-only

I expect I'll do those next Monday unless there's objections or
feedback to address.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Module prefix in sending and receiving Broker events

2018-09-13 Thread Jon Siwek

On Thu, Sep 13, 2018 at 4:28 AM Matthias Vallentin  wrote:
>
> >  Does the
> > suggestion [1] to always explicitly scope events by their
> > namespace/module address your problem?
>
> That's what I thought would work, but it's the opposite: when I add
> the module name as prefix, Bro silently ignores the event.

So [1] didn't work...

> As mentioned in your reference [1], explicit module qualification works as 
> well:
>
> event Foo::foo() { ... }

But [1] also worked? :)

> I'm essentially running into the inverse of BIT-71, the ticket you referenced.

Yeah, I think I see how it's the reverse of the original example, but
it's likely the same underlying module/namespacing ambiguities with
events, so I'd still suggest explicitly scoping events always and
everywhere (which was the [1] suggestion).

It might also help if you send actual examples that can be run if that
still doesn't work because it's hard to interpret what you mean by
"publish via Broker".  That could mean Bro's Broker::publish() API or
the standalone Broker API itself and they are potentially different.

> I found a
> surprising solution though: if I declare the event first, then it
> works. Here's an example:
>
> module Foo;
> global foo: event();
> event foo() { ... }
>
> This works as expected: when I create an event with name Foo::foo, the
> handler gets dispatched. Without the declaration, it doesn't work.

Sure, but that's also not the [1] suggestion either.  Adding the
namespace scoping always and everywhere means to the event
declaration, the handlers, event/schedule dispatching, any strings
that contain the event name, etc.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Bro 2.6-beta plans

2018-09-12 Thread Jon Siwek

On Wed, Sep 12, 2018 at 11:58 AM Alan Commike  wrote:

> Are there still parent/child processes handling comms/work?

No.  Single process, configurable number of threads (default 1).

> Is there a mechanism today for per node type tuneables?

One should be able to use an @if directive [1] to tune differently
per-node if they want.

- Jon

[1] https://www.bro.org/sphinx-git/script-reference/directives.html
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Module prefix in sending and receiving Broker events

2018-09-12 Thread Jon Siwek

On Wed, Sep 12, 2018 at 10:09 AM Matthias Vallentin  wrote:
>
> When I receive events from Bro via Broker, they have the prefix of the
> enclosing module:
>
> module Foo;
>
> event foo() { ... }
> event bar() { ... }
>
> When I publish "foo" via Broker, it arrives as "Foo::foo". However, when
> I publish an event "Foo::bar" from Broker, Bro doesn't recognize it. I
> must published it as "bar" - without the module prefix. Is this
> intentional?

Maybe not so much intentional, but expected at this point.  Does the
suggestion [1] to always explicitly scope events by their
namespace/module address your problem?  There's some longstanding
oddities [2] with the way events interact with module namespacing.

- Jon

[1] 
https://www.bro.org/sphinx-git/frameworks/broker.html#a-reminder-about-events-and-module-namespaces
[2] https://bro-tracker.atlassian.net/browse/BIT-71
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Bro 2.6-beta plans

2018-09-12 Thread Jon Siwek

On Wed, Sep 12, 2018 at 9:18 AM Azoff, Justin S  wrote:
>
> Just finished the migration to master across the board, and it's looking 
> REALLY good.

Great, thanks for helping test and provide performance data.

> The manager box in this cluster only runs the manager and logger processes, 
> no proxies.  It also has something like 20 idle cores,
> so this isn't a problem at all, but could affect people who run a 
> cluster-in-a-box.

An idea in this type of situation could be to tune Broker::max_threads
per node type.  E.g. leave at 1 for workers and bump to ~4 for
manager/logger since there's idle cores on their host and they're
inherently in a less-scalable/centralized location.  That may not
lower overall cpu usage, but may help prevent some bottlenecks in the
processing of remote messages.  Particularly the work of processing
data store communication should distribute among threads, potentially
each data store could be processing messages independently on separate
threads.  (The default scripts have 3 stores, one for each known-*
script).

> I do seem to be seeing a bunch of reporter errors like
>
> Reporter::ERROR string with embedded NUL: "\\x00\\x00\\x00\\x00OPTIONS"

Ok, not necessarily that bad, but would be nice to find where that's
coming from to handle it more properly

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] How to use Broker::Data in an event handler?

2018-09-11 Thread Jon Siwek

On Tue, Sep 11, 2018 at 5:52 AM Matthias Vallentin  wrote:

> One more question: how would I capture a default-constructed
> broker::Data() in a case statement? This would happen when I publish
> just "None" on the Python side. In Bro, it shows up on the command
> line as "broker::data{nil}".

There's no nil/null/none type in Bro, so only way I can think to do it
at the moment is:

function is_nil(x: any): bool
{
if ( ! (x is Broker::Data) )
return F;

local d = x as Broker::Data;

if ( ! d?$data )
return T;

if ( cat(d$data) != "broker::data{nil}" )
return F;

return T;
}

Or in switch case, it's like:

case type Broker::Data as d:
print "Broker::Data, expected to be nil", d?$data, d?$data ?
cat(d$data) : "nil";
# or use the same logic from the is_nil() function above

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] switch/case type recognition

2018-09-11 Thread Jon Siwek

On Tue, Sep 11, 2018 at 5:42 AM Matthias Vallentin  wrote:

> I am aware that this is a somewhat pathological case, because 'case type
> any' is probably equivalent to the 'default' case.

A 'vector of any' also qualifies as an 'any', so while the error
message of "duplicate case label" could possibly be improved, I think
it's still correct in that it is meant to prevent an ambiguous case
match.  In your example, if you did pass in a vector to that function,
it could choose either the 'any' case or the 'vector of any' case.
One might have the expectation that it chooses the first case that
matches in top-down order, however that might conflict with other
expectations that switch statements can be implemented via hash table
lookup and require unambiguous cases.  In Bro, regular 'case' is
currently a hash table lookup, while actually 'case type' is a linear
search, but that seems to be just implementation detail and not a
conscious decision that can be relied on to allow ordered-case-type
matching.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] attributes & named types

2018-09-10 Thread Jon Siwek

On Mon, Sep 10, 2018 at 5:08 PM Vern Paxson  wrote:

> > At least that's how I think it's currently working, so are you going
> > start using TypeType as a means of type aliasing in addition to adding
> > attributes to them?
>
> Not quite.  Rather, the type of "mytype" would be a TypeType, which would
> have attributes.  The TypeType instance however would not know that it
> belongs to "mytype" (just as is currently the case).

Ok, think I got it now (and agree it seems like a possible way forward
regarding attributes).

> We could continue to support a call like "foo(mytype)" above by hoisting
> the base type (what the TypeType points to) when evaluating the expression
> "mytype", just as currently the identifier gets turned into a TypeType at
> that point.  That said, just what is the use for calls like "foo(mytype)"
> anyway?  Seems a bit peculiar, but maybe I'm missing something.

The actual usage I know is from Log::create_stream(), like this one:

  Log::create_stream(DPD::LOG, [$columns=Info, $path="dpd"]);

There, the Info is a type that's being stored into the $columns record
field of type 'any'.  It's not exactly the same as "passed as function
argument" example I gave, but same idea.

I think TypeType was added particularly for this logging framework usage.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] attributes & named types

2018-09-10 Thread Jon Siwek

On Mon, Sep 10, 2018 at 4:16 PM Vern Paxson  wrote:

> Seems simplest to me to have TypeType's (and only those) include attributes.
> The rest of it is easy to do from there.  We could also do this with a
> separate NameType, but I don't see what that gains, since TypeType's already
> come into existence because of binding names to types.

My understanding was just that TypeType's currently are *not* used
when creating type aliases.  Example:

# "mytype" is an ID w/ type "count" (i.e. it's a type alias).
# It's not using "TypeType" at this point.
type mytype: count;

function foo(x: any)
{ print x; }

# Passing the type name/alias as a value.
# The local variable/argument 'x' becomes of type
# "TypeType".  It's not of type "count".
foo(mytype);

# Here, 'y' and 'x' are now actually type "count".
# They aren't a "TypeType".
local y: mytype = 3;
foo(y);

At least that's how I think it's currently working, so are you going
start using TypeType as a means of type aliasing in addition to adding
attributes to them?

Just trying to clarify/understand how things currently work vs. what is planned.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] How to use Broker::Data in an event handler?

2018-09-10 Thread Jon Siwek

On Mon, Sep 10, 2018 at 8:01 AM Matthias Vallentin  wrote:
>
> I'm trying to figure out if/how it is possible to use Broker::Data in an
> event handler as follows:
>
> event foo(x: Broker::Data)
>   {
>   print x;
>   }

No, but you can try to use 'any' instead of 'Broker::Data'.  Examples/ideas:

# Bro code

type myvec: vector of any;

type myrec: record {
a: string 
b: count 
c: int 
};

event bar(x: any)
{
switch ( x ) {
case type myrec as r:
print "record", r;
break;
case type string as s:
print "string", s;
break;
case type int as i:
print "int", i;
break;
case type count as c:
print "count", c;
break;
case type myvec as v:
{
print "vector", v;

for ( i in v )
event bar(v[i]);
}
break;
default:
print "got unknown type", x;
break;
}
}

# Python code

endpoint.publish("/test", broker.bro.Event("bar", "one"))
endpoint.publish("/test", broker.bro.Event("bar", 2))
endpoint.publish("/test", broker.bro.Event("bar", broker.Count(3)))
endpoint.publish("/test", broker.bro.Event("bar",
["one", "two", broker.Count(3)]))
endpoint.publish("/test", broker.bro.Event("bar",
["one", None, None]))

> The use case for having a Broker::Data in the Bro event handler is that
> the structure of the data is varying at runtime (similar to JSON).

Should be the same idea if you use the 'any' type along with
appropriate type checking/casting.

> (The code is a slightly adapted version from
> https://github.com/bro/broker/issues/11.)

This, plus a couple other bugs should now be fixed in bro + broker, so
make sure to update both if trying the above examples.

- Jon
On Mon, Sep 10, 2018 at 8:01 AM Matthias Vallentin  wrote:
>
> I'm trying to figure out if/how it is possible to use Broker::Data in an
> event handler as follows:
>
> event foo(x: Broker::Data)
>   {
>   print x;
>   }
>
> I'm trying to send an event via the Python bindings:
>
> event = broker.bro.Event("foo", broker.Data(42))
> endpoint.publish("/test", event)
>
> However, Bro complains:
>
> warning: failed to convert remote event 'foo' arg #0, got integer, 
> expected record
>
> I tried both
>
> event = broker.bro.Event("foo", 42)
>
> and a wrapped version
>
> event = broker.bro.Event("foo", broker.Data(42))
>
> and even
>
> event = broker.bro.Event("foo", broker.Data(broker.Data(42)))
>
> but it seems that nesting is not possible.
>
> The use case for having a Broker::Data in the Bro event handler is that
> the structure of the data is varying at runtime (similar to JSON).
>
> Matthias
>
> (The code is a slightly adapted version from
> https://github.com/bro/broker/issues/11.)
>
> ___
> bro-dev mailing list
> bro-dev@bro.org
> http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Bro 2.6-beta plans

2018-09-07 Thread Jon Siwek

On Fri, Sep 7, 2018 at 3:41 PM Azoff, Justin S  wrote:

> One thing I'm still seeing when I switch from an old version to latest master 
> is that huge spike
> in Content switches/interrupts and cpu spent in the kernel.

I just updated the default tuning parameters for CAF's scheduling
policy and exposed them all in Bro [1] and get within 10% of the
number of context switches that 2.5.5 had in a very simple test case.

Let me know how that goes for everyone.

[1] 
https://github.com/bro/bro/blob/4bd6da71864b4ab68b372954c6268a023d69e52b/scripts/base/frameworks/broker/main.bro#L64-L94
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Bro 2.6-beta plans

2018-09-06 Thread Jon Siwek

On Thu, Sep 6, 2018 at 3:40 PM Azoff, Justin S  wrote:

>8842 
> broker::topic+broker::internal_comma...@u32.bro/known/certs/<$>/data/clone

Thanks, there was an unintended forwarding loop in data store
communication.  It's fixed in master now, but I've also just reverted
to generally disabling the forwarding mechanisms by default (any
specific/advanced use-cases can always choose to selectively enable
it, but the default cluster config doesn't need it for anything
currently).

For anyone doing testing, please pull the latest change from master
and give feedback on that version.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Bro 2.6-beta plans

2018-09-06 Thread Jon Siwek

On Thu, Sep 6, 2018 at 3:14 PM Azoff, Justin S  wrote:


> I tested an almost stock local.bro (a few additional things disabled) and saw 
> the same thing.
>
> fa7fa5aa is fine, but with 452eb0cb everything is working really hard to do 
> something.

Thanks for that, I'll start looking into it, but still would be
helpful if you could try disabling message forwarding (or disable ssl
+ look at some captured traffic to see if you can understand what
might be happening).  Thanks.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Bro 2.6-beta plans

2018-09-06 Thread Jon Siwek

On Thu, Sep 6, 2018 at 2:47 PM Azoff, Justin S  wrote:

> I just got 2 clusters upgraded from
>
> fa7fa5aa to
> 452eb0cb
>
> And now everything is broken..
>
> cpu and memory are through the roof across the board, as well as network 
> traffic, but it's not logging much.
>
> I may have created a message loop replacing the relay_rr stuff, but it's kind 
> of hard to tell.

The recent forwarding changes would be my main suspicion and, at least
in the default scripts, there's no communication patterns that
actually make use of the automatic forwarding, so can you check if
adding "redef Broker::forward_messages = F;" to site/local.bro makes a
difference?

If it does fix things, then yeah, either I missed a forwarding loop in
the default scripts or potentially you introduced one when replacing
relay_rr (feel free to point me at stuff to look over).

(Generally may want to just leave message forwarding turned off due to
these types of dangers if that's what it turns out to be...).

> I guess one observation is that it is really hard to tell what bro/broker are 
> doing.   Before you could minimally
> tcpdump the communication and see what events were being sent back and forth, 
> but now that is encrypted.

You can redef Broker::disable_ssl=T.  I don't recall how readable the
non-encrypted communications are, but I think I did it at least once
or twice and still was able to spot event names.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Bro 2.6-beta plans

2018-09-06 Thread Jon Siwek

On Wed, Sep 5, 2018 at 5:43 PM Michael Dopheide  wrote:
>
> To be clear, Cluster::relay_rr() is gone forever?  I’ll need to rewrite some 
> policies, but also update the blog to be correct.

Yes.

- Jon

___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] attributes & named types

2018-09-06 Thread Jon Siwek

On Wed, Sep 5, 2018 at 6:47 PM Vern Paxson  wrote:
>
> > I think the right solution for this is to introduce a new BroType called
> > a NamedType.  A NamedType has an identifier, an underlying BroType, and a
> > set of attributes.  When a NamedType is constructed ...
>
> Huh, turns out there's already a "TypeType", which is the equivalent.
> All I need to do, I believe, is allow these to have attributes.

My recollection is TypeType is the type that a value gets when the
stored-value is actually just a type and doesn't quite fit what's
needed here.

Take the original example:

type a: table[count] of count  = 5;
type b: a _expire = 1 min;

Declaring a variable of type 'a' or type 'b' doesn't mean that the
values stored in that variable are types themselves, they're still
just storing values of the table-type (but with different attributes
depending on 'a' or 'b').

Other ideas I'm having for solving the overall problem are:

(1) 'a' and 'b' need to actually be distinct BroType's.  Instead of
'b' being a reference/alias to 'a' with extended attributes, just
create it as it's own BroType by first cloning 'a', then adding any
additional attributes.

(2) BroType could somehow store a mapping of identifier -> attributes
so that on declaring a variable of either 'a' or 'b', we can choose
which attributes apply.  But this is relying on the idea that you have
to push the desired attributes into each new Val because you can't
mutate the underlying the BroType since multiple Vals of either type
'a' or 'b' are going to be referencing it.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

[Bro-Dev] Bro 2.6-beta plans

2018-09-05 Thread Jon Siwek

There's no significant code changes/features planned to get added to
the master branch from now until the 2.6-beta gets released (maybe in
about a week).  Until that happens, please help test the latest master
branch and provide any feedback about how it's working if you can.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] [Bro-Commits] [git/bro] master: Allow loading policy/protocols/smb once again (57a505b0e)

2018-08-31 Thread Jon Siwek

On Thu, Aug 30, 2018 at 5:06 PM Johanna Amann  wrote:
>
> To pick up the idea that you mentioned before - do we also want to make
> the new policy/protocols/smb/__load__.bro trigger a reporter warning
> that it is deprecated?

Sounds right -- unlikely it will ever be used in the future and should
be removed (I don't see other policy/protocols/*/__load__.bro scripts,
so I think that's a general convention anyway that got broke in this
case since the intent was to eventually put it in base/).

Other problem is that a simple reporter warning via a call in
bro_init() doesn't give details on where the deprecated script was
loaded from, so I've added a @deprecated directive:

https://bro-tracker.atlassian.net/browse/BIT-1980

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Compatibilty script for policy/protocols/smb?

2018-08-30 Thread Jon Siwek

On Thu, Aug 30, 2018 at 9:50 AM Azoff, Justin S  wrote:

> fatal error in /bro/share/bro/site/local.bro, line 88: can't open 
> /bro/share/bro/policy/protocols/smb/__load__.bro
>
> I see in NEWS we have
>
> - The SMB scripts in policy/protocols/smb are now moved into 
> base/protocols/smb
>   and loaded/enabled by default.
>
> But should there be an empty script in there or something that does a 
> reporter warning telling people to update local.bro?

Thanks for pointing that out.

I'll put a placeholder at the old policy/ location, but also call out
in NEWS that such @loads can be removed from local.bro or other custom
scripts.

Or let me know if there's other ideas.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] [Bro-Commits] [git/bro] topic/johanna/tls-more-data: Update NEWS for ssl changes. (3c7c60cf6)

2018-08-29 Thread Jon Siwek

On Wed, Aug 29, 2018 at 11:02 AM Johanna Amann  wrote:

> I actually tested it - and it works fine with old versions as long as
> you use the @if this way round.

Ah, tricky.  I can see how that would work now, thanks for clarifying.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] [Bro-Commits] [git/bro] topic/johanna/tls-more-data: Update NEWS for ssl changes. (3c7c60cf6)

2018-08-29 Thread Jon Siwek

On Tue, Aug 28, 2018 at 6:35 PM Johanna Amann  wrote:

> +  If you use these events, you can make your scripts work on old and new 
> versions
> +  of Bro by wrapping the event definition in an @if, for example:
> +
> +@if ( Version::at_least("2.6") || ( Version::number == 20500 && 
> Version::info$commit >= [commit number of change] ) )
> +event ssl_client_hello(c: connection, version: count, record_version: 
> count, possible_ts: time, client_random: string, session_id: string, ciphers: 
> index_vec, comp_methods: index_vec)
> +@else
> +event ssl_client_hello(c: connection, version: count, possible_ts: time, 
> client_random: string, session_id: string, ciphers: index_vec)
> +@endif

Since the parser won't be happy with that type of @if usage in old
releases due to [1], should we instead suggest something like:

function my_ssl_client_hello_impl(c: connection, version: count,
possible_ts: time, client_random: string, session_id: string, ciphers:
index_vec, record_version: counter =0, comp_methods: index_vec
=index_vec())
{
# Copy existing code to here
}

@if ( Version::at_least("2.6") || ( Version::number == 20500 &&
Version::info$commit >= [commit number of change] ) )
event ssl_client_hello(c: connection, version: count, record_version:
count, possible_ts: time, client_random: string, session_id: string,
ciphers: index_vec, comp_methods: index_vec)
{ my_ssl_client_hello_impl(c, version, possible_ts, client_random,
session_id, ciphers, record_version, comp_methods); }
@else
event ssl_client_hello(c: connection, version: count, possible_ts:
time, client_random: string, session_id: string, ciphers: index_vec)
{ my_ssl_client_hello_impl(c, version, possible_ts, client_random,
session_id, ciphers); }
@endif

- Jon

[1] https://bro-tracker.atlassian.net/browse/BIT-1976
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Jira filter results

2018-08-28 Thread Jon Siwek

On Tue, Aug 28, 2018 at 12:48 PM Johanna Amann  wrote:

> "The filter configured for this gadget could not be retrieved. Please
> verify it is still valid on the issue navigator.".

Should be showing merge requests again.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] libmaxminddb configure issue

2018-08-23 Thread Jon Siwek

Thanks, does look like these wouldn't work as intended for CMake <
3.3, but I've merged Daniel's patch in to master now.

- Jon
On Thu, Aug 23, 2018 at 3:36 PM Michael Dopheide  wrote:
>
> Yeah, I just figured that out myself and rebuilt...
>
> bro -e "print lookup_location(8.8.8.8);"
> [country_code=US, region=, city=, 
> latitude=37.751, longitude=-97.822]
>
> Looks like you'll have the same issue with LibKRB5_FOUND (I didn't look for 
> others).
>
> -Dop
>
> On Thu, Aug 23, 2018 at 3:10 PM, Thayer, Daniel N  
> wrote:
>>
>> Could you try the following patch and let me know if it works for you:
>>
>> --- a/CMakeLists.txt
>> +++ b/CMakeLists.txt
>> @@ -134,7 +134,7 @@ include_directories(BEFORE
>>
>>  set(USE_GEOIP false)
>>  find_package(LibMMDB)
>> -if (LibMMDB_FOUND)
>> +if (LIBMMDB_FOUND)
>>  set(USE_GEOIP true)
>>  include_directories(BEFORE ${LibMMDB_INCLUDE_DIR})
>>  list(APPEND OPTLIBS ${LibMMDB_LIBRARY})
>>
>>
>> --
>>
>> From: bro-dev-boun...@bro.org [bro-dev-boun...@bro.org] on behalf of Michael 
>> Dopheide [dophe...@es.net]
>>
>> Sent: Thursday, August 23, 2018 2:16 PM
>>
>> To: 
>>
>> Subject: [Bro-Dev] libmaxminddb configure issue
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Johanna mentioned to me that libmaxminddb should be working now in master...
>>
>>
>>
>>
>>
>> So far I haven't been able to get 'configure' to find it, neither with the 
>> OS packages nor by installing libmaxminddb in /usr/local/ and specifying 
>> --with-geoip.
>>
>>
>>
>>
>>
>> This is CentOS 7.5.
>>
>>
>>
>>
>>
>> -Dop
>>
>>
>>
>>
>>
>>
>>
>
> ___
> bro-dev mailing list
> bro-dev@bro.org
> http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Broker data layouts

2018-08-23 Thread Jon Siwek

On Thu, Aug 23, 2018 at 8:32 AM Dominik Charousset
 wrote:

> I’m a bit hesitant to rely on this header at the moment, because of:
>
> /// A Bro log-write message. Note that at the moment this should be used only
> /// by Bro itself as the arguments aren't publicly defined.
>
> Is the API stable enough on your end at this point to make it public?

The comment is just pointing out what was said about the log message
formats being opaque at the moment.  It's expected only Bro will be
able to make sense of the content.

> Also, there are LogCreate and LogWrite events. The LogCreate has the 
> `fields_data` (a list of field names?).

Yeah, there's some field info in there: names, types, optionality.
The type info in particularly doesn't seem good to treat as intended
for public consumption.

> Does that mean I need to receive the LogCreate even first to understand 
> successive LogWrite events? That would mean I cannot parse logs that had 
> their LogCreate event before I was able to subscribe to the topic.

Yeah, that's one problem, but a bigger issue is you can't parse
LogWrite because the content is a serial blob whose format is another
thing not intended for public consumption.

- Jon

___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Broker data layouts

2018-08-21 Thread Jon Siwek

On Tue, Aug 21, 2018 at 1:09 PM Robin Sommer  wrote:

> Also, this question is about events, not logs, right? Logs have a
> different wire format and they actually come with meta data describing
> their columns.

Though the Broker data corresponding to log entry content is also
opaque at the moment (I recall that was maybe for performance or
message volume optimization), but I suppose same reasoning as before
could apply: this info is internal to Bro operation unless one wants
to explicitly re-publish it via their own event for external
consumption.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Broker data layouts

2018-08-21 Thread Jon Siwek

On Tue, Aug 21, 2018 at 8:54 AM Dominik Charousset
 wrote:

> This raises a couple of questions. Primarily: where can Broker users learn 
> the layouts to interpret received data?

broker/bro.hh is basically all there is right now.  e.g. if you
construct a broker::bro::Event from a received broker::data, you get
access to event name + "interpretable" arguments.

> There’s essentially no hope in deferring a layout, since broker::data doesn’t 
> include any meta information such as field names. Is meta information stored 
> somewhere?

No, nothing like that is in implicit in message content.

> Is there a convention how to place and retrieve such meta information in 
> Broker’s data stores?

No, and any stores created by Bro don't even have such meta info.  The
types stored in them are just documented like "the keys are type Foo
and values are type Bar".

> How does Bro itself make such information available?

Nothing beyond documentation or the Bro -> Broker type mapping that's
implicit in events themselves (or as given in docs for data stores).

> Is there a document that lists all topics used by Bro with their respective 
> broker::data layout?

I don't think there's a plan to keep such an up-to-date document.

A basic usage premise that I'm wondering about is that none of the
current Broker usage in Bro actually seems suitable for generic/public
consumption as it is.  It's maybe more implementation details of doing
cluster-enabled network traffic analysis, so also not a primary goal
to make interpretation of those communications easy for
external/direct Broker users.  (You can ingest it if you want and do
the work of "manually" interpreting it all, but maybe won't be a
stable/transparent source of data going forward).

However, one can still use Bro + Broker to create their own
events/stores in a way that does contain the meta information required
for easier/programmatic interpretation on the receiving side.  e.g. I
think, at the moment, if one is interested in ingesting data produced
by Bro, they are best served by explicitly defining topic names,
event/data types, and explicitly producing those messages at suitable
places within Bro scripts themselves.  Then, one can be in control of
defining a common/expected data format and include whatever meta
information is necessary to help receivers interpret the data.

Maybe there's a more standardized approach that could be worked
towards, but likely we just need more experience in understanding and
defining common use-cases for external Bro data consumption.

Or if we were just talking about Broker-only usage independent of Bro,
then I think it's still the same ideas/answers: currently up to user
to decide how to encode broker::data in a way that defines
common/expected layouts + any required meta info.

Does that help at all?

- Jon

___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Use of 'any' type

2018-08-16 Thread Jon Siwek

In the master branch, there are also type checking/casting 'is' and
'as' operators [1] and type-based switch statement [2] that may be be
useful.

- Jon

[1] https://www.bro.org/sphinx-git/script-reference/operators.html
[2] 
https://www.bro.org/sphinx-git/script-reference/statements.html#keyword-switch

On Thu, Aug 16, 2018 at 4:24 PM Jim Mellander  wrote:
>
> Thanks, Johanna - I think type_name() may suffice for the purposes I am 
> envisioning.
>
> On Thu, Aug 16, 2018 at 1:57 PM, Johanna Amann  wrote:
>>
>> Hi Jim,
>>
>> On 16 Aug 2018, at 13:40, Jim Mellander wrote:
>>
>>> It would be most convenient if the 'any' type could defer type checking
>>> until runtime at the script level.
>>>
>>> For instance, if both A & B are defined as type 'any', a compile time error
>>>
>>> "illegal comparison (A < B)"
>>>
>>> occurs upon encountering a bro statement
>>>
>>> if (A < B) do_something();
>>>
>>> even if the actual values stored in A & B at runtime are integral types for
>>> which comparison makes sense.
>>
>>
>> I think this is a bit hard to do with how things are set up at the moment 
>> internally - and it also does make type-checking at startup less 
>> possible-helpful.
>>
>> However...
>>
>>>
>>> If the decision could be made at runtime (which could then potentially
>>> throw an error), a number of useful generic functions could be created at
>>> the script level, rather than creating yet-another-bif.  A useful
>>> yet-another-bif would be 'typeof' to allow varying code paths based on the
>>> type of value actually stored in 'any'.
>>
>>
>> This already exists and I think you can actually use it to write code like 
>> that; you just have to cast your any-type to the correct type first. The 
>> function you want is type_name; it is e.g. used in base/utils/json.bro.
>>
>> Johanna
>
>
> ___
> bro-dev mailing list
> bro-dev@bro.org
> http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] reproducible segfault in master branch

2018-08-15 Thread Jon Siwek

On Tue, Aug 14, 2018 at 10:26 PM Michael Dopheide  wrote:
>
> Somehow related to Broker stores and/or casting.

You'll get a better error message/behavior now using:

https://github.com/bro/bro/commit/f336c8c710bdeb41eb0aba88967ee90da24848b2

But ultimately, you likely want to do something like this patch:

```
--- known-hosts-with-dns.bro.orig 2018-08-15 11:07:41.0 -0500
+++ known-hosts-with-dns.bro 2018-08-15 10:44:03.0 -0500
@@ -113,7 +113,7 @@
  for (ip in r$result as addr_set){
  when ( local res = Broker::get(Known::host_store$store,ip)){

- if(res?$result){
+ if(res$status == Broker::SUCCESS){
  @if ( ! Cluster::is_enabled() )
  Known::hosts[ip] = fmt("%s",res$result as string);
  @else
```

As for why some keys no longer exist in those lookups immediately
after retrieving the full key set: my guess is they simply expired
between those two points in time, but I didn't dig into it.  The main
point would be to never assume the Broker::get() call succeeds, which
was likely your intent, except "res?$result" is always true (another
form of checking the data exists would be "res$result?$data").

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Broker::publish API

2018-08-14 Thread Jon Siwek

On Tue, Aug 14, 2018 at 12:09 PM Jan Grashöfer  wrote:
>
> On 13/08/18 18:24, Jon Siwek wrote:
> > Old Worker:
> >
> >Cluster::relay_rr(Cluster::proxy_pool, my_event);
> >
> > New Worker:
> >
> >Broker::publish(Cluster::rr_topic(Cluster::proxy_pool), my_event);
>
> That doesn't look like an API simplification to me ;D

The goal here I imagine is rather to avoid releasing a function that
we knowingly plan to remove later.  A user would have to eventually
port all Cluster::relay_rr() calls, but that Broker::publish() pattern
remains valid.

> > Even Newer Worker:
> >
> >Broker::publish(Cluster::worker_topic, my_event);
> >
> > See any problems there?
>
> For this case: Would it be easy to setup distinct pools for different
> tasks? I could imagine a pool of proxies that is used explicitly for
> intel distribution and one pool used for processing SumStats events. I
> think we have discussed something like that before.

Yeah, it would still be possible to define your own pool and use it
for your own purposes and it looks similar to the call before:

 Broker::publish(Cluster::rr_topic(Cluster::my_pool), my_event);

A difference in the context of our needs for the cluster communication
is that the pool is being used as a means of achieving routing (in a
load-balanced fashion) and so the call gets simplified once those
mechanisms get built into Broker routing.  In your case, you don't
need the routing aspect, just the load-balancing provided by the
"pool" concept.

- Jon

___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Broker::publish API

2018-08-14 Thread Jon Siwek

On Tue, Aug 14, 2018 at 10:13 AM Robin Sommer  wrote:

> One more question: what about raising published events locally as well
> if the sending node is subscribed to the topic? I'm kind of torn on
> that. I don't think we want that as a default, but perhaps as an
> option, either with the publish() call or, likely better, with the
> subscribe() call? I can see that being helpful in cases like unifying
> standalone vs cluster operation; and more generally, for running
> multiple node types inside the same Bro instance.

Not sure, is Broker::auto_publish() currently able to do the same thing?

e.g. if I want an event to be raised locally, I raise it via "event"
and it automatically gets published.

I can also see the opposite being intuitive: If I told
Broker::subscribe() to raise locally, then I get just always use
Broker::publish() and not think about the difference between using
"event" versus "publish".  Would Broker::auto_publish() be removable
then?

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Writing SumStats plugin

2018-08-13 Thread Jon Siwek

On Tue, Aug 7, 2018 at 5:15 PM Jim Mellander  wrote:

> Incidentally, I think theres a bug in the observe() function:
>
> These two lines are run in the loop thru the reducers:
>if ( r?$normalize_key )
> key = r$normalize_key(copy(key));
> which has the effect of modifying the key for subsequent loops, rather than 
> just for the one reducer it applies to.  The fix is easy and and obvious

Yeah, looked wrong to me also.  Fixed via [1] in master branch now.
Sorry I don't have much knowledge of the existing sumstats code to
drive the other discussion/suggestions forward.

- Jon

https://github.com/bro/bro/commit/5821c16490e731a68c0efc9c1aaba2d7aec28f48
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Broker::publish API

2018-08-13 Thread Jon Siwek

On Mon, Aug 13, 2018 at 8:09 AM Jan Grashöfer  wrote:
>
> On 10/08/18 17:12, Robin Sommer wrote:
> > I hear you, but I think I haven't quite understood the concern yet.
> > Can you give me an example where the difference matters? What's
> > different between publishing intel events to bro/cluster/worker/intel
> > vs bro/cluster/worker if both go to all workers? Or is it so that some
> > workers can decide not to receive the intel events?
>
> The use case I had in my mind is an external application that is
> interested in interfacing with the intelligence framework. Either for
> querying it similar to workers of for managing purposes. If possible, it
> could be beneficial for such an application to receive only the relevant
> parts of cluster communication.
>
> On 10/08/18 17:52, Jon Siwek wrote:
> > (1) if the event you're publishing just facilitates scalable cluster
> > analysis: you'd tend to use the topic names which target node classes
> > within a cluster (eventually this might be "bro//worker")
> >
> > (2) if the event you're publishing is intended for external
> > consumption, then you should use a topic which describes some specific
> > qualities of the message (e.g. "jan/intel")
>
> The case described above seems to be both. On the one hand the primary
> use case is internal cluster communication. On the other hand it feels
> quite natural to dock here for external applications. Another
> (debatable) use case might be directly interfacing the configuration
> framework, skipping the configuration file layer.

I'm generally thinking there's nothing stopping one from picking a new
topic name to re-publish some set of events under.  Would that be
possible in the case you're imagining?

I don't think we're going to come up with a general (or enforce-able)
way of picking topic names such that they'll be useful for any
arbitrary, external use-case.  So we pick the topic name that is best
for the use-case we have at time of writing a script (e.g. we just
want to get it working on a cluster so use the pre-existing topics
that are available for that), and then let others re-publish a subset
of events under different topics dependent on their specific use-case.

- Jon

___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Broker::publish API

2018-08-13 Thread Jon Siwek

On Fri, Aug 10, 2018 at 11:47 AM Azoff, Justin S  wrote:

> If relay is removed how does a script writer efficiently get an event from 
> one worker (or manager)
> to all of the other workers?

Old Worker:

  Cluster::relay_rr(Cluster::proxy_pool, my_event);

New Worker:

  Broker::publish(Cluster::rr_topic(Cluster::proxy_pool), my_event);

New Proxy:

  event my_event() { Broker::publish(Cluster::worker_topic, my_event); }

So the proxy has additional overhead of the proxy's event handler.  I
doubt that's much a problem from the "efficiency" standpoint, but if
it were, then just having more proxies helps.  Once real routing were
available the code would still work or you could opt to change to
just:

Even Newer Worker:

  Broker::publish(Cluster::worker_topic, my_event);

See any problems there?

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Broker::publish API

2018-08-10 Thread Jon Siwek

On Fri, Aug 10, 2018 at 8:29 AM Jan Grashöfer  wrote:

> > Let me try to phrase it differently: If there's already a topic for a
> > use case, it's better to use it. That's easier and less error-prone.
> > So if, e.g., I want to send my script's data to all workers,
> > publishing to bro/cluster/worker will do the job. And that will even
> > automatically adapt if things get more complex later.
>
> Maybe a silly question: Would that work using further "specialized"
> topics like bro/cluster/worker/intel? From my understanding one feature
> of topics is that one would be able to subscribe only the the things
> that one is interested in. Having a bunch of events just published to
> bro/cluster/worker seems counterproductive.

Yeah, topic use-cases may need clarification.  There's one desire to
use topics as a way to specify known destination(s) within a cluster.
Another desire could be using the topic name to hierarchically
summarize/describe a quality of the message content in order to share
with the external world.  Maybe the thing that's currently unclear is
what the intended borders are for information sharing?  I break it
down like:

(1) if the event you're publishing just facilitates scalable cluster
analysis: you'd tend to use the topic names which target node classes
within a cluster (eventually this might be "bro//worker")

(2) if the event you're publishing is intended for external
consumption, then you should use a topic which describes some specific
qualities of the message (e.g. "jan/intel")

Events that fall under (1) don't need to be descriptive since we don't
want to encourage people to arbitrarily start subscribing to events
that act as the details for how cluster analysis is implemented.  Or I
guess if they do subscribe, then they are the kind of person that's
more interested in inspecting the cluster's performance/communication
characteristics anyway.

I'd also say that (2) is a user decision -- they need to be the one to
decide if their cluster has produced some bit of information worthy of
sharing to the external world and then publish it under a suitable
topic name.

That make sense?

- Jon

___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Broker::publish API

2018-08-10 Thread Jon Siwek

On Thu, Aug 9, 2018 at 1:29 PM Robin Sommer  wrote:

> > (1) enable the "explicit/manual" forwarding by default?
>
> Coming from that assumption above, I'd say yes here, doing it like you
> suggest: differentiate between forwarding and locally raising an event
> by topic. Maybe instead of adding it to Broker::subscribe() as a
> boolean, we add a separate "Broker::forward(topic_prefix)" function,
> and use that to essentially hardcode forwarding on each node just like
> we want/need for the cluster. Behind the scenes Broker could still
> just store the information as a boolean, but API-wise it means we can
> later (once we have real routing) just rip out the forward() calls and
> let Magic take its role. :)

Not sure there'd be anywhere we'd currently use Broker::forward() ?
Or is it a matter of "if a user needed it for something, then it's
available" ?

The only intra-cluster communication that's more than 1 hop at the
moment is worker-worker, but setting up a Broker::forward() route
wouldn't be my first thought as it's not currently a scalable
approach.  I'd instead take the cautious approach of relaying via a
RR-proxy so one can add proxies to handle more load as needed.

However, I can see Broker::forward() could make it a bit easier for a
user wanting to manually set up a forwarding route between clusters or
other external applications.  Is that a clear use-case we need to
cater to now?  If so, then it would indeed be just saying "hey,
Broker::forward() is now a no-op since Broker has real routing
mechanisms and you can remove them".

> As you say, we don't get load-balancing that way (today), but we still
> have pools for distributing analyses (like the known-* scripts do).
> And if distributing message load (like the Intel scripts do) is
> necessary, I think pools can solve that as well: we could use a RR
> proxy pool and funnel it through script-land there: send to one proxy
> and have an event handler there that triggers a new event to publish
> it back out to the workers. For proxies, that kind of additional load
> should be fine (if load-balancing is even necessary at all; just going
> through a single forwarding node might just as well be fine.

Seems more prudent not to guess whether a single, hardcoded forwarding
node is good enough when writing the default cluster-enabled scripts.
RR via proxy is not just load-balancing either, but fault-tolerance as
well.

But here you're talking more about removing the relay() functions and
doing the RR-via-proxy "manually", right?  That seems ok to me -- once
"real" routing is available, you then have the option to simplify your
script and get a minor optimization by not having to manually
handle+forward the event on proxies.

> > (2) re-implement any existing subscription cycles?
>
> Now, here I'm starting to change my mind a bit. Maybe in the end, in
> large topologies, it would be futile to insist on not having cycles
> after all. The assumption above doesn't care about it, putting Broker
> in charge of figuring it out. So with that, if we can set up
> forwarding through (1) in a way that cycles in subscriptions don't
> matter, it may be fine to just leave them in. But I guess in the end
> it doesn't matter, removing them can only make things better/easier.

Again, I think we wouldn't have any Broker::forward() usages in the
default cluster setup, but simply enabling the forwarding of messages
at the Broker-layer would currently cause some messages to route in a
cycle.  Enabling the current message forwarding means we need to
re-implement existing subscription cycles.  If we instead waited for
the "real" routing, then it doesn't matter if we leave them in.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Broker::publish API

2018-08-09 Thread Jon Siwek

On Wed, Aug 8, 2018 at 2:50 PM Robin Sommer  wrote:

> > * enable message forwarding by default (meaning re-implement the one
> > or two subscription patterns that might create a cycle)
>
> Haven't quite made up my mind on this one. In principlel yes, but
> right now a host needs to be subscribed to a topic to forward it if I
> remember than right. That may limit how we use topics, not sure (e.g.,
> if a worker wanted to talk to other workers, with "real"
> forwarding/routing they'd just publish to the worker topic and that
> message would get routed there, but not be processed at the
> intermediary hops as well. With our current forwarding, the hops would
> need to subscribe to the worker topic as well and hence the event got
> raised there, too.)

Yeah, that's how I also understand the current mechanisms would work.

Maybe can split it into two separate questions:

(1) enable the "explicit/manual" forwarding by default?
(2) re-implement any existing subscription cycles?

Answer to (2) may pragmatically be "yes" because they'd be known to
cause problems if ever (1) did become enabled (and also could be
problematic for a more sophisticated/automatic/implicit routing system
should that become available in the future... at least I think it's a
problem, but then again maybe connection-cycles would also still be a
problem at that point, not quite sure).

Answer to (1) may be "no" because we don't have a use for it at the
moment -- having the forwarding-nodes also raise events is not ideal,
but if we solved that would it be useful?  Maybe an idea would be
extend the subscribe() API in Bro:

function Broker::subscribe(topic_prefix: string, forward_only:
bool =F);

I recall that we have access to both the message/event as well as the
topic string on the receiver side, so could be possible to detect
whether or not to raise the event depending on whether the topic only
has a matching subscription prefix that is marked as forward_only.

With that you could do something like:

# On Manager
Broker::subscribe(worker_to_worker_topic, T);

# On Worker
Broker::subscribe(worker_to_worker_topic);
Broker::publish(worker_to_worker_topic, my_event);

There, my_event would be distributed from one worker to all workers
via the manager, but not sure that's as usable/dynamic as the current
"relay" mechanism because you also get a load-balancing scheme to go
along with it.  Here, you'd only ever want to pick a single manager or
proxy to do the forwarding (subscribing like this on all proxies
causes all proxies to forward to all workers resulting in undesired
event duplication.)

So I guess that's still to say I'm not sure what the use of the
current forwarding mechanism would be if it were enabled.  Also maybe
begs the question for later regarding the "real" routing mechanism: I
suppose that would need to be smart enough to do automatic
load-balancing in the case of there being more than one route to a
subscriber.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Broker::publish API

2018-08-08 Thread Jon Siwek

On Wed, Aug 8, 2018 at 10:55 AM Robin Sommer  wrote:

> That's actually something I realized yesterday: we don't have direct
> worker-to-worker communication right now, correct? A worker cannot
> just publish to "bro/cluster/workers".

Right, here's a crude graphic of the cluster layout from the docs:

https://github.com/bro/bro/blob/master/doc/frameworks/broker/cluster-layout.png

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Broker::publish API

2018-08-08 Thread Jon Siwek

On Wed, Aug 8, 2018 at 10:53 AM Robin Sommer  wrote:
>
> Yeah, I realize that. A direct port of the old logic was of course the
> goal so far, with the drawbacks of that approach accepted &
> understood. That's what's in place now; that's great and exactly as
> planned. We can get 2.6 out this way, and it'll be fine.

I'm earnestly probing to try to get a better decomposition of the
issues that make it hard to understand cluster communication patterns.

There's the exercise of trying to answer "what *is* this script
doing?" and then there's also trying to answer "what *should* it be
doing?".

I seldom felt like I had definitive answers for the later, but I can
see how it would be beneficial to do that and also broader
script/framework makeovers, possibly before 2.6, because it would help
inform whether new APIs are catering to "good" use-cases.  Though my
thinking is it's not critical to get a 100% API/use-case match off the
bat and that there's some actionable stuff to take away from this
thread that is at least going to have us heading in a better direction
sooner rather than later...

> My point is that now also seems like a good time to take stock of what
> we got that way. That direct porting is finally getting us some sense
> of where things aren't an ideal match between API and use cases yet.
> And if there's something easy we can do about that before people start
> relying on the new API, it seems that would be beneficial to do. But
> we can see.

Yeah, agreed.  What I've taken away from your earlier points is that
these smaller changes are seeming like they'd be beneficial to do
before 2.6:

* publish() API simplifications/compressions (pending decision on
exactly what those should be)
* enable message forwarding by default (meaning re-implement the one
or two subscription patterns that might create a cycle)
* see if any script-specific topics can instead use a pre-existing
"cluster" topic

What do you think?

A separate question/idea I just had was whether how much of the
process of auditing the subscriptions and communication patterns was
difficult due to having to hunt down things in various scripts and
whether a more centralized config could be something to do?  e.g. I
don't know how the details would work out, but I'm imagining a
workflow where one edits a centralized config file with
subscription/node info in it and that auto-generates the code to set
them up.  Sort of like working backward from the info in the PDF you
shared.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Broker::publish API

2018-08-07 Thread Jon Siwek

On Mon, Aug 6, 2018 at 3:00 PM Robin Sommer  wrote:

> Overall I have to say I found it pretty hard to follow this all
> because we don't have much consistency right now in how scripts
> structure their communication. That's not surprising, given that we're
> just starting to use all this, but it suggests that we have room for
> improvement in our abstractions. :)

How much is due to new API usage and how much is due to things mainly
being a direct port of old communication patterns (which I guess are
written by various people over extended lengths of time and so there's
inconsistencies to be expected) ?  Or due to being a mishmash of both
old and new?

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Broker::publish API

2018-08-07 Thread Jon Siwek

On Mon, Aug 6, 2018 at 1:57 PM Robin Sommer  wrote:

> I have another question about this specific case: we use relay_rr()
> only for sending Intel::insert_indicator. Intel::remove_indicator gets
> published normally through auto_publish(). Why the difference?

Potentially no reason other than no one reviewed whether it had
potential to be optimized in a similar way.  e.g. I first ported
scripts in a direct fashion without trying to change too much
structurally about comm. patterns or doing any optimization except in
cases where a change was specifically talked about.  I only recall
Justin had called out Intel::insert_indicator, so it got changed.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Broker::publish API

2018-08-03 Thread Jon Siwek

On Fri, Aug 3, 2018 at 12:22 PM Robin Sommer  wrote:

> On Fri, Jul 27, 2018 at 10:39 -0700, I wrote:
>
> > Broker::relay(change_topic, change_topic, Config::cluster_set_option, ID, 
> > val, location);
>
> Can somebody remind me what the use-case is for changing the topic on
> relay? Grepping over our standard scripts, I see only one use of
> relay(), and that's the one above.

Another use is hidden within Cluster::relay_rr():

event Intel::new_item(item: Item) =5
{
if ( Cluster::proxy_pool$alive_count == 0 )
Broker::publish(indicator_topic, Intel::insert_indicator, item);
else
Cluster::relay_rr(Cluster::proxy_pool, "Intel::new_item_relay_rr",
  indicator_topic, Intel::insert_indicator, item);
}

That is, if the manager is currently connected to some proxy, it picks
one to do the work of distributing the event to workers.  Manager
sends 1 message instead of N.

I don't know if there's currently other use-cases for Broker::relay
specifically, but Cluster::relay_rr/Cluster::relay_hrw is essentially
an extension of that which just also does the work of choosing the
initial topic based upon a given pool and partition strategy.

Might have been Justin who originally pointed out potential for
avoiding manager overload in this way.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Broker::publish API

2018-07-30 Thread Jon Siwek

On Mon, Jul 30, 2018 at 12:16 PM Robin Sommer  wrote:

> > I'd be more comfortable if one could automate answering the question:
> > "if I add a subscription to a given node in the network, will I create
> > a cycle?".
>
> Hmm ... What about a test mode where we'd spin up a dummy cluster
> (similar to what the bests do), have each node send a message to all
> subscribed topics, and watch for TTL violations?

Seems clunky and could get dicey -- subscriptions that
transient/dynamic may not be well-tested and you'd probably want to
guarantee that sending such a dummy message actually does not result
in any side-effects at the Bro-layer.  If nodes start raising random
events at unusual/unintended times I start to doubt the stability of
things.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Broker::publish API

2018-07-30 Thread Jon Siwek

On Mon, Jul 30, 2018 at 11:02 AM Robin Sommer  wrote:

> True, although it's not cycles in the connection topology that matter,
> it's cycles in topic subscriptions.

Right, good point.

> I need to think about this a bit
> more (and I need to remind myself how our topics currently look like)

I think we just have the "broadcast_topic" to which all nodes
subscribe, but not sure if there's more.

> but could we set up topics so that even in a cluster, messages don't
> go into a cycle?

I don't see why not, but it takes planning and prudence on everyone's
part (including users) to not break that rule.

I'd be more comfortable if one could automate answering the question:
"if I add a subscription to a given node in the network, will I create
a cycle?".

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Broker::publish API

2018-07-30 Thread Jon Siwek

On Fri, Jul 27, 2018 at 7:30 PM Azoff, Justin S  wrote:
>
>
> > On Jul 27, 2018, at 6:10 PM, Jon Siwek  wrote:
> >
> > On Fri, Jul 27, 2018 at 3:55 PM Azoff, Justin S  wrote:
> >
> >> I do agree that there's room for a lot of simplification, for example a 
> >> worker broadcasting a message efficiently to all
> >> other workers needs to do something like this from the docs:
> >>
> >>Cluster::relay_rr(Cluster::proxy_pool, "example_key",
> >>  Cluster::worker_topic, worker_to_workers,
> >>  Cluster::node + " (via a proxy)");
> >>
> >> But a lot of that could have defaults:
> >>
> >> Most use cases would want to relay through the default proxy pool
> >> Since round robin is in use, they key shouldn't matter.
> >
> > At the moment, one could write their own wrapper function around that
> > if they find it too verbose and always want to use certain defaults?
>
> Yeah.. The wrapper would be trivial.. Should bro include it so that the API 
> scripts use is simpler?

Maybe.  We can see how it fits in the mix of what Robin suggested:

  # Supports variadic args in place of Broker::Event.
  Broker::publish(topic: string, args: Broker::Event, relay_topic:
string ="", process_on_relayer: bool =F)

  # Supports variadic args in place of Broker::Event.
  Cluster::publish(pool: Cluster::pool, key: any, strategy: enum,
args: Broker::Event, relay_topic: string ="",
process_on_relayer: bool =F)

  # Supports variadic args in place of Broker::Event.  Use proxy pool
and RR method w/ arbitrary, internal key by default.
  Cluster::publish_via_proxy(relay_topic: string, args: Broker::Event)

That last one being the wrapper you're asking for.  Also, I compressed
the ideas of having a separate "relay: bool" / "relay_hops: int" and
"relay_topic: string" args -- a non-empty relay topic implicitly means
you want to enable relaying on the receiving node.  Thinking more
about original idea of giving the number of relay hops: it may be
better to leave that until Broker multihop is more robust and allow
it's automatic forwarding mechanisms to take care of those scenarios
whereas a "relay" is a simple mechanism at the Bro application level
(has it's own unique message format) that serves to aid load-balancing
use-cases (rather than routing use-cases).

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Broker::publish API

2018-07-27 Thread Jon Siwek

On Fri, Jul 27, 2018 at 3:55 PM Azoff, Justin S  wrote:

> I do agree that there's room for a lot of simplification, for example a 
> worker broadcasting a message efficiently to all
> other workers needs to do something like this from the docs:
>
> Cluster::relay_rr(Cluster::proxy_pool, "example_key",
>   Cluster::worker_topic, worker_to_workers,
>   Cluster::node + " (via a proxy)");
>
> But a lot of that could have defaults:
>
> Most use cases would want to relay through the default proxy pool
> Since round robin is in use, they key shouldn't matter.

At the moment, one could write their own wrapper function around that
if they find it too verbose and always want to use certain defaults?

> The round robin part itself is really an implementation detail for proxy load 
> balancing and maybe not something that
> should be exposed in the API.  Now that I think of it I'm not sure why one 
> would ever use relay_hrw over relay_rr.

Theoretically, a more favorable load distribution that's consistent
over time?  e.g. if you do RR of the same messaging pattern from
multiple nodes, you could have waves of "randomly" overlapping loads
on the relayer-node since everyone is cycling through all the proxies
at their own rate when choosing the relayer.  With HRW, you'd stick
with the same relayer over time and only change on outages, but
everyone should have chosen their relayer in a uniformly distributed
fashion.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Broker::publish API

2018-07-27 Thread Jon Siwek

On Fri, Jul 27, 2018 at 12:40 PM Robin Sommer  wrote:

> I'm wondering if we should give it another try to simply this API
> while we still can (i.e., before 2.6 goes out). To me, the most
> intuitive publish operation is "send to topic T and propagate to
> everybody subscribed to that topic". I'd structure the API around
> that, making that the main publish function for that simply:
>
> Broker::publish(topic, args);
>
> That would send to all neighbors, which then process locally and relay
> to their neighbors. Right now, that would propagate just across one
> hop but once we have multihop that'd start being broadcasted out
> broadly.

Can you remind/clarify what's meant by "multihop" ?  I thought:

Broker already has manual multihop if you set up subscriptions on all
relevant nodes on the path yourself.  Bro doesn't use it right now.

Broker does not yet have automatic multihop where subscriptions are
globally flooded automatically.

A difference between "manual multihop" and "automatic multihop" would
be that in the later, some relaying nodes may not actually hold a
subscription to the message they are relaying and so, in the case of
Bro events, I think they would not process them locally.

> - Give publish() another argument "relay: bool =T" to prevent
>   it from going beyond the immediate receiver. Or maybe instead:
>   "relay_hops: int =-1" to specify the max number of hops
>   to relay across, with -1 meaning no limit.

Going with the generalized approach of configurable number of hops per
message via "relay_hops" from the start would be better than finding
out we need it later.

Possibly a downside is now you need to store original hop limit in
addition to current TTL in each message if you want to detect the "is
1st hop" condition for the "relay_topic" option below.

> . (I recall concerns
>   about loops being too easy to create; we could set the default
>   here to F/0 to default to no forwarding, although conceptually I
>   don't really like that :-)

It's maybe both a concern and a reality -- Bro clusters currently
contain cycles (e.g. worker -> manager -> proxy -> worker)

> - Give publish() another argument "relay_topic: string =""
>   to change the topic when relaying on the 1st hop.
>
> - Give publish() another argument "process_on_relays: bool =T"
>   to change whether a relaying hop also sees the event locally.

Those seem fine to me.

> - Add a second function publish_pool() that has all the same
>   options, but receives a pool type instead of a topic (just an
>   enum: RR, HRW).

What's the goal of the enums instead of just publish_hrw() and publish_rr() ?

Instead of the API being 2 functions, it then seems like 2 enums that
are never used elsewhere + 1 function that now always branches
internally.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] error compiling master

2018-07-10 Thread Jon Siwek

On Tue, Jul 10, 2018 at 2:10 PM Aashish Sharma  wrote:

> [ 96%] Building CXX object 
> libcaf_openssl/CMakeFiles/libcaf_openssl_shared.dir/src/manager.cpp.o
> clang: warning: argument unused during compilation: '-L/lib'
> /usr/local/src/master/aux/broker/3rdparty/caf/libcaf_openssl/src/manager.cpp:100:8:
>  error: use of undeclared identifier 'get_or'
>   if (!get_or(config(), "middleman.attach-utility-actors", false))
>^

This is at least an odd error if it were actually finding the CAF
headers that are embedded in the Bro source tree, so a guess is that
you have an old version of CAF (AKA actor-framework or libcaf)
installed in a system directory and it is finding those instead.  Can
you check?  A specific header file in question here would be
"caf/config_value.hpp".

And for reference, building on a fresh FreeBSD 10.3 system +
bro/master checkout worked for me just now, which may also suggest a
problem in the system setup (or else an inconsistent Bro
checkout/configuration, but it seems like we are both starting from a
fresh clone).

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] patterns and &&/|| vs. &/| operators

2018-06-21 Thread Jon Siwek

On Thu, Jun 21, 2018 at 4:25 PM Vern Paxson  wrote:
>
> > though maybe p1 + p2 would be even better at expressing that
> > concatenation is happening?
>
> I think this is somewhat problematic, since '+' already has a
> regular-expression meaning which is different.  In addition, '&' is
> a more natural dual to '|' than '+' is.

Yeah, agree w/ that.

> Interestingly, I discovered that we have a BIF merge_pattern(p1, p2) which
> does the same thing as "p1 & p2" (in the new syntax).  As best as I can
> tell it's not used anywhere - plus it's funky (only allows itself to be
> called if Bro isn't processing traffic yet).  Perhaps we can deprecate it, 
> too?

If there actually is no (longer) problems with concatenating patterns
at run-time, I'd agree to deprecate.

I'm imagine it existed because there was such a problem with
dynamically creating patterns at run-time, but don't know/remember
what it was.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] v += e (Re: set and vector operators)

2018-06-21 Thread Jon Siwek

On Thu, Jun 21, 2018 at 4:23 PM Vern Paxson  wrote:
>
> > > (3) Implement "v += e" to mean "append the element e to the vector v".
> >
> > Do we want to do this now, or should we potentially wait a release-cycle
> > with it (to prevent the situation where v + e and v+= e means something
> > different).
>
> Turns out that "v += e" currently generates an error ("requires two
> arithmetic or two string operands"), so it seems safe to me to introduce
> it as append-to-vector concurrent with deprecating "v + e" (which from the
> discussion on the list it seems likely current has little or no use).

Yeah, seems fine for v += e and v + e to temporarily have different
meanings given that the later will, for some time, have a clear
deprecation/warning message for each use (in any rare cases where it
might actually be used).  So my current POV is everything you outlined
looks good to do whenever.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] patterns and &&/|| vs. &/| operators

2018-06-21 Thread Jon Siwek

On Thu, Jun 21, 2018 at 1:34 PM Vern Paxson  wrote:
>
> > The meaning of "p1 & p2" would be "yields a pattern that matches both
> > p1 and p2" versus the meaning of "p1 && p2" currently being "yields a
> > pattern that matches a p1 followed by a p2" ?
>
> No, p1 & p2 would be the new way to express p1 && p2.
>
> > I'd generally say that deprecating (emit a warning message pointing to
> > each usage) for a time period is a more cautious approach.
>
> Easy 'nuf, though I'd be amazed if anyone is using p1 && p2 given it's
> not documented and not intuitive!

Ok, I see now, yeah & would be better than && for that semantic,
though maybe p1 + p2 would be even better at expressing that
concatenation is happening?

I also notice from [1]:

‘r/s’: an ‘r’ but only if it is followed by an ‘s’ ...

Maybe another option?

Just making suggestions since I didn't quite get what p1 & p2 would do at first.

- Jon

[1] http://westes.github.io/flex/manual/Patterns.html

___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] set and vector operators

2018-06-21 Thread Jon Siwek

On Fri, Jun 15, 2018 at 7:40 PM Vern Paxson  wrote:

> (1) Add bitwise operators on "count" variables for &, | and ~.

Yeah, looks like everyone was in favor of those.

> (2) Deprecate element-wise arithmetic operations on vectors, such
> as "v * 3" meaning "multiply each element of v by 3".  Perhaps down
> the road we'll introduce syntax that flags things like "for this
> expression, vectorize it".
>
> (3) Implement "v += e" to mean "append the element e to the vector v".
>
> (4) Wait on whether "v + e" should mean "return a vector that is v with
> the element e appended".  (And indeed we can't do this right now if
> we're #2.)

This sounds like what everyone was thinking.  My suggestion for the
timing of these would be to implement the deprecation (2) by itself
ASAP, at least before the 2.6 release, and shortly after 2.6 it can be
removed in git/master and both (3) and (4) done.  Hopefully 2.6 is not
terribly far away.

> (5) Keep "add" and "delete" for manipulating sets in terms of individual
> elements.
>
> (6) Add "s1 & s2", "s1 | s2", and "s1 - s2" as intersection, union,
> and set difference.

Yeah, sounds right.

> I'm not clear whether we reached agreement on:
>
> (7?) Add "s1 &= s2" etc. to mean "s1 = s1 & s2".  The advantage of
>  having this as an operator is it might more easily enable efficient
>  implementation of some set operations for big sets.  I suppose
>  if we have it then we'd be expected to also have:
> (7') "c1 &= c2" etc., i.e., bitwise assignments for "count"
>  variables.
>
> Please chime in if you remember where we wound up differently, and also
> whether you disagree with #7.

I don't see much previous discussion around these, though they all
make sense to me.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] patterns and &&/|| vs. &/| operators

2018-06-21 Thread Jon Siwek

On Tue, Jun 19, 2018 at 1:21 PM Vern Paxson  wrote:

> Proposal: as part of adding bitwise &/| operators for counts, I'll
> also implement &/| operators for patterns, and remove the current
> &&/|| functionality.

The meaning of "p1 & p2" would be "yields a pattern that matches both
p1 and p2" versus the meaning of "p1 && p2" currently being "yields a
pattern that matches a p1 followed by a p2" ?

I'd generally say that deprecating (emit a warning message pointing to
each usage) for a time period is a more cautious approach.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Performance Issues after the fe7e1ee commit?

2018-06-13 Thread Jon Siwek

On Tue, Jun 12, 2018 at 4:26 PM Robin Sommer  wrote:

> One question about Broker's endpoint::advance_time(): that's locking
> on each call when in pcap mode, i.e., once per packet. Could we limit
> that to cases where something actually needs to be done? Quick idea:
> use an atomic<> for current_time plus another atomic<> counter
> tracking if there are any pending message. And go into the locked case
> only if the latter is non-zero?

Yes, I've done that in master, but didn't measure a notable difference.

> > General changes/improvements in CAF itself may be warranted here

Made an issue for that here:

https://github.com/actor-framework/actor-framework/issues/699

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Performance Issues after the fe7e1ee commit?

2018-06-12 Thread Jon Siwek

On Mon, Jun 11, 2018 at 6:30 PM Robin Sommer  wrote:

> Confirmed that I'm still seeing that difference even when using the
> options to turn the stores. Tried it on two different Fedora 28
> systems, with similar results.

I had previously tried on a Fedora 28 VM w/ 4 CPUs and didn't
reproduce it, but trying again on a system with 32 CPUs, I can
reproduce your results.  This has lead to the fix/workaround at [1],
now in master, which should bring things back to baseline when not
using data stores and more sane timings when using them.

General changes/improvements in CAF itself may be warranted here
because I can recreate the same performance overhead if I artificially
remove all broker functionality from Bro, but then simply create and
never use a caf::actor_system via the default configuration.

- Jon

[1] https://github.com/bro/bro/commit/c9fe9a943c4d78b18ffbae13c562b93349a5f951
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Performance Issues after the fe7e1ee commit?

2018-06-08 Thread Jon Siwek

On Fri, Jun 8, 2018 at 10:23 AM Robin Sommer  wrote:

>
> > # zcat 2009-M57-day11-18.trace.gz | time bro -r - tests/m57-long.bro
> > Known::use_host_store=F Known::use_service_store=F
> > Known::use_cert_store=F
>
> That indeed gets it way down, though still not back to the same level
> on my box:
>
> 170.49user 58.14system 1:55.94elapsed 197%CPU
>
> (pre-master: 73.72user 7.90system 1:06.53elapsed 122%CPU)

Just double-checking: same configure/build flags were used between the two?

Here's the more precise results I got from running on a macOS and Linux system:

# Linux master (b51e6f3) --build-type=debug
$ export BROPATH=$BROPATH:../testing/external/scripts; time bro -r
2009-M57-day11-18.trace test-all-policy testing-setup
real0m32.572s
user0m40.926s
sys 0m7.873s

# Linux master (b51e6f3) --build-type=debug
$ export BROPATH=$BROPATH:../testing/external/scripts; time bro -r
2009-M57-day11-18.trace test-all-policy testing-setup
Known::use_host_store=F Known::use_cert_store=F
Known::use_service_store=F
real0m25.603s
user0m23.311s
sys 0m7.952s

# Linux pre-broker (7a6f502) --build-type=debug
$ export BROPATH=$BROPATH:../testing/external/scripts; time bro -r
2009-M57-day11-18.trace test-all-policy testing-setup
real0m24.785s
user0m21.230s
sys 0m8.341s

# macOS master (b51e6f3) --build-type=debug
$ export BROPATH=$BROPATH:../testing/external/scripts; time bro -r
2009-M57-day11-18.trace test-all-policy testing-setup.bro
real0m38.775s
user0m42.365s
sys 0m8.950s

# macOS master (b51e6f3) --build-type=debug
$ export BROPATH=$BROPATH:../testing/external/scripts; time bro -r
2009-M57-day11-18.trace test-all-policy testing-setup.bro
Known::use_host_store=F Known::use_cert_store=F
Known::use_service_store=F
real0m32.975s
user0m33.774s
sys 0m4.409s

# macOS pre-broker (7a6f502) --build-type=debug
$ export BROPATH=$BROPATH:../testing/external/scripts; time bro -r
2009-M57-day11-18.trace test-all-policy testing-setup.bro
real0m30.785s
user0m31.840s
sys 0m3.788s

> Are there more places where Bro's waiting for Broker in pcap mode?

Not that I can think of.

> Yeah, I remember that discussion. It's the trade-off between
> performance and consistency. Where's the code that's doing this
> blocking?

I just know that Manager::AdvanceTime() and also
Manager::TrackStoreQuery() -> FlushPendingQueries() can block/spin
waiting on actor/threading in pcap mode.

> Would it be possible to not block right away, but sync up
> with Broker when events are flushed the next time? (Like we had
> discussed before as a general mechanism for making async operations
> consistent)

I think the way to make an async operation most consistent is to model
it as a synchronous operation?  That's what I'm doing now at least,
and if I try moving FlushPendingQueries() to later (before the
AdvanceTime() call) in an attempt to possibly have more queries in the
queue/buffer, I get the external test suites failing.  At least on my
own test systems, I don't have much performance to recover by going
this route anyway, so maybe we need to dig more into why your results
are different.  Only thing I'm thinking is that your test system maybe
does a poorer job of scheduling the right thread to run in order for
the FlushPendingQueries() spin-loop to actually finish.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Performance Issues after the fe7e1ee commit?

2018-06-07 Thread Jon Siwek

On Thu, Jun 7, 2018 at 12:21 PM Robin Sommer  wrote:
>
> Hmm, I'm still seeing much larger runtimes on that M57 trace using our
> test configuration, even with yesterday's change:
>
> ; Master, pre-Broker
> # zcat 2009-M57-day11-18.trace.gz | time bro -r - tests/m57-long.bro
> 73.72user 7.90system 1:06.53elapsed 122%CPU (0avgtext+0avgdata 
> 199092maxresident)
>
> ; Current master
> # zcat 2009-M57-day11-18.trace.gz | time bro -r - tests/m57-long.bro
> 2191.59user 1606.69system 12:39.92elapsed 499%CPU (0avgtext+0avgdata 
> 228400maxresident)
>
> Bro must still be blocking somewhere when reading from trace files.

Thanks, if you pull master changes [1] again there's likely some improvement.

You can also try comparing the timing of:

# zcat 2009-M57-day11-18.trace.gz | time bro -r - tests/m57-long.bro
Known::use_host_store=F Known::use_service_store=F
Known::use_cert_store=F

With that, I get the same timings as from before pre-Broker.  At least
a good chunk of the difference when using data stores is that, for
every query, Bro will immediately block until getting a response back
from the data store thread/actor.  Not doing this type of synchronous
data store access when reading pcaps leads to the possibility of
results differing between runs (and I recall such differences being
likely from experience in running the unit test suite).

- Jon

[1] 
https://github.com/bro/broker/commit/9b56fea4999d4e11a5cd2caaafd934759015fab5
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Performance Issues after the fe7e1ee commit?

2018-06-06 Thread Jon Siwek

On Wed, Jun 6, 2018 at 2:10 PM Azoff, Justin S  wrote:

> I haven't noticed a huge difference on a real multi process cluster, at least 
> not a 30x change, so this is odd that pcap processing is so much slower.
> Especially since broker should be completely disabled when pcap files are 
> being read and caf shouldn't even be doing anything.

It's not that simple.

You could think of broker being enabled in this case of simply reading
a pcap because it was querying the library for whether there were any
peers (and that was actually a at least part of the perf. problem).

Beyond that, you can still think of broker being enabled when you're
reading a pcap and you are also using data stores (which the Known*
scripts in Bro now do by default).  Communication with a local master
data store is still asynchronous communication (with local
threads/actors) that ends up going through CAF messaging.  There's
also essentially a per-packet synchronization being done between Bro
and Broker/CAF now to ensure that this type of asynchronous workload
ends up producing the same results between any given Bro run.  So
there's now also that bit of extra overhead in pcap processing.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Performance Issues after the fe7e1ee commit?

2018-06-06 Thread Jon Siwek

On Wed, Jun 6, 2018 at 12:01 PM McMullan, Tim  wrote:

> Is the Bro development team still optimizing the Broker/Actor framework?

Yes, in the sense that optimizations will be done according to
feedback.  No, in the sense that there was no one actively looking
into it at the moment you asked.

> It might be helpful to have a way to disable Broker for those of us who 
> haven’t migrated to it yet.

That's unlikely at this point as scripts have been completely ported
to Broker now and usages of the old communication system completely
removed from them.  No simple switch to it back, so your feedback is
especially important/helpful.

> #  ~1GB file time (old)
>
> $ time /hostname/bro-devel/bin/bro -r 
> 20180606-1049-prodfilers-truncated_0_20180606104904.pcap  master.bro
>
>
>
> real0m2.294s
>
> user0m1.862s
>
> sys 0m0.385s
>
>
>
> #  ~1GB file time  (new)
>
> $ time /hostname/bro-devel/bin/bro -r 
> 20180606-1049-prodfilers-truncated_0_20180606104904.pcap master.bro
>
>
>
> real1m11.458s
>
> user0m58.933s
>
> sys 1m34.074s

Try pulling in the change I just did at [1], which was a big part of
the problem in my own test:

# 2.5.3
$ time bro -r 2009-M57-day11-18.trace

real 0m16.187s
user 0m16.312s
sys 0m1.865s

# master before [1]
$ time bro -r ../testing/external/bro-testing/Traces/2009-M57-day11-18.trace

real 1m31.434s
user 1m44.925s
sys 1m4.947s

# master after [1]
$ time bro -r 2009-M57-day11-18.trace

real 0m24.595s
user 0m25.574s
sys 0m5.816s

- Jon

[1] https://github.com/bro/bro/commit/9822fc252d5e92208704df4a388ea31989869499

___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] bro-devel package?

2018-05-24 Thread Jon Siwek

On Thu, May 24, 2018 at 2:50 PM, Vlad Grigorescu  wrote:
> There are a couple of cases where I think it'd be useful to have a bro-devel
> package -- a package that I can install on a system, and then be able to
> build plugins against Bro. (This is the same model as other *-devel
> packages, such as openssl, libpcap, etc.)

Yes, I think it's useful and something that should be done.  It was
fairly low on my list of things to try to do before 2.6.  Related:

https://bro-tracker.atlassian.net/browse/BIT-1922

> I'm curious how people are dealing with this issue, and if anyone has
> thoughts on whether this would be useful, and if so, what it would take to
> build such a package.

My guess as to what needs to be done:

* separate bifcl into its own submodule: I think this should be easy
* install Bro's headers: at least I think that's all that we need to
do on our end, it's also a bit of an open question as to whether we
just install them all for now until we get a more organized API or can
get away with a smaller subset
* update the plugin CMake/configure skeletons and/or bro-config to be
able to make use of the above two points

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Broker has landed in master, please test

2018-05-23 Thread Jon Siwek

On 5/23/18 3:12 PM, Michael Dopheide wrote:

> For here though, can you elaborate on the going down to one proxy?  My 
> understanding still isn't strong, but that seems to be opposed to the 
> idea of using Cluster::publish_hrw to spread memory across proxies.

The idea is to try starting with a single proxy and then scale your 
deployment based on what you actually need, and there may not be that 
great of a need at the moment as the default scripts that ship with Bro 
do not widely use the HRW/pool/partitioning APIs yet.

By default, it's currently just the Software framework that will use 
Cluster::publish_hrw.  I also plan to soon change the Intel framework to 
make use of Cluster::relay_rr.

There's also an option in the various Known::* scripts for users to 
opt-in to an alternate implementation that uses HRW + tables instead of 
the default approach of data stores.

Different sites could also have different requirements/usage of those 
default scripts and it's all too new to give better suggestions other 
than "try one proxy, add more as needed".

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Moving to GitHub?

2018-05-18 Thread Jon Siwek

On 5/18/18 11:11 AM, Robin Sommer wrote:

> What I was envisioning is more or less a clean slate: we'd migrate
> over a few tickets, but essentially we'd start with an empty list. I
> realize that sounds pretty harsh. However, I hardly ever see any
> activity on older tickets in JIRA, and I generally believe that the
> less open tickets a tracker has, the easier it is for people to
> understand what's actually relevant and worth spending cycles on.

I see those as independent issues:

(1) There's many open tickets: you solve that by actually addressing the 
tickets

(2) People don't know what tickets are relevant: you solve that by 
organizing (and maintaining) them according to relevance

My take is that there's been a lack of effort in (2) and if that's not 
solved, starting with a clean slate on GitHub now only means it's likely 
to eventually end up in the same situation as JIRA later.  What then? 
Move to another tracker again?

> Tagging tickets may help, but in the end if everybody just filters 95%
> out all the time anyways, I'm not sure what the value is.

The value is actually having a central place for all tickets and knowing 
there won't be ongoing hassle with keeping JIRA updated.

Or in reverse, I'm not sure what the value of keeping JIRA open at all 
is -- we either have to acknowledge its potential to go out of sync with 
GH in terms of duplicate reports, which makes it more of a hindrance 
IMO, or we still have to put effort to maintain it in addition to GH.

> That said, I'm open to a real porting effort if people do believe it's
> helpful to get all the JIRA tickets into GitHub. What do others think?

Pedantically, I'm not saying "port all JIRA tickets", I still want just 
the "valid" ones whatever we decide that to mean.  So more 
clarifications on potential porting criteria:

* Someone is likely to report the same problem again
* There's clear directions on how to reproduce an undesired behavior
* There been a proposed plan of action recently

And many tickets can be ruled out:

* Vague feature requests
* Not enough details  / difficult to reproduce
* Exceptionally old proposals / plans

My plea is: extend the porting criteria beyond "important" and "recent", 
perform a review of all existing tickets, and, if JIRA is going to stick 
around as a read-only archive, leave it in a coherent final state.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Moving to GitHub?

2018-05-18 Thread Jon Siwek

On 5/17/18 6:27 PM, Robin Sommer wrote:

> That may be a bit too broad though. How about "still valid and either
> (1) quite important or (2) something we expect will be addresses
> reasonably soon"? We have many old tickets that are technically still
> valid but unlikely to see any work anytime soon (otherwise they would
> have been addressed already), and I'm worried that they would just add
> noise without value.

That sounds like a general concern about project/ticket organization and 
management.  What if valid-but-old tickets were moved into GitHub with a 
simple "backlog" tag that you can filter out?  Or we could take the 
opportunity to be better about sorting on other categories as well 
(bug/feature/etc).

There's also the possibility to start a project board page [1] to aid in 
visual organization of tasks/issues and further reduce perceived noise. 
See [2] for an example of what one of those pages could look like.

> The old tickets won't go away, the JIRA will
> remain. If something becomes relevant/active, we can always bring it
> over at that time.

Keeping the JIRA around as a backlog like that decreases 
focus/visibility -- if we want to be optimistic about community 
contributions and bug fixes, we'd have to keep calling attention to 
search in two locations, GH and JIRA, instead of just one.  We'd also 
need to stay on top of maintaining JIRA to be relatively in sync with 
anything that gets resolved in GH else keeping JIRA around may become 
more liability than useful as it gets harder and harder to tell if 
something there has actually been addressed.

Doing a half-hearted effort to migrate tickets from JIRA undermines the 
goal of having an authoritative/central location for all code + tickets. 
  Can we instead try to deal with it once and for all?

- Jon

[1] https://github.com/orgs/bro/projects
[2] https://github.com/git-lfs/git-lfs/projects/3
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Moving to GitHub?

2018-05-17 Thread Jon Siwek



On 5/15/18 7:19 PM, Robin Sommer wrote:
> What do people think? Any support, or concerns?

Yeah, generally in favor with some comments:

* For porting over JIRA tickets to GitHub, "most recent" doesn't seem 
like a good metric to use.  e.g. BIT-1829 (pcap triggering assertion in 
binpac) seems kind of important and not something I'd want to have lost 
in the move, although it had no activity in almost a year.  So, I think 
it's worth it to be more comprehensive here, and as long as someone is 
going through to review all the tickets, they may as well just port all 
the older ones that are still valid over to GitHub.  (Yeah, I guess I'm 
volunteering).

* One thing I did like about using JIRA for merge requests is that I 
could make a single ticket and just say I have a given branch name in a 
bunch of repos that are ready for review/merge.  I find myself in that 
situation quite often, actually, so transitioning to GitHub PRs, I 
wonder if we'd want a PR to be created against each individual repo? 
Seems a bit much in terms of overhead.  Alternatively, could still 
create a GH issue and just say "please review branch foo in repos X, Y, 
Z and merge them".  Or else create a single PR in the "root" repo and 
mention in the PR that the same branch in child submodules also exists 
and needs merging.  That may play better with Travis CI integration 
even, although maybe not in the case where you need to change things in 
the "external testing" repos, which are not connected to bro via a 
submodule.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Broker data store use case and questions

2018-05-14 Thread Jon Siwek

On 5/11/18 6:33 PM, Michael Dopheide wrote:

> First, can Cluster::default_master_node be changed to default to the 
> name of the current manager node rather than specifying the name as 
> 'manager'?

Maybe.  I'll try having broctl communicate that to Bro via a new 
environment variable.

> Easy to redef to the manager's name, but less easy when you 
> use the same code base on multiple clusters with different names.

If you don't want to wait for me to try the above fix, you could also 
try redef'ing it yourself with a call to getenv(), using an environment 
variable whose value you can set differently for each cluster.

> Second, when during startup should Bro know that it's persistent stores 
> exist via Cluster::stores() ?  It appears bro_init may be too soon, but 
> I'm still playing.

The comments for the Cluster::stores table may help in case you missed 
it -- Cluster::create_store() is intended to be called in bro_init() and 
will end up populating Cluster::stores.  Though, you can pre-populate 
and customize the Cluster::stores table via a redef and those will all 
automatically get picked up when during the Cluster::create_store() process.

> Also, it'd be nice if the persistence of built-in 
> stores (like known/hosts, known/certs, etc) were redef-able.

It should be possible like putting this in local.bro:

redef Cluster::stores += {
 [Known::host_store_name] = Cluster::StoreInfo($backend = 
Broker::SQLITE)
};

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Broker data store use case and questions

2018-05-14 Thread Jon Siwek

On 5/11/18 1:38 PM, Azoff, Justin S wrote:
> 
>> On May 11, 2018, at 10:13 AM, Jon Siwek <jsi...@corelight.com> wrote:
>>
>>
>> There's no check against the local cache to first see if the key exists
>> as going down that path leads to race conditions.
> 
> What sort of race conditions?

By "local cache", I mean the data store "clone" here.  And one race with 
checking for existence in the local clone could look like:

(1) master: delete an expired key, "foo", send notification to clones
(2) clone: check for existence of key, "foo" and find it exists locally, 
then suppress further logic based on that
(3) clone: receive expiry notification for key "foo"

In that way, you can miss an (re)insertion that should have taken place 
if the query/insertion were together in sequence directly on the master 
data set.

> Things are a bit better off now in that we can use a short lived cache, since 
> the cache doesn't need to be the actual data store anymore like the old known 
> hosts set was.

A short-lived cache, separate from the data store, still has problems 
like the above: there can be times where the local cache contains the 
key and the master store does not and so you may miss some (re)insertions.

The main goal I had when re-writing these was correctness: I can't know 
what network they will run on, and so don't want to assume it will be ok 
to miss an event here or there because "typically those should be seen 
frequently enough that it will get picked up soon after the miss".

If we can optimize the scripts that ship w/ Bro while still maintaining 
correctness, that would be great, else I'd rather sites decide for 
themselves what trade-offs are acceptable and write their own scripts to 
optimize for those.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Broker data store use case and questions

2018-05-11 Thread Jon Siwek

On 5/11/18 10:15 AM, Michael Dopheide wrote:
> Let me clarify point 4, my goal is just to keep the knownhosts data 
> persistent across restarts.  (Or any data set in the general case.)  So 
> if HRW is the best way to keep data in memory I need a way to write it 
> out to disk on Bro exit so I can read it back in later.

Ok, maybe first try a single data store for persistence (what 
known-hosts will do by default now).

If you then find you overload the centralized manager/master node, my 
next suggestion is to partition the data set across proxies via HRW 
events and combine that with a data store configured for persistence on 
each proxy.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Broker data store use case and questions

2018-05-11 Thread Jon Siwek

On 5/10/18 3:53 PM, Michael Dopheide wrote:

> 1) My initial gut feeling was that all of the when() calls for insertion 
> could get really expensive on a brand new cluster before the store is 
> populated.

I've not tried to explicitly measure differences yet, though my hunch is 
that the overhead of needing to use when() to drive data store 
communication patterns could be slightly more expensive than just using 
remote events (or  as the previous implementation used). 
I'm thinking of overhead more in terms of memory here, as it needs to 
save the state of the current frame making the call so it can resume 
later.

Another difference is the data store implementation of known-hosts is 
that it does always require remote communication just to check for 
whether a given key is in the data store yet, which may be a bottleneck 
for some use-cases.

You can also compare/contrast with another implementation of 
known-hosts.bro if you toggle the `Known::use_host_store = F` code path. 
  There, instead of using a data store, it sends remote events via 
Cluster::publish_hrw to uniformly partition the data set across proxies 
in a scalable manner but without persistence.

Yet another idea for an implementation, if you need persistence + 
scalability, would be combining the HRW stuff with data stores.  e.g. 
partitioning the total data set across proxies while using a data store 
on each one for local storage instead of a table/set.

I don't know if there's a general answer to which way is best.  Likely 
varies per use-case / network.

> 2) Correct me if I'm wrong, but it seems like the check for a host 
> already being in known_hosts (now host_store) no longer exists.  As a 
> result, we try to re-insert the host, calling when(), every time we see 
> an established connection with a local host.

Sounds right.

Specifically, it's Broker::put_unique() that hides the following:

(1) tell master data store "insert this key if it does not exist"
(2) wait for master data store to tell us if the key was inserted, and 
thus did not exist before

There's no check against the local cache to first see if the key exists 
as going down that path leads to race conditions.

> 3) How do I retrieve values from the store to test for existence?

Broker::exists() to just check existence of a key or Broker::get() to 
retrieve value at a key.  You can also infer existence from the result 
of Broker::get().

Either requires calling inside 'when()'.  Generally, any function in the 
API you see return a Broker::QueryResult needs to use 'when()'.

> 4) Assuming that requires another Broker call inside a when(), does it 
> make sense to pull the data store into memory at bro_init() and do 
> a Cluster::publish_hrw?

Not sure I follow since, in the current implementation of known-hosts, 
the data store and Cluster::publish_hrw code paths don't interact 
(they're alternate implementations of the same thing as mentioned 
before).  If the question is just whether it makes sense to go the 
Cluster::publish_hrw route instead of using a data store: yes, just 
depends on what you prefer.  IMO, the data store approach has downsides 
that make it less preferable to me.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Final Broker branch testing

2018-05-03 Thread Jon Siwek

On 5/2/18 9:59 AM, Johanna Amann wrote:

>>> (3) I need to try to hack our CMake system more to try to get back down
>>> to 2.8.12 while still being able to embed CAF.

I think (hope!) I was mistaken and everything already works with 2.8.12 
(structure of CMake docs previously led me to think it wouldn't) and 
just needed the version check moved back down, sorry for the noise.

Otherwise, I've stabilized some unit tests and made a merge request [1] 
for the broker branch.

- Jon

[1] https://bro-tracker.atlassian.net/browse/BIT-1653
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] set and vector operators

2018-04-30 Thread Jon Siwek

On 4/30/18 9:54 AM, Robin Sommer wrote:

> One additional piece of context here: That vector(...) syntax could
> then be used more broadly in the sense of creating a different
> semantic context for the operations inside. That kind of opens up a
> whole new set of of type-specific operator meanings, without affecting
> current/standard ones (it's like introducing R inside parentheses :-).

Yeah, framing it as type-specific operators makes sense.

> It's not the super-great, but at least it's explicit and we couldn't
> come up with anything better if we want to have such operations as
> operators. Might work for some other types as well.

Isn't the existing "v + e" an alternative that one could say is "better" 
?  Other languages also take that to mean "vectorized/array operation".

I think for Bro, it's just that vectorized/array operations will be a 
less common use-case than appending and so there's incentive to make the 
later more succinct?

Just to put it out there again: I wouldn't mind something like 
"append(v, e)" and leaving the existing vector operations like "v + e" 
alone.  It may make the common case more verbose, though there's also 
less ambiguities.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] set and vector operators

2018-04-30 Thread Jon Siwek

On 4/30/18 9:10 AM, Vern Paxson wrote:

> The question then was what would be the new "v op e" syntax.
> The best we could come up with (which we both found not-too-awful) is
> "vector(v op e)".  Wrapped in "vector(...)", the operation becomes the
> current semantics (apply "op e" separately to each element of v).

Maybe "vectorize(v op e)" ?

Implies implementing via SIMD instructions.

> "v op e" by itself would now be an error (which could point the user
> at the "vector(...)" syntax as possibly providing what they're looking
> for).  "v += e" would be "append e to v".

That still seems odd to me.  If "v += e" means "append", then I might 
expect "v + e" to do the same, except producing a new value w/ original 
vector not modified.

Maybe that's a less common use-case, though, and so "v op e" being an 
error would be less weird than suddenly changing the meaning of that 
operation.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Final Broker branch testing

2018-04-26 Thread Jon Siwek

On 4/26/18 4:30 PM, Johanna Amann wrote:

> It might be. I am honestly not sure - I suspect that this still will 
> mean that some places might not be able to easily use Bro 
> anymore--adding external package sources does not seem to be a viable 
> option everywhere.

They could still build CMake themselves?  (CMake itself is easy to build)

The options to go forward are:

(1) Users whose OS has insufficient CMake will need to compile/obtain a 
newer one.  This would mostly be Ubuntu 14.04 (LTS until April 2019) and 
RHEL/CentOS 6+7 (LTS for these is in the 2020-2024 range).

(2) We go back to CMake 2.8.12 and have people compile CAF themselves. 
(Or maybe we could conditionally require only 2.8.12 users to compile 
CAF and others get the embedded CAF).

(3) I need to try to hack our CMake system more to try to get back down 
to 2.8.12 while still being able to embed CAF.

I can give (3) a bit more time to see if I didn't miss something (the 
line I was drawing before was having to manually do platform-dependent 
RPATH manipulation), though would be nice to hear more feedback on the 
approaches.

> As a side-note, it also looks like that means that we cannot provide 
> binary packages for RedHat/CentOS anymore.
We should be able to use whatever we want to create the binary package, 
shouldn't we?  Or do you mean it wouldn't be accepted as part of the 
official repos even if it's just a build-time dependency?

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] set and vector operators

2018-04-26 Thread Jon Siwek

On 4/26/18 3:43 PM, Vern Paxson wrote:

>> E.g. say you come back to some code after a few months and see "foo +=
>> 1".  Not obvious what 'foo' is anymore.
> 
> I don't think it's reasonable to have the bar be "can you tell what's going
> on in isolation".  It should include consideration of associated context,
> variable names, and comments.  In fact, even now you don't know whether
> for "foo += 1" foo is an integer, a count, or a double.

Yeah, it was maybe a bit of a stretch -- more just an observation I was 
trying to see if we could run with.  Also in relation to my recent 
experiences with trying to read/debug some C++ code with a lot of 
operator usage I found myself wishing some were just a named function 
call so I could more easily navigate the code and even just find where 
certain operators were defined/declared.

So the point I'm at now is that it would probably be nice not to have 
multiple operators for the same thing, though don't have a strong 
feeling about it.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Final Broker branch testing

2018-04-26 Thread Jon Siwek

On 4/26/18 2:04 PM, Johanna Amann wrote:

> With this change, we Bro cannot be compiled out of the Box on 
> RedHat/Centos 7 anymore. Since that is the latest release of RedHat and 
> probably used in production by quite a few people a potentially 
> significant amount of people might not be able to (easily) compile Bro 
> with this merge.
> 
> It aborts in configure, with:
> 
> -- Performing Test cxx11_header_works - Success
> CMake Error at aux/broker/CMakeLists.txt:4 (cmake_minimum_required):
>    CMake 3.0.2 or higher is required.  You are running version 2.8.12.2

Is "use cmake3 from EPEL" an acceptable answer?

The main reason for it (IIRC) is for embedding CAF as a CMake 
ExternalProject, which I was struggling to hack around with lack of 
features in CMake 2.8.

> Compiling on Debian 8 gives some CAF warnings that are a tad ugly:

It's CAF's master branch at the moment, so I don't feel much pressure to 
report/patch these unless they're still there when we're close to moving 
to beta/release version.

> I noticed one small thing while building with make -j4; in this case you 
> get several different % numbers simultaneously (one for car and one for 
> broker).

Not sure I can help that.  The thing that comes to mind would be 
enforcing that CAF does not build in parallel and thus wasting your 
time.  Or else try to patch CMake to do a better job when using external 
projects.

> There were a few test failures on the first run (that succeeded on a 
> rerun) though.

Thanks, I also still see occasional failures that pass when re-running 
-- it's lower priority on my list to stabilize these.  And I think 
doesn't prevent merging to master since they are most often problems 
with the unit test itself.

> There were also a couple that did not succeed after several reruns for 
> me. This was on a digital ocean 4cpu optimized debian8 instance for me; 
> the reruns were not parallel:

Ok, will take a closer look at those.

I should also mention that I've run unit tests myself on the following:

* MacOS 10.13.4
* FreeBSD 11
* CentOS 7
* Debian 8

I find the tests are usually stable with a few needing some reruns.  I 
haven't had any single test persistently fail on any of those systems.

I was also testing --build-type=debug everywhere and only a few places 
without, so that could be a difference.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] How to deal with stale branches?

2018-04-26 Thread Jon Siwek



On 4/26/18 11:06 AM, Vlad Grigorescu wrote:

> I'm torn between deleting the branches, in an effort to not clog up git 
> with unneeded branches, and leaving them around or perhaps archiving 
> them somewhere, in order to not completely lose the work in case it's of 
> value to someone down the road.
> 
> I'm curious if anyone has thoughts on the best way to proceed.

Maybe delete the branch from the official git repo and push it to your 
own github fork.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] set and vector operators

2018-04-26 Thread Jon Siwek

On 4/26/18 11:29 AM, Vern Paxson wrote:
>> Just one more thing still: I'm actually feeling pretty strongly
>> against having multiple different operators for the same operation
>> (set union, set addition/removal).

I'm maybe convincing myself that it's at least not that useful or 
there's alternative ways forward that don't introduce redundancies.

> I'm fine with removing "add" and "delete" for sets!  (But seems we gotta
> keep them for a good while for backward compatibility.  Plus, what would
> be the remove operator for tables?  "t -= index" seems pretty weird to me.)

A nice thing about "add" and "delete" for sets is that you can infer the 
data type that you're operating on just looking at the local code/line. 
E.g. say you come back to some code after a few months and see "foo += 
1".  Not obvious what 'foo' is anymore.  Could be vector, set, or count, 
etc.

> But I don't think we should forego '+' and '-' for sets.  It would be too
> weird that "v += e" works but "s += e" does not (and "add v[e]" blows, so
> let's not consider going down that path :-P).

Yeah, I agree with that sentiment (on the condition that we did add +/- 
for sets).

I do also notice that you had "s + e" in the proposal and not "v + e". 
Isn't that weird by the same logic or is it just an accidental omission?

> Once we have s += e, we
> certainly should have s -= e.  And once we have "s + e", "s1 + s2" seems
> very natural to me too; I don't relish having to explain "oh *that* doesn't
> work, you have to use s1 | s2" :-P.

That also makes sense, though it's worth seeing what a minimal proposal 
looks like without the contentious '+' operations:

 s1 - s2, s1 -= s2Set difference
 s1 | s2, s1 |= s2Set union
 s1 & s2, s1 &= s2Set intersection
 s1 ^ s2, s1 ^= s2Set symmetric difference

 c1 | c2, c1 |= c2Bitwise-or 'count' values
 c1 & c2, c1 &= c2Bitwise-and 'count' values
 c1 ^ c2, c1 ^= c2Bitwise-xor 'count' values

The only one now missing that I'd probably find myself using is:

 v += e   Append element to vector

And for that, a BiF or generic script-layer function call (if that were 
possible) would even make me happy:

 push(v, e)

That also could go back to what I was saying before about readability: 
it would then be more obvious than "v += e" regarding what data types 
are involved.

Same idea would apply to "s += v" and "s -= v" (if we were inclined):

 add_to_set(s, v)
 delete_from_set(s, v)

Could that all be an alternative way forward?  Or is it missing other 
important aspects?

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

[Bro-Dev] Final Broker branch testing

2018-04-26 Thread Jon Siwek

The latest version of the new Broker-ized cluster/communication system 
for Bro in 'topic/actor-system' branch is wrapping up and, in my 
opinion, ready to be merged into Bro's 'master' branch.

However, since it's such a big change, I'd like a last round of feedback 
before merging.  If you want to test, the build process should now be as 
simple as:

$ git clone --recursive --branch=topic/actor-system git://git.bro.org/bro
$ cd bro && ./configure && make

Configuring BroControl is not any different from before.

If you had custom scripts, they may require porting.  There's a guide 
and examples for that at [1] and [2] (hyperlinks in those docs will 
render more nicely when it's up on bro.org).

Though, for this round of testing, I'd be most interested just in any 
general stability issues or major feature breakages from a vanilla Bro 
installation.  Mild performance issues, minor bugs, or other issues w/ 
porting custom scripts are things I think we can iron out even after 
merging into 'master'.

- Jon

[1] 
https://github.com/bro/bro/blob/topic/actor-system/doc/frameworks/broker.rst
[2] https://github.com/bro/bro/tree/topic/actor-system/doc/frameworks/broker
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] set and vector operators

2018-04-26 Thread Jon Siwek



On 4/26/18 12:19 AM, Vern Paxson wrote:
> Hmmm thinking about it, we can get away with '&' with minimal keyword
> conflict because there's such an easy (and natural-to-presume) fix -
> namely, rather than "x" you use "x & attrkeyword".  Now
> there's no problem, since the lexer only recognizes ""
> as a unit, with no whitespace allowed.

Yeah, I think it could turn out ok.  Using '&' and '|' in the set 
operations seems more natural/consistent to me and so maybe worthwhile 
to try that approach first.

It's also a nice time to introduce the bitwise operations between count 
values.  I thought about adding that on one occasion and also know 
others have wanted it.
> How does that sound?

Sounds good to me.  I'd find most of them more convenient and wish I'd 
had them in the past.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] scheduling events vs using _func ?

2018-04-16 Thread Jon Siwek

On 4/13/18 6:14 PM, Aashish Sharma wrote:
> I have a aggregation policy where I am trying to keep counts of number of
> connections an IP made in a cluster setup.
> 
> For now, I am using table on workers and manager and using expire_func to
> trigger worker2manager and manager2worker events.
> 
> All works great until tables grow to > 1 million after which expire_functions
> start clogging on manager and slowing down.
> 
> Example of Timer from prof.log on manager:
> 
> 1523636760.591416 Timers: current=57509 max=68053 mem=4942K lag=0.44s
> 1523636943.983521 Timers: current=54653 max=68053 mem=4696K lag=168.39s
> 1523638289.808519 Timers: current=49623 max=68053 mem=4264K lag=1330.82s
> 1523638364.873338 Timers: current=48441 max=68053 mem=4162K lag=60.06s
> 1523638380.344700 Timers: current=50841 max=68053 mem=4369K lag=0.47s
> 
> So Instead of using _func, I can probably try schedule {} ; but I am 
> not
> sure how scheduling events are any different internally then scheduling
> expire_funcs ?

There's a single timer per table that continuously triggers incremental 
iteration over fixed-size chunks of the table, looking for entries to 
expire.  The relevant options that you can tune here:

* `table_expire_interval`
* `table_incremental_step`
* `table_expire_delay`

> I'd like to think/guess that scheduling events is probably less taxing. but
> wanted to check with the greater group on thoughts - esp insights into their
> internal processing queues.

I'm not clear on exactly how your code would be restructured around 
scheduled events, though guessing if you just did one event per entry 
that needs to be expired, it's not going to be better.  You would then 
have one timer per table entry (up from a single timer), or possibly 
more depending on expiration scheme (e.g. if it's expiring on something 
other than create times, you're going to need a way to invalidate 
previously scheduled events).

Ultimately, you'd likely still have the same amount of equivalent 
function calls (whatever work you're doing in _func, would still 
need to happen).  With the way table expiration is implemented, my guess 
is that the actual work required to call and evaluate the _func 
code becomes too great at some point, so maybe first try decreasing 
`table_incremental_step` or reducing the work that you need to do in the 
_func.

With new features in the upcoming broker-enabled cluster framework (soon 
to be merged into git/master), I'd suggest a different way to think 
about structuring the problem: you could Rendezvous Hash the IP 
addresses across proxies, with each one managing expiration in just 
their own table.  In that way, the storage/computation can be uniformly 
distributed and you should be able to simply adjust number of proxies to 
fit the required scale.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Overload Bro Events

2018-04-12 Thread Jon Siwek



On 4/12/18 2:23 PM, DW wrote:
> Hello,
> 
> it is possible to overload events in Bro based on the event-parameter
> and trigger the "right" event based on the given parameter?
> 
> E.g. I would define events like this
> 
> event overload%(c: connection%);
> event overload%(c: connection, h: header%);
> event overload%(c: connection, h: header, d: data%);

Overloading is not supported for functions in general (function/event/hook).

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Broker port status

2018-03-08 Thread Jon Siwek

On Thu, Mar 8, 2018 at 6:11 PM, Azoff, Justin S  wrote:

>> redef Known::use_host_store = F;
>> redef Known::use_cert_store = F;
>> redef Known::use_device_store = F;
>> redef Known::use_service_store = F;

> but if you load the script first, the
>
> @if ( Known::use_service_store )
>
> block is already evaluated before you change the value.

Yeah, I realized that and fixed it yesterday.  Have you pulled the
latest commits from the topic/actor-system branch?

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Re: [Bro-Dev] Broker port status

2018-03-08 Thread Jon Siwek

On Thu, Mar 8, 2018 at 5:07 PM, Azoff, Justin S  wrote:

> One thing I notice is that the traffic to/from the manager box and cpu has 
> increased quite a bit.
>
> Ignoring the large CPU spikes from building the new branch just before the 
> switch at 15:30, the overall cpu load on the manager box is about 3x higher.
> The bandwidth isn't terribly excessive, but it increased from about 16mbit to 
> 40mbit (stupid graph is in mebibytes).
>
> Based on some capstats runs, it's all going to the logger port 47761.  I have 
> another graph that shows the total log volume being written to disk and that 
> hasn't changed.
> So it looks like this branch uses 2x the cpu and bandwidth to write the same 
> volume of logs.

Interesting, I would have thought maybe the manager could be utilized
more, though not the logger.  Will have to think about that.

An interesting experiment you could try is switching to an alternate
implementation of the Known scripts.  E.g. stick this in local.bro:

redef Known::use_host_store = F;
redef Known::use_cert_store = F;
redef Known::use_device_store = F;
redef Known::use_service_store = F;

I'd expect that could reduce bandwidth a bit.

Using data stores the Known scripts do:

worker -> manager: put this key into your store if it doesn't exist
manager -> worker: the key was accepted (or rejected)
worker -> logger: log the key (sent only if manager accepted key as unique)

Else the alternate version of Known scripts do:

worker -> proxy: here's a key
proxy -> logger: log the key (if proxy hasn't seen it recently)

Maybe cuts out 1/3 the message volume contributed by Known scripts that way.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

1 2 3 4 5 6 7 >

1 - 100 of 616 matches

Mail list logo