Re: [Bro-Dev] Writing SumStats plugin

2018-08-13 Thread Jon Siwek
On Tue, Aug 7, 2018 at 5:15 PM Jim Mellander  wrote:

> Incidentally, I think there's a bug in the observe() function:
>
> These two lines are run in the loop through the reducers:
>
>     if ( r?$normalize_key )
>         key = r$normalize_key(copy(key));
>
> which has the effect of modifying the key for subsequent loops, rather than
> just for the one reducer it applies to.  The fix is easy and obvious.

Yeah, looked wrong to me also.  Fixed via [1] in the master branch now.
Sorry I don't have much knowledge of the existing sumstats code to
drive the other discussion/suggestions forward.

- Jon

[1] https://github.com/bro/bro/commit/5821c16490e731a68c0efc9c1aaba2d7aec28f48
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev


Re: [Bro-Dev] Writing SumStats plugin

2018-08-07 Thread Jim Mellander
It seems that there's some inconsistency in SumStats plugin usage and
implementation.  There appear to be two classes of plugins with differing
calling mechanisms and behavior:

   1. Item to be measured is in the Key, and the measurement is in
      Observation
      1. These include Average, Last X Observations, Max, Min, Sample,
         Standard Deviation, Sum, Unique, Variance
         1. These are exact measurements.
         2. Some of these have dependencies: StdDev depends on Variance,
            which depends on Average
   2. Item to be measured is in Observation, and the measurement is
      implicitly 1, and the Key is generally null
      1. These include HyperLogLog (number of Unique), TopK (top count)
         1. These are probabilistic data structures.
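To make the two classes concrete, here is a toy Python model (not SumStats code; NULL_KEY and ExactStats are hypothetical names).  Class 1 keeps exact statistics per key, with StdDev built on Variance built on Average; class 2 keeps one shared structure under a null key and counts each observed item as 1.  A plain Counter stands in for a probabilistic structure such as TopK, and the variance here uses an n - 1 denominator, which may or may not match the SumStats plugin.

```python
# Toy model, not SumStats code: contrasts the two plugin classes.
import math
from collections import Counter, defaultdict

NULL_KEY = ()  # hypothetical stand-in for an empty SumStats Key


class ExactStats:
    """Class 1 style: exact per-key statistics (Welford's running method)."""

    def __init__(self):
        self.n = 0
        self.average = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def observe(self, num):
        self.n += 1
        delta = num - self.average
        self.average += delta / self.n
        self._m2 += delta * (num - self.average)

    @property
    def variance(self):
        # Built on the running average (sample variance, n - 1 denominator).
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0

    @property
    def std_dev(self):
        # Built on the variance, mirroring StdDev -> Variance -> Average.
        return math.sqrt(self.variance)


# Class 1: the key names what is measured; the observation is the measurement.
class1 = defaultdict(ExactStats)  # one table entry allocated per key
for host, nbytes in [("10.0.0.1", 100), ("10.0.0.1", 300), ("10.0.0.2", 50)]:
    class1[host].observe(nbytes)

# Class 2: the observation is the item; each one counts as 1 under a null key.
# Counter is an exact stand-in for a probabilistic structure such as TopK.
class2 = defaultdict(Counter)
for domain in ["a.example", "b.example", "a.example"]:
    class2[NULL_KEY][domain] += 1
```

Note how class 1 grows one entry per distinct key, while class 2 stays a single entry no matter how many distinct items are seen.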

The Key is not passed to the plugin, but is used to allocate a table that
includes, among other things, the processed observations.  Both classes
call the epoch_result function once per key at the end of the epoch.  Since
class 2 plugins often use a null key, there is only one call to
epoch_result, and a special function is used to extract the results from
the probabilistic data structure (
https://www.bro.org/current/exercises/sumstats/sumstats-5.bro).  The
epoch_finished function is called once all keys have been returned, to
finish up.  This is unnecessary for this sort of class 2 plugin, since all
the work can be done in the sole call to epoch_result.  Multiple keys could
be used with class 2 plugins, which allows for groupings (
https://www.bro.org/current/exercises/sumstats/sumstats-4.bro).
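The calling sequence described above can be modeled in a few lines of Python.  This is a sketch under assumptions: end_epoch() and recorder() are hypothetical, and the model iterates keys in insertion order, which SumStats itself may not guarantee.  It shows that a class 2 plugin under a null key gets exactly one epoch_result call, while grouping keys give one call per group.

```python
# Toy model of the end-of-epoch sequence; end_epoch() is hypothetical.
from collections import Counter
from typing import Callable, Dict, Hashable, List, Tuple


def end_epoch(table: Dict[Hashable, object],
              epoch_result: Callable[[Hashable, object], None],
              epoch_finished: Callable[[], None]) -> None:
    """Call epoch_result once per key, then epoch_finished once."""
    for key, result in table.items():
        epoch_result(key, result)
    epoch_finished()


def recorder(log: List[Tuple]):
    """Return (epoch_result, epoch_finished) callbacks that log each call."""
    return (lambda key, result: log.append(("result", key)),
            lambda: log.append(("finished",)))


# Class 1: one table entry per key, so one epoch_result call per key.
class1_log: List[Tuple] = []
end_epoch({"10.0.0.1": 200.0, "10.0.0.2": 50.0}, *recorder(class1_log))

# Class 2 with a null key: a single epoch_result call sees the whole
# structure, so epoch_finished has nothing left to do.
class2_log: List[Tuple] = []
end_epoch({(): Counter({"a.example": 2, "b.example": 1})}, *recorder(class2_log))

# Class 2 with grouping keys: one structure (and one call) per group.
grouped_log: List[Tuple] = []
end_epoch({"dns": Counter(), "http": Counter()}, *recorder(grouped_log))
```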

I have a use case where I want to pass both a key and measurement to a
plugin maintaining a probabilistic data store [1].  I don't want to
allocate a table for each key, since most keys will not be reflected in the
final results.  Since the Observation is a record containing both a string
and a number, a hack is to coerce the key to a string and pass both in the
Observation to a class 2 plugin with a null key, which is what I am doing
currently.
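A Python sketch of that workaround, under stated assumptions: Observation, observe() and WeightedSpaceSaving are hypothetical stand-ins, and the bounded store is the weighted Space-Saving heavy-hitters algorithm, swapped in for illustration.  It is NOT the algorithm from the paper cited in [1].  The point is that the real key (here an address) is stringified into the Observation, everything runs under one null key, and memory stays bounded instead of growing with every new key.

```python
# Sketch of the workaround: the real key is coerced to a string inside the
# Observation, and a class 2 style plugin runs under a single null key.
# Weighted Space-Saving is a stand-in bounded store, NOT the cited paper's.
import ipaddress
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

NULL_KEY = ()  # hypothetical stand-in for an empty SumStats Key


@dataclass
class Observation:
    text: Optional[str] = None   # stands in for the record's string field
    num: Optional[float] = None  # stands in for the record's numeric field


class WeightedSpaceSaving:
    """Tracks at most `capacity` keys, so memory stays bounded."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.counts: Dict[str, float] = {}

    def observe(self, obs: Observation) -> None:
        key, weight = obs.text, obs.num
        if key in self.counts:
            self.counts[key] += weight
        elif len(self.counts) < self.capacity:
            self.counts[key] = weight
        else:
            # Evict the smallest entry; the newcomer inherits its count, so
            # estimates can only overshoot, by at most the evicted count.
            victim = min(self.counts, key=self.counts.__getitem__)
            self.counts[key] = self.counts.pop(victim) + weight

    def top(self, n: int) -> List[Tuple[str, float]]:
        return sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)[:n]


stores = {NULL_KEY: WeightedSpaceSaving(capacity=2)}


def observe(key, obs: Observation) -> None:
    """Hypothetical stand-in for SumStats::observe with one reducer."""
    stores[key].observe(obs)


traffic = [("10.0.0.1", 500), ("10.0.0.2", 10), ("10.0.0.3", 20), ("10.0.0.1", 300)]
for host, nbytes in traffic:
    # Coerce the real key (an address) to a string and pack it with the value.
    observe(NULL_KEY, Observation(text=str(ipaddress.ip_address(host)), num=nbytes))
```

After this run only two keys are held; 10.0.0.2 was evicted, and 10.0.0.3's count of 30 includes the 10 it inherited.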

It would be nice to have a conversation on how to unify these two classes
of plugins.  A few thoughts on this:

   - Pass the Key to the plugins; maybe the Key could be added to the
     Observation structure.
   - Provide a mechanism to *not* allocate the table structure with every
     new Key (this and the previous suggestion could possibly be done with
     some hackery using the normalize_key function in the Reducer record).
   - Some sort of epoch_result factory function that by default just
     performs the class 1 plugin behavior.  For class 2 plugins, the
     function would feed the results one by one into epoch_result.
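One way the three suggestions could fit together, sketched in Python.  Every name here (Plugin, results(), run_epoch(), PerKeySum, LargestObservation) is hypothetical and does not exist in SumStats.  The Key rides along in the Observation, the framework holds one plugin instance per reducer rather than a table entry per key, and each plugin yields (key, result) pairs that are fed one by one into the same epoch_result callback.

```python
# Hypothetical interface illustrating the three suggestions; none of these
# names exist in SumStats.
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Callable, Dict, Hashable, Iterable, Iterator, Optional, Tuple


@dataclass
class Observation:
    num: float
    key: Hashable = None  # suggestion 1: the Key travels with the Observation


class Plugin(ABC):
    @abstractmethod
    def observe(self, obs: Observation) -> None: ...

    @abstractmethod
    def results(self) -> Iterator[Tuple[Hashable, Any]]:
        """Suggestion 3: yield (key, result) pairs for epoch_result."""


class PerKeySum(Plugin):
    """Class 1 behavior: exact state for every key seen."""

    def __init__(self):
        self.sums: Dict[Hashable, float] = {}

    def observe(self, obs):
        self.sums[obs.key] = self.sums.get(obs.key, 0) + obs.num

    def results(self):
        yield from self.sums.items()


class LargestObservation(Plugin):
    """Class 2 behavior: fixed-size state, no per-key allocation."""

    def __init__(self):
        self.best: Optional[Tuple[Hashable, float]] = None

    def observe(self, obs):
        if self.best is None or obs.num > self.best[1]:
            self.best = (obs.key, obs.num)

    def results(self):
        if self.best is not None:
            yield self.best


def run_epoch(plugin: Plugin, observations: Iterable[Observation],
              epoch_result: Callable[[Hashable, Any], None]) -> None:
    # Suggestion 2: the framework keeps one plugin instance per reducer
    # instead of allocating a table entry for every new key.
    for obs in observations:
        plugin.observe(obs)
    for key, result in plugin.results():
        epoch_result(key, result)  # same callback for both classes


obs = [Observation(1, "a"), Observation(4, "b"), Observation(2, "a")]
sums, largest = {}, {}
run_epoch(PerKeySum(), obs, sums.__setitem__)
run_epoch(LargestObservation(), obs, largest.__setitem__)
```

The default results() of a class 1 plugin reproduces today's one-call-per-key behavior, so existing users of epoch_result would not need to change.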

Incidentally, I think there's a bug in the observe() function:

These two lines are run in the loop through the reducers:

    if ( r?$normalize_key )
        key = r$normalize_key(copy(key));

which has the effect of modifying the key for subsequent loops, rather than
just for the one reducer it applies to.  The fix is easy and obvious.
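A small Python model of the loop makes the bug visible.  The reducers are plain dicts and normalize_to_domain is a hypothetical normalize_key; this does not reproduce the text of the actual fix, only the idea of normalizing into a per-iteration local.  If the Bro reducers are held in an unordered set, which reducers get clobbered may vary from run to run; a list is used here so the effect is reproducible.

```python
# Python model of the reducer loop inside observe(); reducers are plain
# dicts and normalize_key is an optional callable (hypothetical stand-ins).
import copy


def normalize_to_domain(key):
    """Hypothetical normalize_key: keep only the last two labels."""
    return {"str": ".".join(key["str"].split(".")[-2:])}


REDUCERS = [
    {"name": "per_domain", "normalize_key": normalize_to_domain},
    {"name": "per_host"},  # no normalize_key: should see the key unchanged
]


def observe_buggy(key, reducers):
    seen = []
    for r in reducers:
        if "normalize_key" in r:
            key = r["normalize_key"](copy.copy(key))  # clobbers the shared key
        seen.append((r["name"], key))
    return seen


def observe_fixed(key, reducers):
    seen = []
    for r in reducers:
        # Normalize into a per-iteration local; later reducers see the original.
        this_key = r["normalize_key"](copy.copy(key)) if "normalize_key" in r else key
        seen.append((r["name"], this_key))
    return seen


original = {"str": "www.example.com"}
buggy = observe_buggy(original, REDUCERS)
fixed = observe_fixed(original, REDUCERS)
```

With the buggy loop, per_host receives "example.com" even though it never asked for normalization; with the fixed loop it receives "www.example.com".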

Jim


[1] Implementation of algorithms 4 and 5 (with enhancements) of
https://arxiv.org/pdf/1705.07001.pdf



[Bro-Dev] Writing SumStats plugin

2018-08-02 Thread Jim Mellander
Hi all:

I'm thinking of writing a SumStats plugin, probably with the initial
implementation in Bro scriptland, followed by a re-implementation as BIFs if
initial tests are successful.

From examining several plugins, it appears that I need to:

   - Add the NAME of my plugin as an enum value to Calculation
   - Add optional tunables to Reducer
   - Add my data structure to ResultVal
   - In register_observe_plugins, register the function to take an
     observation.
   - In init_result_val_hook, add code to initialize the data structure.
   - In compose_resultvals_hook, add code to merge multiple data structures.
   - Create a function to extract results from the data structure at
     either epoch_result or epoch_finished.
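The steps listed above can be modeled end to end in Python.  This is a sketch only: every name is a hypothetical stand-in loosely patterned on the Bro ones, and the hook signatures are simplified (the compose hook here receives the Reducer so it can apply the size cap).  The example plugin keeps the N largest values; two "workers" each observe some values and the compose hook merges their partial results.

```python
# Python model of the extension points listed above; every name here is a
# hypothetical stand-in loosely modeled on the Bro ones, not SumStats code.
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Dict, List, Optional


class Calculation(Enum):       # step 1: add the plugin's NAME to the enum
    SUM = auto()               # an existing member, for context
    MY_PLUGIN = auto()         # hypothetical new plugin: the N largest values


@dataclass
class Reducer:
    apply: List[Calculation]
    my_plugin_size: int = 10   # step 2: optional tunable (hypothetical name)


@dataclass
class ResultVal:
    my_plugin: Optional[List[float]] = None   # step 3: the plugin's data structure


observe_plugins: Dict[Calculation, Callable[[Reducer, float, ResultVal], None]] = {}
init_hooks: List[Callable[[Reducer, ResultVal], None]] = []
compose_hooks: List[Callable[[Reducer, ResultVal, ResultVal, ResultVal], None]] = []


def my_plugin_observe(r: Reducer, val: float, rv: ResultVal) -> None:
    # Keep only the r.my_plugin_size largest values seen so far.
    rv.my_plugin.append(val)
    rv.my_plugin.sort(reverse=True)
    del rv.my_plugin[r.my_plugin_size:]


def my_plugin_init(r: Reducer, rv: ResultVal) -> None:
    if Calculation.MY_PLUGIN in r.apply:
        rv.my_plugin = []


def my_plugin_compose(r: Reducer, result: ResultVal,
                      rv1: ResultVal, rv2: ResultVal) -> None:
    # Merge two partial results (e.g. from two workers) into one bounded list.
    if rv1.my_plugin is not None or rv2.my_plugin is not None:
        merged = (rv1.my_plugin or []) + (rv2.my_plugin or [])
        result.my_plugin = sorted(merged, reverse=True)[:r.my_plugin_size]


def my_plugin_extract(rv: ResultVal) -> List[float]:
    # Step 7: called from epoch_result (or epoch_finished) to read results out.
    return list(rv.my_plugin or [])


observe_plugins[Calculation.MY_PLUGIN] = my_plugin_observe   # step 4
init_hooks.append(my_plugin_init)                            # step 5
compose_hooks.append(my_plugin_compose)                      # step 6


def new_result_val(r: Reducer) -> ResultVal:
    rv = ResultVal()
    for hook in init_hooks:
        hook(r, rv)
    return rv


def feed(r: Reducer, rv: ResultVal, values: List[float]) -> None:
    for v in values:
        for calc in r.apply:
            observe_plugins[calc](r, v, rv)


r = Reducer(apply=[Calculation.MY_PLUGIN], my_plugin_size=2)
worker1, worker2 = new_result_val(r), new_result_val(r)
feed(r, worker1, [5, 1, 9])
feed(r, worker2, [7])
merged = ResultVal()
for hook in compose_hooks:
    hook(r, merged, worker1, worker2)
```

The compose step is the one that is easy to overlook: in a cluster, partial results from different nodes have to be merged, so the data structure needs a well-defined merge that respects the same bound as observe.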

Anything else I should be aware of?

Thanks in advance,

Jim