Re: [Bro-Dev] Broker data store use case and questions

2018-05-14 Thread Azoff, Justin S

> On May 14, 2018, at 10:12 AM, Jon Siwek wrote:
> 
> A short-lived cache, separate from the data store, still has problems like 
> the above: there can be times where the local cache contains the key and the 
> master store does not and so you may miss some (re)insertions.

I see what you mean. I can almost see a solution that uses create_expire 
and expire_func to trigger a re-submit when the local cache entry expires, but 
that may cause the opposite problem.  A record would be sent the first time it 
was seen and then at most once more, N minutes later.  If that second 
submission lands at 00:03, the entry would be logged on the following day even 
though the host had not been seen yet that day.  I suppose if the value in the 
cache table were the network_time of the last time the host was seen, that 
could be used to fill in the HostInfo record.
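
A minimal sketch of that idea, assuming a table keyed by host address. 
Every name in it (cache_ttl, CacheEntry, seen_cache, resubmit_host, 
host_seen) is hypothetical and not part of the shipped scripts, and the 
re-submit event is only a stand-in for whatever actually talks to the 
master store:

```bro
# Sketch only; all names below are hypothetical.
const cache_ttl = 10min &redef;

type CacheEntry: record {
	last_seen: time;
};

# Stand-in for whatever submits the host to the master store and
# fills in the log record.
global resubmit_host: event(host: addr, last_seen: time);

# Called when an entry's creation timer runs out.
function cache_expired(t: table[addr] of CacheEntry, host: addr): interval
	{
	# Re-submit with the time of the last sighting rather than the
	# time of expiry, so the record carries when the host was
	# really seen.
	event resubmit_host(host, t[host]$last_seen);
	return 0secs;	# no extension; let the entry expire now
	}

global seen_cache: table[addr] of CacheEntry
	&create_expire=cache_ttl &expire_func=cache_expired;

function host_seen(host: addr)
	{
	if ( host in seen_cache )
		{
		# Update the field in place instead of re-assigning the
		# table element; the intent is to leave the creation
		# timer alone (an assumption worth verifying).
		seen_cache[host]$last_seen = network_time();
		return;
		}

	# First sighting: cache it and submit right away.
	seen_cache[host] = CacheEntry($last_seen = network_time());
	event resubmit_host(host, network_time());
	}
```

This only addresses the timestamp.  It does not remove the window in which 
the local cache and the master store disagree.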



-- 
Justin Azoff


___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev


Re: [Bro-Dev] Broker data store use case and questions

2018-05-14 Thread Jon Siwek


On 5/11/18 6:33 PM, Michael Dopheide wrote:

> First, can Cluster::default_master_node be changed to default to the 
> name of the current manager node rather than specifying the name as 
> 'manager'?

Maybe.  I'll try having broctl communicate that to Bro via a new 
environment variable.

> Easy to redef to the manager's name, but less easy when you 
> use the same code base on multiple clusters with different names.

If you don't want to wait for me to try the above fix, you could also 
try redef'ing it yourself with a call to getenv(), using an environment 
variable whose value you can set differently for each cluster.
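
As a rough sketch of that workaround: the variable name BRO_MASTER_NODE 
and the constant master_from_env are made up for illustration, and the 
fallback to "manager" is an assumption meant to keep the current default 
when the variable is unset (getenv() returns an empty string in that case):

```bro
# Sketch only: BRO_MASTER_NODE and master_from_env are hypothetical names.
# Export BRO_MASTER_NODE with a different value on each cluster.
const master_from_env = getenv("BRO_MASTER_NODE");

# Fall back to the stock default when the variable is not set.
redef Cluster::default_master_node =
	master_from_env == "" ? "manager" : master_from_env;
```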

> Second, when during startup should Bro know that its persistent stores 
> exist via Cluster::stores()?  It appears bro_init may be too soon, but 
> I'm still playing.

The comments for the Cluster::stores table may help in case you missed 
it -- Cluster::create_store() is intended to be called in bro_init() and 
will end up populating Cluster::stores.  You can also pre-populate 
and customize the Cluster::stores table via a redef, and those entries will 
automatically get picked up during the Cluster::create_store() process.

> Also, it'd be nice if the persistence of built-in 
> stores (like known/hosts, known/certs, etc) were redef-able.

It should be possible by putting something like this in local.bro:

redef Cluster::stores += {
	[Known::host_store_name] = Cluster::StoreInfo($backend = Broker::SQLITE)
};

- Jon


Re: [Bro-Dev] Broker data store use case and questions

2018-05-14 Thread Jon Siwek


On 5/11/18 1:38 PM, Azoff, Justin S wrote:
> 
>> On May 11, 2018, at 10:13 AM, Jon Siwek wrote:
>>
>>
>> There's no check against the local cache to first see if the key exists
>> as going down that path leads to race conditions.
> 
> What sort of race conditions?

By "local cache", I mean the data store "clone" here.  And one race with 
checking for existence in the local clone could look like:

(1) master: delete an expired key, "foo", send notification to clones
(2) clone: check for existence of key "foo", find that it exists locally, 
then suppress further logic based on that
(3) clone: receive expiry notification for key "foo"

In that way, you can miss a (re)insertion that should have taken place 
if the query/insertion were together in sequence directly on the master 
data set.
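
For contrast, a sketch of doing the query and insertion together on the 
master, using Broker::put_unique(), which inserts a key only if it does not 
already exist and reports whether it did.  The module, store name, event, 
expiry, and timeout values here are all made up for illustration and are 
not the shipped scripts:

```bro
# Sketch only: Example, store_name, store and key_seen are hypothetical.
module Example;

export {
	const store_name = "example/seen" &redef;
	global store: Cluster::StoreInfo;
	global key_seen: event(key: string);
}

event bro_init()
	{
	store = Cluster::create_store(store_name);
	}

event Example::key_seen(key: string)
	{
	# No check against the local clone: the existence test and the
	# insertion happen as one operation on the master.
	when ( local r = Broker::put_unique(store$store, key, T, 1day) )
		{
		if ( r$status == Broker::SUCCESS )
			{
			if ( r$result as bool )
				print fmt("%s was newly inserted", key);
			}
		else
			Reporter::error(fmt("%s: put_unique failed", store_name));
		}
	timeout 15sec
		{
		# Cannot tell here whether the master inserted the key.
		print fmt("%s: no answer from master store", key);
		}
	}
```

The cost is a round trip to the master for every sighting, which is the 
trade-off being discussed in this thread.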

> Things are a bit better off now in that we can use a short lived cache, since 
> the cache doesn't need to be the actual data store anymore like the old known 
> hosts set was.

A short-lived cache, separate from the data store, still has problems 
like the above: there can be times where the local cache contains the 
key and the master store does not and so you may miss some (re)insertions.

The main goal I had when re-writing these was correctness: I can't know 
what network they will run on, and so don't want to assume it will be ok 
to miss an event here or there because "typically those should be seen 
frequently enough that it will get picked up soon after the miss".

If we can optimize the scripts that ship w/ Bro while still maintaining 
correctness, that would be great, else I'd rather sites decide for 
themselves what trade-offs are acceptable and write their own scripts to 
optimize for those.

- Jon