Re: [Bro-Dev] Broker port status

2018-03-09 Thread Azoff, Justin S
> On Mar 9, 2018, at 8:24 AM, Azoff, Justin S  wrote:
> 
> Ah.. I grabbed the latest commit from the wrong checkout and was using a 
> commit from a few weeks ago.
> 

Ok.. running the latest commit now (105fc386ef0f65a91839706641abae664c7f3e49)

Have noticed a problem where at the top of an hour, the logger runs into some 
issue while rotating the logs
and all logging stops but the logger buffers everything in memory until the box 
OOMs:

[root@bro-test ~]# broctl top manager logger proxy-{1,2,3}
Name TypeHost Pid ProcVSize  Rss  Cpu   Cmd
logger   logger  bro-test 14267   parent5G 5G  93%  bro
manager  manager bro-test 15478   parent  649M   277M  25%  bro
proxy-1  proxy   bro-test 32061   parent  643M   309M   0%  bro
proxy-2  proxy   bro-test 32063   parent  642M   351M  12%  bro
proxy-3  proxy   bro-test 15732   parent  619M   322M  18%  bro
[root@bro-test ~]# cat /bro/logs/current/*.log|wc -l
2627
[root@bro-test ~]# broctl top manager logger proxy-{1,2,3}
Name TypeHost Pid ProcVSize  Rss  Cpu   Cmd
logger   logger  bro-test 14267   parent6G 5G  66%  bro
manager  manager bro-test 15478   parent  649M   277M  33%  bro
proxy-1  proxy   bro-test 32061   parent  643M   309M  26%  bro
proxy-2  proxy   bro-test 32063   parent  642M   351M  20%  bro
proxy-3  proxy   bro-test 15732   parent  619M   322M  26%  bro
[root@bro-test ~]# cat /bro/logs/current/*.log|wc -l
2627
[root@bro-test ~]# broctl top manager logger proxy-{1,2,3}
Name TypeHost Pid ProcVSize  Rss  Cpu   Cmd
logger   logger  bro-test 14267   parent7G 6G  75%  bro
manager  manager bro-test 15478   parent  649M   277M  25%  bro
proxy-1  proxy   bro-test 32061   parent  643M   309M  12%  bro
proxy-2  proxy   bro-test 32063   parent  642M   353M  25%  bro
proxy-3  proxy   bro-test 15732   parent  619M   322M  12%  bro
[root@bro-test ~]# cat /bro/logs/current/*.log|wc -l
2627
[root@bro-test ~]# broctl top manager logger proxy-{1,2,3}
Name TypeHost Pid ProcVSize  Rss  Cpu   Cmd
logger   logger  bro-test 14267   parent   12G11G  68%  bro
manager  manager bro-test 15478   parent  649M   277M  31%  bro
proxy-1  proxy   bro-test 32061   parent  643M   313M  18%  bro
proxy-2  proxy   bro-test 32063   parent  642M   359M  12%  bro
proxy-3  proxy   bro-test 15732   parent  619M   322M   6%  bro
[root@bro-test ~]# ls -tl /bro/logs/current/|head
total 412
-rw-r--r--. 1 root root682 Mar  9 11:00 stats.log
-rw-r--r--. 1 root root  14470 Mar  9 11:00 conn.log
[root@bro-test ~]# date
Fri Mar  9 11:06:59 CST 2018

up until log rotation is supposed to occur the log rate is normal.

— 
Justin Azoff

___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev


Re: [Bro-Dev] Broker port status

2018-03-09 Thread Azoff, Justin S

> On Mar 8, 2018, at 6:32 PM, Jon Siwek  wrote:
> 
> Yeah, I realized that and fixed it yesterday.  Have you pulled the
> latest commits from the topic/actor-system branch?
> 
> - Jon

Ah.. I grabbed the latest commit from the wrong checkout and was using a commit 
from a few weeks ago.




— 
Justin Azoff


___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev


Re: [Bro-Dev] Broker port status

2018-03-08 Thread Robin Sommer


On Thu, Mar 08, 2018 at 13:28 -0600, you wrote:

> * Rename "proxy" nodes?

"compute" maybe?

> (1) , ,   Seems fine to deprecate these 
> now.

Agree.

> (2) Communication framework scripts.  It's awkward to deprecate stuff
> here since they internally will be using what is deprecated.  I think
> it makes sense to just remove it and let users manually make a copy if
> they still need it.

Likewise agree.

> (3) Old C++ comm. code, e.g. RemoteSerializer.  I think we'll leave
> this untouched for the 2.6 release?

Yep, I'm looking forward to ripping that out for 2.7. :)

> (4) BIFs associated with old comm. system.  Depends on (3) (and also
> actually (2)), though if the internal core code remains, I think it
> makes sense to add  to these.

Deprecating them sounds good.

> It makes more sense to me to assume the user wants to insert the key
> with a sane default value (e.g. zero/empty) in those cases, otherwise,
> it's awkward/race-prone to require they do it themselves.

Agree with this as well, has always felt a bit awkward to me too.

Robin

-- 
Robin Sommer * ICSI/LBNL * ro...@icir.org * www.icir.org/robin
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev


Re: [Bro-Dev] Broker port status

2018-03-08 Thread Jon Siwek
On Thu, Mar 8, 2018 at 6:11 PM, Azoff, Justin S  wrote:

>> redef Known::use_host_store = F;
>> redef Known::use_cert_store = F;
>> redef Known::use_device_store = F;
>> redef Known::use_service_store = F;

> but if you load the script first, the
>
> @if ( Known::use_service_store )
>
> block is already evaluated before you change the value.

Yeah, I realized that and fixed it yesterday.  Have you pulled the
latest commits from the topic/actor-system branch?

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev


Re: [Bro-Dev] Broker port status

2018-03-08 Thread Azoff, Justin S

> On Mar 8, 2018, at 5:46 PM, Jon Siwek  wrote:
> 
> On Thu, Mar 8, 2018 at 5:07 PM, Azoff, Justin S  wrote:
> 
>> One thing I notice is that the traffic to/from the manager box and cpu has 
>> increased quite a bit.
>> 
>> Ignoring the large CPU spikes from building the new branch just before the 
>> switch at 15:30, the overall cpu load on the manager box is about 3x higher.
>> The bandwidth isn't terribly excessive, but it increased from about 16mbit 
>> to 40mbit (stupid graph is in mebibytes).
>> 
>> Based on some capstats runs, it's all going to the logger port 47761.  I 
>> have another graph that shows the total log volume being written to disk and 
>> that hasn't changed.
>> So it looks like this branch uses 2x the cpu and bandwidth to write the same 
>> volume of logs.
> 
> Interesting, I would have thought maybe the manager could be utilized
> more, though not the logger.  Will have to think about that.

Yeah.. it's definitely worker -> logger:

logger=47761, manager=47762, proxy=47763

[root@bro-test ~]# capstats -i enp3s0f0 -I 1 -f 'port 47761' -n 5
1520553009.488934 pkts=1173 kpps=1.1 kbytes=2837 mbps=22.6 nic_pkts=1175 
nic_drops=0 u=0 t=1173 i=0 o=0 nonip=0
1520553010.489053 pkts=1280 kpps=1.3 kbytes=3218 mbps=26.4 nic_pkts=2455 
nic_drops=0 u=0 t=1280 i=0 o=0 nonip=0
1520553011.489177 pkts=1086 kpps=1.1 kbytes=2912 mbps=23.9 nic_pkts=3541 
nic_drops=0 u=0 t=1086 i=0 o=0 nonip=0
1520553012.489310 pkts=1382 kpps=1.4 kbytes=3757 mbps=30.8 nic_pkts=4923 
nic_drops=0 u=0 t=1382 i=0 o=0 nonip=0
1520553013.489471 pkts=1563 kpps=1.6 kbytes=3816 mbps=31.3 nic_pkts=6486 
nic_drops=0 u=0 t=1563 i=0 o=0 nonip=0
[root@bro-test ~]# capstats -i enp3s0f0 -I 1 -f 'port 47762' -n 5
1520553018.874154 pkts=346 kpps=0.3 kbytes=92 mbps=0.7 nic_pkts=348 nic_drops=0 
u=0 t=346 i=0 o=0 nonip=0
1520553019.875111 pkts=1044 kpps=1.0 kbytes=283 mbps=2.3 nic_pkts=1391 
nic_drops=0 u=0 t=1044 i=0 o=0 nonip=0
1520553020.875448 pkts=759 kpps=0.8 kbytes=199 mbps=1.6 nic_pkts=2150 
nic_drops=0 u=0 t=759 i=0 o=0 nonip=0
1520553021.875575 pkts=654 kpps=0.7 kbytes=196 mbps=1.6 nic_pkts=2804 
nic_drops=0 u=0 t=654 i=0 o=0 nonip=0
1520553022.875920 pkts=470 kpps=0.5 kbytes=123 mbps=1.0 nic_pkts=3274 
nic_drops=0 u=0 t=470 i=0 o=0 nonip=0
[root@bro-test ~]# capstats -i enp3s0f0 -I 1 -f 'port 47763' -n 5
1520553025.700919 pkts=26 kpps=0.0 kbytes=8 mbps=0.1 nic_pkts=31 nic_drops=0 
u=0 t=26 i=0 o=0 nonip=0
1520553026.701047 pkts=27 kpps=0.0 kbytes=9 mbps=0.1 nic_pkts=58 nic_drops=0 
u=0 t=27 i=0 o=0 nonip=0
1520553027.701185 pkts=24 kpps=0.0 kbytes=8 mbps=0.1 nic_pkts=82 nic_drops=0 
u=0 t=24 i=0 o=0 nonip=0
1520553028.701306 pkts=55 kpps=0.1 kbytes=18 mbps=0.2 nic_pkts=137 nic_drops=0 
u=0 t=55 i=0 o=0 nonip=0
1520553029.701434 pkts=9 kpps=0.0 kbytes=3 mbps=0.0 nic_pkts=146 nic_drops=0 
u=0 t=9 i=0 o=0 nonip=0
[root@bro-test ~]#

On another cluster seeing the same traffic with 56 workers running 2.5.3ish 
shows this for the logger port at the same time:

1520553725.875230 pkts=681 kpps=0.7 kbytes=1946 mbps=15.5 nic_pkts=768 
nic_drops=0 u=0 t=681 i=0 o=0 nonip=0
1520553726.875374 pkts=508 kpps=0.5 kbytes=1605 mbps=13.1 nic_pkts=1392 
nic_drops=0 u=0 t=508 i=0 o=0 nonip=0
1520553727.875520 pkts=554 kpps=0.6 kbytes=1681 mbps=13.8 nic_pkts=2247 
nic_drops=0 u=0 t=554 i=0 o=0 nonip=0
1520553728.875713 pkts=536 kpps=0.5 kbytes=1708 mbps=14.0 nic_pkts=2902 
nic_drops=0 u=0 t=536 i=0 o=0 nonip=0
1520553729.875845 pkts=518 kpps=0.5 kbytes=1669 mbps=13.7 nic_pkts=3641 
nic_drops=0 u=0 t=518 i=0 o=0 nonip=0

[root@bro-test ~]# broctl top manager logger
Name TypeHost Pid ProcVSize  Rss  Cpu   Cmd
logger   logger  bro-test 2016parent  938M   283M  81%  bro
manager  manager bro-test 2107parent  619M   244M  25%  bro
[root@bro-test ~]#

[root@bro-dev ~]# broctl top manager logger
Name TypeHost Pid ProcVSize  Rss  Cpu   Cmd
logger   logger  bro-dev10704   child   492M   150M   6%  bro
logger   logger  bro-dev10571   parent1G   221M   0%  bro
manager  manager bro-dev10734   parent  909M   497M   0%  bro
manager  manager bro-dev10782   child   482M   110M   0%  bro

cpus are about the same, test has X5650  @ 2.67GHz and dev has E5-2420 @ 
1.90GHz 
according to passmark, the older X5650 has a slightly faster single core 
performance.

> An interesting experiment you could try is switching to an alternate
> implementation of the Known scripts.  E.g. stick this in local.bro:
> 
> redef Known::use_host_store = F;
> redef Known::use_cert_store = F;
> redef Known::use_device_store = F;
> redef Known::use_service_store = F;

I tried doing that for an unrelated reason.. but I'm not sure if it works right.

You can't redef Known::use_service_store before the script is loaded, because 
otherwise you get

Can't document redef of Known::use_service_store, identifier lookup failed

but 

Re: [Bro-Dev] Broker port status

2018-03-08 Thread Jon Siwek
On Thu, Mar 8, 2018 at 5:07 PM, Azoff, Justin S  wrote:

> One thing I notice is that the traffic to/from the manager box and cpu has 
> increased quite a bit.
>
> Ignoring the large CPU spikes from building the new branch just before the 
> switch at 15:30, the overall cpu load on the manager box is about 3x higher.
> The bandwidth isn't terribly excessive, but it increased from about 16mbit to 
> 40mbit (stupid graph is in mebibytes).
>
> Based on some capstats runs, it's all going to the logger port 47761.  I have 
> another graph that shows the total log volume being written to disk and that 
> hasn't changed.
> So it looks like this branch uses 2x the cpu and bandwidth to write the same 
> volume of logs.

Interesting, I would have thought maybe the manager could be utilized
more, though not the logger.  Will have to think about that.

An interesting experiment you could try is switching to an alternate
implementation of the Known scripts.  E.g. stick this in local.bro:

redef Known::use_host_store = F;
redef Known::use_cert_store = F;
redef Known::use_device_store = F;
redef Known::use_service_store = F;

I'd expect that could reduce bandwidth a bit.

Using data stores the Known scripts do:

worker -> manager: put this key into your store if it doesn't exist
manager -> worker: the key was accepted (or rejected)
worker -> logger: log the key (sent only if manager accepted key as unique)

Else the alternate version of Known scripts do:

worker -> proxy: here's a key
proxy -> logger: log the key (if proxy hasn't seen it recently)

Maybe cuts out 1/3 the message volume contributed by Known scripts that way.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev


Re: [Bro-Dev] Broker port status

2018-03-08 Thread Jon Siwek
On Thu, Mar 8, 2018 at 2:18 PM, Azoff, Justin S  wrote:

>> (1) , ,   Seems fine to deprecate these 
>> now.
>
> I'm fine with it going away, but there needs to be some sort of replacement 
> for ,
> minimally a how-to for porting existing scripts to something else.

The answer is generally "use the new store API" to replace
 or   It's difficult to give a canonical
porting strategy as it depends on how the data is being
accessed/modified, though you can find a preview of the how-to port
document at [1].  Links and stuff will get rendered more nicely when
it's live on bro.org.

- Jon

[1] https://github.com/bro/bro/blob/topic/actor-system/doc/frameworks/broker.rst
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev


Re: [Bro-Dev] Broker port status

2018-03-08 Thread Azoff, Justin S
> On Mar 8, 2018, at 1:28 PM, Jon Siwek  wrote:
> 
> Hi all, this is just a status update on porting Bro to use the new
> Broker communication system.  I'd say the topic/actor-system branch is
> now functionally complete with still a few weeks left of internal
> cleanup/improvements.

Awesome, I'll get it deployed here on a test cluster.

> Open questions:
> 
> * Rename "proxy" nodes?
> 
> The previous idea was to call them "data nodes", though I don't see
> the benefit.  It's awkward to talk about because there's no shorthand
> for that node type: you can say "logger", "manager", or "worker",
> though you'd have to say the whole "data node" phrase to make any
> sense.  That also shows that maybe it's helpful to name nodes based
> upon a personifiable role that they play: "data" isn't a role/action
> performed by a node/person.  IMO, "proxy" is still accurate to the
> role that these nodes perform: they are intermediaries for offloading
> analysis and/or storage.  Are there other ideas or is everyone wanting
> to go ahead with "data node" ?

I think I was the one calling them data nodes, but I only did that because 
that's what
they were called in the original broker integration branch that Mathias Fischer 
started.

I don't care about the name, as long as it's documented as a proxy node is for 
"offloading
analysis and storage" that works for me.

> * How much of the old comm. system to deprecate vs. remove?
> 
> (1) , ,   Seems fine to deprecate these 
> now.

I'm fine with it going away, but there needs to be some sort of replacement for 
,
minimally a how-to for porting existing scripts to something else.

I don't think anyone currently uses mergeable or persistent.


— 
Justin Azoff



___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev


[Bro-Dev] Broker port status

2018-03-08 Thread Jon Siwek
Hi all, this is just a status update on porting Bro to use the new
Broker communication system.  I'd say the topic/actor-system branch is
now functionally complete with still a few weeks left of internal
cleanup/improvements.

Open questions:

* Rename "proxy" nodes?

The previous idea was to call them "data nodes", though I don't see
the benefit.  It's awkward to talk about because there's no shorthand
for that node type: you can say "logger", "manager", or "worker",
though you'd have to say the whole "data node" phrase to make any
sense.  That also shows that maybe it's helpful to name nodes based
upon a personifiable role that they play: "data" isn't a role/action
performed by a node/person.  IMO, "proxy" is still accurate to the
role that these nodes perform: they are intermediaries for offloading
analysis and/or storage.  Are there other ideas or is everyone wanting
to go ahead with "data node" ?

* How much of the old comm. system to deprecate vs. remove?

(1) , ,   Seems fine to deprecate these now.

(2) Communication framework scripts.  It's awkward to deprecate stuff
here since they internally will be using what is deprecated.  I think
it makes sense to just remove it and let users manually make a copy if
they still need it.  Thoughts?

(3) Old C++ comm. code, e.g. RemoteSerializer.  I think we'll leave
this untouched for the 2.6 release?

(4) BIFs associated with old comm. system.  Depends on (3) (and also
actually (2)), though if the internal core code remains, I think it
makes sense to add  to these.

Anything else?

And just a quick summary of planned internal improvements:

There's still several usability issues with Broker data stores.

* Online vs. offline operation discrepancies.  Expiration times in
Broker are driven by wall time, though for a more familiar/expected
offline usage experience, Bro would need to be able to drive
expiration via network time.  There's also some wall-time-based data
store initialization delays that make offline Bro usage difficult as
the Bro-based query timeouts are network-time-based.

* The way clone snapshots are distributed needs to be optimized.
Currently, for any clone that connects, the master floods *all* clones
with a snapshot.

* Certain store operations like "increment" fail if the key they are
to operate on does not exist.  It makes more sense to me to assume the
user wants to insert the key with a sane default value (e.g.
zero/empty) in those cases, otherwise, it's awkward/race-prone to
require they do it themselves.

Let me know if there's thoughts on anything else still left to add/improve.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev