Re: [Bro-Dev] Performance Issues after the fe7e1ee commit?

2018-06-13 Thread Jon Siwek
On Tue, Jun 12, 2018 at 4:26 PM Robin Sommer  wrote:

> One question about Broker's endpoint::advance_time(): that's locking
> on each call when in pcap mode, i.e., once per packet. Could we limit
> that to cases where something actually needs to be done? Quick idea:
> use an atomic<> for current_time plus another atomic<> counter
> tracking if there are any pending message. And go into the locked case
> only if the latter is non-zero?

Yes, I've done that in master, but didn't measure a notable difference.

> > General changes/improvements in CAF itself may be warranted here

Made an issue for that here:

https://github.com/actor-framework/actor-framework/issues/699

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev


Re: [Bro-Dev] Performance Issues after the fe7e1ee commit?

2018-06-12 Thread Robin Sommer



On Tue, Jun 12, 2018 at 14:16 -0500, you wrote:

> This has lead to the fix/workaround at [1], now in master, which

Cool, that indeed solved it! It also helps significantly when data
stores *are* being used; that still takes about 2x the time, but
that's much less than before. Now I'm wondering if we can get that
back down to normal, too ...

One question about Broker's endpoint::advance_time(): that's locking
on each call when in pcap mode, i.e., once per packet. Could we limit
that to cases where something actually needs to be done? Quick idea:
use an atomic<> for current_time plus another atomic<> counter
tracking if there are any pending message. And go into the locked case
only if the latter is non-zero?

> General changes/improvements in CAF itself may be warranted here

Yeah, sounds like it.

Robin

-- 
Robin Sommer * ICSI/LBNL * ro...@icir.org * www.icir.org/robin
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev


Re: [Bro-Dev] Performance Issues after the fe7e1ee commit?

2018-06-12 Thread Jon Siwek
On Mon, Jun 11, 2018 at 6:30 PM Robin Sommer  wrote:

> Confirmed that I'm still seeing that difference even when using the
> options to turn the stores. Tried it on two different Fedora 28
> systems, with similar results.

I had previously tried on a Fedora 28 VM w/ 4 CPUs and didn't
reproduce it, but trying again on a system with 32 CPUs, I can
reproduce your results.  This has lead to the fix/workaround at [1],
now in master, which should bring things back to baseline when not
using data stores and more sane timings when using them.

General changes/improvements in CAF itself may be warranted here
because I can recreate the same performance overhead if I artificially
remove all broker functionality from Bro, but then simply create and
never use a caf::actor_system via the default configuration.

- Jon

[1] https://github.com/bro/bro/commit/c9fe9a943c4d78b18ffbae13c562b93349a5f951
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev


Re: [Bro-Dev] Performance Issues after the fe7e1ee commit?

2018-06-11 Thread Robin Sommer


On Fri, Jun 08, 2018 at 10:47 -0700, I wrote:

> Ok, I'll dig around a bit more and also double-check that I didn't
> change any settings in the meantime.

Confirmed that I'm still seeing that difference even when using the
options to turn the stores. Tried it on two different Fedora 28
systems, with similar results.

I haven't been able to nail down what's going on though. The
AdvanceTime() method does seem to do a lot of locking in pcap mode,
independent of whether there's actually any data store operations.
However, I tried a quick hack to get that down and that didn't change
anything.

I then ran it through oprofile. Output is attached. There's indeed
some locking showing up high in there, but I can't tell if that's just
expected idling in CAF. I am bit surprised to see a number of
std::chrono() methods showing rather prominently that I would expect
to be cheap.

Robin

-- 
Robin Sommer * ICSI/LBNL * ro...@icir.org * www.icir.org/robin
Using /home/robin/bro/master/tmp/oprofile_data/samples/ for samples directory.

WARNING! Some of the events were throttled. Throttling occurs when
the initial sample rate is too high, causing an excessive number of
interrupts.  Decrease the sampling frequency. Check the directory
/home/robin/bro/master/tmp/oprofile_data/samples/current/stats/throttled
for the throttled event names.


WARNING: Lost samples detected! See 
/home/robin/bro/master/tmp/oprofile_data/samples/operf.log for details.
warning: /hpsa could not be found.
warning: /kvm could not be found.
warning: /nf_conntrack could not be found.
warning: /nf_defrag_ipv4 could not be found.
warning: /nf_nat could not be found.
warning: /nf_nat_ipv4 could not be found.
warning: /tg3 could not be found.
CPU: Intel Sandy Bridge microarchitecture, speed 2000 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask 
of 0x00 (No unit mask) count 10
samples  %image name   symbol name
48605 3.3738  kallsyms find_busiest_group
31108 2.1593  libtcmalloc.so.4.5.1 /usr/lib64/libtcmalloc.so.4.5.1
24088 1.6720  bro  RE_Match_State::Match(unsigned char 
const*, int, bool, bool, bool)
22833 1.5849  kallsyms native_write_msr
20525 1.4247  kallsyms finish_task_switch
20314 1.4100  kallsyms syscall_return_via_sysret
16822 1.1677  kallsyms __softirqentry_text_start
16520 1.1467  libcaf_core.so.0.15.7
caf::detail::double_ended_queue::lock_guard::lock_guard(std::atomic_flag&)
15112 1.0490  kallsyms update_blocked_averages
12897 0.8952  kallsyms run_timer_softirq
12890 0.8947  kallsyms pick_next_task_fair
12729 0.8836  libpthread-2.27.so   nanosleep
12495 0.8673  kallsyms update_curr
12209 0.8475  kallsyms _raw_spin_lock
12186 0.8459  libcaf_core.so.0.15.7caf::resumable* 
caf::policy::work_stealing::dequeue
 >(caf::scheduler::worker*)
11979 0.8315  kallsyms __schedule
11886 0.8250  kallsyms __update_load_avg_cfs_rq.isra.34
11463 0.7957  kallsyms idle_cpu
11239 0.7801  kallsyms __update_load_avg_se.isra.33
11178 0.7759  kallsyms native_sched_clock
11010 0.7642  kallsyms update_load_avg
10854 0.7534  libcaf_core.so.0.15.7
std::atomic::node*>::load(std::memory_order)
 const
10770 0.7476  kallsyms load_balance
10737 0.7453  libcaf_core.so.0.15.7decltype (({parm#1}->data)()) 
caf::policy::unprofiled::d 
>(caf::scheduler::worker*)
10582 0.7345  bro  DFA_State::Xtion(int, DFA_Machine*)
10554 0.7326  libcaf_core.so.0.15.7
caf::detail::double_ended_queue::take_head()
10185 0.7070  bro  RandTest::add(void const*, int)
10133 0.7034  libcaf_core.so.0.15.7
std::__uniq_ptr_impl::node, 
std::default_delete::node> 
>::_M_ptr()
9920  0.6886  libstdc++.so.6.0.25  /usr/lib64/libstdc++.so.6.0.25
9813  0.6811  kallsyms 
swapgs_restore_regs_and_return_to_usermode
9685  0.6723  kallsyms trigger_load_balance
9431  0.6546  libcaf_core.so.0.15.7std::tuple_element<0ul, 
std::tuple::node*, 
std::default_delete::node> > 
>::type& std::get<0ul, caf::detail::double_ended_queue::node*, 
std::default_delete::node> 
>(std::tuple::node*, 
std::default_delete::node> >&)
9404  0.6528  libcaf_core.so.0.15.7
caf::scheduler::worker::data()
9350  0.6490  kallsyms do_syscall_64
9311  0.6463  libcaf_core.so.0.15.7
std::enable_if::node*>
 >, 
std::is_move_constructible::node*>,
 
std::is_move_assignable::node*> 
>::value, void>::type 
std::swap::node*>(caf::detail::double_ended_queue::node*&,
 caf::detail::double_ended_queue::node*&)
9297  0.6453  

Re: [Bro-Dev] Performance Issues after the fe7e1ee commit?

2018-06-08 Thread Robin Sommer


Ok, I'll dig around a bit more and also double-check that I didn't
change any settings in the meantime.

Robin

On Fri, Jun 08, 2018 at 12:40 -0500, you wrote:

> On Fri, Jun 8, 2018 at 12:16 PM Jon Siwek  wrote:
> 
> > Only thing I'm thinking is that your test system maybe
> > does a poorer job of scheduling the right thread to run in order for
> > the FlushPendingQueries() spin-loop to actually finish.
> 
> Actually, realized you still had bad timing with data stores off, so
> it would more likely be a problem with the AdvanceTime() code path.
> There's some mutex locking going on there, but w/o data stores
> involved there shouldn't be anyone competing for them.
> 
> - Jon
> 


-- 
Robin Sommer * ICSI/LBNL * ro...@icir.org * www.icir.org/robin
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev


Re: [Bro-Dev] Performance Issues after the fe7e1ee commit?

2018-06-08 Thread Jon Siwek
On Fri, Jun 8, 2018 at 10:23 AM Robin Sommer  wrote:

>
> > # zcat 2009-M57-day11-18.trace.gz | time bro -r - tests/m57-long.bro
> > Known::use_host_store=F Known::use_service_store=F
> > Known::use_cert_store=F
>
> That indeed gets it way down, though still not back to the same level
> on my box:
>
> 170.49user 58.14system 1:55.94elapsed 197%CPU
>
> (pre-master: 73.72user 7.90system 1:06.53elapsed 122%CPU)

Just double-checking: same configure/build flags were used between the two?

Here's the more precise results I got from running on a macOS and Linux system:

# Linux master (b51e6f3) --build-type=debug
$ export BROPATH=$BROPATH:../testing/external/scripts; time bro -r
2009-M57-day11-18.trace test-all-policy testing-setup
real0m32.572s
user0m40.926s
sys 0m7.873s

# Linux master (b51e6f3) --build-type=debug
$ export BROPATH=$BROPATH:../testing/external/scripts; time bro -r
2009-M57-day11-18.trace test-all-policy testing-setup
Known::use_host_store=F Known::use_cert_store=F
Known::use_service_store=F
real0m25.603s
user0m23.311s
sys 0m7.952s

# Linux pre-broker (7a6f502) --build-type=debug
$ export BROPATH=$BROPATH:../testing/external/scripts; time bro -r
2009-M57-day11-18.trace test-all-policy testing-setup
real0m24.785s
user0m21.230s
sys 0m8.341s

# macOS master (b51e6f3) --build-type=debug
$ export BROPATH=$BROPATH:../testing/external/scripts; time bro -r
2009-M57-day11-18.trace test-all-policy testing-setup.bro
real0m38.775s
user0m42.365s
sys 0m8.950s

# macOS master (b51e6f3) --build-type=debug
$ export BROPATH=$BROPATH:../testing/external/scripts; time bro -r
2009-M57-day11-18.trace test-all-policy testing-setup.bro
Known::use_host_store=F Known::use_cert_store=F
Known::use_service_store=F
real0m32.975s
user0m33.774s
sys 0m4.409s

# macOS pre-broker (7a6f502) --build-type=debug
$ export BROPATH=$BROPATH:../testing/external/scripts; time bro -r
2009-M57-day11-18.trace test-all-policy testing-setup.bro
real0m30.785s
user0m31.840s
sys 0m3.788s

> Are there more places where Bro's waiting for Broker in pcap mode?

Not that I can think of.

> Yeah, I remember that discussion. It's the trade-off between
> performance and consistency. Where's the code that's doing this
> blocking?

I just know that Manager::AdvanceTime() and also
Manager::TrackStoreQuery() -> FlushPendingQueries() can block/spin
waiting on actor/threading in pcap mode.

> Would it be possible to not block right away, but sync up
> with Broker when events are flushed the next time? (Like we had
> discussed before as a general mechanism for making async operations
> consistent)

I think the way to make an async operation most consistent is to model
it as a synchronous operation?  That's what I'm doing now at least,
and if I try moving FlushPendingQueries() to later (before the
AdvanceTime() call) in an attempt to possibly have more queries in the
queue/buffer, I get the external test suites failing.  At least on my
own test systems, I don't have much performance to recover by going
this route anyway, so maybe we need to dig more into why your results
are different.  Only thing I'm thinking is that your test system maybe
does a poorer job of scheduling the right thread to run in order for
the FlushPendingQueries() spin-loop to actually finish.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev


Re: [Bro-Dev] Performance Issues after the fe7e1ee commit?

2018-06-08 Thread Robin Sommer



On Thu, Jun 07, 2018 at 17:05 -0500, you wrote:

> Thanks, if you pull master changes [1] again there's likely some improvement.

Yeah, a little bit, not much though.

> # zcat 2009-M57-day11-18.trace.gz | time bro -r - tests/m57-long.bro
> Known::use_host_store=F Known::use_service_store=F
> Known::use_cert_store=F

That indeed gets it way down, though still not back to the same level
on my box:

170.49user 58.14system 1:55.94elapsed 197%CPU

(pre-master: 73.72user 7.90system 1:06.53elapsed 122%CPU)

Are there more places where Bro's waiting for Broker in pcap mode?

> With that, I get the same timings as from before pre-Broker.  At least
> a good chunk of the difference when using data stores is that, for
> every query, Bro will immediately block until getting a response back
> from the data store thread/actor.

Yeah, I remember that discussion. It's the trade-off between
performance and consistency. Where's the code that's doing this
blocking? Would it be possible to not block right away, but sync up
with Broker when events are flushed the next time? (Like we had
discussed before as a general mechanism for making async operations
consistent)

Robin

-- 
Robin Sommer * ICSI/LBNL * ro...@icir.org * www.icir.org/robin
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev


Re: [Bro-Dev] Performance Issues after the fe7e1ee commit?

2018-06-07 Thread Jon Siwek
On Thu, Jun 7, 2018 at 12:21 PM Robin Sommer  wrote:
>
> Hmm, I'm still seeing much larger runtimes on that M57 trace using our
> test configuration, even with yesterday's change:
>
> ; Master, pre-Broker
> # zcat 2009-M57-day11-18.trace.gz | time bro -r - tests/m57-long.bro
> 73.72user 7.90system 1:06.53elapsed 122%CPU (0avgtext+0avgdata 
> 199092maxresident)
>
> ; Current master
> # zcat 2009-M57-day11-18.trace.gz | time bro -r - tests/m57-long.bro
> 2191.59user 1606.69system 12:39.92elapsed 499%CPU (0avgtext+0avgdata 
> 228400maxresident)
>
> Bro must still be blocking somewhere when reading from trace files.

Thanks, if you pull master changes [1] again there's likely some improvement.

You can also try comparing the timing of:

# zcat 2009-M57-day11-18.trace.gz | time bro -r - tests/m57-long.bro
Known::use_host_store=F Known::use_service_store=F
Known::use_cert_store=F

With that, I get the same timings as from before pre-Broker.  At least
a good chunk of the difference when using data stores is that, for
every query, Bro will immediately block until getting a response back
from the data store thread/actor.  Not doing this type of synchronous
data store access when reading pcaps leads to the possibility of
results differing between runs (and I recall such differences being
likely from experience in running the unit test suite).

- Jon

[1] 
https://github.com/bro/broker/commit/9b56fea4999d4e11a5cd2caaafd934759015fab5
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev


Re: [Bro-Dev] Performance Issues after the fe7e1ee commit?

2018-06-07 Thread Robin Sommer
Hmm, I'm still seeing much larger runtimes on that M57 trace using our
test configuration, even with yesterday's change:

; Master, pre-Broker
# zcat 2009-M57-day11-18.trace.gz | time bro -r - tests/m57-long.bro
73.72user 7.90system 1:06.53elapsed 122%CPU (0avgtext+0avgdata 
199092maxresident)

; Current master
# zcat 2009-M57-day11-18.trace.gz | time bro -r - tests/m57-long.bro
2191.59user 1606.69system 12:39.92elapsed 499%CPU (0avgtext+0avgdata 
228400maxresident)

Bro must still be blocking somewhere when reading from trace files.

Robin

-- 
Robin Sommer * ICSI/LBNL * ro...@icir.org * www.icir.org/robin
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev


Re: [Bro-Dev] Performance Issues after the fe7e1ee commit?

2018-06-06 Thread Jon Siwek
On Wed, Jun 6, 2018 at 2:10 PM Azoff, Justin S  wrote:

> I haven't noticed a huge difference on a real multi process cluster, at least 
> not a 30x change, so this is odd that pcap processing is so much slower.
> Especially since broker should be completely disabled when pcap files are 
> being read and caf shouldn't even be doing anything.

It's not that simple.

You could think of broker being enabled in this case of simply reading
a pcap because it was querying the library for whether there were any
peers (and that was actually a at least part of the perf. problem).

Beyond that, you can still think of broker being enabled when you're
reading a pcap and you are also using data stores (which the Known*
scripts in Bro now do by default).  Communication with a local master
data store is still asynchronous communication (with local
threads/actors) that ends up going through CAF messaging.  There's
also essentially a per-packet synchronization being done between Bro
and Broker/CAF now to ensure that this type of asynchronous workload
ends up producing the same results between any given Bro run.  So
there's now also that bit of extra overhead in pcap processing.

- Jon
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev


Re: [Bro-Dev] Performance Issues after the fe7e1ee commit?

2018-06-06 Thread Jon Siwek
On Wed, Jun 6, 2018 at 12:01 PM McMullan, Tim  wrote:

> Is the Bro development team still optimizing the Broker/Actor framework?

Yes, in the sense that optimizations will be done according to
feedback.  No, in the sense that there was no one actively looking
into it at the moment you asked.

> It might be helpful to have a way to disable Broker for those of us who 
> haven’t migrated to it yet.

That's unlikely at this point as scripts have been completely ported
to Broker now and usages of the old communication system completely
removed from them.  No simple switch to it back, so your feedback is
especially important/helpful.

> #  ~1GB file time (old)
>
> $ time /hostname/bro-devel/bin/bro -r 
> 20180606-1049-prodfilers-truncated_0_20180606104904.pcap  master.bro
>
>
>
> real0m2.294s
>
> user0m1.862s
>
> sys 0m0.385s
>
>
>
> #  ~1GB file time  (new)
>
> $ time /hostname/bro-devel/bin/bro -r 
> 20180606-1049-prodfilers-truncated_0_20180606104904.pcap master.bro
>
>
>
> real1m11.458s
>
> user0m58.933s
>
> sys 1m34.074s

Try pulling in the change I just did at [1], which was a big part of
the problem in my own test:

# 2.5.3
$ time bro -r 2009-M57-day11-18.trace

real 0m16.187s
user 0m16.312s
sys 0m1.865s

# master before [1]
$ time bro -r ../testing/external/bro-testing/Traces/2009-M57-day11-18.trace

real 1m31.434s
user 1m44.925s
sys 1m4.947s

# master after [1]
$ time bro -r 2009-M57-day11-18.trace

real 0m24.595s
user 0m25.574s
sys 0m5.816s

- Jon

[1] https://github.com/bro/bro/commit/9822fc252d5e92208704df4a388ea31989869499

___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev


Re: [Bro-Dev] Performance Issues after the fe7e1ee commit?

2018-06-06 Thread Azoff, Justin S

> On Jun 6, 2018, at 12:54 PM, McMullan, Tim  wrote:
> 
> We are running into performance issues (30x slower) since the Broker patch 
> (fe7e1ee) –
>  
> We have 40G connections tapped from our storage filers feeding multiple Bro 
> instances which analyze specifically only NFS and SMB traffic; all other 
> analyzers are disabled.  With the broker patch we are seeing processing times 
> for a ~1GB pcap jump from around 2 seconds to over  1 minute. Profiling Bro, 
> it looks like the culprit is the new Actor functions --
>  
> # Before patch
> Overhead  Shared Object  Symbol
> 14.57%  [kernel]  [k] copy_user_enhanced_fast_string
>3.20%  bro   [.] EventHandler::operator bool
>2.99%  bro   [.] _siphash
>2.89%  bro   [.] Dictionary::Lookup
>  
> # After patch
> Overhead  Shared Object  Symbol
>5.71%  [kernel]   [k] native_write_msr_safe
>3.84%  libcaf_core.so.0.15.7  [.] 
> caf::scheduler::worker::run
>3.71%  libcaf_core.so.0.15.7  [.] 
> caf::detail::double_ended_queue::take_head
>3.29%  [kernel]   [k] _raw_spin_lock
>  
> Is the Bro development team still optimizing the Broker/Actor framework? It 
> might be helpful to have a way to disable Broker for those of us who haven’t 
> migrated to it yet.
>  
> #  ~1GB file time (old)
> $ time /hostname/bro-devel/bin/bro -r 
> 20180606-1049-prodfilers-truncated_0_20180606104904.pcap  master.bro
>  
> real0m2.294s
> user0m1.862s
> sys 0m0.385s
>  
> #  ~1GB file time  (new)
> $ time /hostname/bro-devel/bin/bro -r 
> 20180606-1049-prodfilers-truncated_0_20180606104904.pcap master.bro
>  
> real1m11.458s
> user0m58.933s
> sys 1m34.074s

Well, I can at least reproduce that here on one of the standard test pcap files 
and no scripts loaded:

$ /usr/local/bin/bro --version
/usr/local/bin/bro version 2.5.2
$ time /usr/local/bin/bro -b -r 
~/src/bro/testing/external/bro-testing/2009-M57-day11-18.trace

real0m2.317s
user0m1.874s
sys 0m0.327s

$ . ~/src/bro/build/bro-path-dev.sh
$ bro --version
bro version 2.5-648
$ time bro -b -r ~/src/bro/testing/external/bro-testing/2009-M57-day11-18.trace

real1m5.523s
user0m30.565s
sys 0m32.661s

and the same thing in perf:

Samples: 270K of event 'cycles:u', Event count (approx.): 131984875188
Overhead  Command  Shared Object Symbol
   8.95%  bro  libcaf_core.so.0.15.7 [.] 
caf::detail::double_ended_queue::take_tail
   5.90%  bro  libcaf_core.so.0.15.7 [.] 
caf::detail::double_ended_queue::take_head
   4.92%  bro  libcaf_core.so.0.15.7 [.] 
caf::scheduler::worker::run
   2.67%  bro  libc-2.17.so  [.] __GI___libc_nanosleep
   1.98%  bro  libtcmalloc.so.4.4.5  [.] operator new[]
   1.79%  bro  libc-2.17.so  [.] __sleep
   1.68%  bro  libtcmalloc.so.4.4.5  [.] tc_deletearray_nothrow
   1.47%  bro  libc-2.17.so  [.] __libc_disable_asynccancel
   1.28%  bro  libstdc++.so.6.0.19   [.] 
std::this_thread::__sleep_for
   1.10%  bro  libc-2.17.so  [.] usleep
   1.05%  bro  bro   [.] ones_complement_checksum



I haven't noticed a huge difference on a real multi process cluster, at least 
not a 30x change, so this is odd that pcap processing is so much slower.
Especially since broker should be completely disabled when pcap files are being 
read and caf shouldn't even be doing anything.

— 
Justin Azoff



___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev