Re: [Bro-Dev] Performance Issues after the fe7e1ee commit?
On Tue, Jun 12, 2018 at 4:26 PM Robin Sommer wrote:

> One question about Broker's endpoint::advance_time(): that's locking
> on each call when in pcap mode, i.e., once per packet. Could we limit
> that to cases where something actually needs to be done? Quick idea:
> use an atomic<> for current_time plus another atomic<> counter
> tracking if there are any pending messages. And go into the locked
> case only if the latter is non-zero?

Yes, I've done that in master, but didn't measure a notable difference.

> > General changes/improvements in CAF itself may be warranted here

Made an issue for that here:
https://github.com/actor-framework/actor-framework/issues/699

- Jon

_______________________________________________
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev
Re: [Bro-Dev] Performance Issues after the fe7e1ee commit?
On Tue, Jun 12, 2018 at 14:16 -0500, you wrote:

> This has led to the fix/workaround at [1], now in master, which

Cool, that indeed solved it! It also helps significantly when data
stores *are* being used; that still takes about 2x the time, but
that's much less than before. Now I'm wondering if we can get that
back down to normal, too ...

One question about Broker's endpoint::advance_time(): that's locking
on each call when in pcap mode, i.e., once per packet. Could we limit
that to cases where something actually needs to be done? Quick idea:
use an atomic<> for current_time plus another atomic<> counter
tracking if there are any pending messages. And go into the locked
case only if the latter is non-zero?

> General changes/improvements in CAF itself may be warranted here

Yeah, sounds like it.

Robin

-- 
Robin Sommer * ICSI/LBNL * ro...@icir.org * www.icir.org/robin
Re: [Bro-Dev] Performance Issues after the fe7e1ee commit?
On Mon, Jun 11, 2018 at 6:30 PM Robin Sommer wrote:

> Confirmed that I'm still seeing that difference even when using the
> options to turn off the stores. Tried it on two different Fedora 28
> systems, with similar results.

I had previously tried on a Fedora 28 VM w/ 4 CPUs and didn't
reproduce it, but trying again on a system with 32 CPUs, I can
reproduce your results.

This has led to the fix/workaround at [1], now in master, which should
bring things back to baseline when not using data stores and more sane
timings when using them.

General changes/improvements in CAF itself may be warranted here
because I can recreate the same performance overhead if I artificially
remove all broker functionality from Bro, but then simply create and
never use a caf::actor_system via the default configuration.

- Jon

[1] https://github.com/bro/bro/commit/c9fe9a943c4d78b18ffbae13c562b93349a5f951
Re: [Bro-Dev] Performance Issues after the fe7e1ee commit?
On Fri, Jun 08, 2018 at 10:47 -0700, I wrote:

> Ok, I'll dig around a bit more and also double-check that I didn't
> change any settings in the meantime.

Confirmed that I'm still seeing that difference even when using the
options to turn off the stores. Tried it on two different Fedora 28
systems, with similar results.

I haven't been able to nail down what's going on though. The
AdvanceTime() method does seem to do a lot of locking in pcap mode,
independent of whether there are actually any data store operations.
However, I tried a quick hack to get that down and that didn't change
anything.

I then ran it through oprofile. Output is attached. There's indeed
some locking showing up high in there, but I can't tell if that's just
expected idling in CAF. I am a bit surprised to see a number of
std::chrono() methods showing rather prominently that I would expect
to be cheap.

Robin

-- 
Robin Sommer * ICSI/LBNL * ro...@icir.org * www.icir.org/robin

Using /home/robin/bro/master/tmp/oprofile_data/samples/ for samples directory.

WARNING! Some of the events were throttled. Throttling occurs when the
initial sample rate is too high, causing an excessive number of
interrupts. Decrease the sampling frequency. Check the directory
/home/robin/bro/master/tmp/oprofile_data/samples/current/stats/throttled
for the throttled event names.

WARNING: Lost samples detected! See
/home/robin/bro/master/tmp/oprofile_data/samples/operf.log for details.

warning: /hpsa could not be found.
warning: /kvm could not be found.
warning: /nf_conntrack could not be found.
warning: /nf_defrag_ipv4 could not be found.
warning: /nf_nat could not be found.
warning: /nf_nat_ipv4 could not be found.
warning: /tg3 could not be found.
CPU: Intel Sandy Bridge microarchitecture, speed 2000 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 10

samples  %       image name             symbol name
48605    3.3738  kallsyms               find_busiest_group
31108    2.1593  libtcmalloc.so.4.5.1   /usr/lib64/libtcmalloc.so.4.5.1
24088    1.6720  bro                    RE_Match_State::Match(unsigned char const*, int, bool, bool, bool)
22833    1.5849  kallsyms               native_write_msr
20525    1.4247  kallsyms               finish_task_switch
20314    1.4100  kallsyms               syscall_return_via_sysret
16822    1.1677  kallsyms               __softirqentry_text_start
16520    1.1467  libcaf_core.so.0.15.7  caf::detail::double_ended_queue::lock_guard::lock_guard(std::atomic_flag&)
15112    1.0490  kallsyms               update_blocked_averages
12897    0.8952  kallsyms               run_timer_softirq
12890    0.8947  kallsyms               pick_next_task_fair
12729    0.8836  libpthread-2.27.so     nanosleep
12495    0.8673  kallsyms               update_curr
12209    0.8475  kallsyms               _raw_spin_lock
12186    0.8459  libcaf_core.so.0.15.7  caf::resumable* caf::policy::work_stealing::dequeue >(caf::scheduler::worker*)
11979    0.8315  kallsyms               __schedule
11886    0.8250  kallsyms               __update_load_avg_cfs_rq.isra.34
11463    0.7957  kallsyms               idle_cpu
11239    0.7801  kallsyms               __update_load_avg_se.isra.33
11178    0.7759  kallsyms               native_sched_clock
11010    0.7642  kallsyms               update_load_avg
10854    0.7534  libcaf_core.so.0.15.7  std::atomic::node*>::load(std::memory_order) const
10770    0.7476  kallsyms               load_balance
10737    0.7453  libcaf_core.so.0.15.7  decltype (({parm#1}->data)()) caf::policy::unprofiled::d >(caf::scheduler::worker*)
10582    0.7345  bro                    DFA_State::Xtion(int, DFA_Machine*)
10554    0.7326  libcaf_core.so.0.15.7  caf::detail::double_ended_queue::take_head()
10185    0.7070  bro                    RandTest::add(void const*, int)
10133    0.7034  libcaf_core.so.0.15.7  std::__uniq_ptr_impl::node, std::default_delete::node> >::_M_ptr()
9920     0.6886  libstdc++.so.6.0.25    /usr/lib64/libstdc++.so.6.0.25
9813     0.6811  kallsyms               swapgs_restore_regs_and_return_to_usermode
9685     0.6723  kallsyms               trigger_load_balance
9431     0.6546  libcaf_core.so.0.15.7  std::tuple_element<0ul, std::tuple::node*, std::default_delete::node> > >::type& std::get<0ul, caf::detail::double_ended_queue::node*, std::default_delete::node> >(std::tuple::node*, std::default_delete::node> >&)
9404     0.6528  libcaf_core.so.0.15.7  caf::scheduler::worker::data()
9350     0.6490  kallsyms               do_syscall_64
9311     0.6463  libcaf_core.so.0.15.7  std::enable_if::node*> >, std::is_move_constructible::node*>, std::is_move_assignable::node*> >::value, void>::type std::swap::node*>(caf::detail::double_ended_queue::node*&, caf::detail::double_ended_queue::node*&)
9297     0.6453
Re: [Bro-Dev] Performance Issues after the fe7e1ee commit?
Ok, I'll dig around a bit more and also double-check that I didn't
change any settings in the meantime.

Robin

On Fri, Jun 08, 2018 at 12:40 -0500, you wrote:

> On Fri, Jun 8, 2018 at 12:16 PM Jon Siwek wrote:
>
> > Only thing I'm thinking is that your test system maybe
> > does a poorer job of scheduling the right thread to run in order for
> > the FlushPendingQueries() spin-loop to actually finish.
>
> Actually, realized you still had bad timing with data stores off, so
> it would more likely be a problem with the AdvanceTime() code path.
> There's some mutex locking going on there, but w/o data stores
> involved there shouldn't be anyone competing for them.
>
> - Jon

-- 
Robin Sommer * ICSI/LBNL * ro...@icir.org * www.icir.org/robin
Re: [Bro-Dev] Performance Issues after the fe7e1ee commit?
On Fri, Jun 8, 2018 at 10:23 AM Robin Sommer wrote:

> > # zcat 2009-M57-day11-18.trace.gz | time bro -r - tests/m57-long.bro
> > Known::use_host_store=F Known::use_service_store=F
> > Known::use_cert_store=F
>
> That indeed gets it way down, though still not back to the same level
> on my box:
>
> 170.49user 58.14system 1:55.94elapsed 197%CPU
>
> (pre-master: 73.72user 7.90system 1:06.53elapsed 122%CPU)

Just double-checking: same configure/build flags were used between the
two? Here's the more precise results I got from running on a macOS and
Linux system:

# Linux master (b51e6f3) --build-type=debug
$ export BROPATH=$BROPATH:../testing/external/scripts; time bro -r 2009-M57-day11-18.trace test-all-policy testing-setup

real  0m32.572s
user  0m40.926s
sys   0m7.873s

# Linux master (b51e6f3) --build-type=debug
$ export BROPATH=$BROPATH:../testing/external/scripts; time bro -r 2009-M57-day11-18.trace test-all-policy testing-setup Known::use_host_store=F Known::use_cert_store=F Known::use_service_store=F

real  0m25.603s
user  0m23.311s
sys   0m7.952s

# Linux pre-broker (7a6f502) --build-type=debug
$ export BROPATH=$BROPATH:../testing/external/scripts; time bro -r 2009-M57-day11-18.trace test-all-policy testing-setup

real  0m24.785s
user  0m21.230s
sys   0m8.341s

# macOS master (b51e6f3) --build-type=debug
$ export BROPATH=$BROPATH:../testing/external/scripts; time bro -r 2009-M57-day11-18.trace test-all-policy testing-setup.bro

real  0m38.775s
user  0m42.365s
sys   0m8.950s

# macOS master (b51e6f3) --build-type=debug
$ export BROPATH=$BROPATH:../testing/external/scripts; time bro -r 2009-M57-day11-18.trace test-all-policy testing-setup.bro Known::use_host_store=F Known::use_cert_store=F Known::use_service_store=F

real  0m32.975s
user  0m33.774s
sys   0m4.409s

# macOS pre-broker (7a6f502) --build-type=debug
$ export BROPATH=$BROPATH:../testing/external/scripts; time bro -r 2009-M57-day11-18.trace test-all-policy testing-setup.bro

real  0m30.785s
user  0m31.840s
sys   0m3.788s

> Are there more places where Bro's waiting for Broker in pcap mode?

Not that I can think of.

> Yeah, I remember that discussion. It's the trade-off between
> performance and consistency. Where's the code that's doing this
> blocking?

I just know that Manager::AdvanceTime() and also
Manager::TrackStoreQuery() -> FlushPendingQueries() can block/spin
waiting on actor/threading in pcap mode.

> Would it be possible to not block right away, but sync up
> with Broker when events are flushed the next time? (Like we had
> discussed before as a general mechanism for making async operations
> consistent)

I think the way to make an async operation most consistent is to model
it as a synchronous operation? That's what I'm doing now at least, and
if I try moving FlushPendingQueries() to later (before the
AdvanceTime() call) in an attempt to possibly have more queries in the
queue/buffer, I get the external test suites failing.

At least on my own test systems, I don't have much performance to
recover by going this route anyway, so maybe we need to dig more into
why your results are different. Only thing I'm thinking is that your
test system maybe does a poorer job of scheduling the right thread to
run in order for the FlushPendingQueries() spin-loop to actually
finish.

- Jon
Re: [Bro-Dev] Performance Issues after the fe7e1ee commit?
On Thu, Jun 07, 2018 at 17:05 -0500, you wrote:

> Thanks, if you pull master changes [1] again there's likely some improvement.

Yeah, a little bit, not much though.

> # zcat 2009-M57-day11-18.trace.gz | time bro -r - tests/m57-long.bro
> Known::use_host_store=F Known::use_service_store=F
> Known::use_cert_store=F

That indeed gets it way down, though still not back to the same level
on my box:

170.49user 58.14system 1:55.94elapsed 197%CPU

(pre-master: 73.72user 7.90system 1:06.53elapsed 122%CPU)

Are there more places where Bro's waiting for Broker in pcap mode?

> With that, I get the same timings as pre-Broker. At least a good
> chunk of the difference when using data stores is that, for every
> query, Bro will immediately block until getting a response back from
> the data store thread/actor.

Yeah, I remember that discussion. It's the trade-off between
performance and consistency. Where's the code that's doing this
blocking? Would it be possible to not block right away, but sync up
with Broker when events are flushed the next time? (Like we had
discussed before as a general mechanism for making async operations
consistent)

Robin

-- 
Robin Sommer * ICSI/LBNL * ro...@icir.org * www.icir.org/robin
Re: [Bro-Dev] Performance Issues after the fe7e1ee commit?
On Thu, Jun 7, 2018 at 12:21 PM Robin Sommer wrote:
>
> Hmm, I'm still seeing much larger runtimes on that M57 trace using our
> test configuration, even with yesterday's change:
>
> ; Master, pre-Broker
> # zcat 2009-M57-day11-18.trace.gz | time bro -r - tests/m57-long.bro
> 73.72user 7.90system 1:06.53elapsed 122%CPU (0avgtext+0avgdata 199092maxresident)
>
> ; Current master
> # zcat 2009-M57-day11-18.trace.gz | time bro -r - tests/m57-long.bro
> 2191.59user 1606.69system 12:39.92elapsed 499%CPU (0avgtext+0avgdata 228400maxresident)
>
> Bro must still be blocking somewhere when reading from trace files.

Thanks, if you pull master changes [1] again there's likely some
improvement. You can also try comparing the timing of:

# zcat 2009-M57-day11-18.trace.gz | time bro -r - tests/m57-long.bro Known::use_host_store=F Known::use_service_store=F Known::use_cert_store=F

With that, I get the same timings as pre-Broker. At least a good chunk
of the difference when using data stores is that, for every query, Bro
will immediately block until getting a response back from the data
store thread/actor. Not doing this type of synchronous data store
access when reading pcaps leads to the possibility of results
differing between runs (and I recall such differences being likely
from experience in running the unit test suite).

- Jon

[1] https://github.com/bro/broker/commit/9b56fea4999d4e11a5cd2caaafd934759015fab5
Re: [Bro-Dev] Performance Issues after the fe7e1ee commit?
Hmm, I'm still seeing much larger runtimes on that M57 trace using our
test configuration, even with yesterday's change:

; Master, pre-Broker
# zcat 2009-M57-day11-18.trace.gz | time bro -r - tests/m57-long.bro
73.72user 7.90system 1:06.53elapsed 122%CPU (0avgtext+0avgdata 199092maxresident)

; Current master
# zcat 2009-M57-day11-18.trace.gz | time bro -r - tests/m57-long.bro
2191.59user 1606.69system 12:39.92elapsed 499%CPU (0avgtext+0avgdata 228400maxresident)

Bro must still be blocking somewhere when reading from trace files.

Robin

-- 
Robin Sommer * ICSI/LBNL * ro...@icir.org * www.icir.org/robin
Re: [Bro-Dev] Performance Issues after the fe7e1ee commit?
On Wed, Jun 6, 2018 at 2:10 PM Azoff, Justin S wrote:

> I haven't noticed a huge difference on a real multi process cluster, at least
> not a 30x change, so this is odd that pcap processing is so much slower.
> Especially since broker should be completely disabled when pcap files are
> being read and caf shouldn't even be doing anything.

It's not that simple. You could think of broker as being enabled even
in this case of simply reading a pcap, because Bro was querying the
library for whether there were any peers (and that was actually at
least part of the perf problem).

Beyond that, you can still think of broker as being enabled when
you're reading a pcap and you are also using data stores (which the
Known* scripts in Bro now do by default). Communication with a local
master data store is still asynchronous communication (with local
threads/actors) that ends up going through CAF messaging.

There's also essentially a per-packet synchronization being done
between Bro and Broker/CAF now to ensure that this type of
asynchronous workload ends up producing the same results between any
given Bro run. So there's now also that bit of extra overhead in pcap
processing.

- Jon
Re: [Bro-Dev] Performance Issues after the fe7e1ee commit?
On Wed, Jun 6, 2018 at 12:01 PM McMullan, Tim wrote:

> Is the Bro development team still optimizing the Broker/Actor framework?

Yes, in the sense that optimizations will be done according to
feedback. No, in the sense that there was no one actively looking into
it at the moment you asked.

> It might be helpful to have a way to disable Broker for those of us who
> haven’t migrated to it yet.

That's unlikely at this point as scripts have been completely ported
to Broker now and usages of the old communication system completely
removed from them. There's no simple switch to turn it back on, so
your feedback is especially important/helpful.

> # ~1GB file time (old)
> $ time /hostname/bro-devel/bin/bro -r 20180606-1049-prodfilers-truncated_0_20180606104904.pcap master.bro
>
> real  0m2.294s
> user  0m1.862s
> sys   0m0.385s
>
> # ~1GB file time (new)
> $ time /hostname/bro-devel/bin/bro -r 20180606-1049-prodfilers-truncated_0_20180606104904.pcap master.bro
>
> real  1m11.458s
> user  0m58.933s
> sys   1m34.074s

Try pulling in the change I just did at [1], which was a big part of
the problem in my own test:

# 2.5.3
$ time bro -r 2009-M57-day11-18.trace

real  0m16.187s
user  0m16.312s
sys   0m1.865s

# master before [1]
$ time bro -r ../testing/external/bro-testing/Traces/2009-M57-day11-18.trace

real  1m31.434s
user  1m44.925s
sys   1m4.947s

# master after [1]
$ time bro -r 2009-M57-day11-18.trace

real  0m24.595s
user  0m25.574s
sys   0m5.816s

- Jon

[1] https://github.com/bro/bro/commit/9822fc252d5e92208704df4a388ea31989869499
Re: [Bro-Dev] Performance Issues after the fe7e1ee commit?
> On Jun 6, 2018, at 12:54 PM, McMullan, Tim wrote:
>
> We are running into performance issues (30x slower) since the Broker patch
> (fe7e1ee) –
>
> We have 40G connections tapped from our storage filers feeding multiple Bro
> instances which analyze specifically only NFS and SMB traffic; all other
> analyzers are disabled. With the broker patch we are seeing processing times
> for a ~1GB pcap jump from around 2 seconds to over 1 minute. Profiling Bro,
> it looks like the culprit is the new Actor functions --
>
> # Before patch
> Overhead  Shared Object  Symbol
> 14.57%  [kernel]  [k] copy_user_enhanced_fast_string
>  3.20%  bro       [.] EventHandler::operator bool
>  2.99%  bro       [.] _siphash
>  2.89%  bro       [.] Dictionary::Lookup
>
> # After patch
> Overhead  Shared Object          Symbol
>  5.71%  [kernel]               [k] native_write_msr_safe
>  3.84%  libcaf_core.so.0.15.7  [.] caf::scheduler::worker::run
>  3.71%  libcaf_core.so.0.15.7  [.] caf::detail::double_ended_queue::take_head
>  3.29%  [kernel]               [k] _raw_spin_lock
>
> Is the Bro development team still optimizing the Broker/Actor framework? It
> might be helpful to have a way to disable Broker for those of us who haven’t
> migrated to it yet.
>
> # ~1GB file time (old)
> $ time /hostname/bro-devel/bin/bro -r 20180606-1049-prodfilers-truncated_0_20180606104904.pcap master.bro
>
> real  0m2.294s
> user  0m1.862s
> sys   0m0.385s
>
> # ~1GB file time (new)
> $ time /hostname/bro-devel/bin/bro -r 20180606-1049-prodfilers-truncated_0_20180606104904.pcap master.bro
>
> real  1m11.458s
> user  0m58.933s
> sys   1m34.074s

Well, I can at least reproduce that here on one of the standard test
pcap files and no scripts loaded:

$ /usr/local/bin/bro --version
/usr/local/bin/bro version 2.5.2
$ time /usr/local/bin/bro -b -r ~/src/bro/testing/external/bro-testing/2009-M57-day11-18.trace

real  0m2.317s
user  0m1.874s
sys   0m0.327s

$ . ~/src/bro/build/bro-path-dev.sh
$ bro --version
bro version 2.5-648
$ time bro -b -r ~/src/bro/testing/external/bro-testing/2009-M57-day11-18.trace

real  1m5.523s
user  0m30.565s
sys   0m32.661s

and the same thing in perf:

Samples: 270K of event 'cycles:u', Event count (approx.): 131984875188
Overhead  Command  Shared Object          Symbol
   8.95%  bro      libcaf_core.so.0.15.7  [.] caf::detail::double_ended_queue::take_tail
   5.90%  bro      libcaf_core.so.0.15.7  [.] caf::detail::double_ended_queue::take_head
   4.92%  bro      libcaf_core.so.0.15.7  [.] caf::scheduler::worker::run
   2.67%  bro      libc-2.17.so           [.] __GI___libc_nanosleep
   1.98%  bro      libtcmalloc.so.4.4.5   [.] operator new[]
   1.79%  bro      libc-2.17.so           [.] __sleep
   1.68%  bro      libtcmalloc.so.4.4.5   [.] tc_deletearray_nothrow
   1.47%  bro      libc-2.17.so           [.] __libc_disable_asynccancel
   1.28%  bro      libstdc++.so.6.0.19    [.] std::this_thread::__sleep_for
   1.10%  bro      libc-2.17.so           [.] usleep
   1.05%  bro      bro                    [.] ones_complement_checksum

I haven't noticed a huge difference on a real multi process cluster,
at least not a 30x change, so this is odd that pcap processing is so
much slower. Especially since broker should be completely disabled
when pcap files are being read and caf shouldn't even be doing
anything.

— 
Justin Azoff