Although I have quite a bit more work to do on this, I wrote a preliminary
document, reproduced below:
Using Profile Guided Optimization with Zeek (draft)
Background
At BroCon 2017, Packetsled gave a lightning talk that included, among other
items, information on strategically inserting likely() & unlikely() macros
into the (then) Bro source code, which reportedly boosted performance by 3%.
This code was later released on GitHub as ‘Community Bro’, along with other
modifications. For various reasons, detailed below, we did not use this
strategy, electing instead to go with automatic code profiling, which
yielded a performance increase greater than 14%.
Discussion
There exist (at least) two strategies for enhancing code performance by
optimization of compiled code.
First, likely()/unlikely() macros can be manually inserted, typically in if
statements, to signal the likelihood of the condition being met. This
hinting provides the opportunity for compiler technology to efficiently
organize the assembly code to avoid branch mispredictions and cache misses
for the most common cases. This capability is supported in the gcc & clang
compilers. For other compilers, the macros can be set up to be no-ops.
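These macros are conventionally defined in terms of gcc/clang’s
__builtin_expect, falling back to no-ops on other compilers. A minimal
sketch (the process() function is purely illustrative, not Zeek code):

```c
/* Typical definitions of the likely()/unlikely() hinting macros. */
#if defined(__GNUC__) || defined(__clang__)
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define likely(x)   (x)   /* no-op on other compilers */
#define unlikely(x) (x)
#endif

/* Illustrative use: hint that the error path is rarely taken, so the
   compiler can lay out straight-line code for the common case. */
int process(int n) {
    if (unlikely(n < 0))
        return -1;  /* rare error case */
    return n * 2;   /* common fast path */
}
```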
The likely()/unlikely() macros are used extensively in the Linux kernel to
(hopefully) improve efficiency. A 2008 blog post (
https://bitsup.blogspot.com/2008/04/measuring-performance-of-linux-kernel.html)
disputes the value of using these macros in the kernel, although both
kernel & compiler technology have advanced. In any event, they have
remained in the kernel. More information on these macros is available at:
https://kernelnewbies.org/FAQ/LikelyUnlikely
The gcc documentation indicates the following: “In general, you should
prefer to use actual profile feedback for this (`-fprofile-arcs'), as
programmers are notoriously bad at predicting how their programs actually
perform. However, there are applications in which this data is hard to
collect.”
As the gcc documentation snippet above indicates, it is possible to
automatically collect information on branches taken on a test run with
representative data, and use that information to compile production code
with branch prediction. The test run is compiled to collect statistics on
each branch and function call. Of course, the overhead is significant for
such a test run; however, the data gathered is extremely valuable.
Fortunately, performing these steps is relatively painless, as detailed
below.
Instrumenting Zeek
Step 1: Perform a baseline run in standalone mode using a default compile
(./configure --build-type=Release), against a test pcap (I used a ~150 GB
pcap cobbled together from public-domain sources), capturing run times.
(--build-type=Release compiles with -O3 optimization.)
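For reference, the baseline measurement can be sketched as follows (the
pcap path is a placeholder, and the binary name — zeek vs. bro — depends on
the version in use):

```shell
# Default optimized build (no instrumentation).
./configure --build-type=Release
make && make install

# Standalone run against the representative pcap; capture the run time.
# (Path is hypothetical; substitute the actual test pcap.)
time zeek -r /data/test.pcap
```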
Step 2: An instrumented version of zeek is compiled, using:
CFLAGS='--coverage' CXXFLAGS='--coverage' ./configure --build-type=Release
make
make install
- (The --coverage flag to CFLAGS/CXXFLAGS includes -fprofile-arcs, as
  described above, as well as other data-capture options.) The instrumented
  zeek can be run against the same pcap, or, if desired, against live
  network traffic in standalone mode, or both. Running against live traffic
  will exercise the networking code. Multiple runs can be made against
  various sources, so both pcaps and live network traffic can be used, as
  the profiling code will update the profiling files if they already exist
  in the source tree.
- Standalone mode is more convenient, as otherwise cluster nodes on the
  same physical box will overwrite each other’s profile data. This can
  possibly be overcome by passing environment variables to each cluster
  node, which can specify different locations for each node’s profiling
  data. I didn’t go down this road, as:
   - Communication overhead probably dwarfs any potential savings in
     branch prediction.
   - Communication is likely only a small fraction of total zeek CPU
     anyway.
   - Custom code would need to be written to merge profiling data from
     multiple sources.
- When the instrumented zeek is stopped, it writes its profile data into
  the build directory of the source tree as *.gcda files, alongside the
  *.gcno files produced at compile time. It is probably a good idea to make
  a backup copy of the source tree, in case of problems with the following
  steps.
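As an aside on the cluster point above: gcc’s coverage runtime honors the
GCOV_PREFIX and GCOV_PREFIX_STRIP environment variables, so giving each
node its own profile directory could look like this sketch (the node name
and paths are hypothetical, and the per-node *.gcda files would still need
to be merged, e.g. with gcov-tool):

```shell
# Hypothetical per-node profile location for an instrumented cluster node.
# GCOV_PREFIX is prepended to the absolute .gcda output paths;
# GCOV_PREFIX_STRIP first strips that many leading path components.
export GCOV_PREFIX=/var/zeek-profiles/worker-1
export GCOV_PREFIX_STRIP=3
# ...then launch the node process under this environment.
```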
Step 3: Zeek can then be recompiled to take advantage of this profiling
information:
cd bro-2.6.x (top level of zeek source directory)
tar cvf gc.tar `find . -name '*.gc*'` (tarball of the *.gcno & *.gcda files)
make distclean (clear all vestiges of prior build)
CFLAGS='-fprofile-use -fprofile-correction -flto' CXXFLAGS='-fprofile-use
-fprofile-correction -flto' ./configure --build-type=Release
tar xvf gc.tar (restore profiling information into build tree)
make
make install
Run the newly compiled zeek against the initial test pcap & capture run
times.
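The improvement can then be computed from the two captured run times; a
quick sketch (the numbers here are placeholders, not the measured results):

```shell
# Hypothetical wall-clock times in seconds: baseline vs. PGO build.
baseline=1000
pgo=875
awk -v b="$baseline" -v p="$pgo" \
    'BEGIN { printf "speedup: %.1f%%\n", (b - p) / b * 100 }'
# prints "speedup: 12.5%"
```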