Add the USDT documentation and a bpftrace example using the
bridge run USDT probes.

Signed-off-by: Eelco Chaudron <[email protected]>
Acked-by: Paolo Valerio <[email protected]>
---
v2:
 - Fixed path naming
 - Added Paolo's Acked-by

 Documentation/automake.mk             |    1 
 Documentation/topics/index.rst        |    1 
 Documentation/topics/usdt-probes.rst  |  267 +++++++++++++++++++++++++++++++++
 NEWS                                  |    1 
 utilities/automake.mk                 |   13 +-
 utilities/usdt_scripts/bridge_loop.bt |  120 +++++++++++++++
 6 files changed, 397 insertions(+), 6 deletions(-)
 create mode 100644 Documentation/topics/usdt-probes.rst
 create mode 100755 utilities/usdt_scripts/bridge_loop.bt

diff --git a/Documentation/automake.mk b/Documentation/automake.mk
index 137cc57c5..9b9c72cff 100644
--- a/Documentation/automake.mk
+++ b/Documentation/automake.mk
@@ -58,6 +58,7 @@ DOC_SOURCE = \
        Documentation/topics/record-replay.rst \
        Documentation/topics/tracing.rst \
        Documentation/topics/userspace-tso.rst \
+       Documentation/topics/usdt-probes.rst \
        Documentation/topics/windows.rst \
        Documentation/howto/index.rst \
        Documentation/howto/dpdk.rst \
diff --git a/Documentation/topics/index.rst b/Documentation/topics/index.rst
index d8ccbd757..7ca364340 100644
--- a/Documentation/topics/index.rst
+++ b/Documentation/topics/index.rst
@@ -55,3 +55,4 @@ OVS
    userspace-tso
    idl-compound-indexes
    ovs-extensions
+   usdt-probes
diff --git a/Documentation/topics/usdt-probes.rst 
b/Documentation/topics/usdt-probes.rst
new file mode 100644
index 000000000..345f078dd
--- /dev/null
+++ b/Documentation/topics/usdt-probes.rst
@@ -0,0 +1,267 @@
+..
+      Licensed under the Apache License, Version 2.0 (the "License"); you may
+      not use this file except in compliance with the License. You may obtain
+      a copy of the License at
+
+          http://www.apache.org/licenses/LICENSE-2.0
+
+      Unless required by applicable law or agreed to in writing, software
+      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+      License for the specific language governing permissions and limitations
+      under the License.
+
+      Convention for heading levels in Open vSwitch documentation:
+
+      =======  Heading 0 (reserved for the title in a document)
+      -------  Heading 1
+      ~~~~~~~  Heading 2
+      +++++++  Heading 3
+      '''''''  Heading 4
+
+      Avoid deeper levels because they do not render well.
+
+=============================================
+User Statically-Defined Tracing (USDT) probes
+=============================================
+
+Sometimes it's desired to troubleshoot one of OVS's components in the field.
+One of the techniques used is to add dynamic tracepoints, for example using
+perf_. However, the desired dynamic tracepoint and/or the desired variable,
+might not be available due to compiler optimizations.
+
+In this case, a well-thought-off, static tracepoint could be permanently added,
+so it's always available. For OVS we use the DTrace probe macro's, which have
+little to no overhead when disabled. Various tools exist to enable them. See
+some examples below.
+
+
+Compiling with USDT probes enabled
+----------------------------------
+
+Since USDT probes are compiled out by default, a compile-time option is
+available to include them. To add the probes to the generated code, use the
+following configure option ::
+
+    $ ./configure --enable-usdt-probes
+
+The following line should be seen in the configure output when the above option
+is used ::
+
+    checking whether USDT probes are enabled... yes
+
+
+Listing available probes
+------------------------
+
+There are various ways to display USDT probes available in a userspace
+application. Here we show three examples. All assuming ovs-vswitchd is in the
+search path with USDT probes enabled:
+
+You can use the **perf** tool as follows ::
+
+    $ perf buildid-cache --add $(which ovs-vswitchd)
+    $ perf list | grep sdt_
+      sdt_main:poll_block                                [SDT event]
+      sdt_main:run_start                                 [SDT event]
+
+You can use the bpftrace_ tool ::
+
+    # bpftrace -l "usdt:$(which ovs-vswitchd):*"
+    usdt:/usr/sbin/ovs-vswitchd:main:poll_block
+    usdt:/usr/sbin/ovs-vswitchd:main:run_start
+
+NOTE: If you execute this on a running process,
+``bpftrace -lp $(pidof ovs-vswitchd) "usdt:*"`` , it will list all USDT events,
+i.e., also the ones available in the used shared libraries.
+
+Finally, you can use the **tplist** tool which is part of the bcc_ framework ::
+
+    $ /usr/share/bcc/tools/tplist -vv -l $(which ovs-vswitchd)
+    b'main':b'poll_block' [sema 0x0]
+      location #1 b'/usr/sbin/ovs-vswitchd' 0x407fdc
+    b'main':b'run_start' [sema 0x0]
+      location #1 b'/usr/sbin/ovs-vswitchd' 0x407ff6
+
+
+Using probes
+------------
+
+We will use the OVS sandbox environment in combination with the probes shown
+above to try out some of the available trace tools. To start up the virtual
+environment use the ``make sandbox`` command. In addition we have to create
+a bridge to kick of the main bridge loop ::
+
+    $ ovs-vsctl add-br test_bridge
+    $ ovs-vsctl show
+    055acdca-2f0c-4f6e-b542-f4b6d2c44e08
+        Bridge test_bridge
+            Port test_bridge
+                Interface test_bridge
+                    type: internal
+
+perf
+~~~~
+
+Perf is using Linux uprobe based event tracing to for capturing the events.
+To enable the main:\* probes as displayed above and take an actual trace, you
+need to execute the following sequence of perf commands ::
+
+    # perf buildid-cache --add $(which ovs-vswitchd)
+
+    # perf list | grep sdt_
+      sdt_main:poll_block                                [SDT event]
+      sdt_main:run_start                                 [SDT event]
+
+    # perf probe --add=sdt_main:poll_block --add=sdt_main:run_start
+    Added new events:
+      sdt_main:poll_block  (on %poll_block in /usr/sbin/ovs-vswitchd)
+      sdt_main:run_start   (on %run_start in /usr/sbin/ovs-vswitchd)
+
+    You can now use it in all perf tools, such as:
+
+      perf record -e sdt_main:run_start -aR sleep 1
+
+    # perf record -e sdt_main:run_start -e sdt_main:poll_block \
+        -p $(pidof ovs-vswitchd) sleep 30
+    [ perf record: Woken up 1 times to write data ]
+    [ perf record: Captured and wrote 0.039 MB perf.data (132 samples) ]
+
+    # perf script
+        ovs-vswitchd  8576 [011] 21031.340433:  sdt_main:run_start: (407ff6)
+        ovs-vswitchd  8576 [011] 21031.340516: sdt_main:poll_block: (407fdc)
+        ovs-vswitchd  8576 [011] 21031.841726:  sdt_main:run_start: (407ff6)
+        ovs-vswitchd  8576 [011] 21031.842088: sdt_main:poll_block: (407fdc)
+    ...
+
+Note that the above examples works with the sandbox environment, so make sure
+you execute the above command while in the sandbox shell!
+
+There are a lot more options available with perf, for example, the
+``--call-graph dwarf`` option, which would give you a call graph in the
+``perf script`` output. See the perf documentation for more information.
+
+One other interesting feature is that the perf data can be converted for use
+by the trace visualizer `Trace Compass`_. This can be done using the
+``--all --to-ctf`` option to the ``perf data convert`` tool.
+
+
+bpftrace
+~~~~~~~~
+
+bpftrace is a high-level tracing language based on eBPF, which can be used to
+script USDT probes. Here we will show a simple one-liner to display the
+USDT probes being hit. However, the script section below reference some more
+advanced bpftrace scripts.
+
+This is a simple bpftrace one-liner to show all ``main:*`` USDT probes ::
+
+    # bpftrace -p $(pidof ovs-vswitchd) -e \
+        'usdt::main:* { printf("%s %u [%u] %u %s\n",
+          comm, pid, cpu, elapsed, probe); }'
+    Attaching 2 probes...
+    ovs-vswitchd 8576 [11] 203833199 usdt:main:run_start
+    ovs-vswitchd 8576 [11] 204086854 usdt:main:poll_block
+    ovs-vswitchd 8576 [11] 221611985 usdt:main:run_start
+    ovs-vswitchd 8576 [11] 221892019 usdt:main:poll_block
+
+
+bcc
+~~~
+
+The BPF Compiler Collection (BCC) is a set of tools and scripts that also use
+eBPF for tracing. The example below uses the ``trace`` tool to show the events
+while they are being generated ::
+
+    # /usr/share/bcc/tools/trace -T -p $(pidof ovs-vswitchd) \
+        'u::main:run_start' 'u::main:poll_block'
+    TIME     PID     TID     COMM            FUNC
+    15:49:06 8576    8576    ovs-vswitchd    main:run_start
+    15:49:06 8576    8576    ovs-vswitchd    main:poll_block
+    15:49:06 8576    8576    ovs-vswitchd    main:run_start
+    15:49:06 8576    8576    ovs-vswitchd    main:poll_block
+    ^C
+
+
+Scripts
+-------
+To not have to re-invent the wheel when trying to debug complex OVS issues, a
+set of scripts are provided in the source repository. They are located in the
+``utilities/usdt_scripts/`` directory, and each script contains detailed
+information on how they should be used, and what information they provide.
+
+
+Available probes
+----------------
+The next sections describes all the available probes, their use case, and if
+used in any script, which one. Any new probes being added to OVS should get
+their own section. See the below "Adding your own probes" section for the
+used naming convention.
+
+Available probes in ``ovs_vswitchd``:
+
+- main:poll_block
+- main:run_start
+
+
+probe main:run_start
+~~~~~~~~~~~~~~~~~~~~
+
+**Description**:
+The ovs-vswitchd's main process contains a loop that runs every time some work
+needs to be done. This probe gets triggered every time the loop starts from the
+beginning. See also the ``main:poll_block`` probe below.
+
+**Arguments**:
+
+*None*
+
+**Script references**:
+
+- ``utilities/usdt_scripts/bridge_loop.bt``
+
+
+probe main:poll_block
+~~~~~~~~~~~~~~~~~~~~~
+
+**Description**:
+The ovs-vswitchd's main process contains a loop that runs every time some work
+needs to be done. This probe gets triggered every time the loop is done, and
+it's about to wait for being re-started by a poll_block() call returning.
+See also the ``main:run_start`` probe above.
+
+**Arguments**:
+
+*None*
+
+**Script references**:
+
+- ``utilities/usdt_scripts/bridge_loop.bt``
+
+
+Adding your own probes
+----------------------
+
+Adding your own probes is as simple as adding the ``OVS_USDT_PROBE()`` macro
+to the code. It's similar to the ``DTRACE_PROBExx`` macro's with the difference
+that it does automatically determine the number of optional arguments.
+
+The macro requires at least two arguments. The first one being the *provider*,
+and the second one being the *name*. To keep some consistency with the probe
+naming, please use the following convention. The *provider* should be the
+function name, and the *name* should be the name of the tracepoint. If you do
+function entry and exit like probes, please use ``entry`` and ``exit``.
+
+If, for some reason, you do not like to use the function name as a *provider*,
+please prefix it with ``__``, so we know it's not a function name.
+
+The remaining parameters, up to 10, can be variables, pointers, etc., that
+might be of interest to capture at this point in the code. Note that the
+provided variables can cause the compiler to be less effective in optimizing.
+
+
+
+.. _perf : 
https://developers.redhat.com/blog/2020/05/29/debugging-vhost-user-tx-contention-in-open-vswitch#
+.. _bpftrace : https://github.com/iovisor/bpftrace
+.. _bcc : https://github.com/iovisor/bcc
+.. _Trace Compass : https://www.eclipse.org/tracecompass/
diff --git a/NEWS b/NEWS
index 2a4856c1a..903fd2dc1 100644
--- a/NEWS
+++ b/NEWS
@@ -23,6 +23,7 @@ Post-v2.16.0
      * Default selection method for select groups with up to 256 buckets is
        now dp_hash.  Previously this was limited to 64 buckets.  This change
        is mainly for the benefit of OVN load balancing configurations.
+   - Add User Statically-Defined Tracing (USDT) probe framework support.
 
 
 v2.16.0 - 16 Aug 2021
diff --git a/utilities/automake.mk b/utilities/automake.mk
index e2e22c39a..55c7b0022 100644
--- a/utilities/automake.mk
+++ b/utilities/automake.mk
@@ -56,12 +56,13 @@ EXTRA_DIST += \
        utilities/ovs-vlan-test.in \
        utilities/ovs-vsctl-bashcomp.bash \
        utilities/checkpatch.py \
-        utilities/docker/Makefile \
-        utilities/docker/ovs-override.conf \
-        utilities/docker/start-ovs \
-        utilities/docker/create_ovs_db.sh \
-        utilities/docker/debian/Dockerfile \
-        utilities/docker/debian/build-kernel-modules.sh
+       utilities/docker/Makefile \
+       utilities/docker/ovs-override.conf \
+       utilities/docker/start-ovs \
+       utilities/docker/create_ovs_db.sh \
+       utilities/docker/debian/Dockerfile \
+       utilities/docker/debian/build-kernel-modules.sh \
+       utilities/usdt_scripts/bridge_loop.bt
 MAN_ROOTS += \
        utilities/ovs-testcontroller.8.in \
        utilities/ovs-dpctl.8.in \
diff --git a/utilities/usdt_scripts/bridge_loop.bt 
b/utilities/usdt_scripts/bridge_loop.bt
new file mode 100755
index 000000000..011626947
--- /dev/null
+++ b/utilities/usdt_scripts/bridge_loop.bt
@@ -0,0 +1,120 @@
+#!/usr/bin/env bpftrace
+/*
+ * Copyright (c) 2021 Red Hat, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ *
+ * Script information:
+ * -------------------
+ * bridge_loop.bt uses the USDTs in the ovs-vswitchd's main bridge loop
+ * to report how long it spends running a single loop, as well as the time
+ * it spends waiting in the poll_loop(). Once done, it will also print a
+ * histogram for the time spent.
+ *
+ * The following is an example of how to use the script on the running
+ * ovs-vswitchd process:
+ *
+ *   $ bridge_loop.bt -p `pgrep -n ovs-vswitchd`
+ *     Attaching 4 probes...
+ *     --------------------------------------------------------------
+ *     Tracing ovs-vswitchd's main() loop... Hit Ctrl-C to end.
+ *     --------------------------------------------------------------
+ *     - [886467@ovs-vswitchd] bridge run loop time  : 0:00:00.000230706
+ *     - [886467@ovs-vswitchd] poll_block() wait time: 0:00:00.501854292
+ *     - [886467@ovs-vswitchd] bridge run loop time  : 0:00:00.000266445
+ *     - [886467@ovs-vswitchd] poll_block() wait time: 0:00:00.499750288
+ *     - [886467@ovs-vswitchd] bridge run loop time  : 0:00:00.000254856
+ *     - [886467@ovs-vswitchd] poll_block() wait time: 0:00:00.499944280
+ *     - [886467@ovs-vswitchd] bridge run loop time  : 0:00:00.000267390
+ *     - [886467@ovs-vswitchd] poll_block() wait time: 0:00:00.093566288
+ *     - [886467@ovs-vswitchd] bridge run loop time  : 0:00:00.000316774
+ *     - [886467@ovs-vswitchd] poll_block() wait time: 0:00:00.406697754
+ *     - [886467@ovs-vswitchd] bridge run loop time  : 0:00:00.000264505
+ *     ^C
+ *     --------------------------------------------------------------
+ *     Showing run time histograms in micro seconds:
+ *     --------------------------------------------------------------
+ *
+ *     @bridge_run_time:
+ *     [0, 1000)              6 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
+ *
+ *
+ *     @poll_block_wait_time:
+ *     [90000, 120000)        1 |@@@@@@@@@@@@@@@@@                         |
+ *     [120000, 150000)       0 |                                          |
+ *     [150000, 180000)       0 |                                          |
+ *     [180000, 210000)       0 |                                          |
+ *     [210000, 240000)       0 |                                          |
+ *     [240000, 270000)       0 |                                          |
+ *     [270000, 300000)       0 |                                          |
+ *     [300000, 330000)       0 |                                          |
+ *     [330000, 360000)       0 |                                          |
+ *     [360000, 390000)       0 |                                          |
+ *     [390000, 420000)       1 |@@@@@@@@@@@@@@@@@                         |
+ *     [420000, 450000)       0 |                                          |
+ *     [450000, 480000)       0 |                                          |
+ *     [480000, 510000)       3 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
+ */
+
+
+BEGIN
+{
+    printf("--------------------------------------------------------------\n");
+    printf("Tracing ovs-vswitchd's main() loop... Hit Ctrl-C to end.\n");
+    printf("--------------------------------------------------------------\n");
+}
+
+
+usdt::main:poll_block
+{
+    @pb_start[tid] = nsecs;
+    if (@rs_start[tid] != 0) {
+        $delta = nsecs - @rs_start[tid];
+        printf("- [%d@%s] bridge run loop time  : %u:%2.2u:%2.2u.%9.9u\n",
+               tid, comm,
+               $delta / 3600 / 1000000000,
+               $delta / 60   / 1000000000 % 60,
+               $delta / 1000000000 % 60,
+               $delta % 1000000000);
+        @bridge_run_time = lhist($delta / 1000, 0, 1000000, 1000);
+    }
+}
+
+
+usdt::main:run_start
+{
+    @rs_start[tid] = nsecs;
+    if (@pb_start[tid] != 0) {
+        $delta = nsecs - @pb_start[tid];
+        printf("- [%d@%s] poll_block() wait time: %u:%2.2u:%2.2u.%9.9u\n",
+               tid, comm,
+               $delta / 3600 / 1000000000,
+               $delta / 60   / 1000000000 % 60,
+               $delta / 1000000000 % 60,
+               $delta % 1000000000);
+        @poll_block_wait_time = lhist($delta / 1000, 0, 30000000, 30000);
+    }
+}
+
+
+END
+{
+    clear(@rs_start);
+    clear(@pb_start);
+
+    printf("\n");
+    printf("--------------------------------------------------------------\n");
+    printf("Showing run time histograms in micro seconds:\n");
+    printf("--------------------------------------------------------------");
+}

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to