On Mon, Nov 25, 2024 at 03:14:20PM +0100, Eelco Chaudron wrote:
> This patch installs a kernel return probe on ovs_dp_upcall() to record
> all successful and failed calls per CPU. The statistics are reported
> when the script exits, providing insights into the upcall behavior.
>
> This is an example output:
>
> # UPCALL STATISTICS:
> Total upcalls : 183247 [0: 8937, 2: 14464, 4: 10372, 6: 4254, ...]
> Successfull upcalls : 120195 [0: 5945, 2: 6379, 4: 5966, 6: 4254, ...]
> Failed upcalls : 63052 [0: 2992, 2: 8085, 4: 4406, 8: 2275, ...]
> 11, EAGAIN : 63052 [0: 2992, 2: 8085, 4: 4406, 8: 2275, ...]
>
> Signed-off-by: Eelco Chaudron <[email protected]>
Thanks for this patch. Correlating upcall failures to kernel delays is
extremely useful. Some comments below.
> ---
> utilities/automake.mk | 1
> utilities/usdt-scripts/kernel_delay.py | 127
> +++++++++++++++++++++++++++++++
> utilities/usdt-scripts/kernel_delay.rst | 9 ++
> 3 files changed, 136 insertions(+), 1 deletion(-)
>
> diff --git a/utilities/automake.mk b/utilities/automake.mk
> index acc1af4e0..22260b254 100644
> --- a/utilities/automake.mk
> +++ b/utilities/automake.mk
> @@ -149,6 +149,7 @@ FLAKE8_PYFILES += utilities/ovs-pcap.in \
> utilities/ovs-pipegen.py \
> utilities/usdt-scripts/dpif_op_nl_monitor.py \
> utilities/usdt-scripts/flow_reval_monitor.py \
> + utilities/usdt-scripts/kernel_delay.py \
> utilities/usdt-scripts/upcall_monitor.py \
> utilities/usdt-scripts/upcall_cost.py
>
> diff --git a/utilities/usdt-scripts/kernel_delay.py
> b/utilities/usdt-scripts/kernel_delay.py
> index de6b0c9de..a5aa1a3d4 100755
> --- a/utilities/usdt-scripts/kernel_delay.py
> +++ b/utilities/usdt-scripts/kernel_delay.py
> @@ -39,6 +39,7 @@
> #
> import argparse
> import datetime
> +import errno
> import os
> import pytz
> import psutil
> @@ -556,6 +557,30 @@ TRACEPOINT_PROBE(irq, softirq_exit)
> data->start_ns = 0;
> return 0;
> }
> +
> +
> +/*
> + * For measuring upcall statistics (per CPU).
> + */
> +#define MAX_ERROR 255
Why this limit? The kernel seems to use 4095 in the PTR_ERR
infrastructure. Although not exposed to user applications,
kernel has some > 255 values. They don't seem networking related for
now but still.
In order to reduce this arbitrary limit and avoid needless iteration
over an almost empty array, I'd make it a percpu_hash. It supports
"lookup_or_try_init" and "increment" so it's as easy to use from ebpf
and almost identically from python.
> +BPF_TABLE("percpu_array", uint32_t, uint64_t, upcall_count, MAX_ERROR + 1);
> +
> +#if <INSTALL_OVS_DP_UPCALL_PROBE>
> +int kretprobe__ovs_dp_upcall(struct pt_regs *ctx)
> +{
When trying the script I was a bit confused that numbers didn't add up,
e.g:
# UPCALL STATISTICS:
Total upcalls : 776234 [0: 204345, 1: 38773, 2: 38206, 3:
38700, 4: 104040, 5: 77400, 6: 70290, 8: 69705, 9: 67410, 10: 14220,
11: 53145]
Successfull upcalls : 268640 [0: 72829, 1: 14556, 2: 12876, 3:
13192, 4: 34227, 5: 26480, 6: 24130, 8: 23791, 9: 23372, 10: 6164, 11:
17023]
Failed upcalls : 508564 [0: 131516, 1: 25187, 2: 25330, 3:
25508, 4: 69813, 5: 50920, 6: 46160, 8: 45914, 9: 44038, 10: 8056, 11:
36122]
11, EAGAIN : 509485 [0: 131516, 1: 26112, 2: 25330, 3:
25508, 4: 69813, 5: 50920, 6: 46160, 8: 45914, 9: 44038, 10: 8056, 11:
36122]
Total of EAGAIN erros != total Failed and
Total Failed + Total Successfull != Total.
I believe the reason is that events of this kind are still being
triggered while results are being processed. The problem seems to go
away if I add the following lines at the top of this probe:
if (!capture_enabled__())
return 0;
> + int ret = PT_REGS_RC(ctx);
> +
> + if (ret > 0)
> + ret = 0;
> +
> + ret = -ret;
> + if (ret > MAX_ERROR)
> + ret = MAX_ERROR;
> +
> + upcall_count.increment(ret);
> + return 0;
> +}
> +#endif /* For measuring upcall statistics/errors. */
> """
>
>
> @@ -900,6 +925,95 @@ def print_timestamp(msg):
> print(time_string)
>
>
> +#
> +# Get errno short string
> +#
> +def get_errno_short(err):
> + try:
> + return errno.errorcode[err]
> + except KeyError:
> + return "_unknown_"
> +
> +
> +#
> +# Format a bfd per cpu array entry per cpu (if the count is > 0)
> +#
s/bfd/bpf?
Also, "per cpu" twice makes the sentense a bit difficult to read.
> +def format_per_cpu_array(cpu_array, key=None, skip_key=None):
> + per_cpu = ""
> +
> + if key is not None:
"is not None" is redundant: "if key:" is equivalent.
> + total = cpu_array.sum(key).value
> + if total > 0:
> + for cpu, value in enumerate(cpu_array.getvalue(key)):
> + if value == 0:
> + continue
> +
> + per_cpu += " {}: {},".format(cpu, value)
> + else:
> + total = 0
> + total_cpu = None
> +
> + for key in cpu_array.keys():
See below.
> + # I've seen extrem odd hight values, to avoid those, exclude any
typo? "extremely odd high values"?
> + # index higher thant 1024 for now.
What values you were getting? I'm not seing any weird value.
Besides, isn't the maximum set in the ebpf side to 255?
Plus, it's an array so its size is limitted and no key can exceed 256
right?
> + if (not (0 <= key.value <= 1024)) or (skip_key is not None and
> + skip_key.value ==
> key.value):
ditto: key.value in range(0, 1024)
s/skip_key is not None/skip_key/?
OTOH, from include/linux/err.h:
#define MAX_ERRNO 4095
Maybe that value makes more sense although I really don't think we need
a check here.
> + continue
> +
> + if total_cpu is None:
if not total_cpu
> + total_cpu = [0] * len(cpu_array.getvalue(key))
> +
> + for cpu, value in enumerate(cpu_array.getvalue(key)):
> + total_cpu[cpu] += value
> + total += value
> +
> + if total >= 0:
> + for cpu, value in enumerate(total_cpu):
> + if value == 0:
> + continue
> +
> + per_cpu += " {}: {},".format(cpu, value)
> +
> + return str(total), per_cpu.strip(", ")
> +
> +
> +#
> +# Display kernel upcall statistics
> +#
> +def display_upcall_results():
> + upcalls = bpf.get_table("upcall_count")
>From bcc's reference: get_table()
Returns a table object. This is no longer used, as tables can now be
read as items from BPF. Eg: ```BPF[name]```.
Not sure if this means the API will be deprecated...
> + have_upcalls = False
> +
> + for k in upcalls.keys():
Interestingly, "keys()" is not documented as an API. Nevertheless, it's
not necessary, since "table" iterates over keys (same as list or dict in
python). So it could be written as "for k in upcalls:".
> + if not (0 <= k.value <= 255) or upcalls.sum(k).value == 0:
I think the pythonic way is "if k.value not in range(0, 255). Still how
would it be possible to read keys greater than 255 if it's being limited
at the ebpf side?
> + continue
> + have_upcalls = True
> + break
> +
> + if not have_upcalls:
> + return
> +
> + print("\n\n# UPCALL STATISTICS:\n Total upcalls : {} [{}]".format(
> + *format_per_cpu_array(upcalls)))
> +
When looking at theese statistics, users may take a while to figure out
what each number means, e.g:
# UPCALL STATISTICS:
Total upcalls : 240933 [0: 33165, 2: 27495, 3: 18270, 4:
15300, 5: 33885, 6: 4050, 8: 51348, 9: 33075, 10: 24345]
Successfull upcalls : 80538 [0: 11804, 2: 8382, 3: 5660, 4: 5716, 5:
11606, 6: 2845, 8: 13717, 9: 10333, 10: 10475]
Failed upcalls : 161587 [0: 21361, 2: 19113, 3: 12610, 4: 9584,
5: 22279, 6: 2397, 8: 37631, 9: 22742, 10: 13870]
11, EAGAIN : 162528 [0: 21361, 2: 19113, 3: 12610, 4: 9584,
5: 22279, 6: 3345, 8: 37631, 9: 22742, 10: 1387
An option could be to describe the format in the header, e.g:
# UPCALL STATISTICS (TOTAL [CPU_ID: UPCALL_PER_CPU, ...])
Total upcalls : 240933 [0: 33165, 2: 27495, 3: 18270, 4:
15300, 5: 33885, 6: 4050, 8: 51348, 9: 33075, 10: 24345]
Successfull upcalls : 80538 [0: 11804, 2: 8382, 3: 5660, 4: 5716, 5:
11606, 6: 2845, 8: 13717, 9: 10333, 10: 10475]
Failed upcalls : 161587 [0: 21361, 2: 19113, 3: 12610, 4: 9584,
5: 22279, 6: 2397, 8: 37631, 9: 22742, 10: 13870]
11, EAGAIN : 162528 [0: 21361, 2: 19113, 3: 12610, 4: 9584,
5: 22279, 6: 3345, 8: 37631, 9: 22742, 10: 1387
Another option could be to print them in a table:
Result TOTAL CPU 0 CPU 2 CPU 3 CPU 4 CPU 5
CPU 6 CPU 8 CPU 9 CPU 10
-------------- --------- ------- ------- -------- ------- -------
------- ------- -------- --------
TOTAL 240933 33165 27495 18270 15300 33885
4050 51348 33075 24345
Successful 80538 11804 8382 5660 5716 11606
2845 13717 10333 10475
Failed 161587 21361 19113 12610 9584 22279
2397 37631 22742 13870
11, EAGAIN 162528 21361 19113 12610 9584 22279
3345 37631 22742 1387
> + for k in upcalls.keys():
nit: Same as above, "for k in upcalls:"
> + error = k.value
> + total, per_cpu = format_per_cpu_array(upcalls, key=k)
> +
> + if error != 0 and total == "0":
Maybe return the total as integer to avoid this strange looking '== "0"'?
> + continue
> +
> + if error == 0:
> + print(" Successfull upcalls : {} [{}]".format(total, per_cpu))
> +
> + total, per_cpu = format_per_cpu_array(upcalls, skip_key=k)
> + print(" Failed upcalls : {} [{}]".format(total, per_cpu))
> + else:
> + print(" {:3}, {:13}: {} [{}]".format(error,
> + get_errno_short(error),
> + total, per_cpu))
> +
> +
> #
> # process_results()
> #
> @@ -1074,7 +1188,12 @@ def process_results(syscall_events=None,
> trigger_delta=None):
> indent, "TOTAL:", "", total_count, total_ns))
>
> #
> - # Print events
> + # Print upcall statistics
> + #
> + display_upcall_results()
> +
> + #
> + # Print syscall events
> #
> lost_stack_traces = 0
> if syscall_events:
> @@ -1194,6 +1313,9 @@ def main():
> parser.add_argument("--skip-syscall-poll-events",
> help="Skip poll() syscalls with --syscall-events",
> action="store_true")
> + parser.add_argument("--skip-upcall-stats",
> + help="Skip the collection of upcall statistics",
> + action="store_true")
> parser.add_argument("--stack-trace-size",
> help="Number of unique stack traces that can be "
> "recorded, default 4096. 0 to disable",
> @@ -1298,6 +1420,9 @@ def main():
> source = source.replace("<STACK_TRACE_ENABLED>", "true"
> if options.stack_trace_size > 0 else "false")
>
> + source = source.replace("<INSTALL_OVS_DP_UPCALL_PROBE>", "0"
> + if options.skip_upcall_stats else "1")
> +
> #
> # Handle start/stop probes
> #
> diff --git a/utilities/usdt-scripts/kernel_delay.rst
> b/utilities/usdt-scripts/kernel_delay.rst
> index 0f6f916a7..41620d760 100644
> --- a/utilities/usdt-scripts/kernel_delay.rst
> +++ b/utilities/usdt-scripts/kernel_delay.rst
> @@ -109,6 +109,7 @@ followed by resource-specific data. Which are:
> - ``THREAD STOPPED STATISTICS``
> - ``HARD IRQ STATISTICS``
> - ``SOFT IRQ STATISTICS``
> +- ``UPCALL STATISTICS``
>
> The following sections will describe in detail what statistics they report.
>
> @@ -187,6 +188,14 @@ number of interrupts (``COUNT``), the total time spent
> in the interrupt
> handler (``TOTAL ns``), and the worst-case duration (``MAX ns``).
>
>
> +``UPCALL STATISTICS``
> +~~~~~~~~~~~~~~~~~~~~~
> +The ``UPCALL STATISTICS`` section provides information on the number of
> +upcalls sent by the kernel to userspace. If any upcalls fail to be sent,
> +the specific return codes are recorded. Statistics are presented both as
> +a total and on a per-CPU basis.
> +
> +
> The ``--syscall-events`` option
> -------------------------------
> In addition to reporting global syscall statistics in ``SYSCALL_STATISTICS``,
>
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev