On 13 Dec 2024, at 13:56, Adrián Moreno wrote:
> On Mon, Nov 25, 2024 at 03:14:20PM +0100, Eelco Chaudron wrote: >> This patch installs a kernel return probe on ovs_dp_upcall() to record >> all successful and failed calls per CPU. The statistics are reported >> when the script exits, providing insights into the upcall behavior. >> >> This is an example output: >> >> # UPCALL STATISTICS: >> Total upcalls : 183247 [0: 8937, 2: 14464, 4: 10372, 6: 4254, ...] >> Successfull upcalls : 120195 [0: 5945, 2: 6379, 4: 5966, 6: 4254, ...] >> Failed upcalls : 63052 [0: 2992, 2: 8085, 4: 4406, 8: 2275, ...] >> 11, EAGAIN : 63052 [0: 2992, 2: 8085, 4: 4406, 8: 2275, ...] >> >> Signed-off-by: Eelco Chaudron <[email protected]> > > Thanks for this patch. Correlating upcall failures to kernel delays is > extremely useful. Some comments below. Thanks Adrian for the review. I’ll send out a v2 in a minute, but see my responses below. //Eelco >> --- >> utilities/automake.mk | 1 >> utilities/usdt-scripts/kernel_delay.py | 127 >> +++++++++++++++++++++++++++++++ >> utilities/usdt-scripts/kernel_delay.rst | 9 ++ >> 3 files changed, 136 insertions(+), 1 deletion(-) >> >> diff --git a/utilities/automake.mk b/utilities/automake.mk >> index acc1af4e0..22260b254 100644 >> --- a/utilities/automake.mk >> +++ b/utilities/automake.mk >> @@ -149,6 +149,7 @@ FLAKE8_PYFILES += utilities/ovs-pcap.in \ >> utilities/ovs-pipegen.py \ >> utilities/usdt-scripts/dpif_op_nl_monitor.py \ >> utilities/usdt-scripts/flow_reval_monitor.py \ >> + utilities/usdt-scripts/kernel_delay.py \ >> utilities/usdt-scripts/upcall_monitor.py \ >> utilities/usdt-scripts/upcall_cost.py >> >> diff --git a/utilities/usdt-scripts/kernel_delay.py >> b/utilities/usdt-scripts/kernel_delay.py >> index de6b0c9de..a5aa1a3d4 100755 >> --- a/utilities/usdt-scripts/kernel_delay.py >> +++ b/utilities/usdt-scripts/kernel_delay.py >> @@ -39,6 +39,7 @@ >> # >> import argparse >> import datetime >> +import errno >> import os >> import pytz >> import psutil >> @@ -556,6 +557,30 @@ TRACEPOINT_PROBE(irq, softirq_exit) >> data->start_ns = 0; >> return 0; >> } >> + >> + >> +/* >> + * For measuring upcall statistics (per CPU). >> + */ >> +#define MAX_ERROR 255 > > Why this limit? The kernel seems to use 4095 in the PTR_ERR > infrastructure. Although not exposed to user applications, > kernel has some > 255 values. They don't seem networking related for > now but still. > > In order to reduce this arbitrary limit and avoid needless iteration > over an almost empty array, I'd make it a percpu_hash. It supports > "lookup_or_try_init" and "increment" so it's as easy to use from ebpf > and almost identically from python. Good point, I will use a per-CPU hash table instead. >> +BPF_TABLE("percpu_array", uint32_t, uint64_t, upcall_count, MAX_ERROR + 1); >> + >> +#if <INSTALL_OVS_DP_UPCALL_PROBE> >> +int kretprobe__ovs_dp_upcall(struct pt_regs *ctx) >> +{ > > When trying the script I was a bit confused that numbers didn't add up, > e.g: > > # UPCALL STATISTICS: > Total upcalls : 776234 [0: 204345, 1: 38773, 2: 38206, 3: > 38700, 4: 104040, 5: 77400, 6: 70290, 8: 69705, 9: 67410, 10: 14220, > 11: 53145] > Successfull upcalls : 268640 [0: 72829, 1: 14556, 2: 12876, 3: > 13192, 4: 34227, 5: 26480, 6: 24130, 8: 23791, 9: 23372, 10: 6164, 11: > 17023] > Failed upcalls : 508564 [0: 131516, 1: 25187, 2: 25330, 3: > 25508, 4: 69813, 5: 50920, 6: 46160, 8: 45914, 9: 44038, 10: 8056, 11: > 36122] > 11, EAGAIN : 509485 [0: 131516, 1: 26112, 2: 25330, 3: > 25508, 4: 69813, 5: 50920, 6: 46160, 8: 45914, 9: 44038, 10: 8056, 11: > 36122] > > Total of EAGAIN erros != total Failed and > Total Failed + Total Successfull != Total. > > I believe the reason is that events of this kind are still being > triggered while results are being processed. The problem seems to go > away if I add the following lines at the top of this probe: > > if (!capture_enabled__()) > return 0; ACK, I will fix this, including resetting the values between runs… >> + int ret = PT_REGS_RC(ctx); >> + >> + if (ret > 0) >> + ret = 0; >> + >> + ret = -ret; >> + if (ret > MAX_ERROR) >> + ret = MAX_ERROR; >> + >> + upcall_count.increment(ret); >> + return 0; >> +} >> +#endif /* For measuring upcall statistics/errors. */ >> """ >> >> >> @@ -900,6 +925,95 @@ def print_timestamp(msg): >> print(time_string) >> >> >> +# >> +# Get errno short string >> +# >> +def get_errno_short(err): >> + try: >> + return errno.errorcode[err] >> + except KeyError: >> + return "_unknown_" >> + >> + >> +# >> +# Format a bfd per cpu array entry per cpu (if the count is > 0) >> +# > > s/bfd/bpf? > Also, "per cpu" twice makes the sentense a bit difficult to read. Ack > >> +def format_per_cpu_array(cpu_array, key=None, skip_key=None): >> + per_cpu = "" >> + >> + if key is not None: > > "is not None" is redundant: "if key:" is equivalent. This and the one below are not the same for my code; ‘if key:’ behaves incorrectly: # UPCALL STATISTICS (TOTAL [CPU_ID: UPCALL_PER_CPU, ...]): Total upcalls : 182611 [0: 14506, 2: 13830, 4: 10348, 6: 8752, 8: 8930,...] Successfull upcalls : 182611 [0: 14506, 2: 13830, 4: 10348, 6: 8752, 8: 8930,...] Failed upcalls : 182611 [0: 14506, 2: 13830, 4: 10348, 6: 8752, 8: 8930,...] 11, EAGAIN : 59526 [0: 7701, 2: 7580, 4: 4443, 6: 2745, 8: 3013,...] vs ‘if key is not None’: # UPCALL STATISTICS (TOTAL [CPU_ID: UPCALL_PER_CPU, ...]): Total upcalls : 209023 [0: 16271, 2: 14830, 4: 11787, 6: 10025, 8: 4875,...] Successfull upcalls : 127677 [0: 6342, 2: 6250, 4: 5547, 6: 5464, 8: 4875,...] Failed upcalls : 81346 [0: 9929, 2: 8580, 4: 6240, 6: 4561, 10: 4602,...] 11, EAGAIN : 81346 [0: 9929, 2: 8580, 4: 6240, 6: 4561, 10: 4602,...] >> + total = cpu_array.sum(key).value >> + if total > 0: >> + for cpu, value in enumerate(cpu_array.getvalue(key)): >> + if value == 0: >> + continue >> + >> + per_cpu += " {}: {},".format(cpu, value) >> + else: >> + total = 0 >> + total_cpu = None >> + >> + for key in cpu_array.keys(): > > See below. > >> + # I've seen extrem odd hight values, to avoid those, exclude any > > typo? "extremely odd high values"? > >> + # index higher thant 1024 for now. > > What values you were getting? I'm not seing any weird value. > Besides, isn't the maximum set in the ebpf side to 255? > Plus, it's an array so its size is limitted and no key can exceed 256 > right? This was a while back, I cant remember if it was a specific kernel or BCC version. I would get indexes that look like integer overlows. However with the hash implementation this is not longer requered. >> + if (not (0 <= key.value <= 1024)) or (skip_key is not None and >> + skip_key.value == >> key.value): > > ditto: key.value in range(0, 1024) > s/skip_key is not None/skip_key/? > > OTOH, from include/linux/err.h: > #define MAX_ERRNO 4095 > > Maybe that value makes more sense although I really don't think we need > a check here. See above >> + continue > >> + >> + if total_cpu is None: > > if not total_cpu >> + total_cpu = [0] * len(cpu_array.getvalue(key)) >> + >> + for cpu, value in enumerate(cpu_array.getvalue(key)): >> + total_cpu[cpu] += value >> + total += value >> + >> + if total >= 0: >> + for cpu, value in enumerate(total_cpu): >> + if value == 0: >> + continue >> + >> + per_cpu += " {}: {},".format(cpu, value) >> + >> + return str(total), per_cpu.strip(", ") >> + >> + >> +# >> +# Display kernel upcall statistics >> +# >> +def display_upcall_results(): >> + upcalls = bpf.get_table("upcall_count") > > From bcc's reference: get_table() > Returns a table object. This is no longer used, as tables can now be > read as items from BPF. Eg: ```BPF[name]```. > > Not sure if this means the API will be deprecated... Changed it, let not wait for the deprecation notice ;) >> + have_upcalls = False >> + >> + for k in upcalls.keys(): > > Interestingly, "keys()" is not documented as an API. Nevertheless, it's > not necessary, since "table" iterates over keys (same as list or dict in > python). So it could be written as "for k in upcalls:". Good catch; changed here and below. >> + if not (0 <= k.value <= 255) or upcalls.sum(k).value == 0: > > I think the pythonic way is "if k.value not in range(0, 255). Still how > would it be possible to read keys greater than 255 if it's being limited > at the ebpf side? See above. >> + continue >> + have_upcalls = True >> + break >> + >> + if not have_upcalls: >> + return >> + >> + print("\n\n# UPCALL STATISTICS:\n Total upcalls : {} >> [{}]".format( >> + *format_per_cpu_array(upcalls))) >> + > > When looking at theese statistics, users may take a while to figure out > what each number means, e.g: > > # UPCALL STATISTICS: > Total upcalls : 240933 [0: 33165, 2: 27495, 3: 18270, 4: > 15300, 5: 33885, 6: 4050, 8: 51348, 9: 33075, 10: 24345] > Successfull upcalls : 80538 [0: 11804, 2: 8382, 3: 5660, 4: 5716, 5: > 11606, 6: 2845, 8: 13717, 9: 10333, 10: 10475] > Failed upcalls : 161587 [0: 21361, 2: 19113, 3: 12610, 4: 9584, > 5: 22279, 6: 2397, 8: 37631, 9: 22742, 10: 13870] > 11, EAGAIN : 162528 [0: 21361, 2: 19113, 3: 12610, 4: 9584, > 5: 22279, 6: 3345, 8: 37631, 9: 22742, 10: 1387 > > An option could be to describe the format in the header, e.g: > > # UPCALL STATISTICS (TOTAL [CPU_ID: UPCALL_PER_CPU, ...]) > Total upcalls : 240933 [0: 33165, 2: 27495, 3: 18270, 4: > 15300, 5: 33885, 6: 4050, 8: 51348, 9: 33075, 10: 24345] > Successfull upcalls : 80538 [0: 11804, 2: 8382, 3: 5660, 4: 5716, 5: > 11606, 6: 2845, 8: 13717, 9: 10333, 10: 10475] > Failed upcalls : 161587 [0: 21361, 2: 19113, 3: 12610, 4: 9584, > 5: 22279, 6: 2397, 8: 37631, 9: 22742, 10: 13870] > 11, EAGAIN : 162528 [0: 21361, 2: 19113, 3: 12610, 4: 9584, > 5: 22279, 6: 3345, 8: 37631, 9: 22742, 10: 1387 I went for the above, as the CPU set might not be the same in all lines, and I would like to condense this as much as possible (I was testing on 64 CPUs ;). > Another option could be to print them in a table: > > Result TOTAL CPU 0 CPU 2 CPU 3 CPU 4 CPU 5 > CPU 6 CPU 8 CPU 9 CPU 10 > -------------- --------- ------- ------- -------- ------- ------- > ------- ------- -------- -------- > TOTAL 240933 33165 27495 18270 15300 33885 > 4050 51348 33075 24345 > Successful 80538 11804 8382 5660 5716 11606 > 2845 13717 10333 10475 > Failed 161587 21361 19113 12610 9584 22279 > 2397 37631 22742 13870 > 11, EAGAIN 162528 21361 19113 12610 9584 22279 > 3345 37631 22742 1387 > > > > >> + for k in upcalls.keys(): > > nit: Same as above, "for k in upcalls:" Ack >> + error = k.value >> + total, per_cpu = format_per_cpu_array(upcalls, key=k) >> + >> + if error != 0 and total == "0": > > Maybe return the total as integer to avoid this strange looking '== "0"'? Ack, yes was too much C-minded here always returning a string… >> + continue >> + >> + if error == 0: >> + print(" Successfull upcalls : {} [{}]".format(total, per_cpu)) >> + >> + total, per_cpu = format_per_cpu_array(upcalls, skip_key=k) >> + print(" Failed upcalls : {} [{}]".format(total, per_cpu)) >> + else: >> + print(" {:3}, {:13}: {} [{}]".format(error, >> + get_errno_short(error), >> + total, per_cpu)) >> + >> + >> # >> # process_results() >> # >> @@ -1074,7 +1188,12 @@ def process_results(syscall_events=None, >> trigger_delta=None): >> indent, "TOTAL:", "", total_count, total_ns)) >> >> # >> - # Print events >> + # Print upcall statistics >> + # >> + display_upcall_results() >> + >> + # >> + # Print syscall events >> # >> lost_stack_traces = 0 >> if syscall_events: >> @@ -1194,6 +1313,9 @@ def main(): >> parser.add_argument("--skip-syscall-poll-events", >> help="Skip poll() syscalls with --syscall-events", >> action="store_true") >> + parser.add_argument("--skip-upcall-stats", >> + help="Skip the collection of upcall statistics", >> + action="store_true") >> parser.add_argument("--stack-trace-size", >> help="Number of unique stack traces that can be " >> "recorded, default 4096. 0 to disable", >> @@ -1298,6 +1420,9 @@ def main(): >> source = source.replace("<STACK_TRACE_ENABLED>", "true" >> if options.stack_trace_size > 0 else "false") >> >> + source = source.replace("<INSTALL_OVS_DP_UPCALL_PROBE>", "0" >> + if options.skip_upcall_stats else "1") >> + >> # >> # Handle start/stop probes >> # >> diff --git a/utilities/usdt-scripts/kernel_delay.rst >> b/utilities/usdt-scripts/kernel_delay.rst >> index 0f6f916a7..41620d760 100644 >> --- a/utilities/usdt-scripts/kernel_delay.rst >> +++ b/utilities/usdt-scripts/kernel_delay.rst >> @@ -109,6 +109,7 @@ followed by resource-specific data. Which are: >> - ``THREAD STOPPED STATISTICS`` >> - ``HARD IRQ STATISTICS`` >> - ``SOFT IRQ STATISTICS`` >> +- ``UPCALL STATISTICS`` >> >> The following sections will describe in detail what statistics they report. >> >> @@ -187,6 +188,14 @@ number of interrupts (``COUNT``), the total time spent >> in the interrupt >> handler (``TOTAL ns``), and the worst-case duration (``MAX ns``). >> >> >> +``UPCALL STATISTICS`` >> +~~~~~~~~~~~~~~~~~~~~~ >> +The ``UPCALL STATISTICS`` section provides information on the number of >> +upcalls sent by the kernel to userspace. If any upcalls fail to be sent, >> +the specific return codes are recorded. Statistics are presented both as >> +a total and on a per-CPU basis. >> + >> + >> The ``--syscall-events`` option >> ------------------------------- >> In addition to reporting global syscall statistics in >> ``SYSCALL_STATISTICS``, >> >> _______________________________________________ >> dev mailing list >> [email protected] >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
