On Tue, Dec 17, 2024 at 03:30:51PM +0100, Eelco Chaudron wrote:
>
>
> On 13 Dec 2024, at 13:56, Adrián Moreno wrote:
>
> > On Mon, Nov 25, 2024 at 03:14:20PM +0100, Eelco Chaudron wrote:
> >> This patch installs a kernel return probe on ovs_dp_upcall() to record
> >> all successful and failed calls per CPU. The statistics are reported
> >> when the script exits, providing insights into the upcall behavior.
> >>
> >> This is an example output:
> >>
> >>   # UPCALL STATISTICS:
> >>     Total upcalls       : 183247 [0: 8937, 2: 14464, 4: 10372, 6: 4254, ...]
> >>     Successfull upcalls : 120195 [0: 5945, 2: 6379, 4: 5966, 6: 4254, ...]
> >>     Failed upcalls      : 63052 [0: 2992, 2: 8085, 4: 4406, 8: 2275, ...]
> >>        11, EAGAIN       : 63052 [0: 2992, 2: 8085, 4: 4406, 8: 2275, ...]
> >>
> >> Signed-off-by: Eelco Chaudron <[email protected]>
> >
> > Thanks for this patch. Correlating upcall failures to kernel delays is
> > extremely useful. Some comments below.
>
> Thanks Adrian for the review. I’ll send out a v2 in a minute, but see my 
> responses below.
>
> //Eelco
>
> >> ---
> >>  utilities/automake.mk                   |    1
> >>  utilities/usdt-scripts/kernel_delay.py  |  127 +++++++++++++++++++++++++++++++
> >>  utilities/usdt-scripts/kernel_delay.rst |    9 ++
> >>  3 files changed, 136 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/utilities/automake.mk b/utilities/automake.mk
> >> index acc1af4e0..22260b254 100644
> >> --- a/utilities/automake.mk
> >> +++ b/utilities/automake.mk
> >> @@ -149,6 +149,7 @@ FLAKE8_PYFILES += utilities/ovs-pcap.in \
> >>    utilities/ovs-pipegen.py \
> >>    utilities/usdt-scripts/dpif_op_nl_monitor.py \
> >>    utilities/usdt-scripts/flow_reval_monitor.py \
> >> +  utilities/usdt-scripts/kernel_delay.py \
> >>    utilities/usdt-scripts/upcall_monitor.py \
> >>    utilities/usdt-scripts/upcall_cost.py
> >>
> >> diff --git a/utilities/usdt-scripts/kernel_delay.py b/utilities/usdt-scripts/kernel_delay.py
> >> index de6b0c9de..a5aa1a3d4 100755
> >> --- a/utilities/usdt-scripts/kernel_delay.py
> >> +++ b/utilities/usdt-scripts/kernel_delay.py
> >> @@ -39,6 +39,7 @@
> >>  #
> >>  import argparse
> >>  import datetime
> >> +import errno
> >>  import os
> >>  import pytz
> >>  import psutil
> >> @@ -556,6 +557,30 @@ TRACEPOINT_PROBE(irq, softirq_exit)
> >>      data->start_ns = 0;
> >>      return 0;
> >>  }
> >> +
> >> +
> >> +/*
> >> + * For measuring upcall statistics (per CPU).
> >> + */
> >> +#define MAX_ERROR 255
> >
> > Why this limit? The kernel seems to use 4095 in the PTR_ERR
> > infrastructure. Although not exposed to user applications,
> > kernel has some > 255 values. They don't seem networking related for
> > now but still.
> >
> > In order to reduce this arbitrary limit and avoid needless iteration
> > over an almost empty array, I'd make it a percpu_hash. It supports
> > "lookup_or_try_init" and "increment" so it's as easy to use from ebpf
> > and almost identically from python.
>
> Good point, I will use a per-CPU hash table instead.
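For reference, the Python-side aggregation the script does over such a per-CPU table can be sketched with plain dicts; bcc reads a per-CPU table back as a mapping from key to a list of per-CPU values, and the counts below are made up for illustration:

```python
# Stand-in for a bcc per-CPU table: errno -> [count_cpu0, count_cpu1, ...].
# All numbers are made up for illustration.
counts = {
    0: [5945, 0, 6379],    # errno 0: successful upcalls per CPU
    11: [2992, 0, 8085],   # errno 11 (EAGAIN): failed upcalls per CPU
}

def totals(table):
    """Return (grand_total, per_cpu_totals) summed over all keys."""
    n_cpus = len(next(iter(table.values())))
    per_cpu = [0] * n_cpus
    total = 0
    for values in table.values():
        for cpu, value in enumerate(values):
            per_cpu[cpu] += value
            total += value
    return total, per_cpu

print(totals(counts))  # (23401, [8937, 0, 14464])
```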
>
> >> +BPF_TABLE("percpu_array", uint32_t, uint64_t, upcall_count, MAX_ERROR + 1);
> >> +
> >> +#if <INSTALL_OVS_DP_UPCALL_PROBE>
> >> +int kretprobe__ovs_dp_upcall(struct pt_regs *ctx)
> >> +{
> >
> > When trying the script I was a bit confused that numbers didn't add up,
> > e.g:
> >
> > # UPCALL STATISTICS:
> >   Total upcalls       : 776234 [0: 204345, 1: 38773, 2: 38206, 3: 38700, 4: 104040, 5: 77400, 6: 70290, 8: 69705, 9: 67410, 10: 14220, 11: 53145]
> >   Successfull upcalls : 268640 [0: 72829, 1: 14556, 2: 12876, 3: 13192, 4: 34227, 5: 26480, 6: 24130, 8: 23791, 9: 23372, 10: 6164, 11: 17023]
> >   Failed upcalls      : 508564 [0: 131516, 1: 25187, 2: 25330, 3: 25508, 4: 69813, 5: 50920, 6: 46160, 8: 45914, 9: 44038, 10: 8056, 11: 36122]
> >      11, EAGAIN       : 509485 [0: 131516, 1: 26112, 2: 25330, 3: 25508, 4: 69813, 5: 50920, 6: 46160, 8: 45914, 9: 44038, 10: 8056, 11: 36122]
> >
> > Total of EAGAIN errors != total Failed, and
> > Total Failed + Total Successfull != Total.
> >
> > I believe the reason is that events of this kind are still being
> > triggered while results are being processed. The problem seems to go
> > away if I add the following lines at the top of this probe:
> >
> > if (!capture_enabled__())
> >     return 0;
>
> ACK, I will fix this, including resetting the values between runs…
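For readers following along, the normalization that the kretprobe (quoted below) applies to ovs_dp_upcall()'s return value can be mirrored in plain Python: success and any positive value count as key 0, negative errnos become positive table keys, clamped to MAX_ERROR:

```python
MAX_ERROR = 255

def normalize_ret(ret):
    """Map a kernel return value to a table key: 0 for success,
    positive errno otherwise, clamped to MAX_ERROR."""
    if ret > 0:      # treat positive returns as success
        ret = 0
    ret = -ret       # -EAGAIN (-11) -> 11
    if ret > MAX_ERROR:
        ret = MAX_ERROR
    return ret

print(normalize_ret(-11))    # 11 (EAGAIN)
print(normalize_ret(0))      # 0
print(normalize_ret(-4095))  # 255 (clamped)
```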
>
> >> +    int ret = PT_REGS_RC(ctx);
> >> +
> >> +    if (ret > 0)
> >> +        ret = 0;
> >> +
> >> +    ret = -ret;
> >> +    if (ret > MAX_ERROR)
> >> +        ret = MAX_ERROR;
> >> +
> >> +    upcall_count.increment(ret);
> >> +    return 0;
> >> +}
> >> +#endif /* For measuring upcall statistics/errors. */
> >>  """
> >>
> >>
> >> @@ -900,6 +925,95 @@ def print_timestamp(msg):
> >>      print(time_string)
> >>
> >>
> >> +#
> >> +# Get errno short string
> >> +#
> >> +def get_errno_short(err):
> >> +    try:
> >> +        return errno.errorcode[err]
> >> +    except KeyError:
> >> +        return "_unknown_"
> >> +
> >> +
> >> +#
> >> +# Format a bfd per cpu array entry per cpu (if the count is > 0)
> >> +#
> >
> > s/bfd/bpf?
> > Also, "per cpu" twice makes the sentence a bit difficult to read.
>
> Ack
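As an aside, the get_errno_short() helper quoted above can be exercised standalone; errno.errorcode maps numeric errno values to their short names, with a KeyError for values the platform does not define:

```python
import errno

def get_errno_short(err):
    """Map a positive errno value to its short name, e.g. 2 -> 'ENOENT'."""
    try:
        return errno.errorcode[err]
    except KeyError:
        return "_unknown_"

print(get_errno_short(errno.ENOENT))  # ENOENT
print(get_errno_short(9999))          # _unknown_
```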
>
> >
> >> +def format_per_cpu_array(cpu_array, key=None, skip_key=None):
> >> +    per_cpu = ""
> >> +
> >> +    if key is not None:
> >
> > "is not None" is redundant: "if key:" is equivalent.
>
> This and the one below are not the same for my code; ‘if key:’ behaves 
> incorrectly:
>

Ugh, you're right. If key is zero the check fails.

> # UPCALL STATISTICS (TOTAL [CPU_ID: UPCALL_PER_CPU, ...]):
>   Total upcalls       : 182611 [0: 14506, 2: 13830, 4: 10348, 6: 8752, 8: 8930,...]
>   Successfull upcalls : 182611 [0: 14506, 2: 13830, 4: 10348, 6: 8752, 8: 8930,...]
>   Failed upcalls      : 182611 [0: 14506, 2: 13830, 4: 10348, 6: 8752, 8: 8930,...]
>      11, EAGAIN       : 59526 [0: 7701, 2: 7580, 4: 4443, 6: 2745, 8: 3013,...]
>
> vs ‘if key is not None’:
>
> # UPCALL STATISTICS (TOTAL [CPU_ID: UPCALL_PER_CPU, ...]):
>   Total upcalls       : 209023 [0: 16271, 2: 14830, 4: 11787, 6: 10025, 8: 4875,...]
>   Successfull upcalls : 127677 [0: 6342, 2: 6250, 4: 5547, 6: 5464, 8: 4875,...]
>   Failed upcalls      : 81346 [0: 9929, 2: 8580, 4: 6240, 6: 4561, 10: 4602,...]
>      11, EAGAIN       : 81346 [0: 9929, 2: 8580, 4: 6240, 6: 4561, 10: 4602,...]
>
>
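A minimal reproduction of the pitfall: a key wrapping the value 0 (errno 0, i.e. success) is falsy, so "if key:" silently takes the wrong branch, while "if key is not None:" does not. Sketched with plain ints standing in for the ctypes key objects:

```python
def branch_if_key(key=None):
    # Buggy variant: treats key 0 (errno 0 == success) as "no key given".
    if key:
        return "specific"
    return "all"

def branch_if_key_is_not_none(key=None):
    # Correct variant: only a missing key selects the "all keys" path.
    if key is not None:
        return "specific"
    return "all"

print(branch_if_key(0))              # all       -- wrong branch for errno 0
print(branch_if_key_is_not_none(0))  # specific
print(branch_if_key_is_not_none())   # all
```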
> >> +        total = cpu_array.sum(key).value
> >> +        if total > 0:
> >> +            for cpu, value in enumerate(cpu_array.getvalue(key)):
> >> +                if value == 0:
> >> +                    continue
> >> +
> >> +                per_cpu += " {}: {},".format(cpu, value)
> >> +    else:
> >> +        total = 0
> >> +        total_cpu = None
> >> +
> >> +        for key in cpu_array.keys():
> >
> > See below.
> >
> >> +            # I've seen extrem odd hight values, to avoid those, exclude any
> >
> > typo? "extremely odd high values"?
> >
> >> +            # index higher thant 1024 for now.
> >
> > What values were you getting? I'm not seeing any weird value.
> > Besides, isn't the maximum set in the ebpf side to 255?
> > Plus, it's an array so its size is limited and no key can exceed 255,
> > right?
>
> This was a while back, I can't remember if it was a specific kernel or BCC
> version. I would get indexes that looked like integer overflows. However,
> with the hash implementation this is no longer required.
>
> >> +            if (not (0 <= key.value <= 1024)) or (skip_key is not None and
> >> +                                                  skip_key.value == key.value):
> >
> > ditto: key.value in range(0, 1024)
> > s/skip_key is not None/skip_key/?
> >
> > OTOH, from include/linux/err.h:
> > #define MAX_ERRNO   4095
> >
> > Maybe that value makes more sense although I really don't think we need
> > a check here.
>
> See above
> >> +                continue
> >
> >> +
> >> +            if total_cpu is None:
> >
> > if not total_cpu
> >> +                total_cpu = [0] * len(cpu_array.getvalue(key))
> >> +
> >> +            for cpu, value in enumerate(cpu_array.getvalue(key)):
> >> +                total_cpu[cpu] += value
> >> +                total += value
> >> +
> >> +        if total >= 0:
> >> +            for cpu, value in enumerate(total_cpu):
> >> +                if value == 0:
> >> +                    continue
> >> +
> >> +                per_cpu += " {}: {},".format(cpu, value)
> >> +
> >> +    return str(total), per_cpu.strip(", ")
> >> +
> >> +
> >> +#
> >> +# Display kernel upcall statistics
> >> +#
> >> +def display_upcall_results():
> >> +    upcalls = bpf.get_table("upcall_count")
> >
> > From bcc's reference: get_table()
> > Returns a table object. This is no longer used, as tables can now be
> > read as items from BPF. Eg: ```BPF[name]```.
> >
> > Not sure if this means the API will be deprecated...
>
> Changed it, let's not wait for the deprecation notice ;)
>
> >> +    have_upcalls = False
> >> +
> >> +    for k in upcalls.keys():
> >
> > Interestingly, "keys()" is not documented as an API. Nevertheless, it's
> > not necessary, since "table" iterates over keys (same as list or dict in
> > python). So it could be written as "for k in upcalls:".
>
> Good catch; changed here and below.
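bcc tables follow Python's mapping protocol, so plain iteration yields keys, the same as it does for a dict; sketched here with a plain dict standing in for the bcc table:

```python
# Stand-in for the bcc table: errno -> total count.
upcalls = {0: 120195, 11: 63052}

# "for k in upcalls.keys()" and "for k in upcalls" visit the same keys:
assert list(upcalls.keys()) == list(upcalls)

for k in upcalls:
    print(k, upcalls[k])
```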
>
> >> +        if not (0 <= k.value <= 255) or upcalls.sum(k).value == 0:
> >
> > I think the pythonic way is "if k.value not in range(0, 255)". Still, how
> > would it be possible to read keys greater than 255 if it's being limited
> > at the ebpf side?
>
> See above.
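One caveat with the range() suggestion: range() excludes its stop value, so range(0, 255) drops errno 255, while the original check "0 <= k.value <= 255" includes it; the equivalent rewrite is range(256). A quick check:

```python
# range() excludes its stop value, so range(0, 255) drops the value 255:
assert 255 not in range(0, 255)
assert 255 in range(256)

def in_bounds(v):
    """Equivalent to the original check: 0 <= v <= 255."""
    return v in range(256)

print(in_bounds(0), in_bounds(255))    # True True
print(in_bounds(-1), in_bounds(256))   # False False
```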
>
> >> +            continue
> >> +        have_upcalls = True
> >> +        break
> >> +
> >> +    if not have_upcalls:
> >> +        return
> >> +
> >> +    print("\n\n# UPCALL STATISTICS:\n  Total upcalls       : {} [{}]".format(
> >> +        *format_per_cpu_array(upcalls)))
> >> +
> >
> > When looking at these statistics, users may take a while to figure out
> > what each number means, e.g:
> >
> > # UPCALL STATISTICS:
> >   Total upcalls       : 240933 [0: 33165, 2: 27495, 3: 18270, 4: 15300, 5: 33885, 6: 4050, 8: 51348, 9: 33075, 10: 24345]
> >   Successfull upcalls : 80538 [0: 11804, 2: 8382, 3: 5660, 4: 5716, 5: 11606, 6: 2845, 8: 13717, 9: 10333, 10: 10475]
> >   Failed upcalls      : 161587 [0: 21361, 2: 19113, 3: 12610, 4: 9584, 5: 22279, 6: 2397, 8: 37631, 9: 22742, 10: 13870]
> >      11, EAGAIN       : 162528 [0: 21361, 2: 19113, 3: 12610, 4: 9584, 5: 22279, 6: 3345, 8: 37631, 9: 22742, 10: 1387
> >
> > An option could be to describe the format in the header, e.g:
> >
> > # UPCALL STATISTICS (TOTAL [CPU_ID: UPCALL_PER_CPU, ...])
> >   Total upcalls       : 240933 [0: 33165, 2: 27495, 3: 18270, 4: 15300, 5: 33885, 6: 4050, 8: 51348, 9: 33075, 10: 24345]
> >   Successfull upcalls : 80538 [0: 11804, 2: 8382, 3: 5660, 4: 5716, 5: 11606, 6: 2845, 8: 13717, 9: 10333, 10: 10475]
> >   Failed upcalls      : 161587 [0: 21361, 2: 19113, 3: 12610, 4: 9584, 5: 22279, 6: 2397, 8: 37631, 9: 22742, 10: 13870]
> >      11, EAGAIN       : 162528 [0: 21361, 2: 19113, 3: 12610, 4: 9584, 5: 22279, 6: 3345, 8: 37631, 9: 22742, 10: 1387
>
> I went for the above, as the CPU set might not be the same in all lines, and 
> I would like to condense this as much as possible (I was testing on 64 CPUs 
> ;).
>
> > Another option could be to print them in a table:
> >
> >      Result        TOTAL   CPU 0   CPU 2   CPU 3   CPU 4   CPU 5   CPU 6   CPU 8   CPU 9   CPU 10
> >  -------------- -------- ------- ------- ------- ------- ------- ------- ------- ------- --------
> >   TOTAL           240933   33165   27495   18270   15300   33885    4050   51348   33075    24345
> >    Successful      80538   11804    8382    5660    5716   11606    2845   13717   10333    10475
> >    Failed         161587   21361   19113   12610    9584   22279    2397   37631   22742    13870
> >     11, EAGAIN    162528   21361   19113   12610    9584   22279    3345   37631   22742     1387
> >
> >
> >
> >
> >> +    for k in upcalls.keys():
> >
> > nit: Same as above, "for k in upcalls:"
>
> Ack
>
> >> +        error = k.value
> >> +        total, per_cpu = format_per_cpu_array(upcalls, key=k)
> >> +
> >> +        if error != 0 and total == "0":
> >
> > Maybe return the total as integer to avoid this strange looking '== "0"'?
>
> Ack, yes, I was too C-minded here, always returning a string…
>
> >> +            continue
> >> +
> >> +        if error == 0:
> >> +            print("  Successfull upcalls : {} [{}]".format(total, per_cpu))
> >> +
> >> +            total, per_cpu = format_per_cpu_array(upcalls, skip_key=k)
> >> +            print("  Failed upcalls      : {} [{}]".format(total, per_cpu))
> >> +        else:
> >> +            print("    {:3}, {:13}: {} [{}]".format(error,
> >> +                                                    get_errno_short(error),
> >> +                                                    total, per_cpu))
> >> +
> >> +
> >>  #
> >>  # process_results()
> >>  #
> >> @@ -1074,7 +1188,12 @@ def process_results(syscall_events=None, trigger_delta=None):
> >>                  indent, "TOTAL:", "", total_count, total_ns))
> >>
> >>      #
> >> -    # Print events
> >> +    # Print upcall statistics
> >> +    #
> >> +    display_upcall_results()
> >> +
> >> +    #
> >> +    # Print syscall events
> >>      #
> >>      lost_stack_traces = 0
> >>      if syscall_events:
> >> @@ -1194,6 +1313,9 @@ def main():
> >>      parser.add_argument("--skip-syscall-poll-events",
> >>                          help="Skip poll() syscalls with --syscall-events",
> >>                          action="store_true")
> >> +    parser.add_argument("--skip-upcall-stats",
> >> +                        help="Skip the collection of upcall statistics",
> >> +                        action="store_true")
> >>      parser.add_argument("--stack-trace-size",
> >>                          help="Number of unique stack traces that can be "
> >>                          "recorded, default 4096. 0 to disable",
> >> @@ -1298,6 +1420,9 @@ def main():
> >>      source = source.replace("<STACK_TRACE_ENABLED>", "true"
> >>                              if options.stack_trace_size > 0 else "false")
> >>
> >> +    source = source.replace("<INSTALL_OVS_DP_UPCALL_PROBE>", "0"
> >> +                            if options.skip_upcall_stats else "1")
> >> +
> >>      #
> >>      # Handle start/stop probes
> >>      #
> >> diff --git a/utilities/usdt-scripts/kernel_delay.rst b/utilities/usdt-scripts/kernel_delay.rst
> >> index 0f6f916a7..41620d760 100644
> >> --- a/utilities/usdt-scripts/kernel_delay.rst
> >> +++ b/utilities/usdt-scripts/kernel_delay.rst
> >> @@ -109,6 +109,7 @@ followed by resource-specific data. Which are:
> >>  - ``THREAD STOPPED STATISTICS``
> >>  - ``HARD IRQ STATISTICS``
> >>  - ``SOFT IRQ STATISTICS``
> >> +- ``UPCALL STATISTICS``
> >>
> >>  The following sections will describe in detail what statistics they 
> >> report.
> >>
> >> @@ -187,6 +188,14 @@ number of interrupts (``COUNT``), the total time spent in the interrupt
> >>  handler (``TOTAL ns``), and the worst-case duration (``MAX ns``).
> >>
> >>
> >> +``UPCALL STATISTICS``
> >> +~~~~~~~~~~~~~~~~~~~~~
> >> +The ``UPCALL STATISTICS`` section provides information on the number of
> >> +upcalls sent by the kernel to userspace. If any upcalls fail to be sent,
> >> +the specific return codes are recorded. Statistics are presented both as
> >> +a total and on a per-CPU basis.
> >> +
> >> +
> >>  The ``--syscall-events`` option
> >>  -------------------------------
> >>  In addition to reporting global syscall statistics in 
> >> ``SYSCALL_STATISTICS``,
> >>
> >> _______________________________________________
> >> dev mailing list
> >> [email protected]
> >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >>
>
