On 20 Feb 2024, at 19:10, Adrian Moreno wrote:

> On 2/20/24 13:43, Eelco Chaudron wrote:
>>
>>
>> On 19 Feb 2024, at 19:57, Aaron Conole wrote:
>>
>>> Eelco Chaudron <echau...@redhat.com> writes:
>>>
>>>> On 12 Feb 2024, at 15:15, Aaron Conole wrote:
>>>>
>>>>> Aaron Conole <acon...@redhat.com> writes:
>>>>>
>>>>>> Eelco Chaudron <echau...@redhat.com> writes:
>>>>>>
>>>>>>> On 2 Feb 2024, at 11:31, Adrian Moreno wrote:
>>>>>>>
>>>>>>>> On 2/1/24 10:02, Eelco Chaudron wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 31 Jan 2024, at 18:03, Aaron Conole wrote:
>>>>>>>>>
>>>>>>>>>> Eelco Chaudron <echau...@redhat.com> writes:
>>>>>>>>>>
>>>>>>>>>>> On 25 Jan 2024, at 21:55, Aaron Conole wrote:
>>>>>>>>>>>
>>>>>>>>>>>> From: Kevin Sprague <ksprague0...@gmail.com>
>>>>>>>>>>>>
>>>>>>>>>>>> During normal operations, it is useful to understand when a 
>>>>>>>>>>>> particular flow
>>>>>>>>>>>> gets removed from the system. This can be useful when debugging 
>>>>>>>>>>>> performance
>>>>>>>>>>>> issues tied to ofproto flow changes, trying to determine deployed 
>>>>>>>>>>>> traffic
>>>>>>>>>>>> patterns, or while debugging dynamic systems where ports come and 
>>>>>>>>>>>> go.
>>>>>>>>>>>>
>>>>>>>>>>>> Prior to this change, there was a lack of visibility around flow 
>>>>>>>>>>>> expiration.
>>>>>>>>>>>> The existing debugging infrastructure could tell us when a flow 
>>>>>>>>>>>> was added to
>>>>>>>>>>>> the datapath, but not when it was removed or why.
>>>>>>>>>>>>
>>>>>>>>>>>> This change introduces a USDT probe at the point where the 
>>>>>>>>>>>> revalidator
>>>>>>>>>>>> determines that the flow should be removed.  Additionally, we 
>>>>>>>>>>>> track the
>>>>>>>>>>>> reason for the flow eviction and provide that information as well. 
>>>>>>>>>>>>  With
>>>>>>>>>>>> this change, we can track the complete flow lifecycle for the 
>>>>>>>>>>>> netlink
>>>>>>>>>>>> datapath by hooking the upcall tracepoint in kernel, the flow put 
>>>>>>>>>>>> USDT, and
>>>>>>>>>>>> the revaldiator USDT, letting us watch as flows are added and 
>>>>>>>>>>>> removed from
>>>>>>>>>>>> the kernel datapath.
>>>>>>>>>>>>
>>>>>>>>>>>> This change only enables this information via USDT probe, so it 
>>>>>>>>>>>> won't be
>>>>>>>>>>>> possible to access this information any other way (see:
>>>>>>>>>>>> Documentation/topics/usdt-probes.rst).
>>>>>>>>>>>>
>>>>>>>>>>>> Also included is a script 
>>>>>>>>>>>> (utilities/usdt-scripts/flow_reval_monitor.py)
>>>>>>>>>>>> which serves as a demonstration of how the new USDT probe might be 
>>>>>>>>>>>> used
>>>>>>>>>>>> going forward.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Kevin Sprague <ksprague0...@gmail.com>
>>>>>>>>>>>> Co-authored-by: Aaron Conole <acon...@redhat.com>
>>>>>>>>>>>> Signed-off-by: Aaron Conole <acon...@redhat.com>
>>>>>>>>>>>
>>>>>>>>>>> Thanks for following this up Aaron! See comments on this patch
>>>>>>> below. I have no additional comments on patch 2.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>>
>>>>>>>>>>> Eelco
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> ---
>>>>>>>>>>>>    Documentation/topics/usdt-probes.rst         |   1 +
>>>>>>>>>>>>    ofproto/ofproto-dpif-upcall.c                |  42 +-
>>>>>>>>>>>>    utilities/automake.mk                        |   3 +
>>>>>>>>>>>>    utilities/usdt-scripts/flow_reval_monitor.py | 653 
>>>>>>>>>>>> +++++++++++++++++++
>>>>>>>>>>>>    4 files changed, 693 insertions(+), 6 deletions(-)
>>>>>>>>>>>>    create mode 100755 utilities/usdt-scripts/flow_reval_monitor.py
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/Documentation/topics/usdt-probes.rst 
>>>>>>>>>>>> b/Documentation/topics/usdt-probes.rst
>>>>>>>>>>>> index e527f43bab..a8da9bb1f7 100644
>>>>>>>>>>>> --- a/Documentation/topics/usdt-probes.rst
>>>>>>>>>>>> +++ b/Documentation/topics/usdt-probes.rst
>>>>>>>>>>>> @@ -214,6 +214,7 @@ Available probes in ``ovs_vswitchd``:
>>>>>>>>>>>>    - dpif_recv:recv_upcall
>>>>>>>>>>>>    - main:poll_block
>>>>>>>>>>>>    - main:run_start
>>>>>>>>>>>> +- revalidate:flow_result
>>>>>>>>>>>>    - revalidate_ukey\_\_:entry
>>>>>>>>>>>>    - revalidate_ukey\_\_:exit
>>>>>>>>>>>>    - udpif_revalidator:start_dump
>>>>>>>>>>>
>>>>>>>>>>> You are missing the specific flow_result result section. This is 
>>>>>>>>>>> from the previous patch:
>>>>>>>>>>
>>>>>>>>>> D'oh!  Thanks for catching it.  I'll re-add it.
>>>>>>>>>>
>>>>>>>>>>> @@ -358,6 +360,27 @@  See also the ``main:run_start`` probe above.
>>>>>>>>>>>    - ``utilities/usdt-scripts/bridge_loop.bt``
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> +probe revalidate:flow_result
>>>>>>>>>>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>>>>>> +
>>>>>>>>>>> +**Description**:
>>>>>>>>>>> +This probe is triggered when the revalidator decides whether or 
>>>>>>>>>>> not to
>>>>>>>>>>> +revalidate a flow. ``reason`` is an enum that denotes that either 
>>>>>>>>>>> the flow
>>>>>>>>>>> +is being kept, or the reason why the flow is being deleted. The
>>>>>>>>>>> +``flow_reval_monitor.py`` script uses this probe to notify users 
>>>>>>>>>>> when flows
>>>>>>>>>>> +matching user-provided criteria are deleted.
>>>>>>>>>>> +
>>>>>>>>>>> +**Arguments**:
>>>>>>>>>>> +
>>>>>>>>>>> +- *arg0*: ``(enum flow_del_reason) reason``
>>>>>>>>>>> +- *arg1*: ``(struct udpif *) udpif``
>>>>>>>>>>> +- *arg2*: ``(struct udpif_key *) ukey``
>>>>>>>>>>> +
>>>>>>>>>>> +**Script references**:
>>>>>>>>>>> +
>>>>>>>>>>> +- ``utilities/usdt-scripts/flow_reval_monitor.py``
>>>>>>>>>>> +
>>>>>>>>>>> +
>>>>>>>>>>>    Adding your own probes
>>>>>>>>>>>    ----------------------
>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/ofproto/ofproto-dpif-upcall.c 
>>>>>>>>>>>> b/ofproto/ofproto-dpif-upcall.c
>>>>>>>>>>>> index b5cbeed878..97d75833f7 100644
>>>>>>>>>>>> --- a/ofproto/ofproto-dpif-upcall.c
>>>>>>>>>>>> +++ b/ofproto/ofproto-dpif-upcall.c
>>>>>>>>>>>> @@ -269,6 +269,18 @@ enum ukey_state {
>>>>>>>>>>>>    };
>>>>>>>>>>>>    #define N_UKEY_STATES (UKEY_DELETED + 1)
>>>>>>>>>>>>
>>>>>>>>>>>> +enum flow_del_reason {
>>>>>>>>>>>> +    FDR_REVALIDATE = 0,     /* The flow was revalidated. */
>>>>>>>>>>>
>>>>>>>>>>> It was called FDR_FLOW_LIVE before, which might make more
>>>>>>> sense. As the flow is just NOT deleted. It might or might not have
>>>>>>> been revalidated. Thoughts?
>>>>>>>>>>
>>>>>>>>>> I think it had to have been revalidated if we emit the reason, 
>>>>>>>>>> because
>>>>>>>>>> we only emit the reason code after revalidation.  IE: there are many
>>>>>>>>>> places where we skip revalidation but the flow stays live - and we 
>>>>>>>>>> don't
>>>>>>>>>> emit reasons in those cases.
>>>>>>>>>>
>>>>>>>>>> So at least for this patch, it MUST have been revalidated.  But 
>>>>>>>>>> maybe in
>>>>>>>>>> the future, we would want to catch cases where the flow hasn't been. 
>>>>>>>>>>  In
>>>>>>>>>> that case, it makes sense to add the FDR_FLOW_LIVE at that time - I
>>>>>>>>>> think.
>>>>>>>>>>
>>>>>>>>>> Maybe you disagree?
>>>>>>>>>
>>>>>>>>> Well, it depends on how you define revalidation, it might only have
>>>>>>> updated the counters. i.e. it all depends on ‘bool need_revalidate =
>>>>>>> ukey->reval_seq != reval_seq;’ in revalidate_ukey(). That was why I
>>>>>>> opted for a more general name.
>>>>>>>>>
>>>>>>>>>>>> +    FDR_FLOW_IDLE,          /* The flow went unused and was 
>>>>>>>>>>>> deleted. */
>>>>>>>>>>>> +    FDR_TOO_EXPENSIVE,      /* The flow was too expensive to 
>>>>>>>>>>>> revalidate. */
>>>>>>>>>>>> +    FDR_FLOW_WILDCARDED,    /* The flow needed a narrower 
>>>>>>>>>>>> wildcard mask. */
>>>>>>>>>>>> +    FDR_BAD_ODP_FIT,        /* The flow had a bad ODP flow fit. */
>>>>>>>>>>>> +    FDR_NO_OFPROTO,         /* The flow didn't have an associated 
>>>>>>>>>>>> ofproto. */
>>>>>>>>>>>> +    FDR_XLATION_ERROR,      /* There was an error translating the 
>>>>>>>>>>>> flow. */
>>>>>>>>>>>> +    FDR_AVOID_CACHING,      /* Flow deleted to avoid caching. */
>>>>>>>>>>>> +    FDR_FLOW_LIMIT,         /* All flows being killed. */
>>>>>>>>>>>
>>>>>>>>>>> Looking at the comment from Han on FDR_PURGE, and this patch
>>>>>>> needing another spin, we should probably add it.
>>>>>>>>>>
>>>>>>>>>> I can do that, sure.  In that case, we will need to have a new flow 
>>>>>>>>>> op
>>>>>>>>>> added to revalidator_sweep__ so that we can catch it.  But in that 
>>>>>>>>>> case,
>>>>>>>>>> it will be a different usdt probe, so I still don't know if we need
>>>>>>>>>> FDR_PURGE right?  WDYT?
>>>>>>>>>
>>>>>>>>> In revalidator_sweep__() you have sort of the following:
>>>>>>>>>
>>>>>>>>>                   if (purge || ukey_state == UKEY_INCONSISTENT) {
>>>>>>>>>                       result = UKEY_DELETE;
>>>>>>>>>                   } else if (!seq_mismatch) {
>>>>>>>>>
>>>>>>>>> And I’m afraid that if we use this tool to debug we miss the
>>>>>>> ukey_state == UKEY_INCONSISTENT when debugging and spent a long time
>>>>>>> figuring this out.
>>>>>>>>> Maybe add something general like this (did not give it a lot of
>>>>>>> thought), and only take the FDR_PURGE : FDR_UPDATE_FAIL results in the
>>>>>>> script?
>>>>>>>>>
>>>>>>>>>    /* 'udpif_key's are responsible for tracking the little bit of 
>>>>>>>>> state udpif
>>>>>>>>> @@ -2991,13 +2993,13 @@ revalidator_sweep__(struct revalidator 
>>>>>>>>> *revalidator, bool purge)
>>>>>>>>>            uint64_t odp_actions_stub[1024 / 8];
>>>>>>>>>            struct ofpbuf odp_actions = 
>>>>>>>>> OFPBUF_STUB_INITIALIZER(odp_actions_stub);
>>>>>>>>>
>>>>>>>>> -        enum flow_del_reason reason = FDR_REVALIDATE;
>>>>>>>>>            struct ukey_op ops[REVALIDATE_MAX_BATCH];
>>>>>>>>>            struct udpif_key *ukey;
>>>>>>>>>            struct umap *umap = &udpif->ukeys[i];
>>>>>>>>>            size_t n_ops = 0;
>>>>>>>>>
>>>>>>>>>            CMAP_FOR_EACH(ukey, cmap_node, &umap->cmap) {
>>>>>>>>> +            enum flow_del_reason reason = FDR_REVALIDATE;
>>>>>>>>>                enum ukey_state ukey_state;
>>>>>>>>>
>>>>>>>>>                /* Handler threads could be holding a ukey lock while 
>>>>>>>>> it installs a
>>>>>>>>> @@ -3016,8 +3018,10 @@ revalidator_sweep__(struct revalidator 
>>>>>>>>> *revalidator, bool purge)
>>>>>>>>>
>>>>>>>>>                    if (purge || ukey_state == UKEY_INCONSISTENT) {
>>>>>>>>>                        result = UKEY_DELETE;
>>>>>>>>> +                    reason = purge ? FDR_PURGE : FDR_UPDATE_FAIL;
>>>>>>>>>                    } else if (!seq_mismatch) {
>>>>>>>>>                        result = UKEY_KEEP;
>>>>>>>>> +                    reason = FDR_REVALIDATE; //_KEEP
>>>>>>>>>                    } else {
>>>>>>>>>                        struct dpif_flow_stats stats;
>>>>>>>>>                        COVERAGE_INC(revalidate_missed_dp_flow);
>>>>>>>>> @@ -3030,6 +3034,8 @@ revalidator_sweep__(struct revalidator 
>>>>>>>>> *revalidator, bool purge)
>>>>>>>>>                        reval_op_init(&ops[n_ops++], result, udpif, 
>>>>>>>>> ukey, &recircs,
>>>>>>>>>                                      &odp_actions);
>>>>>>>>>                    }
>>>>>>>>> +                OVS_USDT_PROBE(revalidator_sweep__, flow_result, 
>>>>>>>>> result,
>>>>>>>>> +                               reason, udpif, ukey);
>>>>>>>>>                }
>>>>>>>>>                ovs_mutex_unlock(&ukey->mutex);
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> In addition in revalidator_sweep__() should the “enum
>>>>>>> flow_del_reason reason = FDR_REVALIDATE;” not be moved to the
>>>>>>> CMAP_FOR_EACH() loop?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>> +};
>>>>>>>>>>>> +
>>>>>>>>>>>>    /* 'udpif_key's are responsible for tracking the little bit of 
>>>>>>>>>>>> state udpif
>>>>>>>>>>>>     * needs to do flow expiration which can't be pulled directly 
>>>>>>>>>>>> from the
>>>>>>>>>>>>     * datapath.  They may be created by any handler or revalidator 
>>>>>>>>>>>> thread at any
>>>>>>>>>>>> @@ -2272,7 +2284,8 @@ populate_xcache(struct udpif *udpif, struct 
>>>>>>>>>>>> udpif_key *ukey,
>>>>>>>>>>>>    static enum reval_result
>>>>>>>>>>>>    revalidate_ukey__(struct udpif *udpif, const struct udpif_key 
>>>>>>>>>>>> *ukey,
>>>>>>>>>>>>                      uint16_t tcp_flags, struct ofpbuf 
>>>>>>>>>>>> *odp_actions,
>>>>>>>>>>>> -                  struct recirc_refs *recircs, struct xlate_cache 
>>>>>>>>>>>> *xcache)
>>>>>>>>>>>> +                  struct recirc_refs *recircs, struct xlate_cache 
>>>>>>>>>>>> *xcache,
>>>>>>>>>>>> +                  enum flow_del_reason *reason)
>>>>>>>>>>>>    {
>>>>>>>>>>>>        struct xlate_out *xoutp;
>>>>>>>>>>>>        struct netflow *netflow;
>>>>>>>>>>>> @@ -2293,11 +2306,13 @@ revalidate_ukey__(struct udpif *udpif,
>>>>>>> const struct udpif_key *ukey,
>>>>>>>>>>>>        netflow = NULL;
>>>>>>>>>>>>
>>>>>>>>>>>>        if (xlate_ukey(udpif, ukey, tcp_flags, &ctx)) {
>>>>>>>>>>>> +        *reason = FDR_XLATION_ERROR;
>>>>>>>>>>>>            goto exit;
>>>>>>>>>>>>        }
>>>>>>>>>>>>        xoutp = &ctx.xout;
>>>>>>>>>>>>
>>>>>>>>>>>>        if (xoutp->avoid_caching) {
>>>>>>>>>>>> +        *reason = FDR_AVOID_CACHING;
>>>>>>>>>>>>            goto exit;
>>>>>>>>>>>>        }
>>>>>>>>>>>>
>>>>>>>>>>>> @@ -2311,6 +2326,7 @@ revalidate_ukey__(struct udpif *udpif,
>>>>>> const struct udpif_key *ukey,
>>>>>>>>>>>>            ofpbuf_clear(odp_actions);
>>>>>>>>>>>>
>>>>>>>>>>>>            if (!ofproto) {
>>>>>>>>>>>> +            *reason = FDR_NO_OFPROTO;
>>>>>>>>>>>>                goto exit;
>>>>>>>>>>>>            }
>>>>>>>>>>>>
>>>>>>>>>>>> @@ -2322,6 +2338,7 @@ revalidate_ukey__(struct udpif *udpif,
>>>>>> const struct udpif_key *ukey,
>>>>>>>>>>>>        if (odp_flow_key_to_mask(ukey->mask, ukey->mask_len, 
>>>>>>>>>>>> &dp_mask, &ctx.flow,
>>>>>>>>>>>>                                 NULL)
>>>>>>>>>>>>            == ODP_FIT_ERROR) {
>>>>>>>>>>>> +        *reason = FDR_BAD_ODP_FIT;
>>>>>>>>>>>>            goto exit;
>>>>>>>>>>>>        }
>>>>>>>>>>>>
>>>>>>>>>>>> @@ -2331,6 +2348,7 @@ revalidate_ukey__(struct udpif *udpif,
>>>>>> const struct udpif_key *ukey,
>>>>>>>>>>>>         * down.  Note that we do not know if the datapath has 
>>>>>>>>>>>> ignored any of the
>>>>>>>>>>>>         * wildcarded bits, so we may be overly conservative here. 
>>>>>>>>>>>> */
>>>>>>>>>>>>        if (flow_wildcards_has_extra(&dp_mask, ctx.wc)) {
>>>>>>>>>>>> +        *reason = FDR_FLOW_WILDCARDED;
>>>>>>>>>>>>            goto exit;
>>>>>>>>>>>>        }
>>>>>>>>>>>>
>>>>>>>>>>>> @@ -2400,7 +2418,7 @@ static enum reval_result
>>>>>>>>>>>>    revalidate_ukey(struct udpif *udpif, struct udpif_key *ukey,
>>>>>>>>>>>>                    const struct dpif_flow_stats *stats,
>>>>>>>>>>>>                    struct ofpbuf *odp_actions, uint64_t reval_seq,
>>>>>>>>>>>> -                struct recirc_refs *recircs)
>>>>>>>>>>>> +                struct recirc_refs *recircs, enum flow_del_reason 
>>>>>>>>>>>> *reason)
>>>>>>>>>>>>        OVS_REQUIRES(ukey->mutex)
>>>>>>>>>>>>    {
>>>>>>>>>>>>        bool need_revalidate = ukey->reval_seq != reval_seq;
>>>>>>>>>>>> @@ -2430,8 +2448,12 @@ revalidate_ukey(struct udpif *udpif, struct 
>>>>>>>>>>>> udpif_key *ukey,
>>>>>>>>>>>>                    xlate_cache_clear(ukey->xcache);
>>>>>>>>>>>>                }
>>>>>>>>>>>>                result = revalidate_ukey__(udpif, ukey, 
>>>>>>>>>>>> push.tcp_flags,
>>>>>>>>>>>> -                                       odp_actions, recircs, 
>>>>>>>>>>>> ukey->xcache);
>>>>>>>>>>>> -        } /* else delete; too expensive to revalidate */
>>>>>>>>>>>> +                                       odp_actions, recircs, 
>>>>>>>>>>>> ukey->xcache,
>>>>>>>>>>>> +                                       reason);
>>>>>>>>>>>> +        } else {
>>>>>>>>>>>> +            /* delete; too expensive to revalidate */
>>>>>>>>>>>> +            *reason = FDR_TOO_EXPENSIVE;
>>>>>>>>>>>> +        }
>>>>>>>>>>>>        } else if (!push.n_packets || ukey->xcache
>>>>>>>>>>>>                   || !populate_xcache(udpif, ukey, 
>>>>>>>>>>>> push.tcp_flags)) {
>>>>>>>>>>>>            result = UKEY_KEEP;
>>>>>>>>>>>> @@ -2831,6 +2853,7 @@ revalidate(struct revalidator *revalidator)
>>>>>>>>>>>>            for (f = flows; f < &flows[n_dumped]; f++) {
>>>>>>>>>>>>                long long int used = f->stats.used;
>>>>>>>>>>>>                struct recirc_refs recircs = 
>>>>>>>>>>>> RECIRC_REFS_EMPTY_INITIALIZER;
>>>>>>>>>>>> +            enum flow_del_reason reason = FDR_REVALIDATE;
>>>>>>>>>>>>                struct dpif_flow_stats stats = f->stats;
>>>>>>>>>>>>                enum reval_result result;
>>>>>>>>>>>>                struct udpif_key *ukey;
>>>>>>>>>>>> @@ -2905,9 +2928,14 @@ revalidate(struct revalidator *revalidator)
>>>>>>>>>>>>                }
>>>>>>>>>>>>                if (kill_them_all || (used && used < now - 
>>>>>>>>>>>> max_idle)) {
>>>>>>>>>>>>                    result = UKEY_DELETE;
>>>>>>>>>>>> +                if (kill_them_all) {
>>>>>>>>>>>> +                    reason = FDR_FLOW_LIMIT;
>>>>>>>>>>>> +                } else {
>>>>>>>>>>>> +                    reason = FDR_FLOW_IDLE;
>>>>>>>>>>>> +                }
>>>>>>>>>>>>                } else {
>>>>>>>>>>>>                    result = revalidate_ukey(udpif, ukey, &stats, 
>>>>>>>>>>>> &odp_actions,
>>>>>>>>>>>> -                                         reval_seq, &recircs);
>>>>>>>>>>>> +                                         reval_seq, &recircs, 
>>>>>>>>>>>> &reason);
>>>>>>>>>>>>                }
>>>>>>>>>>>>                ukey->dump_seq = dump_seq;
>>>>>>>>>>>>
>>>>>>>>>>>> @@ -2916,6 +2944,7 @@ revalidate(struct revalidator *revalidator)
>>>>>>>>>>>>                    udpif_update_flow_pps(udpif, ukey, f);
>>>>>>>>>>>>                }
>>>>>>>>>>>>
>>>>>>>>>>>> +            OVS_USDT_PROBE(revalidate, flow_result, reason, 
>>>>>>>>>>>> udpif, ukey);
>>>>>>>>
>>>>>>>> I have been experimenting with several upcall tracking techniques
>>>>>>> that would make it easier to correlate upcalls with their subsequent
>>>>>>> related events.
>>>>>>>> To achieve that, we need (among other things) some easy-to-compare
>>>>>>> unique value in the events. For revalidation events, I think a good
>>>>>>> candidate would be "ukey->ufid" and so does the script in this patch.
>>>>>>>>
>>>>>>>> However, requiring all external tools to know the layout of "struct
>>>>>>> udpif_key" in order to get that value makes things quite complicated
>>>>>>> for CORE tools (e.g: retis).
>>>>>>>>
>>>>>>>> With all this, would you consider adding the ufid to probe payload 
>>>>>>>> directly?
>>>>>>>
>>>>>>> Dont see why we can not, but if we need anything else it would quickly
>>>>>>> explode in too much arguments. I guess CORE needs a good solution for
>>>>>>> userspace.
>>>>>>
>>>>>> I think having the ufid and udpif makes sense for now.  Actually, maybe
>>>>>> for the long term we can do something as an appctl command which will
>>>>>> dump the relevant datastructures.  That way a tool like this, which
>>>>>> needs the pid anyway, can generate an appctl command against the daemon
>>>>>> and get the exact right header layout.
>>>>>>
>>>>>> WDYT?
>>>>>
>>>>> Update on this - I have it ready to go with the exception of this
>>>>> particular change.  The USDT probe code generation seems to have a bug,
>>>>> or maybe it just won't work using these kinds of data types.  For
>>>>> example, the bcc tools gets confused and ends up generating code for the
>>>>> probe like:
>>>>>
>>>>>      __attribute__((always_inline))
>>>>> static __always_inline int _bpf_readarg_usdt__flow_result_3(struct
>>>> pt_regs *ctx, void *dest, size_t len) {
>>>>>        if (len != sizeof(d)) return -1;
>>>>> { u64 __addr = ctx->r13 + 40; __asm__ __volatile__("": : :"memory");
>>>> uint128_t __res = 0x0; bpf_probe_read_user(&__res, sizeof(__res),
>>>> (void *)__addr); *((d *)dest) = __res; }
>>>>>        return 0;
>>>>>      }
>>>>>
>>>>> This code isn't valid - the (d) variable isn't properly represented
>>>>> here.  I guess maybe there's some issue with the way they use ctypes
>>>>> internally?  It isn't clear to me.
>>>>
>>>> I miss some context, but is it because you point to ufid, which is a
>>>> union (typedef union ovs_u128)? What if you point to the u32 member of
>>>> the union?
>>>
>>> I'm not sure, but that doesn't seem like the best approach to me, tbh.
>>>
>>>>> We could work around this by making the ufid always pass as a void *,
>>>>> and passing a length that tells the size.  However, if the underlying
>>>>> data type ever changes, it will be totally incompatible.  I doubt it
>>>>> would ever change, but at least with something stronger typed, we could
>>>>> get a better exception that bubbled up.
>>>>
>>>> I do not like the pointer + length, as this adds another argument. I
>>>> can’t remember which, but some tools had problems with 5 or more
>>>> arguments.
>>>
>
> Was that part of BCC or some other tooling? IIUC, sdt.h supports up to 12 
> arguments.

Cant remember :( It was either BCC, perf, or bpftrace (but to busy to test it 
right now)

>>> I do agree.  I also want to test the latest version of bcc-tools also,
>>> to see if it really is just related to my f38 version of bcc-tools.
>>> That also raises a different issue, but it can be something we solve in
>>> the interrim with a version check.
>>>
>>>>> I'm okay with doing it this way, but just wanted to make sure that
>>>>> retis, et. al. would be okay with interpreting it this way.
>>>>>
>>>>> Alternatively, there's definitely a bug there - we could omit the
>>>>> argument entirely and rely on pahole or even pyelftools to help extract
>>>>> the struct layout and get the ufid from the udpif_key struct.  The data
>>>>> is already being sent, so it seems like duplication a bit to send it,
>>>>> and becomes difficult in light of this bcc "bug?"
>>>>
>>>> I agree it’s overloading arguments with data already present, but I
>>>> guess CORE can not yet read debug data from userspace
>>>> applications. Adrian, I thought this was being worked on with the
>>>> Pahole guys. If this is close we could just leave out this extra
>>>> argument for now.
>>>
>
> I synced with Arnaldo. It's still not in the main branch, but the plan is to 
> add it. Although this alternative (generating a compile-able header using 
> pahole) is nice and easy for BCC tools, it doesn't make things easy for CO-RE 
> tools (e.g: retis).

Thanks for doing this!

//Eelco

>>> That should be something doable, I think.  For example, there are crates
>>> like gimli that parse DWARF data (but not sure if it is parsing v5,
>>> yet).
>>>
>>>> Alternatively, we could ask the BCC guys if this is a known issue,
>>>> with maybe a potential workaround. Just post it here
>>>> https://github.com/iovisor/bcc/issues the respond quickly.
>>>
>>> They will tell to run with the latest version, I think.  The version I'm
>>> using is what ships with Fedora 38, and seems quite out of date.
>>
>> ACK, will wait for your conclusion and the v9.
>>
>> //Eelco
>>
>>>> <SNIP>
>>
>
> -- 
> Adrián Moreno

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to