On 21 Feb 2024, at 16:03, Adrian Moreno wrote:

> On 2/21/24 15:49, Aaron Conole wrote:
>> Adrian Moreno <amore...@redhat.com> writes:
>>
>>> On 2/20/24 19:06, Aaron Conole wrote:
>>>> Eelco Chaudron <echau...@redhat.com> writes:
>>>>
>>>>> On 19 Feb 2024, at 19:57, Aaron Conole wrote:
>>>>>
>>>>>> Eelco Chaudron <echau...@redhat.com> writes:
>>>>>>
>>>>>>> On 12 Feb 2024, at 15:15, Aaron Conole wrote:
>>>>>>>
>>>>>>>> Aaron Conole <acon...@redhat.com> writes:
>>>>>>>>
>>>>>>>>> Eelco Chaudron <echau...@redhat.com> writes:
>>>>>>>>>
>>>>>>>>>> On 2 Feb 2024, at 11:31, Adrian Moreno wrote:
>>>>>>>>>>
>>>>>>>>>>> On 2/1/24 10:02, Eelco Chaudron wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 31 Jan 2024, at 18:03, Aaron Conole wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Eelco Chaudron <echau...@redhat.com> writes:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 25 Jan 2024, at 21:55, Aaron Conole wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> From: Kevin Sprague <ksprague0...@gmail.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> During normal operations, it is useful to understand when a 
>>>>>>>>>>>>>>> particular flow
>>>>>>>>>>>>>>> gets removed from the system. This can be useful when debugging 
>>>>>>>>>>>>>>> performance
>>>>>>>>>>>>>>> issues tied to ofproto flow changes, trying to determine 
>>>>>>>>>>>>>>> deployed traffic
>>>>>>>>>>>>>>> patterns, or while debugging dynamic systems where ports come 
>>>>>>>>>>>>>>> and go.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Prior to this change, there was a lack of visibility around 
>>>>>>>>>>>>>>> flow expiration.
>>>>>>>>>>>>>>> The existing debugging infrastructure could tell us when a flow 
>>>>>>>>>>>>>>> was added to
>>>>>>>>>>>>>>> the datapath, but not when it was removed or why.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This change introduces a USDT probe at the point where the 
>>>>>>>>>>>>>>> revalidator
>>>>>>>>>>>>>>> determines that the flow should be removed.  Additionally, we 
>>>>>>>>>>>>>>> track the
>>>>>>>>>>>>>>> reason for the flow eviction and provide that information as 
>>>>>>>>>>>>>>> well.  With
>>>>>>>>>>>>>>> this change, we can track the complete flow lifecycle for the 
>>>>>>>>>>>>>>> netlink
>>>>>>>>>>>>>>> datapath by hooking the upcall tracepoint in kernel, the flow 
>>>>>>>>>>>>>>> put USDT, and
>>>>>>>>>>>>>>> the revaldiator USDT, letting us watch as flows are added and 
>>>>>>>>>>>>>>> removed from
>>>>>>>>>>>>>>> the kernel datapath.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This change only enables this information via USDT probe, so it 
>>>>>>>>>>>>>>> won't be
>>>>>>>>>>>>>>> possible to access this information any other way (see:
>>>>>>>>>>>>>>> Documentation/topics/usdt-probes.rst).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also included is a script 
>>>>>>>>>>>>>>> (utilities/usdt-scripts/flow_reval_monitor.py)
>>>>>>>>>>>>>>> which serves as a demonstration of how the new USDT probe might 
>>>>>>>>>>>>>>> be used
>>>>>>>>>>>>>>> going forward.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Signed-off-by: Kevin Sprague <ksprague0...@gmail.com>
>>>>>>>>>>>>>>> Co-authored-by: Aaron Conole <acon...@redhat.com>
>>>>>>>>>>>>>>> Signed-off-by: Aaron Conole <acon...@redhat.com>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for following this up Aaron! See comments on this patch
>>>>>>>>>> below. I have no additional comments on patch 2.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Eelco
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>     Documentation/topics/usdt-probes.rst         |   1 +
>>>>>>>>>>>>>>>     ofproto/ofproto-dpif-upcall.c                |  42 +-
>>>>>>>>>>>>>>>     utilities/automake.mk                        |   3 +
>>>>>>>>>>>>>>>     utilities/usdt-scripts/flow_reval_monitor.py | 653 
>>>>>>>>>>>>>>> +++++++++++++++++++
>>>>>>>>>>>>>>>     4 files changed, 693 insertions(+), 6 deletions(-)
>>>>>>>>>>>>>>>     create mode 100755 
>>>>>>>>>>>>>>> utilities/usdt-scripts/flow_reval_monitor.py
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> diff --git a/Documentation/topics/usdt-probes.rst
>>> b/Documentation/topics/usdt-probes.rst
>>>>>>>>>>>>>>> index e527f43bab..a8da9bb1f7 100644
>>>>>>>>>>>>>>> --- a/Documentation/topics/usdt-probes.rst
>>>>>>>>>>>>>>> +++ b/Documentation/topics/usdt-probes.rst
>>>>>>>>>>>>>>> @@ -214,6 +214,7 @@ Available probes in ``ovs_vswitchd``:
>>>>>>>>>>>>>>>     - dpif_recv:recv_upcall
>>>>>>>>>>>>>>>     - main:poll_block
>>>>>>>>>>>>>>>     - main:run_start
>>>>>>>>>>>>>>> +- revalidate:flow_result
>>>>>>>>>>>>>>>     - revalidate_ukey\_\_:entry
>>>>>>>>>>>>>>>     - revalidate_ukey\_\_:exit
>>>>>>>>>>>>>>>     - udpif_revalidator:start_dump
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You are missing the specific flow_result result
>>> section. This is from the previous patch:
>>>>>>>>>>>>>
>>>>>>>>>>>>> D'oh!  Thanks for catching it.  I'll re-add it.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> @@ -358,6 +360,27 @@  See also the ``main:run_start`` probe 
>>>>>>>>>>>>>> above.
>>>>>>>>>>>>>>     - ``utilities/usdt-scripts/bridge_loop.bt``
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +probe revalidate:flow_result
>>>>>>>>>>>>>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +**Description**:
>>>>>>>>>>>>>> +This probe is triggered when the revalidator decides whether or 
>>>>>>>>>>>>>> not to
>>>>>>>>>>>>>> +revalidate a flow. ``reason`` is an enum that denotes that 
>>>>>>>>>>>>>> either the flow
>>>>>>>>>>>>>> +is being kept, or the reason why the flow is being deleted. The
>>>>>>>>>>>>>> +``flow_reval_monitor.py`` script uses this probe to notify 
>>>>>>>>>>>>>> users when flows
>>>>>>>>>>>>>> +matching user-provided criteria are deleted.
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +**Arguments**:
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +- *arg0*: ``(enum flow_del_reason) reason``
>>>>>>>>>>>>>> +- *arg1*: ``(struct udpif *) udpif``
>>>>>>>>>>>>>> +- *arg2*: ``(struct udpif_key *) ukey``
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +**Script references**:
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +- ``utilities/usdt-scripts/flow_reval_monitor.py``
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>     Adding your own probes
>>>>>>>>>>>>>>     ----------------------
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> diff --git a/ofproto/ofproto-dpif-upcall.c 
>>>>>>>>>>>>>>> b/ofproto/ofproto-dpif-upcall.c
>>>>>>>>>>>>>>> index b5cbeed878..97d75833f7 100644
>>>>>>>>>>>>>>> --- a/ofproto/ofproto-dpif-upcall.c
>>>>>>>>>>>>>>> +++ b/ofproto/ofproto-dpif-upcall.c
>>>>>>>>>>>>>>> @@ -269,6 +269,18 @@ enum ukey_state {
>>>>>>>>>>>>>>>     };
>>>>>>>>>>>>>>>     #define N_UKEY_STATES (UKEY_DELETED + 1)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +enum flow_del_reason {
>>>>>>>>>>>>>>> +    FDR_REVALIDATE = 0,     /* The flow was revalidated. */
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It was called FDR_FLOW_LIVE before, which might make more
>>>>>>>>>> sense. As the flow is just NOT deleted. It might or might not have
>>>>>>>>>> been revalidated. Thoughts?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think it had to have been revalidated if we emit the reason, 
>>>>>>>>>>>>> because
>>>>>>>>>>>>> we only emit the reason code after revalidation.  IE: there are 
>>>>>>>>>>>>> many
>>>>>>>>>>>>> places where we skip revalidation but the flow stays live - and 
>>>>>>>>>>>>> we don't
>>>>>>>>>>>>> emit reasons in those cases.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So at least for this patch, it MUST have been revalidated.  But 
>>>>>>>>>>>>> maybe in
>>>>>>>>>>>>> the future, we would want to catch cases where the flow hasn't 
>>>>>>>>>>>>> been.  In
>>>>>>>>>>>>> that case, it makes sense to add the FDR_FLOW_LIVE at that time - 
>>>>>>>>>>>>> I
>>>>>>>>>>>>> think.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Maybe you disagree?
>>>>>>>>>>>>
>>>>>>>>>>>> Well, it depends on how you define revalidation, it might only have
>>>>>>>>>> updated the counters. i.e. it all depends on ‘bool need_revalidate =
>>>>>>>>>> ukey->reval_seq != reval_seq;’ in revalidate_ukey(). That was why I
>>>>>>>>>> opted for a more general name.
>>>>>>>>>>>>
>>>>>>>>>>>>>>> +    FDR_FLOW_IDLE,          /* The flow went unused and was 
>>>>>>>>>>>>>>> deleted. */
>>>>>>>>>>>>>>> +    FDR_TOO_EXPENSIVE,      /* The flow was too expensive to 
>>>>>>>>>>>>>>> revalidate. */
>>>>>>>>>>>>>>> +    FDR_FLOW_WILDCARDED,    /* The flow needed a narrower 
>>>>>>>>>>>>>>> wildcard mask. */
>>>>>>>>>>>>>>> +    FDR_BAD_ODP_FIT,        /* The flow had a bad ODP flow 
>>>>>>>>>>>>>>> fit. */
>>>>>>>>>>>>>>> +    FDR_NO_OFPROTO,         /* The flow didn't have an 
>>>>>>>>>>>>>>> associated ofproto. */
>>>>>>>>>>>>>>> +    FDR_XLATION_ERROR,      /* There was an error translating 
>>>>>>>>>>>>>>> the flow. */
>>>>>>>>>>>>>>> +    FDR_AVOID_CACHING,      /* Flow deleted to avoid caching. 
>>>>>>>>>>>>>>> */
>>>>>>>>>>>>>>> +    FDR_FLOW_LIMIT,         /* All flows being killed. */
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Looking at the comment from Han on FDR_PURGE, and this patch
>>>>>>>>>> needing another spin, we should probably add it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I can do that, sure.  In that case, we will need to have a new 
>>>>>>>>>>>>> flow op
>>>>>>>>>>>>> added to revalidator_sweep__ so that we can catch it.  But in 
>>>>>>>>>>>>> that case,
>>>>>>>>>>>>> it will be a different usdt probe, so I still don't know if we 
>>>>>>>>>>>>> need
>>>>>>>>>>>>> FDR_PURGE right?  WDYT?
>>>>>>>>>>>>
>>>>>>>>>>>> In revalidator_sweep__() you have sort of the following:
>>>>>>>>>>>>
>>>>>>>>>>>>                    if (purge || ukey_state == UKEY_INCONSISTENT) {
>>>>>>>>>>>>                        result = UKEY_DELETE;
>>>>>>>>>>>>                    } else if (!seq_mismatch) {
>>>>>>>>>>>>
>>>>>>>>>>>> And I’m afraid that if we use this tool to debug we miss the
>>>>>>>>>> ukey_state == UKEY_INCONSISTENT when debugging and spent a long time
>>>>>>>>>> figuring this out.
>>>>>>>>>>>> Maybe add something general like this (did not give it a lot of
>>>>>>>>>> thought), and only take the FDR_PURGE : FDR_UPDATE_FAIL results in 
>>>>>>>>>> the
>>>>>>>>>> script?
>>>>>>>>>>>>
>>>>>>>>>>>>     /* 'udpif_key's are responsible for tracking the little bit of 
>>>>>>>>>>>> state udpif
>>>>>>>>>>>> @@ -2991,13 +2993,13 @@ revalidator_sweep__(struct
>>> revalidator *revalidator, bool purge)
>>>>>>>>>>>>             uint64_t odp_actions_stub[1024 / 8];
>>>>>>>>>>>>             struct ofpbuf odp_actions = 
>>>>>>>>>>>> OFPBUF_STUB_INITIALIZER(odp_actions_stub);
>>>>>>>>>>>>
>>>>>>>>>>>> -        enum flow_del_reason reason = FDR_REVALIDATE;
>>>>>>>>>>>>             struct ukey_op ops[REVALIDATE_MAX_BATCH];
>>>>>>>>>>>>             struct udpif_key *ukey;
>>>>>>>>>>>>             struct umap *umap = &udpif->ukeys[i];
>>>>>>>>>>>>             size_t n_ops = 0;
>>>>>>>>>>>>
>>>>>>>>>>>>             CMAP_FOR_EACH(ukey, cmap_node, &umap->cmap) {
>>>>>>>>>>>> +            enum flow_del_reason reason = FDR_REVALIDATE;
>>>>>>>>>>>>                 enum ukey_state ukey_state;
>>>>>>>>>>>>
>>>>>>>>>>>>                 /* Handler threads could be holding a ukey lock 
>>>>>>>>>>>> while it installs a
>>>>>>>>>>>> @@ -3016,8 +3018,10 @@ revalidator_sweep__(struct
>>> revalidator *revalidator, bool purge)
>>>>>>>>>>>>
>>>>>>>>>>>>                     if (purge || ukey_state == UKEY_INCONSISTENT) {
>>>>>>>>>>>>                         result = UKEY_DELETE;
>>>>>>>>>>>> +                    reason = purge ? FDR_PURGE : FDR_UPDATE_FAIL;
>>>>>>>>>>>>                     } else if (!seq_mismatch) {
>>>>>>>>>>>>                         result = UKEY_KEEP;
>>>>>>>>>>>> +                    reason = FDR_REVALIDATE; //_KEEP
>>>>>>>>>>>>                     } else {
>>>>>>>>>>>>                         struct dpif_flow_stats stats;
>>>>>>>>>>>>                         COVERAGE_INC(revalidate_missed_dp_flow);
>>>>>>>>>>>> @@ -3030,6 +3034,8 @@ revalidator_sweep__(struct revalidator 
>>>>>>>>>>>> *revalidator, bool purge)
>>>>>>>>>>>>                         reval_op_init(&ops[n_ops++], result, 
>>>>>>>>>>>> udpif, ukey, &recircs,
>>>>>>>>>>>>                                       &odp_actions);
>>>>>>>>>>>>                     }
>>>>>>>>>>>> +                OVS_USDT_PROBE(revalidator_sweep__, flow_result, 
>>>>>>>>>>>> result,
>>>>>>>>>>>> +                               reason, udpif, ukey);
>>>>>>>>>>>>                 }
>>>>>>>>>>>>                 ovs_mutex_unlock(&ukey->mutex);
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> In addition in revalidator_sweep__() should the “enum
>>>>>>>>>> flow_del_reason reason = FDR_REVALIDATE;” not be moved to the
>>>>>>>>>> CMAP_FOR_EACH() loop?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>> +};
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>     /* 'udpif_key's are responsible for tracking the little bit 
>>>>>>>>>>>>>>> of state udpif
>>>>>>>>>>>>>>>      * needs to do flow expiration which can't be pulled 
>>>>>>>>>>>>>>> directly from the
>>>>>>>>>>>>>>>      * datapath.  They may be created by any handler or 
>>>>>>>>>>>>>>> revalidator thread at any
>>>>>>>>>>>>>>> @@ -2272,7 +2284,8 @@ populate_xcache(struct udpif *udpif, 
>>>>>>>>>>>>>>> struct udpif_key *ukey,
>>>>>>>>>>>>>>>     static enum reval_result
>>>>>>>>>>>>>>>     revalidate_ukey__(struct udpif *udpif, const struct 
>>>>>>>>>>>>>>> udpif_key *ukey,
>>>>>>>>>>>>>>>                       uint16_t tcp_flags, struct ofpbuf 
>>>>>>>>>>>>>>> *odp_actions,
>>>>>>>>>>>>>>> -                  struct recirc_refs *recircs, struct 
>>>>>>>>>>>>>>> xlate_cache *xcache)
>>>>>>>>>>>>>>> +                  struct recirc_refs *recircs, struct 
>>>>>>>>>>>>>>> xlate_cache *xcache,
>>>>>>>>>>>>>>> +                  enum flow_del_reason *reason)
>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>         struct xlate_out *xoutp;
>>>>>>>>>>>>>>>         struct netflow *netflow;
>>>>>>>>>>>>>>> @@ -2293,11 +2306,13 @@ revalidate_ukey__(struct udpif *udpif,
>>>>>>>>>> const struct udpif_key *ukey,
>>>>>>>>>>>>>>>         netflow = NULL;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         if (xlate_ukey(udpif, ukey, tcp_flags, &ctx)) {
>>>>>>>>>>>>>>> +        *reason = FDR_XLATION_ERROR;
>>>>>>>>>>>>>>>             goto exit;
>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>         xoutp = &ctx.xout;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         if (xoutp->avoid_caching) {
>>>>>>>>>>>>>>> +        *reason = FDR_AVOID_CACHING;
>>>>>>>>>>>>>>>             goto exit;
>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> @@ -2311,6 +2326,7 @@ revalidate_ukey__(struct udpif *udpif,
>>>>>>>>> const struct udpif_key *ukey,
>>>>>>>>>>>>>>>             ofpbuf_clear(odp_actions);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>             if (!ofproto) {
>>>>>>>>>>>>>>> +            *reason = FDR_NO_OFPROTO;
>>>>>>>>>>>>>>>                 goto exit;
>>>>>>>>>>>>>>>             }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> @@ -2322,6 +2338,7 @@ revalidate_ukey__(struct udpif *udpif,
>>>>>>>>> const struct udpif_key *ukey,
>>>>>>>>>>>>>>>         if (odp_flow_key_to_mask(ukey->mask, ukey->mask_len, 
>>>>>>>>>>>>>>> &dp_mask, &ctx.flow,
>>>>>>>>>>>>>>>                                  NULL)
>>>>>>>>>>>>>>>             == ODP_FIT_ERROR) {
>>>>>>>>>>>>>>> +        *reason = FDR_BAD_ODP_FIT;
>>>>>>>>>>>>>>>             goto exit;
>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> @@ -2331,6 +2348,7 @@ revalidate_ukey__(struct udpif *udpif,
>>>>>>>>> const struct udpif_key *ukey,
>>>>>>>>>>>>>>>          * down.  Note that we do not know if the datapath has 
>>>>>>>>>>>>>>> ignored any of the
>>>>>>>>>>>>>>>          * wildcarded bits, so we may be overly conservative 
>>>>>>>>>>>>>>> here. */
>>>>>>>>>>>>>>>         if (flow_wildcards_has_extra(&dp_mask, ctx.wc)) {
>>>>>>>>>>>>>>> +        *reason = FDR_FLOW_WILDCARDED;
>>>>>>>>>>>>>>>             goto exit;
>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> @@ -2400,7 +2418,7 @@ static enum reval_result
>>>>>>>>>>>>>>>     revalidate_ukey(struct udpif *udpif, struct udpif_key *ukey,
>>>>>>>>>>>>>>>                     const struct dpif_flow_stats *stats,
>>>>>>>>>>>>>>>                     struct ofpbuf *odp_actions, uint64_t 
>>>>>>>>>>>>>>> reval_seq,
>>>>>>>>>>>>>>> -                struct recirc_refs *recircs)
>>>>>>>>>>>>>>> +                struct recirc_refs *recircs, enum 
>>>>>>>>>>>>>>> flow_del_reason *reason)
>>>>>>>>>>>>>>>         OVS_REQUIRES(ukey->mutex)
>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>         bool need_revalidate = ukey->reval_seq != reval_seq;
>>>>>>>>>>>>>>> @@ -2430,8 +2448,12 @@ revalidate_ukey(struct udpif *udpif, 
>>>>>>>>>>>>>>> struct udpif_key *ukey,
>>>>>>>>>>>>>>>                     xlate_cache_clear(ukey->xcache);
>>>>>>>>>>>>>>>                 }
>>>>>>>>>>>>>>>                 result = revalidate_ukey__(udpif, ukey, 
>>>>>>>>>>>>>>> push.tcp_flags,
>>>>>>>>>>>>>>> -                                       odp_actions, recircs, 
>>>>>>>>>>>>>>> ukey->xcache);
>>>>>>>>>>>>>>> -        } /* else delete; too expensive to revalidate */
>>>>>>>>>>>>>>> +                                       odp_actions, recircs, 
>>>>>>>>>>>>>>> ukey->xcache,
>>>>>>>>>>>>>>> +                                       reason);
>>>>>>>>>>>>>>> +        } else {
>>>>>>>>>>>>>>> +            /* delete; too expensive to revalidate */
>>>>>>>>>>>>>>> +            *reason = FDR_TOO_EXPENSIVE;
>>>>>>>>>>>>>>> +        }
>>>>>>>>>>>>>>>         } else if (!push.n_packets || ukey->xcache
>>>>>>>>>>>>>>>                    || !populate_xcache(udpif, ukey, 
>>>>>>>>>>>>>>> push.tcp_flags)) {
>>>>>>>>>>>>>>>             result = UKEY_KEEP;
>>>>>>>>>>>>>>> @@ -2831,6 +2853,7 @@ revalidate(struct revalidator 
>>>>>>>>>>>>>>> *revalidator)
>>>>>>>>>>>>>>>             for (f = flows; f < &flows[n_dumped]; f++) {
>>>>>>>>>>>>>>>                 long long int used = f->stats.used;
>>>>>>>>>>>>>>>                 struct recirc_refs recircs = 
>>>>>>>>>>>>>>> RECIRC_REFS_EMPTY_INITIALIZER;
>>>>>>>>>>>>>>> +            enum flow_del_reason reason = FDR_REVALIDATE;
>>>>>>>>>>>>>>>                 struct dpif_flow_stats stats = f->stats;
>>>>>>>>>>>>>>>                 enum reval_result result;
>>>>>>>>>>>>>>>                 struct udpif_key *ukey;
>>>>>>>>>>>>>>> @@ -2905,9 +2928,14 @@ revalidate(struct revalidator 
>>>>>>>>>>>>>>> *revalidator)
>>>>>>>>>>>>>>>                 }
>>>>>>>>>>>>>>>                 if (kill_them_all || (used && used < now - 
>>>>>>>>>>>>>>> max_idle)) {
>>>>>>>>>>>>>>>                     result = UKEY_DELETE;
>>>>>>>>>>>>>>> +                if (kill_them_all) {
>>>>>>>>>>>>>>> +                    reason = FDR_FLOW_LIMIT;
>>>>>>>>>>>>>>> +                } else {
>>>>>>>>>>>>>>> +                    reason = FDR_FLOW_IDLE;
>>>>>>>>>>>>>>> +                }
>>>>>>>>>>>>>>>                 } else {
>>>>>>>>>>>>>>>                     result = revalidate_ukey(udpif, ukey, 
>>>>>>>>>>>>>>> &stats, &odp_actions,
>>>>>>>>>>>>>>> -                                         reval_seq, &recircs);
>>>>>>>>>>>>>>> +                                         reval_seq, &recircs, 
>>>>>>>>>>>>>>> &reason);
>>>>>>>>>>>>>>>                 }
>>>>>>>>>>>>>>>                 ukey->dump_seq = dump_seq;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> @@ -2916,6 +2944,7 @@ revalidate(struct revalidator 
>>>>>>>>>>>>>>> *revalidator)
>>>>>>>>>>>>>>>                     udpif_update_flow_pps(udpif, ukey, f);
>>>>>>>>>>>>>>>                 }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +            OVS_USDT_PROBE(revalidate, flow_result, reason, 
>>>>>>>>>>>>>>> udpif, ukey);
>>>>>>>>>>>
>>>>>>>>>>> I have been experimenting with several upcall tracking techniques
>>>>>>>>>> that would make it easier to correlate upcalls with their subsequent
>>>>>>>>>> related events.
>>>>>>>>>>> To achieve that, we need (among other things) some easy-to-compare
>>>>>>>>>> unique value in the events. For revalidation events, I think a good
>>>>>>>>>> candidate would be "ukey->ufid" and so does the script in this patch.
>>>>>>>>>>>
>>>>>>>>>>> However, requiring all external tools to know the layout of "struct
>>>>>>>>>> udpif_key" in order to get that value makes things quite complicated
>>>>>>>>>> for CORE tools (e.g: retis).
>>>>>>>>>>>
>>>>>>>>>>> With all this, would you consider adding the ufid to probe payload 
>>>>>>>>>>> directly?
>>>>>>>>>>
>>>>>>>>>> Dont see why we can not, but if we need anything else it would 
>>>>>>>>>> quickly
>>>>>>>>>> explode in too much arguments. I guess CORE needs a good solution for
>>>>>>>>>> userspace.
>>>>>>>>>
>>>>>>>>> I think having the ufid and udpif makes sense for now.  Actually, 
>>>>>>>>> maybe
>>>>>>>>> for the long term we can do something as an appctl command which will
>>>>>>>>> dump the relevant datastructures.  That way a tool like this, which
>>>>>>>>> needs the pid anyway, can generate an appctl command against the 
>>>>>>>>> daemon
>>>>>>>>> and get the exact right header layout.
>>>>>>>>>
>>>>>>>>> WDYT?
>>>>>>>>
>>>>>>>> Update on this - I have it ready to go with the exception of this
>>>>>>>> particular change.  The USDT probe code generation seems to have a bug,
>>>>>>>> or maybe it just won't work using these kinds of data types.  For
>>>>>>>> example, the bcc tools gets confused and ends up generating code for 
>>>>>>>> the
>>>>>>>> probe like:
>>>>>>>>
>>>>>>>>       __attribute__((always_inline))
>>>>>>>> static __always_inline int _bpf_readarg_usdt__flow_result_3(struct
>>>>>>> pt_regs *ctx, void *dest, size_t len) {
>>>>>>>>         if (len != sizeof(d)) return -1;
>>>>>>>> { u64 __addr = ctx->r13 + 40; __asm__ __volatile__("": : :"memory");
>>>>>>> uint128_t __res = 0x0; bpf_probe_read_user(&__res, sizeof(__res),
>>>>>>> (void *)__addr); *((d *)dest) = __res; }
>>>>>>>>         return 0;
>>>>>>>>       }
>>>>>>>>
>>>>>>>> This code isn't valid - the (d) variable isn't properly represented
>>>>>>>> here.  I guess maybe there's some issue with the way they use ctypes
>>>>>>>> internally?  It isn't clear to me.
>>>>>>>
>>>>>>> I miss some context, but is it because you point to ufid, which is a
>>>>>>> union (typedef union ovs_u128)? What if you point to the u32 member of
>>>>>>> the union?
>>>>>>
>>>>>> I'm not sure, but that doesn't seem like the best approach to me, tbh.
>>>>>>
>>>>>>>> We could work around this by making the ufid always pass as a void *,
>>>>>>>> and passing a length that tells the size.  However, if the underlying
>>>>>>>> data type ever changes, it will be totally incompatible.  I doubt it
>>>>>>>> would ever change, but at least with something stronger typed, we could
>>>>>>>> get a better exception that bubbled up.
>>>>>>>
>>>>>>> I do not like the pointer + length, as this adds another argument. I
>>>>>>> can’t remember which, but some tools had problems with 5 or more
>>>>>>> arguments.
>>>>>>
>>>>>> I do agree.  I also want to test the latest version of bcc-tools also,
>>>>>> to see if it really is just related to my f38 version of bcc-tools.
>>>>>> That also raises a different issue, but it can be something we solve in
>>>>>> the interrim with a version check.
>>>>>>
>>>>>>>> I'm okay with doing it this way, but just wanted to make sure that
>>>>>>>> retis, et. al. would be okay with interpreting it this way.
>>>>>>>>
>>>>>>>> Alternatively, there's definitely a bug there - we could omit the
>>>>>>>> argument entirely and rely on pahole or even pyelftools to help extract
>>>>>>>> the struct layout and get the ufid from the udpif_key struct.  The data
>>>>>>>> is already being sent, so it seems like duplication a bit to send it,
>>>>>>>> and becomes difficult in light of this bcc "bug?"
>>>>>>>
>>>>>>> I agree it’s overloading arguments with data already present, but I
>>>>>>> guess CORE can not yet read debug data from userspace
>>>>>>> applications. Adrian, I thought this was being worked on with the
>>>>>>> Pahole guys. If this is close we could just leave out this extra
>>>>>>> argument for now.
>>>>>>
>>>>>> That should be something doable, I think.  For example, there are crates
>>>>>> like gimli that parse DWARF data (but not sure if it is parsing v5,
>>>>>> yet).
>>>>>>
>>>>>>> Alternatively, we could ask the BCC guys if this is a known issue,
>>>>>>> with maybe a potential workaround. Just post it here
>>>>>>> https://github.com/iovisor/bcc/issues the respond quickly.
>>>>>>
>>>>>> They will tell to run with the latest version, I think.  The version I'm
>>>>>> using is what ships with Fedora 38, and seems quite out of date.
>>>>>
>>>>> ACK, will wait for your conclusion and the v9.
>>>> After tying to get it working with latest bcc-tools, the behavior is
>>>> actually much worse.  Whereas before the flow_reval_monitor script would
>>>> fail during the bpf compilation (with a clear error message), the latest
>>>> BCC tools will crash the python interpreter.
>>>> I'm going to omit the ufid details.  I think CO-RE will have support
>>>> anyway for this kinds of output, so v9 I will keep the args as-is and
>>>> when the support gets added to decode the debuginfo packages to pull
>>>> those structures, they will get the support.
>>>>
>>>
>>> Sorry, maybe I missed something. What support will CO-RE have?
>>
>> I guess, my understanding was the pahole work would be the thing CO-RE
>> needs, but after reading your comments, maybe that isn't true.
>>
>
> Doing CO-RE with some BTF generated by pahole from a debug file is just 
> possible in theory. It looks doable by using libbpf directly and even there, 
> CO-RE was meant to be used on the vmlinux (or module) BTF. I haven't tried it 
> yet.
> Common ebpf frameworks such as aya, cillim/bpf, etc don't seem to expose that
> functionality AFAICS.
>
> So, not putting the ufid makes things difficult for CO-RE tools for the 
> forseable future.
>
> Also, Eelco, could you expand on the 5 argument limitation? I see 
> revalidate_ukey__:entry has 6 arguments and seems to work.

The only thing I could remember is that I had problems with some tooling, but 
can’t recall which. So it might have been related to the tool, and not the 
actual tracepoint insertion part.

> OTOH, although the ufid is variable-length, could we assume that at most it's 
> 128 bits? That's what the uAPI states. If so, could we send a raw pointer and 
> assume that length? I don't know any reason why OVS will at some point reduce 
> the size but if it does, it's likely that a 128bit buffer will still be used 
> (istead of using dynamic memory).

I feel like duplicating arguments to make it easier for external tools, sounds 
like a bad design principle.

> Sorry for insisting, it's just that I would really like to consume this from 
> the retis side.

How far away are we from dwarf to BTF, which then can be used by CO-RE? Maybe 
we should try this first, and if this is a dead-end we can re-visit and add the 
extra argument. If we add the argument now, we can not remove it later.

//Eelco

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to