Eelco Chaudron <[email protected]> writes:

> On 12 Feb 2024, at 15:15, Aaron Conole wrote:
>
>> Aaron Conole <[email protected]> writes:
>>
>>> Eelco Chaudron <[email protected]> writes:
>>>
>>>> On 2 Feb 2024, at 11:31, Adrian Moreno wrote:
>>>>
>>>>> On 2/1/24 10:02, Eelco Chaudron wrote:
>>>>>>
>>>>>>
>>>>>> On 31 Jan 2024, at 18:03, Aaron Conole wrote:
>>>>>>
>>>>>>> Eelco Chaudron <[email protected]> writes:
>>>>>>>
>>>>>>>> On 25 Jan 2024, at 21:55, Aaron Conole wrote:
>>>>>>>>
>>>>>>>>> From: Kevin Sprague <[email protected]>
>>>>>>>>>
>>>>>>>>> During normal operations, it is useful to understand when a 
>>>>>>>>> particular flow
>>>>>>>>> gets removed from the system. This can be useful when debugging 
>>>>>>>>> performance
>>>>>>>>> issues tied to ofproto flow changes, trying to determine deployed 
>>>>>>>>> traffic
>>>>>>>>> patterns, or while debugging dynamic systems where ports come and go.
>>>>>>>>>
>>>>>>>>> Prior to this change, there was a lack of visibility around flow 
>>>>>>>>> expiration.
>>>>>>>>> The existing debugging infrastructure could tell us when a flow was 
>>>>>>>>> added to
>>>>>>>>> the datapath, but not when it was removed or why.
>>>>>>>>>
>>>>>>>>> This change introduces a USDT probe at the point where the revalidator
>>>>>>>>> determines that the flow should be removed.  Additionally, we track 
>>>>>>>>> the
>>>>>>>>> reason for the flow eviction and provide that information as well.  
>>>>>>>>> With
>>>>>>>>> this change, we can track the complete flow lifecycle for the netlink
>>>>>>>>> datapath by hooking the upcall tracepoint in kernel, the flow put 
>>>>>>>>> USDT, and
>>>>>>>>> the revaldiator USDT, letting us watch as flows are added and removed 
>>>>>>>>> from
>>>>>>>>> the kernel datapath.
>>>>>>>>>
>>>>>>>>> This change only enables this information via USDT probe, so it won't 
>>>>>>>>> be
>>>>>>>>> possible to access this information any other way (see:
>>>>>>>>> Documentation/topics/usdt-probes.rst).
>>>>>>>>>
>>>>>>>>> Also included is a script 
>>>>>>>>> (utilities/usdt-scripts/flow_reval_monitor.py)
>>>>>>>>> which serves as a demonstration of how the new USDT probe might be 
>>>>>>>>> used
>>>>>>>>> going forward.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Kevin Sprague <[email protected]>
>>>>>>>>> Co-authored-by: Aaron Conole <[email protected]>
>>>>>>>>> Signed-off-by: Aaron Conole <[email protected]>
>>>>>>>>
>>>>>>>> Thanks for following this up Aaron! See comments on this patch
>>>> below. I have no additional comments on patch 2.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Eelco
>>>>>>>>
>>>>>>>>
>>>>>>>>> ---
>>>>>>>>>   Documentation/topics/usdt-probes.rst         |   1 +
>>>>>>>>>   ofproto/ofproto-dpif-upcall.c                |  42 +-
>>>>>>>>>   utilities/automake.mk                        |   3 +
>>>>>>>>>   utilities/usdt-scripts/flow_reval_monitor.py | 653 
>>>>>>>>> +++++++++++++++++++
>>>>>>>>>   4 files changed, 693 insertions(+), 6 deletions(-)
>>>>>>>>>   create mode 100755 utilities/usdt-scripts/flow_reval_monitor.py
>>>>>>>>>
>>>>>>>>> diff --git a/Documentation/topics/usdt-probes.rst 
>>>>>>>>> b/Documentation/topics/usdt-probes.rst
>>>>>>>>> index e527f43bab..a8da9bb1f7 100644
>>>>>>>>> --- a/Documentation/topics/usdt-probes.rst
>>>>>>>>> +++ b/Documentation/topics/usdt-probes.rst
>>>>>>>>> @@ -214,6 +214,7 @@ Available probes in ``ovs_vswitchd``:
>>>>>>>>>   - dpif_recv:recv_upcall
>>>>>>>>>   - main:poll_block
>>>>>>>>>   - main:run_start
>>>>>>>>> +- revalidate:flow_result
>>>>>>>>>   - revalidate_ukey\_\_:entry
>>>>>>>>>   - revalidate_ukey\_\_:exit
>>>>>>>>>   - udpif_revalidator:start_dump
>>>>>>>>
>>>>>>>> You are missing the specific flow_result result section. This is from 
>>>>>>>> the previous patch:
>>>>>>>
>>>>>>> D'oh!  Thanks for catching it.  I'll re-add it.
>>>>>>>
>>>>>>>> @@ -358,6 +360,27 @@  See also the ``main:run_start`` probe above.
>>>>>>>>   - ``utilities/usdt-scripts/bridge_loop.bt``
>>>>>>>>
>>>>>>>>
>>>>>>>> +probe revalidate:flow_result
>>>>>>>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>>> +
>>>>>>>> +**Description**:
>>>>>>>> +This probe is triggered when the revalidator decides whether or not to
>>>>>>>> +revalidate a flow. ``reason`` is an enum that denotes that either the 
>>>>>>>> flow
>>>>>>>> +is being kept, or the reason why the flow is being deleted. The
>>>>>>>> +``flow_reval_monitor.py`` script uses this probe to notify users when 
>>>>>>>> flows
>>>>>>>> +matching user-provided criteria are deleted.
>>>>>>>> +
>>>>>>>> +**Arguments**:
>>>>>>>> +
>>>>>>>> +- *arg0*: ``(enum flow_del_reason) reason``
>>>>>>>> +- *arg1*: ``(struct udpif *) udpif``
>>>>>>>> +- *arg2*: ``(struct udpif_key *) ukey``
>>>>>>>> +
>>>>>>>> +**Script references**:
>>>>>>>> +
>>>>>>>> +- ``utilities/usdt-scripts/flow_reval_monitor.py``
>>>>>>>> +
>>>>>>>> +
>>>>>>>>   Adding your own probes
>>>>>>>>   ----------------------
>>>>>>>>
>>>>>>>>> diff --git a/ofproto/ofproto-dpif-upcall.c 
>>>>>>>>> b/ofproto/ofproto-dpif-upcall.c
>>>>>>>>> index b5cbeed878..97d75833f7 100644
>>>>>>>>> --- a/ofproto/ofproto-dpif-upcall.c
>>>>>>>>> +++ b/ofproto/ofproto-dpif-upcall.c
>>>>>>>>> @@ -269,6 +269,18 @@ enum ukey_state {
>>>>>>>>>   };
>>>>>>>>>   #define N_UKEY_STATES (UKEY_DELETED + 1)
>>>>>>>>>
>>>>>>>>> +enum flow_del_reason {
>>>>>>>>> +    FDR_REVALIDATE = 0,     /* The flow was revalidated. */
>>>>>>>>
>>>>>>>> It was called FDR_FLOW_LIVE before, which might make more
>>>> sense. As the flow is just NOT deleted. It might or might not have
>>>> been revalidated. Thoughts?
>>>>>>>
>>>>>>> I think it had to have been revalidated if we emit the reason, because
>>>>>>> we only emit the reason code after revalidation.  IE: there are many
>>>>>>> places where we skip revalidation but the flow stays live - and we don't
>>>>>>> emit reasons in those cases.
>>>>>>>
>>>>>>> So at least for this patch, it MUST have been revalidated.  But maybe in
>>>>>>> the future, we would want to catch cases where the flow hasn't been.  In
>>>>>>> that case, it makes sense to add the FDR_FLOW_LIVE at that time - I
>>>>>>> think.
>>>>>>>
>>>>>>> Maybe you disagree?
>>>>>>
>>>>>> Well, it depends on how you define revalidation, it might only have
>>>> updated the counters. i.e. it all depends on ‘bool need_revalidate =
>>>> ukey->reval_seq != reval_seq;’ in revalidate_ukey(). That was why I
>>>> opted for a more general name.
>>>>>>
>>>>>>>>> +    FDR_FLOW_IDLE,          /* The flow went unused and was deleted. 
>>>>>>>>> */
>>>>>>>>> +    FDR_TOO_EXPENSIVE,      /* The flow was too expensive to 
>>>>>>>>> revalidate. */
>>>>>>>>> +    FDR_FLOW_WILDCARDED,    /* The flow needed a narrower wildcard 
>>>>>>>>> mask. */
>>>>>>>>> +    FDR_BAD_ODP_FIT,        /* The flow had a bad ODP flow fit. */
>>>>>>>>> +    FDR_NO_OFPROTO,         /* The flow didn't have an associated 
>>>>>>>>> ofproto. */
>>>>>>>>> +    FDR_XLATION_ERROR,      /* There was an error translating the 
>>>>>>>>> flow. */
>>>>>>>>> +    FDR_AVOID_CACHING,      /* Flow deleted to avoid caching. */
>>>>>>>>> +    FDR_FLOW_LIMIT,         /* All flows being killed. */
>>>>>>>>
>>>>>>>> Looking at the comment from Han on FDR_PURGE, and this patch
>>>> needing another spin, we should probably add it.
>>>>>>>
>>>>>>> I can do that, sure.  In that case, we will need to have a new flow op
>>>>>>> added to revalidator_sweep__ so that we can catch it.  But in that case,
>>>>>>> it will be a different usdt probe, so I still don't know if we need
>>>>>>> FDR_PURGE right?  WDYT?
>>>>>>
>>>>>> In revalidator_sweep__() you have sort of the following:
>>>>>>
>>>>>>                  if (purge || ukey_state == UKEY_INCONSISTENT) {
>>>>>>                      result = UKEY_DELETE;
>>>>>>                  } else if (!seq_mismatch) {
>>>>>>
>>>>>> And I’m afraid that if we use this tool to debug we miss the
>>>> ukey_state == UKEY_INCONSISTENT when debugging and spent a long time
>>>> figuring this out.
>>>>>> Maybe add something general like this (did not give it a lot of
>>>> thought), and only take the FDR_PURGE : FDR_UPDATE_FAIL results in the
>>>> script?
>>>>>>
>>>>>>   /* 'udpif_key's are responsible for tracking the little bit of state 
>>>>>> udpif
>>>>>> @@ -2991,13 +2993,13 @@ revalidator_sweep__(struct revalidator 
>>>>>> *revalidator, bool purge)
>>>>>>           uint64_t odp_actions_stub[1024 / 8];
>>>>>>           struct ofpbuf odp_actions = 
>>>>>> OFPBUF_STUB_INITIALIZER(odp_actions_stub);
>>>>>>
>>>>>> -        enum flow_del_reason reason = FDR_REVALIDATE;
>>>>>>           struct ukey_op ops[REVALIDATE_MAX_BATCH];
>>>>>>           struct udpif_key *ukey;
>>>>>>           struct umap *umap = &udpif->ukeys[i];
>>>>>>           size_t n_ops = 0;
>>>>>>
>>>>>>           CMAP_FOR_EACH(ukey, cmap_node, &umap->cmap) {
>>>>>> +            enum flow_del_reason reason = FDR_REVALIDATE;
>>>>>>               enum ukey_state ukey_state;
>>>>>>
>>>>>>               /* Handler threads could be holding a ukey lock while it 
>>>>>> installs a
>>>>>> @@ -3016,8 +3018,10 @@ revalidator_sweep__(struct revalidator 
>>>>>> *revalidator, bool purge)
>>>>>>
>>>>>>                   if (purge || ukey_state == UKEY_INCONSISTENT) {
>>>>>>                       result = UKEY_DELETE;
>>>>>> +                    reason = purge ? FDR_PURGE : FDR_UPDATE_FAIL;
>>>>>>                   } else if (!seq_mismatch) {
>>>>>>                       result = UKEY_KEEP;
>>>>>> +                    reason = FDR_REVALIDATE; //_KEEP
>>>>>>                   } else {
>>>>>>                       struct dpif_flow_stats stats;
>>>>>>                       COVERAGE_INC(revalidate_missed_dp_flow);
>>>>>> @@ -3030,6 +3034,8 @@ revalidator_sweep__(struct revalidator 
>>>>>> *revalidator, bool purge)
>>>>>>                       reval_op_init(&ops[n_ops++], result, udpif, ukey, 
>>>>>> &recircs,
>>>>>>                                     &odp_actions);
>>>>>>                   }
>>>>>> +                OVS_USDT_PROBE(revalidator_sweep__, flow_result, result,
>>>>>> +                               reason, udpif, ukey);
>>>>>>               }
>>>>>>               ovs_mutex_unlock(&ukey->mutex);
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> In addition in revalidator_sweep__() should the “enum
>>>> flow_del_reason reason = FDR_REVALIDATE;” not be moved to the
>>>> CMAP_FOR_EACH() loop?
>>>>>>
>>>>>>
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>>   /* 'udpif_key's are responsible for tracking the little bit of 
>>>>>>>>> state udpif
>>>>>>>>>    * needs to do flow expiration which can't be pulled directly from 
>>>>>>>>> the
>>>>>>>>>    * datapath.  They may be created by any handler or revalidator 
>>>>>>>>> thread at any
>>>>>>>>> @@ -2272,7 +2284,8 @@ populate_xcache(struct udpif *udpif, struct 
>>>>>>>>> udpif_key *ukey,
>>>>>>>>>   static enum reval_result
>>>>>>>>>   revalidate_ukey__(struct udpif *udpif, const struct udpif_key *ukey,
>>>>>>>>>                     uint16_t tcp_flags, struct ofpbuf *odp_actions,
>>>>>>>>> -                  struct recirc_refs *recircs, struct xlate_cache 
>>>>>>>>> *xcache)
>>>>>>>>> +                  struct recirc_refs *recircs, struct xlate_cache 
>>>>>>>>> *xcache,
>>>>>>>>> +                  enum flow_del_reason *reason)
>>>>>>>>>   {
>>>>>>>>>       struct xlate_out *xoutp;
>>>>>>>>>       struct netflow *netflow;
>>>>>>>>> @@ -2293,11 +2306,13 @@ revalidate_ukey__(struct udpif *udpif,
>>>> const struct udpif_key *ukey,
>>>>>>>>>       netflow = NULL;
>>>>>>>>>
>>>>>>>>>       if (xlate_ukey(udpif, ukey, tcp_flags, &ctx)) {
>>>>>>>>> +        *reason = FDR_XLATION_ERROR;
>>>>>>>>>           goto exit;
>>>>>>>>>       }
>>>>>>>>>       xoutp = &ctx.xout;
>>>>>>>>>
>>>>>>>>>       if (xoutp->avoid_caching) {
>>>>>>>>> +        *reason = FDR_AVOID_CACHING;
>>>>>>>>>           goto exit;
>>>>>>>>>       }
>>>>>>>>>
>>>>>>>>> @@ -2311,6 +2326,7 @@ revalidate_ukey__(struct udpif *udpif,
>>> const struct udpif_key *ukey,
>>>>>>>>>           ofpbuf_clear(odp_actions);
>>>>>>>>>
>>>>>>>>>           if (!ofproto) {
>>>>>>>>> +            *reason = FDR_NO_OFPROTO;
>>>>>>>>>               goto exit;
>>>>>>>>>           }
>>>>>>>>>
>>>>>>>>> @@ -2322,6 +2338,7 @@ revalidate_ukey__(struct udpif *udpif,
>>> const struct udpif_key *ukey,
>>>>>>>>>       if (odp_flow_key_to_mask(ukey->mask, ukey->mask_len, &dp_mask, 
>>>>>>>>> &ctx.flow,
>>>>>>>>>                                NULL)
>>>>>>>>>           == ODP_FIT_ERROR) {
>>>>>>>>> +        *reason = FDR_BAD_ODP_FIT;
>>>>>>>>>           goto exit;
>>>>>>>>>       }
>>>>>>>>>
>>>>>>>>> @@ -2331,6 +2348,7 @@ revalidate_ukey__(struct udpif *udpif,
>>> const struct udpif_key *ukey,
>>>>>>>>>        * down.  Note that we do not know if the datapath has ignored 
>>>>>>>>> any of the
>>>>>>>>>        * wildcarded bits, so we may be overly conservative here. */
>>>>>>>>>       if (flow_wildcards_has_extra(&dp_mask, ctx.wc)) {
>>>>>>>>> +        *reason = FDR_FLOW_WILDCARDED;
>>>>>>>>>           goto exit;
>>>>>>>>>       }
>>>>>>>>>
>>>>>>>>> @@ -2400,7 +2418,7 @@ static enum reval_result
>>>>>>>>>   revalidate_ukey(struct udpif *udpif, struct udpif_key *ukey,
>>>>>>>>>                   const struct dpif_flow_stats *stats,
>>>>>>>>>                   struct ofpbuf *odp_actions, uint64_t reval_seq,
>>>>>>>>> -                struct recirc_refs *recircs)
>>>>>>>>> +                struct recirc_refs *recircs, enum flow_del_reason 
>>>>>>>>> *reason)
>>>>>>>>>       OVS_REQUIRES(ukey->mutex)
>>>>>>>>>   {
>>>>>>>>>       bool need_revalidate = ukey->reval_seq != reval_seq;
>>>>>>>>> @@ -2430,8 +2448,12 @@ revalidate_ukey(struct udpif *udpif, struct 
>>>>>>>>> udpif_key *ukey,
>>>>>>>>>                   xlate_cache_clear(ukey->xcache);
>>>>>>>>>               }
>>>>>>>>>               result = revalidate_ukey__(udpif, ukey, push.tcp_flags,
>>>>>>>>> -                                       odp_actions, recircs, 
>>>>>>>>> ukey->xcache);
>>>>>>>>> -        } /* else delete; too expensive to revalidate */
>>>>>>>>> +                                       odp_actions, recircs, 
>>>>>>>>> ukey->xcache,
>>>>>>>>> +                                       reason);
>>>>>>>>> +        } else {
>>>>>>>>> +            /* delete; too expensive to revalidate */
>>>>>>>>> +            *reason = FDR_TOO_EXPENSIVE;
>>>>>>>>> +        }
>>>>>>>>>       } else if (!push.n_packets || ukey->xcache
>>>>>>>>>                  || !populate_xcache(udpif, ukey, push.tcp_flags)) {
>>>>>>>>>           result = UKEY_KEEP;
>>>>>>>>> @@ -2831,6 +2853,7 @@ revalidate(struct revalidator *revalidator)
>>>>>>>>>           for (f = flows; f < &flows[n_dumped]; f++) {
>>>>>>>>>               long long int used = f->stats.used;
>>>>>>>>>               struct recirc_refs recircs = 
>>>>>>>>> RECIRC_REFS_EMPTY_INITIALIZER;
>>>>>>>>> +            enum flow_del_reason reason = FDR_REVALIDATE;
>>>>>>>>>               struct dpif_flow_stats stats = f->stats;
>>>>>>>>>               enum reval_result result;
>>>>>>>>>               struct udpif_key *ukey;
>>>>>>>>> @@ -2905,9 +2928,14 @@ revalidate(struct revalidator *revalidator)
>>>>>>>>>               }
>>>>>>>>>               if (kill_them_all || (used && used < now - max_idle)) {
>>>>>>>>>                   result = UKEY_DELETE;
>>>>>>>>> +                if (kill_them_all) {
>>>>>>>>> +                    reason = FDR_FLOW_LIMIT;
>>>>>>>>> +                } else {
>>>>>>>>> +                    reason = FDR_FLOW_IDLE;
>>>>>>>>> +                }
>>>>>>>>>               } else {
>>>>>>>>>                   result = revalidate_ukey(udpif, ukey, &stats, 
>>>>>>>>> &odp_actions,
>>>>>>>>> -                                         reval_seq, &recircs);
>>>>>>>>> +                                         reval_seq, &recircs, 
>>>>>>>>> &reason);
>>>>>>>>>               }
>>>>>>>>>               ukey->dump_seq = dump_seq;
>>>>>>>>>
>>>>>>>>> @@ -2916,6 +2944,7 @@ revalidate(struct revalidator *revalidator)
>>>>>>>>>                   udpif_update_flow_pps(udpif, ukey, f);
>>>>>>>>>               }
>>>>>>>>>
>>>>>>>>> +            OVS_USDT_PROBE(revalidate, flow_result, reason, udpif, 
>>>>>>>>> ukey);
>>>>>
>>>>> I have been experimenting with several upcall tracking techniques
>>>> that would make it easier to correlate upcalls with their subsequent
>>>> related events.
>>>>> To achieve that, we need (among other things) some easy-to-compare
>>>> unique value in the events. For revalidation events, I think a good
>>>> candidate would be "ukey->ufid" and so does the script in this patch.
>>>>>
>>>>> However, requiring all external tools to know the layout of "struct
>>>> udpif_key" in order to get that value makes things quite complicated
>>>> for CORE tools (e.g: retis).
>>>>>
>>>>> With all this, would you consider adding the ufid to probe payload 
>>>>> directly?
>>>>
>>>> Dont see why we can not, but if we need anything else it would quickly
>>>> explode in too much arguments. I guess CORE needs a good solution for
>>>> userspace.
>>>
>>> I think having the ufid and udpif makes sense for now.  Actually, maybe
>>> for the long term we can do something as an appctl command which will
>>> dump the relevant datastructures.  That way a tool like this, which
>>> needs the pid anyway, can generate an appctl command against the daemon
>>> and get the exact right header layout.
>>>
>>> WDYT?
>>
>> Update on this - I have it ready to go with the exception of this
>> particular change.  The USDT probe code generation seems to have a bug,
>> or maybe it just won't work using these kinds of data types.  For
>> example, the bcc tools gets confused and ends up generating code for the
>> probe like:
>>
>>     __attribute__((always_inline))
>> static __always_inline int _bpf_readarg_usdt__flow_result_3(struct
> pt_regs *ctx, void *dest, size_t len) {
>>       if (len != sizeof(d)) return -1;
>> { u64 __addr = ctx->r13 + 40; __asm__ __volatile__("": : :"memory");
> uint128_t __res = 0x0; bpf_probe_read_user(&__res, sizeof(__res),
> (void *)__addr); *((d *)dest) = __res; }
>>       return 0;
>>     }
>>
>> This code isn't valid - the (d) variable isn't properly represented
>> here.  I guess maybe there's some issue with the way they use ctypes
>> internally?  It isn't clear to me.
>
> I miss some context, but is it because you point to ufid, which is a
> union (typedef union ovs_u128)? What if you point to the u32 member of
> the union?

I'm not sure, but that doesn't seem like the best approach to me, tbh.

>> We could work around this by making the ufid always pass as a void *,
>> and passing a length that tells the size.  However, if the underlying
>> data type ever changes, it will be totally incompatible.  I doubt it
>> would ever change, but at least with something stronger typed, we could
>> get a better exception that bubbled up.
>
> I do not like the pointer + length, as this adds another argument. I
> can’t remember which, but some tools had problems with 5 or more
> arguments.

I do agree.  I also want to test the latest version of bcc-tools also,
to see if it really is just related to my f38 version of bcc-tools.
That also raises a different issue, but it can be something we solve in
the interrim with a version check.

>> I'm okay with doing it this way, but just wanted to make sure that
>> retis, et. al. would be okay with interpreting it this way.
>>
>> Alternatively, there's definitely a bug there - we could omit the
>> argument entirely and rely on pahole or even pyelftools to help extract
>> the struct layout and get the ufid from the udpif_key struct.  The data
>> is already being sent, so it seems like duplication a bit to send it,
>> and becomes difficult in light of this bcc "bug?"
>
> I agree it’s overloading arguments with data already present, but I
> guess CORE can not yet read debug data from userspace
> applications. Adrian, I thought this was being worked on with the
> Pahole guys. If this is close we could just leave out this extra
> argument for now.

That should be something doable, I think.  For example, there are crates
like gimli that parse DWARF data (but not sure if it is parsing v5,
yet).

> Alternatively, we could ask the BCC guys if this is a known issue,
> with maybe a potential workaround. Just post it here
> https://github.com/iovisor/bcc/issues the respond quickly.

They will tell to run with the latest version, I think.  The version I'm
using is what ships with Fedora 38, and seems quite out of date.

> <SNIP>

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to