On 21 Feb 2024, at 16:03, Adrian Moreno wrote:
> On 2/21/24 15:49, Aaron Conole wrote: >> Adrian Moreno <amore...@redhat.com> writes: >> >>> On 2/20/24 19:06, Aaron Conole wrote: >>>> Eelco Chaudron <echau...@redhat.com> writes: >>>> >>>>> On 19 Feb 2024, at 19:57, Aaron Conole wrote: >>>>> >>>>>> Eelco Chaudron <echau...@redhat.com> writes: >>>>>> >>>>>>> On 12 Feb 2024, at 15:15, Aaron Conole wrote: >>>>>>> >>>>>>>> Aaron Conole <acon...@redhat.com> writes: >>>>>>>> >>>>>>>>> Eelco Chaudron <echau...@redhat.com> writes: >>>>>>>>> >>>>>>>>>> On 2 Feb 2024, at 11:31, Adrian Moreno wrote: >>>>>>>>>> >>>>>>>>>>> On 2/1/24 10:02, Eelco Chaudron wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 31 Jan 2024, at 18:03, Aaron Conole wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Eelco Chaudron <echau...@redhat.com> writes: >>>>>>>>>>>>> >>>>>>>>>>>>>> On 25 Jan 2024, at 21:55, Aaron Conole wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> From: Kevin Sprague <ksprague0...@gmail.com> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> During normal operations, it is useful to understand when a >>>>>>>>>>>>>>> particular flow >>>>>>>>>>>>>>> gets removed from the system. This can be useful when debugging >>>>>>>>>>>>>>> performance >>>>>>>>>>>>>>> issues tied to ofproto flow changes, trying to determine >>>>>>>>>>>>>>> deployed traffic >>>>>>>>>>>>>>> patterns, or while debugging dynamic systems where ports come >>>>>>>>>>>>>>> and go. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Prior to this change, there was a lack of visibility around >>>>>>>>>>>>>>> flow expiration. >>>>>>>>>>>>>>> The existing debugging infrastructure could tell us when a flow >>>>>>>>>>>>>>> was added to >>>>>>>>>>>>>>> the datapath, but not when it was removed or why. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> This change introduces a USDT probe at the point where the >>>>>>>>>>>>>>> revalidator >>>>>>>>>>>>>>> determines that the flow should be removed. Additionally, we >>>>>>>>>>>>>>> track the >>>>>>>>>>>>>>> reason for the flow eviction and provide that information as >>>>>>>>>>>>>>> well. With >>>>>>>>>>>>>>> this change, we can track the complete flow lifecycle for the >>>>>>>>>>>>>>> netlink >>>>>>>>>>>>>>> datapath by hooking the upcall tracepoint in kernel, the flow >>>>>>>>>>>>>>> put USDT, and >>>>>>>>>>>>>>> the revaldiator USDT, letting us watch as flows are added and >>>>>>>>>>>>>>> removed from >>>>>>>>>>>>>>> the kernel datapath. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> This change only enables this information via USDT probe, so it >>>>>>>>>>>>>>> won't be >>>>>>>>>>>>>>> possible to access this information any other way (see: >>>>>>>>>>>>>>> Documentation/topics/usdt-probes.rst). >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Also included is a script >>>>>>>>>>>>>>> (utilities/usdt-scripts/flow_reval_monitor.py) >>>>>>>>>>>>>>> which serves as a demonstration of how the new USDT probe might >>>>>>>>>>>>>>> be used >>>>>>>>>>>>>>> going forward. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Signed-off-by: Kevin Sprague <ksprague0...@gmail.com> >>>>>>>>>>>>>>> Co-authored-by: Aaron Conole <acon...@redhat.com> >>>>>>>>>>>>>>> Signed-off-by: Aaron Conole <acon...@redhat.com> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks for following this up Aaron! See comments on this patch >>>>>>>>>> below. I have no additional comments on patch 2. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Eelco >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> --- >>>>>>>>>>>>>>> Documentation/topics/usdt-probes.rst | 1 + >>>>>>>>>>>>>>> ofproto/ofproto-dpif-upcall.c | 42 +- >>>>>>>>>>>>>>> utilities/automake.mk | 3 + >>>>>>>>>>>>>>> utilities/usdt-scripts/flow_reval_monitor.py | 653 >>>>>>>>>>>>>>> +++++++++++++++++++ >>>>>>>>>>>>>>> 4 files changed, 693 insertions(+), 6 deletions(-) >>>>>>>>>>>>>>> create mode 100755 >>>>>>>>>>>>>>> utilities/usdt-scripts/flow_reval_monitor.py >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> diff --git a/Documentation/topics/usdt-probes.rst >>> b/Documentation/topics/usdt-probes.rst >>>>>>>>>>>>>>> index e527f43bab..a8da9bb1f7 100644 >>>>>>>>>>>>>>> --- a/Documentation/topics/usdt-probes.rst >>>>>>>>>>>>>>> +++ b/Documentation/topics/usdt-probes.rst >>>>>>>>>>>>>>> @@ -214,6 +214,7 @@ Available probes in ``ovs_vswitchd``: >>>>>>>>>>>>>>> - dpif_recv:recv_upcall >>>>>>>>>>>>>>> - main:poll_block >>>>>>>>>>>>>>> - main:run_start >>>>>>>>>>>>>>> +- revalidate:flow_result >>>>>>>>>>>>>>> - revalidate_ukey\_\_:entry >>>>>>>>>>>>>>> - revalidate_ukey\_\_:exit >>>>>>>>>>>>>>> - udpif_revalidator:start_dump >>>>>>>>>>>>>> >>>>>>>>>>>>>> You are missing the specific flow_result result >>> section. This is from the previous patch: >>>>>>>>>>>>> >>>>>>>>>>>>> D'oh! Thanks for catching it. I'll re-add it. >>>>>>>>>>>>> >>>>>>>>>>>>>> @@ -358,6 +360,27 @@ See also the ``main:run_start`` probe >>>>>>>>>>>>>> above. >>>>>>>>>>>>>> - ``utilities/usdt-scripts/bridge_loop.bt`` >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> +probe revalidate:flow_result >>>>>>>>>>>>>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>>>>>>>>>>>>> + >>>>>>>>>>>>>> +**Description**: >>>>>>>>>>>>>> +This probe is triggered when the revalidator decides whether or >>>>>>>>>>>>>> not to >>>>>>>>>>>>>> +revalidate a flow. ``reason`` is an enum that denotes that >>>>>>>>>>>>>> either the flow >>>>>>>>>>>>>> +is being kept, or the reason why the flow is being deleted. The >>>>>>>>>>>>>> +``flow_reval_monitor.py`` script uses this probe to notify >>>>>>>>>>>>>> users when flows >>>>>>>>>>>>>> +matching user-provided criteria are deleted. >>>>>>>>>>>>>> + >>>>>>>>>>>>>> +**Arguments**: >>>>>>>>>>>>>> + >>>>>>>>>>>>>> +- *arg0*: ``(enum flow_del_reason) reason`` >>>>>>>>>>>>>> +- *arg1*: ``(struct udpif *) udpif`` >>>>>>>>>>>>>> +- *arg2*: ``(struct udpif_key *) ukey`` >>>>>>>>>>>>>> + >>>>>>>>>>>>>> +**Script references**: >>>>>>>>>>>>>> + >>>>>>>>>>>>>> +- ``utilities/usdt-scripts/flow_reval_monitor.py`` >>>>>>>>>>>>>> + >>>>>>>>>>>>>> + >>>>>>>>>>>>>> Adding your own probes >>>>>>>>>>>>>> ---------------------- >>>>>>>>>>>>>> >>>>>>>>>>>>>>> diff --git a/ofproto/ofproto-dpif-upcall.c >>>>>>>>>>>>>>> b/ofproto/ofproto-dpif-upcall.c >>>>>>>>>>>>>>> index b5cbeed878..97d75833f7 100644 >>>>>>>>>>>>>>> --- a/ofproto/ofproto-dpif-upcall.c >>>>>>>>>>>>>>> +++ b/ofproto/ofproto-dpif-upcall.c >>>>>>>>>>>>>>> @@ -269,6 +269,18 @@ enum ukey_state { >>>>>>>>>>>>>>> }; >>>>>>>>>>>>>>> #define N_UKEY_STATES (UKEY_DELETED + 1) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> +enum flow_del_reason { >>>>>>>>>>>>>>> + FDR_REVALIDATE = 0, /* The flow was revalidated. */ >>>>>>>>>>>>>> >>>>>>>>>>>>>> It was called FDR_FLOW_LIVE before, which might make more >>>>>>>>>> sense. As the flow is just NOT deleted. It might or might not have >>>>>>>>>> been revalidated. Thoughts? >>>>>>>>>>>>> >>>>>>>>>>>>> I think it had to have been revalidated if we emit the reason, >>>>>>>>>>>>> because >>>>>>>>>>>>> we only emit the reason code after revalidation. IE: there are >>>>>>>>>>>>> many >>>>>>>>>>>>> places where we skip revalidation but the flow stays live - and >>>>>>>>>>>>> we don't >>>>>>>>>>>>> emit reasons in those cases. >>>>>>>>>>>>> >>>>>>>>>>>>> So at least for this patch, it MUST have been revalidated. But >>>>>>>>>>>>> maybe in >>>>>>>>>>>>> the future, we would want to catch cases where the flow hasn't >>>>>>>>>>>>> been. In >>>>>>>>>>>>> that case, it makes sense to add the FDR_FLOW_LIVE at that time - >>>>>>>>>>>>> I >>>>>>>>>>>>> think. >>>>>>>>>>>>> >>>>>>>>>>>>> Maybe you disagree? >>>>>>>>>>>> >>>>>>>>>>>> Well, it depends on how you define revalidation, it might only have >>>>>>>>>> updated the counters. i.e. it all depends on ‘bool need_revalidate = >>>>>>>>>> ukey->reval_seq != reval_seq;’ in revalidate_ukey(). That was why I >>>>>>>>>> opted for a more general name. >>>>>>>>>>>> >>>>>>>>>>>>>>> + FDR_FLOW_IDLE, /* The flow went unused and was >>>>>>>>>>>>>>> deleted. */ >>>>>>>>>>>>>>> + FDR_TOO_EXPENSIVE, /* The flow was too expensive to >>>>>>>>>>>>>>> revalidate. */ >>>>>>>>>>>>>>> + FDR_FLOW_WILDCARDED, /* The flow needed a narrower >>>>>>>>>>>>>>> wildcard mask. */ >>>>>>>>>>>>>>> + FDR_BAD_ODP_FIT, /* The flow had a bad ODP flow >>>>>>>>>>>>>>> fit. */ >>>>>>>>>>>>>>> + FDR_NO_OFPROTO, /* The flow didn't have an >>>>>>>>>>>>>>> associated ofproto. */ >>>>>>>>>>>>>>> + FDR_XLATION_ERROR, /* There was an error translating >>>>>>>>>>>>>>> the flow. */ >>>>>>>>>>>>>>> + FDR_AVOID_CACHING, /* Flow deleted to avoid caching. >>>>>>>>>>>>>>> */ >>>>>>>>>>>>>>> + FDR_FLOW_LIMIT, /* All flows being killed. */ >>>>>>>>>>>>>> >>>>>>>>>>>>>> Looking at the comment from Han on FDR_PURGE, and this patch >>>>>>>>>> needing another spin, we should probably add it. >>>>>>>>>>>>> >>>>>>>>>>>>> I can do that, sure. In that case, we will need to have a new >>>>>>>>>>>>> flow op >>>>>>>>>>>>> added to revalidator_sweep__ so that we can catch it. But in >>>>>>>>>>>>> that case, >>>>>>>>>>>>> it will be a different usdt probe, so I still don't know if we >>>>>>>>>>>>> need >>>>>>>>>>>>> FDR_PURGE right? WDYT? >>>>>>>>>>>> >>>>>>>>>>>> In revalidator_sweep__() you have sort of the following: >>>>>>>>>>>> >>>>>>>>>>>> if (purge || ukey_state == UKEY_INCONSISTENT) { >>>>>>>>>>>> result = UKEY_DELETE; >>>>>>>>>>>> } else if (!seq_mismatch) { >>>>>>>>>>>> >>>>>>>>>>>> And I’m afraid that if we use this tool to debug we miss the >>>>>>>>>> ukey_state == UKEY_INCONSISTENT when debugging and spent a long time >>>>>>>>>> figuring this out. >>>>>>>>>>>> Maybe add something general like this (did not give it a lot of >>>>>>>>>> thought), and only take the FDR_PURGE : FDR_UPDATE_FAIL results in >>>>>>>>>> the >>>>>>>>>> script? >>>>>>>>>>>> >>>>>>>>>>>> /* 'udpif_key's are responsible for tracking the little bit of >>>>>>>>>>>> state udpif >>>>>>>>>>>> @@ -2991,13 +2993,13 @@ revalidator_sweep__(struct >>> revalidator *revalidator, bool purge) >>>>>>>>>>>> uint64_t odp_actions_stub[1024 / 8]; >>>>>>>>>>>> struct ofpbuf odp_actions = >>>>>>>>>>>> OFPBUF_STUB_INITIALIZER(odp_actions_stub); >>>>>>>>>>>> >>>>>>>>>>>> - enum flow_del_reason reason = FDR_REVALIDATE; >>>>>>>>>>>> struct ukey_op ops[REVALIDATE_MAX_BATCH]; >>>>>>>>>>>> struct udpif_key *ukey; >>>>>>>>>>>> struct umap *umap = &udpif->ukeys[i]; >>>>>>>>>>>> size_t n_ops = 0; >>>>>>>>>>>> >>>>>>>>>>>> CMAP_FOR_EACH(ukey, cmap_node, &umap->cmap) { >>>>>>>>>>>> + enum flow_del_reason reason = FDR_REVALIDATE; >>>>>>>>>>>> enum ukey_state ukey_state; >>>>>>>>>>>> >>>>>>>>>>>> /* Handler threads could be holding a ukey lock >>>>>>>>>>>> while it installs a >>>>>>>>>>>> @@ -3016,8 +3018,10 @@ revalidator_sweep__(struct >>> revalidator *revalidator, bool purge) >>>>>>>>>>>> >>>>>>>>>>>> if (purge || ukey_state == UKEY_INCONSISTENT) { >>>>>>>>>>>> result = UKEY_DELETE; >>>>>>>>>>>> + reason = purge ? FDR_PURGE : FDR_UPDATE_FAIL; >>>>>>>>>>>> } else if (!seq_mismatch) { >>>>>>>>>>>> result = UKEY_KEEP; >>>>>>>>>>>> + reason = FDR_REVALIDATE; //_KEEP >>>>>>>>>>>> } else { >>>>>>>>>>>> struct dpif_flow_stats stats; >>>>>>>>>>>> COVERAGE_INC(revalidate_missed_dp_flow); >>>>>>>>>>>> @@ -3030,6 +3034,8 @@ revalidator_sweep__(struct revalidator >>>>>>>>>>>> *revalidator, bool purge) >>>>>>>>>>>> reval_op_init(&ops[n_ops++], result, >>>>>>>>>>>> udpif, ukey, &recircs, >>>>>>>>>>>> &odp_actions); >>>>>>>>>>>> } >>>>>>>>>>>> + OVS_USDT_PROBE(revalidator_sweep__, flow_result, >>>>>>>>>>>> result, >>>>>>>>>>>> + reason, udpif, ukey); >>>>>>>>>>>> } >>>>>>>>>>>> ovs_mutex_unlock(&ukey->mutex); >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> In addition in revalidator_sweep__() should the “enum >>>>>>>>>> flow_del_reason reason = FDR_REVALIDATE;” not be moved to the >>>>>>>>>> CMAP_FOR_EACH() loop? >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>> +}; >>>>>>>>>>>>>>> + >>>>>>>>>>>>>>> /* 'udpif_key's are responsible for tracking the little bit >>>>>>>>>>>>>>> of state udpif >>>>>>>>>>>>>>> * needs to do flow expiration which can't be pulled >>>>>>>>>>>>>>> directly from the >>>>>>>>>>>>>>> * datapath. They may be created by any handler or >>>>>>>>>>>>>>> revalidator thread at any >>>>>>>>>>>>>>> @@ -2272,7 +2284,8 @@ populate_xcache(struct udpif *udpif, >>>>>>>>>>>>>>> struct udpif_key *ukey, >>>>>>>>>>>>>>> static enum reval_result >>>>>>>>>>>>>>> revalidate_ukey__(struct udpif *udpif, const struct >>>>>>>>>>>>>>> udpif_key *ukey, >>>>>>>>>>>>>>> uint16_t tcp_flags, struct ofpbuf >>>>>>>>>>>>>>> *odp_actions, >>>>>>>>>>>>>>> - struct recirc_refs *recircs, struct >>>>>>>>>>>>>>> xlate_cache *xcache) >>>>>>>>>>>>>>> + struct recirc_refs *recircs, struct >>>>>>>>>>>>>>> xlate_cache *xcache, >>>>>>>>>>>>>>> + enum flow_del_reason *reason) >>>>>>>>>>>>>>> { >>>>>>>>>>>>>>> struct xlate_out *xoutp; >>>>>>>>>>>>>>> struct netflow *netflow; >>>>>>>>>>>>>>> @@ -2293,11 +2306,13 @@ revalidate_ukey__(struct udpif *udpif, >>>>>>>>>> const struct udpif_key *ukey, >>>>>>>>>>>>>>> netflow = NULL; >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> if (xlate_ukey(udpif, ukey, tcp_flags, &ctx)) { >>>>>>>>>>>>>>> + *reason = FDR_XLATION_ERROR; >>>>>>>>>>>>>>> goto exit; >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> xoutp = &ctx.xout; >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> if (xoutp->avoid_caching) { >>>>>>>>>>>>>>> + *reason = FDR_AVOID_CACHING; >>>>>>>>>>>>>>> goto exit; >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> @@ -2311,6 +2326,7 @@ revalidate_ukey__(struct udpif *udpif, >>>>>>>>> const struct udpif_key *ukey, >>>>>>>>>>>>>>> ofpbuf_clear(odp_actions); >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> if (!ofproto) { >>>>>>>>>>>>>>> + *reason = FDR_NO_OFPROTO; >>>>>>>>>>>>>>> goto exit; >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> @@ -2322,6 +2338,7 @@ revalidate_ukey__(struct udpif *udpif, >>>>>>>>> const struct udpif_key *ukey, >>>>>>>>>>>>>>> if (odp_flow_key_to_mask(ukey->mask, ukey->mask_len, >>>>>>>>>>>>>>> &dp_mask, &ctx.flow, >>>>>>>>>>>>>>> NULL) >>>>>>>>>>>>>>> == ODP_FIT_ERROR) { >>>>>>>>>>>>>>> + *reason = FDR_BAD_ODP_FIT; >>>>>>>>>>>>>>> goto exit; >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> @@ -2331,6 +2348,7 @@ revalidate_ukey__(struct udpif *udpif, >>>>>>>>> const struct udpif_key *ukey, >>>>>>>>>>>>>>> * down. Note that we do not know if the datapath has >>>>>>>>>>>>>>> ignored any of the >>>>>>>>>>>>>>> * wildcarded bits, so we may be overly conservative >>>>>>>>>>>>>>> here. */ >>>>>>>>>>>>>>> if (flow_wildcards_has_extra(&dp_mask, ctx.wc)) { >>>>>>>>>>>>>>> + *reason = FDR_FLOW_WILDCARDED; >>>>>>>>>>>>>>> goto exit; >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> @@ -2400,7 +2418,7 @@ static enum reval_result >>>>>>>>>>>>>>> revalidate_ukey(struct udpif *udpif, struct udpif_key *ukey, >>>>>>>>>>>>>>> const struct dpif_flow_stats *stats, >>>>>>>>>>>>>>> struct ofpbuf *odp_actions, uint64_t >>>>>>>>>>>>>>> reval_seq, >>>>>>>>>>>>>>> - struct recirc_refs *recircs) >>>>>>>>>>>>>>> + struct recirc_refs *recircs, enum >>>>>>>>>>>>>>> flow_del_reason *reason) >>>>>>>>>>>>>>> OVS_REQUIRES(ukey->mutex) >>>>>>>>>>>>>>> { >>>>>>>>>>>>>>> bool need_revalidate = ukey->reval_seq != reval_seq; >>>>>>>>>>>>>>> @@ -2430,8 +2448,12 @@ revalidate_ukey(struct udpif *udpif, >>>>>>>>>>>>>>> struct udpif_key *ukey, >>>>>>>>>>>>>>> xlate_cache_clear(ukey->xcache); >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> result = revalidate_ukey__(udpif, ukey, >>>>>>>>>>>>>>> push.tcp_flags, >>>>>>>>>>>>>>> - odp_actions, recircs, >>>>>>>>>>>>>>> ukey->xcache); >>>>>>>>>>>>>>> - } /* else delete; too expensive to revalidate */ >>>>>>>>>>>>>>> + odp_actions, recircs, >>>>>>>>>>>>>>> ukey->xcache, >>>>>>>>>>>>>>> + reason); >>>>>>>>>>>>>>> + } else { >>>>>>>>>>>>>>> + /* delete; too expensive to revalidate */ >>>>>>>>>>>>>>> + *reason = FDR_TOO_EXPENSIVE; >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> } else if (!push.n_packets || ukey->xcache >>>>>>>>>>>>>>> || !populate_xcache(udpif, ukey, >>>>>>>>>>>>>>> push.tcp_flags)) { >>>>>>>>>>>>>>> result = UKEY_KEEP; >>>>>>>>>>>>>>> @@ -2831,6 +2853,7 @@ revalidate(struct revalidator >>>>>>>>>>>>>>> *revalidator) >>>>>>>>>>>>>>> for (f = flows; f < &flows[n_dumped]; f++) { >>>>>>>>>>>>>>> long long int used = f->stats.used; >>>>>>>>>>>>>>> struct recirc_refs recircs = >>>>>>>>>>>>>>> RECIRC_REFS_EMPTY_INITIALIZER; >>>>>>>>>>>>>>> + enum flow_del_reason reason = FDR_REVALIDATE; >>>>>>>>>>>>>>> struct dpif_flow_stats stats = f->stats; >>>>>>>>>>>>>>> enum reval_result result; >>>>>>>>>>>>>>> struct udpif_key *ukey; >>>>>>>>>>>>>>> @@ -2905,9 +2928,14 @@ revalidate(struct revalidator >>>>>>>>>>>>>>> *revalidator) >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> if (kill_them_all || (used && used < now - >>>>>>>>>>>>>>> max_idle)) { >>>>>>>>>>>>>>> result = UKEY_DELETE; >>>>>>>>>>>>>>> + if (kill_them_all) { >>>>>>>>>>>>>>> + reason = FDR_FLOW_LIMIT; >>>>>>>>>>>>>>> + } else { >>>>>>>>>>>>>>> + reason = FDR_FLOW_IDLE; >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>>> } else { >>>>>>>>>>>>>>> result = revalidate_ukey(udpif, ukey, >>>>>>>>>>>>>>> &stats, &odp_actions, >>>>>>>>>>>>>>> - reval_seq, &recircs); >>>>>>>>>>>>>>> + reval_seq, &recircs, >>>>>>>>>>>>>>> &reason); >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> ukey->dump_seq = dump_seq; >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> @@ -2916,6 +2944,7 @@ revalidate(struct revalidator >>>>>>>>>>>>>>> *revalidator) >>>>>>>>>>>>>>> udpif_update_flow_pps(udpif, ukey, f); >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> + OVS_USDT_PROBE(revalidate, flow_result, reason, >>>>>>>>>>>>>>> udpif, ukey); >>>>>>>>>>> >>>>>>>>>>> I have been experimenting with several upcall tracking techniques >>>>>>>>>> that would make it easier to correlate upcalls with their subsequent >>>>>>>>>> related events. >>>>>>>>>>> To achieve that, we need (among other things) some easy-to-compare >>>>>>>>>> unique value in the events. For revalidation events, I think a good >>>>>>>>>> candidate would be "ukey->ufid" and so does the script in this patch. >>>>>>>>>>> >>>>>>>>>>> However, requiring all external tools to know the layout of "struct >>>>>>>>>> udpif_key" in order to get that value makes things quite complicated >>>>>>>>>> for CORE tools (e.g: retis). >>>>>>>>>>> >>>>>>>>>>> With all this, would you consider adding the ufid to probe payload >>>>>>>>>>> directly? >>>>>>>>>> >>>>>>>>>> Dont see why we can not, but if we need anything else it would >>>>>>>>>> quickly >>>>>>>>>> explode in too much arguments. I guess CORE needs a good solution for >>>>>>>>>> userspace. >>>>>>>>> >>>>>>>>> I think having the ufid and udpif makes sense for now. Actually, >>>>>>>>> maybe >>>>>>>>> for the long term we can do something as an appctl command which will >>>>>>>>> dump the relevant datastructures. That way a tool like this, which >>>>>>>>> needs the pid anyway, can generate an appctl command against the >>>>>>>>> daemon >>>>>>>>> and get the exact right header layout. >>>>>>>>> >>>>>>>>> WDYT? >>>>>>>> >>>>>>>> Update on this - I have it ready to go with the exception of this >>>>>>>> particular change. The USDT probe code generation seems to have a bug, >>>>>>>> or maybe it just won't work using these kinds of data types. For >>>>>>>> example, the bcc tools gets confused and ends up generating code for >>>>>>>> the >>>>>>>> probe like: >>>>>>>> >>>>>>>> __attribute__((always_inline)) >>>>>>>> static __always_inline int _bpf_readarg_usdt__flow_result_3(struct >>>>>>> pt_regs *ctx, void *dest, size_t len) { >>>>>>>> if (len != sizeof(d)) return -1; >>>>>>>> { u64 __addr = ctx->r13 + 40; __asm__ __volatile__("": : :"memory"); >>>>>>> uint128_t __res = 0x0; bpf_probe_read_user(&__res, sizeof(__res), >>>>>>> (void *)__addr); *((d *)dest) = __res; } >>>>>>>> return 0; >>>>>>>> } >>>>>>>> >>>>>>>> This code isn't valid - the (d) variable isn't properly represented >>>>>>>> here. I guess maybe there's some issue with the way they use ctypes >>>>>>>> internally? It isn't clear to me. >>>>>>> >>>>>>> I miss some context, but is it because you point to ufid, which is a >>>>>>> union (typedef union ovs_u128)? What if you point to the u32 member of >>>>>>> the union? >>>>>> >>>>>> I'm not sure, but that doesn't seem like the best approach to me, tbh. >>>>>> >>>>>>>> We could work around this by making the ufid always pass as a void *, >>>>>>>> and passing a length that tells the size. However, if the underlying >>>>>>>> data type ever changes, it will be totally incompatible. I doubt it >>>>>>>> would ever change, but at least with something stronger typed, we could >>>>>>>> get a better exception that bubbled up. >>>>>>> >>>>>>> I do not like the pointer + length, as this adds another argument. I >>>>>>> can’t remember which, but some tools had problems with 5 or more >>>>>>> arguments. >>>>>> >>>>>> I do agree. I also want to test the latest version of bcc-tools also, >>>>>> to see if it really is just related to my f38 version of bcc-tools. >>>>>> That also raises a different issue, but it can be something we solve in >>>>>> the interrim with a version check. >>>>>> >>>>>>>> I'm okay with doing it this way, but just wanted to make sure that >>>>>>>> retis, et. al. would be okay with interpreting it this way. >>>>>>>> >>>>>>>> Alternatively, there's definitely a bug there - we could omit the >>>>>>>> argument entirely and rely on pahole or even pyelftools to help extract >>>>>>>> the struct layout and get the ufid from the udpif_key struct. The data >>>>>>>> is already being sent, so it seems like duplication a bit to send it, >>>>>>>> and becomes difficult in light of this bcc "bug?" >>>>>>> >>>>>>> I agree it’s overloading arguments with data already present, but I >>>>>>> guess CORE can not yet read debug data from userspace >>>>>>> applications. Adrian, I thought this was being worked on with the >>>>>>> Pahole guys. If this is close we could just leave out this extra >>>>>>> argument for now. >>>>>> >>>>>> That should be something doable, I think. For example, there are crates >>>>>> like gimli that parse DWARF data (but not sure if it is parsing v5, >>>>>> yet). >>>>>> >>>>>>> Alternatively, we could ask the BCC guys if this is a known issue, >>>>>>> with maybe a potential workaround. Just post it here >>>>>>> https://github.com/iovisor/bcc/issues the respond quickly. >>>>>> >>>>>> They will tell to run with the latest version, I think. The version I'm >>>>>> using is what ships with Fedora 38, and seems quite out of date. >>>>> >>>>> ACK, will wait for your conclusion and the v9. >>>> After tying to get it working with latest bcc-tools, the behavior is >>>> actually much worse. Whereas before the flow_reval_monitor script would >>>> fail during the bpf compilation (with a clear error message), the latest >>>> BCC tools will crash the python interpreter. >>>> I'm going to omit the ufid details. I think CO-RE will have support >>>> anyway for this kinds of output, so v9 I will keep the args as-is and >>>> when the support gets added to decode the debuginfo packages to pull >>>> those structures, they will get the support. >>>> >>> >>> Sorry, maybe I missed something. What support will CO-RE have? >> >> I guess, my understanding was the pahole work would be the thing CO-RE >> needs, but after reading your comments, maybe that isn't true. >> > > Doing CO-RE with some BTF generated by pahole from a debug file is just > possible in theory. It looks doable by using libbpf directly and even there, > CO-RE was meant to be used on the vmlinux (or module) BTF. I haven't tried it > yet. > Common ebpf frameworks such as aya, cillim/bpf, etc don't seem to expose that > functionality AFAICS. > > So, not putting the ufid makes things difficult for CO-RE tools for the > forseable future. > > Also, Eelco, could you expand on the 5 argument limitation? I see > revalidate_ukey__:entry has 6 arguments and seems to work. The only thing I could remember is that I had problems with some tooling, but can’t recall which. So it might have been related to the tool, and not the actual tracepoint insertion part. > OTOH, although the ufid is variable-length, could we assume that at most it's > 128 bits? That's what the uAPI states. If so, could we send a raw pointer and > assume that length? I don't know any reason why OVS will at some point reduce > the size but if it does, it's likely that a 128bit buffer will still be used > (istead of using dynamic memory). I feel like duplicating arguments to make it easier for external tools, sounds like a bad design principle. > Sorry for insisting, it's just that I would really like to consume this from > the retis side. How far away are we from dwarf to BTF, which then can be used by CO-RE? Maybe we should try this first, and if this is a dead-end we can re-visit and add the extra argument. If we add the argument now, we can not remove it later. //Eelco _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev