On 20 Dec 2022, at 15:21, Eelco Chaudron wrote:
> On 16 Nov 2022, at 16:41, Eelco Chaudron wrote: > >> On 21 Oct 2022, at 18:35, Kevin Sprague wrote: >> >>> During normal operations, it is useful to understand when a particular flow >>> gets removed from the system. This can be useful when debugging performance >>> issues tied to ofproto flow changes, trying to determine deployed traffic >>> patterns, or while debugging dynamic systems where ports come and go. >>> >>> Prior to this change, there was a lack of visibility around flow expiration. >>> The existing debugging infrastructure could tell us when a flow was added to >>> the datapath, but not when it was removed or why. >>> >>> This change introduces a USDT probe at the point where the revalidator >>> determines that the flow should be removed. Additionally, we track the >>> reason for the flow eviction and provide that information as well. With >>> this change, we can track the complete flow lifecycle for the netlink >>> datapath >>> by hooking the upcall tracepoint in kernel, the flow put USDT, and the >>> revaldiator USDT, letting us watch as flows are added and removed from the >>> kernel datapath. >>> >>> This change only enables this information via USDT probe, so it won't be >>> possible to access this information any other way (see: >>> Documentation/topics/usdt-probes.rst). >>> >>> Also included is a script (utilities/usdt-scripts/flow_reval_monitor.py) >>> that >>> serves as a demonstration of how the new USDT probe might be used going >>> forward. >>> >>> Change since v5: fixed author information. >>> >>> Signed-off-by: Kevin Sprague <[email protected]> >> >> >> Hi Kevin, >> >> Most of the changes look fine to me, however, there are still a lot of >> crashes in the filter code. >> >> Also for now including the OVS data structures in the script will work for >> now. If we do not get a solution before this gets merged, I’ll fix up all >> the scripts that need this later. >> >> About the crash, it has to do with when we do not receive any uuid/key (I >> think I did not research). >> >> But if I start the script, and do the following (RHEL8): >> >> ovs-vsctl del-br br-int >> ovs-vsctl add-br br-int >> >> Now I get this: >> >> TIME UFID EVENT/REASON >> 5361884.255647616 ufid:00000000-0000-0000-0000-000000000000 Insert >> (put) flow to kernel. >> 5361884.255689699 ufid:00000000-0000-0000-0000-000000000000 Insert >> (put) flow to kernel. >> 5361884.255712148 ufid:00000000-0000-0000-0000-000000000000 Insert >> (put) flow to kernel. >> 5361884.255734158 ufid:00000000-0000-0000-0000-000000000000 Insert >> (put) flow to kernel. >> 5361884.255753341 ufid:00000000-0000-0000-0000-000000000000 Insert >> (put) flow to kernel. >> 5361884.255772079 ufid:00000000-0000-0000-0000-000000000000 Insert >> (put) flow to kernel. >> 5361884.255805591 ufid:2876428c-567e-429c-9dc3-0000d83503f1 Insert >> (put) flow to kernel. >> 5361884.255832007 ufid:00000000-0000-0000-0000-000000000000 Insert >> (put) flow to kernel. >> 5361884.255852449 ufid:00000000-0000-0000-0000-000000000000 Insert >> (put) flow to kernel. >> 5361884.255871090 ufid:00000000-0000-0000-0000-000000000000 Insert >> (put) flow to kernel. >> 5361884.255889960 ufid:00000000-0000-0000-0000-000000000000 Insert >> (put) flow to kernel. >> 5361884.255909455 ufid:00000000-0000-0000-0000-000000000000 Insert >> (put) flow to kernel. >> 5361884.255928863 ufid:00000000-0000-0000-0000-000000000000 Insert >> (put) flow to kernel. >> 5361884.255948291 ufid:00000000-0000-0000-0000-000000000000 Insert >> (put) flow to kernel. >> >> So a lot of all 0 ufid’s, did not investigate if this is true, or a script >> error. >> >> >> Now if I use the script with the -k option: >> >> $ ./flow_reval_monitor.py -k >> TIME UFID EVENT/REASON >> Traceback (most recent call last): >> File "_ctypes/callbacks.c", line 234, in 'calling callback function' >> File "/usr/lib/python3.6/site-packages/bcc/table.py", line 1068, in >> ringbuf_cb_ >> ret = callback(ctx, data, size) >> File "./flow_reval_monitor.py", line 502, in handle_event >> handle_flow_put(event) >> File "./flow_reval_monitor.py", line 227, in handle_flow_put >> key = decode_key(bytes(event.key)[:event.key_size]) >> File "./flow_reval_monitor.py", line 328, in decode_key >> result[get_ovs_key_attr_str(nla_type)] = nla_data >> File "./flow_reval_monitor.py", line 373, in get_ovs_key_attr_str >> return ovs_key_attr[attr] >> IndexError: list index out of range >> >> Same thing if I try to use a filter option: >> >> [wsfd-netdev64:~/...ilities/usdt-scripts]$ ./flow_reval_monitor.py -f ipv6 >> TIME UFID EVENT/REASON >> Traceback (most recent call last): >> File "_ctypes/callbacks.c", line 234, in 'calling callback function' >> File "/usr/lib/python3.6/site-packages/bcc/table.py", line 1068, in >> ringbuf_cb_ >> ret = callback(ctx, data, size) >> File "./flow_reval_monitor.py", line 502, in handle_event >> handle_flow_put(event) >> File "./flow_reval_monitor.py", line 227, in handle_flow_put >> key = decode_key(bytes(event.key)[:event.key_size]) >> File "./flow_reval_monitor.py", line 328, in decode_key >> result[get_ovs_key_attr_str(nla_type)] = nla_data >> File "./flow_reval_monitor.py", line 373, in get_ovs_key_attr_str >> return ovs_key_attr[attr] >> IndexError: list index out of range >> >> >> Also, an invalid filter would also crash. I think Adrian requested to report >> errors for the unknown/invalid filter option, i.e. do a sanity check of the >> supplied filter. >> >> $ ./flow_reval_monitor.py -f eelco >> TIME UFID EVENT/REASON >> Traceback (most recent call last): >> File "_ctypes/callbacks.c", line 234, in 'calling callback function' >> File "/usr/lib/python3.6/site-packages/bcc/table.py", line 1068, in >> ringbuf_cb_ >> ret = callback(ctx, data, size) >> File "./flow_reval_monitor.py", line 502, in handle_event >> handle_flow_put(event) >> File "./flow_reval_monitor.py", line 232, in handle_flow_put >> find_and_delete_from_watchlist(event) >> File "./flow_reval_monitor.py", line 218, in find_and_delete_from_watchlist >> b["watchlist"].items_delete_batch(key) >> >> I guess you need to re-look at it, as even your example is crashing. >> >> $ ./flow_reval_monitor.py --filter-flows "ipv4(src=192.168.10.10)" >> TIME UFID EVENT/REASON >> Traceback (most recent call last): >> File "_ctypes/callbacks.c", line 234, in 'calling callback function' >> File "/usr/lib/python3.6/site-packages/bcc/table.py", line 1068, in >> ringbuf_cb_ >> ret = callback(ctx, data, size) >> File "./flow_reval_monitor.py", line 502, in handle_event >> handle_flow_put(event) >> File "./flow_reval_monitor.py", line 232, in handle_flow_put >> find_and_delete_from_watchlist(event) >> File "./flow_reval_monitor.py", line 218, in find_and_delete_from_watchlist >> b["watchlist"].items_delete_batch(key) >> AttributeError: 'HashTable' object has no attribute 'items_delete_batch' >> >> >> Maybe you should also add some details on what filter options are supported >> and the format as it gets me puzzled… >> >> Cheers, >> >> Eelco > > Hi Kevin, > > Did not see any update on this patch for a while, just want to let you know I > sent out a v3 of the reval_stats patch, which include the pahole tool to > create the OVS structures. > > In addition, I also noticed your events do not include a thread id, which I > think is useful, so you might want to add this to (maybe the thread name > which his more user friendly). > > Also when hw offload is used the flow add is not captured, maybe you can add > another trace point that will trigger for TC (or one that will trigger for > both)? Forgot to mention that you probably also want to add some checking to make sure you do not lose events. See __sync_fetch_and_add() below: +BPF_TABLE("percpu_array", uint32_t, uint64_t, dropcnt, 1); + +static struct event_t *get_event(uint32_t id) { + struct event_t *event = events.ringbuf_reserve(sizeof(struct event_t)); + + if (!event) { + uint32_t type = 0; + uint64_t *value = dropcnt.lookup(&type); + if (value) + __sync_fetch_and_add(value, 1); + + return NULL; + } + + event->id = id; + event->ts = bpf_ktime_get_ns(); + event->pid = bpf_get_current_pid_tgid(); + + return event; +} _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
