On 23 Dec 2022, at 18:02, kevin sprague wrote:
> Hi Eelco, > > I've been busy finishing up school, I should have some time to sit down > with it in the next week or so. Hi Kevin, Are you still planning on looking at this? I know it’s more than a year ago you posted this, but I still think this should go into OVS. So if you are not planning this, I’ll find someone who can work on this. Thanks, Eelco > Thanks, > Kevin Sprague > > On Tue, Dec 20, 2022, 9:34 AM Eelco Chaudron <[email protected]> wrote: > >> >> >> On 20 Dec 2022, at 15:21, Eelco Chaudron wrote: >> >>> On 16 Nov 2022, at 16:41, Eelco Chaudron wrote: >>> >>>> On 21 Oct 2022, at 18:35, Kevin Sprague wrote: >>>> >>>>> During normal operations, it is useful to understand when a particular >> flow >>>>> gets removed from the system. This can be useful when debugging >> performance >>>>> issues tied to ofproto flow changes, trying to determine deployed >> traffic >>>>> patterns, or while debugging dynamic systems where ports come and go. >>>>> >>>>> Prior to this change, there was a lack of visibility around flow >> expiration. >>>>> The existing debugging infrastructure could tell us when a flow was >> added to >>>>> the datapath, but not when it was removed or why. >>>>> >>>>> This change introduces a USDT probe at the point where the revalidator >>>>> determines that the flow should be removed. Additionally, we track the >>>>> reason for the flow eviction and provide that information as well. >> With >>>>> this change, we can track the complete flow lifecycle for the netlink >> datapath >>>>> by hooking the upcall tracepoint in kernel, the flow put USDT, and the >>>>> revaldiator USDT, letting us watch as flows are added and removed from >> the >>>>> kernel datapath. >>>>> >>>>> This change only enables this information via USDT probe, so it won't >> be >>>>> possible to access this information any other way (see: >>>>> Documentation/topics/usdt-probes.rst). >>>>> >>>>> Also included is a script >> (utilities/usdt-scripts/flow_reval_monitor.py) that >>>>> serves as a demonstration of how the new USDT probe might be used going >>>>> forward. >>>>> >>>>> Change since v5: fixed author information. >>>>> >>>>> Signed-off-by: Kevin Sprague <[email protected]> >>>> >>>> >>>> Hi Kevin, >>>> >>>> Most of the changes look fine to me, however, there are still a lot of >> crashes in the filter code. >>>> >>>> Also for now including the OVS data structures in the script will work >> for now. If we do not get a solution before this gets merged, I’ll fix up >> all the scripts that need this later. >>>> >>>> About the crash, it has to do with when we do not receive any uuid/key >> (I think I did not research). >>>> >>>> But if I start the script, and do the following (RHEL8): >>>> >>>> ovs-vsctl del-br br-int >>>> ovs-vsctl add-br br-int >>>> >>>> Now I get this: >>>> >>>> TIME UFID >> EVENT/REASON >>>> 5361884.255647616 ufid:00000000-0000-0000-0000-000000000000 Insert >> (put) flow to kernel. >>>> 5361884.255689699 ufid:00000000-0000-0000-0000-000000000000 Insert >> (put) flow to kernel. >>>> 5361884.255712148 ufid:00000000-0000-0000-0000-000000000000 Insert >> (put) flow to kernel. >>>> 5361884.255734158 ufid:00000000-0000-0000-0000-000000000000 Insert >> (put) flow to kernel. >>>> 5361884.255753341 ufid:00000000-0000-0000-0000-000000000000 Insert >> (put) flow to kernel. >>>> 5361884.255772079 ufid:00000000-0000-0000-0000-000000000000 Insert >> (put) flow to kernel. >>>> 5361884.255805591 ufid:2876428c-567e-429c-9dc3-0000d83503f1 Insert >> (put) flow to kernel. >>>> 5361884.255832007 ufid:00000000-0000-0000-0000-000000000000 Insert >> (put) flow to kernel. >>>> 5361884.255852449 ufid:00000000-0000-0000-0000-000000000000 Insert >> (put) flow to kernel. >>>> 5361884.255871090 ufid:00000000-0000-0000-0000-000000000000 Insert >> (put) flow to kernel. >>>> 5361884.255889960 ufid:00000000-0000-0000-0000-000000000000 Insert >> (put) flow to kernel. >>>> 5361884.255909455 ufid:00000000-0000-0000-0000-000000000000 Insert >> (put) flow to kernel. >>>> 5361884.255928863 ufid:00000000-0000-0000-0000-000000000000 Insert >> (put) flow to kernel. >>>> 5361884.255948291 ufid:00000000-0000-0000-0000-000000000000 Insert >> (put) flow to kernel. >>>> >>>> So a lot of all 0 ufid’s, did not investigate if this is true, or a >> script error. >>>> >>>> >>>> Now if I use the script with the -k option: >>>> >>>> $ ./flow_reval_monitor.py -k >>>> TIME UFID >> EVENT/REASON >>>> Traceback (most recent call last): >>>> File "_ctypes/callbacks.c", line 234, in 'calling callback function' >>>> File "/usr/lib/python3.6/site-packages/bcc/table.py", line 1068, in >> ringbuf_cb_ >>>> ret = callback(ctx, data, size) >>>> File "./flow_reval_monitor.py", line 502, in handle_event >>>> handle_flow_put(event) >>>> File "./flow_reval_monitor.py", line 227, in handle_flow_put >>>> key = decode_key(bytes(event.key)[:event.key_size]) >>>> File "./flow_reval_monitor.py", line 328, in decode_key >>>> result[get_ovs_key_attr_str(nla_type)] = nla_data >>>> File "./flow_reval_monitor.py", line 373, in get_ovs_key_attr_str >>>> return ovs_key_attr[attr] >>>> IndexError: list index out of range >>>> >>>> Same thing if I try to use a filter option: >>>> >>>> [wsfd-netdev64:~/...ilities/usdt-scripts]$ ./flow_reval_monitor.py -f >> ipv6 >>>> TIME UFID >> EVENT/REASON >>>> Traceback (most recent call last): >>>> File "_ctypes/callbacks.c", line 234, in 'calling callback function' >>>> File "/usr/lib/python3.6/site-packages/bcc/table.py", line 1068, in >> ringbuf_cb_ >>>> ret = callback(ctx, data, size) >>>> File "./flow_reval_monitor.py", line 502, in handle_event >>>> handle_flow_put(event) >>>> File "./flow_reval_monitor.py", line 227, in handle_flow_put >>>> key = decode_key(bytes(event.key)[:event.key_size]) >>>> File "./flow_reval_monitor.py", line 328, in decode_key >>>> result[get_ovs_key_attr_str(nla_type)] = nla_data >>>> File "./flow_reval_monitor.py", line 373, in get_ovs_key_attr_str >>>> return ovs_key_attr[attr] >>>> IndexError: list index out of range >>>> >>>> >>>> Also, an invalid filter would also crash. I think Adrian requested to >> report errors for the unknown/invalid filter option, i.e. do a sanity check >> of the supplied filter. >>>> >>>> $ ./flow_reval_monitor.py -f eelco >>>> TIME UFID >> EVENT/REASON >>>> Traceback (most recent call last): >>>> File "_ctypes/callbacks.c", line 234, in 'calling callback function' >>>> File "/usr/lib/python3.6/site-packages/bcc/table.py", line 1068, in >> ringbuf_cb_ >>>> ret = callback(ctx, data, size) >>>> File "./flow_reval_monitor.py", line 502, in handle_event >>>> handle_flow_put(event) >>>> File "./flow_reval_monitor.py", line 232, in handle_flow_put >>>> find_and_delete_from_watchlist(event) >>>> File "./flow_reval_monitor.py", line 218, in >> find_and_delete_from_watchlist >>>> b["watchlist"].items_delete_batch(key) >>>> >>>> I guess you need to re-look at it, as even your example is crashing. >>>> >>>> $ ./flow_reval_monitor.py --filter-flows "ipv4(src=192.168.10.10)" >>>> TIME UFID >> EVENT/REASON >>>> Traceback (most recent call last): >>>> File "_ctypes/callbacks.c", line 234, in 'calling callback function' >>>> File "/usr/lib/python3.6/site-packages/bcc/table.py", line 1068, in >> ringbuf_cb_ >>>> ret = callback(ctx, data, size) >>>> File "./flow_reval_monitor.py", line 502, in handle_event >>>> handle_flow_put(event) >>>> File "./flow_reval_monitor.py", line 232, in handle_flow_put >>>> find_and_delete_from_watchlist(event) >>>> File "./flow_reval_monitor.py", line 218, in >> find_and_delete_from_watchlist >>>> b["watchlist"].items_delete_batch(key) >>>> AttributeError: 'HashTable' object has no attribute 'items_delete_batch' >>>> >>>> >>>> Maybe you should also add some details on what filter options are >> supported and the format as it gets me puzzled… >>>> >>>> Cheers, >>>> >>>> Eelco >>> >>> Hi Kevin, >>> >>> Did not see any update on this patch for a while, just want to let you >> know I sent out a v3 of the reval_stats patch, which include the pahole >> tool to create the OVS structures. >>> >>> In addition, I also noticed your events do not include a thread id, >> which I think is useful, so you might want to add this to (maybe the thread >> name which his more user friendly). >>> >>> Also when hw offload is used the flow add is not captured, maybe you can >> add another trace point that will trigger for TC (or one that will trigger >> for both)? >> >> Forgot to mention that you probably also want to add some checking to make >> sure you do not lose events. >> >> See __sync_fetch_and_add() below: >> >> >> +BPF_TABLE("percpu_array", uint32_t, uint64_t, dropcnt, 1); >> + >> +static struct event_t *get_event(uint32_t id) { >> + struct event_t *event = events.ringbuf_reserve(sizeof(struct >> event_t)); >> + >> + if (!event) { >> + uint32_t type = 0; >> + uint64_t *value = dropcnt.lookup(&type); >> + if (value) >> + __sync_fetch_and_add(value, 1); >> + >> + return NULL; >> + } >> + >> + event->id = id; >> + event->ts = bpf_ktime_get_ns(); >> + event->pid = bpf_get_current_pid_tgid(); >> + >> + return event; >> +} >> >> _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
