On 16 Nov 2022, at 16:41, Eelco Chaudron wrote:
> On 21 Oct 2022, at 18:35, Kevin Sprague wrote: > >> During normal operations, it is useful to understand when a particular flow >> gets removed from the system. This can be useful when debugging performance >> issues tied to ofproto flow changes, trying to determine deployed traffic >> patterns, or while debugging dynamic systems where ports come and go. >> >> Prior to this change, there was a lack of visibility around flow expiration. >> The existing debugging infrastructure could tell us when a flow was added to >> the datapath, but not when it was removed or why. >> >> This change introduces a USDT probe at the point where the revalidator >> determines that the flow should be removed. Additionally, we track the >> reason for the flow eviction and provide that information as well. With >> this change, we can track the complete flow lifecycle for the netlink >> datapath >> by hooking the upcall tracepoint in kernel, the flow put USDT, and the >> revaldiator USDT, letting us watch as flows are added and removed from the >> kernel datapath. >> >> This change only enables this information via USDT probe, so it won't be >> possible to access this information any other way (see: >> Documentation/topics/usdt-probes.rst). >> >> Also included is a script (utilities/usdt-scripts/flow_reval_monitor.py) that >> serves as a demonstration of how the new USDT probe might be used going >> forward. >> >> Change since v5: fixed author information. >> >> Signed-off-by: Kevin Sprague <[email protected]> > > > Hi Kevin, > > Most of the changes look fine to me, however, there are still a lot of > crashes in the filter code. > > Also for now including the OVS data structures in the script will work for > now. If we do not get a solution before this gets merged, I’ll fix up all the > scripts that need this later. > > About the crash, it has to do with when we do not receive any uuid/key (I > think I did not research). > > But if I start the script, and do the following (RHEL8): > > ovs-vsctl del-br br-int > ovs-vsctl add-br br-int > > Now I get this: > > TIME UFID EVENT/REASON > 5361884.255647616 ufid:00000000-0000-0000-0000-000000000000 Insert (put) > flow to kernel. > 5361884.255689699 ufid:00000000-0000-0000-0000-000000000000 Insert (put) > flow to kernel. > 5361884.255712148 ufid:00000000-0000-0000-0000-000000000000 Insert (put) > flow to kernel. > 5361884.255734158 ufid:00000000-0000-0000-0000-000000000000 Insert (put) > flow to kernel. > 5361884.255753341 ufid:00000000-0000-0000-0000-000000000000 Insert (put) > flow to kernel. > 5361884.255772079 ufid:00000000-0000-0000-0000-000000000000 Insert (put) > flow to kernel. > 5361884.255805591 ufid:2876428c-567e-429c-9dc3-0000d83503f1 Insert (put) > flow to kernel. > 5361884.255832007 ufid:00000000-0000-0000-0000-000000000000 Insert (put) > flow to kernel. > 5361884.255852449 ufid:00000000-0000-0000-0000-000000000000 Insert (put) > flow to kernel. > 5361884.255871090 ufid:00000000-0000-0000-0000-000000000000 Insert (put) > flow to kernel. > 5361884.255889960 ufid:00000000-0000-0000-0000-000000000000 Insert (put) > flow to kernel. > 5361884.255909455 ufid:00000000-0000-0000-0000-000000000000 Insert (put) > flow to kernel. > 5361884.255928863 ufid:00000000-0000-0000-0000-000000000000 Insert (put) > flow to kernel. > 5361884.255948291 ufid:00000000-0000-0000-0000-000000000000 Insert (put) > flow to kernel. > > So a lot of all 0 ufid’s, did not investigate if this is true, or a script > error. > > > Now if I use the script with the -k option: > > $ ./flow_reval_monitor.py -k > TIME UFID EVENT/REASON > Traceback (most recent call last): > File "_ctypes/callbacks.c", line 234, in 'calling callback function' > File "/usr/lib/python3.6/site-packages/bcc/table.py", line 1068, in > ringbuf_cb_ > ret = callback(ctx, data, size) > File "./flow_reval_monitor.py", line 502, in handle_event > handle_flow_put(event) > File "./flow_reval_monitor.py", line 227, in handle_flow_put > key = decode_key(bytes(event.key)[:event.key_size]) > File "./flow_reval_monitor.py", line 328, in decode_key > result[get_ovs_key_attr_str(nla_type)] = nla_data > File "./flow_reval_monitor.py", line 373, in get_ovs_key_attr_str > return ovs_key_attr[attr] > IndexError: list index out of range > > Same thing if I try to use a filter option: > > [wsfd-netdev64:~/...ilities/usdt-scripts]$ ./flow_reval_monitor.py -f ipv6 > TIME UFID EVENT/REASON > Traceback (most recent call last): > File "_ctypes/callbacks.c", line 234, in 'calling callback function' > File "/usr/lib/python3.6/site-packages/bcc/table.py", line 1068, in > ringbuf_cb_ > ret = callback(ctx, data, size) > File "./flow_reval_monitor.py", line 502, in handle_event > handle_flow_put(event) > File "./flow_reval_monitor.py", line 227, in handle_flow_put > key = decode_key(bytes(event.key)[:event.key_size]) > File "./flow_reval_monitor.py", line 328, in decode_key > result[get_ovs_key_attr_str(nla_type)] = nla_data > File "./flow_reval_monitor.py", line 373, in get_ovs_key_attr_str > return ovs_key_attr[attr] > IndexError: list index out of range > > > Also, an invalid filter would also crash. I think Adrian requested to report > errors for the unknown/invalid filter option, i.e. do a sanity check of the > supplied filter. > > $ ./flow_reval_monitor.py -f eelco > TIME UFID EVENT/REASON > Traceback (most recent call last): > File "_ctypes/callbacks.c", line 234, in 'calling callback function' > File "/usr/lib/python3.6/site-packages/bcc/table.py", line 1068, in > ringbuf_cb_ > ret = callback(ctx, data, size) > File "./flow_reval_monitor.py", line 502, in handle_event > handle_flow_put(event) > File "./flow_reval_monitor.py", line 232, in handle_flow_put > find_and_delete_from_watchlist(event) > File "./flow_reval_monitor.py", line 218, in find_and_delete_from_watchlist > b["watchlist"].items_delete_batch(key) > > I guess you need to re-look at it, as even your example is crashing. > > $ ./flow_reval_monitor.py --filter-flows "ipv4(src=192.168.10.10)" > TIME UFID EVENT/REASON > Traceback (most recent call last): > File "_ctypes/callbacks.c", line 234, in 'calling callback function' > File "/usr/lib/python3.6/site-packages/bcc/table.py", line 1068, in > ringbuf_cb_ > ret = callback(ctx, data, size) > File "./flow_reval_monitor.py", line 502, in handle_event > handle_flow_put(event) > File "./flow_reval_monitor.py", line 232, in handle_flow_put > find_and_delete_from_watchlist(event) > File "./flow_reval_monitor.py", line 218, in find_and_delete_from_watchlist > b["watchlist"].items_delete_batch(key) > AttributeError: 'HashTable' object has no attribute 'items_delete_batch' > > > Maybe you should also add some details on what filter options are supported > and the format as it gets me puzzled… > > Cheers, > > Eelco Hi Kevin, Did not see any update on this patch for a while, just want to let you know I sent out a v3 of the reval_stats patch, which include the pahole tool to create the OVS structures. In addition, I also noticed your events do not include a thread id, which I think is useful, so you might want to add this to (maybe the thread name which his more user friendly). Also when hw offload is used the flow add is not captured, maybe you can add another trace point that will trigger for TC (or one that will trigger for both)? Cheers, Eelco _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
