On 16 Nov 2022, at 16:41, Eelco Chaudron wrote:

> On 21 Oct 2022, at 18:35, Kevin Sprague wrote:
>
>> During normal operations, it is useful to understand when a particular flow
>> gets removed from the system. This can be useful when debugging performance
>> issues tied to ofproto flow changes, trying to determine deployed traffic
>> patterns, or while debugging dynamic systems where ports come and go.
>>
>> Prior to this change, there was a lack of visibility around flow expiration.
>> The existing debugging infrastructure could tell us when a flow was added to
>> the datapath, but not when it was removed or why.
>>
>> This change introduces a USDT probe at the point where the revalidator
>> determines that the flow should be removed.  Additionally, we track the
>> reason for the flow eviction and provide that information as well.  With
>> this change, we can track the complete flow lifecycle for the netlink 
>> datapath
>> by hooking the upcall tracepoint in kernel, the flow put USDT, and the
>> revaldiator USDT, letting us watch as flows are added and removed from the
>> kernel datapath.
>>
>> This change only enables this information via USDT probe, so it won't be
>> possible to access this information any other way (see:
>> Documentation/topics/usdt-probes.rst).
>>
>> Also included is a script (utilities/usdt-scripts/flow_reval_monitor.py) that
>> serves as a demonstration of how the new USDT probe might be used going
>> forward.
>>
>> Change since v5: fixed author information.
>>
>> Signed-off-by: Kevin Sprague <[email protected]>
>
>
> Hi Kevin,
>
> Most of the changes look fine to me, however, there are still a lot of 
> crashes in the filter code.
>
> Also for now including the OVS data structures in the script will work for 
> now. If we do not get a solution before this gets merged, I’ll fix up all the 
> scripts that need this later.
>
> About the crash, it has to do with when we do not receive any uuid/key (I 
> think I did not research).
>
> But if I start the script, and do the following (RHEL8):
>
>   ovs-vsctl del-br br-int
>   ovs-vsctl add-br br-int
>
> Now I get this:
>
> TIME               UFID                                          EVENT/REASON
> 5361884.255647616  ufid:00000000-0000-0000-0000-000000000000     Insert (put) 
> flow to kernel.
> 5361884.255689699  ufid:00000000-0000-0000-0000-000000000000     Insert (put) 
> flow to kernel.
> 5361884.255712148  ufid:00000000-0000-0000-0000-000000000000     Insert (put) 
> flow to kernel.
> 5361884.255734158  ufid:00000000-0000-0000-0000-000000000000     Insert (put) 
> flow to kernel.
> 5361884.255753341  ufid:00000000-0000-0000-0000-000000000000     Insert (put) 
> flow to kernel.
> 5361884.255772079  ufid:00000000-0000-0000-0000-000000000000     Insert (put) 
> flow to kernel.
> 5361884.255805591  ufid:2876428c-567e-429c-9dc3-0000d83503f1     Insert (put) 
> flow to kernel.
> 5361884.255832007  ufid:00000000-0000-0000-0000-000000000000     Insert (put) 
> flow to kernel.
> 5361884.255852449  ufid:00000000-0000-0000-0000-000000000000     Insert (put) 
> flow to kernel.
> 5361884.255871090  ufid:00000000-0000-0000-0000-000000000000     Insert (put) 
> flow to kernel.
> 5361884.255889960  ufid:00000000-0000-0000-0000-000000000000     Insert (put) 
> flow to kernel.
> 5361884.255909455  ufid:00000000-0000-0000-0000-000000000000     Insert (put) 
> flow to kernel.
> 5361884.255928863  ufid:00000000-0000-0000-0000-000000000000     Insert (put) 
> flow to kernel.
> 5361884.255948291  ufid:00000000-0000-0000-0000-000000000000     Insert (put) 
> flow to kernel.
>
> So a lot of all 0 ufid’s, did not investigate if this is true, or a script 
> error.
>
>
> Now if I use the script with the -k option:
>
> $ ./flow_reval_monitor.py -k
> TIME               UFID                                          EVENT/REASON
> Traceback (most recent call last):
>   File "_ctypes/callbacks.c", line 234, in 'calling callback function'
>   File "/usr/lib/python3.6/site-packages/bcc/table.py", line 1068, in 
> ringbuf_cb_
>     ret = callback(ctx, data, size)
>   File "./flow_reval_monitor.py", line 502, in handle_event
>     handle_flow_put(event)
>   File "./flow_reval_monitor.py", line 227, in handle_flow_put
>     key = decode_key(bytes(event.key)[:event.key_size])
>   File "./flow_reval_monitor.py", line 328, in decode_key
>     result[get_ovs_key_attr_str(nla_type)] = nla_data
>   File "./flow_reval_monitor.py", line 373, in get_ovs_key_attr_str
>     return ovs_key_attr[attr]
> IndexError: list index out of range
>
> Same thing if I try to use a filter option:
>
> [wsfd-netdev64:~/...ilities/usdt-scripts]$ ./flow_reval_monitor.py -f ipv6
> TIME               UFID                                          EVENT/REASON
> Traceback (most recent call last):
>   File "_ctypes/callbacks.c", line 234, in 'calling callback function'
>   File "/usr/lib/python3.6/site-packages/bcc/table.py", line 1068, in 
> ringbuf_cb_
>     ret = callback(ctx, data, size)
>   File "./flow_reval_monitor.py", line 502, in handle_event
>     handle_flow_put(event)
>   File "./flow_reval_monitor.py", line 227, in handle_flow_put
>     key = decode_key(bytes(event.key)[:event.key_size])
>   File "./flow_reval_monitor.py", line 328, in decode_key
>     result[get_ovs_key_attr_str(nla_type)] = nla_data
>   File "./flow_reval_monitor.py", line 373, in get_ovs_key_attr_str
>     return ovs_key_attr[attr]
> IndexError: list index out of range
>
>
> Also, an invalid filter would also crash. I think Adrian requested to report 
> errors for the unknown/invalid filter option, i.e. do a sanity check of the 
> supplied filter.
>
> $ ./flow_reval_monitor.py -f eelco
> TIME               UFID                                          EVENT/REASON
> Traceback (most recent call last):
>   File "_ctypes/callbacks.c", line 234, in 'calling callback function'
>   File "/usr/lib/python3.6/site-packages/bcc/table.py", line 1068, in 
> ringbuf_cb_
>     ret = callback(ctx, data, size)
>   File "./flow_reval_monitor.py", line 502, in handle_event
>     handle_flow_put(event)
>   File "./flow_reval_monitor.py", line 232, in handle_flow_put
>     find_and_delete_from_watchlist(event)
>   File "./flow_reval_monitor.py", line 218, in find_and_delete_from_watchlist
>     b["watchlist"].items_delete_batch(key)
>
> I guess you need to re-look at it, as even your example is crashing.
>
> $ ./flow_reval_monitor.py --filter-flows "ipv4(src=192.168.10.10)"
> TIME               UFID                                          EVENT/REASON
> Traceback (most recent call last):
>   File "_ctypes/callbacks.c", line 234, in 'calling callback function'
>   File "/usr/lib/python3.6/site-packages/bcc/table.py", line 1068, in 
> ringbuf_cb_
>     ret = callback(ctx, data, size)
>   File "./flow_reval_monitor.py", line 502, in handle_event
>     handle_flow_put(event)
>   File "./flow_reval_monitor.py", line 232, in handle_flow_put
>     find_and_delete_from_watchlist(event)
>   File "./flow_reval_monitor.py", line 218, in find_and_delete_from_watchlist
>     b["watchlist"].items_delete_batch(key)
> AttributeError: 'HashTable' object has no attribute 'items_delete_batch'
>
>
> Maybe you should also add some details on what filter options are supported 
> and the format as it gets me puzzled…
>
> Cheers,
>
> Eelco

Hi Kevin,

Did not see any update on this patch for a while, just want to let you know I 
sent out a v3 of the reval_stats patch, which include the pahole tool to create 
the OVS structures.

In addition, I also noticed your events do not include a thread id, which I 
think is useful, so you might want to add this to (maybe the thread name which 
his more user friendly).

Also when hw offload is used the flow add is not captured, maybe you can add 
another trace point that will trigger for TC (or one that will trigger for 
both)?

Cheers,

Eelco

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to