On 20 Dec 2022, at 15:21, Eelco Chaudron wrote:

> On 16 Nov 2022, at 16:41, Eelco Chaudron wrote:
>
>> On 21 Oct 2022, at 18:35, Kevin Sprague wrote:
>>
>>> During normal operations, it is useful to understand when a particular flow
>>> gets removed from the system. This can be useful when debugging performance
>>> issues tied to ofproto flow changes, trying to determine deployed traffic
>>> patterns, or while debugging dynamic systems where ports come and go.
>>>
>>> Prior to this change, there was a lack of visibility around flow expiration.
>>> The existing debugging infrastructure could tell us when a flow was added to
>>> the datapath, but not when it was removed or why.
>>>
>>> This change introduces a USDT probe at the point where the revalidator
>>> determines that the flow should be removed.  Additionally, we track the
>>> reason for the flow eviction and provide that information as well.  With
>>> this change, we can track the complete flow lifecycle for the netlink 
>>> datapath
>>> by hooking the upcall tracepoint in kernel, the flow put USDT, and the
>>> revaldiator USDT, letting us watch as flows are added and removed from the
>>> kernel datapath.
>>>
>>> This change only enables this information via USDT probe, so it won't be
>>> possible to access this information any other way (see:
>>> Documentation/topics/usdt-probes.rst).
>>>
>>> Also included is a script (utilities/usdt-scripts/flow_reval_monitor.py) 
>>> that
>>> serves as a demonstration of how the new USDT probe might be used going
>>> forward.
>>>
>>> Change since v5: fixed author information.
>>>
>>> Signed-off-by: Kevin Sprague <[email protected]>
>>
>>
>> Hi Kevin,
>>
>> Most of the changes look fine to me, however, there are still a lot of 
>> crashes in the filter code.
>>
>> Also for now including the OVS data structures in the script will work for 
>> now. If we do not get a solution before this gets merged, I’ll fix up all 
>> the scripts that need this later.
>>
>> About the crash, it has to do with when we do not receive any uuid/key (I 
>> think I did not research).
>>
>> But if I start the script, and do the following (RHEL8):
>>
>>   ovs-vsctl del-br br-int
>>   ovs-vsctl add-br br-int
>>
>> Now I get this:
>>
>> TIME               UFID                                          EVENT/REASON
>> 5361884.255647616  ufid:00000000-0000-0000-0000-000000000000     Insert 
>> (put) flow to kernel.
>> 5361884.255689699  ufid:00000000-0000-0000-0000-000000000000     Insert 
>> (put) flow to kernel.
>> 5361884.255712148  ufid:00000000-0000-0000-0000-000000000000     Insert 
>> (put) flow to kernel.
>> 5361884.255734158  ufid:00000000-0000-0000-0000-000000000000     Insert 
>> (put) flow to kernel.
>> 5361884.255753341  ufid:00000000-0000-0000-0000-000000000000     Insert 
>> (put) flow to kernel.
>> 5361884.255772079  ufid:00000000-0000-0000-0000-000000000000     Insert 
>> (put) flow to kernel.
>> 5361884.255805591  ufid:2876428c-567e-429c-9dc3-0000d83503f1     Insert 
>> (put) flow to kernel.
>> 5361884.255832007  ufid:00000000-0000-0000-0000-000000000000     Insert 
>> (put) flow to kernel.
>> 5361884.255852449  ufid:00000000-0000-0000-0000-000000000000     Insert 
>> (put) flow to kernel.
>> 5361884.255871090  ufid:00000000-0000-0000-0000-000000000000     Insert 
>> (put) flow to kernel.
>> 5361884.255889960  ufid:00000000-0000-0000-0000-000000000000     Insert 
>> (put) flow to kernel.
>> 5361884.255909455  ufid:00000000-0000-0000-0000-000000000000     Insert 
>> (put) flow to kernel.
>> 5361884.255928863  ufid:00000000-0000-0000-0000-000000000000     Insert 
>> (put) flow to kernel.
>> 5361884.255948291  ufid:00000000-0000-0000-0000-000000000000     Insert 
>> (put) flow to kernel.
>>
>> So a lot of all 0 ufid’s, did not investigate if this is true, or a script 
>> error.
>>
>>
>> Now if I use the script with the -k option:
>>
>> $ ./flow_reval_monitor.py -k
>> TIME               UFID                                          EVENT/REASON
>> Traceback (most recent call last):
>>   File "_ctypes/callbacks.c", line 234, in 'calling callback function'
>>   File "/usr/lib/python3.6/site-packages/bcc/table.py", line 1068, in 
>> ringbuf_cb_
>>     ret = callback(ctx, data, size)
>>   File "./flow_reval_monitor.py", line 502, in handle_event
>>     handle_flow_put(event)
>>   File "./flow_reval_monitor.py", line 227, in handle_flow_put
>>     key = decode_key(bytes(event.key)[:event.key_size])
>>   File "./flow_reval_monitor.py", line 328, in decode_key
>>     result[get_ovs_key_attr_str(nla_type)] = nla_data
>>   File "./flow_reval_monitor.py", line 373, in get_ovs_key_attr_str
>>     return ovs_key_attr[attr]
>> IndexError: list index out of range
>>
>> Same thing if I try to use a filter option:
>>
>> [wsfd-netdev64:~/...ilities/usdt-scripts]$ ./flow_reval_monitor.py -f ipv6
>> TIME               UFID                                          EVENT/REASON
>> Traceback (most recent call last):
>>   File "_ctypes/callbacks.c", line 234, in 'calling callback function'
>>   File "/usr/lib/python3.6/site-packages/bcc/table.py", line 1068, in 
>> ringbuf_cb_
>>     ret = callback(ctx, data, size)
>>   File "./flow_reval_monitor.py", line 502, in handle_event
>>     handle_flow_put(event)
>>   File "./flow_reval_monitor.py", line 227, in handle_flow_put
>>     key = decode_key(bytes(event.key)[:event.key_size])
>>   File "./flow_reval_monitor.py", line 328, in decode_key
>>     result[get_ovs_key_attr_str(nla_type)] = nla_data
>>   File "./flow_reval_monitor.py", line 373, in get_ovs_key_attr_str
>>     return ovs_key_attr[attr]
>> IndexError: list index out of range
>>
>>
>> Also, an invalid filter would also crash. I think Adrian requested to report 
>> errors for the unknown/invalid filter option, i.e. do a sanity check of the 
>> supplied filter.
>>
>> $ ./flow_reval_monitor.py -f eelco
>> TIME               UFID                                          EVENT/REASON
>> Traceback (most recent call last):
>>   File "_ctypes/callbacks.c", line 234, in 'calling callback function'
>>   File "/usr/lib/python3.6/site-packages/bcc/table.py", line 1068, in 
>> ringbuf_cb_
>>     ret = callback(ctx, data, size)
>>   File "./flow_reval_monitor.py", line 502, in handle_event
>>     handle_flow_put(event)
>>   File "./flow_reval_monitor.py", line 232, in handle_flow_put
>>     find_and_delete_from_watchlist(event)
>>   File "./flow_reval_monitor.py", line 218, in find_and_delete_from_watchlist
>>     b["watchlist"].items_delete_batch(key)
>>
>> I guess you need to re-look at it, as even your example is crashing.
>>
>> $ ./flow_reval_monitor.py --filter-flows "ipv4(src=192.168.10.10)"
>> TIME               UFID                                          EVENT/REASON
>> Traceback (most recent call last):
>>   File "_ctypes/callbacks.c", line 234, in 'calling callback function'
>>   File "/usr/lib/python3.6/site-packages/bcc/table.py", line 1068, in 
>> ringbuf_cb_
>>     ret = callback(ctx, data, size)
>>   File "./flow_reval_monitor.py", line 502, in handle_event
>>     handle_flow_put(event)
>>   File "./flow_reval_monitor.py", line 232, in handle_flow_put
>>     find_and_delete_from_watchlist(event)
>>   File "./flow_reval_monitor.py", line 218, in find_and_delete_from_watchlist
>>     b["watchlist"].items_delete_batch(key)
>> AttributeError: 'HashTable' object has no attribute 'items_delete_batch'
>>
>>
>> Maybe you should also add some details on what filter options are supported 
>> and the format as it gets me puzzled…
>>
>> Cheers,
>>
>> Eelco
>
> Hi Kevin,
>
> Did not see any update on this patch for a while, just want to let you know I 
> sent out a v3 of the reval_stats patch, which include the pahole tool to 
> create the OVS structures.
>
> In addition, I also noticed your events do not include a thread id, which I 
> think is useful, so you might want to add this to (maybe the thread name 
> which his more user friendly).
>
> Also when hw offload is used the flow add is not captured, maybe you can add 
> another trace point that will trigger for TC (or one that will trigger for 
> both)?

Forgot to mention that you probably also want to add some checking to make sure 
you do not lose events.

See __sync_fetch_and_add() below:


+BPF_TABLE("percpu_array", uint32_t, uint64_t, dropcnt, 1);
+
+static struct event_t *get_event(uint32_t id) {
+    struct event_t *event = events.ringbuf_reserve(sizeof(struct event_t));
+
+    if (!event) {
+        uint32_t type = 0;
+        uint64_t *value = dropcnt.lookup(&type);
+        if (value)
+            __sync_fetch_and_add(value, 1);
+
+        return NULL;
+    }
+
+    event->id = id;
+    event->ts = bpf_ktime_get_ns();
+    event->pid = bpf_get_current_pid_tgid();
+
+    return event;
+}

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to