On 23 Dec 2022, at 18:02, kevin sprague wrote:

> Hi Eelco,
>
> I've been busy finishing up school, I should have some time to sit down
> with it in the next week or so.

Hi Kevin,

Are you still planning on looking at this? I know it’s more than a year ago you 
posted this, but I still think this should go into OVS. So if you are not 
planning this, I’ll find someone who can work on this.

Thanks,

Eelco


> Thanks,
> Kevin Sprague
>
> On Tue, Dec 20, 2022, 9:34 AM Eelco Chaudron <[email protected]> wrote:
>
>>
>>
>> On 20 Dec 2022, at 15:21, Eelco Chaudron wrote:
>>
>>> On 16 Nov 2022, at 16:41, Eelco Chaudron wrote:
>>>
>>>> On 21 Oct 2022, at 18:35, Kevin Sprague wrote:
>>>>
>>>>> During normal operations, it is useful to understand when a particular
>> flow
>>>>> gets removed from the system. This can be useful when debugging
>> performance
>>>>> issues tied to ofproto flow changes, trying to determine deployed
>> traffic
>>>>> patterns, or while debugging dynamic systems where ports come and go.
>>>>>
>>>>> Prior to this change, there was a lack of visibility around flow
>> expiration.
>>>>> The existing debugging infrastructure could tell us when a flow was
>> added to
>>>>> the datapath, but not when it was removed or why.
>>>>>
>>>>> This change introduces a USDT probe at the point where the revalidator
>>>>> determines that the flow should be removed.  Additionally, we track the
>>>>> reason for the flow eviction and provide that information as well.
>> With
>>>>> this change, we can track the complete flow lifecycle for the netlink
>> datapath
>>>>> by hooking the upcall tracepoint in kernel, the flow put USDT, and the
>>>>> revaldiator USDT, letting us watch as flows are added and removed from
>> the
>>>>> kernel datapath.
>>>>>
>>>>> This change only enables this information via USDT probe, so it won't
>> be
>>>>> possible to access this information any other way (see:
>>>>> Documentation/topics/usdt-probes.rst).
>>>>>
>>>>> Also included is a script
>> (utilities/usdt-scripts/flow_reval_monitor.py) that
>>>>> serves as a demonstration of how the new USDT probe might be used going
>>>>> forward.
>>>>>
>>>>> Change since v5: fixed author information.
>>>>>
>>>>> Signed-off-by: Kevin Sprague <[email protected]>
>>>>
>>>>
>>>> Hi Kevin,
>>>>
>>>> Most of the changes look fine to me, however, there are still a lot of
>> crashes in the filter code.
>>>>
>>>> Also for now including the OVS data structures in the script will work
>> for now. If we do not get a solution before this gets merged, I’ll fix up
>> all the scripts that need this later.
>>>>
>>>> About the crash, it has to do with when we do not receive any uuid/key
>> (I think I did not research).
>>>>
>>>> But if I start the script, and do the following (RHEL8):
>>>>
>>>>   ovs-vsctl del-br br-int
>>>>   ovs-vsctl add-br br-int
>>>>
>>>> Now I get this:
>>>>
>>>> TIME               UFID
>> EVENT/REASON
>>>> 5361884.255647616  ufid:00000000-0000-0000-0000-000000000000     Insert
>> (put) flow to kernel.
>>>> 5361884.255689699  ufid:00000000-0000-0000-0000-000000000000     Insert
>> (put) flow to kernel.
>>>> 5361884.255712148  ufid:00000000-0000-0000-0000-000000000000     Insert
>> (put) flow to kernel.
>>>> 5361884.255734158  ufid:00000000-0000-0000-0000-000000000000     Insert
>> (put) flow to kernel.
>>>> 5361884.255753341  ufid:00000000-0000-0000-0000-000000000000     Insert
>> (put) flow to kernel.
>>>> 5361884.255772079  ufid:00000000-0000-0000-0000-000000000000     Insert
>> (put) flow to kernel.
>>>> 5361884.255805591  ufid:2876428c-567e-429c-9dc3-0000d83503f1     Insert
>> (put) flow to kernel.
>>>> 5361884.255832007  ufid:00000000-0000-0000-0000-000000000000     Insert
>> (put) flow to kernel.
>>>> 5361884.255852449  ufid:00000000-0000-0000-0000-000000000000     Insert
>> (put) flow to kernel.
>>>> 5361884.255871090  ufid:00000000-0000-0000-0000-000000000000     Insert
>> (put) flow to kernel.
>>>> 5361884.255889960  ufid:00000000-0000-0000-0000-000000000000     Insert
>> (put) flow to kernel.
>>>> 5361884.255909455  ufid:00000000-0000-0000-0000-000000000000     Insert
>> (put) flow to kernel.
>>>> 5361884.255928863  ufid:00000000-0000-0000-0000-000000000000     Insert
>> (put) flow to kernel.
>>>> 5361884.255948291  ufid:00000000-0000-0000-0000-000000000000     Insert
>> (put) flow to kernel.
>>>>
>>>> So a lot of all 0 ufid’s, did not investigate if this is true, or a
>> script error.
>>>>
>>>>
>>>> Now if I use the script with the -k option:
>>>>
>>>> $ ./flow_reval_monitor.py -k
>>>> TIME               UFID
>> EVENT/REASON
>>>> Traceback (most recent call last):
>>>>   File "_ctypes/callbacks.c", line 234, in 'calling callback function'
>>>>   File "/usr/lib/python3.6/site-packages/bcc/table.py", line 1068, in
>> ringbuf_cb_
>>>>     ret = callback(ctx, data, size)
>>>>   File "./flow_reval_monitor.py", line 502, in handle_event
>>>>     handle_flow_put(event)
>>>>   File "./flow_reval_monitor.py", line 227, in handle_flow_put
>>>>     key = decode_key(bytes(event.key)[:event.key_size])
>>>>   File "./flow_reval_monitor.py", line 328, in decode_key
>>>>     result[get_ovs_key_attr_str(nla_type)] = nla_data
>>>>   File "./flow_reval_monitor.py", line 373, in get_ovs_key_attr_str
>>>>     return ovs_key_attr[attr]
>>>> IndexError: list index out of range
>>>>
>>>> Same thing if I try to use a filter option:
>>>>
>>>> [wsfd-netdev64:~/...ilities/usdt-scripts]$ ./flow_reval_monitor.py -f
>> ipv6
>>>> TIME               UFID
>> EVENT/REASON
>>>> Traceback (most recent call last):
>>>>   File "_ctypes/callbacks.c", line 234, in 'calling callback function'
>>>>   File "/usr/lib/python3.6/site-packages/bcc/table.py", line 1068, in
>> ringbuf_cb_
>>>>     ret = callback(ctx, data, size)
>>>>   File "./flow_reval_monitor.py", line 502, in handle_event
>>>>     handle_flow_put(event)
>>>>   File "./flow_reval_monitor.py", line 227, in handle_flow_put
>>>>     key = decode_key(bytes(event.key)[:event.key_size])
>>>>   File "./flow_reval_monitor.py", line 328, in decode_key
>>>>     result[get_ovs_key_attr_str(nla_type)] = nla_data
>>>>   File "./flow_reval_monitor.py", line 373, in get_ovs_key_attr_str
>>>>     return ovs_key_attr[attr]
>>>> IndexError: list index out of range
>>>>
>>>>
>>>> Also, an invalid filter would also crash. I think Adrian requested to
>> report errors for the unknown/invalid filter option, i.e. do a sanity check
>> of the supplied filter.
>>>>
>>>> $ ./flow_reval_monitor.py -f eelco
>>>> TIME               UFID
>> EVENT/REASON
>>>> Traceback (most recent call last):
>>>>   File "_ctypes/callbacks.c", line 234, in 'calling callback function'
>>>>   File "/usr/lib/python3.6/site-packages/bcc/table.py", line 1068, in
>> ringbuf_cb_
>>>>     ret = callback(ctx, data, size)
>>>>   File "./flow_reval_monitor.py", line 502, in handle_event
>>>>     handle_flow_put(event)
>>>>   File "./flow_reval_monitor.py", line 232, in handle_flow_put
>>>>     find_and_delete_from_watchlist(event)
>>>>   File "./flow_reval_monitor.py", line 218, in
>> find_and_delete_from_watchlist
>>>>     b["watchlist"].items_delete_batch(key)
>>>>
>>>> I guess you need to re-look at it, as even your example is crashing.
>>>>
>>>> $ ./flow_reval_monitor.py --filter-flows "ipv4(src=192.168.10.10)"
>>>> TIME               UFID
>> EVENT/REASON
>>>> Traceback (most recent call last):
>>>>   File "_ctypes/callbacks.c", line 234, in 'calling callback function'
>>>>   File "/usr/lib/python3.6/site-packages/bcc/table.py", line 1068, in
>> ringbuf_cb_
>>>>     ret = callback(ctx, data, size)
>>>>   File "./flow_reval_monitor.py", line 502, in handle_event
>>>>     handle_flow_put(event)
>>>>   File "./flow_reval_monitor.py", line 232, in handle_flow_put
>>>>     find_and_delete_from_watchlist(event)
>>>>   File "./flow_reval_monitor.py", line 218, in
>> find_and_delete_from_watchlist
>>>>     b["watchlist"].items_delete_batch(key)
>>>> AttributeError: 'HashTable' object has no attribute 'items_delete_batch'
>>>>
>>>>
>>>> Maybe you should also add some details on what filter options are
>> supported and the format as it gets me puzzled…
>>>>
>>>> Cheers,
>>>>
>>>> Eelco
>>>
>>> Hi Kevin,
>>>
>>> Did not see any update on this patch for a while, just want to let you
>> know I sent out a v3 of the reval_stats patch, which include the pahole
>> tool to create the OVS structures.
>>>
>>> In addition, I also noticed your events do not include a thread id,
>> which I think is useful, so you might want to add this to (maybe the thread
>> name which his more user friendly).
>>>
>>> Also when hw offload is used the flow add is not captured, maybe you can
>> add another trace point that will trigger for TC (or one that will trigger
>> for both)?
>>
>> Forgot to mention that you probably also want to add some checking to make
>> sure you do not lose events.
>>
>> See __sync_fetch_and_add() below:
>>
>>
>> +BPF_TABLE("percpu_array", uint32_t, uint64_t, dropcnt, 1);
>> +
>> +static struct event_t *get_event(uint32_t id) {
>> +    struct event_t *event = events.ringbuf_reserve(sizeof(struct
>> event_t));
>> +
>> +    if (!event) {
>> +        uint32_t type = 0;
>> +        uint64_t *value = dropcnt.lookup(&type);
>> +        if (value)
>> +            __sync_fetch_and_add(value, 1);
>> +
>> +        return NULL;
>> +    }
>> +
>> +    event->id = id;
>> +    event->ts = bpf_ktime_get_ns();
>> +    event->pid = bpf_get_current_pid_tgid();
>> +
>> +    return event;
>> +}
>>
>>

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to