On 14 May 2024, at 14:28, Ilya Maximets wrote:

> On 5/14/24 13:27, Eelco Chaudron wrote:
>>
>>
>> On 14 May 2024, at 13:05, Ilya Maximets wrote:
>>
>>> On 5/14/24 12:14, Adrian Moreno wrote:
>>>>
>>>>
>>>> On 5/14/24 11:09 AM, Ilya Maximets wrote:
>>>>> On 5/14/24 09:39, Adrian Moreno wrote:
>>>>>>
>>>>>>
>>>>>> On 5/10/24 12:45 PM, Adrian Moreno wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 5/10/24 12:06 PM, Eelco Chaudron wrote:
>>>>>>>> On 24 Apr 2024, at 21:53, Adrian Moreno wrote:
>>>>>>>>
>>>>>>>>> The new odp sample attributes allow userspace to specify a group_id 
>>>>>>>>> and
>>>>>>>>> user-defined cookie to be passed down to psample.
>>>>>>>>>
>>>>>>>>> Add support for parsing and formatting such action.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Adrian Moreno <[email protected]>
>>>>>>>>
>>>>>>>> Hi Adrian,
>>>>>>>>
>>>>>>>> See my comments below inline.
>>>>>>>>
>>>>>>>> //Eelco
>>>>>>>>
>>>>>>>>> ---
>>>>>>>>>    include/linux/openvswitch.h  |  49 +++++++++---
>>>>>>>>>    lib/odp-execute.c            |   3 +
>>>>>>>>>    lib/odp-util.c               | 150 
>>>>>>>>> ++++++++++++++++++++++++++---------
>>>>>>>>>    ofproto/ofproto-dpif-ipfix.c |   2 +
>>>>>>>>>    python/ovs/flow/odp.py       |   2 +
>>>>>>>>>    tests/odp.at                 |   5 ++
>>>>>>>>>    6 files changed, 163 insertions(+), 48 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h
>>>>>>>>> index d9fb991ef..3e748405c 100644
>>>>>>>>> --- a/include/linux/openvswitch.h
>>>>>>>>> +++ b/include/linux/openvswitch.h
>>>>>>>>> @@ -696,6 +696,7 @@ enum ovs_flow_attr {
>>>>>>>>>    #define OVS_UFID_F_OMIT_MASK     (1 << 1)
>>>>>>>>>    #define OVS_UFID_F_OMIT_ACTIONS  (1 << 2)
>>>>>>>>>
>>>>>>>>> +#define OVS_PSAMPLE_COOKIE_MAX_SIZE 16
>>>>>>>>>    /**
>>>>>>>>>     * enum ovs_sample_attr - Attributes for %OVS_ACTION_ATTR_SAMPLE 
>>>>>>>>> action.
>>>>>>>>>     * @OVS_SAMPLE_ATTR_PROBABILITY: 32-bit fraction of packets to 
>>>>>>>>> sample with
>>>>>>>>> @@ -703,15 +704,27 @@ enum ovs_flow_attr {
>>>>>>>>>     * %UINT32_MAX samples all packets and intermediate values sample 
>>>>>>>>> intermediate
>>>>>>>>>     * fractions of packets.
>>>>>>>>>     * @OVS_SAMPLE_ATTR_ACTIONS: Set of actions to execute in sampling 
>>>>>>>>> event.
>>>>>>>>> - * Actions are passed as nested attributes.
>>>>>>>>> + * Actions are passed as nested attributes. Optional if
>>>>>>>>> + * OVS_SAMPLE_ATTR_PSAMPLE_GROUP is set.
>>>>>>>>> + * @OVS_SAMPLE_ATTR_PSAMPLE_GROUP: A 32-bit number to be used as 
>>>>>>>>> psample group.
>>>>>>>>> + * Optional if OVS_SAMPLE_ATTR_ACTIONS is set.
>>>>>>>>> + * @OVS_SAMPLE_ATTR_PSAMPLE_COOKIE: A variable-length binary cookie 
>>>>>>>>> that, if
>>>>>>>>> + * provided, will be copied to the psample cookie.
>>>>>>>>
>>>>>>>> Should we mention any limit on the actual length of the cookie?
>>>>>>>>
>>>>>>>> Any reason we explicitly call this psample? From an OVS perspective, 
>>>>>>>> this
>>>>>>>> could just be
>>>>>>>> SAMPLE_ATTR_FORWARD/OFFLOAD/DESTINATION/ENDPOINT/COLLECTOR/TARGET_COOKIE
>>>>>>>>  and
>>>>>>>> _GROUP. Other datapaths, do not have PSAMPLE.
>>>>>>>>
>>>>>>>
>>>>>>> See my response to your comment on the cover-letter for possible name 
>>>>>>> alternatives.
>>>>>>>
>>>>>>>
>>>>>>>> Thinking about it more, from an OVS perspective we should probably not 
>>>>>>>> even
>>>>>>>> send down a COOKIE, but we should send down an obs_domain_id and 
>>>>>>>> obs_point_id,
>>>>>>>> and then the kernel can move this into a cookie.
>>>>>>>>
>>>>>>>
>>>>>>> I did consider this. However, the opaque cookie fits well with the tc 
>>>>>>> API which
>>>>>>> does not have two 32-bit values but an opaque 128-bit cookie. So if the 
>>>>>>> action
>>>>>>> is offloaded OVS needs to "encode" the two 32-bit values into the 
>>>>>>> cookie anyways.
>>>>>>>
>>>>>>>> The collector itself could be called system/local collector, or 
>>>>>>>> something like
>>>>>>>> that. This way it can use for example psample for kernel and UDST 
>>>>>>>> probes for
>>>>>>>> usespace. Both can pass the group and cookie (obs_domain_id and 
>>>>>>>> obs_point_id).
>>>>>>>>
>>>>>>>> Also, the presence of any of this should not dictate the psample 
>>>>>>>> action, we
>>>>>>>> probably need a specific OVS_SAMPLE_ATTR_SYSTEM_TARGET type nl flag.
>>>>>>>>
>>>>>>>
>>>>>>> It clearer and it might simplify option verification a bit, but isn't a 
>>>>>>> bit
>>>>>>> redundant? There is no flag for action execution for instance, the 
>>>>>>> presence of
>>>>>>> the attribute is enough.
>>>>>>>
>>>>>>> An alternative would be to have nested attributes, e.g:
>>>>>>>
>>>>>>> enum ovs_sample_attr {
>>>>>>>       OVS_SAMPLE_ATTR_UNSPEC,
>>>>>>>       OVS_SAMPLE_ATTR_PROBABILITY, /* u32 number */
>>>>>>>       OVS_SAMPLE_ATTR_ACTIONS,     /* Nested OVS_ACTION_ATTR_* 
>>>>>>> attributes. */
>>>>>>>       OVS_SAMPLE_ATTR_SYSTEM,     /* Nested OVS_SAMPLE_SYSTEM_ATTR_* 
>>>>>>> attributes. */
>>>>>>>
>>>>>>>       __OVS_SAMPLE_ATTR_MAX,
>>>>>>> };
>>>>>>>
>>>>>>> enum ovs_sample_system_attr {
>>>>>>>       OVS_SAMPLE_SYSTEM_ATTR_OBS_DOMAIN,  /* u32 number */
>>>>>>>       OVS_SAMPLE_SYSTEM_ATTR_OBS_POINT, /* u32 number */
>>>>>>>       OVS_SAMPLE_SYSTEM_ATTR_COOKIE,     /* Nested OVS_ACTION_ATTR_*
>>>>>>>
>>>>>>>       __OVS_SSAMPLE_SYSTEM_ATTR_MAX,
>>>>>>> };
>>>>>>>
>>>>>>> WDYT?
>>>>>>>
>>>>>>
>>>>>> Another benefit of nested attributes is the easier addition of other
>>>>>> sample-related attributes, I already have a candidate: 
>>>>>> OVS_SAMPLE_SYSTEM_TRUNC
>>>>>>
>>>>>> This will be passed to psample (md->trunc_size) who will truncate the 
>>>>>> packet
>>>>>> size to the specified value. This can be useful for large packets for 
>>>>>> which we
>>>>>> are only interested in the headers.
>>>>>>
>>>>>> WDYT?
>>>>>>
>>>>>> [snip]
>>>>>>
>>>>>
>>>>> Hrm.  This makes me think these should not be attributes of a sample 
>>>>> action at all,
>>>>> but rather we should have a separate EMIT_SAMPLE action that just calls 
>>>>> psample, so
>>>>> we can combine it with other actions:
>>>>>
>>>>>    sample(..., actions(userspace(...),trunc(100),emit_sample(cookie=...)))
>>>>>
>>>>> or even:
>>>>>
>>>>>    sample(..., 
>>>>> actions(clone(trunc(100),emit_sample(cookie=...)),userspace(...)))
>>>>>
>>>>> This will also simplify the requires_datapath_assistance() check as we 
>>>>> can just
>>>>> check the action and not iterate over the list of attributes.  And it 
>>>>> will be
>>>>> capable to only send this one action to the datapath and not the whole 
>>>>> thing.
>>>>>
>>>>
>>>> My suggestion of adding a OVS_SAMPLE_{}_TRUNC was precisely to avoid using 
>>>> the
>>>> existing trunc implementation.
>>>>
>>>> Truncation is implemented inside psample so we would not be maintaining two
>>>> implementations.
>>>
>>> We will maintain trunc action and a trunc attribute for sampling, so we will
>>> have a duplication here.
>>>
>>>> psample implementation does not require packet clone (which is
>>>> needed for OVS trunc) plus it can be easily optimized out if there is no
>>>> listener in the multicast group (I plan to add that optimization in the 
>>>> next
>>>> kernel series).
>>>
>>> 'trunc' doesn't require a clone in case there are no other actions after, 
>>> which
>>> is the main use case for drop sampling.
>>>
>>> BTW, having explicit drop actions might have broken some of the assumptions
>>> around this optimization, but it should be fixable.
>>>
>>> And psample itself might need a clone() in case there was a trunc() before
>>> sampling.  psample will report skb->len as original packet size, but that
>>> is will not be correct as we have a postponed cutlen, so the reported length
>>> must be the length after the cut and that can't be done without actually
>>> cutting the skb at this point.
>>>
>>>>
>>>> Having said that, I'm not against adding a new action if that simplifies 
>>>> things.
>>>> I think it might simplify things in ovs-vswitchd and even in the kernel.
>>>> However, I have one concern.
>>>>
>>>> IIUC and based on your examples, your suggestions means we would not have
>>>> sampling rate inside emit_sample, right? This might be a problem because 
>>>> psample
>>>> does have sampling rate which I believe to be crucial to balance 
>>>> performance
>>>> impact and accuracy.
>>>
>>> psample doesn't limit the rate, it just reports a number.  Rate is part of
>>> the main sample() action, i.e.
>>>
>>>   sample(sample=75%,actions(emit_sample(cookie=...)))
>>>
>>> Having rate also in the 'emit_sample' would be another API duplication.
>>
>> I agree, but this is the confusing part :) If you use the psample TC action, 
>> it will have the rate-limiting as part of the action.
>>
>> So if we offload the above, we will have a nested tc psample action, the 
>> outer one with the desired sample rate, and the inner one with a 100% sample 
>> rate. Actually, the inner one needs to be handled by sw as it will go to 
>> vswitchd (and than back to psample?). I guess this avoids the idea of doing 
>> this fast… Maybe if emit_sample() is the only action we can do some 
>> optimisation, but it needs to be translatable back and forth from TC to 
>> OVS_ATTRs.
>>
>> I guess we need the same for OVS_ACTION_ATTR_USERSPACE actions, which by the 
>> way Nvidia was going to work on once the sFlow part was accepted.
>
> The actual sample() action can not be offloaded in a generic case.
> There is no smaple action in TC that would accept arbitrary action
> list.  We can only offload a few corner cases.  Today we offload
> none.  With the sFlow series we can offload a single specific
> OVS_ACTION_ATTR_USERSPACE that contains USER_ACTION_COOKIE_SFLOW.
>
> And we can't offload more than one sampling method within a single
> OVS_ACTION_ATTR_SAMPLE, because in TC they'll have independent
> probabilities, which is not a desired behavior.
>
> So, we'll always have special case offloading for each particular
> action inside the sample().

ACK, just wanted to clarify that we should take extra care when we do this. And 
we should build on top of the sflow series :)

Not sure about the naming, emit_sample, it’s not really a sample in the case we 
are not sampling anything, the action is always taken. Maybe  
OVS_ACTION_ATTR_LOCAL_COPY? But this comes close to OVS_ACTION_ATTR_USERSPACE?

>>
>>> BTW, "sampling" column is part of IPFIX and sFlow tables and you're not
>>> adding a new table for this new smapling method, i.e. it will not be
>>> possible to set sampling rate for psample with the current database
>>> configuration.
>>>
>>>> This value must be present in the psample sample or else
>>>> users will not know how inaccurate the sample is.
>>>
>>> Users can check the sampling configuration in the database if we will
>>> have one.  And they are configuring it in the first place.  And it will
>>> be confusing to see two rates in the same action list.
>>>
>>> It might be possible to carry the current rate in the OVS_CB(skb) and
>>> pass it to the psample.
>>>
>>> In any case, the rate reported by psample will not be accurate if there
>>> are nested sample actions, unless we carry and properly multiply / divide
>>> probabilities when entering / exiting the sample action.
>>>
>>>>
>>>>
>>>>> Naming is still hard though.  I like the word "emit" because it doesn't 
>>>>> have a
>>>>> concept of directionality, i.e. we're not sending it to somewhere, we're 
>>>>> just
>>>>> emitting it to the wild.  That kind of matches with our vague definition 
>>>>> of a
>>>>> "local" or "system" sampling.
>>>>>
>>>>
>>>> I like the word "emit".
>>>>
>>>>> Another option might be 'OUTPUT_SAMPLE/SAMPLE_OUTPUT', i.e. 
>>>>> output(sample, cookie=...).
>>>>>
>>>>> The main concern is that we already have truncation implemented, so we 
>>>>> can be more
>>>>> flexible in a way we generate actions instead of adding new attributes.
>>>>>
>>>>> Thoughts?
>>>>
>>>> See my comment above. I am not suggesting we use the current trunc
>>>> implementation nor to add a new one, just to use psample's.
>>>>
>>

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to