On 5/14/24 14:43, Eelco Chaudron wrote:
> 
> 
> On 14 May 2024, at 14:28, Ilya Maximets wrote:
> 
>> On 5/14/24 13:27, Eelco Chaudron wrote:
>>>
>>>
>>> On 14 May 2024, at 13:05, Ilya Maximets wrote:
>>>
>>>> On 5/14/24 12:14, Adrian Moreno wrote:
>>>>>
>>>>>
>>>>> On 5/14/24 11:09 AM, Ilya Maximets wrote:
>>>>>> On 5/14/24 09:39, Adrian Moreno wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 5/10/24 12:45 PM, Adrian Moreno wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 5/10/24 12:06 PM, Eelco Chaudron wrote:
>>>>>>>>> On 24 Apr 2024, at 21:53, Adrian Moreno wrote:
>>>>>>>>>
>>>>>>>>>> The new odp sample attributes allow userspace to specify a group_id 
>>>>>>>>>> and
>>>>>>>>>> user-defined cookie to be passed down to psample.
>>>>>>>>>>
>>>>>>>>>> Add support for parsing and formatting such action.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Adrian Moreno <[email protected]>
>>>>>>>>>
>>>>>>>>> Hi Adrian,
>>>>>>>>>
>>>>>>>>> See my comments below inline.
>>>>>>>>>
>>>>>>>>> //Eelco
>>>>>>>>>
>>>>>>>>>> ---
>>>>>>>>>>    include/linux/openvswitch.h  |  49 +++++++++---
>>>>>>>>>>    lib/odp-execute.c            |   3 +
>>>>>>>>>>    lib/odp-util.c               | 150 
>>>>>>>>>> ++++++++++++++++++++++++++---------
>>>>>>>>>>    ofproto/ofproto-dpif-ipfix.c |   2 +
>>>>>>>>>>    python/ovs/flow/odp.py       |   2 +
>>>>>>>>>>    tests/odp.at                 |   5 ++
>>>>>>>>>>    6 files changed, 163 insertions(+), 48 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/include/linux/openvswitch.h 
>>>>>>>>>> b/include/linux/openvswitch.h
>>>>>>>>>> index d9fb991ef..3e748405c 100644
>>>>>>>>>> --- a/include/linux/openvswitch.h
>>>>>>>>>> +++ b/include/linux/openvswitch.h
>>>>>>>>>> @@ -696,6 +696,7 @@ enum ovs_flow_attr {
>>>>>>>>>>    #define OVS_UFID_F_OMIT_MASK     (1 << 1)
>>>>>>>>>>    #define OVS_UFID_F_OMIT_ACTIONS  (1 << 2)
>>>>>>>>>>
>>>>>>>>>> +#define OVS_PSAMPLE_COOKIE_MAX_SIZE 16
>>>>>>>>>>    /**
>>>>>>>>>>     * enum ovs_sample_attr - Attributes for %OVS_ACTION_ATTR_SAMPLE 
>>>>>>>>>> action.
>>>>>>>>>>     * @OVS_SAMPLE_ATTR_PROBABILITY: 32-bit fraction of packets to 
>>>>>>>>>> sample with
>>>>>>>>>> @@ -703,15 +704,27 @@ enum ovs_flow_attr {
>>>>>>>>>>     * %UINT32_MAX samples all packets and intermediate values sample 
>>>>>>>>>> intermediate
>>>>>>>>>>     * fractions of packets.
>>>>>>>>>>     * @OVS_SAMPLE_ATTR_ACTIONS: Set of actions to execute in 
>>>>>>>>>> sampling event.
>>>>>>>>>> - * Actions are passed as nested attributes.
>>>>>>>>>> + * Actions are passed as nested attributes. Optional if
>>>>>>>>>> + * OVS_SAMPLE_ATTR_PSAMPLE_GROUP is set.
>>>>>>>>>> + * @OVS_SAMPLE_ATTR_PSAMPLE_GROUP: A 32-bit number to be used as 
>>>>>>>>>> psample group.
>>>>>>>>>> + * Optional if OVS_SAMPLE_ATTR_ACTIONS is set.
>>>>>>>>>> + * @OVS_SAMPLE_ATTR_PSAMPLE_COOKIE: A variable-length binary cookie 
>>>>>>>>>> that, if
>>>>>>>>>> + * provided, will be copied to the psample cookie.
>>>>>>>>>
>>>>>>>>> Should we mention any limit on the actual length of the cookie?
>>>>>>>>>
>>>>>>>>> Any reason we explicitly call this psample? From an OVS perspective, 
>>>>>>>>> this
>>>>>>>>> could just be
>>>>>>>>> SAMPLE_ATTR_FORWARD/OFFLOAD/DESTINATION/ENDPOINT/COLLECTOR/TARGET_COOKIE
>>>>>>>>>  and
>>>>>>>>> _GROUP. Other datapaths, do not have PSAMPLE.
>>>>>>>>>
>>>>>>>>
>>>>>>>> See my response to your comment on the cover-letter for possible name 
>>>>>>>> alternatives.
>>>>>>>>
>>>>>>>>
>>>>>>>>> Thinking about it more, from an OVS perspective we should probably 
>>>>>>>>> not even
>>>>>>>>> send down a COOKIE, but we should send down an obs_domain_id and 
>>>>>>>>> obs_point_id,
>>>>>>>>> and then the kernel can move this into a cookie.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I did consider this. However, the opaque cookie fits well with the tc 
>>>>>>>> API which
>>>>>>>> does not have two 32-bit values but an opaque 128-bit cookie. So if 
>>>>>>>> the action
>>>>>>>> is offloaded OVS needs to "encode" the two 32-bit values into the 
>>>>>>>> cookie anyways.
>>>>>>>>
>>>>>>>>> The collector itself could be called system/local collector, or 
>>>>>>>>> something like
>>>>>>>>> that. This way it can use for example psample for kernel and UDST 
>>>>>>>>> probes for
>>>>>>>>> usespace. Both can pass the group and cookie (obs_domain_id and 
>>>>>>>>> obs_point_id).
>>>>>>>>>
>>>>>>>>> Also, the presence of any of this should not dictate the psample 
>>>>>>>>> action, we
>>>>>>>>> probably need a specific OVS_SAMPLE_ATTR_SYSTEM_TARGET type nl flag.
>>>>>>>>>
>>>>>>>>
>>>>>>>> It clearer and it might simplify option verification a bit, but isn't 
>>>>>>>> a bit
>>>>>>>> redundant? There is no flag for action execution for instance, the 
>>>>>>>> presence of
>>>>>>>> the attribute is enough.
>>>>>>>>
>>>>>>>> An alternative would be to have nested attributes, e.g:
>>>>>>>>
>>>>>>>> enum ovs_sample_attr {
>>>>>>>>       OVS_SAMPLE_ATTR_UNSPEC,
>>>>>>>>       OVS_SAMPLE_ATTR_PROBABILITY, /* u32 number */
>>>>>>>>       OVS_SAMPLE_ATTR_ACTIONS,     /* Nested OVS_ACTION_ATTR_* 
>>>>>>>> attributes. */
>>>>>>>>       OVS_SAMPLE_ATTR_SYSTEM,     /* Nested OVS_SAMPLE_SYSTEM_ATTR_* 
>>>>>>>> attributes. */
>>>>>>>>
>>>>>>>>       __OVS_SAMPLE_ATTR_MAX,
>>>>>>>> };
>>>>>>>>
>>>>>>>> enum ovs_sample_system_attr {
>>>>>>>>       OVS_SAMPLE_SYSTEM_ATTR_OBS_DOMAIN,  /* u32 number */
>>>>>>>>       OVS_SAMPLE_SYSTEM_ATTR_OBS_POINT, /* u32 number */
>>>>>>>>       OVS_SAMPLE_SYSTEM_ATTR_COOKIE,     /* Nested OVS_ACTION_ATTR_*
>>>>>>>>
>>>>>>>>       __OVS_SSAMPLE_SYSTEM_ATTR_MAX,
>>>>>>>> };
>>>>>>>>
>>>>>>>> WDYT?
>>>>>>>>
>>>>>>>
>>>>>>> Another benefit of nested attributes is the easier addition of other
>>>>>>> sample-related attributes, I already have a candidate: 
>>>>>>> OVS_SAMPLE_SYSTEM_TRUNC
>>>>>>>
>>>>>>> This will be passed to psample (md->trunc_size) who will truncate the 
>>>>>>> packet
>>>>>>> size to the specified value. This can be useful for large packets for 
>>>>>>> which we
>>>>>>> are only interested in the headers.
>>>>>>>
>>>>>>> WDYT?
>>>>>>>
>>>>>>> [snip]
>>>>>>>
>>>>>>
>>>>>> Hrm.  This makes me think these should not be attributes of a sample 
>>>>>> action at all,
>>>>>> but rather we should have a separate EMIT_SAMPLE action that just calls 
>>>>>> psample, so
>>>>>> we can combine it with other actions:
>>>>>>
>>>>>>    sample(..., 
>>>>>> actions(userspace(...),trunc(100),emit_sample(cookie=...)))
>>>>>>
>>>>>> or even:
>>>>>>
>>>>>>    sample(..., 
>>>>>> actions(clone(trunc(100),emit_sample(cookie=...)),userspace(...)))
>>>>>>
>>>>>> This will also simplify the requires_datapath_assistance() check as we 
>>>>>> can just
>>>>>> check the action and not iterate over the list of attributes.  And it 
>>>>>> will be
>>>>>> capable to only send this one action to the datapath and not the whole 
>>>>>> thing.
>>>>>>
>>>>>
>>>>> My suggestion of adding a OVS_SAMPLE_{}_TRUNC was precisely to avoid 
>>>>> using the
>>>>> existing trunc implementation.
>>>>>
>>>>> Truncation is implemented inside psample so we would not be maintaining 
>>>>> two
>>>>> implementations.
>>>>
>>>> We will maintain trunc action and a trunc attribute for sampling, so we 
>>>> will
>>>> have a duplication here.
>>>>
>>>>> psample implementation does not require packet clone (which is
>>>>> needed for OVS trunc) plus it can be easily optimized out if there is no
>>>>> listener in the multicast group (I plan to add that optimization in the 
>>>>> next
>>>>> kernel series).
>>>>
>>>> 'trunc' doesn't require a clone in case there are no other actions after, 
>>>> which
>>>> is the main use case for drop sampling.
>>>>
>>>> BTW, having explicit drop actions might have broken some of the assumptions
>>>> around this optimization, but it should be fixable.
>>>>
>>>> And psample itself might need a clone() in case there was a trunc() before
>>>> sampling.  psample will report skb->len as original packet size, but that
>>>> is will not be correct as we have a postponed cutlen, so the reported 
>>>> length
>>>> must be the length after the cut and that can't be done without actually
>>>> cutting the skb at this point.
>>>>
>>>>>
>>>>> Having said that, I'm not against adding a new action if that simplifies 
>>>>> things.
>>>>> I think it might simplify things in ovs-vswitchd and even in the kernel.
>>>>> However, I have one concern.
>>>>>
>>>>> IIUC and based on your examples, your suggestions means we would not have
>>>>> sampling rate inside emit_sample, right? This might be a problem because 
>>>>> psample
>>>>> does have sampling rate which I believe to be crucial to balance 
>>>>> performance
>>>>> impact and accuracy.
>>>>
>>>> psample doesn't limit the rate, it just reports a number.  Rate is part of
>>>> the main sample() action, i.e.
>>>>
>>>>   sample(sample=75%,actions(emit_sample(cookie=...)))
>>>>
>>>> Having rate also in the 'emit_sample' would be another API duplication.
>>>
>>> I agree, but this is the confusing part :) If you use the psample TC 
>>> action, it will have the rate-limiting as part of the action.
>>>
>>> So if we offload the above, we will have a nested tc psample action, the 
>>> outer one with the desired sample rate, and the inner one with a 100% 
>>> sample rate. Actually, the inner one needs to be handled by sw as it will 
>>> go to vswitchd (and than back to psample?). I guess this avoids the idea of 
>>> doing this fast… Maybe if emit_sample() is the only action we can do some 
>>> optimisation, but it needs to be translatable back and forth from TC to 
>>> OVS_ATTRs.
>>>
>>> I guess we need the same for OVS_ACTION_ATTR_USERSPACE actions, which by 
>>> the way Nvidia was going to work on once the sFlow part was accepted.
>>
>> The actual sample() action can not be offloaded in a generic case.
>> There is no smaple action in TC that would accept arbitrary action
>> list.  We can only offload a few corner cases.  Today we offload
>> none.  With the sFlow series we can offload a single specific
>> OVS_ACTION_ATTR_USERSPACE that contains USER_ACTION_COOKIE_SFLOW.
>>
>> And we can't offload more than one sampling method within a single
>> OVS_ACTION_ATTR_SAMPLE, because in TC they'll have independent
>> probabilities, which is not a desired behavior.
>>
>> So, we'll always have special case offloading for each particular
>> action inside the sample().
> 
> ACK, just wanted to clarify that we should take extra care when we do this.
> And we should build on top of the sflow series :)

Re-reading it. :)

> 
> Not sure about the naming, emit_sample, it’s not really a sample in the case
> we are not sampling anything, the action is always taken.

We're not sampling, we're emitting an already prepared sample.

> Maybe  OVS_ACTION_ATTR_LOCAL_COPY? But this comes close to 
> OVS_ACTION_ATTR_USERSPACE?
> 
>>>
>>>> BTW, "sampling" column is part of IPFIX and sFlow tables and you're not
>>>> adding a new table for this new smapling method, i.e. it will not be
>>>> possible to set sampling rate for psample with the current database
>>>> configuration.
>>>>
>>>>> This value must be present in the psample sample or else
>>>>> users will not know how inaccurate the sample is.
>>>>
>>>> Users can check the sampling configuration in the database if we will
>>>> have one.  And they are configuring it in the first place.  And it will
>>>> be confusing to see two rates in the same action list.
>>>>
>>>> It might be possible to carry the current rate in the OVS_CB(skb) and
>>>> pass it to the psample.
>>>>
>>>> In any case, the rate reported by psample will not be accurate if there
>>>> are nested sample actions, unless we carry and properly multiply / divide
>>>> probabilities when entering / exiting the sample action.
>>>>
>>>>>
>>>>>
>>>>>> Naming is still hard though.  I like the word "emit" because it doesn't 
>>>>>> have a
>>>>>> concept of directionality, i.e. we're not sending it to somewhere, we're 
>>>>>> just
>>>>>> emitting it to the wild.  That kind of matches with our vague definition 
>>>>>> of a
>>>>>> "local" or "system" sampling.
>>>>>>
>>>>>
>>>>> I like the word "emit".
>>>>>
>>>>>> Another option might be 'OUTPUT_SAMPLE/SAMPLE_OUTPUT', i.e. 
>>>>>> output(sample, cookie=...).
>>>>>>
>>>>>> The main concern is that we already have truncation implemented, so we 
>>>>>> can be more
>>>>>> flexible in a way we generate actions instead of adding new attributes.
>>>>>>
>>>>>> Thoughts?
>>>>>
>>>>> See my comment above. I am not suggesting we use the current trunc
>>>>> implementation nor to add a new one, just to use psample's.
>>>>>
>>>
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to