On 14 May 2024, at 14:50, Ilya Maximets wrote:

> On 5/14/24 14:43, Eelco Chaudron wrote:
>>
>>
>> On 14 May 2024, at 14:28, Ilya Maximets wrote:
>>
>>> On 5/14/24 13:27, Eelco Chaudron wrote:
>>>>
>>>>
>>>> On 14 May 2024, at 13:05, Ilya Maximets wrote:
>>>>
>>>>> On 5/14/24 12:14, Adrian Moreno wrote:
>>>>>>
>>>>>>
>>>>>> On 5/14/24 11:09 AM, Ilya Maximets wrote:
>>>>>>> On 5/14/24 09:39, Adrian Moreno wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 5/10/24 12:45 PM, Adrian Moreno wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 5/10/24 12:06 PM, Eelco Chaudron wrote:
>>>>>>>>>> On 24 Apr 2024, at 21:53, Adrian Moreno wrote:
>>>>>>>>>>
>>>>>>>>>>> The new odp sample attributes allow userspace to specify a group_id 
>>>>>>>>>>> and
>>>>>>>>>>> user-defined cookie to be passed down to psample.
>>>>>>>>>>>
>>>>>>>>>>> Add support for parsing and formatting such action.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Adrian Moreno <[email protected]>
>>>>>>>>>>
>>>>>>>>>> Hi Adrian,
>>>>>>>>>>
>>>>>>>>>> See my comments below inline.
>>>>>>>>>>
>>>>>>>>>> //Eelco
>>>>>>>>>>
>>>>>>>>>>> ---
>>>>>>>>>>>    include/linux/openvswitch.h  |  49 +++++++++---
>>>>>>>>>>>    lib/odp-execute.c            |   3 +
>>>>>>>>>>>    lib/odp-util.c               | 150 
>>>>>>>>>>> ++++++++++++++++++++++++++---------
>>>>>>>>>>>    ofproto/ofproto-dpif-ipfix.c |   2 +
>>>>>>>>>>>    python/ovs/flow/odp.py       |   2 +
>>>>>>>>>>>    tests/odp.at                 |   5 ++
>>>>>>>>>>>    6 files changed, 163 insertions(+), 48 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/include/linux/openvswitch.h 
>>>>>>>>>>> b/include/linux/openvswitch.h
>>>>>>>>>>> index d9fb991ef..3e748405c 100644
>>>>>>>>>>> --- a/include/linux/openvswitch.h
>>>>>>>>>>> +++ b/include/linux/openvswitch.h
>>>>>>>>>>> @@ -696,6 +696,7 @@ enum ovs_flow_attr {
>>>>>>>>>>>    #define OVS_UFID_F_OMIT_MASK     (1 << 1)
>>>>>>>>>>>    #define OVS_UFID_F_OMIT_ACTIONS  (1 << 2)
>>>>>>>>>>>
>>>>>>>>>>> +#define OVS_PSAMPLE_COOKIE_MAX_SIZE 16
>>>>>>>>>>>    /**
>>>>>>>>>>>     * enum ovs_sample_attr - Attributes for %OVS_ACTION_ATTR_SAMPLE 
>>>>>>>>>>> action.
>>>>>>>>>>>     * @OVS_SAMPLE_ATTR_PROBABILITY: 32-bit fraction of packets to 
>>>>>>>>>>> sample with
>>>>>>>>>>> @@ -703,15 +704,27 @@ enum ovs_flow_attr {
>>>>>>>>>>>     * %UINT32_MAX samples all packets and intermediate values 
>>>>>>>>>>> sample intermediate
>>>>>>>>>>>     * fractions of packets.
>>>>>>>>>>>     * @OVS_SAMPLE_ATTR_ACTIONS: Set of actions to execute in 
>>>>>>>>>>> sampling event.
>>>>>>>>>>> - * Actions are passed as nested attributes.
>>>>>>>>>>> + * Actions are passed as nested attributes. Optional if
>>>>>>>>>>> + * OVS_SAMPLE_ATTR_PSAMPLE_GROUP is set.
>>>>>>>>>>> + * @OVS_SAMPLE_ATTR_PSAMPLE_GROUP: A 32-bit number to be used as 
>>>>>>>>>>> psample group.
>>>>>>>>>>> + * Optional if OVS_SAMPLE_ATTR_ACTIONS is set.
>>>>>>>>>>> + * @OVS_SAMPLE_ATTR_PSAMPLE_COOKIE: A variable-length binary 
>>>>>>>>>>> cookie that, if
>>>>>>>>>>> + * provided, will be copied to the psample cookie.
>>>>>>>>>>
>>>>>>>>>> Should we mention any limit on the actual length of the cookie?
>>>>>>>>>>
>>>>>>>>>> Any reason we explicitly call this psample? From an OVS perspective, 
>>>>>>>>>> this
>>>>>>>>>> could just be
>>>>>>>>>> SAMPLE_ATTR_FORWARD/OFFLOAD/DESTINATION/ENDPOINT/COLLECTOR/TARGET_COOKIE
>>>>>>>>>>  and
>>>>>>>>>> _GROUP. Other datapaths, do not have PSAMPLE.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> See my response to your comment on the cover-letter for possible name 
>>>>>>>>> alternatives.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Thinking about it more, from an OVS perspective we should probably 
>>>>>>>>>> not even
>>>>>>>>>> send down a COOKIE, but we should send down an obs_domain_id and 
>>>>>>>>>> obs_point_id,
>>>>>>>>>> and then the kernel can move this into a cookie.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I did consider this. However, the opaque cookie fits well with the tc 
>>>>>>>>> API which
>>>>>>>>> does not have two 32-bit values but an opaque 128-bit cookie. So if 
>>>>>>>>> the action
>>>>>>>>> is offloaded OVS needs to "encode" the two 32-bit values into the 
>>>>>>>>> cookie anyways.
>>>>>>>>>
>>>>>>>>>> The collector itself could be called system/local collector, or 
>>>>>>>>>> something like
>>>>>>>>>> that. This way it can use for example psample for kernel and UDST 
>>>>>>>>>> probes for
>>>>>>>>>> usespace. Both can pass the group and cookie (obs_domain_id and 
>>>>>>>>>> obs_point_id).
>>>>>>>>>>
>>>>>>>>>> Also, the presence of any of this should not dictate the psample 
>>>>>>>>>> action, we
>>>>>>>>>> probably need a specific OVS_SAMPLE_ATTR_SYSTEM_TARGET type nl flag.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> It clearer and it might simplify option verification a bit, but isn't 
>>>>>>>>> a bit
>>>>>>>>> redundant? There is no flag for action execution for instance, the 
>>>>>>>>> presence of
>>>>>>>>> the attribute is enough.
>>>>>>>>>
>>>>>>>>> An alternative would be to have nested attributes, e.g:
>>>>>>>>>
>>>>>>>>> enum ovs_sample_attr {
>>>>>>>>>       OVS_SAMPLE_ATTR_UNSPEC,
>>>>>>>>>       OVS_SAMPLE_ATTR_PROBABILITY, /* u32 number */
>>>>>>>>>       OVS_SAMPLE_ATTR_ACTIONS,     /* Nested OVS_ACTION_ATTR_* 
>>>>>>>>> attributes. */
>>>>>>>>>       OVS_SAMPLE_ATTR_SYSTEM,     /* Nested OVS_SAMPLE_SYSTEM_ATTR_* 
>>>>>>>>> attributes. */
>>>>>>>>>
>>>>>>>>>       __OVS_SAMPLE_ATTR_MAX,
>>>>>>>>> };
>>>>>>>>>
>>>>>>>>> enum ovs_sample_system_attr {
>>>>>>>>>       OVS_SAMPLE_SYSTEM_ATTR_OBS_DOMAIN,  /* u32 number */
>>>>>>>>>       OVS_SAMPLE_SYSTEM_ATTR_OBS_POINT, /* u32 number */
>>>>>>>>>       OVS_SAMPLE_SYSTEM_ATTR_COOKIE,     /* Nested OVS_ACTION_ATTR_*
>>>>>>>>>
>>>>>>>>>       __OVS_SSAMPLE_SYSTEM_ATTR_MAX,
>>>>>>>>> };
>>>>>>>>>
>>>>>>>>> WDYT?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Another benefit of nested attributes is the easier addition of other
>>>>>>>> sample-related attributes, I already have a candidate: 
>>>>>>>> OVS_SAMPLE_SYSTEM_TRUNC
>>>>>>>>
>>>>>>>> This will be passed to psample (md->trunc_size) who will truncate the 
>>>>>>>> packet
>>>>>>>> size to the specified value. This can be useful for large packets for 
>>>>>>>> which we
>>>>>>>> are only interested in the headers.
>>>>>>>>
>>>>>>>> WDYT?
>>>>>>>>
>>>>>>>> [snip]
>>>>>>>>
>>>>>>>
>>>>>>> Hrm.  This makes me think these should not be attributes of a sample 
>>>>>>> action at all,
>>>>>>> but rather we should have a separate EMIT_SAMPLE action that just calls 
>>>>>>> psample, so
>>>>>>> we can combine it with other actions:
>>>>>>>
>>>>>>>    sample(..., 
>>>>>>> actions(userspace(...),trunc(100),emit_sample(cookie=...)))
>>>>>>>
>>>>>>> or even:
>>>>>>>
>>>>>>>    sample(..., 
>>>>>>> actions(clone(trunc(100),emit_sample(cookie=...)),userspace(...)))
>>>>>>>
>>>>>>> This will also simplify the requires_datapath_assistance() check as we 
>>>>>>> can just
>>>>>>> check the action and not iterate over the list of attributes.  And it 
>>>>>>> will be
>>>>>>> capable to only send this one action to the datapath and not the whole 
>>>>>>> thing.
>>>>>>>
>>>>>>
>>>>>> My suggestion of adding a OVS_SAMPLE_{}_TRUNC was precisely to avoid 
>>>>>> using the
>>>>>> existing trunc implementation.
>>>>>>
>>>>>> Truncation is implemented inside psample so we would not be maintaining 
>>>>>> two
>>>>>> implementations.
>>>>>
>>>>> We will maintain trunc action and a trunc attribute for sampling, so we 
>>>>> will
>>>>> have a duplication here.
>>>>>
>>>>>> psample implementation does not require packet clone (which is
>>>>>> needed for OVS trunc) plus it can be easily optimized out if there is no
>>>>>> listener in the multicast group (I plan to add that optimization in the 
>>>>>> next
>>>>>> kernel series).
>>>>>
>>>>> 'trunc' doesn't require a clone in case there are no other actions after, 
>>>>> which
>>>>> is the main use case for drop sampling.
>>>>>
>>>>> BTW, having explicit drop actions might have broken some of the 
>>>>> assumptions
>>>>> around this optimization, but it should be fixable.
>>>>>
>>>>> And psample itself might need a clone() in case there was a trunc() before
>>>>> sampling.  psample will report skb->len as original packet size, but that
>>>>> is will not be correct as we have a postponed cutlen, so the reported 
>>>>> length
>>>>> must be the length after the cut and that can't be done without actually
>>>>> cutting the skb at this point.
>>>>>
>>>>>>
>>>>>> Having said that, I'm not against adding a new action if that simplifies 
>>>>>> things.
>>>>>> I think it might simplify things in ovs-vswitchd and even in the kernel.
>>>>>> However, I have one concern.
>>>>>>
>>>>>> IIUC and based on your examples, your suggestions means we would not have
>>>>>> sampling rate inside emit_sample, right? This might be a problem because 
>>>>>> psample
>>>>>> does have sampling rate which I believe to be crucial to balance 
>>>>>> performance
>>>>>> impact and accuracy.
>>>>>
>>>>> psample doesn't limit the rate, it just reports a number.  Rate is part of
>>>>> the main sample() action, i.e.
>>>>>
>>>>>   sample(sample=75%,actions(emit_sample(cookie=...)))
>>>>>
>>>>> Having rate also in the 'emit_sample' would be another API duplication.
>>>>
>>>> I agree, but this is the confusing part :) If you use the psample TC 
>>>> action, it will have the rate-limiting as part of the action.
>>>>
>>>> So if we offload the above, we will have a nested tc psample action, the 
>>>> outer one with the desired sample rate, and the inner one with a 100% 
>>>> sample rate. Actually, the inner one needs to be handled by sw as it will 
>>>> go to vswitchd (and than back to psample?). I guess this avoids the idea 
>>>> of doing this fast… Maybe if emit_sample() is the only action we can do 
>>>> some optimisation, but it needs to be translatable back and forth from TC 
>>>> to OVS_ATTRs.
>>>>
>>>> I guess we need the same for OVS_ACTION_ATTR_USERSPACE actions, which by 
>>>> the way Nvidia was going to work on once the sFlow part was accepted.
>>>
>>> The actual sample() action can not be offloaded in a generic case.
>>> There is no smaple action in TC that would accept arbitrary action
>>> list.  We can only offload a few corner cases.  Today we offload
>>> none.  With the sFlow series we can offload a single specific
>>> OVS_ACTION_ATTR_USERSPACE that contains USER_ACTION_COOKIE_SFLOW.
>>>
>>> And we can't offload more than one sampling method within a single
>>> OVS_ACTION_ATTR_SAMPLE, because in TC they'll have independent
>>> probabilities, which is not a desired behavior.
>>>
>>> So, we'll always have special case offloading for each particular
>>> action inside the sample().
>>
>> ACK, just wanted to clarify that we should take extra care when we do this.
>> And we should build on top of the sflow series :)
>
> Re-reading it. :)
>
>>
>> Not sure about the naming, emit_sample, it’s not really a sample in the case
>> we are not sampling anything, the action is always taken.
>
> We're not sampling, we're emitting an already prepared sample.

Are we? This action can be used without enclosing it in a sample() action, or 
am I missing something?

>> Maybe  OVS_ACTION_ATTR_LOCAL_COPY? But this comes close to 
>> OVS_ACTION_ATTR_USERSPACE?
>>
>>>>
>>>>> BTW, "sampling" column is part of IPFIX and sFlow tables and you're not
>>>>> adding a new table for this new smapling method, i.e. it will not be
>>>>> possible to set sampling rate for psample with the current database
>>>>> configuration.
>>>>>
>>>>>> This value must be present in the psample sample or else
>>>>>> users will not know how inaccurate the sample is.
>>>>>
>>>>> Users can check the sampling configuration in the database if we will
>>>>> have one.  And they are configuring it in the first place.  And it will
>>>>> be confusing to see two rates in the same action list.
>>>>>
>>>>> It might be possible to carry the current rate in the OVS_CB(skb) and
>>>>> pass it to the psample.
>>>>>
>>>>> In any case, the rate reported by psample will not be accurate if there
>>>>> are nested sample actions, unless we carry and properly multiply / divide
>>>>> probabilities when entering / exiting the sample action.
>>>>>
>>>>>>
>>>>>>
>>>>>>> Naming is still hard though.  I like the word "emit" because it doesn't 
>>>>>>> have a
>>>>>>> concept of directionality, i.e. we're not sending it to somewhere, 
>>>>>>> we're just
>>>>>>> emitting it to the wild.  That kind of matches with our vague 
>>>>>>> definition of a
>>>>>>> "local" or "system" sampling.
>>>>>>>
>>>>>>
>>>>>> I like the word "emit".
>>>>>>
>>>>>>> Another option might be 'OUTPUT_SAMPLE/SAMPLE_OUTPUT', i.e. 
>>>>>>> output(sample, cookie=...).
>>>>>>>
>>>>>>> The main concern is that we already have truncation implemented, so we 
>>>>>>> can be more
>>>>>>> flexible in a way we generate actions instead of adding new attributes.
>>>>>>>
>>>>>>> Thoughts?
>>>>>>
>>>>>> See my comment above. I am not suggesting we use the current trunc
>>>>>> implementation nor to add a new one, just to use psample's.
>>>>>>
>>>>
>>

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to