On 14 May 2024, at 13:05, Ilya Maximets wrote:

> On 5/14/24 12:14, Adrian Moreno wrote:
>>
>>
>> On 5/14/24 11:09 AM, Ilya Maximets wrote:
>>> On 5/14/24 09:39, Adrian Moreno wrote:
>>>>
>>>>
>>>> On 5/10/24 12:45 PM, Adrian Moreno wrote:
>>>>>
>>>>>
>>>>> On 5/10/24 12:06 PM, Eelco Chaudron wrote:
>>>>>> On 24 Apr 2024, at 21:53, Adrian Moreno wrote:
>>>>>>
>>>>>>> The new odp sample attributes allow userspace to specify a group_id and
>>>>>>> user-defined cookie to be passed down to psample.
>>>>>>>
>>>>>>> Add support for parsing and formatting such action.
>>>>>>>
>>>>>>> Signed-off-by: Adrian Moreno <[email protected]>
>>>>>>
>>>>>> Hi Adrian,
>>>>>>
>>>>>> See my comments below inline.
>>>>>>
>>>>>> //Eelco
>>>>>>
>>>>>>> ---
>>>>>>>    include/linux/openvswitch.h  |  49 +++++++++---
>>>>>>>    lib/odp-execute.c            |   3 +
>>>>>>>    lib/odp-util.c               | 150 
>>>>>>> ++++++++++++++++++++++++++---------
>>>>>>>    ofproto/ofproto-dpif-ipfix.c |   2 +
>>>>>>>    python/ovs/flow/odp.py       |   2 +
>>>>>>>    tests/odp.at                 |   5 ++
>>>>>>>    6 files changed, 163 insertions(+), 48 deletions(-)
>>>>>>>
>>>>>>> diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h
>>>>>>> index d9fb991ef..3e748405c 100644
>>>>>>> --- a/include/linux/openvswitch.h
>>>>>>> +++ b/include/linux/openvswitch.h
>>>>>>> @@ -696,6 +696,7 @@ enum ovs_flow_attr {
>>>>>>>    #define OVS_UFID_F_OMIT_MASK     (1 << 1)
>>>>>>>    #define OVS_UFID_F_OMIT_ACTIONS  (1 << 2)
>>>>>>>
>>>>>>> +#define OVS_PSAMPLE_COOKIE_MAX_SIZE 16
>>>>>>>    /**
>>>>>>>     * enum ovs_sample_attr - Attributes for %OVS_ACTION_ATTR_SAMPLE 
>>>>>>> action.
>>>>>>>     * @OVS_SAMPLE_ATTR_PROBABILITY: 32-bit fraction of packets to 
>>>>>>> sample with
>>>>>>> @@ -703,15 +704,27 @@ enum ovs_flow_attr {
>>>>>>>     * %UINT32_MAX samples all packets and intermediate values sample 
>>>>>>> intermediate
>>>>>>>     * fractions of packets.
>>>>>>>     * @OVS_SAMPLE_ATTR_ACTIONS: Set of actions to execute in sampling 
>>>>>>> event.
>>>>>>> - * Actions are passed as nested attributes.
>>>>>>> + * Actions are passed as nested attributes. Optional if
>>>>>>> + * OVS_SAMPLE_ATTR_PSAMPLE_GROUP is set.
>>>>>>> + * @OVS_SAMPLE_ATTR_PSAMPLE_GROUP: A 32-bit number to be used as 
>>>>>>> psample group.
>>>>>>> + * Optional if OVS_SAMPLE_ATTR_ACTIONS is set.
>>>>>>> + * @OVS_SAMPLE_ATTR_PSAMPLE_COOKIE: A variable-length binary cookie 
>>>>>>> that, if
>>>>>>> + * provided, will be copied to the psample cookie.
>>>>>>
>>>>>> Should we mention any limit on the actual length of the cookie?
>>>>>>
>>>>>> Any reason we explicitly call this psample? From an OVS perspective, this
>>>>>> could just be
>>>>>> SAMPLE_ATTR_FORWARD/OFFLOAD/DESTINATION/ENDPOINT/COLLECTOR/TARGET_COOKIE 
>>>>>> and
>>>>>> _GROUP. Other datapaths, do not have PSAMPLE.
>>>>>>
>>>>>
>>>>> See my response to your comment on the cover-letter for possible name 
>>>>> alternatives.
>>>>>
>>>>>
>>>>>> Thinking about it more, from an OVS perspective we should probably not 
>>>>>> even
>>>>>> send down a COOKIE, but we should send down an obs_domain_id and 
>>>>>> obs_point_id,
>>>>>> and then the kernel can move this into a cookie.
>>>>>>
>>>>>
>>>>> I did consider this. However, the opaque cookie fits well with the tc API 
>>>>> which
>>>>> does not have two 32-bit values but an opaque 128-bit cookie. So if the 
>>>>> action
>>>>> is offloaded OVS needs to "encode" the two 32-bit values into the cookie 
>>>>> anyways.
>>>>>
>>>>>> The collector itself could be called system/local collector, or 
>>>>>> something like
>>>>>> that. This way it can use for example psample for kernel and UDST probes 
>>>>>> for
>>>>>> usespace. Both can pass the group and cookie (obs_domain_id and 
>>>>>> obs_point_id).
>>>>>>
>>>>>> Also, the presence of any of this should not dictate the psample action, 
>>>>>> we
>>>>>> probably need a specific OVS_SAMPLE_ATTR_SYSTEM_TARGET type nl flag.
>>>>>>
>>>>>
>>>>> It clearer and it might simplify option verification a bit, but isn't a 
>>>>> bit
>>>>> redundant? There is no flag for action execution for instance, the 
>>>>> presence of
>>>>> the attribute is enough.
>>>>>
>>>>> An alternative would be to have nested attributes, e.g:
>>>>>
>>>>> enum ovs_sample_attr {
>>>>>       OVS_SAMPLE_ATTR_UNSPEC,
>>>>>       OVS_SAMPLE_ATTR_PROBABILITY, /* u32 number */
>>>>>       OVS_SAMPLE_ATTR_ACTIONS,     /* Nested OVS_ACTION_ATTR_* 
>>>>> attributes. */
>>>>>       OVS_SAMPLE_ATTR_SYSTEM,     /* Nested OVS_SAMPLE_SYSTEM_ATTR_* 
>>>>> attributes. */
>>>>>
>>>>>       __OVS_SAMPLE_ATTR_MAX,
>>>>> };
>>>>>
>>>>> enum ovs_sample_system_attr {
>>>>>       OVS_SAMPLE_SYSTEM_ATTR_OBS_DOMAIN,  /* u32 number */
>>>>>       OVS_SAMPLE_SYSTEM_ATTR_OBS_POINT, /* u32 number */
>>>>>       OVS_SAMPLE_SYSTEM_ATTR_COOKIE,     /* Nested OVS_ACTION_ATTR_*
>>>>>
>>>>>       __OVS_SSAMPLE_SYSTEM_ATTR_MAX,
>>>>> };
>>>>>
>>>>> WDYT?
>>>>>
>>>>
>>>> Another benefit of nested attributes is the easier addition of other
>>>> sample-related attributes, I already have a candidate: 
>>>> OVS_SAMPLE_SYSTEM_TRUNC
>>>>
>>>> This will be passed to psample (md->trunc_size) who will truncate the 
>>>> packet
>>>> size to the specified value. This can be useful for large packets for 
>>>> which we
>>>> are only interested in the headers.
>>>>
>>>> WDYT?
>>>>
>>>> [snip]
>>>>
>>>
>>> Hrm.  This makes me think these should not be attributes of a sample action 
>>> at all,
>>> but rather we should have a separate EMIT_SAMPLE action that just calls 
>>> psample, so
>>> we can combine it with other actions:
>>>
>>>    sample(..., actions(userspace(...),trunc(100),emit_sample(cookie=...)))
>>>
>>> or even:
>>>
>>>    sample(..., 
>>> actions(clone(trunc(100),emit_sample(cookie=...)),userspace(...)))
>>>
>>> This will also simplify the requires_datapath_assistance() check as we can 
>>> just
>>> check the action and not iterate over the list of attributes.  And it will 
>>> be
>>> capable to only send this one action to the datapath and not the whole 
>>> thing.
>>>
>>
>> My suggestion of adding a OVS_SAMPLE_{}_TRUNC was precisely to avoid using 
>> the
>> existing trunc implementation.
>>
>> Truncation is implemented inside psample so we would not be maintaining two
>> implementations.
>
> We will maintain trunc action and a trunc attribute for sampling, so we will
> have a duplication here.
>
>> psample implementation does not require packet clone (which is
>> needed for OVS trunc) plus it can be easily optimized out if there is no
>> listener in the multicast group (I plan to add that optimization in the next
>> kernel series).
>
> 'trunc' doesn't require a clone in case there are no other actions after, 
> which
> is the main use case for drop sampling.
>
> BTW, having explicit drop actions might have broken some of the assumptions
> around this optimization, but it should be fixable.
>
> And psample itself might need a clone() in case there was a trunc() before
> sampling.  psample will report skb->len as original packet size, but that
> is will not be correct as we have a postponed cutlen, so the reported length
> must be the length after the cut and that can't be done without actually
> cutting the skb at this point.
>
>>
>> Having said that, I'm not against adding a new action if that simplifies 
>> things.
>> I think it might simplify things in ovs-vswitchd and even in the kernel.
>> However, I have one concern.
>>
>> IIUC and based on your examples, your suggestions means we would not have
>> sampling rate inside emit_sample, right? This might be a problem because 
>> psample
>> does have sampling rate which I believe to be crucial to balance performance
>> impact and accuracy.
>
> psample doesn't limit the rate, it just reports a number.  Rate is part of
> the main sample() action, i.e.
>
>   sample(sample=75%,actions(emit_sample(cookie=...)))
>
> Having rate also in the 'emit_sample' would be another API duplication.

I agree, but this is the confusing part :) If you use the psample TC action, it 
will have the rate-limiting as part of the action.

So if we offload the above, we will have a nested tc psample action, the outer 
one with the desired sample rate, and the inner one with a 100% sample rate. 
Actually, the inner one needs to be handled by sw as it will go to vswitchd 
(and than back to psample?). I guess this avoids the idea of doing this fast… 
Maybe if emit_sample() is the only action we can do some optimisation, but it 
needs to be translatable back and forth from TC to OVS_ATTRs.

I guess we need the same for OVS_ACTION_ATTR_USERSPACE actions, which by the 
way Nvidia was going to work on once the sFlow part was accepted.

> BTW, "sampling" column is part of IPFIX and sFlow tables and you're not
> adding a new table for this new smapling method, i.e. it will not be
> possible to set sampling rate for psample with the current database
> configuration.
>
>> This value must be present in the psample sample or else
>> users will not know how inaccurate the sample is.
>
> Users can check the sampling configuration in the database if we will
> have one.  And they are configuring it in the first place.  And it will
> be confusing to see two rates in the same action list.
>
> It might be possible to carry the current rate in the OVS_CB(skb) and
> pass it to the psample.
>
> In any case, the rate reported by psample will not be accurate if there
> are nested sample actions, unless we carry and properly multiply / divide
> probabilities when entering / exiting the sample action.
>
>>
>>
>>> Naming is still hard though.  I like the word "emit" because it doesn't 
>>> have a
>>> concept of directionality, i.e. we're not sending it to somewhere, we're 
>>> just
>>> emitting it to the wild.  That kind of matches with our vague definition of 
>>> a
>>> "local" or "system" sampling.
>>>
>>
>> I like the word "emit".
>>
>>> Another option might be 'OUTPUT_SAMPLE/SAMPLE_OUTPUT', i.e. output(sample, 
>>> cookie=...).
>>>
>>> The main concern is that we already have truncation implemented, so we can 
>>> be more
>>> flexible in a way we generate actions instead of adding new attributes.
>>>
>>> Thoughts?
>>
>> See my comment above. I am not suggesting we use the current trunc
>> implementation nor to add a new one, just to use psample's.
>>

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to