On 14 May 2024, at 14:50, Ilya Maximets wrote:
> On 5/14/24 14:43, Eelco Chaudron wrote: >> >> >> On 14 May 2024, at 14:28, Ilya Maximets wrote: >> >>> On 5/14/24 13:27, Eelco Chaudron wrote: >>>> >>>> >>>> On 14 May 2024, at 13:05, Ilya Maximets wrote: >>>> >>>>> On 5/14/24 12:14, Adrian Moreno wrote: >>>>>> >>>>>> >>>>>> On 5/14/24 11:09 AM, Ilya Maximets wrote: >>>>>>> On 5/14/24 09:39, Adrian Moreno wrote: >>>>>>>> >>>>>>>> >>>>>>>> On 5/10/24 12:45 PM, Adrian Moreno wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> On 5/10/24 12:06 PM, Eelco Chaudron wrote: >>>>>>>>>> On 24 Apr 2024, at 21:53, Adrian Moreno wrote: >>>>>>>>>> >>>>>>>>>>> The new odp sample attributes allow userspace to specify a group_id >>>>>>>>>>> and >>>>>>>>>>> user-defined cookie to be passed down to psample. >>>>>>>>>>> >>>>>>>>>>> Add support for parsing and formatting such action. >>>>>>>>>>> >>>>>>>>>>> Signed-off-by: Adrian Moreno <[email protected]> >>>>>>>>>> >>>>>>>>>> Hi Adrian, >>>>>>>>>> >>>>>>>>>> See my comments below inline. >>>>>>>>>> >>>>>>>>>> //Eelco >>>>>>>>>> >>>>>>>>>>> --- >>>>>>>>>>> include/linux/openvswitch.h | 49 +++++++++--- >>>>>>>>>>> lib/odp-execute.c | 3 + >>>>>>>>>>> lib/odp-util.c | 150 >>>>>>>>>>> ++++++++++++++++++++++++++--------- >>>>>>>>>>> ofproto/ofproto-dpif-ipfix.c | 2 + >>>>>>>>>>> python/ovs/flow/odp.py | 2 + >>>>>>>>>>> tests/odp.at | 5 ++ >>>>>>>>>>> 6 files changed, 163 insertions(+), 48 deletions(-) >>>>>>>>>>> >>>>>>>>>>> diff --git a/include/linux/openvswitch.h >>>>>>>>>>> b/include/linux/openvswitch.h >>>>>>>>>>> index d9fb991ef..3e748405c 100644 >>>>>>>>>>> --- a/include/linux/openvswitch.h >>>>>>>>>>> +++ b/include/linux/openvswitch.h >>>>>>>>>>> @@ -696,6 +696,7 @@ enum ovs_flow_attr { >>>>>>>>>>> #define OVS_UFID_F_OMIT_MASK (1 << 1) >>>>>>>>>>> #define OVS_UFID_F_OMIT_ACTIONS (1 << 2) >>>>>>>>>>> >>>>>>>>>>> +#define OVS_PSAMPLE_COOKIE_MAX_SIZE 16 >>>>>>>>>>> /** >>>>>>>>>>> * enum ovs_sample_attr - Attributes for %OVS_ACTION_ATTR_SAMPLE >>>>>>>>>>> action. >>>>>>>>>>> * @OVS_SAMPLE_ATTR_PROBABILITY: 32-bit fraction of packets to >>>>>>>>>>> sample with >>>>>>>>>>> @@ -703,15 +704,27 @@ enum ovs_flow_attr { >>>>>>>>>>> * %UINT32_MAX samples all packets and intermediate values >>>>>>>>>>> sample intermediate >>>>>>>>>>> * fractions of packets. >>>>>>>>>>> * @OVS_SAMPLE_ATTR_ACTIONS: Set of actions to execute in >>>>>>>>>>> sampling event. >>>>>>>>>>> - * Actions are passed as nested attributes. >>>>>>>>>>> + * Actions are passed as nested attributes. Optional if >>>>>>>>>>> + * OVS_SAMPLE_ATTR_PSAMPLE_GROUP is set. >>>>>>>>>>> + * @OVS_SAMPLE_ATTR_PSAMPLE_GROUP: A 32-bit number to be used as >>>>>>>>>>> psample group. >>>>>>>>>>> + * Optional if OVS_SAMPLE_ATTR_ACTIONS is set. >>>>>>>>>>> + * @OVS_SAMPLE_ATTR_PSAMPLE_COOKIE: A variable-length binary >>>>>>>>>>> cookie that, if >>>>>>>>>>> + * provided, will be copied to the psample cookie. >>>>>>>>>> >>>>>>>>>> Should we mention any limit on the actual length of the cookie? >>>>>>>>>> >>>>>>>>>> Any reason we explicitly call this psample? From an OVS perspective, >>>>>>>>>> this >>>>>>>>>> could just be >>>>>>>>>> SAMPLE_ATTR_FORWARD/OFFLOAD/DESTINATION/ENDPOINT/COLLECTOR/TARGET_COOKIE >>>>>>>>>> and >>>>>>>>>> _GROUP. Other datapaths, do not have PSAMPLE. >>>>>>>>>> >>>>>>>>> >>>>>>>>> See my response to your comment on the cover-letter for possible name >>>>>>>>> alternatives. >>>>>>>>> >>>>>>>>> >>>>>>>>>> Thinking about it more, from an OVS perspective we should probably >>>>>>>>>> not even >>>>>>>>>> send down a COOKIE, but we should send down an obs_domain_id and >>>>>>>>>> obs_point_id, >>>>>>>>>> and then the kernel can move this into a cookie. >>>>>>>>>> >>>>>>>>> >>>>>>>>> I did consider this. However, the opaque cookie fits well with the tc >>>>>>>>> API which >>>>>>>>> does not have two 32-bit values but an opaque 128-bit cookie. So if >>>>>>>>> the action >>>>>>>>> is offloaded OVS needs to "encode" the two 32-bit values into the >>>>>>>>> cookie anyways. >>>>>>>>> >>>>>>>>>> The collector itself could be called system/local collector, or >>>>>>>>>> something like >>>>>>>>>> that. This way it can use for example psample for kernel and UDST >>>>>>>>>> probes for >>>>>>>>>> usespace. Both can pass the group and cookie (obs_domain_id and >>>>>>>>>> obs_point_id). >>>>>>>>>> >>>>>>>>>> Also, the presence of any of this should not dictate the psample >>>>>>>>>> action, we >>>>>>>>>> probably need a specific OVS_SAMPLE_ATTR_SYSTEM_TARGET type nl flag. >>>>>>>>>> >>>>>>>>> >>>>>>>>> It clearer and it might simplify option verification a bit, but isn't >>>>>>>>> a bit >>>>>>>>> redundant? There is no flag for action execution for instance, the >>>>>>>>> presence of >>>>>>>>> the attribute is enough. >>>>>>>>> >>>>>>>>> An alternative would be to have nested attributes, e.g: >>>>>>>>> >>>>>>>>> enum ovs_sample_attr { >>>>>>>>> OVS_SAMPLE_ATTR_UNSPEC, >>>>>>>>> OVS_SAMPLE_ATTR_PROBABILITY, /* u32 number */ >>>>>>>>> OVS_SAMPLE_ATTR_ACTIONS, /* Nested OVS_ACTION_ATTR_* >>>>>>>>> attributes. */ >>>>>>>>> OVS_SAMPLE_ATTR_SYSTEM, /* Nested OVS_SAMPLE_SYSTEM_ATTR_* >>>>>>>>> attributes. */ >>>>>>>>> >>>>>>>>> __OVS_SAMPLE_ATTR_MAX, >>>>>>>>> }; >>>>>>>>> >>>>>>>>> enum ovs_sample_system_attr { >>>>>>>>> OVS_SAMPLE_SYSTEM_ATTR_OBS_DOMAIN, /* u32 number */ >>>>>>>>> OVS_SAMPLE_SYSTEM_ATTR_OBS_POINT, /* u32 number */ >>>>>>>>> OVS_SAMPLE_SYSTEM_ATTR_COOKIE, /* Nested OVS_ACTION_ATTR_* >>>>>>>>> >>>>>>>>> __OVS_SSAMPLE_SYSTEM_ATTR_MAX, >>>>>>>>> }; >>>>>>>>> >>>>>>>>> WDYT? >>>>>>>>> >>>>>>>> >>>>>>>> Another benefit of nested attributes is the easier addition of other >>>>>>>> sample-related attributes, I already have a candidate: >>>>>>>> OVS_SAMPLE_SYSTEM_TRUNC >>>>>>>> >>>>>>>> This will be passed to psample (md->trunc_size) who will truncate the >>>>>>>> packet >>>>>>>> size to the specified value. This can be useful for large packets for >>>>>>>> which we >>>>>>>> are only interested in the headers. >>>>>>>> >>>>>>>> WDYT? >>>>>>>> >>>>>>>> [snip] >>>>>>>> >>>>>>> >>>>>>> Hrm. This makes me think these should not be attributes of a sample >>>>>>> action at all, >>>>>>> but rather we should have a separate EMIT_SAMPLE action that just calls >>>>>>> psample, so >>>>>>> we can combine it with other actions: >>>>>>> >>>>>>> sample(..., >>>>>>> actions(userspace(...),trunc(100),emit_sample(cookie=...))) >>>>>>> >>>>>>> or even: >>>>>>> >>>>>>> sample(..., >>>>>>> actions(clone(trunc(100),emit_sample(cookie=...)),userspace(...))) >>>>>>> >>>>>>> This will also simplify the requires_datapath_assistance() check as we >>>>>>> can just >>>>>>> check the action and not iterate over the list of attributes. And it >>>>>>> will be >>>>>>> capable to only send this one action to the datapath and not the whole >>>>>>> thing. >>>>>>> >>>>>> >>>>>> My suggestion of adding a OVS_SAMPLE_{}_TRUNC was precisely to avoid >>>>>> using the >>>>>> existing trunc implementation. >>>>>> >>>>>> Truncation is implemented inside psample so we would not be maintaining >>>>>> two >>>>>> implementations. >>>>> >>>>> We will maintain trunc action and a trunc attribute for sampling, so we >>>>> will >>>>> have a duplication here. >>>>> >>>>>> psample implementation does not require packet clone (which is >>>>>> needed for OVS trunc) plus it can be easily optimized out if there is no >>>>>> listener in the multicast group (I plan to add that optimization in the >>>>>> next >>>>>> kernel series). >>>>> >>>>> 'trunc' doesn't require a clone in case there are no other actions after, >>>>> which >>>>> is the main use case for drop sampling. >>>>> >>>>> BTW, having explicit drop actions might have broken some of the >>>>> assumptions >>>>> around this optimization, but it should be fixable. >>>>> >>>>> And psample itself might need a clone() in case there was a trunc() before >>>>> sampling. psample will report skb->len as original packet size, but that >>>>> is will not be correct as we have a postponed cutlen, so the reported >>>>> length >>>>> must be the length after the cut and that can't be done without actually >>>>> cutting the skb at this point. >>>>> >>>>>> >>>>>> Having said that, I'm not against adding a new action if that simplifies >>>>>> things. >>>>>> I think it might simplify things in ovs-vswitchd and even in the kernel. >>>>>> However, I have one concern. >>>>>> >>>>>> IIUC and based on your examples, your suggestions means we would not have >>>>>> sampling rate inside emit_sample, right? This might be a problem because >>>>>> psample >>>>>> does have sampling rate which I believe to be crucial to balance >>>>>> performance >>>>>> impact and accuracy. >>>>> >>>>> psample doesn't limit the rate, it just reports a number. Rate is part of >>>>> the main sample() action, i.e. >>>>> >>>>> sample(sample=75%,actions(emit_sample(cookie=...))) >>>>> >>>>> Having rate also in the 'emit_sample' would be another API duplication. >>>> >>>> I agree, but this is the confusing part :) If you use the psample TC >>>> action, it will have the rate-limiting as part of the action. >>>> >>>> So if we offload the above, we will have a nested tc psample action, the >>>> outer one with the desired sample rate, and the inner one with a 100% >>>> sample rate. Actually, the inner one needs to be handled by sw as it will >>>> go to vswitchd (and than back to psample?). I guess this avoids the idea >>>> of doing this fast… Maybe if emit_sample() is the only action we can do >>>> some optimisation, but it needs to be translatable back and forth from TC >>>> to OVS_ATTRs. >>>> >>>> I guess we need the same for OVS_ACTION_ATTR_USERSPACE actions, which by >>>> the way Nvidia was going to work on once the sFlow part was accepted. >>> >>> The actual sample() action can not be offloaded in a generic case. >>> There is no smaple action in TC that would accept arbitrary action >>> list. We can only offload a few corner cases. Today we offload >>> none. With the sFlow series we can offload a single specific >>> OVS_ACTION_ATTR_USERSPACE that contains USER_ACTION_COOKIE_SFLOW. >>> >>> And we can't offload more than one sampling method within a single >>> OVS_ACTION_ATTR_SAMPLE, because in TC they'll have independent >>> probabilities, which is not a desired behavior. >>> >>> So, we'll always have special case offloading for each particular >>> action inside the sample(). >> >> ACK, just wanted to clarify that we should take extra care when we do this. >> And we should build on top of the sflow series :) > > Re-reading it. :) > >> >> Not sure about the naming, emit_sample, it’s not really a sample in the case >> we are not sampling anything, the action is always taken. > > We're not sampling, we're emitting an already prepared sample. Are we? This action can be used without enclosing it in a sample() action, or am I missing something? >> Maybe OVS_ACTION_ATTR_LOCAL_COPY? But this comes close to >> OVS_ACTION_ATTR_USERSPACE? >> >>>> >>>>> BTW, "sampling" column is part of IPFIX and sFlow tables and you're not >>>>> adding a new table for this new smapling method, i.e. it will not be >>>>> possible to set sampling rate for psample with the current database >>>>> configuration. >>>>> >>>>>> This value must be present in the psample sample or else >>>>>> users will not know how inaccurate the sample is. >>>>> >>>>> Users can check the sampling configuration in the database if we will >>>>> have one. And they are configuring it in the first place. And it will >>>>> be confusing to see two rates in the same action list. >>>>> >>>>> It might be possible to carry the current rate in the OVS_CB(skb) and >>>>> pass it to the psample. >>>>> >>>>> In any case, the rate reported by psample will not be accurate if there >>>>> are nested sample actions, unless we carry and properly multiply / divide >>>>> probabilities when entering / exiting the sample action. >>>>> >>>>>> >>>>>> >>>>>>> Naming is still hard though. I like the word "emit" because it doesn't >>>>>>> have a >>>>>>> concept of directionality, i.e. we're not sending it to somewhere, >>>>>>> we're just >>>>>>> emitting it to the wild. That kind of matches with our vague >>>>>>> definition of a >>>>>>> "local" or "system" sampling. >>>>>>> >>>>>> >>>>>> I like the word "emit". >>>>>> >>>>>>> Another option might be 'OUTPUT_SAMPLE/SAMPLE_OUTPUT', i.e. >>>>>>> output(sample, cookie=...). >>>>>>> >>>>>>> The main concern is that we already have truncation implemented, so we >>>>>>> can be more >>>>>>> flexible in a way we generate actions instead of adding new attributes. >>>>>>> >>>>>>> Thoughts? >>>>>> >>>>>> See my comment above. I am not suggesting we use the current trunc >>>>>> implementation nor to add a new one, just to use psample's. >>>>>> >>>> >> _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
