On 14 May 2024, at 14:28, Ilya Maximets wrote:
> On 5/14/24 13:27, Eelco Chaudron wrote: >> >> >> On 14 May 2024, at 13:05, Ilya Maximets wrote: >> >>> On 5/14/24 12:14, Adrian Moreno wrote: >>>> >>>> >>>> On 5/14/24 11:09 AM, Ilya Maximets wrote: >>>>> On 5/14/24 09:39, Adrian Moreno wrote: >>>>>> >>>>>> >>>>>> On 5/10/24 12:45 PM, Adrian Moreno wrote: >>>>>>> >>>>>>> >>>>>>> On 5/10/24 12:06 PM, Eelco Chaudron wrote: >>>>>>>> On 24 Apr 2024, at 21:53, Adrian Moreno wrote: >>>>>>>> >>>>>>>>> The new odp sample attributes allow userspace to specify a group_id >>>>>>>>> and >>>>>>>>> user-defined cookie to be passed down to psample. >>>>>>>>> >>>>>>>>> Add support for parsing and formatting such action. >>>>>>>>> >>>>>>>>> Signed-off-by: Adrian Moreno <[email protected]> >>>>>>>> >>>>>>>> Hi Adrian, >>>>>>>> >>>>>>>> See my comments below inline. >>>>>>>> >>>>>>>> //Eelco >>>>>>>> >>>>>>>>> --- >>>>>>>>> include/linux/openvswitch.h | 49 +++++++++--- >>>>>>>>> lib/odp-execute.c | 3 + >>>>>>>>> lib/odp-util.c | 150 >>>>>>>>> ++++++++++++++++++++++++++--------- >>>>>>>>> ofproto/ofproto-dpif-ipfix.c | 2 + >>>>>>>>> python/ovs/flow/odp.py | 2 + >>>>>>>>> tests/odp.at | 5 ++ >>>>>>>>> 6 files changed, 163 insertions(+), 48 deletions(-) >>>>>>>>> >>>>>>>>> diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h >>>>>>>>> index d9fb991ef..3e748405c 100644 >>>>>>>>> --- a/include/linux/openvswitch.h >>>>>>>>> +++ b/include/linux/openvswitch.h >>>>>>>>> @@ -696,6 +696,7 @@ enum ovs_flow_attr { >>>>>>>>> #define OVS_UFID_F_OMIT_MASK (1 << 1) >>>>>>>>> #define OVS_UFID_F_OMIT_ACTIONS (1 << 2) >>>>>>>>> >>>>>>>>> +#define OVS_PSAMPLE_COOKIE_MAX_SIZE 16 >>>>>>>>> /** >>>>>>>>> * enum ovs_sample_attr - Attributes for %OVS_ACTION_ATTR_SAMPLE >>>>>>>>> action. >>>>>>>>> * @OVS_SAMPLE_ATTR_PROBABILITY: 32-bit fraction of packets to >>>>>>>>> sample with >>>>>>>>> @@ -703,15 +704,27 @@ enum ovs_flow_attr { >>>>>>>>> * %UINT32_MAX samples all packets and intermediate values sample >>>>>>>>> intermediate >>>>>>>>> * fractions of packets. >>>>>>>>> * @OVS_SAMPLE_ATTR_ACTIONS: Set of actions to execute in sampling >>>>>>>>> event. >>>>>>>>> - * Actions are passed as nested attributes. >>>>>>>>> + * Actions are passed as nested attributes. Optional if >>>>>>>>> + * OVS_SAMPLE_ATTR_PSAMPLE_GROUP is set. >>>>>>>>> + * @OVS_SAMPLE_ATTR_PSAMPLE_GROUP: A 32-bit number to be used as >>>>>>>>> psample group. >>>>>>>>> + * Optional if OVS_SAMPLE_ATTR_ACTIONS is set. >>>>>>>>> + * @OVS_SAMPLE_ATTR_PSAMPLE_COOKIE: A variable-length binary cookie >>>>>>>>> that, if >>>>>>>>> + * provided, will be copied to the psample cookie. >>>>>>>> >>>>>>>> Should we mention any limit on the actual length of the cookie? >>>>>>>> >>>>>>>> Any reason we explicitly call this psample? From an OVS perspective, >>>>>>>> this >>>>>>>> could just be >>>>>>>> SAMPLE_ATTR_FORWARD/OFFLOAD/DESTINATION/ENDPOINT/COLLECTOR/TARGET_COOKIE >>>>>>>> and >>>>>>>> _GROUP. Other datapaths, do not have PSAMPLE. >>>>>>>> >>>>>>> >>>>>>> See my response to your comment on the cover-letter for possible name >>>>>>> alternatives. >>>>>>> >>>>>>> >>>>>>>> Thinking about it more, from an OVS perspective we should probably not >>>>>>>> even >>>>>>>> send down a COOKIE, but we should send down an obs_domain_id and >>>>>>>> obs_point_id, >>>>>>>> and then the kernel can move this into a cookie. >>>>>>>> >>>>>>> >>>>>>> I did consider this. However, the opaque cookie fits well with the tc >>>>>>> API which >>>>>>> does not have two 32-bit values but an opaque 128-bit cookie. So if the >>>>>>> action >>>>>>> is offloaded OVS needs to "encode" the two 32-bit values into the >>>>>>> cookie anyways. >>>>>>> >>>>>>>> The collector itself could be called system/local collector, or >>>>>>>> something like >>>>>>>> that. This way it can use for example psample for kernel and UDST >>>>>>>> probes for >>>>>>>> usespace. Both can pass the group and cookie (obs_domain_id and >>>>>>>> obs_point_id). >>>>>>>> >>>>>>>> Also, the presence of any of this should not dictate the psample >>>>>>>> action, we >>>>>>>> probably need a specific OVS_SAMPLE_ATTR_SYSTEM_TARGET type nl flag. >>>>>>>> >>>>>>> >>>>>>> It clearer and it might simplify option verification a bit, but isn't a >>>>>>> bit >>>>>>> redundant? There is no flag for action execution for instance, the >>>>>>> presence of >>>>>>> the attribute is enough. >>>>>>> >>>>>>> An alternative would be to have nested attributes, e.g: >>>>>>> >>>>>>> enum ovs_sample_attr { >>>>>>> OVS_SAMPLE_ATTR_UNSPEC, >>>>>>> OVS_SAMPLE_ATTR_PROBABILITY, /* u32 number */ >>>>>>> OVS_SAMPLE_ATTR_ACTIONS, /* Nested OVS_ACTION_ATTR_* >>>>>>> attributes. */ >>>>>>> OVS_SAMPLE_ATTR_SYSTEM, /* Nested OVS_SAMPLE_SYSTEM_ATTR_* >>>>>>> attributes. */ >>>>>>> >>>>>>> __OVS_SAMPLE_ATTR_MAX, >>>>>>> }; >>>>>>> >>>>>>> enum ovs_sample_system_attr { >>>>>>> OVS_SAMPLE_SYSTEM_ATTR_OBS_DOMAIN, /* u32 number */ >>>>>>> OVS_SAMPLE_SYSTEM_ATTR_OBS_POINT, /* u32 number */ >>>>>>> OVS_SAMPLE_SYSTEM_ATTR_COOKIE, /* Nested OVS_ACTION_ATTR_* >>>>>>> >>>>>>> __OVS_SSAMPLE_SYSTEM_ATTR_MAX, >>>>>>> }; >>>>>>> >>>>>>> WDYT? >>>>>>> >>>>>> >>>>>> Another benefit of nested attributes is the easier addition of other >>>>>> sample-related attributes, I already have a candidate: >>>>>> OVS_SAMPLE_SYSTEM_TRUNC >>>>>> >>>>>> This will be passed to psample (md->trunc_size) who will truncate the >>>>>> packet >>>>>> size to the specified value. This can be useful for large packets for >>>>>> which we >>>>>> are only interested in the headers. >>>>>> >>>>>> WDYT? >>>>>> >>>>>> [snip] >>>>>> >>>>> >>>>> Hrm. This makes me think these should not be attributes of a sample >>>>> action at all, >>>>> but rather we should have a separate EMIT_SAMPLE action that just calls >>>>> psample, so >>>>> we can combine it with other actions: >>>>> >>>>> sample(..., actions(userspace(...),trunc(100),emit_sample(cookie=...))) >>>>> >>>>> or even: >>>>> >>>>> sample(..., >>>>> actions(clone(trunc(100),emit_sample(cookie=...)),userspace(...))) >>>>> >>>>> This will also simplify the requires_datapath_assistance() check as we >>>>> can just >>>>> check the action and not iterate over the list of attributes. And it >>>>> will be >>>>> capable to only send this one action to the datapath and not the whole >>>>> thing. >>>>> >>>> >>>> My suggestion of adding a OVS_SAMPLE_{}_TRUNC was precisely to avoid using >>>> the >>>> existing trunc implementation. >>>> >>>> Truncation is implemented inside psample so we would not be maintaining two >>>> implementations. >>> >>> We will maintain trunc action and a trunc attribute for sampling, so we will >>> have a duplication here. >>> >>>> psample implementation does not require packet clone (which is >>>> needed for OVS trunc) plus it can be easily optimized out if there is no >>>> listener in the multicast group (I plan to add that optimization in the >>>> next >>>> kernel series). >>> >>> 'trunc' doesn't require a clone in case there are no other actions after, >>> which >>> is the main use case for drop sampling. >>> >>> BTW, having explicit drop actions might have broken some of the assumptions >>> around this optimization, but it should be fixable. >>> >>> And psample itself might need a clone() in case there was a trunc() before >>> sampling. psample will report skb->len as original packet size, but that >>> is will not be correct as we have a postponed cutlen, so the reported length >>> must be the length after the cut and that can't be done without actually >>> cutting the skb at this point. >>> >>>> >>>> Having said that, I'm not against adding a new action if that simplifies >>>> things. >>>> I think it might simplify things in ovs-vswitchd and even in the kernel. >>>> However, I have one concern. >>>> >>>> IIUC and based on your examples, your suggestions means we would not have >>>> sampling rate inside emit_sample, right? This might be a problem because >>>> psample >>>> does have sampling rate which I believe to be crucial to balance >>>> performance >>>> impact and accuracy. >>> >>> psample doesn't limit the rate, it just reports a number. Rate is part of >>> the main sample() action, i.e. >>> >>> sample(sample=75%,actions(emit_sample(cookie=...))) >>> >>> Having rate also in the 'emit_sample' would be another API duplication. >> >> I agree, but this is the confusing part :) If you use the psample TC action, >> it will have the rate-limiting as part of the action. >> >> So if we offload the above, we will have a nested tc psample action, the >> outer one with the desired sample rate, and the inner one with a 100% sample >> rate. Actually, the inner one needs to be handled by sw as it will go to >> vswitchd (and than back to psample?). I guess this avoids the idea of doing >> this fast… Maybe if emit_sample() is the only action we can do some >> optimisation, but it needs to be translatable back and forth from TC to >> OVS_ATTRs. >> >> I guess we need the same for OVS_ACTION_ATTR_USERSPACE actions, which by the >> way Nvidia was going to work on once the sFlow part was accepted. > > The actual sample() action can not be offloaded in a generic case. > There is no smaple action in TC that would accept arbitrary action > list. We can only offload a few corner cases. Today we offload > none. With the sFlow series we can offload a single specific > OVS_ACTION_ATTR_USERSPACE that contains USER_ACTION_COOKIE_SFLOW. > > And we can't offload more than one sampling method within a single > OVS_ACTION_ATTR_SAMPLE, because in TC they'll have independent > probabilities, which is not a desired behavior. > > So, we'll always have special case offloading for each particular > action inside the sample(). ACK, just wanted to clarify that we should take extra care when we do this. And we should build on top of the sflow series :) Not sure about the naming, emit_sample, it’s not really a sample in the case we are not sampling anything, the action is always taken. Maybe OVS_ACTION_ATTR_LOCAL_COPY? But this comes close to OVS_ACTION_ATTR_USERSPACE? >> >>> BTW, "sampling" column is part of IPFIX and sFlow tables and you're not >>> adding a new table for this new smapling method, i.e. it will not be >>> possible to set sampling rate for psample with the current database >>> configuration. >>> >>>> This value must be present in the psample sample or else >>>> users will not know how inaccurate the sample is. >>> >>> Users can check the sampling configuration in the database if we will >>> have one. And they are configuring it in the first place. And it will >>> be confusing to see two rates in the same action list. >>> >>> It might be possible to carry the current rate in the OVS_CB(skb) and >>> pass it to the psample. >>> >>> In any case, the rate reported by psample will not be accurate if there >>> are nested sample actions, unless we carry and properly multiply / divide >>> probabilities when entering / exiting the sample action. >>> >>>> >>>> >>>>> Naming is still hard though. I like the word "emit" because it doesn't >>>>> have a >>>>> concept of directionality, i.e. we're not sending it to somewhere, we're >>>>> just >>>>> emitting it to the wild. That kind of matches with our vague definition >>>>> of a >>>>> "local" or "system" sampling. >>>>> >>>> >>>> I like the word "emit". >>>> >>>>> Another option might be 'OUTPUT_SAMPLE/SAMPLE_OUTPUT', i.e. >>>>> output(sample, cookie=...). >>>>> >>>>> The main concern is that we already have truncation implemented, so we >>>>> can be more >>>>> flexible in a way we generate actions instead of adding new attributes. >>>>> >>>>> Thoughts? >>>> >>>> See my comment above. I am not suggesting we use the current trunc >>>> implementation nor to add a new one, just to use psample's. >>>> >> _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
