On 5/14/24 14:43, Eelco Chaudron wrote: > > > On 14 May 2024, at 14:28, Ilya Maximets wrote: > >> On 5/14/24 13:27, Eelco Chaudron wrote: >>> >>> >>> On 14 May 2024, at 13:05, Ilya Maximets wrote: >>> >>>> On 5/14/24 12:14, Adrian Moreno wrote: >>>>> >>>>> >>>>> On 5/14/24 11:09 AM, Ilya Maximets wrote: >>>>>> On 5/14/24 09:39, Adrian Moreno wrote: >>>>>>> >>>>>>> >>>>>>> On 5/10/24 12:45 PM, Adrian Moreno wrote: >>>>>>>> >>>>>>>> >>>>>>>> On 5/10/24 12:06 PM, Eelco Chaudron wrote: >>>>>>>>> On 24 Apr 2024, at 21:53, Adrian Moreno wrote: >>>>>>>>> >>>>>>>>>> The new odp sample attributes allow userspace to specify a group_id >>>>>>>>>> and >>>>>>>>>> user-defined cookie to be passed down to psample. >>>>>>>>>> >>>>>>>>>> Add support for parsing and formatting such action. >>>>>>>>>> >>>>>>>>>> Signed-off-by: Adrian Moreno <[email protected]> >>>>>>>>> >>>>>>>>> Hi Adrian, >>>>>>>>> >>>>>>>>> See my comments below inline. >>>>>>>>> >>>>>>>>> //Eelco >>>>>>>>> >>>>>>>>>> --- >>>>>>>>>> include/linux/openvswitch.h | 49 +++++++++--- >>>>>>>>>> lib/odp-execute.c | 3 + >>>>>>>>>> lib/odp-util.c | 150 >>>>>>>>>> ++++++++++++++++++++++++++--------- >>>>>>>>>> ofproto/ofproto-dpif-ipfix.c | 2 + >>>>>>>>>> python/ovs/flow/odp.py | 2 + >>>>>>>>>> tests/odp.at | 5 ++ >>>>>>>>>> 6 files changed, 163 insertions(+), 48 deletions(-) >>>>>>>>>> >>>>>>>>>> diff --git a/include/linux/openvswitch.h >>>>>>>>>> b/include/linux/openvswitch.h >>>>>>>>>> index d9fb991ef..3e748405c 100644 >>>>>>>>>> --- a/include/linux/openvswitch.h >>>>>>>>>> +++ b/include/linux/openvswitch.h >>>>>>>>>> @@ -696,6 +696,7 @@ enum ovs_flow_attr { >>>>>>>>>> #define OVS_UFID_F_OMIT_MASK (1 << 1) >>>>>>>>>> #define OVS_UFID_F_OMIT_ACTIONS (1 << 2) >>>>>>>>>> >>>>>>>>>> +#define OVS_PSAMPLE_COOKIE_MAX_SIZE 16 >>>>>>>>>> /** >>>>>>>>>> * enum ovs_sample_attr - Attributes for %OVS_ACTION_ATTR_SAMPLE >>>>>>>>>> action. >>>>>>>>>> * @OVS_SAMPLE_ATTR_PROBABILITY: 32-bit fraction of packets to >>>>>>>>>> sample with >>>>>>>>>> @@ -703,15 +704,27 @@ enum ovs_flow_attr { >>>>>>>>>> * %UINT32_MAX samples all packets and intermediate values sample >>>>>>>>>> intermediate >>>>>>>>>> * fractions of packets. >>>>>>>>>> * @OVS_SAMPLE_ATTR_ACTIONS: Set of actions to execute in >>>>>>>>>> sampling event. >>>>>>>>>> - * Actions are passed as nested attributes. >>>>>>>>>> + * Actions are passed as nested attributes. Optional if >>>>>>>>>> + * OVS_SAMPLE_ATTR_PSAMPLE_GROUP is set. >>>>>>>>>> + * @OVS_SAMPLE_ATTR_PSAMPLE_GROUP: A 32-bit number to be used as >>>>>>>>>> psample group. >>>>>>>>>> + * Optional if OVS_SAMPLE_ATTR_ACTIONS is set. >>>>>>>>>> + * @OVS_SAMPLE_ATTR_PSAMPLE_COOKIE: A variable-length binary cookie >>>>>>>>>> that, if >>>>>>>>>> + * provided, will be copied to the psample cookie. >>>>>>>>> >>>>>>>>> Should we mention any limit on the actual length of the cookie? >>>>>>>>> >>>>>>>>> Any reason we explicitly call this psample? From an OVS perspective, >>>>>>>>> this >>>>>>>>> could just be >>>>>>>>> SAMPLE_ATTR_FORWARD/OFFLOAD/DESTINATION/ENDPOINT/COLLECTOR/TARGET_COOKIE >>>>>>>>> and >>>>>>>>> _GROUP. Other datapaths, do not have PSAMPLE. >>>>>>>>> >>>>>>>> >>>>>>>> See my response to your comment on the cover-letter for possible name >>>>>>>> alternatives. >>>>>>>> >>>>>>>> >>>>>>>>> Thinking about it more, from an OVS perspective we should probably >>>>>>>>> not even >>>>>>>>> send down a COOKIE, but we should send down an obs_domain_id and >>>>>>>>> obs_point_id, >>>>>>>>> and then the kernel can move this into a cookie. >>>>>>>>> >>>>>>>> >>>>>>>> I did consider this. However, the opaque cookie fits well with the tc >>>>>>>> API which >>>>>>>> does not have two 32-bit values but an opaque 128-bit cookie. So if >>>>>>>> the action >>>>>>>> is offloaded OVS needs to "encode" the two 32-bit values into the >>>>>>>> cookie anyways. >>>>>>>> >>>>>>>>> The collector itself could be called system/local collector, or >>>>>>>>> something like >>>>>>>>> that. This way it can use for example psample for kernel and UDST >>>>>>>>> probes for >>>>>>>>> usespace. Both can pass the group and cookie (obs_domain_id and >>>>>>>>> obs_point_id). >>>>>>>>> >>>>>>>>> Also, the presence of any of this should not dictate the psample >>>>>>>>> action, we >>>>>>>>> probably need a specific OVS_SAMPLE_ATTR_SYSTEM_TARGET type nl flag. >>>>>>>>> >>>>>>>> >>>>>>>> It clearer and it might simplify option verification a bit, but isn't >>>>>>>> a bit >>>>>>>> redundant? There is no flag for action execution for instance, the >>>>>>>> presence of >>>>>>>> the attribute is enough. >>>>>>>> >>>>>>>> An alternative would be to have nested attributes, e.g: >>>>>>>> >>>>>>>> enum ovs_sample_attr { >>>>>>>> OVS_SAMPLE_ATTR_UNSPEC, >>>>>>>> OVS_SAMPLE_ATTR_PROBABILITY, /* u32 number */ >>>>>>>> OVS_SAMPLE_ATTR_ACTIONS, /* Nested OVS_ACTION_ATTR_* >>>>>>>> attributes. */ >>>>>>>> OVS_SAMPLE_ATTR_SYSTEM, /* Nested OVS_SAMPLE_SYSTEM_ATTR_* >>>>>>>> attributes. */ >>>>>>>> >>>>>>>> __OVS_SAMPLE_ATTR_MAX, >>>>>>>> }; >>>>>>>> >>>>>>>> enum ovs_sample_system_attr { >>>>>>>> OVS_SAMPLE_SYSTEM_ATTR_OBS_DOMAIN, /* u32 number */ >>>>>>>> OVS_SAMPLE_SYSTEM_ATTR_OBS_POINT, /* u32 number */ >>>>>>>> OVS_SAMPLE_SYSTEM_ATTR_COOKIE, /* Nested OVS_ACTION_ATTR_* >>>>>>>> >>>>>>>> __OVS_SSAMPLE_SYSTEM_ATTR_MAX, >>>>>>>> }; >>>>>>>> >>>>>>>> WDYT? >>>>>>>> >>>>>>> >>>>>>> Another benefit of nested attributes is the easier addition of other >>>>>>> sample-related attributes, I already have a candidate: >>>>>>> OVS_SAMPLE_SYSTEM_TRUNC >>>>>>> >>>>>>> This will be passed to psample (md->trunc_size) who will truncate the >>>>>>> packet >>>>>>> size to the specified value. This can be useful for large packets for >>>>>>> which we >>>>>>> are only interested in the headers. >>>>>>> >>>>>>> WDYT? >>>>>>> >>>>>>> [snip] >>>>>>> >>>>>> >>>>>> Hrm. This makes me think these should not be attributes of a sample >>>>>> action at all, >>>>>> but rather we should have a separate EMIT_SAMPLE action that just calls >>>>>> psample, so >>>>>> we can combine it with other actions: >>>>>> >>>>>> sample(..., >>>>>> actions(userspace(...),trunc(100),emit_sample(cookie=...))) >>>>>> >>>>>> or even: >>>>>> >>>>>> sample(..., >>>>>> actions(clone(trunc(100),emit_sample(cookie=...)),userspace(...))) >>>>>> >>>>>> This will also simplify the requires_datapath_assistance() check as we >>>>>> can just >>>>>> check the action and not iterate over the list of attributes. And it >>>>>> will be >>>>>> capable to only send this one action to the datapath and not the whole >>>>>> thing. >>>>>> >>>>> >>>>> My suggestion of adding a OVS_SAMPLE_{}_TRUNC was precisely to avoid >>>>> using the >>>>> existing trunc implementation. >>>>> >>>>> Truncation is implemented inside psample so we would not be maintaining >>>>> two >>>>> implementations. >>>> >>>> We will maintain trunc action and a trunc attribute for sampling, so we >>>> will >>>> have a duplication here. >>>> >>>>> psample implementation does not require packet clone (which is >>>>> needed for OVS trunc) plus it can be easily optimized out if there is no >>>>> listener in the multicast group (I plan to add that optimization in the >>>>> next >>>>> kernel series). >>>> >>>> 'trunc' doesn't require a clone in case there are no other actions after, >>>> which >>>> is the main use case for drop sampling. >>>> >>>> BTW, having explicit drop actions might have broken some of the assumptions >>>> around this optimization, but it should be fixable. >>>> >>>> And psample itself might need a clone() in case there was a trunc() before >>>> sampling. psample will report skb->len as original packet size, but that >>>> is will not be correct as we have a postponed cutlen, so the reported >>>> length >>>> must be the length after the cut and that can't be done without actually >>>> cutting the skb at this point. >>>> >>>>> >>>>> Having said that, I'm not against adding a new action if that simplifies >>>>> things. >>>>> I think it might simplify things in ovs-vswitchd and even in the kernel. >>>>> However, I have one concern. >>>>> >>>>> IIUC and based on your examples, your suggestions means we would not have >>>>> sampling rate inside emit_sample, right? This might be a problem because >>>>> psample >>>>> does have sampling rate which I believe to be crucial to balance >>>>> performance >>>>> impact and accuracy. >>>> >>>> psample doesn't limit the rate, it just reports a number. Rate is part of >>>> the main sample() action, i.e. >>>> >>>> sample(sample=75%,actions(emit_sample(cookie=...))) >>>> >>>> Having rate also in the 'emit_sample' would be another API duplication. >>> >>> I agree, but this is the confusing part :) If you use the psample TC >>> action, it will have the rate-limiting as part of the action. >>> >>> So if we offload the above, we will have a nested tc psample action, the >>> outer one with the desired sample rate, and the inner one with a 100% >>> sample rate. Actually, the inner one needs to be handled by sw as it will >>> go to vswitchd (and than back to psample?). I guess this avoids the idea of >>> doing this fast… Maybe if emit_sample() is the only action we can do some >>> optimisation, but it needs to be translatable back and forth from TC to >>> OVS_ATTRs. >>> >>> I guess we need the same for OVS_ACTION_ATTR_USERSPACE actions, which by >>> the way Nvidia was going to work on once the sFlow part was accepted. >> >> The actual sample() action can not be offloaded in a generic case. >> There is no smaple action in TC that would accept arbitrary action >> list. We can only offload a few corner cases. Today we offload >> none. With the sFlow series we can offload a single specific >> OVS_ACTION_ATTR_USERSPACE that contains USER_ACTION_COOKIE_SFLOW. >> >> And we can't offload more than one sampling method within a single >> OVS_ACTION_ATTR_SAMPLE, because in TC they'll have independent >> probabilities, which is not a desired behavior. >> >> So, we'll always have special case offloading for each particular >> action inside the sample(). > > ACK, just wanted to clarify that we should take extra care when we do this. > And we should build on top of the sflow series :)
Re-reading it. :) > > Not sure about the naming, emit_sample, it’s not really a sample in the case > we are not sampling anything, the action is always taken. We're not sampling, we're emitting an already prepared sample. > Maybe OVS_ACTION_ATTR_LOCAL_COPY? But this comes close to > OVS_ACTION_ATTR_USERSPACE? > >>> >>>> BTW, "sampling" column is part of IPFIX and sFlow tables and you're not >>>> adding a new table for this new smapling method, i.e. it will not be >>>> possible to set sampling rate for psample with the current database >>>> configuration. >>>> >>>>> This value must be present in the psample sample or else >>>>> users will not know how inaccurate the sample is. >>>> >>>> Users can check the sampling configuration in the database if we will >>>> have one. And they are configuring it in the first place. And it will >>>> be confusing to see two rates in the same action list. >>>> >>>> It might be possible to carry the current rate in the OVS_CB(skb) and >>>> pass it to the psample. >>>> >>>> In any case, the rate reported by psample will not be accurate if there >>>> are nested sample actions, unless we carry and properly multiply / divide >>>> probabilities when entering / exiting the sample action. >>>> >>>>> >>>>> >>>>>> Naming is still hard though. I like the word "emit" because it doesn't >>>>>> have a >>>>>> concept of directionality, i.e. we're not sending it to somewhere, we're >>>>>> just >>>>>> emitting it to the wild. That kind of matches with our vague definition >>>>>> of a >>>>>> "local" or "system" sampling. >>>>>> >>>>> >>>>> I like the word "emit". >>>>> >>>>>> Another option might be 'OUTPUT_SAMPLE/SAMPLE_OUTPUT', i.e. >>>>>> output(sample, cookie=...). >>>>>> >>>>>> The main concern is that we already have truncation implemented, so we >>>>>> can be more >>>>>> flexible in a way we generate actions instead of adding new attributes. >>>>>> >>>>>> Thoughts? >>>>> >>>>> See my comment above. I am not suggesting we use the current trunc >>>>> implementation nor to add a new one, just to use psample's. >>>>> >>> > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
