On 14 May 2024, at 13:05, Ilya Maximets wrote:
> On 5/14/24 12:14, Adrian Moreno wrote: >> >> >> On 5/14/24 11:09 AM, Ilya Maximets wrote: >>> On 5/14/24 09:39, Adrian Moreno wrote: >>>> >>>> >>>> On 5/10/24 12:45 PM, Adrian Moreno wrote: >>>>> >>>>> >>>>> On 5/10/24 12:06 PM, Eelco Chaudron wrote: >>>>>> On 24 Apr 2024, at 21:53, Adrian Moreno wrote: >>>>>> >>>>>>> The new odp sample attributes allow userspace to specify a group_id and >>>>>>> user-defined cookie to be passed down to psample. >>>>>>> >>>>>>> Add support for parsing and formatting such action. >>>>>>> >>>>>>> Signed-off-by: Adrian Moreno <[email protected]> >>>>>> >>>>>> Hi Adrian, >>>>>> >>>>>> See my comments below inline. >>>>>> >>>>>> //Eelco >>>>>> >>>>>>> --- >>>>>>> include/linux/openvswitch.h | 49 +++++++++--- >>>>>>> lib/odp-execute.c | 3 + >>>>>>> lib/odp-util.c | 150 >>>>>>> ++++++++++++++++++++++++++--------- >>>>>>> ofproto/ofproto-dpif-ipfix.c | 2 + >>>>>>> python/ovs/flow/odp.py | 2 + >>>>>>> tests/odp.at | 5 ++ >>>>>>> 6 files changed, 163 insertions(+), 48 deletions(-) >>>>>>> >>>>>>> diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h >>>>>>> index d9fb991ef..3e748405c 100644 >>>>>>> --- a/include/linux/openvswitch.h >>>>>>> +++ b/include/linux/openvswitch.h >>>>>>> @@ -696,6 +696,7 @@ enum ovs_flow_attr { >>>>>>> #define OVS_UFID_F_OMIT_MASK (1 << 1) >>>>>>> #define OVS_UFID_F_OMIT_ACTIONS (1 << 2) >>>>>>> >>>>>>> +#define OVS_PSAMPLE_COOKIE_MAX_SIZE 16 >>>>>>> /** >>>>>>> * enum ovs_sample_attr - Attributes for %OVS_ACTION_ATTR_SAMPLE >>>>>>> action. >>>>>>> * @OVS_SAMPLE_ATTR_PROBABILITY: 32-bit fraction of packets to >>>>>>> sample with >>>>>>> @@ -703,15 +704,27 @@ enum ovs_flow_attr { >>>>>>> * %UINT32_MAX samples all packets and intermediate values sample >>>>>>> intermediate >>>>>>> * fractions of packets. >>>>>>> * @OVS_SAMPLE_ATTR_ACTIONS: Set of actions to execute in sampling >>>>>>> event. >>>>>>> - * Actions are passed as nested attributes. >>>>>>> + * Actions are passed as nested attributes. Optional if >>>>>>> + * OVS_SAMPLE_ATTR_PSAMPLE_GROUP is set. >>>>>>> + * @OVS_SAMPLE_ATTR_PSAMPLE_GROUP: A 32-bit number to be used as >>>>>>> psample group. >>>>>>> + * Optional if OVS_SAMPLE_ATTR_ACTIONS is set. >>>>>>> + * @OVS_SAMPLE_ATTR_PSAMPLE_COOKIE: A variable-length binary cookie >>>>>>> that, if >>>>>>> + * provided, will be copied to the psample cookie. >>>>>> >>>>>> Should we mention any limit on the actual length of the cookie? >>>>>> >>>>>> Any reason we explicitly call this psample? From an OVS perspective, this >>>>>> could just be >>>>>> SAMPLE_ATTR_FORWARD/OFFLOAD/DESTINATION/ENDPOINT/COLLECTOR/TARGET_COOKIE >>>>>> and >>>>>> _GROUP. Other datapaths, do not have PSAMPLE. >>>>>> >>>>> >>>>> See my response to your comment on the cover-letter for possible name >>>>> alternatives. >>>>> >>>>> >>>>>> Thinking about it more, from an OVS perspective we should probably not >>>>>> even >>>>>> send down a COOKIE, but we should send down an obs_domain_id and >>>>>> obs_point_id, >>>>>> and then the kernel can move this into a cookie. >>>>>> >>>>> >>>>> I did consider this. However, the opaque cookie fits well with the tc API >>>>> which >>>>> does not have two 32-bit values but an opaque 128-bit cookie. So if the >>>>> action >>>>> is offloaded OVS needs to "encode" the two 32-bit values into the cookie >>>>> anyways. >>>>> >>>>>> The collector itself could be called system/local collector, or >>>>>> something like >>>>>> that. This way it can use for example psample for kernel and UDST probes >>>>>> for >>>>>> usespace. Both can pass the group and cookie (obs_domain_id and >>>>>> obs_point_id). >>>>>> >>>>>> Also, the presence of any of this should not dictate the psample action, >>>>>> we >>>>>> probably need a specific OVS_SAMPLE_ATTR_SYSTEM_TARGET type nl flag. >>>>>> >>>>> >>>>> It clearer and it might simplify option verification a bit, but isn't a >>>>> bit >>>>> redundant? There is no flag for action execution for instance, the >>>>> presence of >>>>> the attribute is enough. >>>>> >>>>> An alternative would be to have nested attributes, e.g: >>>>> >>>>> enum ovs_sample_attr { >>>>> OVS_SAMPLE_ATTR_UNSPEC, >>>>> OVS_SAMPLE_ATTR_PROBABILITY, /* u32 number */ >>>>> OVS_SAMPLE_ATTR_ACTIONS, /* Nested OVS_ACTION_ATTR_* >>>>> attributes. */ >>>>> OVS_SAMPLE_ATTR_SYSTEM, /* Nested OVS_SAMPLE_SYSTEM_ATTR_* >>>>> attributes. */ >>>>> >>>>> __OVS_SAMPLE_ATTR_MAX, >>>>> }; >>>>> >>>>> enum ovs_sample_system_attr { >>>>> OVS_SAMPLE_SYSTEM_ATTR_OBS_DOMAIN, /* u32 number */ >>>>> OVS_SAMPLE_SYSTEM_ATTR_OBS_POINT, /* u32 number */ >>>>> OVS_SAMPLE_SYSTEM_ATTR_COOKIE, /* Nested OVS_ACTION_ATTR_* >>>>> >>>>> __OVS_SSAMPLE_SYSTEM_ATTR_MAX, >>>>> }; >>>>> >>>>> WDYT? >>>>> >>>> >>>> Another benefit of nested attributes is the easier addition of other >>>> sample-related attributes, I already have a candidate: >>>> OVS_SAMPLE_SYSTEM_TRUNC >>>> >>>> This will be passed to psample (md->trunc_size) who will truncate the >>>> packet >>>> size to the specified value. This can be useful for large packets for >>>> which we >>>> are only interested in the headers. >>>> >>>> WDYT? >>>> >>>> [snip] >>>> >>> >>> Hrm. This makes me think these should not be attributes of a sample action >>> at all, >>> but rather we should have a separate EMIT_SAMPLE action that just calls >>> psample, so >>> we can combine it with other actions: >>> >>> sample(..., actions(userspace(...),trunc(100),emit_sample(cookie=...))) >>> >>> or even: >>> >>> sample(..., >>> actions(clone(trunc(100),emit_sample(cookie=...)),userspace(...))) >>> >>> This will also simplify the requires_datapath_assistance() check as we can >>> just >>> check the action and not iterate over the list of attributes. And it will >>> be >>> capable to only send this one action to the datapath and not the whole >>> thing. >>> >> >> My suggestion of adding a OVS_SAMPLE_{}_TRUNC was precisely to avoid using >> the >> existing trunc implementation. >> >> Truncation is implemented inside psample so we would not be maintaining two >> implementations. > > We will maintain trunc action and a trunc attribute for sampling, so we will > have a duplication here. > >> psample implementation does not require packet clone (which is >> needed for OVS trunc) plus it can be easily optimized out if there is no >> listener in the multicast group (I plan to add that optimization in the next >> kernel series). > > 'trunc' doesn't require a clone in case there are no other actions after, > which > is the main use case for drop sampling. > > BTW, having explicit drop actions might have broken some of the assumptions > around this optimization, but it should be fixable. > > And psample itself might need a clone() in case there was a trunc() before > sampling. psample will report skb->len as original packet size, but that > is will not be correct as we have a postponed cutlen, so the reported length > must be the length after the cut and that can't be done without actually > cutting the skb at this point. > >> >> Having said that, I'm not against adding a new action if that simplifies >> things. >> I think it might simplify things in ovs-vswitchd and even in the kernel. >> However, I have one concern. >> >> IIUC and based on your examples, your suggestions means we would not have >> sampling rate inside emit_sample, right? This might be a problem because >> psample >> does have sampling rate which I believe to be crucial to balance performance >> impact and accuracy. > > psample doesn't limit the rate, it just reports a number. Rate is part of > the main sample() action, i.e. > > sample(sample=75%,actions(emit_sample(cookie=...))) > > Having rate also in the 'emit_sample' would be another API duplication. I agree, but this is the confusing part :) If you use the psample TC action, it will have the rate-limiting as part of the action. So if we offload the above, we will have a nested tc psample action, the outer one with the desired sample rate, and the inner one with a 100% sample rate. Actually, the inner one needs to be handled by sw as it will go to vswitchd (and than back to psample?). I guess this avoids the idea of doing this fast… Maybe if emit_sample() is the only action we can do some optimisation, but it needs to be translatable back and forth from TC to OVS_ATTRs. I guess we need the same for OVS_ACTION_ATTR_USERSPACE actions, which by the way Nvidia was going to work on once the sFlow part was accepted. > BTW, "sampling" column is part of IPFIX and sFlow tables and you're not > adding a new table for this new smapling method, i.e. it will not be > possible to set sampling rate for psample with the current database > configuration. > >> This value must be present in the psample sample or else >> users will not know how inaccurate the sample is. > > Users can check the sampling configuration in the database if we will > have one. And they are configuring it in the first place. And it will > be confusing to see two rates in the same action list. > > It might be possible to carry the current rate in the OVS_CB(skb) and > pass it to the psample. > > In any case, the rate reported by psample will not be accurate if there > are nested sample actions, unless we carry and properly multiply / divide > probabilities when entering / exiting the sample action. > >> >> >>> Naming is still hard though. I like the word "emit" because it doesn't >>> have a >>> concept of directionality, i.e. we're not sending it to somewhere, we're >>> just >>> emitting it to the wild. That kind of matches with our vague definition of >>> a >>> "local" or "system" sampling. >>> >> >> I like the word "emit". >> >>> Another option might be 'OUTPUT_SAMPLE/SAMPLE_OUTPUT', i.e. output(sample, >>> cookie=...). >>> >>> The main concern is that we already have truncation implemented, so we can >>> be more >>> flexible in a way we generate actions instead of adding new attributes. >>> >>> Thoughts? >> >> See my comment above. I am not suggesting we use the current trunc >> implementation nor to add a new one, just to use psample's. >> _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
