On 5/14/24 09:42, Adrian Moreno wrote:
> 
> 
> On 5/10/24 12:14 PM, Eelco Chaudron wrote:
>>
>>
>> On 10 May 2024, at 10:23, Adrian Moreno wrote:
>>
>>> On 5/10/24 9:14 AM, Eelco Chaudron wrote:
>>>>
>>>>
>>>> On 24 Apr 2024, at 21:53, Adrian Moreno wrote:
>>>>
>>>>> This is the userspace counterpart of the work being done in the kernel
>>>>> [1]. Sending it as RFC to get some early feedback on the overall
>>>>> solution.
>>>>>
>>>>> ** Problem description **
>>>>> Currently, OVS supports several observability features, such as
>>>>> per-bridge IPFIX, per-flow IPFIX and sFlow. However, given the
>>>>> complexity of OVN-generated pipelines, a single sample is typically not
>>>>> enough to troubleshoot what is OVN/OVS is doing with the packet. We need
>>>>> highler level metadata alongside the packet sample.
>>>>>
>>>>> This can be achieved by the means of per-flow IPFIX sampling and
>>>>> NXAST_SAMPLE action, through which OVN can add two arbitrary 32-bit
>>>>> values (observation_domain_id and observation_point_id) that are sent
>>>>> alongside the packet information in the IPFIX frames.
>>>>>
>>>>> However, there is a fundamental limitation of the existing sampling
>>>>> infrastructure, specially in the kernel datapath: samples are handled by
>>>>> ovs-vswitchd, forcing the need for an upcall (userspace action). The
>>>>> fact that samples share the same netlink sockets and handler thread cpu
>>>>> time with actual packet misse, can easily cause packet drops making this
>>>>> solution unfit for use in highly loaded production systems.
>>>>>
>>>>> Users are left with little option than guessing what sampling rate will
>>>>> be OK for their traffic pattern and dealing with the lost accuracy.
>>>>>
>>>>> ** Feature Description **
>>>>> In order to solve this situation and enable this feature to be safely
>>>>> turned on in production, this RFC uses the psample support proposed in
>>>>> [1] to send NXAST_SAMPLE samples to psample adding the observability
>>>>> domain and point information as metadata.
>>>>>
>>>>> ~~ API ~~
>>>>> The API is simply a new field called "psample_group" in the
>>>>> Flow_Sample_Collector_Set (FSCS) table. Configuring this value is
>>>>> orthogonal to also associating the FSCS entry to an entry in the IPFIX
>>>>> table.
>>>>>
>>>>> Apart from that configuration, the controller needs to add NXAST_SAMPLE
>>>>> actions that refer the entry's id.
>>>>>
>>>>> ~~ HW Offload ~~
>>>>> psample is already supported by tc using the act_sample action. The
>>>>> metadata is taken from the actions' cookie.
>>>>> This series also adds support for offloading the odp sample action,
>>>>> only when it includes psample information but not nested actions.
>>>>>
>>>>> A bit of special care has to be taken in the tc layer to not store the
>>>>> ufid in the sample's cookie as it's used to carry action-specific data.
>>>>>
>>>>> ~~ Datapath support ~~
>>>>> This new behavior of the datapth sample action is only supported on the
>>>>> kernel datapath. A more detailed analysis is needed (and planned) to
>>>>> find a way to also optimize the userspace datapath. However, although
>>>>> potentially inefficient, there is not that much risk of dropping packets
>>>>> with the existing IPFIX infrastructure.
>>>>>
>>>>> ~~ Testing ~~
>>>>> The series includes an utility program called "ovs-psample" that listens
>>>>> to packets multicasted by the psample module and prints them (also
>>>>> printing the obs_domain and obs_point ids). In addition the kernel
>>>>> series includes a tracepoint for easy testing and troubleshooting.
>>>>>
>>>>> [1] https://patchwork.kernel.org/project/netdevbpf/list/?series=847473
>>>>
>>>> Hi Andrew,
>>>>
>>>
>>> Who is Andrew? :-D
>>
>> Sorry :( Handling too many things in parallel...
>>
>>>> Thanks for sending out this RFC series. I did a code-only review (no 
>>>> testing), and the main concern I have (besides some locking) is the direct 
>>>> mapping to psample. If we decide that for userspace we are going to use 
>>>> USDT probes, we need another target, and we will duplicate a lot of code. 
>>>> My proposal would be to have a more general target name like system, or 
>>>> local (or a better name ;). This will be a system-local dpif target with a 
>>>> 32-bit group id.
>>>>
>>>
>>> Regarding the name, I agree adding "psample" to the API can be confusing 
>>> for userspace datapath. Besides, I think it should also be in sync with the 
>>> odp naming:
>>>
>>> "system": My initial thought was it could be confused with the "system@" 
>>> name of the kernel datapath, but that is not present in the OVSDB level so 
>>> it could work. SAMPLE_ATTR_{SYSTEM_GROUP,SYSTEM_USER_COOKIE}?
>>>
>>> "offload": I did consider this but I discarded it as it could be confused 
>>> with actual hw/tc offloading of the sample action.
>>>
>>> "local": Funny, I initially thought of the opposite, i.e: "external" as the 
>>> sample collection and potential encoding (which is currently "local" to 
>>> ovs-vswitchd) is now some "external" process. Still I guess the fact you 
>>> thought of the opposite term suggests it's not expressive enough.

Collector is always an external process...

>>>
>>> "raw": As in, samples are not IPFIX any more (unless the external collector 
>>> decides to export them in IPFIX) but we're exporting the raw packet 
>>> (alongside some metadata). SAMPLE_ATTR_{RAW_GROUP, RAW_USER_COOKIE}?
>>>
>>> "dpif": As in, it depends on the dpif implementation. However, this term 
>>> kind of internal to OVS (it appears only once in ovs-vswitchd.conf.db(5) 
>>> without much clarification). Its more human synonym "datapath", or "dp" for 
>>> short would yield kind of redundant odp attributes: 
>>> SAMPLE_ATTR_{DP_GROUP,DP_USER_COOKIE} but could still work...
>>>
>>> Any other suggestion or preference?
>>
>> Yes naming is hard, I just thought of ‘localhost’, ‘localEndpoint’, just 
>> adding another option…
>>
>> Does anyone else have a cool name?!?
> 
> 
> Ilya, do you have any suggestion or preference? Mine are "raw" and "system".

TBH, "local" sounds best to me, because other sampling methods send
packets somewhere remote, while this one only emits them locally on
the same system.

"system" maybe a second best, but it is so ambiguous so it doesn't
really mean anything.  "raw" also means nothing to me and it is not
even raw, it's wrapped into a netlink message with some cookie / IDs
attached to it.

For the kernel uAPI we may avoid naming the sampling method by naming
the action instead.  See the other thread.

We can't do that for the database interface, in-code functions and
structures though, so we need a name for those.  I'd vote of "local",
"local_sample", e.g.:

   create FLow_Sample_Collector_Set id=1 local-sample-group=10

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to