On 10 May 2024, at 10:23, Adrian Moreno wrote:
> On 5/10/24 9:14 AM, Eelco Chaudron wrote: >> >> >> On 24 Apr 2024, at 21:53, Adrian Moreno wrote: >> >>> This is the userspace counterpart of the work being done in the kernel >>> [1]. Sending it as RFC to get some early feedback on the overall >>> solution. >>> >>> ** Problem description ** >>> Currently, OVS supports several observability features, such as >>> per-bridge IPFIX, per-flow IPFIX and sFlow. However, given the >>> complexity of OVN-generated pipelines, a single sample is typically not >>> enough to troubleshoot what is OVN/OVS is doing with the packet. We need >>> highler level metadata alongside the packet sample. >>> >>> This can be achieved by the means of per-flow IPFIX sampling and >>> NXAST_SAMPLE action, through which OVN can add two arbitrary 32-bit >>> values (observation_domain_id and observation_point_id) that are sent >>> alongside the packet information in the IPFIX frames. >>> >>> However, there is a fundamental limitation of the existing sampling >>> infrastructure, specially in the kernel datapath: samples are handled by >>> ovs-vswitchd, forcing the need for an upcall (userspace action). The >>> fact that samples share the same netlink sockets and handler thread cpu >>> time with actual packet misse, can easily cause packet drops making this >>> solution unfit for use in highly loaded production systems. >>> >>> Users are left with little option than guessing what sampling rate will >>> be OK for their traffic pattern and dealing with the lost accuracy. >>> >>> ** Feature Description ** >>> In order to solve this situation and enable this feature to be safely >>> turned on in production, this RFC uses the psample support proposed in >>> [1] to send NXAST_SAMPLE samples to psample adding the observability >>> domain and point information as metadata. >>> >>> ~~ API ~~ >>> The API is simply a new field called "psample_group" in the >>> Flow_Sample_Collector_Set (FSCS) table. Configuring this value is >>> orthogonal to also associating the FSCS entry to an entry in the IPFIX >>> table. >>> >>> Apart from that configuration, the controller needs to add NXAST_SAMPLE >>> actions that refer the entry's id. >>> >>> ~~ HW Offload ~~ >>> psample is already supported by tc using the act_sample action. The >>> metadata is taken from the actions' cookie. >>> This series also adds support for offloading the odp sample action, >>> only when it includes psample information but not nested actions. >>> >>> A bit of special care has to be taken in the tc layer to not store the >>> ufid in the sample's cookie as it's used to carry action-specific data. >>> >>> ~~ Datapath support ~~ >>> This new behavior of the datapth sample action is only supported on the >>> kernel datapath. A more detailed analysis is needed (and planned) to >>> find a way to also optimize the userspace datapath. However, although >>> potentially inefficient, there is not that much risk of dropping packets >>> with the existing IPFIX infrastructure. >>> >>> ~~ Testing ~~ >>> The series includes an utility program called "ovs-psample" that listens >>> to packets multicasted by the psample module and prints them (also >>> printing the obs_domain and obs_point ids). In addition the kernel >>> series includes a tracepoint for easy testing and troubleshooting. >>> >>> [1] https://patchwork.kernel.org/project/netdevbpf/list/?series=847473 >> >> Hi Andrew, >> > > Who is Andrew? :-D Sorry :( Handling too many things in parallel... >> Thanks for sending out this RFC series. I did a code-only review (no >> testing), and the main concern I have (besides some locking) is the direct >> mapping to psample. If we decide that for userspace we are going to use USDT >> probes, we need another target, and we will duplicate a lot of code. My >> proposal would be to have a more general target name like system, or local >> (or a better name ;). This will be a system-local dpif target with a 32-bit >> group id. >> > > Regarding the name, I agree adding "psample" to the API can be confusing for > userspace datapath. Besides, I think it should also be in sync with the odp > naming: > > "system": My initial thought was it could be confused with the "system@" name > of the kernel datapath, but that is not present in the OVSDB level so it > could work. SAMPLE_ATTR_{SYSTEM_GROUP,SYSTEM_USER_COOKIE}? > > "offload": I did consider this but I discarded it as it could be confused > with actual hw/tc offloading of the sample action. > > "local": Funny, I initially thought of the opposite, i.e: "external" as the > sample collection and potential encoding (which is currently "local" to > ovs-vswitchd) is now some "external" process. Still I guess the fact you > thought of the opposite term suggests it's not expressive enough. > > "raw": As in, samples are not IPFIX any more (unless the external collector > decides to export them in IPFIX) but we're exporting the raw packet > (alongside some metadata). SAMPLE_ATTR_{RAW_GROUP, RAW_USER_COOKIE}? > > "dpif": As in, it depends on the dpif implementation. However, this term kind > of internal to OVS (it appears only once in ovs-vswitchd.conf.db(5) without > much clarification). Its more human synonym "datapath", or "dp" for short > would yield kind of redundant odp attributes: > SAMPLE_ATTR_{DP_GROUP,DP_USER_COOKIE} but could still work... > > Any other suggestion or preference? Yes naming is hard, I just thought of ‘localhost’, ‘localEndpoint’, just adding another option… Does anyone else have a cool name?!? >> I’ll send out my comments to the individual patches shortly. >> >> //Eelco >> >>> Adrian Moreno (11): >>> odp-util: Add support to new psample uAPI. >>> ofproto_dpif: Check for psample support. >>> ofproto: Add ofproto-dpif-psample. >>> vswitchd: Add psample to schema and configure it. >>> ofproto_dpif_xlate: Use psample for OFP samples. >>> utilities: Add ovs-psample. >>> netlink-offload-tc: Rename act_cookie->flow_cookie. >>> netdev-offload-tc: Add sample support. >>> ofproto-dpif-xlate-cache: Add psample to xcache. >>> ofproto-dpif-psample: Add command to show stats. >>> tests: Add test for sample offloading. >>> >>> Documentation/automake.mk | 1 + >>> Documentation/conf.py | 2 + >>> Documentation/ref/index.rst | 1 + >>> Documentation/ref/ovs-psample.8.rst | 47 ++++ >>> include/linux/automake.mk | 6 +- >>> include/linux/openvswitch.h | 49 ++++- >>> include/linux/pkt_cls.h | 2 + >>> include/linux/psample.h | 64 ++++++ >>> include/linux/tc_act/tc_sample.h | 26 +++ >>> lib/netdev-offload-tc.c | 75 ++++++- >>> lib/odp-execute.c | 3 + >>> lib/odp-util.c | 150 +++++++++---- >>> lib/tc.c | 98 ++++++++- >>> lib/tc.h | 9 +- >>> ofproto/automake.mk | 2 + >>> ofproto/ofproto-dpif-ipfix.c | 2 + >>> ofproto/ofproto-dpif-psample.c | 266 ++++++++++++++++++++++ >>> ofproto/ofproto-dpif-psample.h | 39 ++++ >>> ofproto/ofproto-dpif-xlate-cache.c | 11 +- >>> ofproto/ofproto-dpif-xlate-cache.h | 6 + >>> ofproto/ofproto-dpif-xlate.c | 158 +++++++++---- >>> ofproto/ofproto-dpif-xlate.h | 3 +- >>> ofproto/ofproto-dpif.c | 96 +++++++- >>> ofproto/ofproto-dpif.h | 7 +- >>> ofproto/ofproto-provider.h | 9 + >>> ofproto/ofproto.c | 10 + >>> ofproto/ofproto.h | 8 + >>> python/ovs/flow/odp.py | 2 + >>> rhel/openvswitch-fedora.spec.in | 2 + >>> rhel/openvswitch.spec.in | 1 + >>> tests/odp.at | 5 + >>> tests/system-common-macros.at | 4 + >>> tests/system-offloads-traffic.at | 53 +++++ >>> utilities/automake.mk | 9 + >>> utilities/ovs-psample.c | 330 ++++++++++++++++++++++++++++ >>> vswitchd/bridge.c | 54 ++++- >>> vswitchd/vswitch.ovsschema | 7 +- >>> vswitchd/vswitch.xml | 32 ++- >>> 38 files changed, 1533 insertions(+), 116 deletions(-) >>> create mode 100644 Documentation/ref/ovs-psample.8.rst >>> create mode 100644 include/linux/psample.h >>> create mode 100644 include/linux/tc_act/tc_sample.h >>> create mode 100644 ofproto/ofproto-dpif-psample.c >>> create mode 100644 ofproto/ofproto-dpif-psample.h >>> create mode 100644 utilities/ovs-psample.c >>> >>> -- >>> 2.44.0 >> _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
