On 11/15/22 10:49, Ales Musil wrote:
> On Fri, Nov 4, 2022 at 4:50 PM Adrian Moreno <[email protected]> wrote:
>
>> Very often when troubleshooting networking issues in an OVN cluster one
>> would like to know if any packet (or a specific one) is being dropped by
>> OVN.
>>
>> Currently, this cannot be known because of two main reasons:
>>
>> 1 - Implicit drops: Some tables do not have a default action
>> (priority=0, match=1). In this case, a packet that does not match any
>> rule will be silently dropped.
>>
>> 2 - Even on explicit drops, we only know a packet was dropped. We lack
>> information about that packet.
>>
>> In order to improve this, this series introduces a two-fold solution:
>>
>> - First, make all drops explicit:
>>   - northd adds a default (match = "1") "drop;" action to those tables
>>     that currently lack one.
>>   - ovn-controller adds an explicit drop action on those tables that are
>>     not associated with logical flows (i.e: physical-to-logical mappings).
>>
>> - Secondly, allow sampling of all drops. By introducing a new OVN
>>   action: "sample" (equivalent to OVS's), OVN can make OVS sample the
>>   packets as they are dropped. In order to be able to correlate those
>>   samples back to what exact rule generated them, the user specifies
>>   an 8-bit observation_domain_id. Based on that, the samples contain
>>   the following fields:
>>   - obs_domain_id:
>>     - 8 most significant bits = the provided observation_domain_id.
>>     - 24 least significant bits = the datapath's tunnel key if the
>>       drop comes from an lflow or zero otherwise.
>>   - obs_point_id: the first 32 bits of the lflow's UUID (i.e: the
>>     cookie) if the drop comes from an lflow or the table number
>>     otherwise.
>>
>> Based on the above changes in the flows, all of which are optional,
>> users can collect IPFIX samples of the packets that are dropped by OVN
>> which contain header information useful for debugging.
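For readers who want to decode the two IDs by hand, here is a minimal
sketch of the bit layout described in the cover letter. It is not code
from the series: the function names are hypothetical, and the sample
values are the obsDomainID/obsPointID pair that appears in the nfdump
output quoted in this cover letter.

```python
from typing import Tuple

# Layout described in the cover letter (not taken from the patches themselves):
#   obs_domain_id = <8-bit user-provided domain id> <24-bit datapath tunnel key>
#   obs_point_id  = first 32 bits of the lflow UUID (the cookie), or a table number


def make_obs_domain_id(user_domain_id: int, tunnel_key: int = 0) -> int:
    """Combine the 8-bit domain id with the 24-bit tunnel key (hypothetical helper)."""
    if not 0 <= user_domain_id <= 0xFF:
        raise ValueError("domain id must fit in 8 bits")
    if not 0 <= tunnel_key <= 0xFFFFFF:
        raise ValueError("tunnel key must fit in 24 bits")
    return (user_domain_id << 24) | tunnel_key


def split_obs_domain_id(obs_domain_id: int) -> Tuple[int, int]:
    """Return (user_domain_id, tunnel_key) from a 32-bit obs_domain_id."""
    return (obs_domain_id >> 24) & 0xFF, obs_domain_id & 0xFFFFFF


def cookie_prefix(obs_point_id: int) -> str:
    """Format obs_point_id as the 8 hex digits that start an lflow UUID.

    Only meaningful when the drop came from an lflow; otherwise the value
    is a table number, per the cover letter.
    """
    return f"{obs_point_id & 0xFFFFFFFF:08x}"


# Values as printed by nfdump in the example of this cover letter.
domain_id, tunnel_key = split_obs_domain_id(int("0x001000009", 16))
uuid_prefix = cookie_prefix(int("0x00d8dd23c7", 16))
print(domain_id, tunnel_key, uuid_prefix)
```

The resulting prefix is the string one would grep for in the
Logical_Flow table of the southbound database.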
>>
>> * Note on observation_domain_ids:
>> By allowing the user to specify only the 8 most significant bits of the
>> obs_domain_id and having OVN combine it with the datapath's tunnel key,
>> OVN could be extended to support more than one "sampling" application.
>> For instance, ACL sampling could be developed in the future and, by
>> specifying a different observation_domain_id, it could co-exist with the
>> drop sampling mode implemented in the current series while still
>> allowing to uniquely identify the flow that created the sample.
>>
>> * Notes on testing and usage:
>> Any IPFIX collector that parses ObservationPointID and
>> ObservationDomainID fields can be used. For instance, nfdump 1.7
>> supports these fields. Example of how to capture and analyze
>> drops:
>> # Enable debug sampling:
>> $ ovn-nbctl set NB_Global . options:debug_drop_collector_set=1
>> options:debug_drop_domain_id=1
>> # Start nfcapd:
>> $ nfcapd -p 2055 -l nfcap &
>> # Configure sampling on the OVS you want to inspect:
>> $ ovs-vsctl --id=@br get Bridge br-int -- --id=@i create IPFIX
>> targets=\"172.18.0.1:2055\" -- create Flow_Sample_Collector_Set
>> bridge=@br id=1
>> # Inspect samples and figure out what LogicalFlow caused them:
>> $ nfdump -r nfcap -o fmt:'%line %odid %opid'
>> Date first seen Duration Proto Src IP Addr:Port
>> Dst IP Addr:Port Packets Bytes Flows obsDomainID obsPointID
>> 1970-01-01 01:09:36.000 00:00:00.000 UDP 172.18.0.1:49230 ->
>> 239.255.255.250:1900 12 6356 1 0x001000009 0x00d8dd23c7
>> 1970-01-01 01:01:34.000 00:00:00.000 UDP 172.18.0.1:5353 ->
>> 224.0.0.251:5353 165 89257 1 0x001000009 0x00d8dd23c7
>> [...]
>> $ ovn-sbctl list Logical_Flow | grep -A 11 d8dd23c7
>> _uuid               : d8dd23c7-1451-4ea3-add7-8d68b4be4691
>> actions             :
>> "sample(probability=65535,collector_set=1,obs_domain=1,obs_point=$cookie);
>> /* drop */"
>> controller_meter    : []
>> external_ids        : {source="northd.c:12504",
>> stage-name=lr_in_ip_input}
>> logical_datapath    : []
>> logical_dp_group    : 0dc1b195-c647-4277-aea0-0bad5e896f51
>> match               : "ip4.mcast || ip6.mcast"
>> pipeline            : ingress
>> priority            : 82
>> table_id            : 3
>> tags                : {}
>> hash                : 0
>>
>> V4 -> V5: Added documentation
>> V3 -> V4: Make explicit drops the default behavior.
>> V2 -> V3: Fix rebase problem on unit test
>> V1 -> V2
>> - Rebased and Addressed Mark's comments.
>> - Added NEWS section.
>>
>>
>> Adrian Moreno (3):
>>   actions: add sample action
>>   northd: make default drops explicit
>>   northd: add drop sampling
>>
>>  NEWS                        |   2 +
>>  controller/lflow.c          |   1 +
>>  controller/ovn-controller.c |  44 ++++++
>>  controller/physical.c       |  77 ++++++++-
>>  controller/physical.h       |   6 +
>>  include/ovn/actions.h       |  16 ++
>>  lib/actions.c               | 120 ++++++++++++++
>>  northd/automake.mk          |   2 +
>>  northd/debug.c              |  98 ++++++++++++
>>  northd/debug.h              |  30 ++++
>>  northd/northd.c             | 109 ++++++++-----
>>  northd/ovn-northd.8.xml     |  66 +++++++-
>>  ovn-nb.xml                  |  28 ++++
>>  ovn-sb.xml                  |  81 ++++++++++
>>  tests/ovn-northd.at         |  84 ++++++++++
>>  tests/ovn.at                | 303 ++++++++++++++++++++++++++++++++----
>>  tests/test-ovn.c            |   3 +
>>  utilities/ovn-trace.c       |   2 +
>>  18 files changed, 996 insertions(+), 76 deletions(-)
>>  create mode 100644 northd/debug.c
>>  create mode 100644 northd/debug.h
>>
>> --
>> 2.37.3
>>
>> _______________________________________________
>> dev mailing list
>> [email protected]
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>>
> The whole series looks good to me, thanks.
>
> Reviewed-by: Ales Musil <[email protected]>
>

Thanks Adrian, Ales, Mark, Numan!

The series looks OK to me too. I only have a few minor comments; I
replied to the individual patches.

I'm OK to take care of fixing those minor issues myself before pushing
the patches. Just let me know what you prefer.

Thanks,
Dumitru
