On Fri, Nov 4, 2022 at 4:50 PM Adrian Moreno <[email protected]> wrote:
> Very often when troubleshooting networking issues in an OVN cluster one
> would like to know if any packet (or a specific one) is being dropped by
> OVN.
>
> Currently, this cannot be known for two main reasons:
>
> 1 - Implicit drops: Some tables do not have a default action
> (priority=0, match=1). In this case, a packet that does not match any
> rule will be silently dropped.
>
> 2 - Even on explicit drops, we only know a packet was dropped. We lack
> information about that packet.
>
> In order to improve this, this series introduces a two-fold solution:
>
> - First, make all drops explicit:
> - northd adds a default (match = "1") "drop;" action to those tables
> that currently lack one.
> - ovn-controller adds an explicit drop action on those tables that are
> not associated with logical flows (i.e., physical-to-logical mappings).
>
> - Secondly, allow sampling of all drops. By introducing a new OVN
> action: "sample" (equivalent to OVS's), OVN can make OVS sample the
> packets as they are dropped. In order to be able to correlate those
> samples back to the exact rule that generated them, the user specifies
> an 8-bit observation_domain_id. Based on that, the samples contain
> the following fields:
> - obs_domain_id:
> - 8 most significant bits = the provided observation_domain_id.
> - 24 least significant bits = the datapath's tunnel key if the
> drop comes from an lflow or zero otherwise.
> - obs_point_id: the first 32 bits of the lflow's UUID (i.e., the
> cookie) if the drop comes from an lflow or the table number
> otherwise.
>
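For readers decoding samples by hand, the obs_domain_id layout described above can be sketched in a few lines of Python. This is only an illustration of the bit layout stated in the cover letter, not code from the series; the function names are hypothetical.

```python
from typing import Tuple


def pack_obs_domain_id(domain_id: int, tunnel_key: int = 0) -> int:
    """Combine the 8-bit user-provided id with the 24-bit tunnel key.

    tunnel_key is zero when the drop does not come from a logical flow.
    (Hypothetical helper, illustrating the layout only.)
    """
    if not 0 <= domain_id <= 0xFF:
        raise ValueError("observation_domain_id must fit in 8 bits")
    if not 0 <= tunnel_key <= 0xFFFFFF:
        raise ValueError("tunnel key must fit in 24 bits")
    return (domain_id << 24) | tunnel_key


def unpack_obs_domain_id(value: int) -> Tuple[int, int]:
    """Split a 32-bit obs_domain_id into (domain_id, tunnel_key)."""
    return (value >> 24) & 0xFF, value & 0xFFFFFF


if __name__ == "__main__":
    # Domain id 1 on a datapath with tunnel key 9.
    print(hex(pack_obs_domain_id(1, 9)))
    print(unpack_obs_domain_id(0x01000009))
```

Splitting a collected obsDomainID this way gives back both the sampling application (upper 8 bits) and the datapath's tunnel key (lower 24 bits).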
> Based on the above changes in the flows, all of which are optional,
> users can collect IPFIX samples of the packets that are dropped by OVN
> which contain header information useful for debugging.
>
> * Note on observation_domain_ids:
> By allowing the user to specify only the 8 most significant bits of the
> obs_domain_id and having OVN combine it with the datapath's tunnel key,
> OVN could be extended to support more than one "sampling" application.
> For instance, ACL sampling could be developed in the future and, by
> specifying a different observation_domain_id, it could co-exist with the
> drop sampling mode implemented in the current series while still
> allowing the flow that created the sample to be uniquely identified.
>
> * Notes on testing and usage:
> Any IPFIX collector that parses ObservationPointID and
> ObservationDomainID fields can be used. For instance, nfdump 1.7
> supports these fields. Example of how to capture and analyze
> drops:
> # Enable debug sampling:
> $ ovn-nbctl set NB_Global . options:debug_drop_collector_set=1 \
> options:debug_drop_domain_id=1
> # Start nfcapd:
> $ nfcapd -p 2055 -l nfcap &
> # Configure sampling on the OVS you want to inspect:
> $ ovs-vsctl --id=@br get Bridge br-int -- --id=@i create IPFIX \
> targets=\"172.18.0.1:2055\" -- create Flow_Sample_Collector_Set \
> bridge=@br id=1
> # Inspect samples and figure out what LogicalFlow caused them:
> $ nfdump -r nfcap -o fmt:'%line %odid %opid'
> Date first seen Duration Proto Src IP Addr:Port
> Dst IP Addr:Port Packets Bytes Flows obsDomainID obsPointID
> 1970-01-01 01:09:36.000 00:00:00.000 UDP 172.18.0.1:49230 ->
> 239.255.255.250:1900 12 6356 1 0x001000009 0x00d8dd23c7
> 1970-01-01 01:01:34.000 00:00:00.000 UDP 172.18.0.1:5353 ->
> 224.0.0.251:5353 165 89257 1 0x001000009 0x00d8dd23c7
> [...]
> $ ovn-sbctl list Logical_Flow | grep -A 11 d8dd23c7
> _uuid : d8dd23c7-1451-4ea3-add7-8d68b4be4691
> actions :
> "sample(probability=65535,collector_set=1,obs_domain=1,obs_point=$cookie);
> /* drop */"
> controller_meter : []
> external_ids : {source="northd.c:12504",
> stage-name=lr_in_ip_input}
> logical_datapath : []
> logical_dp_group : 0dc1b195-c647-4277-aea0-0bad5e896f51
> match : "ip4.mcast || ip6.mcast"
> pipeline : ingress
> priority : 82
> table_id : 3
> tags : {}
> hash : 0
>
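The lookup in the example above (grepping for the obsPointID in the Logical_Flow table) relies on obs_point_id being the first 32 bits of the lflow's UUID. A minimal Python sketch of that correlation, assuming the hex string is copied from the nfdump output and the UUID from ovn-sbctl; the helper names are hypothetical:

```python
import uuid


def obs_point_from_uuid(lflow_uuid: str) -> int:
    """Return the first 32 bits of a logical flow UUID as an integer."""
    # A UUID is 128 bits wide; shifting right by 96 keeps the top 32 bits.
    return uuid.UUID(lflow_uuid).int >> 96


def sample_matches_lflow(obs_point_hex: str, lflow_uuid: str) -> bool:
    """Check whether an obsPointID (hex string) points at the given lflow."""
    return int(obs_point_hex, 16) == obs_point_from_uuid(lflow_uuid)


if __name__ == "__main__":
    lflow = "d8dd23c7-1451-4ea3-add7-8d68b4be4691"
    print(hex(obs_point_from_uuid(lflow)))
    print(sample_matches_lflow("0x00d8dd23c7", lflow))
```

Since only 32 bits of the UUID are carried, two logical flows could in principle share the same prefix, so a match is a strong hint rather than a guarantee.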
> V4 -> V5: Added documentation
> V3 -> V4: Make explicit drops the default behavior.
> V2 -> V3: Fix rebase problem on unit test
> V1 -> V2:
> - Rebased and addressed Mark's comments.
> - Added NEWS section.
>
>
> Adrian Moreno (3):
> actions: add sample action
> northd: make default drops explicit
> northd: add drop sampling
>
> NEWS | 2 +
> controller/lflow.c | 1 +
> controller/ovn-controller.c | 44 ++++++
> controller/physical.c | 77 ++++++++-
> controller/physical.h | 6 +
> include/ovn/actions.h | 16 ++
> lib/actions.c | 120 ++++++++++++++
> northd/automake.mk | 2 +
> northd/debug.c | 98 ++++++++++++
> northd/debug.h | 30 ++++
> northd/northd.c | 109 ++++++++-----
> northd/ovn-northd.8.xml | 66 +++++++-
> ovn-nb.xml | 28 ++++
> ovn-sb.xml | 81 ++++++++++
> tests/ovn-northd.at | 84 ++++++++++
> tests/ovn.at | 303 ++++++++++++++++++++++++++++++++----
> tests/test-ovn.c | 3 +
> utilities/ovn-trace.c | 2 +
> 18 files changed, 996 insertions(+), 76 deletions(-)
> create mode 100644 northd/debug.c
> create mode 100644 northd/debug.h
>
> --
> 2.37.3
>
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>
The whole series looks good to me, thanks.
Reviewed-by: Ales Musil <[email protected]>
--
Ales Musil
Senior Software Engineer - OVN Core
Red Hat EMEA <https://www.redhat.com>
[email protected] IM: amusil
<https://red.ht/sig>