On Fri, Nov 4, 2022 at 4:50 PM Adrian Moreno <[email protected]> wrote:

> Very often when troubleshooting networking issues in an OVN cluster one
> would like to know if any packet (or a specific one) is being dropped by
> OVN.
>
> Currently, this cannot be known for two main reasons:
>
> 1 - Implicit drops: Some tables do not have a default action
> (priority=0, match=1). In this case, a packet that does not match any
> rule will be silently dropped.
>
> 2 - Even on explicit drops, we only know a packet was dropped. We lack
> information about that packet.
>
> In order to improve this, this series introduces a two-fold solution:
>
> - First, make all drops explicit:
>    - northd adds a default (match = "1") "drop;" action to those tables
>    that currently lack one.
>    - ovn-controller adds an explicit drop action to those tables that are
>    not associated with logical flows (i.e., physical-to-logical mappings).
>
> - Secondly, allow sampling of all drops. By introducing a new OVN
>   action, "sample" (equivalent to OVS's), OVN can make OVS sample the
>   packets as they are dropped. In order to correlate those samples back
>   to the exact rule that generated them, the user specifies an 8-bit
>   observation_domain_id. Based on that, the samples contain the
>   following fields:
>   - obs_domain_id:
>      - 8 most significant bits = the provided observation_domain_id.
>      - 24 least significant bits = the datapath's tunnel key if the
>        drop comes from an lflow or zero otherwise.
>   - obs_point_id: the first 32 bits of the lflow's UUID (i.e., the
>     cookie) if the drop comes from an lflow or the table number
>     otherwise.
>
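
To make the bit layout above concrete, here is a minimal sketch of how the two sample fields could be composed. It is only an illustration in Python, not the code from the series; the function names and the argument validation are assumptions made for this example.

```python
import uuid
from typing import Optional


def make_obs_domain_id(domain_id: int, tunnel_key: int = 0) -> int:
    """Pack the user's 8-bit domain id and a 24-bit datapath tunnel key.

    tunnel_key stays 0 when the drop does not come from an lflow.
    (Hypothetical helper, named for this illustration only.)
    """
    if not 0 <= domain_id <= 0xFF:
        raise ValueError("domain_id must fit in 8 bits")
    if not 0 <= tunnel_key <= 0xFFFFFF:
        raise ValueError("tunnel_key must fit in 24 bits")
    return (domain_id << 24) | tunnel_key


def make_obs_point_id(lflow_uuid: Optional[str] = None, table: int = 0) -> int:
    """Use the first 32 bits of the lflow UUID, else the table number."""
    if lflow_uuid is not None:
        return int(uuid.UUID(lflow_uuid).hex[:8], 16)
    return table


if __name__ == "__main__":
    print(hex(make_obs_domain_id(1, 9)))
    print(hex(make_obs_point_id("d8dd23c7-1451-4ea3-add7-8d68b4be4691")))
```
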
> Based on the above changes in the flows, all of which are optional,
> users can collect IPFIX samples of the packets that are dropped by OVN
> which contain header information useful for debugging.
>
> * Note on observation_domain_ids:
> By allowing the user to specify only the 8 most significant bits of the
> obs_domain_id and having OVN combine it with the datapath's tunnel key,
> OVN could be extended to support more than one "sampling" application.
> For instance, ACL sampling could be developed in the future and, by
> specifying a different observation_domain_id, it could co-exist with the
> drop sampling mode implemented in the current series while still
> allowing the flow that created the sample to be uniquely identified.
>
> * Notes on testing and usage:
> Any IPFIX collector that parses ObservationPointID and
> ObservationDomainID fields can be used. For instance, nfdump 1.7
> supports these fields. Example of how to capture and analyze drops:
> # Enable debug sampling:
> $ ovn-nbctl set NB_Global . options:debug_drop_collector_set=1 \
>     options:debug_drop_domain_id=1
> # Start nfcapd:
> $ nfcapd -p 2055 -l nfcap &
> # Configure sampling on the OVS you want to inspect:
> $ ovs-vsctl --id=@br get Bridge br-int -- --id=@i create IPFIX \
>     targets=\"172.18.0.1:2055\" -- create Flow_Sample_Collector_Set \
>     bridge=@br id=1
> # Inspect samples and figure out what LogicalFlow caused them:
> $ nfdump -r nfcap -o fmt:'%line %odid %opid'
> Date first seen             Duration     Proto      Src IP Addr:Port    Dst IP Addr:Port   Packets    Bytes Flows obsDomainID   obsPointID
> 1970-01-01 01:09:36.000     00:00:00.000 UDP         172.18.0.1:49230 -> 239.255.255.250:1900        12     6356     1 0x001000009 0x00d8dd23c7
> 1970-01-01 01:01:34.000     00:00:00.000 UDP         172.18.0.1:5353  -> 224.0.0.251:5353       165    89257     1 0x001000009 0x00d8dd23c7
> [...]
> $ ovn-sbctl list Logical_Flow | grep -A 11 d8dd23c7
> _uuid               : d8dd23c7-1451-4ea3-add7-8d68b4be4691
> actions             : "sample(probability=65535,collector_set=1,obs_domain=1,obs_point=$cookie); /* drop */"
> controller_meter    : []
> external_ids        : {source="northd.c:12504", stage-name=lr_in_ip_input}
> logical_datapath    : []
> logical_dp_group    : 0dc1b195-c647-4277-aea0-0bad5e896f51
> match               : "ip4.mcast || ip6.mcast"
> pipeline            : ingress
> priority            : 82
> table_id            : 3
> tags                : {}
> hash                : 0
>
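
For anyone scripting the lookup step shown above, the values printed by nfdump can be split back into their parts. The following is a rough sketch that assumes the field layout described in the cover letter; the helper names are made up for this example and are not part of OVN or nfdump.

```python
from typing import Tuple


def split_obs_domain_id(obs_domain_id: int) -> Tuple[int, int]:
    """Return (8-bit user domain id, 24-bit datapath tunnel key)."""
    return (obs_domain_id >> 24) & 0xFF, obs_domain_id & 0xFFFFFF


def uuid_prefix(obs_point_id: int) -> str:
    """Format obs_point_id as the 8 hex digits that start the lflow UUID."""
    return f"{obs_point_id & 0xFFFFFFFF:08x}"


# Values copied from the nfdump output quoted above.
domain_id, tunnel_key = split_obs_domain_id(int("0x001000009", 16))
prefix = uuid_prefix(int("0x00d8dd23c7", 16))

print(f"domain_id={domain_id} tunnel_key={tunnel_key} uuid_prefix={prefix}")
# prints: domain_id=1 tunnel_key=9 uuid_prefix=d8dd23c7
```

The resulting prefix is what gets passed to grep against the Logical_Flow table. Per the description above, a tunnel key of zero would suggest the drop did not come from an lflow, in which case the point id is a table number rather than a UUID prefix.
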
> V4 -> V5: Added documentation
> V3 -> V4: Make explicit drops the default behavior.
> V2 -> V3: Fix rebase problem on unit test
> V1 -> V2
> - Rebased and Addressed Mark's comments.
> - Added NEWS section.
>
>
> Adrian Moreno (3):
>   actions: add sample action
>   northd: make default drops explicit
>   northd: add drop sampling
>
>  NEWS                        |   2 +
>  controller/lflow.c          |   1 +
>  controller/ovn-controller.c |  44 ++++++
>  controller/physical.c       |  77 ++++++++-
>  controller/physical.h       |   6 +
>  include/ovn/actions.h       |  16 ++
>  lib/actions.c               | 120 ++++++++++++++
>  northd/automake.mk          |   2 +
>  northd/debug.c              |  98 ++++++++++++
>  northd/debug.h              |  30 ++++
>  northd/northd.c             | 109 ++++++++-----
>  northd/ovn-northd.8.xml     |  66 +++++++-
>  ovn-nb.xml                  |  28 ++++
>  ovn-sb.xml                  |  81 ++++++++++
>  tests/ovn-northd.at         |  84 ++++++++++
>  tests/ovn.at                | 303 ++++++++++++++++++++++++++++++++----
>  tests/test-ovn.c            |   3 +
>  utilities/ovn-trace.c       |   2 +
>  18 files changed, 996 insertions(+), 76 deletions(-)
>  create mode 100644 northd/debug.c
>  create mode 100644 northd/debug.h
>
> --
> 2.37.3
>
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>
The whole series looks good to me, thanks.

Reviewed-by: Ales Musil <[email protected]>

-- 

Ales Musil

Senior Software Engineer - OVN Core

Red Hat EMEA <https://www.redhat.com>

[email protected]    IM: amusil
<https://red.ht/sig>