Hi Adrian,
Since this is an RFC series, I haven't taken a deep look at the code, so
I apologize if my comments below are not correct.
On 4/25/22 07:17, Adrian Moreno wrote:
Very often when troubleshooting networking issues in an OVN cluster one
would like to know if any packet (or a specific one) is being dropped by
OVN.
Currently, this cannot be known for two main reasons:
1 - Implicit drops: Some tables do not have a default action
(priority=0, match=1). In this case, a packet that does not match any
rule will be silently dropped.
2 - Even on explicit drops, we only know a packet was dropped. We lack
information about that packet.
In order to improve this, this RFC proposes a two-fold solution:
- First, create a debug-mode option that makes northd add a default
"drop;" action on those tables that currently lack one.
- Secondly, allow sampling of all drops. By introducing a new OVN
action: "sample" (equivalent to OVS's), OVN can make OVS sample the
packets as they are dropped and insert the first 32 bits of the
Logical Flow's UUID (a.k.a. cookie) into the IPFIX sample's
ObservationPointId. That way a collector can see the packet's
header information as well as what Logical Flow dropped it.
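To make the two parts of the proposal concrete, here is a toy Python model
(not OVN code). The flow-table representation and the helper names
`cookie_of`, `add_default_drop` and `process` are made up for illustration;
the only detail taken from the text above is that the observation point ID
carries the first 32 bits of the Logical Flow's UUID.

```python
import uuid


def cookie_of(lflow_uuid: uuid.UUID) -> int:
    # First 32 bits of the 128-bit UUID, i.e. its first 8 hex digits.
    return lflow_uuid.int >> 96


def add_default_drop(table: list, drop_debugging: bool) -> list:
    """Return the table, plus a priority-0 match-all drop if debugging is on.

    Each flow is a dict with "uuid", "priority", "match" and "actions".
    """
    has_default = any(f["priority"] == 0 and f["match"] == "1" for f in table)
    if drop_debugging and not has_default:
        default = {"uuid": uuid.uuid4(), "priority": 0,
                   "match": "1", "actions": "drop;"}
        return table + [default]
    return table


def process(table: list, packet_matches: set):
    """Return (verdict, observation_point_id) for a packet.

    packet_matches is the set of match strings the packet satisfies;
    this stands in for real match evaluation.
    """
    for flow in sorted(table, key=lambda f: f["priority"], reverse=True):
        if flow["match"] == "1" or flow["match"] in packet_matches:
            if flow["actions"] == "drop;":
                # Explicit drop: the sample can name the flow that dropped it.
                return ("explicit drop", cookie_of(flow["uuid"]))
            return ("continue", None)
    # No flow matched: the packet disappears without any trace.
    return ("implicit drop", None)
```

With debugging off, a packet that misses every flow ends as an "implicit
drop" with nothing to report. With it on, the same packet hits the added
default flow and the sample can point at that flow's cookie.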
This RFC has some limitations I'd like some specific
feedback/guidance on:
* Per-datapath flows
Even if a Logical Flow is created with "match=1", the controller will
insert the datapath metadata match. This might be good enough for most
cases but could miss packets if there's a bug in OVN. A possible
approach could be to propagate the "drop-debug" configuration to the
SB and make the controller insert the default drops. But without a
Logical Flow, how would we trace it back?
I think the most important data points to track are:
* The packet contents
* The logical datapath where the drop occurred (i.e. metadata)
* (possibly) register values at the time of the drop
Since the logical datapath is important, I think we're going to end up
with an OF flow per datapath.
The question still remains whether we need to add explicit drop and
sample logical flows to the southbound database.
I think having logical flows for the explicit drops and samples is a
good idea for the time being. I think it adds clarity to what OVN is
doing. The only reason we should move the logic down to ovn-controller
is if our testing shows significant performance degradation. And I do
mean *significant*. Since this is something that is only intended to be
turned on conditionally, it doesn't need to be lightning fast. However,
if turning on drop debugging causes everything to grind to a halt, then
we probably should look into optimizing the process.
Another approach (suggested by Dumitru) could be to have OVN detect that
an lflow actually applies to all datapaths and remove the metadata
match, which would also reduce the number of OpenFlow flows.
I guess my question here is how we will know on which logical datapath
the packet was being processed when the drop occurred. If you can set
the observation point ID to the value of the OXM_METADATA register, then
this suggestion would work fine. Otherwise, I don't know how you'd
reduce this to a single flow and still get the necessary information in
the sample.
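The trade-off can be illustrated with a small toy model in Python. Nothing
here is an OVN or OVS API: the flow dictionaries, the `FROM_METADATA` marker
and the function names are hypothetical, and the sketch assumes the datapath
key stored in the metadata field fits in the 32-bit observation point ID.

```python
# Marker meaning "copy the packet's metadata into the observation point ID".
FROM_METADATA = object()


def per_datapath_flows(cookie_by_datapath: dict) -> list:
    # One flow per datapath: match on metadata, sample with a fixed cookie.
    return [{"metadata": dp_key, "obs_point_id": cookie}
            for dp_key, cookie in cookie_by_datapath.items()]


def single_flow(obs_point_id) -> list:
    # One flow for all datapaths: no metadata match at all.
    return [{"metadata": None, "obs_point_id": obs_point_id}]


def sample_drop(flows: list, packet_metadata: int):
    """Return the observation point ID a collector would see, or None."""
    for flow in flows:
        if flow["metadata"] is None or flow["metadata"] == packet_metadata:
            if flow["obs_point_id"] is FROM_METADATA:
                return packet_metadata & 0xFFFFFFFF
            return flow["obs_point_id"]
    return None


# Per-datapath flows: the cookie differs per datapath, so it identifies it.
flows = per_datapath_flows({1: 0xAAAA0001, 2: 0xAAAA0002})
assert sample_drop(flows, 2) == 0xAAAA0002

# Single flow with a fixed cookie: both datapaths look the same.
fixed = single_flow(0xAAAA0000)
assert sample_drop(fixed, 1) == sample_drop(fixed, 2)

# Single flow that samples the metadata: the datapath is recoverable.
from_md = single_flow(FROM_METADATA)
assert sample_drop(from_md, 2) == 2
```

In this model the single-flow variant only keeps the datapath information
if the sample can take its observation point ID from the metadata.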
* Use of ObservationPointID
In this RFC, I just used the ObservationPointID (IPFIX element 138)
because it's already supported in OVS's NXAST_SAMPLE. This allows us
to encode 32 bits, which is good enough for the cookie. If we wanted to
encode more information we'd have to look for another IPFIX element.
I had mentioned above that having the data from the registers as part of
the sample is good for debugging purposes, but I understand this may be
a big ask. Right now you're putting the logical flow UUID here, and
that's fine. It's easy to trace that back to a particular datapath. The
alternative would be to use the UUID of the datapath directly instead.
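For what tracing back might look like on the collector side, here is a
minimal sketch in Python. The `lflows` mapping and the function
`lflows_for_observation_point` are hypothetical; a real collector would
get this data from the southbound database. Because only 32 of the UUID's
128 bits are in the sample, the lookup returns a list rather than a single
flow.

```python
import uuid


def lflows_for_observation_point(obs_point_id: int, lflows: dict) -> list:
    """Return (lflow UUID, datapath) pairs whose UUID starts with the ID.

    lflows maps each logical flow UUID to the name of its datapath.
    """
    return [(lflow_uuid, datapath)
            for lflow_uuid, datapath in lflows.items()
            if lflow_uuid.int >> 96 == obs_point_id]


# Made-up example data.
lflows = {
    uuid.UUID("1a2b3c4d-0000-4000-8000-000000000001"): "sw0",
    uuid.UUID("5e6f7a8b-0000-4000-8000-000000000002"): "lr0",
}

matches = lflows_for_observation_point(0x1A2B3C4D, lflows)
assert [dp for _, dp in matches] == ["sw0"]
```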
Adrian Moreno (3):
  actions: add sample action
  northd: add drop_debugging option
  debug: add sampling of drop actions

 include/ovn/actions.h |  11 ++++
 lib/actions.c         | 111 +++++++++++++++++++++++++++++++++++++
 northd/automake.mk    |   2 +
 northd/debug.c        |  98 +++++++++++++++++++++++++++++++++
 northd/debug.h        |  41 ++++++++++++++
 northd/northd.c       | 125 ++++++++++++++++++++++++++++--------------
 ovn-nb.xml            |  29 ++++++++++
 tests/ovn.at          |  10 +++-
 tests/test-ovn.c      |   2 +
 utilities/ovn-trace.c |   3 +
 10 files changed, 390 insertions(+), 42 deletions(-)
 create mode 100644 northd/debug.c
 create mode 100644 northd/debug.h