[...]

Some good alternatives to using the lflow UUID as a datapath identifier would be

* The value of the OXM_METADATA register.
* The southbound datapath_binding UUID.


If we use any of these identifiers, we would know which datapath dropped the packet but not which lflow in its pipeline did it, right?

OK, I see your point here now. For some reason I was thinking that the sample() would capture the current OF flow where the sample was sent, but that's not how it works.

As was discussed before, logical flows can be coalesced if datapath groups are enabled. So a logical flow UUID is ambiguous when trying to determine which logical datapath dropped the packet.

One idea would be to implicitly disable logical datapath groups when drop debugging is enabled. This way, logical flow UUIDs are not ambiguous. One downside is that the southbound database will become larger, likely resulting in OVSDB and ovn-controller incurring more CPU and taking longer to process the data. Since this is an active debugging scenario, that may not be a problem. The other potential downside would be if removing logical datapath groups causes the problem to present itself differently. That could turn a simple issue into a heisenbug.

Another approach would be to encode data directly into the ObservationPointID. From what I can tell, the key things you need in order to know where a packet was dropped are

* Logical datapath
* OF table
* OF priority

Logical datapaths are 24 bits. I'm not sure what the maximum size for OF tables is, but with OVN, 8 bits is plenty. Priorities are 16 bits. Therefore, 48 bits would be necessary to encode all of that. Since the ObservationPointID is only 32 bits, this idea does not work as-is. You'd potentially have to include another field.
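
To make the bit budget concrete, here is a minimal C sketch of that packing. The field widths are the ones quoted above; every name in it is made up for illustration and none of it is existing OVN code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field widths as discussed above. All names here are hypothetical. */
enum {
    EX_DP_KEY_BITS = 24,        /* logical datapath */
    EX_OF_TABLE_BITS = 8,       /* OF table (assumed to be enough for OVN) */
    EX_OF_PRIO_BITS = 16,       /* OF priority */
    EX_OBS_POINT_ID_BITS = 32   /* size of the ObservationPointID */
};

/* Total number of bits needed to describe where a packet was dropped. */
static inline unsigned int
ex_drop_location_bits(void)
{
    return EX_DP_KEY_BITS + EX_OF_TABLE_BITS + EX_OF_PRIO_BITS;
}

/* Packs the three fields into a 64-bit value: datapath in bits 47..24,
 * table in bits 23..16, priority in bits 15..0. */
static inline uint64_t
ex_pack_drop_location(uint32_t dp_key, uint8_t table, uint16_t priority)
{
    return ((uint64_t) (dp_key & 0xffffff) << 24)
           | ((uint64_t) table << 16)
           | priority;
}

/* True only if the packed value would survive truncation to 32 bits. */
static inline bool
ex_fits_in_obs_point_id(uint64_t packed)
{
    return packed <= UINT32_MAX;
}
```

With this layout, any datapath key of 256 or more already spills past bit 31, which is why a second field would be needed.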


If the logical datapaths are only 24 bits I think we can squeeze the information in. What we can specify in the sample action is:
ObservationPointID: 32bit
ObservationDomainID: 32bit

Let's say for each IPFIX "application" that OVN might want to implement in the future (drop-debugging is just one of them) we define a "base_obs_domain_id" where:

        0 < base_obs_domain_id < 256

and we make the controller insert:

        obs_domain_id = (base_obs_domain_id << 24 | logical_datapath)

In addition, for ovn-debug we use:

        obs_point_id = lflow_uuid

This would allow for 255 different non-overlapping uses of IPFIX by OVN (base_obs_domain_id from 1 to 255), plus 16Mi observation domain IDs for external users to use (base_obs_domain_id = 0).

For drop debugging we'll always know the logical flow that caused the drop and the logical datapath it belongs to.
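
A rough C sketch of the encoding proposed above, plus the matching decode a collector could apply. All names are hypothetical. Since a UUID is 128 bits and obs_point_id is 32, the sketch assumes only the first 32-bit word of the lflow UUID would be used; that truncation is my assumption and not something settled in this thread.

```c
#include <stdint.h>

#define EX_OBS_DP_KEY_BITS 24
#define EX_OBS_DP_KEY_MASK ((UINT32_C(1) << EX_OBS_DP_KEY_BITS) - 1)

/* Hypothetical stand-in for a 128-bit UUID stored as four 32-bit words. */
struct ex_uuid {
    uint32_t parts[4];
};

/* obs_domain_id = (base_obs_domain_id << 24 | logical_datapath).
 * The datapath key is masked to 24 bits so it cannot clobber the base. */
static inline uint32_t
ex_make_obs_domain_id(uint8_t base_obs_domain_id, uint32_t dp_key)
{
    return ((uint32_t) base_obs_domain_id << EX_OBS_DP_KEY_BITS)
           | (dp_key & EX_OBS_DP_KEY_MASK);
}

/* Collector side: recover the "application" (top 8 bits). */
static inline uint8_t
ex_obs_domain_base(uint32_t obs_domain_id)
{
    return obs_domain_id >> EX_OBS_DP_KEY_BITS;
}

/* Collector side: recover the logical datapath (low 24 bits). */
static inline uint32_t
ex_obs_domain_dp_key(uint32_t obs_domain_id)
{
    return obs_domain_id & EX_OBS_DP_KEY_MASK;
}

/* Assumption: obs_point_id carries only the first 32 bits of the UUID. */
static inline uint32_t
ex_obs_point_id_from_uuid(const struct ex_uuid *lflow_uuid)
{
    return lflow_uuid->parts[0];
}
```

With base_obs_domain_id = 0 the result is just the low 24 bits, which is the range left to external users.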

Would this work?
Where can I find those unique 24 bits that identify the datapath? Are they available during action encoding (i.e., through struct ovnact_encode_params)?

Thanks for the patience and guidance.
--
Adrián Moreno

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
