If someone creates a lot of meters (>1017) in the kernel datapath
and then re-starts ovs-vswitchd, OVS fails to probe meter support
and refuses to install any meters afterwards:
dpif_netlink|INFO|The kernel module has a broken meter implementation.
The reason is that probing for broken meters relies on creating
two meters with high meter IDs. Inside the kernel, however, the
meter table is not a real hash table since v5.10 and commit:
c7c4c44c9a95 ("net: openvswitch: expand the meters supported number")
Instead, it's just an array with meter IDs mapped to indexes with a
simple modulo operation. This array can expand, but only when it is
full. There is no handling of collisions, so if the meter at
ID % size is not the right one, then the lookup just fails. This is
fine as long as userspace creates meters with densely packed IDs,
which is the case for ovs-vswitchd... except for probing.
While probing, we attempt to create meters 54545401 and 54545402.
Without expanding the table size, these map onto 1017 and 1018.
So, creation of these meters fails if there are already meters with
IDs 1017 or 1018. At this point OVS declares meter implementation
broken.
Ideally, we should make the "hash" table in the kernel handle
collisions and otherwise be a more or less proper hash table. But
we can also improve probing in userspace and avoid this issue by
choosing lower numbered meter IDs and trying to get them from the
kernel first before trying to create. Choosing high values at the
top of the 0-1023 range, so they are guaranteed to fit into the
minimal size table in the kernel (1024). If one of these already
exists and has a proper ID, then meters are likely working fine
and we don't need to install new ones for probing. If these meters
are not in the kernel, then we can try to create and check.
This logic should work fine for older or future kernels with a
proper hash table as well.
There is no Fixes tag here as the check was correct at the moment
it was introduced. It's the kernel change that broke it.
Reported-at: https://github.com/openvswitch/ovs-issues/issues/337
Signed-off-by: Ilya Maximets <[email protected]>
---
lib/dpif-netlink.c | 47 +++++++++++++++++++++++++++++++++--------
tests/system-traffic.at | 28 ++++++++++++++++++++++++
2 files changed, 66 insertions(+), 9 deletions(-)
diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index 2b33d0fa6..6ca5c4c5c 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -4276,15 +4276,23 @@ dpif_netlink_meter_get_stats(const struct dpif *dpif_,
ARRAY_SIZE(ovs_meter_stats_policy));
if (error) {
static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
- VLOG_INFO_RL(&rl, "dpif_netlink_meter_transact %s failed",
- command == OVS_METER_CMD_GET ? "get" : "del");
+ VLOG_RL(&rl, error == ENOENT ? VLL_DBG : VLL_WARN,
+ "dpif_netlink_meter_transact %s failed: %s",
+ command == OVS_METER_CMD_GET ? "get" : "del",
+ ovs_strerror(error));
return error;
}
- if (stats
- && a[OVS_METER_ATTR_ID]
- && a[OVS_METER_ATTR_STATS]
- && nl_attr_get_u32(a[OVS_METER_ATTR_ID]) == meter_id.uint32) {
+ if (a[OVS_METER_ATTR_ID]
+ && nl_attr_get_u32(a[OVS_METER_ATTR_ID]) != meter_id.uint32) {
+ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
+ VLOG_INFO_RL(&rl,
+ "Kernel returned a different meter id than requested");
+ ofpbuf_delete(msg);
+ return EINVAL;
+ }
+
+ if (stats && a[OVS_METER_ATTR_STATS]) {
/* return stats */
const struct ovs_flow_stats *stat;
const struct nlattr *nla;
@@ -4362,13 +4370,34 @@ probe_broken_meters__(struct dpif *dpif)
{
/* This test is destructive if a probe occurs while ovs-vswitchd is
* running (e.g., an ovs-dpctl meter command is called), so choose a
- * random high meter id to make this less likely to occur. */
- ofproto_meter_id id1 = { 54545401 };
- ofproto_meter_id id2 = { 54545402 };
+ * high meter id to make this less likely to occur.
+ *
+ * In Linux kernel v5.10+ meters are stored in a table that is not
+ * a real hash table. It's just an array with 'meter_id % size' used
+ * as an index. The numbers are chosen to fit into the minimal table
+ * size (1024) without wrapping, so these IDs are guaranteed to be
+ * found under normal conditions in the meter table, if such meters
+ * exist. It's possible to break this check by creating some meters
+ * in the kernel manually with different IDs that map onto the same
+ * indexes, but that should not be a big problem since ovs-vswitchd
+ * always allocates densely packed meter IDs with an id-pool.
+ *
+ * These IDs will also work in cases where the table in the kernel is
+ * a proper hash table. */
+ ofproto_meter_id id1 = { 1021 };
+ ofproto_meter_id id2 = { 1022 };
struct ofputil_meter_band band = {OFPMBT13_DROP, 0, 1, 0};
struct ofputil_meter_config config1 = { 1, OFPMF13_KBPS, 1, &band};
struct ofputil_meter_config config2 = { 2, OFPMF13_KBPS, 1, &band};
+ /* First check if these meters are already in the kernel. If we get
+ * a proper response from the kernel with all the good meter IDs, then
+ * meters are likley supported correctly. */
+ if (!dpif_netlink_meter_get(dpif, id1, NULL, 0)
+ || !dpif_netlink_meter_get(dpif, id2, NULL, 0)) {
+ return false;
+ }
+
/* Try adding two meters and make sure that they both come back with
* the proper meter id. Use the "__" version so that we don't cause
* a recurve deadlock. */
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 0c41c39ff..58a46af0a 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -2813,6 +2813,34 @@ OVS_WAIT_UNTIL([test $(ovs-pcap p1.pcap | grep -c
"${packet}") -eq 5])
OVS_TRAFFIC_VSWITCHD_STOP
AT_CLEANUP
+AT_BANNER([Meters])
+
+AT_SETUP([meters - datapath probe])
+dnl Linux kernel 4.18+ should properly support meters.
+OVS_CHECK_MIN_KERNEL(4, 18)
+OVS_TRAFFIC_VSWITCHD_START()
+
+dnl Create a lot of meters.
+for i in $(seq 1023); do
+ AT_CHECK([ovs-ofctl -OOpenFlow13 add-meter br0 "meter=$i kbps band=type=drop
rate=100"])
+done
+
+dnl Kill the process to avoid any potential datapath cleanups.
+pid=$(cat ovs-vswitchd.pid)
+AT_CHECK([kill $pid])
+OVS_WAIT_WHILE([kill -0 $pid])
+
+dnl Start it back.
+AT_CHECK([ovs-vswitchd --detach --no-chdir --pidfile --log-file], [0], [],
[stderr])
+on_exit "kill_ovs_vswitchd"
+
+dnl It should still be possible to create meters.
+AT_CHECK([ovs-ofctl -OOpenFlow13 add-meter br0 "meter=1 kbps band=type=drop
rate=100"])
+AT_CHECK([grep -q "broken meter implementation" ovs-vswitchd.log], [1])
+
+OVS_TRAFFIC_VSWITCHD_STOP(['/terminating with signal/d'])
+AT_CLEANUP
+
AT_BANNER([MPLS])
AT_SETUP([mpls - encap header dp-support])
--
2.51.0
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev