On 7/7/23 17:04, Ilya Maximets wrote:
On 7/6/23 18:22, Eelco Chaudron wrote:


On 2 Jun 2023, at 16:13, Adrian Moreno wrote:

Use TCA_POLICE_RATE64 if the rate cannot be expressed using 32bits.

This breaks the 32Gbps barrier. The new barrier is ~4Tbps caused by
netdev's API expressing kbps rates using 32-bit integers.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2137643
Signed-off-by: Adrian Moreno <amore...@redhat.com>

This patch looks good to me, however, I have one additional general issue with 
this series.

For example, if the running kernel does not support TCA_POLICE_RATE64, 
TCA_HTB_RATE64, or TCA_HTB_CEIL64 but the kernel it was compiled on does. This 
will break the implementation. Maybe we need some probe in 
netdev_tc_init_flow_api()?

netdev_tc_init_flow_api() will not be called if hw-offload is not enabled.
But anyway, we only add 64bit arguments if we have to.  i.e. if the value
fits into 32bit range, 64bit argument will not be used.  That should provide
required backward compatibility.  If we need 64bit, but kernel doesn't
support, then we need to fail anyway.  Current code will configure incorrect
values, there is no need preserving that.


However looking at the current code we already use TCA_POLICE_PKTRATE64 which 
is later than TCA_POLICE_RATE64, so we should be good?! If this is true we also 
do not need all the ‘#ifdef HAVE_TCA_POLICE_PKTRATE64’ and related code in the 
this patch.

kpkts policing in a relatively new feature in OVS, so we do not expect it
to be used on old kernels.  So, use of TCA_POLICE_PKTRATE64 should be fine.


The HTB ones were added in 2013, so maybe we are good? If we are we can remove 
all the ‘#ifdef HAVE_TCA_HTB_RATE64’ and related code in patch 4.

Simon/Ilya any opinions on this?

Cheers,

Eelco


---
  acinclude.m4            | 10 ++++++++++
  lib/netdev-linux.c      | 19 ++++++++++++-------
  lib/netdev-linux.h      |  2 +-
  lib/tc.c                |  2 ++
  tests/atlocal.in        |  1 +
  tests/system-traffic.at | 21 +++++++++++++++++++++
  6 files changed, 47 insertions(+), 8 deletions(-)

diff --git a/acinclude.m4 b/acinclude.m4
index 1e5a9872d..3ac7072b5 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -221,6 +221,16 @@ AC_DEFUN([OVS_CHECK_LINUX_TC], [
                 [Define to 1 if TCA_HTB_RATE64 is available.])],
      [AC_SUBST(HAVE_TCA_HTB_RATE64,no)]
      )
+
+  AC_COMPILE_IFELSE([
+    AC_LANG_PROGRAM([#include <linux/pkt_cls.h>], [
+        int x = TCA_POLICE_PKTRATE64;
+    ])],
+    [AC_SUBST(HAVE_TCA_POLICE_PKTRATE64,yes)
+     AC_DEFINE([HAVE_TCA_POLICE_PKTRATE64], [1],
+               [Define to 1 if TCA_POLICE_PKTRATE64 is available.])],
+    [AC_SUBST(HAVE_TCA_POLICE_PKTRATE64,no)]
+    )
  ])

  dnl OVS_CHECK_LINUX_SCTP_CT
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 3526fbfc3..6a9f36f1e 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -494,7 +494,7 @@ static struct tcmsg *netdev_linux_tc_make_request(const 
struct netdev *,
                                                    unsigned int flags,
                                                    struct ofpbuf *);

-static int tc_add_policer(struct netdev *, uint32_t kbits_rate,
+static int tc_add_policer(struct netdev *, uint64_t kbits_rate,
                            uint32_t kbits_burst, uint32_t kpkts_rate,
                            uint32_t kpkts_burst);

@@ -2660,6 +2660,7 @@ nl_msg_put_act_police(struct ofpbuf *request, uint32_t 
index,
                        uint64_t pkts_rate, uint64_t pkts_burst,
                        uint32_t notexceed_act, bool single_action)
  {
+    uint64_t bytes_rate = kbits_rate / 8 * 1000;
      size_t offset, act_offset;
      struct tc_police police;
      uint32_t prio = 0;
@@ -2674,8 +2675,13 @@ nl_msg_put_act_police(struct ofpbuf *request, uint32_t 
index,
      nl_msg_act_police_start_nest(request, ++prio, &offset, &act_offset,
                                   single_action);
      if (police.rate.rate) {
-        tc_put_rtab(request, TCA_POLICE_RATE, &police.rate, 0);
+        tc_put_rtab(request, TCA_POLICE_RATE, &police.rate, bytes_rate);
      }
+#ifdef HAVE_TCA_POLICE_PKTRATE64
+    if (bytes_rate > UINT32_MAX) {
+        nl_msg_put_u64(request, TCA_POLICE_RATE64, bytes_rate);
+    }
+#endif
      if (pkts_rate) {
          uint64_t pkt_burst_ticks;
          /* Here tc_bytes_to_ticks is used to convert packets rather than bytes
@@ -2689,7 +2695,7 @@ nl_msg_put_act_police(struct ofpbuf *request, uint32_t 
index,
  }

  static int
-tc_add_matchall_policer(struct netdev *netdev, uint32_t kbits_rate,
+tc_add_matchall_policer(struct netdev *netdev, uint64_t kbits_rate,
                          uint32_t kbits_burst, uint32_t kpkts_rate,
                          uint32_t kpkts_burst)
  {
@@ -5703,9 +5709,8 @@ tc_policer_init(struct tc_police *tc_police, uint64_t 
kbits_rate,
   * Returns 0 if successful, otherwise a positive errno value.
   */
  static int
-tc_add_policer(struct netdev *netdev, uint32_t kbits_rate,
-               uint32_t kbits_burst, uint32_t kpkts_rate,
-               uint32_t kpkts_burst)
+tc_add_policer(struct netdev *netdev, uint64_t kbits_rate,
+               uint32_t kbits_burst, uint32_t kpkts_rate, uint32_t kpkts_burst)
  {
      size_t basic_offset, police_offset;
      struct ofpbuf request;
@@ -5739,7 +5744,7 @@ tc_add_policer(struct netdev *netdev, uint32_t kbits_rate,
  }

  int
-tc_add_policer_action(uint32_t index, uint32_t kbits_rate,
+tc_add_policer_action(uint32_t index, uint64_t kbits_rate,
                        uint32_t kbits_burst, uint32_t pkts_rate,
                        uint32_t pkts_burst, bool update)
  {
diff --git a/lib/netdev-linux.h b/lib/netdev-linux.h
index 9a416ce50..ec19b0ded 100644
--- a/lib/netdev-linux.h
+++ b/lib/netdev-linux.h
@@ -29,7 +29,7 @@ struct netdev;
  int netdev_linux_ethtool_set_flag(struct netdev *netdev, uint32_t flag,
                                    const char *flag_name, bool enable);
  int linux_get_ifindex(const char *netdev_name);
-int tc_add_policer_action(uint32_t index, uint32_t kbits_rate,
+int tc_add_policer_action(uint32_t index, uint64_t kbits_rate,
                            uint32_t kbits_burst, uint32_t pkts_rate,
                            uint32_t pkts_burst, bool update);
  int tc_del_policer_action(uint32_t index, struct ofputil_meter_stats *stats);
diff --git a/lib/tc.c b/lib/tc.c
index 5c32c6f97..b2a697774 100644
--- a/lib/tc.c
+++ b/lib/tc.c
@@ -1427,6 +1427,8 @@ static const struct nl_policy police_policy[] = {
      [TCA_POLICE_RATE] = { .type = NL_A_UNSPEC,
                            .min_len = 1024,
                            .optional = true, },
+    [TCA_POLICE_RATE64] = { .type = NL_A_U32,
+                            .optional = true, },
      [TCA_POLICE_PEAKRATE] = { .type = NL_A_UNSPEC,
                                .min_len = 1024,
                                .optional = true, },
diff --git a/tests/atlocal.in b/tests/atlocal.in
index e3ee2d48c..2fc4faf80 100644
--- a/tests/atlocal.in
+++ b/tests/atlocal.in
@@ -6,6 +6,7 @@ EGREP='@EGREP@'
  PYTHON3='@PYTHON3@'
  CFLAGS='@CFLAGS@'
  HAVE_TCA_HTB_RATE64='@HAVE_TCA_HTB_RATE64@'
+HAVE_TCA_POLICE_PKTRATE64='@HAVE_TCA_POLICE_PKTRATE64@'

  # PYTHONCOERCECLOCALE=0 disables the Unicode compatibility warning on
  # stderr that breaks almost any Python3 test (PEP 0538)
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 6200ec52e..b3683e856 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -2382,6 +2382,27 @@ AT_CHECK([tc class show dev ovs-p1 | grep -q 'class htb 
.* HTB_CONF'])
  OVS_TRAFFIC_VSWITCHD_STOP
  AT_CLEANUP

+AT_SETUP([Ingress Policy - 64-bit])
+AT_SKIP_IF([test $HAVE_TC = no])
+AT_SKIP_IF([test $HAVE_TCA_POLICE_PKTRATE64 = no])
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_NAMESPACES(ns0)
+ADD_VETH(p0, ns0, br0, "10.1.1.1/24")
+
+AT_CHECK([ovs-vsctl set interface ovs-p0 ingress_policing_rate=50000000])
+AT_CHECK([ovs-vsctl set interface ovs-p0 ingress_policing_burst=40000000])

We have 64bit rate, but not burst.  Doesn't 40000000 overflow the value?

Yes, I had added the same burst because I was exploring if we can make it larger too. But I failed to find a way to express higher bursts and even the tc command caps it to an integer value.

I'll put a smaller one so not show such a weird result.


+
+AT_CHECK([tc -o -s -d filter show dev ovs-p0 ingress |
+  sed -n 's/.*\(rate [[0-9]]*[[a-zA-Z]]* burst [[0-9]]*[[a-zA-Z]]*\).*/\1/; T; 
p; q'],
+  [0],[dnl
+rate 50Gbit burst 1200575000b

This burst value seem to be overflowed.
40000000000 & (2**32 - 1) = 1345294336
Math doesn't match exactly, but anyway.

+])
+
+AT_CHECK([tc -s -d filter show dev ovs-p0 ingress |
+  grep -E "basic|matchall" > /dev/null], [0])
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
  AT_BANNER([conntrack])

  AT_SETUP([conntrack - controller])
--
2.40.1

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev



--
Adrián Moreno

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to