[ovs-dev] [PATCH 10/11] tests: Fix reading of OpenFlow byte counters in GRE test cases.

2023-02-01 Thread Eelco Chaudron
With some datapaths, read TC, it takes a bit longer to update the
OpenFlow statistics. Rather than adding an additional delay, try
to read the counters multiple times until we get the desired value.

Signed-off-by: Eelco Chaudron 
Acked-by: Roi Dayan 
---
 tests/system-traffic.at |   18 +++---
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 9fea221f2..ba95c2614 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -1638,7 +1638,6 @@ dnl   br-underlay: with IP: 172.31.1.100
 dnl   ns0: connect to br-underlay, with IP: 10.1.1.1
 AT_SETUP([datapath - truncate and output to gre tunnel by simulated packets])
 OVS_CHECK_MIN_KERNEL(3, 10)
-CHECK_NO_TC_OFFLOAD()
 AT_SKIP_IF([test $HAVE_NC = no])
 OVS_TRAFFIC_VSWITCHD_START()
 
@@ -1709,9 +1708,8 @@ AT_CHECK([ovs-ofctl dump-flows br0 | grep "in_port=2" | 
sed -n 's/.*\(n\_bytes=[
 n_bytes=242
 ])
 dnl After truncation = outer ETH(14) + outer IP(20) + GRE(4) + 100 = 138B
-AT_CHECK([ovs-ofctl dump-flows br-underlay | grep "in_port=LOCAL" | sed -n 
's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
-n_bytes=138
-])
+OVS_WAIT_UNTIL_EQUAL([ovs-ofctl dump-flows br-underlay | grep "in_port=LOCAL" 
| sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [dnl
+n_bytes=138])
 
 dnl check tunnel pop path, from at_ns0 to at_ns1
 dnl This 200-byte packet is simulated on behalf of ns_gre0
@@ -1719,9 +1717,9 @@ ovs-ofctl -O OpenFlow13 packet-out br-underlay "in_port=1 
packet=02908ca8a149faa
 
 dnl After truncation = 100 byte at loopback device p2(4)
 AT_CHECK([ovs-appctl revalidator/purge], [0])
-AT_CHECK([ovs-ofctl dump-flows br0 | grep "in_port=4" | ofctl_strip], [0], [dnl
- n_packets=1, n_bytes=100, priority=1,ip,in_port=4 actions=drop
-])
+OVS_WAIT_UNTIL_EQUAL([ovs-ofctl dump-flows br0 | grep "in_port=4" | 
ofctl_strip], [dnl
+ n_packets=1, n_bytes=100, priority=1,ip,in_port=4 actions=drop])
+
 
 dnl SLOW_ACTION: disable datapath truncate support
 dnl Repeat the test above, but exercise the SLOW_ACTION code path
@@ -1746,9 +1744,8 @@ AT_CHECK([ovs-ofctl dump-flows br0 | grep "in_port=2" | 
sed -n 's/.*\(n\_bytes=[
 n_bytes=242
 ])
 dnl After truncation = outer ETH(14) + outer IP(20) + GRE(4) + 100 = 138B
-AT_CHECK([ovs-ofctl dump-flows br-underlay | grep "in_port=LOCAL" | sed -n 
's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
-n_bytes=138
-])
+OVS_WAIT_UNTIL_EQUAL([ovs-ofctl dump-flows br-underlay | grep "in_port=LOCAL" 
| sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [dnl
+n_bytes=138])
 
 dnl check tunnel pop path, from at_ns0 to at_ns1
 dnl This 200-byte packet is simulated on behalf of ns_gre0
@@ -1773,7 +1770,6 @@ AT_SETUP([datapath - truncate and output to gre tunnel])
 AT_SKIP_IF([test $HAVE_NC = no])
 OVS_CHECK_KERNEL_EXCL(3, 10, 4, 15)
 OVS_CHECK_GRE()
-CHECK_NO_TC_OFFLOAD()
 OVS_TRAFFIC_VSWITCHD_START()
 
 ADD_BR([br-underlay])

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 11/11] tests: Comment currently failing TC system-traffic tests.

2023-02-01 Thread Eelco Chaudron
I commented the three remaining failures when running tc with the
system-traffic tests. In addition I ran the following test to verify
we did not see any failures with recheck enabled:

  for i in {1..50}; do make check-offloads || \
make check-offloads TESTSUITEFLAGS="--recheck" || break; \
echo "ALL_50_OK: $i"; done;

Unfortunately, a bunch of test cases showed occasional failures.
For now, they are excluded from the test cases and need further
investigation. They are:

  datapath - truncate and output to gre tunnel
  datapath - truncate and output to gre tunnel by simulated packets

These tests where executed on a Fedora37 machine with the kernel
6.1.5-200.fc37.x86_64 installed.

Signed-off-by: Eelco Chaudron 
Acked-by: Roi Dayan 
---
 tests/system-offloads-testsuite-macros.at |   21 +
 tests/system-traffic.at   |2 ++
 2 files changed, 23 insertions(+)

diff --git a/tests/system-offloads-testsuite-macros.at 
b/tests/system-offloads-testsuite-macros.at
index 322166b8c..e50dc07fb 100644
--- a/tests/system-offloads-testsuite-macros.at
+++ b/tests/system-offloads-testsuite-macros.at
@@ -30,6 +30,27 @@ m4_define([OVS_TRAFFIC_VSWITCHD_START],
 ])
 
 # Macro to exclude tests that will fail with TC offload enabled.
+# We currently have the below tests disabled in system-traffic.at
+# for the following reasons:
+#
+#  TC does not support moving ports to a different namespace than vswitchd's
+#  namespace, so we need to disable this test.
+#- 'conntrack - multiple namespaces, internal ports'
+#
+#  The kernel's tcf_ct_act() function does not seem to take care of any (QinQ)
+#  VLAN headers causing commits to fail. However, if this is solved, we have to
+#  make sure conntrack does not break the VLAN boundary, i.e., putting together
+#  two packets with different CVLAN+SVLAN values.
+#- 'conntrack - IPv4 fragmentation + cvlan'
+#
+#  Fragmentation handling in ct zone 9 does not seem to work correctly.
+#  When moving this test over to the default zone all works fine.
+#- 'conntrack - Fragmentation over vxlan'
+#
+#  Occasionally we fail with invalid byte counts.
+#- 'datapath - truncate and output to gre tunnel by simulated packets'
+#- 'datapath - truncate and output to gre tunnel'
+#
 m4_define([CHECK_NO_TC_OFFLOAD],
 [
  AT_SKIP_IF([:])
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index ba95c2614..c61be644f 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -1639,6 +1639,7 @@ dnl   ns0: connect to br-underlay, with IP: 10.1.1.1
 AT_SETUP([datapath - truncate and output to gre tunnel by simulated packets])
 OVS_CHECK_MIN_KERNEL(3, 10)
 AT_SKIP_IF([test $HAVE_NC = no])
+CHECK_NO_TC_OFFLOAD()
 OVS_TRAFFIC_VSWITCHD_START()
 
 ADD_BR([br-underlay], [set bridge br-underlay 
other-config:hwaddr=\"02:90:8c:a8:a1:49\"])
@@ -1770,6 +1771,7 @@ AT_SETUP([datapath - truncate and output to gre tunnel])
 AT_SKIP_IF([test $HAVE_NC = no])
 OVS_CHECK_KERNEL_EXCL(3, 10, 4, 15)
 OVS_CHECK_GRE()
+CHECK_NO_TC_OFFLOAD()
 OVS_TRAFFIC_VSWITCHD_START()
 
 ADD_BR([br-underlay])

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 08/11] odp-util: Make odp_flow_key_from_flow__ nlattr order the same as the kernel.

2023-02-01 Thread Eelco Chaudron
Make the order of the Netlink attributes for odp_flow_key_from_flow__()
the same as the kernel will return them.

This will make sure the attributes displayed in the dpctl/dump-flows
output appear in the same order for all datapath.

Signed-off-by: Eelco Chaudron 
Acked-by: Roi Dayan 
---
 lib/odp-util.c|   21 +-
 tests/dpif-netdev.at  |   28 +++---
 tests/mcast-snooping.at   |4 +-
 tests/nsh.at  |   10 ++---
 tests/odp.at  |   84 -
 tests/ofproto-dpif.at |   30 +++
 tests/packet-type-aware.at|   22 +--
 tests/pmd.at  |2 -
 tests/system-traffic.at   |1 
 tests/tunnel-push-pop-ipv6.at |2 -
 tests/tunnel-push-pop.at  |2 -
 tests/tunnel.at   |2 -
 12 files changed, 102 insertions(+), 106 deletions(-)

diff --git a/lib/odp-util.c b/lib/odp-util.c
index 5fc312f8c..dbd4554d0 100644
--- a/lib/odp-util.c
+++ b/lib/odp-util.c
@@ -6204,6 +6204,11 @@ odp_flow_key_from_flow__(const struct odp_flow_key_parms 
*parms,
 const struct flow *mask = parms->mask;
 const struct flow *data = export_mask ? mask : flow;
 
+if (parms->support.recirc) {
+nl_msg_put_u32(buf, OVS_KEY_ATTR_RECIRC_ID, data->recirc_id);
+nl_msg_put_u32(buf, OVS_KEY_ATTR_DP_HASH, data->dp_hash);
+}
+
 nl_msg_put_u32(buf, OVS_KEY_ATTR_PRIORITY, data->skb_priority);
 
 if (flow_tnl_dst_is_set(>tunnel) ||
@@ -6212,6 +6217,12 @@ odp_flow_key_from_flow__(const struct odp_flow_key_parms 
*parms,
 parms->key_buf, NULL);
 }
 
+/* Add an ingress port attribute if this is a mask or 'in_port.odp_port'
+ * is not the magical value "ODPP_NONE". */
+if (export_mask || flow->in_port.odp_port != ODPP_NONE) {
+nl_msg_put_odp_port(buf, OVS_KEY_ATTR_IN_PORT, data->in_port.odp_port);
+}
+
 nl_msg_put_u32(buf, OVS_KEY_ATTR_SKB_MARK, data->pkt_mark);
 
 if (parms->support.ct_state) {
@@ -6255,16 +6266,6 @@ odp_flow_key_from_flow__(const struct odp_flow_key_parms 
*parms,
 ct->ipv6_proto = data->ct_nw_proto;
 }
 }
-if (parms->support.recirc) {
-nl_msg_put_u32(buf, OVS_KEY_ATTR_RECIRC_ID, data->recirc_id);
-nl_msg_put_u32(buf, OVS_KEY_ATTR_DP_HASH, data->dp_hash);
-}
-
-/* Add an ingress port attribute if this is a mask or 'in_port.odp_port'
- * is not the magical value "ODPP_NONE". */
-if (export_mask || flow->in_port.odp_port != ODPP_NONE) {
-nl_msg_put_odp_port(buf, OVS_KEY_ATTR_IN_PORT, data->in_port.odp_port);
-}
 
 nl_msg_put_be32(buf, OVS_KEY_ATTR_PACKET_TYPE, data->packet_type);
 
diff --git a/tests/dpif-netdev.at b/tests/dpif-netdev.at
index 9af70a68d..baab60a22 100644
--- a/tests/dpif-netdev.at
+++ b/tests/dpif-netdev.at
@@ -72,13 +72,13 @@ ovs-appctl time/warp 5000
 AT_CHECK([ovs-appctl netdev-dummy/receive p1 
'in_port(1),eth(src=50:54:00:00:00:01,dst=50:54:00:00:02:00),eth_type(0x0800),ipv4(src=10.0.0.1,dst=10.0.0.2,proto=6,tos=0,ttl=64,frag=no),tcp(src=8,dst=9),tcp_flags(ack)'])
OVS_WAIT_UNTIL([grep "miss upcall" ovs-vswitchd.log])
AT_CHECK([grep -A 1 'miss upcall' ovs-vswitchd.log | tail -n 1], [0], [dnl
-skb_priority(0),skb_mark(0),ct_state(0),ct_zone(0),ct_mark(0),ct_label(0),recirc_id(0),dp_hash(0),in_port(1),packet_type(ns=0,id=0),eth(src=50:54:00:00:00:01,dst=50:54:00:00:02:00),eth_type(0x0800),ipv4(src=10.0.0.1,dst=10.0.0.2,proto=6,tos=0,ttl=64,frag=no),tcp(src=8,dst=9),tcp_flags(ack)
+recirc_id(0),dp_hash(0),skb_priority(0),in_port(1),skb_mark(0),ct_state(0),ct_zone(0),ct_mark(0),ct_label(0),packet_type(ns=0,id=0),eth(src=50:54:00:00:00:01,dst=50:54:00:00:02:00),eth_type(0x0800),ipv4(src=10.0.0.1,dst=10.0.0.2,proto=6,tos=0,ttl=64,frag=no),tcp(src=8,dst=9),tcp_flags(ack)
 ])
 
 AT_CHECK([ovs-appctl netdev-dummy/receive p1 
'in_port(1),eth(src=50:54:00:00:00:05,dst=50:54:00:00:06:00),eth_type(0x0800),ipv4(src=10.0.0.5,dst=10.0.0.6,proto=6,tos=0,ttl=64,frag=no),tcp(src=8,dst=9),tcp_flags(ack)'
 --len 1024])
OVS_WAIT_UNTIL([test `grep -c "miss upcall" ovs-vswitchd.log` -ge 2])
AT_CHECK([grep -A 1 'miss upcall' ovs-vswitchd.log | tail -n 1], [0], [dnl
-skb_priority(0),skb_mark(0),ct_state(0),ct_zone(0),ct_mark(0),ct_label(0),recirc_id(0),dp_hash(0),in_port(1),packet_type(ns=0,id=0),eth(src=50:54:00:00:00:05,dst=50:54:00:00:06:00),eth_type(0x0800),ipv4(src=10.0.0.5,dst=10.0.0.6,proto=6,tos=0,ttl=64,frag=no),tcp(src=8,dst=9),tcp_flags(ack)
+recirc_id(0),dp_hash(0),skb_priority(0),in_port(1),skb_mark(0),ct_state(0),ct_zone(0),ct_mark(0),ct_label(0),packet_type(ns=0,id=0),eth(src=50:54:00:00:00:05,dst=50:54:00:00:06:00),eth_type(0x0800),ipv4(src=10.0.0.5,dst=10.0.0.6,proto=6,tos=0,ttl=64,frag=no),tcp(src=8,dst=9),tcp_flags(ack)
 ])
 OVS_VSWITCHD_STOP
 AT_CLEANUP
@@ -139,7 +139,7 @@ m4_define([DPIF_NETDEV_MISS_FLOW_INSTALL],
 
OVS_WAIT_UNTIL([grep "miss upcall" 

[ovs-dev] [PATCH 09/11] netdev-offload-tc: If the flow has not been used, report it as such.

2023-02-01 Thread Eelco Chaudron
If a tc flow was installed but has not yet been used, report it as such.

In addition, add a delay to the "IGMP - flood under normal action" test
case to make it work with many repetitions. This delay is also present
in other ICMP/IGMP tests.

f98e418fbdb6 ("tc: Add tc flower functions")
Signed-off-by: Eelco Chaudron 
Acked-by: Roi Dayan 
---
 lib/tc.c|   14 +-
 tests/system-traffic.at |1 -
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/lib/tc.c b/lib/tc.c
index 1fb2b4a92..4c07e2216 100644
--- a/lib/tc.c
+++ b/lib/tc.c
@@ -1366,7 +1366,19 @@ get_user_hz(void)
 static void
 nl_parse_tcf(const struct tcf_t *tm, struct tc_flower *flower)
 {
-uint64_t lastused = time_msec() - (tm->lastuse * 1000 / get_user_hz());
+uint64_t lastused;
+
+/* On creation both tm->install and tm->lastuse are set to jiffies
+ * by the kernel. So if both values are the same, the flow has not been
+ * used yet.
+ *
+ * Note that tm->firstuse can not be used due to some kernel bug, i.e.,
+ * hardware offloaded flows do not update tm->firstuse. */
+if (tm->lastuse == tm->install) {
+lastused = 0;
+} else {
+lastused = time_msec() - (tm->lastuse * 1000 / get_user_hz());
+}
 
 if (flower->lastused < lastused) {
 flower->lastused = lastused;
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index a93bdb26f..9fea221f2 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -7144,7 +7144,6 @@ AT_CLEANUP
 AT_BANNER([IGMP])
 
 AT_SETUP([IGMP - flood under normal action])
-CHECK_NO_TC_OFFLOAD()
 
 OVS_TRAFFIC_VSWITCHD_START()
 ADD_NAMESPACES(at_ns0, at_ns1)

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 07/11] test: Fix 'conntrack - Multiple ICMP traverse' for tc case.

2023-02-01 Thread Eelco Chaudron
tc does not include ethernet header length in packet byte count.
This fix will allow the packets that go trough tc to be 14 bytes less.

This difference in the TC implementation is already described in
tc-offload.rst.

Signed-off-by: Eelco Chaudron 
Acked-by: Roi Dayan 
---
 tests/system-traffic.at |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index eb03c69be..19ec98617 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -7054,7 +7054,6 @@ AT_SETUP([conntrack - Multiple ICMP traverse])
 dnl This tracks sending ICMP packets via conntrack multiple times for the
 dnl same packet
 CHECK_CONNTRACK()
-CHECK_NO_TC_OFFLOAD()
 OVS_TRAFFIC_VSWITCHD_START()
 OVS_CHECK_CT_CLEAR()
 
@@ -7086,7 +7085,7 @@ AT_CHECK([ovs-appctl dpctl/dump-conntrack | 
FORMAT_CT(10.1.1)], [0], [dnl
 
icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=,type=0,code=0)
 ])
 
-AT_CHECK([ovs-ofctl dump-flows br0 | grep table=2, | 
OFPROTO_CLEAR_DURATION_IDLE],
+AT_CHECK([ovs-ofctl dump-flows br0 | grep table=2, | 
OFPROTO_CLEAR_DURATION_IDLE | sed 's/n_bytes=70,/n_bytes=84,/'],
  [0], [dnl
  cookie=0x0, duration=, table=2, n_packets=2, n_bytes=84, 
idle_age=, priority=10,ct_state=+new+trk,in_port=1 actions=drop
  cookie=0x0, duration=, table=2, n_packets=0, n_bytes=0, 
idle_age=, priority=10,ct_state=+est+trk actions=drop

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 05/11] netdev-offload-tc: Conntrack ALGs are not supported with tc.

2023-02-01 Thread Eelco Chaudron
tc does not support conntrack ALGs. Even worse, with tc enabled, they
should not be used/configured at all. This is because even though TC
will ignore the rules with ALG configured, i.e., they will flow through
the kernel module, return traffic might flow through a tc conntrack
rule, and it will not invoke the ALG helper.

Fixes: 576126a931cd ("netdev-offload-tc: Add conntrack support")
Signed-off-by: Eelco Chaudron 
Acked-by: Roi Dayan 
---
 Documentation/howto/tc-offload.rst|   11 +++
 lib/netdev-offload-tc.c   |4 
 tests/system-offloads-testsuite-macros.at |6 ++
 tests/system-traffic.at   |   15 ---
 4 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/Documentation/howto/tc-offload.rst 
b/Documentation/howto/tc-offload.rst
index f6482c8af..681dff13e 100644
--- a/Documentation/howto/tc-offload.rst
+++ b/Documentation/howto/tc-offload.rst
@@ -112,3 +112,14 @@ First flow packet not processed by meter
 Packets that are received by ovs-vswitchd through an upcall before the actual
 meter flow is installed, are not passing TC police action and therefore are
 not considered for policing.
+
+Conntrack Application Layer Gateways (ALG)
+++
+
+TC does not support conntrack helpers, i.e., ALGs. TC will not offload flows if
+the ALG keyword is present within the ct() action. However, this will not allow
+ALGs to work within the datapath, as the return traffic without the ALG keyword
+might run through a TC rule, which internally will not call the conntrack
+helper required.
+
+So if ALG support is required, tc offload must be disabled.
diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
index f9f05cd5c..4e94430ec 100644
--- a/lib/netdev-offload-tc.c
+++ b/lib/netdev-offload-tc.c
@@ -1425,6 +1425,10 @@ parse_put_flow_ct_action(struct tc_flower *flower,
 get_32aligned_u128(_label->mask);
 }
 break;
+/* The following option we do not support in tc-ct, and should
+ * not be ignored for proper operation. */
+case OVS_CT_ATTR_HELPER:
+return EOPNOTSUPP;
 }
 }
 
diff --git a/tests/system-offloads-testsuite-macros.at 
b/tests/system-offloads-testsuite-macros.at
index 2129cf7f0..5d7044f42 100644
--- a/tests/system-offloads-testsuite-macros.at
+++ b/tests/system-offloads-testsuite-macros.at
@@ -34,3 +34,9 @@ m4_define([CHECK_NO_TC_OFFLOAD],
 [
  AT_SKIP_IF([:])
 ])
+
+# Conntrack ALGs are not supported for tc.
+m4_define([CHECK_CONNTRACK_ALG],
+[
+ AT_SKIP_IF([:])
+])
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index b2751e7e4..05e0473ec 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -4827,7 +4827,6 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([conntrack - FTP])
-CHECK_NO_TC_OFFLOAD()
 AT_SKIP_IF([test $HAVE_FTP = no])
 CHECK_CONNTRACK()
 CHECK_CONNTRACK_ALG()
@@ -4937,7 +4936,6 @@ AT_SETUP([conntrack - FTP over IPv6])
 AT_SKIP_IF([test $HAVE_FTP = no])
 CHECK_CONNTRACK()
 CHECK_CONNTRACK_ALG()
-CHECK_NO_TC_OFFLOAD()
 OVS_TRAFFIC_VSWITCHD_START()
 
 ADD_NAMESPACES(at_ns0, at_ns1)
@@ -4993,7 +4991,6 @@ AT_SETUP([conntrack - IPv6 FTP Passive])
 AT_SKIP_IF([test $HAVE_FTP = no])
 CHECK_CONNTRACK()
 CHECK_CONNTRACK_ALG()
-CHECK_NO_TC_OFFLOAD()
 OVS_TRAFFIC_VSWITCHD_START()
 
 ADD_NAMESPACES(at_ns0, at_ns1)
@@ -5053,7 +5050,6 @@ AT_SETUP([conntrack - FTP with multiple expectations])
 AT_SKIP_IF([test $HAVE_FTP = no])
 CHECK_CONNTRACK()
 CHECK_CONNTRACK_ALG()
-CHECK_NO_TC_OFFLOAD()
 OVS_TRAFFIC_VSWITCHD_START()
 
 ADD_NAMESPACES(at_ns0, at_ns1)
@@ -5120,7 +5116,6 @@ AT_SETUP([conntrack - TFTP])
 AT_SKIP_IF([test $HAVE_TFTP = no])
 CHECK_CONNTRACK()
 CHECK_CONNTRACK_ALG()
-CHECK_NO_TC_OFFLOAD()
 OVS_TRAFFIC_VSWITCHD_START()
 
 ADD_NAMESPACES(at_ns0, at_ns1)
@@ -5756,7 +5751,6 @@ m4_define([CHECK_FTP_NAT],
 CHECK_CONNTRACK()
 CHECK_CONNTRACK_NAT()
 CHECK_CONNTRACK_ALG()
-CHECK_NO_TC_OFFLOAD()
 
 OVS_TRAFFIC_VSWITCHD_START()
 
@@ -6064,7 +6058,6 @@ AT_SKIP_IF([test $HAVE_FTP = no])
 CHECK_CONNTRACK()
 CHECK_CONNTRACK_NAT()
 CHECK_CONNTRACK_ALG()
-CHECK_NO_TC_OFFLOAD()
 
 OVS_TRAFFIC_VSWITCHD_START()
 
@@ -6125,7 +6118,6 @@ AT_SKIP_IF([test $HAVE_FTP = no])
 CHECK_CONNTRACK()
 CHECK_CONNTRACK_NAT()
 CHECK_CONNTRACK_ALG()
-CHECK_NO_TC_OFFLOAD()
 
 OVS_TRAFFIC_VSWITCHD_START()
 
@@ -6186,7 +6178,6 @@ AT_SKIP_IF([test $HAVE_FTP = no])
 CHECK_CONNTRACK()
 CHECK_CONNTRACK_NAT()
 CHECK_CONNTRACK_ALG()
-CHECK_NO_TC_OFFLOAD()
 
 OVS_TRAFFIC_VSWITCHD_START()
 
@@ -6247,7 +6238,6 @@ AT_SKIP_IF([test $HAVE_FTP = no])
 CHECK_CONNTRACK()
 CHECK_CONNTRACK_NAT()
 CHECK_CONNTRACK_ALG()
-CHECK_NO_TC_OFFLOAD()
 
 OVS_TRAFFIC_VSWITCHD_START()
 
@@ -6308,7 +6298,6 @@ AT_SKIP_IF([test $HAVE_FTP = no])
 CHECK_CONNTRACK()
 CHECK_CONNTRACK_NAT()
 CHECK_CONNTRACK_ALG()
-CHECK_NO_TC_OFFLOAD()
 
 

[ovs-dev] [PATCH 06/11] test: tc does not support conntrack timeout, skip the related test.

2023-02-01 Thread Eelco Chaudron
The tc conntrack implementation does not support the timeout option.
The current implementation is silently ignoring the timeout option
by adding a general conntrack entry.

This patch will skip the related test by overriding the support macro.

Signed-off-by: Eelco Chaudron 
Acked-by: Roi Dayan 
---
 tests/system-offloads-testsuite-macros.at |6 ++
 tests/system-traffic.at   |1 -
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/tests/system-offloads-testsuite-macros.at 
b/tests/system-offloads-testsuite-macros.at
index 5d7044f42..322166b8c 100644
--- a/tests/system-offloads-testsuite-macros.at
+++ b/tests/system-offloads-testsuite-macros.at
@@ -40,3 +40,9 @@ m4_define([CHECK_CONNTRACK_ALG],
 [
  AT_SKIP_IF([:])
 ])
+
+# Conntrack timeout not supported for tc.
+m4_define([CHECK_CONNTRACK_TIMEOUT],
+[
+ AT_SKIP_IF([:])
+])
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 05e0473ec..eb03c69be 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -4333,7 +4333,6 @@ AT_CLEANUP
 AT_SETUP([conntrack - zone-based timeout policy])
 CHECK_CONNTRACK()
 CHECK_CONNTRACK_TIMEOUT()
-CHECK_NO_TC_OFFLOAD()
 OVS_TRAFFIC_VSWITCHD_START()
 
 ADD_NAMESPACES(at_ns0, at_ns1)

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 04/11] test: Flush datapath when changing rules on the fly.

2023-02-01 Thread Eelco Chaudron
Flush datapath flows as TC flows take some more time to be flushed out.
The flush speeds this up.

Signed-off-by: Eelco Chaudron 
Acked-by: Roi Dayan 
---
 tests/system-traffic.at |7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index df8459c66..b2751e7e4 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -2848,6 +2848,9 @@ AT_CHECK([ovs-appctl dpctl/dump-flows --names 
filter=in_port=ovs-p0 dnl
 AT_CHECK([ovs-ofctl mod-flows br0 dnl
 'priority=100,ct_state=-trk,tcp,in_port="ovs-p0" 
actions=ct(table=0,zone=15)'])
 
+dnl Wait for a flow flush as some datapaths (read TC) might take time to clear.
+AT_CHECK([ovs-appctl revalidator/wait], [0])
+
 NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o 
wget0.log])
 
 AT_CHECK([ovs-appctl dpctl/dump-flows --names filter=in_port=ovs-p0 dnl
@@ -2858,7 +2861,6 @@ AT_CLEANUP
 
 AT_SETUP([conntrack - zones from other field, more tests])
 CHECK_CONNTRACK()
-CHECK_NO_TC_OFFLOAD()
 OVS_TRAFFIC_VSWITCHD_START()
 
 ADD_NAMESPACES(at_ns0, at_ns1)
@@ -2897,6 +2899,9 @@ AT_CHECK([ovs-appctl dpctl/dump-flows --names 
filter=in_port=ovs-p0 dnl
 
 AT_CHECK([ovs-ofctl mod-flows br0 
'priority=100,ct_state=-trk,tcp,in_port="ovs-p0" 
actions=ct(table=0,zone=15,commit,exec(load:0x000f->NXM_NX_CT_LABEL[[0..31]]))'])
 
+dnl Wait for a flow flush as some datapaths (read TC) might take time to clear.
+AT_CHECK([ovs-appctl revalidator/wait], [0])
+
 NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o 
wget0.log])
 
 AT_CHECK([ovs-appctl dpctl/dump-flows --names filter=in_port=ovs-p0 dnl

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 03/11] netdev-offload-tc: Fix tc conntrack force commit support.

2023-02-01 Thread Eelco Chaudron
tc was not setting the OVS_CT_ATTR_FORCE_COMMIT flag when a forced
commit was requested. This patch will fix this.

Fixes: 576126a931cd ("netdev-offload-tc: Add conntrack support")

Signed-off-by: Eelco Chaudron 
Acked-by: Roi Dayan 
---
 lib/netdev-offload-tc.c |   13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
index 4ced81f89..f9f05cd5c 100644
--- a/lib/netdev-offload-tc.c
+++ b/lib/netdev-offload-tc.c
@@ -887,7 +887,11 @@ parse_tc_flower_to_actions__(struct tc_flower *flower, 
struct ofpbuf *buf,
 ct_offset = nl_msg_start_nested(buf, OVS_ACTION_ATTR_CT);
 
 if (action->ct.commit) {
-nl_msg_put_flag(buf, OVS_CT_ATTR_COMMIT);
+if (action->ct.force) {
+nl_msg_put_flag(buf, OVS_CT_ATTR_FORCE_COMMIT);
+} else {
+nl_msg_put_flag(buf, OVS_CT_ATTR_COMMIT);
+}
 }
 
 if (action->ct.zone) {
@@ -1376,7 +1380,12 @@ parse_put_flow_ct_action(struct tc_flower *flower,
 NL_ATTR_FOR_EACH_UNSAFE (ct_attr, ct_left, ct, ct_len) {
 switch (nl_attr_type(ct_attr)) {
 case OVS_CT_ATTR_COMMIT: {
-action->ct.commit = true;
+action->ct.commit = true;
+}
+break;
+case OVS_CT_ATTR_FORCE_COMMIT: {
+action->ct.commit = true;
+action->ct.force = true;
 }
 break;
 case OVS_CT_ATTR_ZONE: {

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 02/11] test: Do not use MPLS implicit null label in test cases.

2023-02-01 Thread Eelco Chaudron
TC flower does not allow the push of the implicit null labels (RFC3032).
Avoid the use of such labels in the MPLS test cases.

Signed-off-by: Eelco Chaudron 
Acked-by: Roi Dayan 
---
 tests/system-traffic.at |   10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index c84fecce3..df8459c66 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -1199,7 +1199,6 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([datapath - mpls actions])
-CHECK_NO_TC_OFFLOAD()
 OVS_TRAFFIC_VSWITCHD_START([_ADD_BR([br1])])
 
 ADD_NAMESPACES(at_ns0, at_ns1)
@@ -1216,8 +1215,8 @@ AT_CHECK([ovs-vsctl add-port br0 patch0])
 AT_CHECK([ovs-vsctl add-port br1 patch1])
 
 AT_DATA([flows.txt], [dnl
-table=0,priority=100,dl_type=0x0800 
actions=push_mpls:0x8847,set_mpls_label:3,resubmit(,1)
-table=0,priority=100,dl_type=0x8847,mpls_label=3 
actions=pop_mpls:0x0800,resubmit(,1)
+table=0,priority=100,dl_type=0x0800 
actions=push_mpls:0x8847,set_mpls_label:4,resubmit(,1)
+table=0,priority=100,dl_type=0x8847,mpls_label=4 
actions=pop_mpls:0x0800,resubmit(,1)
 table=0,priority=10 actions=resubmit(,1)
 table=1,priority=10 actions=normal
 ])
@@ -1237,7 +1236,6 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([datapath - multiple mpls label pop])
-CHECK_NO_TC_OFFLOAD()
 OVS_TRAFFIC_VSWITCHD_START([_ADD_BR([br1])])
 
 ADD_NAMESPACES(at_ns0, at_ns1)
@@ -1254,10 +1252,10 @@ AT_CHECK([ovs-vsctl add-port br0 patch0])
 AT_CHECK([ovs-vsctl add-port br1 patch1])
 
 AT_DATA([flows.txt], [dnl
-table=0,priority=100,dl_type=0x0800 
actions=push_mpls:0x8847,set_mpls_label:3,push_mpls:0x8847,set_mpls_label:2,push_mpls:0x8847,set_mpls_label:1,resubmit(,3)
+table=0,priority=100,dl_type=0x0800 
actions=push_mpls:0x8847,set_mpls_label:4,push_mpls:0x8847,set_mpls_label:2,push_mpls:0x8847,set_mpls_label:1,resubmit(,3)
 table=0,priority=100,dl_type=0x8847,mpls_label=1 
actions=pop_mpls:0x8847,resubmit(,1)
 table=1,priority=100,dl_type=0x8847,mpls_label=2 
actions=pop_mpls:0x8847,resubmit(,2)
-table=2,priority=100,dl_type=0x8847,mpls_label=3 
actions=pop_mpls:0x0800,resubmit(,3)
+table=2,priority=100,dl_type=0x8847,mpls_label=4 
actions=pop_mpls:0x0800,resubmit(,3)
 table=0,priority=10 actions=resubmit(,3)
 table=3,priority=10 actions=normal
 ])

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 01/11] tests: Include working system-traffic tests into the system-offloads-testsuite.

2023-02-01 Thread Eelco Chaudron
Include and run the system-traffic.at tests as part of the system offload
testsuite. Exclude all the tests that will not run without any special
modifications.

Signed-off-by: Eelco Chaudron 
Acked-by: Roi Dayan 
---
 tests/automake.mk |3 ++
 tests/ofproto-macros.at   |1 +
 tests/system-offloads-testsuite-macros.at |   36 +
 tests/system-offloads-testsuite.at|3 ++
 tests/system-traffic.at   |   27 ++
 5 files changed, 69 insertions(+), 1 deletion(-)
 create mode 100644 tests/system-offloads-testsuite-macros.at

diff --git a/tests/automake.mk b/tests/automake.mk
index c8de3fe28..86e496a5b 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -186,7 +186,8 @@ SYSTEM_TESTSUITE_AT = \
 SYSTEM_OFFLOADS_TESTSUITE_AT = \
tests/system-common-macros.at \
tests/system-offloads-traffic.at \
-   tests/system-offloads-testsuite.at
+   tests/system-offloads-testsuite.at \
+   tests/system-offloads-testsuite-macros.at
 
 SYSTEM_DPDK_TESTSUITE_AT = \
tests/system-common-macros.at \
diff --git a/tests/ofproto-macros.at b/tests/ofproto-macros.at
index 676d55aa9..690331f35 100644
--- a/tests/ofproto-macros.at
+++ b/tests/ofproto-macros.at
@@ -195,6 +195,7 @@ m4_define([_OVS_VSWITCHD_START],
 /netdev_offload|INFO|netdev: Flow API Enabled/d
 /probe tc:/d
 /setting extended ack support failed/d
+/recirc_id sharing not supported/d
 /tc: Using policy/d']])
 ])
 
diff --git a/tests/system-offloads-testsuite-macros.at 
b/tests/system-offloads-testsuite-macros.at
new file mode 100644
index 0..2129cf7f0
--- /dev/null
+++ b/tests/system-offloads-testsuite-macros.at
@@ -0,0 +1,36 @@
+AT_COPYRIGHT([Copyright (c) 2022 Red Hat, Inc.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at:
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.])
+
+# The goal is to run as many as possible of the system-traffic tests with
+# OVS tc offload enabled. We do this by overriding the
+# OVS_TRAFFIC_VSWITCHD_START() with offloading enabled.
+m4_define([OVS_TRAFFIC_VSWITCHD_START],
+  [AT_CHECK([modprobe openvswitch])
+   on_exit 'modprobe -r openvswitch'
+   m4_foreach([mod], [[vport_geneve], [vport_gre], [vport_lisp], [vport_stt], 
[vport_vxlan]],
+  [modprobe -q mod || echo "Module mod not loaded."
+   on_exit 'modprobe -q -r mod'
+  ])
+   on_exit 'ovs-dpctl del-dp ovs-system'
+   on_exit 'ovs-appctl dpctl/flush-conntrack'
+   _OVS_VSWITCHD_START([], [-- set Open_vSwitch . other_config:hw-offload=true 
$3])
+   dnl Add bridges, ports, etc.
+   AT_CHECK([ovs-vsctl -- _ADD_BR([br0]) -- $1 m4_if([$2], [], [], [| 
uuidfilt])], [0], [$2])
+])
+
+# Macro to exclude tests that will fail with TC offload enabled.
+m4_define([CHECK_NO_TC_OFFLOAD],
+[
+ AT_SKIP_IF([:])
+])
diff --git a/tests/system-offloads-testsuite.at 
b/tests/system-offloads-testsuite.at
index eb5d2d4b3..23637d4f5 100644
--- a/tests/system-offloads-testsuite.at
+++ b/tests/system-offloads-testsuite.at
@@ -23,3 +23,6 @@ m4_include([tests/system-common-macros.at])
 m4_include([tests/system-kmod-macros.at])
 
 m4_include([tests/system-offloads-traffic.at])
+
+m4_include([tests/system-offloads-testsuite-macros.at])
+m4_include([tests/system-traffic.at])
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 470c3f640..c84fecce3 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -1199,6 +1199,7 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([datapath - mpls actions])
+CHECK_NO_TC_OFFLOAD()
 OVS_TRAFFIC_VSWITCHD_START([_ADD_BR([br1])])
 
 ADD_NAMESPACES(at_ns0, at_ns1)
@@ -1236,6 +1237,7 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([datapath - multiple mpls label pop])
+CHECK_NO_TC_OFFLOAD()
 OVS_TRAFFIC_VSWITCHD_START([_ADD_BR([br1])])
 
 ADD_NAMESPACES(at_ns0, at_ns1)
@@ -1638,6 +1640,7 @@ dnl   br-underlay: with IP: 172.31.1.100
 dnl   ns0: connect to br-underlay, with IP: 10.1.1.1
 AT_SETUP([datapath - truncate and output to gre tunnel by simulated packets])
 OVS_CHECK_MIN_KERNEL(3, 10)
+CHECK_NO_TC_OFFLOAD()
 AT_SKIP_IF([test $HAVE_NC = no])
 OVS_TRAFFIC_VSWITCHD_START()
 
@@ -1772,6 +1775,7 @@ AT_SETUP([datapath - truncate and output to gre tunnel])
 AT_SKIP_IF([test $HAVE_NC = no])
 OVS_CHECK_KERNEL_EXCL(3, 10, 4, 15)
 OVS_CHECK_GRE()
+CHECK_NO_TC_OFFLOAD()
 OVS_TRAFFIC_VSWITCHD_START()
 
 ADD_BR([br-underlay])
@@ -2856,6 +2860,7 @@ AT_CLEANUP
 
 AT_SETUP([conntrack - zones from other field, more tests])
 

[ovs-dev] [PATCH 00/11] tests: Add system-traffic.at tests to check-offloads.

2023-02-01 Thread Eelco Chaudron
This series makes it possible to include system-traffic.at tests into
"make check-offloads" tests.

The last patch of the series explains which tests are still not passing
and might need some more work.

I'll try to work on the remaining failing test cases or find someone
who can work on them.

These tests where executed on a Fedora37 machine with the kernel
6.1.5-200.fc37.x86_64 installed.

v9:
  - Exclude "recirc_id sharing not supported" warning from the log.
  - Reworked to use skip based macro rather than skip list.
  - Fixes some spellings.
  - Removed patches for issues no longer existing when using the
latest kernel/OVS.
v8:
  - Re-based on top of latest OVS master.
v7:
  - Removed left over merge comment, and re-run all tests.
v6:
  - Added ACKs from v5
  - Changed 'netdev-offload-tc: If the flow has not been used, report
it as such.' to also work on hardware offloaded flows.
v5:
  - Include all patches, v4 went out with missing two patches :(
v4:
  - Fix rename from system-traffic.at to sym-traffic.at in patch 11
v3:
  - Fixed missing MACRO's in patches 4, 6 and 10.
v2:
  - Fix commit message on last patch
  - Moved handling of system-traffic.at tests to a separate file
system-offloads.at
  - Re-based to the latest ovs master branch
  - Added Roi's ACKs

Eelco Chaudron (11):
  tests: Include working system-traffic tests into the 
system-offloads-testsuite.
  test: Do not use MPLS implicit null label in test cases.
  netdev-offload-tc: Fix tc conntrack force commit support.
  test: Flush datapath when changing rules on the fly.
  netdev-offload-tc: Conntrack ALGs are not supported with tc.
  test: tc does not support conntrack timeout, skip the related test.
  test: Fix 'conntrack - Multiple ICMP traverse' for tc case.
  odp-util: Make odp_flow_key_from_flow__ nlattr order the same as the 
kernel.
  netdev-offload-tc: If the flow has not been used, report it as such.
  tests: Fix reading of OpenFlow byte counters in GRE test cases.
  tests: Comment currently failing TC system-traffic tests.


 Documentation/howto/tc-offload.rst| 11 +++
 lib/netdev-offload-tc.c   | 17 -
 lib/odp-util.c| 21 +++---
 lib/tc.c  | 14 +++-
 tests/automake.mk |  3 +-
 tests/dpif-netdev.at  | 28 
 tests/mcast-snooping.at   |  4 +-
 tests/nsh.at  | 10 +--
 tests/odp.at  | 84 +++
 tests/ofproto-dpif.at | 30 
 tests/ofproto-macros.at   |  1 +
 tests/packet-type-aware.at| 22 +++---
 tests/pmd.at  |  2 +-
 tests/system-offloads-testsuite-macros.at | 69 +++
 tests/system-offloads-testsuite.at|  3 +
 tests/system-traffic.at   | 37 ++
 tests/tunnel-push-pop-ipv6.at |  2 +-
 tests/tunnel-push-pop.at  |  2 +-
 tests/tunnel.at   |  2 +-
 19 files changed, 239 insertions(+), 123 deletions(-)
 create mode 100644 tests/system-offloads-testsuite-macros.at

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] 回复: 回复: [PATCH v2 1/1] net: openvswitch: reduce cpu_used_mask memory

2023-02-01 Thread 陶 缘
Thank you Jakub, for your reminder. i will take care of the comment in the code 
as well in the next submission

eddy

发件人: Jakub Kicinski 
发送时间: 2023年2月2日 5:36
收件人: 陶 缘 
抄送: Jiri Pirko ; net...@vger.kernel.org 
; Pravin B Shelar ; David S. Miller 
; Eric Dumazet ; Paolo Abeni 
; d...@openvswitch.org ; 
linux-ker...@vger.kernel.org 
主题: Re: 回复: [PATCH v2 1/1] net: openvswitch: reduce cpu_used_mask memory

On Wed, 1 Feb 2023 21:35:15 -0800 Jakub Kicinski wrote:
> On Thu, 2 Feb 2023 05:09:51 + 陶 缘 wrote:
> > I guest you are pointing to the field "From: taoyuan_e...@hotmail.com" in 
> > the patch header linked from "Headers show" section in the patch page
> >
> >
> > https://patchwork.kernel.org/project/netdevbpf/patch/os3p286mb22954422e3dd09ff5fd6b091f5...@os3p286mb2295.jpnp286.prod.outlook.com/
> >
> > I will fix that accordingly.
>
> The From is correct, please look at the entire email Jiri also
> commented about the code.
>
> Two more notes:
>  - don't top post on the list
>  - reply in plain text not HTML

One more thing, please make sure you read the process information
for Linux networking:

https://www.kernel.org/doc/html/next/process/maintainer-netdev.html
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] 回复: [PATCH v2 1/1] net: openvswitch: reduce cpu_used_mask memory

2023-02-01 Thread Jakub Kicinski
On Wed, 1 Feb 2023 21:35:15 -0800 Jakub Kicinski wrote:
> On Thu, 2 Feb 2023 05:09:51 + 陶 缘 wrote:
> > I guest you are pointing to the field "From: taoyuan_e...@hotmail.com" in 
> > the patch header linked from "Headers show" section in the patch page
> > 
> > 
> > https://patchwork.kernel.org/project/netdevbpf/patch/os3p286mb22954422e3dd09ff5fd6b091f5...@os3p286mb2295.jpnp286.prod.outlook.com/
> > 
> > I will fix that accordingly.  
> 
> The From is correct, please look at the entire email Jiri also
> commented about the code.
> 
> Two more notes:
>  - don't top post on the list
>  - reply in plain text not HTML

One more thing, please make sure you read the process information 
for Linux networking:

https://www.kernel.org/doc/html/next/process/maintainer-netdev.html
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] 回复: [PATCH v2 1/1] net: openvswitch: reduce cpu_used_mask memory

2023-02-01 Thread Jakub Kicinski
On Thu, 2 Feb 2023 05:09:51 + 陶 缘 wrote:
> I guest you are pointing to the field "From: taoyuan_e...@hotmail.com" in the 
> patch header linked from "Headers show" section in the patch page
> 
> 
> https://patchwork.kernel.org/project/netdevbpf/patch/os3p286mb22954422e3dd09ff5fd6b091f5...@os3p286mb2295.jpnp286.prod.outlook.com/
> 
> I will fix that accordingly.

The From is correct, please look at the entire email Jiri also
commented about the code.

Two more notes:
 - don't top post on the list
 - reply in plain text not HTML
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] 回复: [PATCH v2 1/1] net: openvswitch: reduce cpu_used_mask memory

2023-02-01 Thread 陶 缘
Hi, Jiri:

I guest you are pointing to the field "From: taoyuan_e...@hotmail.com" in the 
patch header linked from "Headers show" section in the patch page


https://patchwork.kernel.org/project/netdevbpf/patch/os3p286mb22954422e3dd09ff5fd6b091f5...@os3p286mb2295.jpnp286.prod.outlook.com/

I will fix that accordingly.

Thanks for your time

eddy


发件人: Jiri Pirko 
发送时间: 2023年2月1日 13:44
收件人: taoyuan_e...@hotmail.com 
抄送: net...@vger.kernel.org ; Pravin B Shelar 
; David S. Miller ; Eric Dumazet 
; Jakub Kicinski ; Paolo Abeni 
; d...@openvswitch.org ; 
linux-ker...@vger.kernel.org 
主题: Re: [PATCH v2 1/1] net: openvswitch: reduce cpu_used_mask memory

Wed, Feb 01, 2023 at 02:24:39PM CET, taoyuan_e...@hotmail.com wrote:
>From: Eddy Tao 
>
>'struct cpumask cpu_used_mask' is embedded in struct sw_flow.
>However, its size is hardcoded to CONFIG_NR_CPUS bits, which can be
>8192 by default, it costs memory and slows down ovs_flow_alloc.
>This fix uses actual CPU number instead
>
>Signed-off-by: Eddy Tao 

Eddy, looks like you missed my second comment to v1. Could you please
check again?

Thanks!
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn] northd.c: Validate port type to avoid unexpected behavior.

2023-02-01 Thread Han Zhou
In ovn_igmp_group_get_ports(), it accesses a union member that should
exist only if the port is a LSP: port->peer->od->mcast_info.rtr.relay.
But in theory it is possible that the "port" is in fact a LRP, because
it is a result of ovn_port_find() with the port name coming from a SB DB
entry. So it is better to validate that first.

Signed-off-by: Han Zhou 
---
 northd/northd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/northd/northd.c b/northd/northd.c
index 0944a7b5673f..2d82eccfeff8 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -4871,7 +4871,7 @@ ovn_igmp_group_get_ports(const struct sbrec_igmp_group 
*sb_igmp_group,
 struct ovn_port *port =
 ovn_port_find(ovn_ports, sb_igmp_group->ports[i]->logical_port);
 
-if (!port) {
+if (!port || !port->nbsp) {
 continue;
 }
 
-- 
2.30.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] ovn-nb.xml: Fix "mcast_querier".

2023-02-01 Thread Han Zhou
On Wed, Feb 1, 2023 at 1:38 AM Dumitru Ceara  wrote:
>
> On 2/1/23 07:25, Han Zhou wrote:
> > Signed-off-by: Han Zhou 
> > ---
>
> Hi Han,
>
> Thanks for the fix!
>
> >  ovn-nb.xml | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/ovn-nb.xml b/ovn-nb.xml
> > index 217eb877b055..929f4c966966 100644
> > --- a/ovn-nb.xml
> > +++ b/ovn-nb.xml
> > @@ -630,7 +630,7 @@
> >  These options control IP Multicast Snooping configuration of
the
> >  logical switch. To enable IP Multicast Snooping set
> >   to true. To
enable IP
> > -Multicast Querier set 
> > +Multicast Querier set 
> >  to true. If IP Multicast Querier is enabled
> >   and
> >   must be set.
>
> This looks good to me so:
>
> Acked-by: Dumitru Ceara 
>
> But I'm wondering if we should also document the defaults:
>
> diff --git a/ovn-nb.xml b/ovn-nb.xml
> index 929f4c9669..4b52b99533 100644
> --- a/ovn-nb.xml
> +++ b/ovn-nb.xml
> @@ -638,10 +638,14 @@
>type='{"type": "boolean"}'>
>  Enables/disables IP Multicast Snooping on the logical switch.
> +Default: false.
>
>type='{"type": "boolean"}'>
>  Enables/disables IP Multicast Querier on the logical switch.
> +Only applicable if 
> +is enabled.
> +Default: true.
>
>type='{"type": "boolean"}'>
> ---
>
> If you agree, do you think it's OK to squash this into your patch
> before applying it?
>
> Thanks,
> Dumitru
>
>
Thanks Dumitru! I folded your change into the patch and applied to main.

Han
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2] net: openvswitch: fix flow memory leak in ovs_flow_cmd_new

2023-02-01 Thread Fedor Pchelkin
Syzkaller reports a memory leak of new_flow in ovs_flow_cmd_new() as it is
not freed when an allocation of a key fails.

BUG: memory leak
unreferenced object 0x888116668000 (size 632):
  comm "syz-executor231", pid 1090, jiffies 4294844701 (age 18.871s)
  hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
  backtrace:
[] kmem_cache_zalloc include/linux/slab.h:654 [inline]
[] ovs_flow_alloc+0x19/0x180 
net/openvswitch/flow_table.c:77
[] ovs_flow_cmd_new+0x1de/0xd40 
net/openvswitch/datapath.c:957
[<10a539a8>] genl_family_rcv_msg_doit+0x22d/0x330 
net/netlink/genetlink.c:739
[] genl_family_rcv_msg net/netlink/genetlink.c:783 
[inline]
[] genl_rcv_msg+0x328/0x590 net/netlink/genetlink.c:800
[<0286dd87>] netlink_rcv_skb+0x153/0x430 
net/netlink/af_netlink.c:2515
[<61fed410>] genl_rcv+0x24/0x40 net/netlink/genetlink.c:811
[<9dc0f111>] netlink_unicast_kernel net/netlink/af_netlink.c:1313 
[inline]
[<9dc0f111>] netlink_unicast+0x545/0x7f0 
net/netlink/af_netlink.c:1339
[<4a5ee816>] netlink_sendmsg+0x8e7/0xde0 
net/netlink/af_netlink.c:1934
[<482b476f>] sock_sendmsg_nosec net/socket.c:651 [inline]
[<482b476f>] sock_sendmsg+0x152/0x190 net/socket.c:671
[<698574ba>] sys_sendmsg+0x70a/0x870 net/socket.c:2356
[] ___sys_sendmsg+0xf3/0x170 net/socket.c:2410
[<83ba9120>] __sys_sendmsg+0xe5/0x1b0 net/socket.c:2439
[] do_syscall_64+0x30/0x40 arch/x86/entry/common.c:46
[<4abfdcf4>] entry_SYSCALL_64_after_hwframe+0x61/0xc6

To fix this the patch rearranges the goto labels to reflect the order of
object allocations and adds appropriate goto statements on the error
paths.

Found by Linux Verification Center (linuxtesting.org) with Syzkaller.

Fixes: 68bb10101e6b ("openvswitch: Fix flow lookup to use unmasked key")
Signed-off-by: Fedor Pchelkin 
Signed-off-by: Alexey Khoroshilov 
---
v1->v2: make goto statements structured

 net/openvswitch/datapath.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index a71795355aec..fcee6012293b 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -1004,14 +1004,14 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct 
genl_info *info)
key = kzalloc(sizeof(*key), GFP_KERNEL);
if (!key) {
error = -ENOMEM;
-   goto err_kfree_key;
+   goto err_kfree_flow;
}
 
ovs_match_init(, key, false, );
error = ovs_nla_get_match(net, , a[OVS_FLOW_ATTR_KEY],
  a[OVS_FLOW_ATTR_MASK], log);
if (error)
-   goto err_kfree_flow;
+   goto err_kfree_key;
 
ovs_flow_mask_key(_flow->key, key, true, );
 
@@ -1019,14 +1019,14 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct 
genl_info *info)
error = ovs_nla_get_identifier(_flow->id, a[OVS_FLOW_ATTR_UFID],
   key, log);
if (error)
-   goto err_kfree_flow;
+   goto err_kfree_key;
 
/* Validate actions. */
error = ovs_nla_copy_actions(net, a[OVS_FLOW_ATTR_ACTIONS],
 _flow->key, , log);
if (error) {
OVS_NLERR(log, "Flow actions may not be safe on all matching 
packets.");
-   goto err_kfree_flow;
+   goto err_kfree_key;
}
 
reply = ovs_flow_cmd_alloc_info(acts, _flow->id, info, false,
@@ -1126,10 +1126,10 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct 
genl_info *info)
kfree_skb(reply);
 err_kfree_acts:
ovs_nla_free_flow_actions(acts);
-err_kfree_flow:
-   ovs_flow_free(new_flow, false);
 err_kfree_key:
kfree(key);
+err_kfree_flow:
+   ovs_flow_free(new_flow, false);
 error:
return error;
 }
-- 
2.30.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] net: openvswitch: fix flow memory leak in ovs_flow_cmd_new

2023-02-01 Thread Simon Horman
On Wed, Feb 01, 2023 at 07:28:09PM +0300, Fedor Pchelkin wrote:
> On 2/1/23 6:45 PM, Simon Horman wrote:
> > I see this would work by virtue of kfree(key) doing nothing
> > of key is NULL, the error case in question. And that otherwise key is
> > non-NULL if this path is hit.
> > 
> > However, the idiomatic approach to error handling is for the error path
> > to unwind resource allocations in the reverse order that they were made.
> > And for goto labels to control how far to unwind.
> > 
> 
> You are right, thanks. Have to keep 'goto' structured, otherwise there
> would be a 'goto' mess.
> 
> > So I think the following would be more in keeping with the intention of the
> > code. Even if it is a somewhat more verbose change.
> > 
> > *compile tested only!*
> 
> I'll test this on error paths and resend the patch.

Thanks, much appreciated.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] sparse: Fix build with DPDK and GCC 12.

2023-02-01 Thread David Marchand
rte_vect.h pulls some AVX512 instrinsics headers added in GCC 12 [1]
trigger a lot of warnings:

libtool: compile:  env "REAL_CC=ccache gcc" "CHECK=sparse -Wsparse-error
-I ../include/sparse -I ../include -m64 -I /usr/local/include
" cgcc -target=x86_64 -target=host_os_specs -D__MMX__=1
-D__MMX_WITH_SSE__=1 -D__SSE2_MATH__=1 -D__SSE_MATH__=1
-D__SSE__=1 -D__SSE2__=1 -DHAVE_CONFIG_H -I. -I.. -I ../include
-I ./include -I ../lib -I ./lib -Wstrict-prototypes -Wall
-Wextra -Wno-sign-compare -Wpointer-arith -Wformat
-Wformat-security -Wswitch-enum -Wunused-parameter
-Wbad-function-cast -Wcast-align -Wstrict-prototypes
-Wold-style-definition -Wmissing-prototypes
-Wmissing-field-initializers -fno-strict-aliasing
-Wswitch-bool -Wlogical-not-parentheses -Wsizeof-array-argument
-Wbool-compare -Wshift-negative-value -Wduplicated-cond -Wshadow
-Wmultistatement-macros -Wcast-align=strict -mssse3
-I/home/dmarchan/git/pub/dpdk.org/22.11/install/include
-include rte_config.h -I/usr/local/include -Werror
-D_FILE_OFFSET_BITS=64 -g -O2 -MT lib/bfd.lo -MD -MP
-MF lib/.deps/bfd.Tpo -c ../lib/bfd.c -o lib/bfd.o
../lib/bfd.c: note: in included file (through
/usr/lib/gcc/x86_64-redhat-linux/12//include/immintrin.h,
/usr/lib/gcc/x86_64-redhat-linux/12//include/x86intrin.h, ...):
/usr/lib/gcc/x86_64-redhat-linux/12//include/avx512fp16intrin.h:38:9:
error: '_Float16' has implicit type
/usr/lib/gcc/x86_64-redhat-linux/12//include/avx512fp16intrin.h:38:18:
error: Expected ; at end of declaration
/usr/lib/gcc/x86_64-redhat-linux/12//include/avx512fp16intrin.h:38:18:
error: got __v8hf
/usr/lib/gcc/x86_64-redhat-linux/12//include/avx512fp16intrin.h:62:41:
error: Expected ; at end of statement
/usr/lib/gcc/x86_64-redhat-linux/12//include/avx512fp16intrin.h:62:41:
error: got {
/usr/lib/gcc/x86_64-redhat-linux/12//include/avx512fp16intrin.h:420:32:
error: Expected ) in expression
/usr/lib/gcc/x86_64-redhat-linux/12//include/avx512fp16intrin.h:420:32:
error: got __A
/usr/lib/gcc/x86_64-redhat-linux/12//include/avx512fp16intrin.h:2271:61:
error: Expected ) in function call
/usr/lib/gcc/x86_64-redhat-linux/12//include/avx512fp16intrin.h:2271:61:
error: got __A
/usr/lib/gcc/x86_64-redhat-linux/12//include/avx512fp16intrin.h:2279:61:
error: Expected ) in function call
/usr/lib/gcc/x86_64-redhat-linux/12//include/avx512fp16intrin.h:2279:61:
error: got __A
/usr/lib/gcc/x86_64-redhat-linux/12//include/avx512fp16intrin.h:2328:50:
error: Expected ) in function call
[...]

Besides, the list of headers by rte_memcpy.h is now out of sync with DPDK.
OVS should not have to care about this in any case: OVS takes care to
include the right headers in its sources, and DPDK now checks that its
exported headers are self-sufficient.

There should be no side effect in removing those headers inclusions from
the rte_memcpy.h fake header.

1: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=a68412117fa4

Signed-off-by: David Marchand 
---
 include/sparse/rte_memcpy.h | 6 --
 1 file changed, 6 deletions(-)

diff --git a/include/sparse/rte_memcpy.h b/include/sparse/rte_memcpy.h
index 5cd3f013ea..6ecc28ae19 100644
--- a/include/sparse/rte_memcpy.h
+++ b/include/sparse/rte_memcpy.h
@@ -20,12 +20,6 @@
 #error "Use this header only with sparse.  It is not a correct implementation."
 #endif
 
-/* Include the same headers as the real rte_memcpy(). */
-#include 
-#include 
-#include 
-#include 
-
 /* Declare the same functions as the real rte_memcpy.h, without defining them.
  * This gives sparse the information it needs without provoking sparse's
  * complaints about the implementations. */
-- 
2.39.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] net: openvswitch: fix flow memory leak in ovs_flow_cmd_new

2023-02-01 Thread Fedor Pchelkin

On 2/1/23 6:45 PM, Simon Horman wrote:

I see this would work by virtue of kfree(key) doing nothing
of key is NULL, the error case in question. And that otherwise key is
non-NULL if this path is hit.

However, the idiomatic approach to error handling is for the error path
to unwind resource allocations in the reverse order that they were made.
And for goto labels to control how far to unwind.



You are right, thanks. Have to keep 'goto' structured, otherwise there
would be a 'goto' mess.


So I think the following would be more in keeping with the intention of the
code. Even if it is a somewhat more verbose change.

*compile tested only!*


I'll test this on error paths and resend the patch.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn] tests: Fix Flaky system-tests "omit connection tracking ..."

2023-02-01 Thread Xavier Simonart
While conntrack entries were flushed between the sub-tests, flows from
previous tests might still be present, causing conntrack entries to be
re-created.

Fixes: a0f82efdd9df ("northd: bypass connection tracking for stateless flows 
when there are LB flows present")

Signed-off-by: Xavier Simonart 
---
 tests/system-ovn.at | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/system-ovn.at b/tests/system-ovn.at
index 2ece0f571..505c7f916 100644
--- a/tests/system-ovn.at
+++ b/tests/system-ovn.at
@@ -9839,7 +9839,7 @@ ovn-nbctl ls-lb-del foo lb1
 
 # add stateless acl
 check ovn-nbctl acl-add foo from-lport 1 1 allow-stateless
-check ovn-nbctl acl-add foo to-lport 1 1 allow-stateless
+check ovn-nbctl --wait=hv acl-add foo to-lport 1 1 allow-stateless
 
 AT_CHECK([ip netns exec foo1 wget   192.168.2.2 -t 3 -T 1], [0], [ignore], 
[ignore])
 
@@ -9984,7 +9984,7 @@ ovn-nbctl ls-lb-del foo lb1
 
 # add stateless acl
 check ovn-nbctl acl-add foo from-lport 1 1 allow-stateless
-check ovn-nbctl acl-add foo to-lport 1 1 allow-stateless
+check ovn-nbctl --wait=hv acl-add foo to-lport 1 1 allow-stateless
 
 AT_CHECK([ip netns exec foo1  wget http://[[fd12::2]] -t 3 -T 1], [0], 
[ignore], [ignore])
 
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] net: openvswitch: fix flow memory leak in ovs_flow_cmd_new

2023-02-01 Thread Simon Horman
On Tue, Jan 31, 2023 at 10:19:39PM +0300, Fedor Pchelkin wrote:
> Syzkaller reports a memory leak of new_flow in ovs_flow_cmd_new() as it is
> not freed when an allocation of a key fails.
> 
> BUG: memory leak
> unreferenced object 0x888116668000 (size 632):
>   comm "syz-executor231", pid 1090, jiffies 4294844701 (age 18.871s)
>   hex dump (first 32 bytes):
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
>   backtrace:
> [] kmem_cache_zalloc include/linux/slab.h:654 [inline]
> [] ovs_flow_alloc+0x19/0x180 
> net/openvswitch/flow_table.c:77
> [] ovs_flow_cmd_new+0x1de/0xd40 
> net/openvswitch/datapath.c:957
> [<10a539a8>] genl_family_rcv_msg_doit+0x22d/0x330 
> net/netlink/genetlink.c:739
> [] genl_family_rcv_msg net/netlink/genetlink.c:783 
> [inline]
> [] genl_rcv_msg+0x328/0x590 net/netlink/genetlink.c:800
> [<0286dd87>] netlink_rcv_skb+0x153/0x430 
> net/netlink/af_netlink.c:2515
> [<61fed410>] genl_rcv+0x24/0x40 net/netlink/genetlink.c:811
> [<9dc0f111>] netlink_unicast_kernel net/netlink/af_netlink.c:1313 
> [inline]
> [<9dc0f111>] netlink_unicast+0x545/0x7f0 
> net/netlink/af_netlink.c:1339
> [<4a5ee816>] netlink_sendmsg+0x8e7/0xde0 
> net/netlink/af_netlink.c:1934
> [<482b476f>] sock_sendmsg_nosec net/socket.c:651 [inline]
> [<482b476f>] sock_sendmsg+0x152/0x190 net/socket.c:671
> [<698574ba>] sys_sendmsg+0x70a/0x870 net/socket.c:2356
> [] ___sys_sendmsg+0xf3/0x170 net/socket.c:2410
> [<83ba9120>] __sys_sendmsg+0xe5/0x1b0 net/socket.c:2439
> [] do_syscall_64+0x30/0x40 arch/x86/entry/common.c:46
> [<4abfdcf4>] entry_SYSCALL_64_after_hwframe+0x61/0xc6
> 
> To fix this the patch removes unnecessary err_kfree_key label and adds a
> proper goto statement on the key-allocation-error path.
> 
> Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
> 
> Fixes: 68bb10101e6b ("openvswitch: Fix flow lookup to use unmasked key")
> Signed-off-by: Fedor Pchelkin 
> Signed-off-by: Alexey Khoroshilov 
> ---
>  net/openvswitch/datapath.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index a71795355aec..3d4b5d83d306 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -1004,7 +1004,7 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct 
> genl_info *info)
>   key = kzalloc(sizeof(*key), GFP_KERNEL);
>   if (!key) {
>   error = -ENOMEM;
> - goto err_kfree_key;
> + goto err_kfree_flow;
>   }
>  
>   ovs_match_init(, key, false, );
> @@ -1128,7 +1128,6 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct 
> genl_info *info)
>   ovs_nla_free_flow_actions(acts);
>  err_kfree_flow:
>   ovs_flow_free(new_flow, false);
> -err_kfree_key:
>   kfree(key);
>  error:
>   return error;

I see this would work by virtue of kfree(key) doing nothing
of key is NULL, the error case in question. And that otherwise key is
non-NULL if this path is hit.

However, the idiomatic approach to error handling is for the error path
to unwind resource allocations in the reverse order that they were made.
And for goto labels to control how far to unwind.

So I think the following would be more in keeping with the intention of the
code. Even if it is a somewhat more verbose change.

*compile tested only!*

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index a71795355aec..fcee6012293b 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -1004,14 +1004,14 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct 
genl_info *info)
key = kzalloc(sizeof(*key), GFP_KERNEL);
if (!key) {
error = -ENOMEM;
-   goto err_kfree_key;
+   goto err_kfree_flow;
}
 
ovs_match_init(, key, false, );
error = ovs_nla_get_match(net, , a[OVS_FLOW_ATTR_KEY],
  a[OVS_FLOW_ATTR_MASK], log);
if (error)
-   goto err_kfree_flow;
+   goto err_kfree_key;
 
ovs_flow_mask_key(_flow->key, key, true, );
 
@@ -1019,14 +1019,14 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct 
genl_info *info)
error = ovs_nla_get_identifier(_flow->id, a[OVS_FLOW_ATTR_UFID],
   key, log);
if (error)
-   goto err_kfree_flow;
+   goto err_kfree_key;
 
/* Validate actions. */
error = ovs_nla_copy_actions(net, a[OVS_FLOW_ATTR_ACTIONS],
 _flow->key, , log);
if (error) {
 

[ovs-dev] 回复: [PATCH v2 1/1] net: openvswitch: reduce cpu_used_mask memory

2023-02-01 Thread 陶 缘
Hi, Jiri:

  
I looked into v2 content, both "From" field and "Signed-off-by" field are 
'Eddy Tao' now, in the CC list, 'Eddy Tao' was also
placed automatically/correctly

>>Hmm, I guess that the name should be rather "Dddy Taoyuan" ? Please fix, ==> 
>>this is point 1 I guess and it has been fixed in v2
>>also in your "From:" header and preferably email client. ===> this is point 2 
>>==> i think "git config --global user.name="Eddy Tao" also worked for this 
>>point

More about 'email client', smtp-mail.outlook.com is the smtp server and git 
send-email directs the patch file to to the recipients.
I did not have further email client settings regarding to name spelling.

Could you clarify more about the 'email client' part?

Thanks
eddy


发件人: Jiri Pirko 
发送时间: 2023年2月1日 13:44
收件人: taoyuan_e...@hotmail.com 
抄送: net...@vger.kernel.org ; Pravin B Shelar 
; David S. Miller ; Eric Dumazet 
; Jakub Kicinski ; Paolo Abeni 
; d...@openvswitch.org ; 
linux-ker...@vger.kernel.org 
主题: Re: [PATCH v2 1/1] net: openvswitch: reduce cpu_used_mask memory

Wed, Feb 01, 2023 at 02:24:39PM CET, taoyuan_e...@hotmail.com wrote:
>From: Eddy Tao 
>
>'struct cpumask cpu_used_mask' is embedded in struct sw_flow.
>However, its size is hardcoded to CONFIG_NR_CPUS bits, which can be
>8192 by default, it costs memory and slows down ovs_flow_alloc.
>This fix uses actual CPU number instead
>
>Signed-off-by: Eddy Tao 

Eddy, looks like you missed my second comment to v1. Could you please
check again?

Thanks!
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2] ovsdb-server: Fix handling of DNS name for listener configuration.

2023-02-01 Thread Frode Nordahl
Commit 08e9e5337383 fixed proper initialization of the dns-resolve
module, and made DNS resolution asynchronous.

A side effect of that change revealed a long standing logic bug
which broke ovsdb-server listener configuration using DNS names.

Previously this worked because the DNS resolution would block,
now that DNS resolution is asynchronous the code before this
change would assume the error from jsonrpc_pstream_open meant
the listener was in fact a specification for an active
outgoing connection, even when that was not the case.

To fix this a couple of changes was made to socket-util:
1) Pass optional result of dns resolution from inet_parse_passive.

When (re-)configuring listeners that use DNS names, we may need
to know whether the provided connection string is invalid or if
the provided DNS name has finished resolving.

2) Check dns resolution status in inet_open_passive.

If the connection string is valid, and contains a DNS name,
inet_open_passive will now return -ENODATA if dns resolution
failed.  DNS resolution failure may either mean the asynchronous
resolver has not completed yet, or that the name does not resolve.

Reported-at: https://bugs.launchpad.net/bugs/1998781
Fixes: 08e9e5337383 ("ovsdb: raft: Fix inability to read the database with DNS 
host names.")
Fixes: 771680d96fb6 ("DNS: Add basic support for asynchronous DNS resolving")
Signed-off-by: Frode Nordahl 
---
 lib/socket-util.c  | 13 ++---
 lib/socket-util.h  |  3 ++-
 ovsdb/jsonrpc-server.c | 43 ++
 3 files changed, 39 insertions(+), 20 deletions(-)

diff --git a/lib/socket-util.c b/lib/socket-util.c
index 38705cc51..eba350ccb 100644
--- a/lib/socket-util.c
+++ b/lib/socket-util.c
@@ -660,7 +660,8 @@ exit:
  * zeros '*ss' and returns false. */
 bool
 inet_parse_passive(const char *target_, int default_port,
-   struct sockaddr_storage *ss)
+   struct sockaddr_storage *ss,
+   bool resolve_host, bool *dns_failure)
 {
 char *target = xstrdup(target_);
 char *port, *host;
@@ -672,7 +673,7 @@ inet_parse_passive(const char *target_, int default_port,
 ok = false;
 } else {
 ok = parse_sockaddr_components(ss, host, port, default_port,
-   target_, true, NULL);
+   target_, resolve_host, dns_failure);
 }
 if (!ok) {
 memset(ss, 0, sizeof *ss);
@@ -710,8 +711,14 @@ inet_open_passive(int style, const char *target, int 
default_port,
 struct sockaddr_storage ss;
 int fd = 0, error;
 unsigned int yes = 1;
+bool dns_failure;
 
-if (!inet_parse_passive(target, default_port, )) {
+if (!inet_parse_passive(target, default_port, , true, _failure)) {
+if (dns_failure) {
+/* DNS failure means asynchronous DNS resolution is in progress,
+ * or that the name does currently not resolve. */
+return -ENODATA;
+}
 return -EAFNOSUPPORT;
 }
 kernel_chooses_port = ss_get_port() == 0;
diff --git a/lib/socket-util.h b/lib/socket-util.h
index bf66393df..4eec627e3 100644
--- a/lib/socket-util.h
+++ b/lib/socket-util.h
@@ -55,7 +55,8 @@ int inet_open_active(int style, const char *target, int 
default_port,
  struct sockaddr_storage *ssp, int *fdp, uint8_t dscp);
 
 bool inet_parse_passive(const char *target, int default_port,
-struct sockaddr_storage *ssp);
+struct sockaddr_storage *ssp,
+bool resolve_host, bool *dns_failure);
 int inet_open_passive(int style, const char *target, int default_port,
   struct sockaddr_storage *ssp, uint8_t dscp,
   bool kernel_print_port);
diff --git a/ovsdb/jsonrpc-server.c b/ovsdb/jsonrpc-server.c
index 916a1f414..b4eb17467 100644
--- a/ovsdb/jsonrpc-server.c
+++ b/ovsdb/jsonrpc-server.c
@@ -267,25 +267,36 @@ ovsdb_jsonrpc_server_add_remote(struct 
ovsdb_jsonrpc_server *svr,
 int error;
 
 error = jsonrpc_pstream_open(name, , options->dscp);
-if (error && error != EAFNOSUPPORT) {
-VLOG_ERR_RL(, "%s: listen failed: %s", name, ovs_strerror(error));
-return NULL;
-}
+switch (error) {
+case 0:
+case EAFNOSUPPORT:
+remote = xmalloc(sizeof *remote);
+remote->server = svr;
+remote->listener = listener;
+ovs_list_init(>sessions);
+remote->dscp = options->dscp;
+remote->read_only = options->read_only;
+remote->role = nullable_xstrdup(options->role);
+shash_add(>remotes, name, remote);
+if (!listener) {
+/* Not a listener, attempt creation of active jsonrpc sesssion. */
+ovsdb_jsonrpc_session_create(remote,
+ jsonrpc_session_open(name, true),
+ svr->read_only || remote->read_only);
+ 

[ovs-dev] [PATCH] net: openvswitch: fix flow memory leak in ovs_flow_cmd_new

2023-02-01 Thread Fedor Pchelkin
Syzkaller reports a memory leak of new_flow in ovs_flow_cmd_new() as it is
not freed when an allocation of a key fails.

BUG: memory leak
unreferenced object 0x888116668000 (size 632):
  comm "syz-executor231", pid 1090, jiffies 4294844701 (age 18.871s)
  hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
  backtrace:
[] kmem_cache_zalloc include/linux/slab.h:654 [inline]
[] ovs_flow_alloc+0x19/0x180 
net/openvswitch/flow_table.c:77
[] ovs_flow_cmd_new+0x1de/0xd40 
net/openvswitch/datapath.c:957
[<10a539a8>] genl_family_rcv_msg_doit+0x22d/0x330 
net/netlink/genetlink.c:739
[] genl_family_rcv_msg net/netlink/genetlink.c:783 
[inline]
[] genl_rcv_msg+0x328/0x590 net/netlink/genetlink.c:800
[<0286dd87>] netlink_rcv_skb+0x153/0x430 
net/netlink/af_netlink.c:2515
[<61fed410>] genl_rcv+0x24/0x40 net/netlink/genetlink.c:811
[<9dc0f111>] netlink_unicast_kernel net/netlink/af_netlink.c:1313 
[inline]
[<9dc0f111>] netlink_unicast+0x545/0x7f0 
net/netlink/af_netlink.c:1339
[<4a5ee816>] netlink_sendmsg+0x8e7/0xde0 
net/netlink/af_netlink.c:1934
[<482b476f>] sock_sendmsg_nosec net/socket.c:651 [inline]
[<482b476f>] sock_sendmsg+0x152/0x190 net/socket.c:671
[<698574ba>] sys_sendmsg+0x70a/0x870 net/socket.c:2356
[] ___sys_sendmsg+0xf3/0x170 net/socket.c:2410
[<83ba9120>] __sys_sendmsg+0xe5/0x1b0 net/socket.c:2439
[] do_syscall_64+0x30/0x40 arch/x86/entry/common.c:46
[<4abfdcf4>] entry_SYSCALL_64_after_hwframe+0x61/0xc6

To fix this the patch removes unnecessary err_kfree_key label and adds a
proper goto statement on the key-allocation-error path.

Found by Linux Verification Center (linuxtesting.org) with Syzkaller.

Fixes: 68bb10101e6b ("openvswitch: Fix flow lookup to use unmasked key")
Signed-off-by: Fedor Pchelkin 
Signed-off-by: Alexey Khoroshilov 
---
 net/openvswitch/datapath.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index a71795355aec..3d4b5d83d306 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -1004,7 +1004,7 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct 
genl_info *info)
key = kzalloc(sizeof(*key), GFP_KERNEL);
if (!key) {
error = -ENOMEM;
-   goto err_kfree_key;
+   goto err_kfree_flow;
}
 
ovs_match_init(, key, false, );
@@ -1128,7 +1128,6 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct 
genl_info *info)
ovs_nla_free_flow_actions(acts);
 err_kfree_flow:
ovs_flow_free(new_flow, false);
-err_kfree_key:
kfree(key);
 error:
return error;
-- 
2.30.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 1/1] net: openvswitch: reduce cpu_used_mask memory

2023-02-01 Thread Jiri Pirko
Wed, Feb 01, 2023 at 02:24:39PM CET, taoyuan_e...@hotmail.com wrote:
>From: Eddy Tao 
>
>'struct cpumask cpu_used_mask' is embedded in struct sw_flow.
>However, its size is hardcoded to CONFIG_NR_CPUS bits, which can be
>8192 by default, it costs memory and slows down ovs_flow_alloc.
>This fix uses actual CPU number instead
>
>Signed-off-by: Eddy Tao 

Eddy, looks like you missed my second comment to v1. Could you please
check again?

Thanks!
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 1/1] net: openvswitch: reduce cpu_used_mask memory

2023-02-01 Thread taoyuan_eddy
From: Eddy Tao 

'struct cpumask cpu_used_mask' is embedded in struct sw_flow.
However, its size is hardcoded to CONFIG_NR_CPUS bits, which can be
8192 by default, it costs memory and slows down ovs_flow_alloc.
This fix uses actual CPU number instead

Signed-off-by: Eddy Tao 
---
 net/openvswitch/flow.c   |  6 +++---
 net/openvswitch/flow.h   |  2 +-
 net/openvswitch/flow_table.c | 24 +---
 3 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index e20d1a973417..06345cd8c777 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -107,7 +107,7 @@ void ovs_flow_stats_update(struct sw_flow *flow, __be16 
tcp_flags,
 
rcu_assign_pointer(flow->stats[cpu],
   new_stats);
-   cpumask_set_cpu(cpu, 
>cpu_used_mask);
+   cpumask_set_cpu(cpu, 
flow->cpu_used_mask);
goto unlock;
}
}
@@ -135,7 +135,7 @@ void ovs_flow_stats_get(const struct sw_flow *flow,
memset(ovs_stats, 0, sizeof(*ovs_stats));
 
/* We open code this to make sure cpu 0 is always considered */
-   for (cpu = 0; cpu < nr_cpu_ids; cpu = cpumask_next(cpu, 
>cpu_used_mask)) {
+   for (cpu = 0; cpu < nr_cpu_ids; cpu = cpumask_next(cpu, 
flow->cpu_used_mask)) {
struct sw_flow_stats *stats = 
rcu_dereference_ovsl(flow->stats[cpu]);
 
if (stats) {
@@ -159,7 +159,7 @@ void ovs_flow_stats_clear(struct sw_flow *flow)
int cpu;
 
/* We open code this to make sure cpu 0 is always considered */
-   for (cpu = 0; cpu < nr_cpu_ids; cpu = cpumask_next(cpu, 
>cpu_used_mask)) {
+   for (cpu = 0; cpu < nr_cpu_ids; cpu = cpumask_next(cpu, 
flow->cpu_used_mask)) {
struct sw_flow_stats *stats = 
ovsl_dereference(flow->stats[cpu]);
 
if (stats) {
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index 073ab73ffeaa..b5711aff6e76 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -229,7 +229,7 @@ struct sw_flow {
 */
struct sw_flow_key key;
struct sw_flow_id id;
-   struct cpumask cpu_used_mask;
+   struct cpumask *cpu_used_mask;
struct sw_flow_mask *mask;
struct sw_flow_actions __rcu *sf_acts;
struct sw_flow_stats __rcu *stats[]; /* One for each CPU.  First one
diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index 0a0e4c283f02..63c95c9a814d 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -87,11 +87,12 @@ struct sw_flow *ovs_flow_alloc(void)
if (!stats)
goto err;
 
+   flow->cpu_used_mask = (struct cpumask *)>stats[nr_cpu_ids];
spin_lock_init(>lock);
 
RCU_INIT_POINTER(flow->stats[0], stats);
 
-   cpumask_set_cpu(0, >cpu_used_mask);
+   cpumask_set_cpu(0, flow->cpu_used_mask);
 
return flow;
 err:
@@ -115,7 +116,7 @@ static void flow_free(struct sw_flow *flow)
  flow->sf_acts);
/* We open code this to make sure cpu 0 is always considered */
for (cpu = 0; cpu < nr_cpu_ids;
-cpu = cpumask_next(cpu, >cpu_used_mask)) {
+cpu = cpumask_next(cpu, flow->cpu_used_mask)) {
if (flow->stats[cpu])
kmem_cache_free(flow_stats_cache,
(struct sw_flow_stats __force 
*)flow->stats[cpu]);
@@ -1194,9 +1195,26 @@ int ovs_flow_init(void)
BUILD_BUG_ON(__alignof__(struct sw_flow_key) % __alignof__(long));
BUILD_BUG_ON(sizeof(struct sw_flow_key) % sizeof(long));
 
+   /* Simply embedding 'struct cpumask' in 'struct sw_flow'
+* consumes memory unnecessarily large. Cpumask is an bitmap
+* of CONFIG_NR_CPUS bits, which is hardcoded in .config
+* and default value can be 8192, in this case is 1024 bytes.
+* It drops ovs_flow_alloc performance and cost memory.
+* We should use actual CPU count instead of hardcoded value.
+*
+* To address this, 'cpu_used_mask' is redefined to pointer
+* and append a cpumask_size() after 'stat' to hold the memory
+* for struct cpumask.
+*
+* cpumask APIs like cpumask_next and cpumask_set_cpu are defined
+* to never access bits beyond cpu count, as such above change is
+* safe even though the actual memory provided is smaller than
+* sizeof(struct cpumask)
+*/
flow_cache = kmem_cache_create("sw_flow", sizeof(struct sw_flow)
   + (nr_cpu_ids
- * sizeof(struct sw_flow_stats *)),
+

Re: [ovs-dev] [PATCH] net: openvswitch: reduce cpu_used_mask memory consumption

2023-02-01 Thread Jiri Pirko
Tue, Jan 31, 2023 at 02:58:22PM CET, taoyuan_e...@hotmail.com wrote:
>From: eddytaoyuan 
>
>struct cpumask cpu_used_mask is directly embedded in struct sw_flow
>however, its size is hardcoded to CONFIG_NR_CPUS bits, which
>can be as large as 8192 by default, it cost memory and slows down
>ovs_flow_alloc, this fix used actual CPU number instead
>
>Signed-off-by: eddytaoyuan 

Hmm, I guess that the name should be rather "Dddy Taoyuan" ? Please fix,
also in your "From:" header and preferably email client.


>---
> net/openvswitch/flow.c   |  6 +++---
> net/openvswitch/flow.h   |  2 +-
> net/openvswitch/flow_table.c | 25 ++---
> 3 files changed, 26 insertions(+), 7 deletions(-)
>
>diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
>index e20d1a973417..06345cd8c777 100644
>--- a/net/openvswitch/flow.c
>+++ b/net/openvswitch/flow.c
>@@ -107,7 +107,7 @@ void ovs_flow_stats_update(struct sw_flow *flow, __be16 
>tcp_flags,
> 
>   rcu_assign_pointer(flow->stats[cpu],
>  new_stats);
>-  cpumask_set_cpu(cpu, 
>>cpu_used_mask);
>+  cpumask_set_cpu(cpu, 
>flow->cpu_used_mask);
>   goto unlock;
>   }
>   }
>@@ -135,7 +135,7 @@ void ovs_flow_stats_get(const struct sw_flow *flow,
>   memset(ovs_stats, 0, sizeof(*ovs_stats));
> 
>   /* We open code this to make sure cpu 0 is always considered */
>-  for (cpu = 0; cpu < nr_cpu_ids; cpu = cpumask_next(cpu, 
>>cpu_used_mask)) {
>+  for (cpu = 0; cpu < nr_cpu_ids; cpu = cpumask_next(cpu, 
>flow->cpu_used_mask)) {
>   struct sw_flow_stats *stats = 
> rcu_dereference_ovsl(flow->stats[cpu]);
> 
>   if (stats) {
>@@ -159,7 +159,7 @@ void ovs_flow_stats_clear(struct sw_flow *flow)
>   int cpu;
> 
>   /* We open code this to make sure cpu 0 is always considered */
>-  for (cpu = 0; cpu < nr_cpu_ids; cpu = cpumask_next(cpu, 
>>cpu_used_mask)) {
>+  for (cpu = 0; cpu < nr_cpu_ids; cpu = cpumask_next(cpu, 
>flow->cpu_used_mask)) {
>   struct sw_flow_stats *stats = 
> ovsl_dereference(flow->stats[cpu]);
> 
>   if (stats) {
>diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
>index 073ab73ffeaa..b5711aff6e76 100644
>--- a/net/openvswitch/flow.h
>+++ b/net/openvswitch/flow.h
>@@ -229,7 +229,7 @@ struct sw_flow {
>*/
>   struct sw_flow_key key;
>   struct sw_flow_id id;
>-  struct cpumask cpu_used_mask;
>+  struct cpumask *cpu_used_mask;
>   struct sw_flow_mask *mask;
>   struct sw_flow_actions __rcu *sf_acts;
>   struct sw_flow_stats __rcu *stats[]; /* One for each CPU.  First one
>diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
>index 0a0e4c283f02..c0fdff73272f 100644
>--- a/net/openvswitch/flow_table.c
>+++ b/net/openvswitch/flow_table.c
>@@ -87,11 +87,12 @@ struct sw_flow *ovs_flow_alloc(void)
>   if (!stats)
>   goto err;
> 
>+  flow->cpu_used_mask = (struct cpumask *)&(flow->stats[nr_cpu_ids]);
>   spin_lock_init(>lock);
> 
>   RCU_INIT_POINTER(flow->stats[0], stats);
> 
>-  cpumask_set_cpu(0, >cpu_used_mask);
>+  cpumask_set_cpu(0, flow->cpu_used_mask);
> 
>   return flow;
> err:
>@@ -115,7 +116,7 @@ static void flow_free(struct sw_flow *flow)
> flow->sf_acts);
>   /* We open code this to make sure cpu 0 is always considered */
>   for (cpu = 0; cpu < nr_cpu_ids;
>-   cpu = cpumask_next(cpu, >cpu_used_mask)) {
>+   cpu = cpumask_next(cpu, flow->cpu_used_mask)) {
>   if (flow->stats[cpu])
>   kmem_cache_free(flow_stats_cache,
>   (struct sw_flow_stats __force 
> *)flow->stats[cpu]);
>@@ -1194,9 +1195,27 @@ int ovs_flow_init(void)
>   BUILD_BUG_ON(__alignof__(struct sw_flow_key) % __alignof__(long));
>   BUILD_BUG_ON(sizeof(struct sw_flow_key) % sizeof(long));
> 
>+/*
>+ * Simply including 'struct cpumask' in 'struct sw_flow'
>+ * consumes memory unnecessarily large.
>+ * The reason is that compilation option CONFIG_NR_CPUS(default 8192)
>+ * decides the value of NR_CPUS, which in turn decides size of
>+ * 'struct cpumask', which means 1024 bytes are needed for the cpumask
>+ * It affects ovs_flow_alloc performance as well as memory footprint.
>+ * We should use actual CPU count instead of hardcoded value.
>+ *
>+ * To address this, 'cpu_used_mask' is redefined to pointer
>+ * and append a cpumask_size() after 'stat' to hold the actual memory
>+ * of struct cpumask
>+ *
>+ * cpumask APIs like cpumask_next and cpumask_set_cpu have been 

[ovs-dev] [PATCH v4] netdev-offload-tc: Preserve tc statistics when flow gets modified.

2023-02-01 Thread Eelco Chaudron
When a flow gets modified, i.e. the actions are changes, the tc layer will
remove, and re-add the flow. This is causing all the counters to be reset.

This patch will remember the previous tc counters and adjust any requests
for statistics. This is done in a similar way as the rte_flow implementation.

It also updates the check_pkt_len tc test to purge the flows, so we do
not use existing updated tc flow counters, but start with fresh installed
set of datapath flows.

Signed-off-by: Eelco Chaudron 
---
Please note that for now two copies of the test case exists, but I will clean
this up once this gets applied by submitting a new revision of the
'tests: Add system-traffic.at tests to check-offloads' series.

-v2: Do not update the stats->used, as in terse dump they should be 0.
-v3: Added some comments based on the v2 review.
-v4: Re-based on latest upstream master.
 Updated commit message.
 Updated system-traffic test name.
 Fixed byte counter usage in check_pkt_lent test cases.
 Synced both new tests to use the same byte counter approach.

 lib/netdev-offload-tc.c  |   99 --
 lib/tc.h |1 
 tests/system-offloads-traffic.at |   78 +++---
 tests/system-traffic.at  |   64 +
 4 files changed, 219 insertions(+), 23 deletions(-)

diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
index 6e1bbaa28..134c24157 100644
--- a/lib/netdev-offload-tc.c
+++ b/lib/netdev-offload-tc.c
@@ -97,6 +97,12 @@ static int netdev_tc_parse_nl_actions(struct netdev *netdev,
   bool *recirc_act, bool more_actions,
   struct tc_action **need_jump_update);
 
+static void parse_tc_flower_to_stats(struct tc_flower *flower,
+ struct dpif_flow_stats *stats);
+
+static int get_ufid_adjust_stats(const ovs_u128 *ufid,
+ struct dpif_flow_stats *stats);
+
 static bool
 is_internal_port(const char *type)
 {
@@ -193,6 +199,9 @@ static struct ovs_mutex ufid_lock = OVS_MUTEX_INITIALIZER;
  * @ufid: ufid assigned to the flow
  * @id: tc filter id (tcf_id)
  * @netdev: netdev associated with the tc rule
+ * @adjust_stats: When flow gets updated with new actions, we need to adjust
+ *the reported stats to include previous values as the hardware
+ *rule is removed and re-added. This stats copy is used for it.
  */
 struct ufid_tc_data {
 struct hmap_node ufid_to_tc_node;
@@ -200,6 +209,7 @@ struct ufid_tc_data {
 ovs_u128 ufid;
 struct tcf_id id;
 struct netdev *netdev;
+struct dpif_flow_stats adjust_stats;
 };
 
 static void
@@ -233,12 +243,38 @@ del_ufid_tc_mapping(const ovs_u128 *ufid)
 ovs_mutex_unlock(_lock);
 }
 
+static void
+netdev_tc_adjust_stats(struct dpif_flow_stats *stats,
+   const struct dpif_flow_stats *adjust_stats)
+{
+/* Do not try to restore the stats->used, as in terse mode dumps TC doesn't
+ * report TCA_ACT_OPTIONS, so the 'lastused' value is not available, hence
+ * we report used as 0.
+ * tcp_flags is not collected by tc, so no need to update it. */
+stats->n_bytes += adjust_stats->n_bytes;
+stats->n_packets += adjust_stats->n_packets;
+}
+
 /* Wrapper function to delete filter and ufid tc mapping */
 static int
-del_filter_and_ufid_mapping(struct tcf_id *id, const ovs_u128 *ufid)
+del_filter_and_ufid_mapping(struct tcf_id *id, const ovs_u128 *ufid,
+struct dpif_flow_stats *stats)
 {
+struct tc_flower flower;
 int err;
 
+if (stats) {
+memset(stats, 0, sizeof *stats);
+if (!tc_get_flower(id, )) {
+struct dpif_flow_stats adjust_stats;
+
+parse_tc_flower_to_stats(, stats);
+if (!get_ufid_adjust_stats(ufid, _stats)) {
+netdev_tc_adjust_stats(stats, _stats);
+}
+}
+}
+
 err = tc_del_flower_filter(id);
 if (!err) {
 del_ufid_tc_mapping(ufid);
@@ -249,7 +285,7 @@ del_filter_and_ufid_mapping(struct tcf_id *id, const 
ovs_u128 *ufid)
 /* Add ufid entry to ufid_to_tc hashmap. */
 static void
 add_ufid_tc_mapping(struct netdev *netdev, const ovs_u128 *ufid,
-struct tcf_id *id)
+struct tcf_id *id, struct dpif_flow_stats *stats)
 {
 struct ufid_tc_data *new_data = xzalloc(sizeof *new_data);
 size_t ufid_hash = hash_bytes(ufid, sizeof *ufid, 0);
@@ -261,6 +297,9 @@ add_ufid_tc_mapping(struct netdev *netdev, const ovs_u128 
*ufid,
 new_data->ufid = *ufid;
 new_data->id = *id;
 new_data->netdev = netdev_ref(netdev);
+if (stats) {
+new_data->adjust_stats = *stats;
+}
 
 ovs_mutex_lock(_lock);
 hmap_insert(_to_tc, _data->ufid_to_tc_node, ufid_hash);
@@ -292,6 +331,30 @@ get_ufid_tc_mapping(const ovs_u128 *ufid, struct tcf_id 

Re: [ovs-dev] [PATCH v3] netdev-offload-tc: Preserve tc statistics when flow gets modified.

2023-02-01 Thread Eelco Chaudron



On 1 Feb 2023, at 12:02, Ilya Maximets wrote:

> On 1/31/23 15:38, Eelco Chaudron wrote:
>>
>>
>> On 31 Jan 2023, at 14:13, Ilya Maximets wrote:
>>
>>> On 1/13/23 13:57, Eelco Chaudron wrote:
 When a flow gets modified, i.e. the actions are changes, the tc layer will
 remove, and re-add the flow. This is causing all the counters to be reset.

 This patch will remember the previous tc counters and adjust any requests
 for statistics. This is done in a similar way as the rte_flow 
 implementation.

 It also updates the check_pkt_len tc test to purge the flows, so we do
 not use updated tc flows, with their counters.
>>>
>>> Could you clarify what exactly you mean here?
>>
>> I hope this is more clear:
>>
>> It also updates the check_pkt_len tc test to purge the flows, so we do
>> not use existing updated tc flow counters, but start with fresh installed
>> set of datapath flows.
>>

 Signed-off-by: Eelco Chaudron 

 ---
 -v2: Do not update the stats->used, as in terse dump they should be 0.
 -v3: Added some comments based on the v2 review.

 Please note that for now two copies of the test case exists, but I will 
 clean
 this up once this gets applied by submitting a new revision of the
 'tests: Add system-traffic.at tests to check-offloads' series.

  lib/netdev-offload-tc.c  |   98 
 --
  lib/tc.h |1
  tests/system-offloads-traffic.at |   65 +++--
  tests/system-traffic.at  |   64 +
  4 files changed, 207 insertions(+), 21 deletions(-)

 diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
 index ce7f8ad97..59c113187 100644
 --- a/lib/netdev-offload-tc.c
 +++ b/lib/netdev-offload-tc.c
 @@ -97,6 +97,12 @@ static int netdev_tc_parse_nl_actions(struct netdev 
 *netdev,
bool *recirc_act, bool more_actions,
struct tc_action 
 **need_jump_update);

 +static void parse_tc_flower_to_stats(struct tc_flower *flower,
 + struct dpif_flow_stats *stats);
 +
 +static int get_ufid_adjust_stats(const ovs_u128 *ufid,
 + struct dpif_flow_stats *stats);
>>>
>>> No need to specify parameter names in prototypes.
>>
>> I know,  but all forward declarations in this c file have them included, so 
>> I decided to do the same.
>> So I kept them for now, but let me know if you think they should be removed.
>>
 +
  static bool
  is_internal_port(const char *type)
  {
 @@ -193,6 +199,9 @@ static struct ovs_mutex ufid_lock = 
 OVS_MUTEX_INITIALIZER;
   * @ufid: ufid assigned to the flow
   * @id: tc filter id (tcf_id)
   * @netdev: netdev associated with the tc rule
 + * @adjust_stats: When flow gets updated with new actions, we need to 
 adjust
 + *the reported stats to include previous values as the 
 hardware
 + *rule is removed and re-added. This stats copy is used 
 for it.
   */
  struct ufid_tc_data {
  struct hmap_node ufid_to_tc_node;
 @@ -200,6 +209,7 @@ struct ufid_tc_data {
  ovs_u128 ufid;
  struct tcf_id id;
  struct netdev *netdev;
 +struct dpif_flow_stats adjust_stats;
  };

  static void
 @@ -233,12 +243,37 @@ del_ufid_tc_mapping(const ovs_u128 *ufid)
  ovs_mutex_unlock(_lock);
  }

 +static void
 +netdev_tc_adjust_stats(struct dpif_flow_stats *stats,
 +   struct dpif_flow_stats *adjust_stats)
>>>
>>> The adjust_stats pointer could be const.
>>
>> Will do in next rev.
>>
 +{
 +/* Do not try to restore the stats->used, as in terse
 + * mode dumps we will always report them as 0.
>>>
>>> I'm not sure I understand that part.  Why we will report them
>>> as zero?
>>
>> If we do a terse dump, we will not process the TLV holding this value, and 
>> we will return 0.
>> The revalidator code will override/update this value in udpif_update_used().
>>
>>
>>> Not restoring the 'used' might not be a problem for a general
>>> case, because freshly gathered stats will have more up to date
>>> 'used' value, but that might be a problem for the 'never' case
>>> where updated fow was not used after modification but it was
>>> used before.
>>
>> For the revalidator use case it should always return 0, so it will take care 
>> of it with udpif_update_used(). If we return the last used value retriefd 
>> during the tc_get_flower() in del_filter_and_ufid_mapping(). It causes the 
>> revalidator process to fail, as it will always report the wrong/old value 
>> (as it would be larger than 0, which was what I had before).
>
> Hrm, I see.
>
> Maybe re-word the comment to 

Re: [ovs-dev] [PATCH v3] netdev-offload-tc: Preserve tc statistics when flow gets modified.

2023-02-01 Thread Ilya Maximets
On 1/31/23 15:38, Eelco Chaudron wrote:
> 
> 
> On 31 Jan 2023, at 14:13, Ilya Maximets wrote:
> 
>> On 1/13/23 13:57, Eelco Chaudron wrote:
>>> When a flow gets modified, i.e. the actions are changes, the tc layer will
>>> remove, and re-add the flow. This is causing all the counters to be reset.
>>>
>>> This patch will remember the previous tc counters and adjust any requests
>>> for statistics. This is done in a similar way as the rte_flow 
>>> implementation.
>>>
>>> It also updates the check_pkt_len tc test to purge the flows, so we do
>>> not use updated tc flows, with their counters.
>>
>> Could you clarify what exactly you mean here?
> 
> I hope this is more clear:
> 
> It also updates the check_pkt_len tc test to purge the flows, so we do
> not use existing updated tc flow counters, but start with fresh installed
> set of datapath flows.
> 
>>>
>>> Signed-off-by: Eelco Chaudron 
>>>
>>> ---
>>> -v2: Do not update the stats->used, as in terse dump they should be 0.
>>> -v3: Added some comments based on the v2 review.
>>>
>>> Please note that for now two copies of the test case exists, but I will 
>>> clean
>>> this up once this gets applied by submitting a new revision of the
>>> 'tests: Add system-traffic.at tests to check-offloads' series.
>>>
>>>  lib/netdev-offload-tc.c  |   98 
>>> --
>>>  lib/tc.h |1
>>>  tests/system-offloads-traffic.at |   65 +++--
>>>  tests/system-traffic.at  |   64 +
>>>  4 files changed, 207 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
>>> index ce7f8ad97..59c113187 100644
>>> --- a/lib/netdev-offload-tc.c
>>> +++ b/lib/netdev-offload-tc.c
>>> @@ -97,6 +97,12 @@ static int netdev_tc_parse_nl_actions(struct netdev 
>>> *netdev,
>>>bool *recirc_act, bool more_actions,
>>>struct tc_action **need_jump_update);
>>>
>>> +static void parse_tc_flower_to_stats(struct tc_flower *flower,
>>> + struct dpif_flow_stats *stats);
>>> +
>>> +static int get_ufid_adjust_stats(const ovs_u128 *ufid,
>>> + struct dpif_flow_stats *stats);
>>
>> No need to specify parameter names in prototypes.
> 
> I know,  but all forward declarations in this c file have them included, so I 
> decided to do the same.
> So I kept them for now, but let me know if you think they should be removed.
> 
>>> +
>>>  static bool
>>>  is_internal_port(const char *type)
>>>  {
>>> @@ -193,6 +199,9 @@ static struct ovs_mutex ufid_lock = 
>>> OVS_MUTEX_INITIALIZER;
>>>   * @ufid: ufid assigned to the flow
>>>   * @id: tc filter id (tcf_id)
>>>   * @netdev: netdev associated with the tc rule
>>> + * @adjust_stats: When flow gets updated with new actions, we need to 
>>> adjust
>>> + *the reported stats to include previous values as the 
>>> hardware
>>> + *rule is removed and re-added. This stats copy is used 
>>> for it.
>>>   */
>>>  struct ufid_tc_data {
>>>  struct hmap_node ufid_to_tc_node;
>>> @@ -200,6 +209,7 @@ struct ufid_tc_data {
>>>  ovs_u128 ufid;
>>>  struct tcf_id id;
>>>  struct netdev *netdev;
>>> +struct dpif_flow_stats adjust_stats;
>>>  };
>>>
>>>  static void
>>> @@ -233,12 +243,37 @@ del_ufid_tc_mapping(const ovs_u128 *ufid)
>>>  ovs_mutex_unlock(_lock);
>>>  }
>>>
>>> +static void
>>> +netdev_tc_adjust_stats(struct dpif_flow_stats *stats,
>>> +   struct dpif_flow_stats *adjust_stats)
>>
>> The adjust_stats pointer could be const.
> 
> Will do in next rev.
> 
>>> +{
>>> +/* Do not try to restore the stats->used, as in terse
>>> + * mode dumps we will always report them as 0.
>>
>> I'm not sure I understand that part.  Why we will report them
>> as zero?
> 
> If we do a terse dump, we will not process the TLV holding this value, and we 
> will return 0.
> The revalidator code will override/update this value in udpif_update_used().
> 
> 
>> Not restoring the 'used' might not be a problem for a general
>> case, because freshly gathered stats will have more up to date
>> 'used' value, but that might be a problem for the 'never' case
>> where updated fow was not used after modification but it was
>> used before.
> 
> For the revalidator use case it should always return 0, so it will take care 
> of it with udpif_update_used(). If we return the last used value retriefd 
> during the tc_get_flower() in del_filter_and_ufid_mapping(). It causes the 
> revalidator process to fail, as it will always report the wrong/old value (as 
> it would be larger than 0, which was what I had before).

Hrm, I see.

Maybe re-word the comment to clarify that TC doesn't report
TCA_ACT_OPTIONS in terse dumps, so the 'lastused' value is
not available?

> 
>>> + * tcp_flags is not used by tc, so no need to update it. */

Re: [ovs-dev] [PATCH v1 1/1] net: openvswitch: reduce cpu_used_mask memory

2023-02-01 Thread Ilya Maximets
On 2/1/23 07:33, taoyuan_e...@hotmail.com wrote:
> From: eddytaoyuan 
> 
> 'struct cpumask cpu_used_mask' is embedded in struct sw_flow.
> However, its size is hardcoded to CONFIG_NR_CPUS bits, which can be
> 8192 by default, it costs memory and slows down ovs_flow_alloc.
> This fix uses actual CPU number instead
> 
> This submission is for 2.17 LTS only.
> 
> Since datapath has been moved to linux kernel since 3.0, I will file
> seperate review to kernel community in another thread

Hi.  Thanks for the patch!

According to our process [1], kernel datapath patches should be accepted
to the upstream kernel first.  After that they can be backported to the
kernel module in the OVS tree.

However, the datapath implementation, that still exists in branch 2.17,
is deprecated, so we do not accept changes aside from critical bug fixes.
Your change seems to be a performance optimization and not a bug fix.

Also, fixes should be backported in the order they appear in the upstream
kernel, and we're currently missing a few.

Note: for the future, patches for branch-2.17 should have 'branch-2.17'
in the subject prefix, i.e. '[PATCH branch-2.17]'.  This way CI bots will
know where to apply them.  And backports should have a specific format
described in [1].


[1] 
https://docs.openvswitch.org/en/latest/internals/contributing/backporting-patches/#changes-to-linux-kernel-components

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v1 1/1] net: openvswitch: reduce cpu_used_mask memory

2023-02-01 Thread taoyuan_eddy
From: eddytaoyuan 

'struct cpumask cpu_used_mask' is embedded in struct sw_flow.
However, its size is hardcoded to CONFIG_NR_CPUS bits, which can be
8192 by default, it costs memory and slows down ovs_flow_alloc.
This fix uses actual CPU number instead

Signed-off-by: eddytaoyuan 
---
 net/openvswitch/flow.c   |  6 +++---
 net/openvswitch/flow.h   |  2 +-
 net/openvswitch/flow_table.c | 25 ++---
 3 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index e20d1a973417..06345cd8c777 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -107,7 +107,7 @@ void ovs_flow_stats_update(struct sw_flow *flow, __be16 
tcp_flags,
 
rcu_assign_pointer(flow->stats[cpu],
   new_stats);
-   cpumask_set_cpu(cpu, 
>cpu_used_mask);
+   cpumask_set_cpu(cpu, 
flow->cpu_used_mask);
goto unlock;
}
}
@@ -135,7 +135,7 @@ void ovs_flow_stats_get(const struct sw_flow *flow,
memset(ovs_stats, 0, sizeof(*ovs_stats));
 
/* We open code this to make sure cpu 0 is always considered */
-   for (cpu = 0; cpu < nr_cpu_ids; cpu = cpumask_next(cpu, 
>cpu_used_mask)) {
+   for (cpu = 0; cpu < nr_cpu_ids; cpu = cpumask_next(cpu, 
flow->cpu_used_mask)) {
struct sw_flow_stats *stats = 
rcu_dereference_ovsl(flow->stats[cpu]);
 
if (stats) {
@@ -159,7 +159,7 @@ void ovs_flow_stats_clear(struct sw_flow *flow)
int cpu;
 
/* We open code this to make sure cpu 0 is always considered */
-   for (cpu = 0; cpu < nr_cpu_ids; cpu = cpumask_next(cpu, 
>cpu_used_mask)) {
+   for (cpu = 0; cpu < nr_cpu_ids; cpu = cpumask_next(cpu, 
flow->cpu_used_mask)) {
struct sw_flow_stats *stats = 
ovsl_dereference(flow->stats[cpu]);
 
if (stats) {
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index 073ab73ffeaa..b5711aff6e76 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -229,7 +229,7 @@ struct sw_flow {
 */
struct sw_flow_key key;
struct sw_flow_id id;
-   struct cpumask cpu_used_mask;
+   struct cpumask *cpu_used_mask;
struct sw_flow_mask *mask;
struct sw_flow_actions __rcu *sf_acts;
struct sw_flow_stats __rcu *stats[]; /* One for each CPU.  First one
diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index 0a0e4c283f02..c0fdff73272f 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -87,11 +87,12 @@ struct sw_flow *ovs_flow_alloc(void)
if (!stats)
goto err;
 
+   flow->cpu_used_mask = (struct cpumask *)&(flow->stats[nr_cpu_ids]);
spin_lock_init(>lock);
 
RCU_INIT_POINTER(flow->stats[0], stats);
 
-   cpumask_set_cpu(0, >cpu_used_mask);
+   cpumask_set_cpu(0, flow->cpu_used_mask);
 
return flow;
 err:
@@ -115,7 +116,7 @@ static void flow_free(struct sw_flow *flow)
  flow->sf_acts);
/* We open code this to make sure cpu 0 is always considered */
for (cpu = 0; cpu < nr_cpu_ids;
-cpu = cpumask_next(cpu, >cpu_used_mask)) {
+cpu = cpumask_next(cpu, flow->cpu_used_mask)) {
if (flow->stats[cpu])
kmem_cache_free(flow_stats_cache,
(struct sw_flow_stats __force 
*)flow->stats[cpu]);
@@ -1194,9 +1195,27 @@ int ovs_flow_init(void)
BUILD_BUG_ON(__alignof__(struct sw_flow_key) % __alignof__(long));
BUILD_BUG_ON(sizeof(struct sw_flow_key) % sizeof(long));
 
+/*
+ * Simply including 'struct cpumask' in 'struct sw_flow'
+ * consumes memory unnecessarily large.
+ * The reason is that compilation option CONFIG_NR_CPUS(default 8192)
+ * decides the value of NR_CPUS, which in turn decides size of
+ * 'struct cpumask', which means 1024 bytes are needed for the cpumask
+ * It affects ovs_flow_alloc performance as well as memory footprint.
+ * We should use actual CPU count instead of hardcoded value.
+ *
+ * To address this, 'cpu_used_mask' is redefined to pointer
+ * and append a cpumask_size() after 'stat' to hold the actual memory
+ * of struct cpumask
+ *
+ * cpumask APIs like cpumask_next and cpumask_set_cpu have been defined
+ * to never access bits beyond cpu count by design, thus above change 
is
+ * safe even though the actual memory provided is smaller than
+ * sizeof(struct cpumask)
+ */
flow_cache = kmem_cache_create("sw_flow", sizeof(struct sw_flow)
  

Re: [ovs-dev] [PATCH ovn] ovn-nb.xml: Fix "mcast_querier".

2023-02-01 Thread Dumitru Ceara
On 2/1/23 07:25, Han Zhou wrote:
> Signed-off-by: Han Zhou 
> ---

Hi Han,

Thanks for the fix!

>  ovn-nb.xml | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/ovn-nb.xml b/ovn-nb.xml
> index 217eb877b055..929f4c966966 100644
> --- a/ovn-nb.xml
> +++ b/ovn-nb.xml
> @@ -630,7 +630,7 @@
>  These options control IP Multicast Snooping configuration of the
>  logical switch. To enable IP Multicast Snooping set
>   to true. To enable IP
> -Multicast Querier set 
> +Multicast Querier set  key="mcast_querier"/>
>  to true. If IP Multicast Querier is enabled
>   and
>   must be set.

This looks good to me so:

Acked-by: Dumitru Ceara 

But I'm wondering if we should also document the defaults:

diff --git a/ovn-nb.xml b/ovn-nb.xml
index 929f4c9669..4b52b99533 100644
--- a/ovn-nb.xml
+++ b/ovn-nb.xml
@@ -638,10 +638,14 @@
   
 Enables/disables IP Multicast Snooping on the logical switch.
+Default: false.
   
   
 Enables/disables IP Multicast Querier on the logical switch.
+Only applicable if 
+is enabled.
+Default: true.
   
   
---

If you agree, do you think it's OK to squash this into your patch
before applying it?

Thanks,
Dumitru


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] net: openvswitch: reduce cpu_used_mask memory consumption

2023-02-01 Thread taoyuan_eddy
From: eddytaoyuan 

struct cpumask cpu_used_mask is directly embedded in struct sw_flow
however, its size is hardcoded to CONFIG_NR_CPUS bits, which
can be as large as 8192 by default, it cost memory and slows down
ovs_flow_alloc, this fix used actual CPU number instead

Signed-off-by: eddytaoyuan 
---
 net/openvswitch/flow.c   |  6 +++---
 net/openvswitch/flow.h   |  2 +-
 net/openvswitch/flow_table.c | 25 ++---
 3 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index e20d1a973417..06345cd8c777 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -107,7 +107,7 @@ void ovs_flow_stats_update(struct sw_flow *flow, __be16 
tcp_flags,
 
rcu_assign_pointer(flow->stats[cpu],
   new_stats);
-   cpumask_set_cpu(cpu, 
>cpu_used_mask);
+   cpumask_set_cpu(cpu, 
flow->cpu_used_mask);
goto unlock;
}
}
@@ -135,7 +135,7 @@ void ovs_flow_stats_get(const struct sw_flow *flow,
memset(ovs_stats, 0, sizeof(*ovs_stats));
 
/* We open code this to make sure cpu 0 is always considered */
-   for (cpu = 0; cpu < nr_cpu_ids; cpu = cpumask_next(cpu, 
>cpu_used_mask)) {
+   for (cpu = 0; cpu < nr_cpu_ids; cpu = cpumask_next(cpu, 
flow->cpu_used_mask)) {
struct sw_flow_stats *stats = 
rcu_dereference_ovsl(flow->stats[cpu]);
 
if (stats) {
@@ -159,7 +159,7 @@ void ovs_flow_stats_clear(struct sw_flow *flow)
int cpu;
 
/* We open code this to make sure cpu 0 is always considered */
-   for (cpu = 0; cpu < nr_cpu_ids; cpu = cpumask_next(cpu, 
>cpu_used_mask)) {
+   for (cpu = 0; cpu < nr_cpu_ids; cpu = cpumask_next(cpu, 
flow->cpu_used_mask)) {
struct sw_flow_stats *stats = 
ovsl_dereference(flow->stats[cpu]);
 
if (stats) {
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index 073ab73ffeaa..b5711aff6e76 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -229,7 +229,7 @@ struct sw_flow {
 */
struct sw_flow_key key;
struct sw_flow_id id;
-   struct cpumask cpu_used_mask;
+   struct cpumask *cpu_used_mask;
struct sw_flow_mask *mask;
struct sw_flow_actions __rcu *sf_acts;
struct sw_flow_stats __rcu *stats[]; /* One for each CPU.  First one
diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index 0a0e4c283f02..c0fdff73272f 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -87,11 +87,12 @@ struct sw_flow *ovs_flow_alloc(void)
if (!stats)
goto err;
 
+   flow->cpu_used_mask = (struct cpumask *)&(flow->stats[nr_cpu_ids]);
spin_lock_init(>lock);
 
RCU_INIT_POINTER(flow->stats[0], stats);
 
-   cpumask_set_cpu(0, >cpu_used_mask);
+   cpumask_set_cpu(0, flow->cpu_used_mask);
 
return flow;
 err:
@@ -115,7 +116,7 @@ static void flow_free(struct sw_flow *flow)
  flow->sf_acts);
/* We open code this to make sure cpu 0 is always considered */
for (cpu = 0; cpu < nr_cpu_ids;
-cpu = cpumask_next(cpu, >cpu_used_mask)) {
+cpu = cpumask_next(cpu, flow->cpu_used_mask)) {
if (flow->stats[cpu])
kmem_cache_free(flow_stats_cache,
(struct sw_flow_stats __force 
*)flow->stats[cpu]);
@@ -1194,9 +1195,27 @@ int ovs_flow_init(void)
BUILD_BUG_ON(__alignof__(struct sw_flow_key) % __alignof__(long));
BUILD_BUG_ON(sizeof(struct sw_flow_key) % sizeof(long));
 
+/*
+ * Simply including 'struct cpumask' in 'struct sw_flow'
+ * consumes memory unnecessarily large.
+ * The reason is that compilation option CONFIG_NR_CPUS(default 8192)
+ * decides the value of NR_CPUS, which in turn decides size of
+ * 'struct cpumask', which means 1024 bytes are needed for the cpumask
+ * It affects ovs_flow_alloc performance as well as memory footprint.
+ * We should use actual CPU count instead of hardcoded value.
+ *
+ * To address this, 'cpu_used_mask' is redefined to pointer
+ * and append a cpumask_size() after 'stat' to hold the actual memory
+ * of struct cpumask
+ *
+ * cpumask APIs like cpumask_next and cpumask_set_cpu have been defined
+ * to never access bits beyond cpu count by design, thus above change 
is
+ * safe even though the actual memory provided is smaller than
+ * sizeof(struct cpumask)
+ */
flow_cache = kmem_cache_create("sw_flow", sizeof(struct