from:"Jakub Kicinski"

Re: [ovs-dev] selftests: openvswitch: Questions about possible enhancements

2024-04-25 Thread Jakub Kicinski

On Thu, 25 Apr 2024 19:57:19 +0100 Simon Horman wrote:
> openvswitch.sh does not appear to have any dependencies on Open vSwitch
> user-space. My understanding is that, rather, it makes use of
> tools/testing/selftests/net/openvswitch/ovs-dpctl.py to talk to the Kernel
> using Netlink (which is also what Open vSwitch user-space does).
> 
> My brief testing indicates that for this the only dependencies
> when running on Amazon Linux 2 are python3 and pyroute2.
> 
> I think that it should be possible to port pmtu.sh to use ovs-dpctl.py.
> This would require some enhancements to ovs-dpctl.py to handle adding the
> tunnel vports (interfaces).
> 
> As an aside, to run the Open vSwitch tests in pmtu.sh the openvswitch
> kernel module is needed. So I think it would make sense to add
> CONFIG_OPENVSWITCH to tools/testing/selftests/net/config.
> 
> That would mean that tools/testing/selftests/net/config also has all
> the requirements to run openvswitch.sh. If so, we probably wouldn't need to
> add tools/testing/selftests/net/openvswitch/config or otherwise do anything
> special to configure the kernel for openvswitch.sh.

That sounds great, for simplicity we could move the ovs files down 
to the .../net/ directory. It'd be cool to not have to do that, and
let us separate tests more clearly into directories. But right now
every directory has its own runner so there's a high price to pay
for a primarily aesthetic gain :(
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] selftests: openvswitch: Questions about possible enhancements

2024-04-24 Thread Jakub Kicinski

On Wed, 24 Apr 2024 17:44:05 +0100 Simon Horman wrote:
> I have recently been exercising the Open vSwitch kernel selftests,
> using vng,

Speaking of ovs tests, we currently don't run them in CI (and suffer
related skips in pmtu.sh) because Amazon Linux doesn't have ovs
packaged and building it looks pretty hard.

Is there an easy way to build just the CLI tooling or get a pre-built
package somewhere?

Or perhaps you'd be willing to run the OvS tests and we can move 
the part of pmtu.sh into OvS test dir?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net-next v4] net: openvswitch: Check vport netdev name

2024-04-22 Thread Jakub Kicinski

On Fri, 19 Apr 2024 14:14:25 +0800 Jun Gu wrote:
>   vport->dev = dev_get_by_name(ovs_dp_get_net(vport->dp), name);
> - if (!vport->dev) {
> + /* Ensure that the device exists and that the provided
> +  * name is not one of its aliases.
> +  */
> + if (!vport->dev || strcmp(name, ovs_vport_name(vport))) {
>   err = -ENODEV;
>   goto error_free_vport;
>   }

Sorry I applied this before I realised that it's buggy.
dev_get_by_name() will give you a reference on the device.
You must free it, so the error handling is different.
Please follow up ASAP to fix that.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net-next v4] net: openvswitch: Check vport netdev name

2024-04-19 Thread Jakub Kicinski

On Fri, 19 Apr 2024 12:31:33 +0800 Jun Gu wrote:
> From: "jun.gu" 
> 
> Ensure that the provided netdev name is not one of its aliases to
> prevent unnecessary creation and destruction of the vport by
> ovs-vswitchd.
> 
> Signed-off-by: jun.gu  
> Acked-by: Eelco Chaudron 

I said: When you repost, start a new thread, do not post new version
in-reply-to.

If you don't understand what something means - ask :|
Now try again.
-- 
pw-bot: cr
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net-next v4] net: openvswitch: Check vport netdev name

2024-04-18 Thread Jakub Kicinski

On Thu, 18 Apr 2024 10:32:42 +0800 jun.gu wrote:
> + if ((!vport->dev) || strcmp(name, ovs_vport_name(vport))) {

Please drop the unnecessary brackets.
When you repost, start a new thread, do not post new version
in-reply-to.
-- 
pw-bot: cr
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH net-next v3 2/3] net: openvswitch: remove unnecessary linux/genetlink.h include

2024-03-29 Thread Jakub Kicinski

The only legit reason I could think of for net/genetlink.h
and linux/genetlink.h to be separate would be if one was
included by other headers and we wanted to keep it lightweight.
That is not the case, net/openvswitch/meter.h includes
linux/genetlink.h but for no apparent reason (for struct genl_family
perhaps? it's not necessary, types of externs do not need
to be known).

Signed-off-by: Jakub Kicinski 
---
CC: pshe...@ovn.org
CC: d...@openvswitch.org
---
 net/openvswitch/meter.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/openvswitch/meter.h b/net/openvswitch/meter.h
index ed11cd12b512..8bbf983cd244 100644
--- a/net/openvswitch/meter.h
+++ b/net/openvswitch/meter.h
@@ -11,7 +11,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 
-- 
2.44.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH net-next v3 3/3] genetlink: remove linux/genetlink.h

2024-03-29 Thread Jakub Kicinski

genetlink.h is a shell of what used to be a combined uAPI
and kernel header over a decade ago. It has fewer than
10 lines of code. Merge it into net/genetlink.h.
In some ways it'd be better to keep the combined header
under linux/ but it would make looking through git history
harder.

Acked-by: Sven Eckelmann 
Signed-off-by: Jakub Kicinski 
---
v3:
 - fix double space in a comment
v2:
 - remove extern
 - include linux/net.h
 - improve the comment, not "all" requests are serialized

CC: ja...@zx2c4.com
CC: mareklind...@neomailbox.ch
CC: s...@simonwunderlich.de
CC: a...@unstable.cc
CC: pshe...@ovn.org
CC: andriy.shevche...@linux.intel.com
CC: wiregu...@lists.zx2c4.com
CC: d...@openvswitch.org
---
 drivers/net/wireguard/main.c  |  2 +-
 include/linux/genetlink.h | 14 --
 include/linux/genl_magic_struct.h |  2 +-
 include/net/genetlink.h   | 10 +-
 net/batman-adv/main.c |  2 +-
 net/batman-adv/netlink.c  |  1 -
 net/openvswitch/datapath.c|  1 -
 7 files changed, 12 insertions(+), 20 deletions(-)
 delete mode 100644 include/linux/genetlink.h

diff --git a/drivers/net/wireguard/main.c b/drivers/net/wireguard/main.c
index ee4da9ab8013..a00671b58701 100644
--- a/drivers/net/wireguard/main.c
+++ b/drivers/net/wireguard/main.c
@@ -14,7 +14,7 @@
 
 #include 
 #include 
-#include 
+#include 
 #include 
 
 static int __init wg_mod_init(void)
diff --git a/include/linux/genetlink.h b/include/linux/genetlink.h
deleted file mode 100644
index 9dbd7ba9b858..
--- a/include/linux/genetlink.h
+++ /dev/null
@@ -1,14 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef __LINUX_GENERIC_NETLINK_H
-#define __LINUX_GENERIC_NETLINK_H
-
-#include 
-
-/* All generic netlink requests are serialized by a global lock.  */
-extern void genl_lock(void);
-extern void genl_unlock(void);
-
-#define MODULE_ALIAS_GENL_FAMILY(family)\
- MODULE_ALIAS_NET_PF_PROTO_NAME(PF_NETLINK, NETLINK_GENERIC, "-family-" family)
-
-#endif /* __LINUX_GENERIC_NETLINK_H */
diff --git a/include/linux/genl_magic_struct.h 
b/include/linux/genl_magic_struct.h
index a419d93789ff..621b87a87d74 100644
--- a/include/linux/genl_magic_struct.h
+++ b/include/linux/genl_magic_struct.h
@@ -15,8 +15,8 @@
 #endif
 
 #include 
-#include 
 #include 
+#include 
 
 extern int CONCATENATE(GENL_MAGIC_FAMILY, _genl_register)(void);
 extern void CONCATENATE(GENL_MAGIC_FAMILY, _genl_unregister)(void);
diff --git a/include/net/genetlink.h b/include/net/genetlink.h
index 9ece6e5a3ea8..9ab49bfeae78 100644
--- a/include/net/genetlink.h
+++ b/include/net/genetlink.h
@@ -2,12 +2,20 @@
 #ifndef __NET_GENERIC_NETLINK_H
 #define __NET_GENERIC_NETLINK_H
 
-#include 
+#include 
 #include 
 #include 
+#include 
 
 #define GENLMSG_DEFAULT_SIZE (NLMSG_DEFAULT_SIZE - GENL_HDRLEN)
 
+/* Non-parallel generic netlink requests are serialized by a global lock. */
+void genl_lock(void);
+void genl_unlock(void);
+
+#define MODULE_ALIAS_GENL_FAMILY(family) \
+ MODULE_ALIAS_NET_PF_PROTO_NAME(PF_NETLINK, NETLINK_GENERIC, "-family-" family)
+
 /* Binding to multicast group requires %CAP_NET_ADMIN */
 #define GENL_MCAST_CAP_NET_ADMIN   BIT(0)
 /* Binding to multicast group requires %CAP_SYS_ADMIN */
diff --git a/net/batman-adv/main.c b/net/batman-adv/main.c
index 75119f1ffccc..8e0f44c71696 100644
--- a/net/batman-adv/main.c
+++ b/net/batman-adv/main.c
@@ -14,7 +14,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -38,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/net/batman-adv/netlink.c b/net/batman-adv/netlink.c
index 0954757f0b8b..9362cd9d6f3d 100644
--- a/net/batman-adv/netlink.c
+++ b/net/batman-adv/netlink.c
@@ -15,7 +15,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 11c69415c605..99d72543abd3 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -15,7 +15,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
-- 
2.44.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net-next v2 3/3] genetlink: remove linux/genetlink.h

2024-03-25 Thread Jakub Kicinski

On Mon, 25 Mar 2024 19:45:05 +0200 Andy Shevchenko wrote:
> > +/* Non-parallel generic netlink requests are serialized by a global lock.  
> > */  
> 
> While at it, maybe drop extra space? (I noticed it was originally like this.

Fair, I'll post v3.

> > +#define MODULE_ALIAS_GENL_FAMILY(family) \
> > + MODULE_ALIAS_NET_PF_PROTO_NAME(PF_NETLINK, NETLINK_GENERIC, "-family-" 
> > family)  
> 
> I would still make this TAB indented.

That one I'd prefer to keep. Coin toss between going over 80 chars
and having someone atypical indent.
-- 
pw-bot: cr
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH net-next v2 2/3] net: openvswitch: remove unnecessary linux/genetlink.h include

2024-03-25 Thread Jakub Kicinski

The only legit reason I could think of for net/genetlink.h
and linux/genetlink.h to be separate would be if one was
included by other headers and we wanted to keep it lightweight.
That is not the case, net/openvswitch/meter.h includes
linux/genetlink.h but for no apparent reason (for struct genl_family
perhaps? it's not necessary, types of externs do not need
to be known).

Signed-off-by: Jakub Kicinski 
---
CC: pshe...@ovn.org
CC: d...@openvswitch.org
---
 net/openvswitch/meter.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/openvswitch/meter.h b/net/openvswitch/meter.h
index ed11cd12b512..8bbf983cd244 100644
--- a/net/openvswitch/meter.h
+++ b/net/openvswitch/meter.h
@@ -11,7 +11,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 
-- 
2.44.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH net-next v2 3/3] genetlink: remove linux/genetlink.h

2024-03-25 Thread Jakub Kicinski

genetlink.h is a shell of what used to be a combined uAPI
and kernel header over a decade ago. It has fewer than
10 lines of code. Merge it into net/genetlink.h.
In some ways it'd be better to keep the combined header
under linux/ but it would make looking through git history
harder.

Acked-by: Sven Eckelmann 
Signed-off-by: Jakub Kicinski 
---
v2:
 - remove extern
 - include linux/net.h
 - improve the comment, not "all" requests are serialized

CC: ja...@zx2c4.com
CC: mareklind...@neomailbox.ch
CC: s...@simonwunderlich.de
CC: a...@unstable.cc
CC: pshe...@ovn.org
CC: andriy.shevche...@linux.intel.com
CC: wiregu...@lists.zx2c4.com
CC: d...@openvswitch.org
---
 drivers/net/wireguard/main.c  |  2 +-
 include/linux/genetlink.h | 14 --
 include/linux/genl_magic_struct.h |  2 +-
 include/net/genetlink.h   | 10 +-
 net/batman-adv/main.c |  2 +-
 net/batman-adv/netlink.c  |  1 -
 net/openvswitch/datapath.c|  1 -
 7 files changed, 12 insertions(+), 20 deletions(-)
 delete mode 100644 include/linux/genetlink.h

diff --git a/drivers/net/wireguard/main.c b/drivers/net/wireguard/main.c
index ee4da9ab8013..a00671b58701 100644
--- a/drivers/net/wireguard/main.c
+++ b/drivers/net/wireguard/main.c
@@ -14,7 +14,7 @@
 
 #include 
 #include 
-#include 
+#include 
 #include 
 
 static int __init wg_mod_init(void)
diff --git a/include/linux/genetlink.h b/include/linux/genetlink.h
deleted file mode 100644
index 9dbd7ba9b858..
--- a/include/linux/genetlink.h
+++ /dev/null
@@ -1,14 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef __LINUX_GENERIC_NETLINK_H
-#define __LINUX_GENERIC_NETLINK_H
-
-#include 
-
-/* All generic netlink requests are serialized by a global lock.  */
-extern void genl_lock(void);
-extern void genl_unlock(void);
-
-#define MODULE_ALIAS_GENL_FAMILY(family)\
- MODULE_ALIAS_NET_PF_PROTO_NAME(PF_NETLINK, NETLINK_GENERIC, "-family-" family)
-
-#endif /* __LINUX_GENERIC_NETLINK_H */
diff --git a/include/linux/genl_magic_struct.h 
b/include/linux/genl_magic_struct.h
index a419d93789ff..621b87a87d74 100644
--- a/include/linux/genl_magic_struct.h
+++ b/include/linux/genl_magic_struct.h
@@ -15,8 +15,8 @@
 #endif
 
 #include 
-#include 
 #include 
+#include 
 
 extern int CONCATENATE(GENL_MAGIC_FAMILY, _genl_register)(void);
 extern void CONCATENATE(GENL_MAGIC_FAMILY, _genl_unregister)(void);
diff --git a/include/net/genetlink.h b/include/net/genetlink.h
index 9ece6e5a3ea8..7648dd6b8754 100644
--- a/include/net/genetlink.h
+++ b/include/net/genetlink.h
@@ -2,12 +2,20 @@
 #ifndef __NET_GENERIC_NETLINK_H
 #define __NET_GENERIC_NETLINK_H
 
-#include 
+#include 
 #include 
 #include 
+#include 
 
 #define GENLMSG_DEFAULT_SIZE (NLMSG_DEFAULT_SIZE - GENL_HDRLEN)
 
+/* Non-parallel generic netlink requests are serialized by a global lock.  */
+void genl_lock(void);
+void genl_unlock(void);
+
+#define MODULE_ALIAS_GENL_FAMILY(family) \
+ MODULE_ALIAS_NET_PF_PROTO_NAME(PF_NETLINK, NETLINK_GENERIC, "-family-" family)
+
 /* Binding to multicast group requires %CAP_NET_ADMIN */
 #define GENL_MCAST_CAP_NET_ADMIN   BIT(0)
 /* Binding to multicast group requires %CAP_SYS_ADMIN */
diff --git a/net/batman-adv/main.c b/net/batman-adv/main.c
index 75119f1ffccc..8e0f44c71696 100644
--- a/net/batman-adv/main.c
+++ b/net/batman-adv/main.c
@@ -14,7 +14,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -38,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/net/batman-adv/netlink.c b/net/batman-adv/netlink.c
index 0954757f0b8b..9362cd9d6f3d 100644
--- a/net/batman-adv/netlink.c
+++ b/net/batman-adv/netlink.c
@@ -15,7 +15,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 11c69415c605..99d72543abd3 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -15,7 +15,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
-- 
2.44.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH net-next 3/3] genetlink: remove linux/genetlink.h

2024-03-09 Thread Jakub Kicinski

genetlink.h is a shell of what used to be a combined uAPI
and kernel header over a decade ago. It has fewer than
10 lines of code. Merge it into net/genetlink.h.
In some ways it'd be better to keep the combined header
under linux/ but it would make looking through git history
harder.

Signed-off-by: Jakub Kicinski 
---
CC: ja...@zx2c4.com
CC: mareklind...@neomailbox.ch
CC: s...@simonwunderlich.de
CC: a...@unstable.cc
CC: s...@narfation.org
CC: pshe...@ovn.org
CC: andriy.shevche...@linux.intel.com
CC: wiregu...@lists.zx2c4.com
CC: d...@openvswitch.org
---
 drivers/net/wireguard/main.c  |  2 +-
 include/linux/genetlink.h | 14 --
 include/linux/genl_magic_struct.h |  2 +-
 include/net/genetlink.h   |  9 -
 net/batman-adv/main.c |  2 +-
 net/batman-adv/netlink.c  |  1 -
 net/openvswitch/datapath.c|  1 -
 7 files changed, 11 insertions(+), 20 deletions(-)
 delete mode 100644 include/linux/genetlink.h

diff --git a/drivers/net/wireguard/main.c b/drivers/net/wireguard/main.c
index ee4da9ab8013..a00671b58701 100644
--- a/drivers/net/wireguard/main.c
+++ b/drivers/net/wireguard/main.c
@@ -14,7 +14,7 @@
 
 #include 
 #include 
-#include 
+#include 
 #include 
 
 static int __init wg_mod_init(void)
diff --git a/include/linux/genetlink.h b/include/linux/genetlink.h
deleted file mode 100644
index 9dbd7ba9b858..
--- a/include/linux/genetlink.h
+++ /dev/null
@@ -1,14 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef __LINUX_GENERIC_NETLINK_H
-#define __LINUX_GENERIC_NETLINK_H
-
-#include 
-
-/* All generic netlink requests are serialized by a global lock.  */
-extern void genl_lock(void);
-extern void genl_unlock(void);
-
-#define MODULE_ALIAS_GENL_FAMILY(family)\
- MODULE_ALIAS_NET_PF_PROTO_NAME(PF_NETLINK, NETLINK_GENERIC, "-family-" family)
-
-#endif /* __LINUX_GENERIC_NETLINK_H */
diff --git a/include/linux/genl_magic_struct.h 
b/include/linux/genl_magic_struct.h
index a419d93789ff..621b87a87d74 100644
--- a/include/linux/genl_magic_struct.h
+++ b/include/linux/genl_magic_struct.h
@@ -15,8 +15,8 @@
 #endif
 
 #include 
-#include 
 #include 
+#include 
 
 extern int CONCATENATE(GENL_MAGIC_FAMILY, _genl_register)(void);
 extern void CONCATENATE(GENL_MAGIC_FAMILY, _genl_unregister)(void);
diff --git a/include/net/genetlink.h b/include/net/genetlink.h
index 9ece6e5a3ea8..090aa3e36ce3 100644
--- a/include/net/genetlink.h
+++ b/include/net/genetlink.h
@@ -2,12 +2,19 @@
 #ifndef __NET_GENERIC_NETLINK_H
 #define __NET_GENERIC_NETLINK_H
 
-#include 
+#include 
 #include 
 #include 
 
 #define GENLMSG_DEFAULT_SIZE (NLMSG_DEFAULT_SIZE - GENL_HDRLEN)
 
+/* All generic netlink requests are serialized by a global lock.  */
+extern void genl_lock(void);
+extern void genl_unlock(void);
+
+#define MODULE_ALIAS_GENL_FAMILY(family) \
+ MODULE_ALIAS_NET_PF_PROTO_NAME(PF_NETLINK, NETLINK_GENERIC, "-family-" family)
+
 /* Binding to multicast group requires %CAP_NET_ADMIN */
 #define GENL_MCAST_CAP_NET_ADMIN   BIT(0)
 /* Binding to multicast group requires %CAP_SYS_ADMIN */
diff --git a/net/batman-adv/main.c b/net/batman-adv/main.c
index 75119f1ffccc..8e0f44c71696 100644
--- a/net/batman-adv/main.c
+++ b/net/batman-adv/main.c
@@ -14,7 +14,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -38,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/net/batman-adv/netlink.c b/net/batman-adv/netlink.c
index 0954757f0b8b..9362cd9d6f3d 100644
--- a/net/batman-adv/netlink.c
+++ b/net/batman-adv/netlink.c
@@ -15,7 +15,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 11c69415c605..99d72543abd3 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -15,7 +15,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
-- 
2.44.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH net-next 2/3] net: openvswitch: remove unnecessary linux/genetlink.h include

2024-03-09 Thread Jakub Kicinski

The only legit reason I could think of for net/genetlink.h
and linux/genetlink.h to be separate would be if one was
included by other headers and we wanted to keep it lightweight.
That is not the case, net/openvswitch/meter.h includes
linux/genetlink.h but for no apparent reason (for struct genl_family
perhaps? it's not necessary, types of externs do not need
to be known).

Signed-off-by: Jakub Kicinski 
---
CC: pshe...@ovn.org
CC: d...@openvswitch.org
---
 net/openvswitch/meter.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/openvswitch/meter.h b/net/openvswitch/meter.h
index ed11cd12b512..8bbf983cd244 100644
--- a/net/openvswitch/meter.h
+++ b/net/openvswitch/meter.h
@@ -11,7 +11,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 
-- 
2.44.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [RFC 0/7] selftests: openvswitch: cleanups for running as selftests

2024-02-19 Thread Jakub Kicinski

On Fri, 16 Feb 2024 10:28:39 -0500 Aaron Conole wrote:
> The series is a host of cleanups to the openvswitch selftest suite
> which should be ready to run under the netdev selftest runners using
> vng.  For now, the testing has been done with RW directories, but
> additional testing will be done to try and keep it all as RO to be
> more friendly.

Would it be an option to make the output go into a dir in /tmp/ 
instead of in place, in the tree?

  mktemp -p /tmp/ -d ovs-test.X

or such?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH 00/14] Batch 1: Annotate structs with __counted_by

2023-10-02 Thread Jakub Kicinski

On Wed, 27 Sep 2023 08:57:36 -0700 Kees Cook wrote:
> > Since the element count member must be set before accessing the annotated
> > flexible array member, some patches also move the member's initialization
> > earlier. (These are noted in the individual patches.)  
> 
> Hi, just checking on this batch of changes. Is it possible to take the
> 1-13 subset:

On it, sorry for the delay.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net] net: openvswitch: reject negative ifindex

2023-08-15 Thread Jakub Kicinski

On Tue, 15 Aug 2023 08:41:50 -0400 Aaron Conole wrote:
> > Validate the inputes. Now the second command correctly returns:  
> 
> s/inputes/inputs/

Thanks, fixed when applying

> > $ ./cli.py --spec netlink/specs/ovs_vport.yaml \
> >  --do new \
> >  --json '{"upcall-pid": "0001", "name": "some-port0", 
> > "dp-ifindex":3,"ifindex":4294901760,"type":2}'
> >
> > lib.ynl.NlError: Netlink error: Numerical result out of range
> > nl_len = 108 (92) nl_flags = 0x300 nl_type = 2
> > error: -34  extack: {'msg': 'integer out of range', 'unknown': 
> > [[type:4 len:36] 
> > b'\x0c\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x03\x00\xff\xff\xff\x7f\x00\x00\x00\x00\x08\x00\x01\x00\x08\x00\x00\x00'],
> >  'bad-attr': '.ifindex'}
> >
> > Accept 0 since it used to be silently ignored.
> >
> > Fixes: 54c4ef34c4b6 ("openvswitch: allow specifying ifindex of new 
> > interfaces")
> > Reported-by: syzbot+7456b5dcf65111553...@syzkaller.appspotmail.com
> > Signed-off-by: Jakub Kicinski 
> > ---
> > CC: pshe...@ovn.org
> > CC: andrey.zhadche...@virtuozzo.com
> > CC: brau...@kernel.org
> > CC: d...@openvswitch.org
> > ---  
> 
> Thanks for the quick follow up.  I accidentally broke my system trying
> to setup to reproduce the syzbot splat.

Ah. Syzbot pointed at my commit so I thought others will just think
"not my bug" :)

> The attribute here isn't used by the ovs-vswitchd, so probably why we
> never caught an issue before.  I'll think about how to improve the
> fuzzing on the ovs module.  At the very least, maybe we can have some
> additional checks in the netlink selftest.

Speaking of fuzzing - reaching out to Dmitry crossed my mind.
When the first netlink specs got merged we briefly discussed
using them to guide syzbot a little. But then I thought - syzbot did
find this fairly quickly, it's more that previously we apparently had
no warning or crash for negative ifindex so there was no target to hit.

> I noticed that since I copied the definitions when building
> ovs-dpctl.py, I have the same kind of mistake there (using unsigned for
> ifindex).  I can submit a follow up to correct that definition.  Also,
> we might consider correcting the yaml.

FWIW I left the nla_put_u32() when outputting ifindex in the kernel as
well. I needed s32 for the range because min and max are 16 bit (to
conserve space in the policy) so I could not express the positive limit
on u32. Whether ifindex is u32 or s32 is a bit of a philosophical
question to me, as it only takes positive 31b values...
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH net-next v3 03/10] genetlink: remove userhdr from struct genl_info

2023-08-14 Thread Jakub Kicinski

Only three families use info->userhdr today and going forward
we discourage using fixed headers in new families.
So having the pointer to user header in struct genl_info
is an overkill. Compute the header pointer at runtime.

Reviewed-by: Johannes Berg 
Reviewed-by: Jiri Pirko 
Signed-off-by: Jakub Kicinski 
---
CC: philipp.reis...@linbit.com
CC: lars.ellenb...@linbit.com
CC: christoph.boehmwal...@linbit.com
CC: ax...@kernel.dk
CC: pshe...@ovn.org
CC: jma...@redhat.com
CC: ying@windriver.com
CC: jacob.e.kel...@intel.com
CC: drbd-...@lists.linbit.com
CC: linux-bl...@vger.kernel.org
CC: d...@openvswitch.org
CC: tipc-discuss...@lists.sourceforge.net
---
 drivers/block/drbd/drbd_nl.c |  9 +
 include/net/genetlink.h  |  7 +--
 net/netlink/genetlink.c  |  1 -
 net/openvswitch/conntrack.c  |  2 +-
 net/openvswitch/datapath.c   | 29 -
 net/openvswitch/meter.c  | 10 +-
 net/tipc/netlink_compat.c|  2 +-
 7 files changed, 33 insertions(+), 27 deletions(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index cddae6f4b00f..d3538bd83fb3 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -159,7 +159,7 @@ static int drbd_msg_sprintf_info(struct sk_buff *skb, const 
char *fmt, ...)
 static int drbd_adm_prepare(struct drbd_config_context *adm_ctx,
struct sk_buff *skb, struct genl_info *info, unsigned flags)
 {
-   struct drbd_genlmsghdr *d_in = info->userhdr;
+   struct drbd_genlmsghdr *d_in = genl_info_userhdr(info);
const u8 cmd = info->genlhdr->cmd;
int err;
 
@@ -1396,8 +1396,9 @@ static void drbd_suspend_al(struct drbd_device *device)
 
 static bool should_set_defaults(struct genl_info *info)
 {
-   unsigned flags = ((struct drbd_genlmsghdr*)info->userhdr)->flags;
-   return 0 != (flags & DRBD_GENL_F_SET_DEFAULTS);
+   struct drbd_genlmsghdr *dh = genl_info_userhdr(info);
+
+   return 0 != (dh->flags & DRBD_GENL_F_SET_DEFAULTS);
 }
 
 static unsigned int drbd_al_extents_max(struct drbd_backing_dev *bdev)
@@ -4276,7 +4277,7 @@ static void device_to_info(struct device_info *info,
 int drbd_adm_new_minor(struct sk_buff *skb, struct genl_info *info)
 {
struct drbd_config_context adm_ctx;
-   struct drbd_genlmsghdr *dh = info->userhdr;
+   struct drbd_genlmsghdr *dh = genl_info_userhdr(info);
enum drbd_ret_code retcode;
 
retcode = drbd_adm_prepare(_ctx, skb, info, DRBD_ADM_NEED_RESOURCE);
diff --git a/include/net/genetlink.h b/include/net/genetlink.h
index 0366d0925596..9dc21ec15734 100644
--- a/include/net/genetlink.h
+++ b/include/net/genetlink.h
@@ -95,7 +95,6 @@ struct genl_family {
  * @snd_portid: netlink portid of sender
  * @nlhdr: netlink message header
  * @genlhdr: generic netlink message header
- * @userhdr: user specific header
  * @attrs: netlink attributes
  * @_net: network namespace
  * @user_ptr: user pointers
@@ -106,7 +105,6 @@ struct genl_info {
u32 snd_portid;
const struct nlmsghdr * nlhdr;
struct genlmsghdr * genlhdr;
-   void *  userhdr;
struct nlattr **attrs;
possible_net_t  _net;
void *  user_ptr[2];
@@ -123,6 +121,11 @@ static inline void genl_info_net_set(struct genl_info 
*info, struct net *net)
write_pnet(>_net, net);
 }
 
+static inline void *genl_info_userhdr(const struct genl_info *info)
+{
+   return (u8 *)info->genlhdr + GENL_HDRLEN;
+}
+
 #define GENL_SET_ERR_MSG(info, msg) NL_SET_ERR_MSG((info)->extack, msg)
 
 #define GENL_SET_ERR_MSG_FMT(info, msg, args...) \
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index 0d4285688ab9..f98f730bb245 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -943,7 +943,6 @@ static int genl_family_rcv_msg_doit(const struct 
genl_family *family,
info.snd_portid = NETLINK_CB(skb).portid;
info.nlhdr = nlh;
info.genlhdr = nlmsg_data(nlh);
-   info.userhdr = nlmsg_data(nlh) + GENL_HDRLEN;
info.attrs = attrbuf;
info.extack = extack;
genl_info_net_set(, net);
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 0cfa1e9482e6..0b9a785dea45 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -1605,7 +1605,7 @@ static struct sk_buff *
 ovs_ct_limit_cmd_reply_start(struct genl_info *info, u8 cmd,
 struct ovs_header **ovs_reply_header)
 {
-   struct ovs_header *ovs_header = info->userhdr;
+   struct ovs_header *ovs_header = genl_info_userhdr(info);
struct sk_buff *skb;
 
skb = genlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index d33cb739883f..0a974eeef76e 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -590,7

[ovs-dev] [PATCH net] net: openvswitch: reject negative ifindex

2023-08-14 Thread Jakub Kicinski

Recent changes in net-next (commit 759ab1edb56c ("net: store netdevs
in an xarray")) refactored the handling of pre-assigned ifindexes
and let syzbot surface a latent problem in ovs. ovs does not validate
ifindex, making it possible to create netdev ports with negative
ifindex values. It's easy to repro with YNL:

$ ./cli.py --spec netlink/specs/ovs_datapath.yaml \
 --do new \
 --json '{"upcall-pid": 1, "name":"my-dp"}'
$ ./cli.py --spec netlink/specs/ovs_vport.yaml \
 --do new \
 --json '{"upcall-pid": "0001", "name": "some-port0", 
"dp-ifindex":3,"ifindex":4294901760,"type":2}'

$ ip link show
-65536: some-port0:  mtu 1500 qdisc noop state DOWN mode 
DEFAULT group default qlen 1000
link/ether 7a:48:21:ad:0b:fb brd ff:ff:ff:ff:ff:ff
...

Validate the inputes. Now the second command correctly returns:

$ ./cli.py --spec netlink/specs/ovs_vport.yaml \
 --do new \
 --json '{"upcall-pid": "0001", "name": "some-port0", 
"dp-ifindex":3,"ifindex":4294901760,"type":2}'

lib.ynl.NlError: Netlink error: Numerical result out of range
nl_len = 108 (92) nl_flags = 0x300 nl_type = 2
error: -34  extack: {'msg': 'integer out of range', 'unknown': 
[[type:4 len:36] 
b'\x0c\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x03\x00\xff\xff\xff\x7f\x00\x00\x00\x00\x08\x00\x01\x00\x08\x00\x00\x00'],
 'bad-attr': '.ifindex'}

Accept 0 since it used to be silently ignored.

Fixes: 54c4ef34c4b6 ("openvswitch: allow specifying ifindex of new interfaces")
Reported-by: syzbot+7456b5dcf65111553...@syzkaller.appspotmail.com
Signed-off-by: Jakub Kicinski 
---
CC: pshe...@ovn.org
CC: andrey.zhadche...@virtuozzo.com
CC: brau...@kernel.org
CC: d...@openvswitch.org
---
 net/openvswitch/datapath.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index a6d2a0b1aa21..3d7a91e64c88 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -1829,7 +1829,7 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct 
genl_info *info)
parms.port_no = OVSP_LOCAL;
parms.upcall_portids = a[OVS_DP_ATTR_UPCALL_PID];
parms.desired_ifindex = a[OVS_DP_ATTR_IFINDEX]
-   ? nla_get_u32(a[OVS_DP_ATTR_IFINDEX]) : 0;
+   ? nla_get_s32(a[OVS_DP_ATTR_IFINDEX]) : 0;
 
/* So far only local changes have been made, now need the lock. */
ovs_lock();
@@ -2049,7 +2049,7 @@ static const struct nla_policy 
datapath_policy[OVS_DP_ATTR_MAX + 1] = {
[OVS_DP_ATTR_USER_FEATURES] = { .type = NLA_U32 },
[OVS_DP_ATTR_MASKS_CACHE_SIZE] =  NLA_POLICY_RANGE(NLA_U32, 0,
PCPU_MIN_UNIT_SIZE / sizeof(struct mask_cache_entry)),
-   [OVS_DP_ATTR_IFINDEX] = {.type = NLA_U32 },
+   [OVS_DP_ATTR_IFINDEX] = NLA_POLICY_MIN(NLA_S32, 0),
 };
 
 static const struct genl_small_ops dp_datapath_genl_ops[] = {
@@ -2302,7 +2302,7 @@ static int ovs_vport_cmd_new(struct sk_buff *skb, struct 
genl_info *info)
parms.port_no = port_no;
parms.upcall_portids = a[OVS_VPORT_ATTR_UPCALL_PID];
parms.desired_ifindex = a[OVS_VPORT_ATTR_IFINDEX]
-   ? nla_get_u32(a[OVS_VPORT_ATTR_IFINDEX]) : 0;
+   ? nla_get_s32(a[OVS_VPORT_ATTR_IFINDEX]) : 0;
 
vport = new_vport();
err = PTR_ERR(vport);
@@ -2539,7 +2539,7 @@ static const struct nla_policy 
vport_policy[OVS_VPORT_ATTR_MAX + 1] = {
[OVS_VPORT_ATTR_TYPE] = { .type = NLA_U32 },
[OVS_VPORT_ATTR_UPCALL_PID] = { .type = NLA_UNSPEC },
[OVS_VPORT_ATTR_OPTIONS] = { .type = NLA_NESTED },
-   [OVS_VPORT_ATTR_IFINDEX] = { .type = NLA_U32 },
+   [OVS_VPORT_ATTR_IFINDEX] = NLA_POLICY_MIN(NLA_S32, 0),
[OVS_VPORT_ATTR_NETNSID] = { .type = NLA_S32 },
[OVS_VPORT_ATTR_UPCALL_STATS] = { .type = NLA_NESTED },
 };
-- 
2.41.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [net-next v4 1/7] net: openvswitch: add last-action drop reason

2023-08-10 Thread Jakub Kicinski

On Thu, 10 Aug 2023 14:13:29 -0400 Aaron Conole wrote:
> I think they can be resolved in the same way the mac80211 drops are
> resolved by using (__force u32) to pass the reason argument.

Yup, preferably by creating a helper which takes enum ovs_drop_reason
and does the forcing, rather than annotating each call site, to state
the obvious perhaps.
-- 
pw-bot: cr
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH net-next v2 03/10] genetlink: remove userhdr from struct genl_info

2023-08-10 Thread Jakub Kicinski

Only three families use info->userhdr today and going forward
we discourage using fixed headers in new families.
So having the pointer to user header in struct genl_info
is an overkill. Compute the header pointer at runtime.

Reviewed-by: Johannes Berg 
Reviewed-by: Jiri Pirko 
Signed-off-by: Jakub Kicinski 
---
CC: philipp.reis...@linbit.com
CC: lars.ellenb...@linbit.com
CC: christoph.boehmwal...@linbit.com
CC: ax...@kernel.dk
CC: pshe...@ovn.org
CC: jma...@redhat.com
CC: ying@windriver.com
CC: jacob.e.kel...@intel.com
CC: drbd-...@lists.linbit.com
CC: linux-bl...@vger.kernel.org
CC: d...@openvswitch.org
CC: tipc-discuss...@lists.sourceforge.net
---
 drivers/block/drbd/drbd_nl.c |  9 +
 include/net/genetlink.h  |  7 +--
 net/netlink/genetlink.c  |  1 -
 net/openvswitch/conntrack.c  |  2 +-
 net/openvswitch/datapath.c   | 29 -
 net/openvswitch/meter.c  | 10 +-
 net/tipc/netlink_compat.c|  2 +-
 7 files changed, 33 insertions(+), 27 deletions(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index cddae6f4b00f..d3538bd83fb3 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -159,7 +159,7 @@ static int drbd_msg_sprintf_info(struct sk_buff *skb, const 
char *fmt, ...)
 static int drbd_adm_prepare(struct drbd_config_context *adm_ctx,
struct sk_buff *skb, struct genl_info *info, unsigned flags)
 {
-   struct drbd_genlmsghdr *d_in = info->userhdr;
+   struct drbd_genlmsghdr *d_in = genl_info_userhdr(info);
const u8 cmd = info->genlhdr->cmd;
int err;
 
@@ -1396,8 +1396,9 @@ static void drbd_suspend_al(struct drbd_device *device)
 
 static bool should_set_defaults(struct genl_info *info)
 {
-   unsigned flags = ((struct drbd_genlmsghdr*)info->userhdr)->flags;
-   return 0 != (flags & DRBD_GENL_F_SET_DEFAULTS);
+   struct drbd_genlmsghdr *dh = genl_info_userhdr(info);
+
+   return 0 != (dh->flags & DRBD_GENL_F_SET_DEFAULTS);
 }
 
 static unsigned int drbd_al_extents_max(struct drbd_backing_dev *bdev)
@@ -4276,7 +4277,7 @@ static void device_to_info(struct device_info *info,
 int drbd_adm_new_minor(struct sk_buff *skb, struct genl_info *info)
 {
struct drbd_config_context adm_ctx;
-   struct drbd_genlmsghdr *dh = info->userhdr;
+   struct drbd_genlmsghdr *dh = genl_info_userhdr(info);
enum drbd_ret_code retcode;
 
retcode = drbd_adm_prepare(_ctx, skb, info, DRBD_ADM_NEED_RESOURCE);
diff --git a/include/net/genetlink.h b/include/net/genetlink.h
index 0366d0925596..9dc21ec15734 100644
--- a/include/net/genetlink.h
+++ b/include/net/genetlink.h
@@ -95,7 +95,6 @@ struct genl_family {
  * @snd_portid: netlink portid of sender
  * @nlhdr: netlink message header
  * @genlhdr: generic netlink message header
- * @userhdr: user specific header
  * @attrs: netlink attributes
  * @_net: network namespace
  * @user_ptr: user pointers
@@ -106,7 +105,6 @@ struct genl_info {
u32 snd_portid;
const struct nlmsghdr * nlhdr;
struct genlmsghdr * genlhdr;
-   void *  userhdr;
struct nlattr **attrs;
possible_net_t  _net;
void *  user_ptr[2];
@@ -123,6 +121,11 @@ static inline void genl_info_net_set(struct genl_info 
*info, struct net *net)
write_pnet(>_net, net);
 }
 
+static inline void *genl_info_userhdr(const struct genl_info *info)
+{
+   return (u8 *)info->genlhdr + GENL_HDRLEN;
+}
+
 #define GENL_SET_ERR_MSG(info, msg) NL_SET_ERR_MSG((info)->extack, msg)
 
 #define GENL_SET_ERR_MSG_FMT(info, msg, args...) \
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index 0d4285688ab9..f98f730bb245 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -943,7 +943,6 @@ static int genl_family_rcv_msg_doit(const struct 
genl_family *family,
info.snd_portid = NETLINK_CB(skb).portid;
info.nlhdr = nlh;
info.genlhdr = nlmsg_data(nlh);
-   info.userhdr = nlmsg_data(nlh) + GENL_HDRLEN;
info.attrs = attrbuf;
info.extack = extack;
genl_info_net_set(, net);
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index fa955e892210..03a6220b7d99 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -1604,7 +1604,7 @@ static struct sk_buff *
 ovs_ct_limit_cmd_reply_start(struct genl_info *info, u8 cmd,
 struct ovs_header **ovs_reply_header)
 {
-   struct ovs_header *ovs_header = info->userhdr;
+   struct ovs_header *ovs_header = genl_info_userhdr(info);
struct sk_buff *skb;
 
skb = genlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index a6d2a0b1aa21..80f5f755b0e6 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -589,7

Re: [ovs-dev] [PATCH net-next 03/10] genetlink: remove userhdr from struct genl_info

2023-08-09 Thread Jakub Kicinski

On Wed, 09 Aug 2023 22:59:47 +0200 Johannes Berg wrote:
> On Wed, 2023-08-09 at 11:26 -0700, Jakub Kicinski wrote:
> > Only three families use info->userhdr and fixed headers
> > are discouraged for new families. So remove the pointer
> > from struct genl_info to save some space. Compute
> > the header pointer at runtime. Saved space will be used
> > for a family pointer in later patches.  
> 
> Seems fine to me, but I'm not sure I buy the rationale that it's for
> saving space - it's a single pointer on the stack? I'd probably argue
> the computation being pointless for basically everyone except for a
> handful users?

Fair, I'll update all the commit messages.

> Reviewed-by: Johannes Berg 

Thanks!
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH net-next 03/10] genetlink: remove userhdr from struct genl_info

2023-08-09 Thread Jakub Kicinski

Only three families use info->userhdr and fixed headers
are discouraged for new families. So remove the pointer
from struct genl_info to save some space. Compute
the header pointer at runtime. Saved space will be used
for a family pointer in later patches.

Signed-off-by: Jakub Kicinski 
---
CC: philipp.reis...@linbit.com
CC: lars.ellenb...@linbit.com
CC: christoph.boehmwal...@linbit.com
CC: ax...@kernel.dk
CC: pshe...@ovn.org
CC: jma...@redhat.com
CC: ying@windriver.com
CC: jacob.e.kel...@intel.com
CC: drbd-...@lists.linbit.com
CC: linux-bl...@vger.kernel.org
CC: d...@openvswitch.org
CC: tipc-discuss...@lists.sourceforge.net
---
 drivers/block/drbd/drbd_nl.c |  9 +
 include/net/genetlink.h  |  7 +--
 net/netlink/genetlink.c  |  1 -
 net/openvswitch/conntrack.c  |  2 +-
 net/openvswitch/datapath.c   | 29 -
 net/openvswitch/meter.c  | 10 +-
 net/tipc/netlink_compat.c|  2 +-
 7 files changed, 33 insertions(+), 27 deletions(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index cddae6f4b00f..d3538bd83fb3 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -159,7 +159,7 @@ static int drbd_msg_sprintf_info(struct sk_buff *skb, const 
char *fmt, ...)
 static int drbd_adm_prepare(struct drbd_config_context *adm_ctx,
struct sk_buff *skb, struct genl_info *info, unsigned flags)
 {
-   struct drbd_genlmsghdr *d_in = info->userhdr;
+   struct drbd_genlmsghdr *d_in = genl_info_userhdr(info);
const u8 cmd = info->genlhdr->cmd;
int err;
 
@@ -1396,8 +1396,9 @@ static void drbd_suspend_al(struct drbd_device *device)
 
 static bool should_set_defaults(struct genl_info *info)
 {
-   unsigned flags = ((struct drbd_genlmsghdr*)info->userhdr)->flags;
-   return 0 != (flags & DRBD_GENL_F_SET_DEFAULTS);
+   struct drbd_genlmsghdr *dh = genl_info_userhdr(info);
+
+   return 0 != (dh->flags & DRBD_GENL_F_SET_DEFAULTS);
 }
 
 static unsigned int drbd_al_extents_max(struct drbd_backing_dev *bdev)
@@ -4276,7 +4277,7 @@ static void device_to_info(struct device_info *info,
 int drbd_adm_new_minor(struct sk_buff *skb, struct genl_info *info)
 {
struct drbd_config_context adm_ctx;
-   struct drbd_genlmsghdr *dh = info->userhdr;
+   struct drbd_genlmsghdr *dh = genl_info_userhdr(info);
enum drbd_ret_code retcode;
 
retcode = drbd_adm_prepare(_ctx, skb, info, DRBD_ADM_NEED_RESOURCE);
diff --git a/include/net/genetlink.h b/include/net/genetlink.h
index 0366d0925596..9dc21ec15734 100644
--- a/include/net/genetlink.h
+++ b/include/net/genetlink.h
@@ -95,7 +95,6 @@ struct genl_family {
  * @snd_portid: netlink portid of sender
  * @nlhdr: netlink message header
  * @genlhdr: generic netlink message header
- * @userhdr: user specific header
  * @attrs: netlink attributes
  * @_net: network namespace
  * @user_ptr: user pointers
@@ -106,7 +105,6 @@ struct genl_info {
u32 snd_portid;
const struct nlmsghdr * nlhdr;
struct genlmsghdr * genlhdr;
-   void *  userhdr;
struct nlattr **attrs;
possible_net_t  _net;
void *  user_ptr[2];
@@ -123,6 +121,11 @@ static inline void genl_info_net_set(struct genl_info 
*info, struct net *net)
write_pnet(>_net, net);
 }
 
+static inline void *genl_info_userhdr(const struct genl_info *info)
+{
+   return (u8 *)info->genlhdr + GENL_HDRLEN;
+}
+
 #define GENL_SET_ERR_MSG(info, msg) NL_SET_ERR_MSG((info)->extack, msg)
 
 #define GENL_SET_ERR_MSG_FMT(info, msg, args...) \
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index 0d4285688ab9..f98f730bb245 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -943,7 +943,6 @@ static int genl_family_rcv_msg_doit(const struct 
genl_family *family,
info.snd_portid = NETLINK_CB(skb).portid;
info.nlhdr = nlh;
info.genlhdr = nlmsg_data(nlh);
-   info.userhdr = nlmsg_data(nlh) + GENL_HDRLEN;
info.attrs = attrbuf;
info.extack = extack;
genl_info_net_set(, net);
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index fa955e892210..03a6220b7d99 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -1604,7 +1604,7 @@ static struct sk_buff *
 ovs_ct_limit_cmd_reply_start(struct genl_info *info, u8 cmd,
 struct ovs_header **ovs_reply_header)
 {
-   struct ovs_header *ovs_header = info->userhdr;
+   struct ovs_header *ovs_header = genl_info_userhdr(info);
struct sk_buff *skb;
 
skb = genlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index a6d2a0b1aa21..80f5f755b0e6 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -589,7 +589,7 @@ static in

Re: [ovs-dev] [PATCH net-next 0/3] net: handle the exp removal problem with ovs upcall properly

2023-07-18 Thread Jakub Kicinski

On Sun, 16 Jul 2023 17:09:16 -0400 Xin Long wrote:
> With the OVS upcall, the original ct in the skb will be dropped, and when
> the skb comes back from userspace it has to create a new ct again through
> nf_conntrack_in() in either OVS __ovs_ct_lookup() or TC tcf_ct_act().
> 
> However, the new ct will not be able to have the exp as the original ct
> has taken it away from the hash table in nf_ct_find_expectation(). This
> will cause some flow never to be matched, like:
> 
>   'ip,ct_state=-trk,in_port=1 actions=ct(zone=1)'
>   'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=1)'
>   'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=2),normal'
> 
> if the 2nd flow triggers the OVS upcall, the 3rd flow will never get
> matched.
> 
> OVS conntrack works around this by adding its own exp lookup function to
> not remove the exp from the hash table and saving the exp and its master
> info to the flow keys instead of create a real ct. But this way doesn't
> work for TC act_ct.
> 
> The patch 1/3 allows nf_ct_find_expectation() not to remove the exp from
> the hash table if tmpl is set with IPS_CONFIRMED when doing lookup. This
> allows both OVS conntrack and TC act_ct to have a simple and clear fix
> for this problem in the patch 2/3 and 3/3.

Florian, Pablo, any opinion on these? 
Would you prefer to take them via netfilter?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net-next 2/2] net: openvswitch: add drop action

2023-07-10 Thread Jakub Kicinski

On Mon, 10 Jul 2023 20:39:11 +0200 Ilya Maximets wrote:
> > As far as I understand what you're proposing, yes :)  
> 
> OK.  Just to spell it all out:
> 
> Userspace will install a flow with an OVS_FLOW_CMD_NEW:
> 
>   match:ip,tcp,... actions:something,something,drop(0)
>   match:ip,udp,... actions:something,something,drop(42)
> 
> drop() here represents the OVS_ACTION_ATTR_DROP.
> 
> Then, in net/openvswitch/actions.c:do_execute_actions(), while executing
> these actions:
> 
>   case OVS_ACTION_ATTR_DROP:
>   kfree_skb_reason(skb, nla_get_u32(a) ? OVS_DROP_ACTION_WITH_ERROR
>: OVS_DROP_ACTION);
> 
> Users can enable traces and catch the OVS_DROP_ACTION_WITH_ERROR.
> Later they can dump flows with OVS_FLOW_CMD_GET and see that the
> error value was 42.

nod

> >> Eric, Adrian, Aaron, do you see any problems with such implementation?
> >>
> >> P.S. There is a plan to add more drop reasons for other places in 
> >> openvswitch
> >>  module to catch more regular types of drops like memory issues or 
> >> upcall
> >>  failures.  So, the drop reason subsystem can be extended later.
> >>  The explicit drop action is a bit of an odd case here.  
> > 
> > If you have more than ~4 OvS specific reasons, I wonder if it still
> > makes sense to create a reason group/subsystem for OvS (a'la WiFi)?  
> 
> I believe, we will easily have more than 4 OVS-specific reasons.  A few
> from the top of my head:
>   - upcall failure (failed to send a packet to userspace)
>   - reached the limit for deferred actions
>   - reached the recursion limit
> 
> So, creation of a reason group/subsystem seems reasonable to me.

SG.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net-next 2/2] net: openvswitch: add drop action

2023-07-10 Thread Jakub Kicinski

On Mon, 10 Jul 2023 18:51:19 +0200 Ilya Maximets wrote:
> Makes sense.  I wasn't sure that's a good solution from a kernel perspective
> either.  It's better than defining all these reasons, IMO, but it's not good
> enough to be considered acceptable, I agree.
> 
> How about we define just 2 reasons, e.g. OVS_DROP_REASON_EXPLICIT_ACTION and
> OVS_DROP_REASON_EXPLICIT_ACTION_WITH_ERROR (exact names can be different) ?
> One for an explicit drop action with a zero argument and one for an explicit
> drop with non-zero argument.
> 
> The exact reason for the error can be retrieved by other means, i.e by looking
> at the datapath flow dump or OVS logs/traces.
> 
> This way we can give a user who is catching packet drop traces a signal that
> there was something wrong with an OVS flow and they can look up exact details
> from the userspace / flow dump.
> 
> The point being, most of the flows will have a zero as a drop action argument,
> i.e. a regular explicit packet drop.  It will be hard to figure out which flow
> exactly we're hitting without looking at the full flow dump.  And if the value
> is non-zero, then it should be immediately obvious which flow is to blame from
> the dump, as we should not have a lot of such flows.
> 
> This would still allow us to avoid a maintenance burden of defining every 
> case,
> which are fairly meaningless for the kernel itself, while having 99% of the
> information we may need.
> 
> Jakub, do you think this will be acceptable?

As far as I understand what you're proposing, yes :)

> Eric, Adrian, Aaron, do you see any problems with such implementation?
> 
> P.S. There is a plan to add more drop reasons for other places in openvswitch
>  module to catch more regular types of drops like memory issues or upcall
>  failures.  So, the drop reason subsystem can be extended later.
>  The explicit drop action is a bit of an odd case here.

If you have more than ~4 OvS specific reasons, I wonder if it still
makes sense to create a reason group/subsystem for OvS (a'la WiFi)?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net-next 2/2] net: openvswitch: add drop action

2023-07-07 Thread Jakub Kicinski

On Fri, 7 Jul 2023 18:04:36 +0200 Ilya Maximets wrote:
> >> That already exists, right? Johannes added it in the last release for 
> >> WiFi.  
> > 
> > I'm not sure.  The SKB_DROP_REASON_SUBSYS_MAC80211_UNUSABLE behaves 
> > similarly
> > to that on a surface.  However, looking closer, any value that can be passed
> > into ieee80211_rx_handlers_result() and ends up in the kfree_skb_reason() is
> > kind of defined in net/mac80211/drop.h, unless I'm missing something (very
> > possible, because I don't really know wifi code).
> > 
> > The difference, I guess, is that for openvswitch values will be provided by
> > the userpsace application via netlink interface.  It'll be just a number not
> > defined anywhere in the kernel.  Only the subsystem itself will be defined
> > in order to occupy the range.  Garbage in, same garbage out, from the 
> > kernel's
> > perspective.  
> 
> To be clear, I think, not defining them in this particular case is better.
> Definition of every reason that userspace can come up with will add extra
> uAPI maintenance cost/issues with no practical benefits.  Values are not
> going to be used for anything outside reporting a drop reason and subsystem
> offset is not part of uAPI anyway.

Ah, I see. No, please don't stuff user space defined values into 
the drop reason. The reasons are for debugging the kernel stack 
itself. IOW it'd be abuse not reuse.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net-next 2/2] net: openvswitch: add drop action

2023-07-07 Thread Jakub Kicinski

On Fri, 7 Jul 2023 12:30:38 +0200 Ilya Maximets wrote:
> A wild idea:  How about we do not define actual reasons?  i.e. define a
> subsystem and just call kfree_skb_reason(skb, SUBSYSTEM | value), where
> 'value' is whatever userspace gives as long as it is within a subsystem
> range?

That already exists, right? Johannes added it in the last release for WiFi.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net v3] net: openvswitch: fix race on port output

2023-04-06 Thread Jakub Kicinski

On Wed, 5 Apr 2023 07:53:41 + Felix Huettner wrote:
> assume the following setup on a single machine:
> 1. An openvswitch instance with one bridge and default flows
> 2. two network namespaces "server" and "client"
> 3. two ovs interfaces "server" and "client" on the bridge
> 4. for each ovs interface a veth pair with a matching name and 32 rx and
>tx queues
> 5. move the ends of the veth pairs to the respective network namespaces
> 6. assign ip addresses to each of the veth ends in the namespaces (needs
>to be the same subnet)
> 7. start some http server on the server network namespace
> 8. test if a client in the client namespace can reach the http server

Hi Simon, looks good?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] net: openvswitch: fix race on port output

2023-04-03 Thread Jakub Kicinski

On Mon, 3 Apr 2023 11:18:46 + Felix Hüttner wrote:
> On Sat, 1 Apr 2023 6:25:00 +0000 Jakub Kicinski wrote:
> > On Fri, 31 Mar 2023 06:25:13 + Felix Hüttner wrote:  
> > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > index 253584777101..6628323b7bea 100644
> > > --- a/net/core/dev.c
> > > +++ b/net/core/dev.c
> > > @@ -3199,6 +3199,7 @@ static u16 skb_tx_hash(const struct net_device *dev,
> > > }
> > >
> > > if (skb_rx_queue_recorded(skb)) {
> > > +   BUG_ON(unlikely(qcount == 0));  
> >
> > DEBUG_NET_WARN_ON()
> >  
> 
> However if this condition triggers we will be permanently stuck in the loop 
> below.
> From my understading this also means that future calls to `synchronize_net` 
> will never finish (as the packet never finishes processing).
> So the user will quite probably need to restart his system.
> I find DEBUG_NET_WARN_ON_ONCE to offer too little visiblity as 
> CONFIG_DEBUG_NET is not necessarily enabled per default.
> I as the user would see it as helpful to have this information available 
> without additional config flags.
> I would propose to use WARN_ON_ONCE

skb_tx_hash() may get called a lot, we shouldn't slow it down on
production systems just to catch buggy drivers, IMO.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] net: openvswitch: fix race on port output

2023-03-31 Thread Jakub Kicinski

On Fri, 31 Mar 2023 08:19:09 + Felix Huettner wrote:
> 1. An openvswitch instance with one bridge and default flows
> 2. two network namespaces "server" and "client"
> 3. two ovs interfaces "server" and "client" on the bridge
> 4. for each ovs interface a veth pair with a matching name and 32 rx and
>tx queues
> 5. move the ends of the veth pairs to the respective network namespaces
> 6. assign ip addresses to each of the veth ends in the namespaces (needs
>to be the same subnet)
> 7. start some http server on the server network namespace
> 8. test if a client in the client namespace can reach the http server

grunt.. please read:
https://www.kernel.org/doc/html/next/process/maintainer-netdev.html
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] net: openvswitch: fix race on port output

2023-03-31 Thread Jakub Kicinski

On Fri, 31 Mar 2023 06:25:13 + Felix Hüttner wrote:
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 253584777101..6628323b7bea 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3199,6 +3199,7 @@ static u16 skb_tx_hash(const struct net_device *dev,
> }
> 
> if (skb_rx_queue_recorded(skb)) {
> +   BUG_ON(unlikely(qcount == 0));

DEBUG_NET_WARN_ON()

> hash = skb_get_rx_queue(skb);
> if (hash >= qoffset)
> hash -= qoffset;
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index ca3ebfdb3023..33b317e5f9a5 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -913,7 +913,7 @@ static void do_output(struct datapath *dp, struct sk_buff 
> *skb, int out_port,
>  {
> struct vport *vport = ovs_vport_rcu(dp, out_port);
> 
> -   if (likely(vport)) {
> +   if (likely(vport && vport->dev->reg_state == NETREG_REGISTERED)) {

Without looking too closely netif_carrier_ok() seems like a more
appropriate check for liveness on the datapath?

> u16 mru = OVS_CB(skb)->mru;
> u32 cutlen = OVS_CB(skb)->cutlen;
> 
> --
> 2.40.0
> 
> Diese E Mail enthält möglicherweise vertrauliche Inhalte und ist nur für die 
> Verwertung durch den vorgesehenen Empfänger bestimmt. Sollten Sie nicht der 
> vorgesehene Empfänger sein, setzen Sie den Absender bitte unverzüglich in 
> Kenntnis und löschen diese E Mail. Hinweise zum Datenschutz finden Sie 
> hier.

You gotta get rid of this to work upstream.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCHv2 net-next 0/5] net: move more duplicate code of ovs and tc conntrack into nf_conntrack_ovs

2023-02-09 Thread Jakub Kicinski

On Tue,  7 Feb 2023 17:52:05 -0500 Xin Long wrote:
> We've moved some duplicate code into nf_nat_ovs in:
> 
>   "net: eliminate the duplicate code in the ct nat functions of ovs and tc"
> 
> This patchset addresses more code duplication in the conntrack of ovs
> and tc then creates nf_conntrack_ovs for them, and four functions will
> be extracted and moved into it:
> 
>   nf_ct_handle_fragments()
>   nf_ct_skb_network_trim()
>   nf_ct_helper()
>   nf_ct_add_helper()

Hi Pablo, do you prefer to take this or should we?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net-next v2 1/3] string_helpers: Move string_is_valid() to the header

2023-02-07 Thread Jakub Kicinski

On Mon,  6 Feb 2023 18:13:12 +0200 Andy Shevchenko wrote:
> +static inline bool string_is_valid(const char *s, int len)
> +{
> + return memchr(s, '\0', len) ? true : false;
> +}

I was tempted to suggest adding a kdoc, but perhaps the function
doesn't have an obvious enough name? Maybe we should call the helper
string_is_terminated(), instead?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] 回复: [PATCH v2 1/1] net: openvswitch: reduce cpu_used_mask memory

2023-02-01 Thread Jakub Kicinski

On Wed, 1 Feb 2023 21:35:15 -0800 Jakub Kicinski wrote:
> On Thu, 2 Feb 2023 05:09:51 + 陶 缘 wrote:
> > I guest you are pointing to the field "From: taoyuan_e...@hotmail.com" in 
> > the patch header linked from "Headers show" section in the patch page
> > 
> > 
> > https://patchwork.kernel.org/project/netdevbpf/patch/os3p286mb22954422e3dd09ff5fd6b091f5...@os3p286mb2295.jpnp286.prod.outlook.com/
> > 
> > I will fix that accordingly.  
> 
> The From is correct, please look at the entire email Jiri also
> commented about the code.
> 
> Two more notes:
>  - don't top post on the list
>  - reply in plain text not HTML

One more thing, please make sure you read the process information 
for Linux networking:

https://www.kernel.org/doc/html/next/process/maintainer-netdev.html
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] 回复: [PATCH v2 1/1] net: openvswitch: reduce cpu_used_mask memory

2023-02-01 Thread Jakub Kicinski

On Thu, 2 Feb 2023 05:09:51 + 陶 缘 wrote:
> I guest you are pointing to the field "From: taoyuan_e...@hotmail.com" in the 
> patch header linked from "Headers show" section in the patch page
> 
> 
> https://patchwork.kernel.org/project/netdevbpf/patch/os3p286mb22954422e3dd09ff5fd6b091f5...@os3p286mb2295.jpnp286.prod.outlook.com/
> 
> I will fix that accordingly.

The From is correct, please look at the entire email Jiri also
commented about the code.

Two more notes:
 - don't top post on the list
 - reply in plain text not HTML
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] scripts/spelling.txt: add "exsits" pattern and fix typo instances

2023-01-26 Thread Jakub Kicinski

On Thu, 26 Jan 2023 16:22:05 +0100 Luca Ceresoli wrote:
> Fix typos and add the following to the scripts/spelling.txt:
> 
>   exsits||exists
> 
> Signed-off-by: Luca Ceresoli 

You need to split this up per subsystem, I reckon :(
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] [PATCH v6 net-next] net: openvswitch: Add support to count upcall packets

2022-11-30 Thread Jakub Kicinski

On Wed, 30 Nov 2022 04:15:59 -0500 wangchuanlei wrote:
> +/**
> + *   ovs_vport_get_upcall_stats - retrieve upcall stats
> + *
> + * @vport: vport from which to retrieve the stats
> + * @ovs_vport_upcall_stats: location to store stats

s/ovs_vport_upcall_//

> + *
> + * Retrieves upcall stats for the given device.
> + *
> + * Must be called with ovs_mutex or rcu_read_lock.
> + */
> +void ovs_vport_get_upcall_stats(struct vport *vport, struct 
> ovs_vport_upcall_stats *stats)
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH net] net: openvswitch: add missing .resv_start_op

2022-10-27 Thread Jakub Kicinski

I missed one of the families in OvS when annotating .resv_start_op.
This triggers the warning added in commit ce48ebdd5651 ("genetlink:
limit the use of validation workarounds to old ops").

Reported-by: syzbot+40eb8c0447c0e47a7...@syzkaller.appspotmail.com
Fixes: 9c5d03d36251 ("genetlink: start to validate reserved header bytes")
Signed-off-by: Jakub Kicinski 
---
CC: pshe...@ovn.org
CC: p...@paul-moore.com
CC: d...@openvswitch.org
---
 net/openvswitch/datapath.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 155263e73512..8b84869eb2ac 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -2544,6 +2544,7 @@ struct genl_family dp_vport_genl_family __ro_after_init = 
{
.parallel_ops = true,
.small_ops = dp_vport_genl_ops,
.n_small_ops = ARRAY_SIZE(dp_vport_genl_ops),
+   .resv_start_op = OVS_VPORT_CMD_SET + 1,
.mcgrps = _dp_vport_multicast_group,
.n_mcgrps = 1,
.module = THIS_MODULE,
-- 
2.37.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH 02/12] skbuff: Proactively round up to kmalloc bucket size

2022-09-22 Thread Jakub Kicinski

On Wed, 21 Sep 2022 20:10:03 -0700 Kees Cook wrote:
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 974e7138..4fe4c7544c1d 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -427,14 +427,15 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t 
> gfp_mask,
>*/
>   size = SKB_DATA_ALIGN(size);
>   size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> - data = kmalloc_reserve(size, gfp_mask, node, );
> - if (unlikely(!data))
> - goto nodata;
> - /* kmalloc(size) might give us more room than requested.
> + /* kmalloc(size) might give us more room than requested, so
> +  * allocate the true bucket size up front.
>* Put skb_shared_info exactly at the end of allocated zone,
>* to allow max possible filling before reallocation.
>*/
> - osize = ksize(data);
> + osize = kmalloc_size_roundup(size);
> + data = kmalloc_reserve(osize, gfp_mask, node, );
> + if (unlikely(!data))
> + goto nodata;
>   size = SKB_WITH_OVERHEAD(osize);
>   prefetchw(data + size);

I'd rename osize here to alloc_size for consistency but one could 
argue either way :)

Acked-by: Jakub Kicinski 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v2 net 2/2] net: openvswitch: allow conntrack in non-initial user namespace

2022-09-22 Thread Jakub Kicinski

On Wed, 21 Sep 2022 03:19:46 +0200 Michael Weiß wrote:
> Similar to the previous commit, the Netlink interface of the OVS
> conntrack module was restricted to global CAP_NET_ADMIN by using
> GENL_ADMIN_PERM. This is changed to GENL_UNS_ADMIN_PERM to support
> unprivileged containers in non-initial user namespace.

Should we bump 

  ct_limit = kmalloc(sizeof(*ct_limit), GFP_KERNEL);

to also being accounted?

Otherwise LGTM, please repost with [PATCH net-next v3] in the subject.
net is for fixes only, and we're quite late in the -rc process.

Please try to CC the original authors as well, for Joe the address
will be Joe Stringer .

> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> index 4e70df91d0f2..9142ba322991 100644
> --- a/net/openvswitch/conntrack.c
> +++ b/net/openvswitch/conntrack.c
> @@ -2252,14 +2252,16 @@ static int ovs_ct_limit_cmd_get(struct sk_buff *skb, 
> struct genl_info *info)
>  static const struct genl_small_ops ct_limit_genl_ops[] = {
>   { .cmd = OVS_CT_LIMIT_CMD_SET,
>   .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
> - .flags = GENL_ADMIN_PERM, /* Requires CAP_NET_ADMIN
> -* privilege. */
> + .flags = GENL_UNS_ADMIN_PERM, /* Requires CAP_NET_ADMIN
> +* privilege.
> +*/
>   .doit = ovs_ct_limit_cmd_set,
>   },
>   { .cmd = OVS_CT_LIMIT_CMD_DEL,
>   .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
> - .flags = GENL_ADMIN_PERM, /* Requires CAP_NET_ADMIN
> -* privilege. */
> + .flags = GENL_UNS_ADMIN_PERM, /* Requires CAP_NET_ADMIN
> +* privilege.
> +*/
>   .doit = ovs_ct_limit_cmd_del,
>   },
>   { .cmd = OVS_CT_LIMIT_CMD_GET,

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH net-next] genetlink: start to validate reserved header bytes

2022-08-24 Thread Jakub Kicinski

We had historically not checked that genlmsghdr.reserved
is 0 on input which prevents us from using those precious
bytes in the future.

One use case would be to extend the cmd field, which is
currently just 8 bits wide and 256 is not a lot of commands
for some core families.

To make sure that new families do the right thing by default
put the onus of opting out of validation on existing families.

Signed-off-by: Jakub Kicinski 
---
CC: j...@resnulli.us
CC: johan...@sipsolutions.net
CC: linux-bl...@vger.kernel.org
CC: osmocom-net-g...@lists.osmocom.org
CC: linux-w...@vger.kernel.org
CC: wiregu...@lists.zx2c4.com
CC: linux-wirel...@vger.kernel.org
CC: linux-s...@vger.kernel.org
CC: target-de...@vger.kernel.org
CC: linux...@vger.kernel.org
CC: virtualizat...@lists.linux-foundation.org
CC: linux-c...@vger.kernel.org
CC: cluster-de...@redhat.com
CC: mp...@lists.linux.dev
CC: lvs-de...@vger.kernel.org
CC: netfilter-de...@vger.kernel.org
CC: linux-security-mod...@vger.kernel.org
CC: d...@openvswitch.org
CC: linux-s...@vger.kernel.org
CC: tipc-discuss...@lists.sourceforge.net
---
 drivers/block/nbd.c  | 1 +
 drivers/net/gtp.c| 1 +
 drivers/net/ieee802154/mac802154_hwsim.c | 1 +
 drivers/net/macsec.c | 1 +
 drivers/net/team/team.c  | 1 +
 drivers/net/wireguard/netlink.c  | 1 +
 drivers/net/wireless/mac80211_hwsim.c| 1 +
 drivers/target/target_core_user.c| 1 +
 drivers/thermal/thermal_netlink.c| 1 +
 drivers/vdpa/vdpa.c  | 1 +
 fs/cifs/netlink.c| 1 +
 fs/dlm/netlink.c | 1 +
 fs/ksmbd/transport_ipc.c | 1 +
 include/linux/genl_magic_func.h  | 1 +
 include/net/genetlink.h  | 3 +++
 kernel/taskstats.c   | 1 +
 net/batman-adv/netlink.c | 1 +
 net/core/devlink.c   | 1 +
 net/core/drop_monitor.c  | 1 +
 net/ethtool/netlink.c| 1 +
 net/hsr/hsr_netlink.c| 1 +
 net/ieee802154/netlink.c | 1 +
 net/ieee802154/nl802154.c| 1 +
 net/ipv4/fou.c   | 1 +
 net/ipv4/tcp_metrics.c   | 1 +
 net/ipv6/ila/ila_main.c  | 1 +
 net/ipv6/ioam6.c | 1 +
 net/ipv6/seg6.c  | 1 +
 net/l2tp/l2tp_netlink.c  | 1 +
 net/mptcp/pm_netlink.c   | 1 +
 net/ncsi/ncsi-netlink.c  | 1 +
 net/netfilter/ipvs/ip_vs_ctl.c   | 1 +
 net/netlabel/netlabel_calipso.c  | 1 +
 net/netlabel/netlabel_cipso_v4.c | 1 +
 net/netlabel/netlabel_mgmt.c | 1 +
 net/netlabel/netlabel_unlabeled.c| 1 +
 net/netlink/genetlink.c  | 4 
 net/nfc/netlink.c| 1 +
 net/openvswitch/conntrack.c  | 1 +
 net/openvswitch/datapath.c   | 3 +++
 net/openvswitch/meter.c  | 1 +
 net/psample/psample.c| 1 +
 net/smc/smc_netlink.c| 3 ++-
 net/smc/smc_pnet.c   | 3 ++-
 net/tipc/netlink.c   | 1 +
 net/tipc/netlink_compat.c| 1 +
 net/wireless/nl80211.c   | 1 +
 47 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 2a709daefbc4..6cec9ce23fd3 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -2322,6 +2322,7 @@ static struct genl_family nbd_genl_family __ro_after_init 
= {
.module = THIS_MODULE,
.small_ops  = nbd_connect_genl_ops,
.n_small_ops= ARRAY_SIZE(nbd_connect_genl_ops),
+   .resv_start_op  = NBD_CMD_STATUS + 1,
.maxattr= NBD_ATTR_MAX,
.policy = nbd_attr_policy,
.mcgrps = nbd_mcast_grps,
diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index a208e2b1a9af..15c7dc82107f 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -1859,6 +1859,7 @@ static struct genl_family gtp_genl_family __ro_after_init 
= {
.module = THIS_MODULE,
.small_ops  = gtp_genl_ops,
.n_small_ops= ARRAY_SIZE(gtp_genl_ops),
+   .resv_start_op  = GTP_CMD_ECHOREQ + 1,
.mcgrps = gtp_genl_mcgrps,
.n_mcgrps   = ARRAY_SIZE(gtp_genl_mcgrps),
 };
diff --git a/drivers/net/ieee802154/mac802154_hwsim.c 
b/drivers/net/ieee802154/mac802154_hwsim.c
index 38c217bd7c82..2f0544dd7c2a 100644
--- a/drivers/net/ieee802154/mac802154_hwsim.c
+++ b/drivers/net/ieee802154/mac802154_hwsim.c
@@ -630,6 +630,7 @@ static struct genl_family hwsim_genl_family __ro_after_init 
= {
.module = THIS_MODULE,
.small_ops = hwsim_nl_ops,
.n_small_ops = ARRAY_SIZE(hwsim_nl_ops),
+   .resv_start_op = MAC802154_HWSIM_CMD_NEW_EDGE + 1,
.mcgrps = hwsim_mcgrps,
.n_mcgrps

Re: [ovs-dev] [PATCH net-next v2 1/3] openvswitch: allow specifying ifindex of new interfaces

2022-08-23 Thread Jakub Kicinski

On Tue, 23 Aug 2022 16:50:21 +0300 Andrey Zhadchenko wrote:
> > Are you 100% sure that all user space making this call initializes
> > dp_ifindex to 0? There is no validation in the kernel today that
> > the field is not garbage as far as I can tell.
> > 
> > If you are sure, please add the appropriate analysis to the commit msg.  
> 
> I am not sure that *all* users set dp_ifindex to zero. At least I do not 
> know about them. Primary ovs userspace ovs-vswitchd certainly set to 
> zero, others like WeaveNet do it too. But there can be more?
> What is the policy regarding this? I can add a new attribute, it is not 
> hard.

IDK how many user space components driving OvS exist out there.
We could risk it and reuse the field, but if we get it wrong 
and don't catch the regression before the final release is cut 
the result gets _really_ ugly. We will have _some_ user space
out there expecting us to use ifindex and _some_ which expects
it can pass garbage, so we can neither revert not leave it...
If that makes sense.

If using / adding attribute is not a hassle that'd be my vote.

Using fixed structs like struct ovs_header is strongly discouraged
for new families exactly because of situations like this :(
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net-next v2 3/3] openvswitch: add OVS_DP_ATTR_PER_CPU_PIDS to get requests

2022-08-22 Thread Jakub Kicinski

On Fri, 19 Aug 2022 18:30:44 +0300 Andrey Zhadchenko wrote:
> -static size_t ovs_dp_cmd_msg_size(void)
> +static size_t ovs_dp_cmd_msg_size(struct datapath *dp)
>  {
>   size_t msgsize = NLMSG_ALIGN(sizeof(struct ovs_header));
> + struct dp_nlsk_pids *pids = ovsl_dereference(dp->upcall_portids);
> +
>  

double new line

>   msgsize += nla_total_size(IFNAMSIZ);
>   msgsize += nla_total_size_64bit(sizeof(struct ovs_dp_stats));
> @@ -1516,6 +1518,9 @@ static size_t ovs_dp_cmd_msg_size(void)
>   msgsize += nla_total_size(sizeof(u32)); /* OVS_DP_ATTR_USER_FEATURES */
>   msgsize += nla_total_size(sizeof(u32)); /* OVS_DP_ATTR_MASKS_CACHE_SIZE 
> */
>  
> + if (dp->user_features & OVS_DP_F_DISPATCH_UPCALL_PER_CPU && pids)
> + msgsize += nla_total_size_64bit(sizeof(u32) * pids->n_pids);

Can we make a safe over estimation here, like nr_cpu_ids maybe?
Would that be too large? It's fairly common to overestimate the
netlink message allocation.

Also why 64bit if the value is in u32 units?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net-next v2 2/3] openvswitch: fix memory leak at failed datapath creation

2022-08-22 Thread Jakub Kicinski

On Mon, 22 Aug 2022 08:53:44 -0400 Aaron Conole wrote:
> Thanks for this patch.  I guess independent of this series, this patch
> should be applied to the net tree as well - it fixes an existing issue.

Yes, please, this needs to be reposted separately as [PATCH net].
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net-next v2 1/3] openvswitch: allow specifying ifindex of new interfaces

2022-08-22 Thread Jakub Kicinski

On Fri, 19 Aug 2022 18:30:42 +0300 Andrey Zhadchenko wrote:
> CRIU is preserving ifindexes of net devices after restoration. However,
> current Open vSwitch API does not allow to target ifindex, so we cannot
> correctly restore OVS configuration.
> 
> Use ovs_header->dp_ifindex during OVS_DP_CMD_NEW as desired ifindex.
> Use OVS_VPORT_ATTR_IFINDEX during OVS_VPORT_CMD_NEW to specify new netdev
> ifindex.

> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -1739,6 +1739,7 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct 
> genl_info *info)
>   struct vport *vport;
>   struct ovs_net *ovs_net;
>   int err;
> + struct ovs_header *ovs_header = info->userhdr;
>  
>   err = -EINVAL;
>   if (!a[OVS_DP_ATTR_NAME] || !a[OVS_DP_ATTR_UPCALL_PID])
> @@ -1779,6 +1780,7 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct 
> genl_info *info)
>   parms.dp = dp;
>   parms.port_no = OVSP_LOCAL;
>   parms.upcall_portids = a[OVS_DP_ATTR_UPCALL_PID];
> + parms.desired_ifindex = ovs_header->dp_ifindex;

Are you 100% sure that all user space making this call initializes
dp_ifindex to 0? There is no validation in the kernel today that 
the field is not garbage as far as I can tell.

If you are sure, please add the appropriate analysis to the commit msg.

>   /* So far only local changes have been made, now need the lock. */
>   ovs_lock();
> @@ -2199,7 +2201,10 @@ static int ovs_vport_cmd_new(struct sk_buff *skb, 
> struct genl_info *info)
>   if (!a[OVS_VPORT_ATTR_NAME] || !a[OVS_VPORT_ATTR_TYPE] ||
>   !a[OVS_VPORT_ATTR_UPCALL_PID])
>   return -EINVAL;
> - if (a[OVS_VPORT_ATTR_IFINDEX])
> +
> + parms.type = nla_get_u32(a[OVS_VPORT_ATTR_TYPE]);
> +
> + if (a[OVS_VPORT_ATTR_IFINDEX] && parms.type != OVS_VPORT_TYPE_INTERNAL)
>   return -EOPNOTSUPP;
>  
>   port_no = a[OVS_VPORT_ATTR_PORT_NO]
> @@ -2236,12 +2241,19 @@ static int ovs_vport_cmd_new(struct sk_buff *skb, 
> struct genl_info *info)
>   }
>  
>   parms.name = nla_data(a[OVS_VPORT_ATTR_NAME]);
> - parms.type = nla_get_u32(a[OVS_VPORT_ATTR_TYPE]);
>   parms.options = a[OVS_VPORT_ATTR_OPTIONS];
>   parms.dp = dp;
>   parms.port_no = port_no;
>   parms.upcall_portids = a[OVS_VPORT_ATTR_UPCALL_PID];
>  
> + if (parms.type == OVS_VPORT_TYPE_INTERNAL) {
> + if (a[OVS_VPORT_ATTR_IFINDEX])

You already validated that type must be internal for ifindex 
to be specified, so the outer if is unnecessary.

It's pretty common in netlink handling code to validate first
then act assuming validation has passed.

> + parms.desired_ifindex =
> + nla_get_u32(a[OVS_VPORT_ATTR_IFINDEX]);
> + else
> + parms.desired_ifindex = 0;
> + }
> +
>   vport = new_vport();
>   err = PTR_ERR(vport);
>   if (IS_ERR(vport)) {

> diff --git a/net/openvswitch/vport.h b/net/openvswitch/vport.h
> index 9de5030d9801..24e1cba2f1ac 100644
> --- a/net/openvswitch/vport.h
> +++ b/net/openvswitch/vport.h
> @@ -98,6 +98,8 @@ struct vport_parms {
>   enum ovs_vport_type type;
>   struct nlattr *options;
>  
> + int desired_ifindex;

Any chance this field would make sense somewhere else? you're adding 
a 4B field between two pointers it will result in a padding.

Also you're missing kdoc for this field.

>   /* For ovs_vport_alloc(). */
>   struct datapath *dp;
>   u16 port_no;

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net-next 1/2] openvswitch: Fix double reporting of drops in dropwatch

2022-07-29 Thread Jakub Kicinski

On Thu, 28 Jul 2022 12:12:58 -0400 Mike Pattrick wrote:
> Frames sent to userspace can be reported as dropped in
> ovs_dp_process_packet, however, if they are dropped in the netlink code
> then netlink_attachskb will report the same frame as dropped.
> 
> This patch checks for error codes which indicate that the frame has
> already been freed.
> 
> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2109946

Please remove the Bugzilla link, it doesn't seem to be public.
If it is then it should be a Link: tag, not a custom tag for bz.

> Signed-off-by: Mike Pattrick 
> ---
>  net/openvswitch/datapath.c | 16 +---
>  1 file changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index 7e8a39a35627..029f9c3e1c28 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -252,10 +252,20 @@ void ovs_dp_process_packet(struct sk_buff *skb, struct 
> sw_flow_key *key)
>  
>   upcall.mru = OVS_CB(skb)->mru;
>   error = ovs_dp_upcall(dp, skb, key, , 0);
> - if (unlikely(error))
> - kfree_skb(skb);
> - else
> + switch (error) {
> + case 0:
> + fallthrough;
> + case -EAGAIN:
> + fallthrough;
> + case -ERESTARTSYS:
> + fallthrough;

No need to add the fallthrough;s between two case statements.

> + case -EINTR:
>   consume_skb(skb);
> + break;
> + default:
> + kfree_skb(skb);
> + break;
> + }
>   stats_counter = >n_missed;
>   goto out;
>   }

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net-next] net: rename reference+tracking helpers

2022-06-08 Thread Jakub Kicinski

On Wed, 8 Jun 2022 16:58:08 -0600 David Ahern wrote:
> On 6/8/22 8:58 AM, Jakub Kicinski wrote:
> > IMO to encourage use of the track-capable API we could keep their names
> > short and call the legacy functions __netdev_hold() as I mentioned or
> > maybe netdev_hold_notrack().  
> 
> I like that option. Similar to the old nla_parse functions that were
> renamed with _deprecated - makes it easier to catch new uses.

Well, not really a perfect parallel because _deprecated nla has to stay
forever, given it behaves differently, while _notrack would hopefully
die either thru conversion or someone rightly taking an axe to the
cobwebbed code.

Either way, I hope nobody is against merging the current patch.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net-next] net: rename reference+tracking helpers

2022-06-08 Thread Jakub Kicinski

On Wed, 8 Jun 2022 10:27:15 +0200 Jiri Pirko wrote:
> Wed, Jun 08, 2022 at 06:39:55AM CEST, k...@kernel.org wrote:
> >Netdev reference helpers have a dev_ prefix for historic
> >reasons. Renaming the old helpers would be too much churn  
> 
> Hmm, I think it would be great to eventually rename the rest too in
> order to maintain unique prefix for netdev things. Why do you think the
> "churn" would be an issue?

Felt like we're better of moving everyone to the new tracking helpers
than doing just a pure rename. But I'm not opposed to a pure rename.

> >diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
> >index 817577e713d7..815738c0e067 100644
> >--- a/drivers/net/macsec.c
> >+++ b/drivers/net/macsec.c
> >@@ -3462,7 +3462,7 @@ static int macsec_dev_init(struct net_device *dev)
> > memcpy(dev->broadcast, real_dev->broadcast, dev->addr_len);
> > 
> > /* Get macsec's reference to real_dev */
> >-dev_hold_track(real_dev, >dev_tracker, GFP_KERNEL);
> >+netdev_hold(real_dev, >dev_tracker, GFP_KERNEL);  
> 
> So we later decide to rename dev_hold() to obey the netdev_*() naming
> scheme, we would have collision.

dev_hold() should not be used in new code, we should use tracking
everywhere. Given that we can name the old helpers __netdev_hold().

> Also, seems to me odd to have:
> OLDPREFIX_x()
> and
> NEWPREFIX_x()
> to be different functions.
> 
> For the sake of not making naming mess, could we rather have:
> netdev_hold_track()
> or
> netdev_hold_tr() if the prior is too long
> ?

See above, one day non-track version should be removed.
IMO to encourage use of the track-capable API we could keep their names
short and call the legacy functions __netdev_hold() as I mentioned or
maybe netdev_hold_notrack().
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH net-next] net: rename reference+tracking helpers

2022-06-07 Thread Jakub Kicinski

Netdev reference helpers have a dev_ prefix for historic
reasons. Renaming the old helpers would be too much churn
but we can rename the tracking ones which are relatively
recent and should be the default for new code.

Rename:
 dev_hold_track()-> netdev_hold()
 dev_put_track() -> netdev_put()
 dev_replace_track() -> netdev_ref_replace()

Signed-off-by: Jakub Kicinski 
---
CC: dsah...@kernel.org
CC: steffen.klass...@secunet.com
CC: jreu...@yaina.de
CC: ra...@blackwall.org
CC: j...@resnulli.us
CC: kgr...@linux.ibm.com
CC: ivec...@redhat.com
CC: jma...@redhat.com
CC: ying@windriver.com
CC: lucien@gmail.com
CC: a...@arndb.de
CC: yajun.d...@linux.dev
CC: aten...@kernel.org
CC: richardsonn...@google.com
CC: hkallwe...@gmail.com
CC: linux-h...@vger.kernel.org
CC: d...@openvswitch.org
CC: linux-s...@vger.kernel.org
CC: tipc-discuss...@lists.sourceforge.net
---
 drivers/net/eql.c  |  4 ++--
 drivers/net/macsec.c   |  4 ++--
 drivers/net/macvlan.c  |  4 ++--
 drivers/net/netconsole.c   |  2 +-
 drivers/net/vrf.c  |  8 
 include/linux/netdevice.h  | 24 
 include/net/xfrm.h |  2 +-
 net/8021q/vlan_dev.c   |  4 ++--
 net/ax25/af_ax25.c |  7 ---
 net/ax25/ax25_dev.c|  6 +++---
 net/bridge/br_if.c | 10 +-
 net/core/dev.c | 10 +-
 net/core/dev_ioctl.c   |  4 ++--
 net/core/drop_monitor.c|  5 +++--
 net/core/dst.c |  8 
 net/core/failover.c|  4 ++--
 net/core/link_watch.c  |  2 +-
 net/core/neighbour.c   | 18 +-
 net/core/net-sysfs.c   |  8 
 net/core/netpoll.c |  2 +-
 net/core/pktgen.c  |  6 +++---
 net/ethtool/ioctl.c|  4 ++--
 net/ethtool/netlink.c  |  6 +++---
 net/ethtool/netlink.h  |  2 +-
 net/ipv4/devinet.c |  4 ++--
 net/ipv4/fib_semantics.c   | 11 ++-
 net/ipv4/ipmr.c|  2 +-
 net/ipv4/route.c   |  7 +++
 net/ipv4/xfrm4_policy.c|  2 +-
 net/ipv6/addrconf.c|  4 ++--
 net/ipv6/addrconf_core.c   |  2 +-
 net/ipv6/ip6_gre.c |  8 
 net/ipv6/ip6_tunnel.c  |  4 ++--
 net/ipv6/ip6_vti.c |  4 ++--
 net/ipv6/ip6mr.c   |  2 +-
 net/ipv6/route.c   | 10 +-
 net/ipv6/sit.c |  4 ++--
 net/ipv6/xfrm6_policy.c|  4 ++--
 net/llc/af_llc.c   |  2 +-
 net/openvswitch/vport-netdev.c |  6 +++---
 net/packet/af_packet.c | 12 ++--
 net/sched/act_mirred.c |  6 +++---
 net/sched/sch_api.c|  2 +-
 net/sched/sch_generic.c| 11 ++-
 net/smc/smc_pnet.c |  7 ---
 net/switchdev/switchdev.c  |  4 ++--
 net/tipc/bearer.c  |  4 ++--
 net/xfrm/xfrm_device.c |  2 +-
 48 files changed, 141 insertions(+), 137 deletions(-)

diff --git a/drivers/net/eql.c b/drivers/net/eql.c
index 557ca8ff9dec..ca3e4700a813 100644
--- a/drivers/net/eql.c
+++ b/drivers/net/eql.c
@@ -225,7 +225,7 @@ static void eql_kill_one_slave(slave_queue_t *queue, 
slave_t *slave)
list_del(>list);
queue->num_slaves--;
slave->dev->flags &= ~IFF_SLAVE;
-   dev_put_track(slave->dev, >dev_tracker);
+   netdev_put(slave->dev, >dev_tracker);
kfree(slave);
 }
 
@@ -399,7 +399,7 @@ static int __eql_insert_slave(slave_queue_t *queue, slave_t 
*slave)
if (duplicate_slave)
eql_kill_one_slave(queue, duplicate_slave);
 
-   dev_hold_track(slave->dev, >dev_tracker, GFP_ATOMIC);
+   netdev_hold(slave->dev, >dev_tracker, GFP_ATOMIC);
list_add(>list, >all_slaves);
queue->num_slaves++;
slave->dev->flags |= IFF_SLAVE;
diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index 817577e713d7..815738c0e067 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -3462,7 +3462,7 @@ static int macsec_dev_init(struct net_device *dev)
memcpy(dev->broadcast, real_dev->broadcast, dev->addr_len);
 
/* Get macsec's reference to real_dev */
-   dev_hold_track(real_dev, >dev_tracker, GFP_KERNEL);
+   netdev_hold(real_dev, >dev_tracker, GFP_KERNEL);
 
return 0;
 }
@@ -3710,7 +3710,7 @@ static void macsec_free_netdev(struct net_device *dev)
free_percpu(macsec->secy.tx_sc.stats);
 
/* Get rid of the macsec's reference to real_dev */
-   dev_put_track(macsec->real_dev, >dev_tracker);
+   netdev_put(macsec->real_dev, >dev_tracker);
 }
 
 static void macsec_setup(struct net_device *dev)
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index eff75beb1395..5b46a6526fc6 100644
--- a/driver

Re: [ovs-dev] [PATCH RFC 4/5] net/tls: Add support for PF_TLSH (a TLS handshake listener)

2022-04-28 Thread Jakub Kicinski

On Thu, 28 Apr 2022 10:09:17 -0400 Benjamin Coddington wrote:
> > Noob reply: wish I knew.  (I somewhat hoped _you_ would've been able to
> > tell me.)
> >
> > Thing is, the only method I could think of for fd passing is the POSIX fd
> > passing via unix_attach_fds()/unix_detach_fds().  But that's AF_UNIX,
> > which really is designed for process-to-process communication, not
> > process-to-kernel.  So you probably have to move a similar logic over to
> > AF_NETLINK. And design a new interface on how fds should be passed over
> > AF_NETLINK.
> >
> > But then you have to face the issue that AF_NELINK is essentially UDP, and
> > you have _no_ idea if and how many processes do listen on the other end.
> > Thing is, you (as the sender) have to copy the fd over to the receiving
> > process, so you'd better _hope_ there is a receiving process.  Not to
> > mention that there might be several processes listening in...

Sort of. I double checked the netlink upcall implementations we have,
they work by user space entity "registering" their netlink address
(portid) at startup. Kernel then directs the upcalls to that address.
But AFAICT there's currently no way for the netlink "server" to see
when a "client" goes away, which makes me slightly uneasy about using
such schemes for security related stuff. The user agent may crash and
something else could grab the same address, I think.

Let me CC OvS who uses it the most, perhaps I'm missing a trick.

My thinking was to use the netlink attribute format (just to reuse the
helpers and parsing, but we can invent a new TLV format if needed) but
create a new socket type specifically for upcalls.

> > And that's something I _definitely_ don't feel comfortable with without
> > guidance from the networking folks, so I didn't pursue it further and we
> > went with the 'accept()' mechanism Chuck implemented.
> >
> > I'm open to suggestions, though.  
> 
> EXPORT_SYMBOL(receive_fd) would allow interesting implementations.
> 
> The kernel keyring facilities have a good API for creating various key_types
> which are able to perform work such as this from userspace contexts.
> 
> I have a working prototype for a keyring key instantiation which allows a
> userspace process to install a kernel fd on its file table.  The problem
> here is how to match/route such fd passing to appropriate processes in
> appropriate namespaces.  I think this problem is shared by all
> kernel-to-userspace upcalls, which I hope we can discuss at LSF/MM.

Almost made me wish I was coming to LFS/MM :)

> I don't think kernel fds are very special as compared to userspace fds.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net-next v8] net: openvswitch: IPv6: Add IPv6 extension header support

2022-03-09 Thread Jakub Kicinski

On Wed, 9 Mar 2022 14:43:07 +0100 Ilya Maximets wrote:
> >> It's a bit of an uncharted territory, hard to say what's right.
> >> It may be a little easier to code up the rejection if we have
> >> the types defined (which I think we need to do in
> >> __parse_flow_nlattrs()? seems OvS does its own nla parsing?)  
> 
> Ack.  And, yes, __parse_flow_nlattrs() seems to be the right spot.
> OVS has lots of nested attributes with some special treatment in a
> few cases and dependency tracking, AFAICT, so it parses attributes
> on it's own.  Is there a better way?

Looks like OvS has extra requirements like attributes can't be
duplicated and zeroed out attrs are ignored. I don't think generic
parsers can do that today, although the former seems like a useful
addition.

A problem for another time.

> >> Johannes, any preference?
> >
> > so why not again just flat enum without ifdefs and without values
> > commented out? even if we leave values in comments like above it doesn't
> > mean the userspace won't use them by mistake and send to the kernel.
> > but the kernel will probably ignore as not being used and at least
> > there won't be a conflict again even if by mistake.. and it's easiest
> > to read.  
> 
> OK.  Seems like we have some votes and a reason (explicit reject) to have
> them defined.  This will also make current user space and kernel definitions
> equal.  Let me put together a patch and we'll continue the discussion there.

Thanks!
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net-next v8] net: openvswitch: IPv6: Add IPv6 extension header support

2022-03-08 Thread Jakub Kicinski

On Tue, 8 Mar 2022 19:25:31 +0100 Ilya Maximets wrote:
> > since its rc7 we end up with kernel and ovs broken with each other.
> > can we revert the kernel patches anyway and introduce them again later
> > when ovs userspace is also updated?  
> 
> I don't think this patch is part of 5-17-rc7.  AFAICT, it's a candidate
> for 5.18, so we should still have a bit of time.  Am I missing something?

You are correct, it's going to hit Linus's tree during the 5.18 merge
window. It seems we're close enough to a resolution to focus on a fix
instead.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net-next v8] net: openvswitch: IPv6: Add IPv6 extension header support

2022-03-08 Thread Jakub Kicinski

On Tue, 8 Mar 2022 20:33:12 +0100 Ilya Maximets wrote:
> On 3/8/22 17:17, Jakub Kicinski wrote:
> > On Tue, 8 Mar 2022 15:12:45 +0100 Ilya Maximets wrote:  
> >> Yes, that is something that I had in mind too.  The only thing that makes
> >> me uncomfortable is OVS_KEY_ATTR_TUNNEL_INFO = 30 here.  Even though it
> >> doesn't make a lot of difference, I'd better keep the kernel-only 
> >> attributes
> >> at the end of the enumeration.  Is there a better way to handle kernel-only
> >> attribute?  
> > 
> > My thought was to leave the kernel/userspace only types "behind" to
> > avoid perpetuating the same constructs.
> > 
> > Johannes's point about userspace to userspace messages makes the whole
> > thing a little less of an aberration.
> > 
> > Is there a reason for the types to be hidden under __KERNEL__? 
> > Or someone did that in a misguided attempt to save space in attr arrays
> > when parsing?  
> 
> My impression is that OVS_KEY_ATTR_TUNNEL_INFO was hidden from the
> user space just because it's not supposed to ever be used by the
> user space application.  Pravin or Jesse should know for sure.

Hard to make any assumptions about what user space that takes 
the liberty of re-defining kernel uAPI types will or will not
do ;) The only way to be safe would be to actively reject the
attr ID on the kernel side. Unless user space uses the same
exact name for something else IMHO hiding the value doesn't
afford us any extra protection.

> >> I agree with that.  
> > 
> > Should we add the user space types to the kernel header and remove the
> > ifdef __KERNEL__ around TUNNEL_INFO, then?  
> 
> I don't think we need to actually define them, but we may list them
> in the comment.  I'm OK with either option though.
> 
> For the removal of ifdef __KERNEL__, that might be a good thing to do.
> I'm just not sure what are the best practices here.
> We'll need to make some code changes in user space to avoid warnings
> about not all the enum members being used in 'switch'es.  But that's
> not a problem.

Presumably only as the headers are updated? IOW we would not break 
the build for older sources?

> If you think that having a flat enum without 'ifdef's is a viable
> option from a kernel's point of view, I'm all for it.
> 
> Maybe something like this (only checked that this compiles; 29 and
> 30 are correct numbers of these userspace attributes):

It's a bit of an uncharted territory, hard to say what's right.
It may be a little easier to code up the rejection if we have 
the types defined (which I think we need to do in
__parse_flow_nlattrs()? seems OvS does its own nla parsing?)

Johannes, any preference?

> diff --git a/include/uapi/linux/openvswitch.h 
> b/include/uapi/linux/openvswitch.h
> index 9d1710f20505..86bc951be5bc 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -351,11 +351,19 @@ enum ovs_key_attr {
>   OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4,   /* struct ovs_key_ct_tuple_ipv4 */
>   OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6,   /* struct ovs_key_ct_tuple_ipv6 */
>   OVS_KEY_ATTR_NSH,   /* Nested set of ovs_nsh_key_* */
> - OVS_KEY_ATTR_IPV6_EXTHDRS,  /* struct ovs_key_ipv6_exthdr */
>  
> -#ifdef __KERNEL__
> - OVS_KEY_ATTR_TUNNEL_INFO,  /* struct ip_tunnel_info */
> -#endif
> + /* User space decided to squat on types 29 and 30.  They are listed
> +  * below, but should not be sent to the kernel:
> +  *
> +  * OVS_KEY_ATTR_PACKET_TYPE,   be32 packet type
> +  * OVS_KEY_ATTR_ND_EXTENSIONS, IPv6 Neighbor Discovery extensions
> +  *
> +  * WARNING: No new types should be added unless they are defined
> +  *  for both kernel and user space (no 'ifdef's).  It's hard
> +  *  to keep compatibility otherwise. */
> + OVS_KEY_ATTR_TUNNEL_INFO = 31,  /* struct ip_tunnel_info.
> +For in-kernel use only. */
> + OVS_KEY_ATTR_IPV6_EXTHDRS,  /* struct ovs_key_ipv6_exthdr */
>   __OVS_KEY_ATTR_MAX
>  };
>  
> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
> index 8b4124820f7d..315064bada3e 100644
> --- a/net/openvswitch/flow_netlink.c
> +++ b/net/openvswitch/flow_netlink.c
> @@ -346,7 +346,7 @@ size_t ovs_key_attr_size(void)
>   /* Whenever adding new OVS_KEY_ FIELDS, we should consider
>* updating this function.
>*/
> - BUILD_BUG_ON(OVS_KEY_ATTR_TUNNEL_INFO != 30);
> + BUILD_BUG_ON(OVS_KEY_ATTR_MAX != 32);
>  
>   returnnla_total_size(4)   /* OVS_KEY_ATTR_PRIORITY */
>   + nla_total_size(0)   /* OVS_KEY_ATTR_TUNNEL */
> ---
> 
> Thoughts?
&

Re: [ovs-dev] [PATCH net-next v8] net: openvswitch: IPv6: Add IPv6 extension header support

2022-03-08 Thread Jakub Kicinski

On Tue, 8 Mar 2022 15:12:45 +0100 Ilya Maximets wrote:
> >> diff --git a/include/uapi/linux/openvswitch.h 
> >> b/include/uapi/linux/openvswitch.h
> >> index 9d1710f20505..ab6755621e02 100644
> >> --- a/include/uapi/linux/openvswitch.h
> >> +++ b/include/uapi/linux/openvswitch.h
> >> @@ -351,11 +351,16 @@ enum ovs_key_attr {
> >> OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4,   /* struct ovs_key_ct_tuple_ipv4 
> >> */
> >> OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6,   /* struct ovs_key_ct_tuple_ipv6 
> >> */
> >> OVS_KEY_ATTR_NSH,   /* Nested set of ovs_nsh_key_* */
> >> -   OVS_KEY_ATTR_IPV6_EXTHDRS,  /* struct ovs_key_ipv6_exthdr */
> >>  
> >>  #ifdef __KERNEL__
> >> OVS_KEY_ATTR_TUNNEL_INFO,  /* struct ip_tunnel_info */
> >>  #endif
> >> +   /* User space decided to squat on types 30 and 31 */
> >> +   OVS_KEY_ATTR_IPV6_EXTHDRS = 32, /* struct ovs_key_ipv6_exthdr */
> >> +   /* WARNING:  */  
> 
> Yes, that is something that I had in mind too.  The only thing that makes
> me uncomfortable is OVS_KEY_ATTR_TUNNEL_INFO = 30 here.  Even though it
> doesn't make a lot of difference, I'd better keep the kernel-only attributes
> at the end of the enumeration.  Is there a better way to handle kernel-only
> attribute?

My thought was to leave the kernel/userspace only types "behind" to
avoid perpetuating the same constructs.

Johannes's point about userspace to userspace messages makes the whole
thing a little less of an aberration.

Is there a reason for the types to be hidden under __KERNEL__? 
Or someone did that in a misguided attempt to save space in attr arrays
when parsing?

> Also, the OVS_KEY_ATTR_ND_EXTENSIONS (31) attribute used to store IPv6 
> Neighbor
> Discovery extensions is currently implemented only for userspace, but nothing
> actually prevents us having the kernel implementation.  So, we need a way to
> make it usable by the kernel in the future.

The "= 32" leaves the earlier attr types as reserved so nothing
prevents us from defining them later. But..

> > It might be nicer to actually document here in what's at least supposed
> > to be the canonical documentation of the API what those types were used
> > for.  
> 
> I agree with that.

Should we add the user space types to the kernel header and remove the
ifdef __KERNEL__ around TUNNEL_INFO, then?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net-next v8] net: openvswitch: IPv6: Add IPv6 extension header support

2022-03-07 Thread Jakub Kicinski

On Tue, 8 Mar 2022 01:04:00 +0100 Ilya Maximets wrote:
> > Thanks for the explanation, we can apply a revert if that'd help your
> > CI / ongoing development but sounds like the fix really is in user
> > space. Expecting netlink attribute lists not to grow is not fair.  
> 
> I don't think it was intentional, just a careless mistake.  Unfortunately,
> all OVS binaries built during the last 5 years rely on that unwanted
> expectation (re-build will also not help as they are using a copy of the
> uAPI header and the clash will be there anyway).  If we want to keep them
> working, kernel uAPI has to be carefully updated with current userspace-only
> attributes before we add any new ones.  That is not great, but I don't see
> any other option right now that doesn't require code changes in userspace.
> 
> I'd say that we need to revert the current patch and re-introduce it
> later when the uAPI problem is sorted out.  This way we will avoid blocking
> the net-next testing and will also avoid problems in case the uAPI changes
> are not ready at the moment of the new kernel release.
> 
> What do you think?

Let me add some people I associate with genetlink work in my head
(fairly or not) to keep me fair here.

It's highly unacceptable for user space to straight up rewrite kernel
uAPI types but if it already happened the only fix is something like:

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 9d1710f20505..ab6755621e02 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -351,11 +351,16 @@ enum ovs_key_attr {
OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4,   /* struct ovs_key_ct_tuple_ipv4 */
OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6,   /* struct ovs_key_ct_tuple_ipv6 */
OVS_KEY_ATTR_NSH,   /* Nested set of ovs_nsh_key_* */
-   OVS_KEY_ATTR_IPV6_EXTHDRS,  /* struct ovs_key_ipv6_exthdr */
 
 #ifdef __KERNEL__
OVS_KEY_ATTR_TUNNEL_INFO,  /* struct ip_tunnel_info */
 #endif
+   /* User space decided to squat on types 30 and 31 */
+   OVS_KEY_ATTR_IPV6_EXTHDRS = 32, /* struct ovs_key_ipv6_exthdr */
+   /* WARNING:  */
+
__OVS_KEY_ATTR_MAX
 };


right?

> > Since ovs uses genetlink you should be able to dump the policy from 
> > the kernel and at least validate that it doesn't overlap.  
> 
> That is interesting.  Indeed, this functionality can be used to detect
> problems or to define userspace-only attributes in runtime based on the
> kernel reply.  Thanks for the pointer!
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net-next v8] net: openvswitch: IPv6: Add IPv6 extension header support

2022-03-07 Thread Jakub Kicinski

On Mon, 7 Mar 2022 23:14:13 +0100 Ilya Maximets wrote:
> The main problem is that userspace uses the modified copy of the uapi header
> which looks like this:
>   
> https://github.com/openvswitch/ovs/blob/f77dbc1eb2da2523625cd36922c6fccfcb3f3eb7/datapath/linux/compat/include/linux/openvswitch.h#L357
> 
> In short, the userspace view:
> 
>   enum ovs_key_attr {
>   
> 
>   #ifdef __KERNEL__
>   /* Only used within kernel data path. */
>   #endif
> 
>   #ifndef __KERNEL__
>   /* Only used within userspace data path. */
>   #endif
>   __OVS_KEY_ATTR_MAX
> };
> 
> And the kernel view:
> 
>   enum ovs_key_attr {
>   
> 
>   #ifdef __KERNEL__
>   /* Only used within kernel data path. */
>   #endif
> 
>   __OVS_KEY_ATTR_MAX
>   };
> 
> This happened before my time, but the commit where userspace made a wrong
> turn appears to be this one:
>   
> https://github.com/openvswitch/ovs/commit/beb75a40fdc295bfd6521b0068b4cd12f6de507c
> The attribute for userspace only was added to the common enum after the
> OVS_KEY_ATTR_TUNNEL_INFO.   I'm not sure how things didn't fall apart when
> OVS_KEY_ATTR_NSH was added later (no-one cared that NSH doesn't work, because
> OVS didn't support it yet?).
> 
> In general, any addition of a new attribute into that enumeration leads to
> inevitable clash between userpsace-only attributes and new kernel attributes.
> 
> After the kernel update, kernel provides new attributes to the userspace and
> userspace tries to parse them as one of the userspace-only attributes and
> fails.   In our current case userspace is trying to parse 
> OVS_KEY_ATTR_IPV6_EXTHDR
> as userspace-only OVS_KEY_ATTR_PACKET_TYPE, because they have the same value 
> in the
> enum, fails and discards the netlink message as malformed.  So, IPv6 is fully
> broken, because OVS_KEY_ATTR_IPV6_EXTHDR is supplied now with every IPv6 
> packet
> that goes to userspace.
> 
> We need to unify the view of 'enum ovs_key_attr' between userspace and kernel
> before we can add any new values to it.
> 
> One way to do that should be addition of both userspace-only attributes to the
> kernel header (and maybe exposing OVS_KEY_ATTR_TUNNEL_INFO too, just to keep
> it flat and avoid any possible problems in the future).  Any other suggestions
> are welcome.  But in any case this will require careful testing with existing
> OVS userspace to avoid any unexpected issues.
> 
> Moving forward, I think, userspace OVS should find a way to not have 
> userpsace-only
> attributes, or have them as a separate enumeration.  But I'm not sure how to 
> do
> that right now.  Or we'll have to add userspace-only attributes to the kernel
> uapi before using them.

Thanks for the explanation, we can apply a revert if that'd help your
CI / ongoing development but sounds like the fix really is in user
space. Expecting netlink attribute lists not to grow is not fair.

Since ovs uses genetlink you should be able to dump the policy from 
the kernel and at least validate that it doesn't overlap.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net-next v8] net: openvswitch: IPv6: Add IPv6 extension header support

2022-03-07 Thread Jakub Kicinski

On Mon, 7 Mar 2022 10:49:31 +0200 Roi Dayan wrote:
> >> I think there is a missing userspace fix. didnt verify yet.
> >> but in ovs userspace odp-netlink.h created from 
> >> datapath/linux/compat/include/linux/openvswitch.h
> >> and that file is not synced the change here.
> >> So the new enum OVS_KEY_ATTR_IPV6_EXTHDRS is missing and also struct
> >> ovs_key_ipv6_exthdrs which is needed in lib/udp-util.c
> >> in struct ovs_flow_key_attr_lens to add expected len for
> >> OVS_KEY_ATTR_IPV6_EXTHDR.  
> > 
> > I guess if this is creating backward compatibility issues, this
> > patch should be reverted/fixed. As a kmod upgrade should not break
> > existing deployments. 
> 
> it looks like it does. we can't work with ovs without reverting this.
> can we continue with reverting this commit please?

Sure, can someone ELI5 what the problem is?

What's "kmod upgrade" in this context a kernel upgrade or loading 
a newer module in older kernel? 

How can adding a new nl attr break user space? Does the user space
actually care about the OVS_KEY_ATTR_TUNNEL_INFO wart?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net v4 1/1] openvswitch: Fix setting ipv6 fields causing hw csum failure

2022-02-22 Thread Jakub Kicinski

You'll need to rebase, the patch which made everything force inlined
got merged.

On Sun, 20 Feb 2022 15:21:14 +0200 Paul Blakey wrote:
> +static inline __wsum
> +csum_block_replace(__wsum csum, __wsum old, __wsum new, int offset)
> +{
> + return csum_block_add(csum_block_sub(csum, old, offset), new, offset);

Why still _block? Since the arguments are pre-shifted we should be able
to subtract and add the entire thing, and the math will work out.
Obviously you'd need to shift by the offset in the word, not in a byte
in the caller.

> +}
> +
>  static inline __wsum csum_unfold(__sum16 n)
>  {
>   return (__force __wsum)n;
> @@ -184,4 +190,5 @@ static inline __wsum wsum_negate(__wsum val)
>  {
>   return (__force __wsum)-((__force u32)val);
>  }
> +

Spurious?

>  #endif
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net v3 1/1] openvswitch: Fix setting ipv6 fields causing hw csum failure

2022-02-17 Thread Jakub Kicinski

On Wed, 16 Feb 2022 15:53:08 +0200 Paul Blakey wrote:
> + skb->csum =
> + ~csum_block_add(csum_block_sub(~skb->csum,
> +(__force __wsum) 
> (ipv6_tclass << 4), 1),
> + (__force __wsum) (old_ipv6_tclass << 
> 4), 1);

Please add a wcsum_replace4() helper to include/net/checksum.h
It should work like the (misnamed) csum_relace4() helper and have
similar params just wsum instead of sum16.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net 1/1] net/sched: act_ct: Fix flow table lookup failure with no originating ifindex

2022-02-17 Thread Jakub Kicinski

On Thu, 17 Feb 2022 11:34:24 +0200 Paul Blakey wrote:
> Fixes: 9795ded7f924 ("net/sched: act_ct: Fill offloading tupledx")

Fixes tag: Fixes: 9795ded7f924 ("net/sched: act_ct: Fill offloading tupledx")
Has these problem(s):
- Subject does not match target commit subject
  Just use
git log -1 --format='Fixes: %h ("%s")'
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net v2 1/1] openvswitch: Fix setting ipv6 fields causing hw csum failure

2022-02-14 Thread Jakub Kicinski

On Mon, 14 Feb 2022 14:48:51 +0200 Paul Blakey wrote:
> Ipv6 ttl, label and tos fields are modified without first
> pulling/pushing the ipv6 header, which would have updated
> the hw csum (if available). This might cause csum validation
> when sending the packet to the stack, as can be seen in
> the trace below.
> 
> Fix this by updating skb->csum if available.

> Fixes: 3fdbd1ce11e5 ("openvswitch: add ipv6 'set' action")
> Signed-off-by: Paul Blakey 
> ---
>  Changelog:
> v1->v2:
>   Replaced push pull rcsum checksum calc with actual checksum calc

Well, what I had in mind was an skb helper which would take skb, u32
new_val, u32 old_val, but this works, too.

Please fix the new sparse warnings tho, and..

> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index 076774034bb9..3725801cb040 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -423,12 +423,44 @@ static void set_ipv6_addr(struct sk_buff *skb, u8 
> l4_proto,
>   memcpy(addr, new_addr, sizeof(__be32[4]));
>  }
>  
> -static void set_ipv6_fl(struct ipv6hdr *nh, u32 fl, u32 mask)
> +static void set_ipv6_dsfield(struct sk_buff *skb, struct ipv6hdr *nh, __u8 
> ipv6_tclass, __u8 mask)
>  {
> + __u8 old_ipv6_tclass = ipv6_get_dsfield(nh);

use normal u8, __u8 is for uAPI.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net 1/1] openvswitch: Fix setting ipv6 fields causing hw csum failure

2022-02-10 Thread Jakub Kicinski

On Thu, 10 Feb 2022 10:53:24 +0200 Paul Blakey wrote:
> > The calls seem a little heavy for single byte replacements.
> > Can you instead add a helper based on csum_replace4() maybe?
> > 
> > BTW doesn't pedit have the same problem?
> 
> I don't think they are heavier then csum_replace4,

csum_replace4 is a handful of instructions all of which will be inlined.
csum_partial() is a function call and handles variable lengths.

> but they are more bulletproof in my opinion, since they handle both
> the COMPLETE and PARTIAL csum cases (in __skb_postpull_rcsum())

Yes, that's why I said "add a helper based on", a skb helper which
checks the csum type of the packet but instead of calling csum_partial
for no reason does the adjustment directly.

> and resemble what editing of the packet should have done - pull the
> header, edit, and then push it back.

That's not what this code is doing, so the argument does not stand IMO.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net-next v8] net: openvswitch: IPv6: Add IPv6 extension header support

2022-02-09 Thread Jakub Kicinski

On Wed, 9 Feb 2022 12:46:01 -0800 Cpp Code wrote:
> > ok, I see advantage of using skb_header_pointer() in this case, but
> > replacing all check_header() with skb_header_pointer() would add lot
> > of copy operation in flow extract. Anyways for this use case
> > skb_header_pointer() is fine.
> >
> > Acked-by: Pravin B Shelar   
> 
> Could this be applied please.

Please repost with Pravin's ack included.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net 1/1] openvswitch: Fix setting ipv6 fields causing hw csum failure

2022-02-08 Thread Jakub Kicinski

On Mon, 7 Feb 2022 16:41:01 +0200 Paul Blakey wrote:
> Ipv6 ttl, label and tos fields are modified without first
> pulling/pushing the ipv6 header, which would have updated
> the hw csum (if available). This might cause csum validation
> when sending the packet to the stack, as can be seen in
> the trace below.
> 
> Fix this by calling postpush/postpull checksum calculation
> which will update the hw csum if needed.

> -static void set_ipv6_fl(struct ipv6hdr *nh, u32 fl, u32 mask)
> +static void set_ipv6_dsfield(struct sk_buff *skb, struct ipv6hdr *nh, __u8 
> ipv6_tclass, __u8 mask)
>  {
> + skb_postpull_rcsum(skb, nh, 4);
> +
> + ipv6_change_dsfield(nh, ~mask, ipv6_tclass);
> +
> + skb_postpush_rcsum(skb, nh, 4);
> +}

The calls seem a little heavy for single byte replacements.
Can you instead add a helper based on csum_replace4() maybe?

BTW doesn't pedit have the same problem?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net 1/1] net: openvswitch: Fix ct_state nat flags for conns arriving from tc

2022-01-04 Thread Jakub Kicinski

On Tue, 4 Jan 2022 10:28:21 +0200 Paul Blakey wrote:
> Netfilter conntrack maintains NAT flags per connection indicating
> whether NAT was configured for the connection. Openvswitch maintains
> NAT flags on the per packet flow key ct_state field, indicating
> whether NAT was actually executed on the packet.
> 
> When a packet misses from tc to ovs the conntrack NAT flags are set.
> However, NAT was not necessarily executed on the packet because the
> connection's state might still be in NEW state. As such, openvswitch wrongly
> assumes that NAT was executed and sets an incorrect flow key NAT flags.
> 
> Fix this, by flagging to openvswitch which NAT was actually done in
> act_ct via tc_skb_ext and tc_skb_cb to the openvswitch module, so
> the packet flow key NAT flags will be correctly set.

Fixes ?

> Signed-off-by: Paul Blakey 

> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 4507d77d6941..bab45a009310 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -287,7 +287,9 @@ struct tc_skb_ext {
>   __u32 chain;
>   __u16 mru;
>   __u16 zone;
> - bool post_ct;
> + bool post_ct:1;
> + bool post_ct_snat:1;
> + bool post_ct_dnat:1;

single bit bool variables seem weird, use a unsigned int type, like u8.

>  };
>  #endif
>  
> diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
> index 9e71691c491b..a171dfa91910 100644
> --- a/include/net/pkt_sched.h
> +++ b/include/net/pkt_sched.h
> @@ -197,7 +197,9 @@ struct tc_skb_cb {
>   struct qdisc_skb_cb qdisc_cb;
>  
>   u16 mru;
> - bool post_ct;
> + bool post_ct: 1;

extra space

> + bool post_ct_snat:1;
> + bool post_ct_dnat:1;
>   u16 zone; /* Only valid if post_ct = true */
>  };
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net v2 2/3] net/sched: flow_dissector: Fix matching on zone id for invalid conns

2021-12-10 Thread Jakub Kicinski

On Thu, 9 Dec 2021 09:57:33 +0200 Paul Blakey wrote:
> @@ -238,10 +239,12 @@ void
>  skb_flow_dissect_ct(const struct sk_buff *skb,
>   struct flow_dissector *flow_dissector,
>   void *target_container, u16 *ctinfo_map,
> - size_t mapsize, bool post_ct)
> + size_t mapsize)
>  {
>  #if IS_ENABLED(CONFIG_NF_CONNTRACK)
> + bool post_ct = tc_skb_cb(skb)->post_ct;
>   struct flow_dissector_key_ct *key;
> + u16 zone = tc_skb_cb(skb)->zone;
>   enum ip_conntrack_info ctinfo;
>   struct nf_conn_labels *cl;
>   struct nf_conn *ct;
> @@ -260,6 +263,7 @@ skb_flow_dissect_ct(const struct sk_buff *skb,
>   if (!ct) {
>   key->ct_state = TCA_FLOWER_KEY_CT_FLAGS_TRACKED |
>   TCA_FLOWER_KEY_CT_FLAGS_INVALID;
> + key->ct_zone = zone;
>   return;
>   }
>  

Why is flow dissector expecting skb cb to be TC now?
Please keep the appropriate abstractions intact.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net 0/3] net/sched: Fix ct zone matching for invalid conntrack state

2021-12-08 Thread Jakub Kicinski

On Wed, 8 Dec 2021 19:02:37 +0200 Paul Blakey wrote:
> Currently, when a packet is marked as invalid conntrack_in in act_ct,
> post_ct will be set, and connection info (nf_conn) will be removed
> from the skb. Later openvswitch and flower matching will parse this
> as ct_state=+trk+inv. But because the connection info is missing,
> there is also no zone info to match against even though the packet
> is tracked.
> 
> This series fixes that, by passing the last executed zone by act_ct.
> The zone info is passed along from act_ct to the ct flow dissector
> (used by flower to extract zone info) and to ovs, the same way as post_ct
> is passed, via qdisc layer skb cb to dissector, and via skb extension
> to OVS.
> 
> Since there was no more for BPF skb cb to extend the qdisc skb cb,
> tc info on the qdisc skb cb is moved to a tc specific cb that extend it
> instead of within it (same as BPF).

The last paragraph is missing words.

Please repost and cast a wider net for reviewers, you must CC blamed
authors.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net-next v8] net: openvswitch: IPv6: Add IPv6 extension header support

2021-11-02 Thread Jakub Kicinski

On Fri, 29 Oct 2021 11:20:08 -0700 Toms Atteka wrote:
> This change adds a new OpenFlow field OFPXMT_OFB_IPV6_EXTHDR and
> packets can be filtered using ipv6_ext flag.
> 
> Signed-off-by: Toms Atteka 

Hi! This patch didn't get reviewed in time for the v5.16 merge window,
please continue the work but you'll have to repost in two weeks after
v5.16-rc1 has been cut. If you repost before that point please use RFC
designation.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net-next v6] net: openvswitch: IPv6: Add IPv6 extension header support

2021-09-29 Thread Jakub Kicinski

On Wed, 29 Sep 2021 08:19:05 +0200 Nicolas Dichtel wrote:
> > /* Insert a kernel only KEY_ATTR */
> > #define OVS_KEY_ATTR_TUNNEL_INFO__OVS_KEY_ATTR_MAX
> > #undef OVS_KEY_ATTR_MAX
> > #define OVS_KEY_ATTR_MAX__OVS_KEY_ATTR_MAX  
> Following the other thread [1], this will break if a new app runs over an old
> kernel.

Good point.

> Why not simply expose this attribute to userspace and throw an error if a
> userspace app uses it?

Does it matter if it's exposed or not? Either way the parsing policy
for attrs coming from user space should have a reject for the value.
(I say that not having looked at the code, so maybe I shouldn't...)
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net-next v6] net: openvswitch: IPv6: Add IPv6 extension header support

2021-09-28 Thread Jakub Kicinski

On Tue, 28 Sep 2021 12:47:27 -0700 Toms Atteka wrote:
> diff --git a/include/uapi/linux/openvswitch.h 
> b/include/uapi/linux/openvswitch.h
> index a87b44cd5590..dc6eb5f6399f 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -346,6 +346,13 @@ enum ovs_key_attr {
>  #ifdef __KERNEL__
>   OVS_KEY_ATTR_TUNNEL_INFO,  /* struct ip_tunnel_info */
>  #endif
> +
> +#ifndef __KERNEL__

#else

> + PADDING,  /* Padding so kernel and non kernel field count would match */

The name PADDING seems rather risky, collisions will be likely.
OVS_KEY_ATTR_PADDING maybe?

But maybe we don't need to define this special value and bake it into
the uAPI, why can't we add something like this to the kernel header
(i.e. include/linux/openvswitch.h):

/* Insert a kernel only KEY_ATTR */
#define OVS_KEY_ATTR_TUNNEL_INFO__OVS_KEY_ATTR_MAX
#undef OVS_KEY_ATTR_MAX
#define OVS_KEY_ATTR_MAX__OVS_KEY_ATTR_MAX

> +#endif
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net-next v5] net: openvswitch: IPv6: Add IPv6 extension header support

2021-09-21 Thread Jakub Kicinski

On Mon, 20 Sep 2021 11:20:38 -0700 Toms Atteka wrote:
> This change adds a new OpenFlow field OFPXMT_OFB_IPV6_EXTHDR and
> packets can be filtered using ipv6_ext flag.
> 
> Signed-off-by: Toms Atteka 

Please make sure to check the files you touch with

./scripts/kernel-doc -none

You're adding kdoc warnings by using the /** comments
which are in fact not kdoc-formatted. Please fix and 
repost.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v2] openvswitch: Fix condition check in do_execute_actions() by using nla_ok()

2021-09-17 Thread Jakub Kicinski

On Fri, 17 Sep 2021 08:07:14 + Jiasheng Jiang wrote:
> Just using 'rem > 0' might be unsafe, so it's better
> to use the nla_ok() instead.
> Because we can see from the nla_next() that
> '*remaining' might be smaller than 'totlen'. And nla_ok()
> will avoid it happening.
> For example, ovs_dp_process_packet() -> ovs_execute_actions()
> -> do_execute_actions(), and attr comes from OVS_CB(skb)->input_vport,  
> which restores the received packet from the user space.

Right but that's the call trace for where the actions are executed.
Not where they are constructed.

As far as I can tell the action list is constructed in the kernel 
by __ovs_nla_copy_actions(). Since kernel does the formatting, it
can trust the contents are correct. We normally require nla_ok()
when handling input directly from user space, which is not the 
case in do_execute_actions().

And since kernel is sure that the input is correct the extra checking
just adds to the datapath overhead.

Unless you can point out how exactly the input could be invalid 
at this point I'd suggest we leave this code as is. Perhaps add
a comment explaining why input is trusted.

Thanks!

> Fixes: ccb1352e76cff0524e7ccb2074826a092dd13016
> ('net: Add Open vSwitch kernel components.')

FWIW the correct format would have been:

Fixes: ccb1352e76cf ("net: Add Open vSwitch kernel components.")

> Signed-off-by: Jiasheng Jiang 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] openvswitch: Fix condition check by using nla_ok()

2021-09-16 Thread Jakub Kicinski

On Thu, 16 Sep 2021 07:36:40 -0700 Jakub Kicinski wrote:
> On Thu, 16 Sep 2021 01:43:23 + Jiasheng Jiang wrote:
> > Just using 'rem > 0' might be unsafe, so it's better
> > to use the nla_ok() instead.
> > Because we can see from the nla_next() that
> > '*remaining' might be smaller than 'totlen'. And nla_ok()
> > will avoid it happening.
> > 
> > Signed-off-by: Jiasheng Jiang   
> 
> Are the attributes coming from the user space here or are generated 
> by the kernel / were already validated?  Depending on that this is
> either a fix and needs to be backported or a possible cleanup.
> 
> Please repost with the explanation where attrs come from in the commit
> message, and if it's indeed a bug please add a Fixes tag.

And please use different subject for each patch, otherwise patchwork
bot thinks this is just two versions of the same patch and marks the 
one posted earlier as Superseded.

> If we do need the nla_ok() we should probably also switch to
> nla_for_each_attr() and nla_for_each_nested().

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] openvswitch: Fix condition check by using nla_ok()

2021-09-16 Thread Jakub Kicinski

On Thu, 16 Sep 2021 01:43:23 + Jiasheng Jiang wrote:
> Just using 'rem > 0' might be unsafe, so it's better
> to use the nla_ok() instead.
> Because we can see from the nla_next() that
> '*remaining' might be smaller than 'totlen'. And nla_ok()
> will avoid it happening.
> 
> Signed-off-by: Jiasheng Jiang 

Are the attributes coming from the user space here or are generated 
by the kernel / were already validated?  Depending on that this is
either a fix and needs to be backported or a possible cleanup.

Please repost with the explanation where attrs come from in the commit
message, and if it's indeed a bug please add a Fixes tag.

If we do need the nla_ok() we should probably also switch to
nla_for_each_attr() and nla_for_each_nested().
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [External] [PATCH] ovs: datapath: clear skb->tstamp in forwarding path

2021-08-17 Thread Jakub Kicinski

On Tue, 17 Aug 2021 10:25:24 +0800 范开喜 wrote:
> fq qdisc requires tstamp to be cleared in the forwarding path. Now ovs
> doesn't clear skb->tstamp. We encountered a problem with linux
> version 5.4.56 and ovs version 2.14.1, and packets failed to
> dequeue from qdisc when fq qdisc was attached to ovs port.
> 
> Signed-off-by: kaixi.fan 
> Signed-off-by: xiexiaohui 
> Reviewed-by: Cong Wang 
> ---
>  net/openvswitch/vport.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
> index 88deb5b41429..cf2ce5812489 100644
> --- a/net/openvswitch/vport.c
> +++ b/net/openvswitch/vport.c
> @@ -507,6 +507,7 @@ void ovs_vport_send(struct vport *vport, struct
> sk_buff *skb, u8 mac_proto)
>   }
> 
>   skb->dev = vport->dev;
> + skb->tstamp = 0;
>   vport->ops->send(skb);
>   return;

This patch has been mangled by your email client, please try
git send-email if possible.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net v2] net: openvswitch: fix TTL decrement exception action execution

2020-12-14 Thread Jakub Kicinski

On Mon,  7 Dec 2020 05:08:39 -0500 Eelco Chaudron wrote:
> Currently, the exception actions are not processed correctly as the wrong
> dataset is passed. This change fixes this, including the misleading
> comment.
> 
> In addition, a check was added to make sure we work on an IPv4 packet,
> and not just assume if it's not IPv6 it's IPv4.
> 
> This was all tested using OVS with patch,
> https://patchwork.ozlabs.org/project/openvswitch/list/?series=21639,
> applied and sending packets with a TTL of 1 (and 0), both with IPv4
> and IPv6.
> 
> Fixes: 69929d4c49e1 ("net: openvswitch: fix TTL decrement action netlink 
> message format")
> Signed-off-by: Eelco Chaudron 
> ---
> v2: - Undid unnessesary paramerter removal from dec_ttl_exception_handler()
> - Updated commit message to include testing information.

Applied now, and will send to stable soon-ish.

Thanks!
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net] net: openvswitch: fix TTL decrement exception action execution

2020-12-04 Thread Jakub Kicinski

On Fri,  4 Dec 2020 07:16:23 -0500 Eelco Chaudron wrote:
> Currently, the exception actions are not processed correctly as the wrong
> dataset is passed. This change fixes this, including the misleading
> comment.
> 
> In addition, a check was added to make sure we work on an IPv4 packet,
> and not just assume if it's not IPv6 it's IPv4.
> 
> Small cleanup which removes an unsessesaty parameter from the
> dec_ttl_exception_handler() function.

No cleanups in fixes, please. Especially when we're at -rc6..

You can clean this up in net-next within a week after trees merge.

> Fixes: 69929d4c49e1 ("net: openvswitch: fix TTL decrement action netlink 
> message format")

:( 
and please add some info on how these changes are tested.

Thanks!

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net] openvswitch: fix error return code in validate_and_copy_dec_ttl()

2020-12-04 Thread Jakub Kicinski

On Fri, 04 Dec 2020 13:07:48 +0100 Eelco Chaudron wrote:
> > Fix to return a negative error code from the error handling
> > case instead of 0, as done elsewhere in this function.
> >
> > Changing 'return start' to 'return action_start' can fix this bug.
> >
> > Fixes: 69929d4c49e1 ("net: openvswitch: fix TTL decrement action 
> > netlink message format")
> > Reported-by: Hulk Robot 
> > Signed-off-by: Wang Hai   
> 
> Thanks for fixing!
> 
> Reviewed-by: Eelco Chaudron 

Applied, thanks!
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net] net: openvswitch: fix TTL decrement action netlink message format

2020-11-23 Thread Jakub Kicinski

On Mon, 23 Nov 2020 20:36:39 +0100 Matteo Croce wrote:
> On Fri, Nov 20, 2020 at 10:12 PM Jakub Kicinski  wrote:
> > On Thu, 19 Nov 2020 04:04:04 -0500 Eelco Chaudron wrote:  
> > > Currently, the openvswitch module is not accepting the correctly formated
> > > netlink message for the TTL decrement action. For both setting and getting
> > > the dec_ttl action, the actions should be nested in the
> > > OVS_DEC_TTL_ATTR_ACTION attribute as mentioned in the openvswitch.h uapi. 
> > >  
> >
> > IOW this change will not break any known user space, correct?
> >
> > But existing OvS user space already expects it to work like you
> > make it work now?
> >
> > What's the harm in leaving it as is?
> >  
> > > Fixes: 744676e77720 ("openvswitch: add TTL decrement action")
> > > Signed-off-by: Eelco Chaudron   
> >
> > Can we get a review from OvS folks? Matteo looks good to you (as the
> > original author)?
> 
> I think that the userspace still has to implement the dec_ttl action;
> by now dec_ttl is implemented with set_ttl().
> So there is no breakage yet.
> 
> Eelco, with this fix we will encode the netlink attribute in the same
> way for the kernel and netdev datapath?

We don't allow breaking uAPI. Sounds like the user space never
implemented this and perhaps the nesting is just inconvenient 
but not necessarily broken? If it is broken and unusable that 
has to be clearly explained in the commit message. I'm dropping 
v1 from patchwork.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net] net: openvswitch: fix TTL decrement action netlink message format

2020-11-20 Thread Jakub Kicinski

On Thu, 19 Nov 2020 04:04:04 -0500 Eelco Chaudron wrote:
> Currently, the openvswitch module is not accepting the correctly formated
> netlink message for the TTL decrement action. For both setting and getting
> the dec_ttl action, the actions should be nested in the
> OVS_DEC_TTL_ATTR_ACTION attribute as mentioned in the openvswitch.h uapi.

IOW this change will not break any known user space, correct?

But existing OvS user space already expects it to work like you 
make it work now?

What's the harm in leaving it as is?

> Fixes: 744676e77720 ("openvswitch: add TTL decrement action")
> Signed-off-by: Eelco Chaudron 

Can we get a review from OvS folks? Matteo looks good to you (as the
original author)?

> - err = __ovs_nla_copy_actions(net, attr, key, sfa, eth_type,
> + err = __ovs_nla_copy_actions(net, actions, key, sfa, eth_type,
>vlan_tci, mpls_label_count, log);
>   if (err)
>   return err;

You're not canceling any nests on error, I assume this is normal.

> + add_nested_action_end(*sfa, action_start);
>   add_nested_action_end(*sfa, start);
>   return 0;
>  }

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net-next v2] net: openvswitch: Be liberal in tcp conntrack.

2020-11-20 Thread Jakub Kicinski

On Fri, 20 Nov 2020 07:32:11 +0100 Florian Westphal wrote:
> Jakub Kicinski  wrote:
> > On Mon, 16 Nov 2020 18:31:26 +0530 nusid...@redhat.com wrote:  
> > > From: Numan Siddique 
> > > 
> > > There is no easy way to distinguish if a conntracked tcp packet is
> > > marked invalid because of tcp_in_window() check error or because
> > > it doesn't belong to an existing connection. With this patch,
> > > openvswitch sets liberal tcp flag for the established sessions so
> > > that out of window packets are not marked invalid.
> > > 
> > > A helper function - nf_ct_set_tcp_be_liberal(nf_conn) is added which
> > > sets this flag for both the directions of the nf_conn.
> > > 
> > > Suggested-by: Florian Westphal 
> > > Signed-off-by: Numan Siddique   
> 
> Acked-by: Florian Westphal 

Thanks! Applied.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net-next v2] net: openvswitch: Be liberal in tcp conntrack.

2020-11-19 Thread Jakub Kicinski

On Mon, 16 Nov 2020 18:31:26 +0530 nusid...@redhat.com wrote:
> From: Numan Siddique 
> 
> There is no easy way to distinguish if a conntracked tcp packet is
> marked invalid because of tcp_in_window() check error or because
> it doesn't belong to an existing connection. With this patch,
> openvswitch sets liberal tcp flag for the established sessions so
> that out of window packets are not marked invalid.
> 
> A helper function - nf_ct_set_tcp_be_liberal(nf_conn) is added which
> sets this flag for both the directions of the nf_conn.
> 
> Suggested-by: Florian Westphal 
> Signed-off-by: Numan Siddique 

Florian, LGTY?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [net-next] netfiler: conntrack: Add the option to set ct tcp flag - BE_LIBERAL per-ct basis.

2020-11-09 Thread Jakub Kicinski

On Mon,  9 Nov 2020 12:59:30 +0530 nusid...@redhat.com wrote:
> From: Numan Siddique 
> 
> Before calling nf_conntrack_in(), caller can set this flag in the
> connection template for a tcp packet and any errors in the
> tcp_in_window() will be ignored.
> 
> A helper function - nf_ct_set_tcp_be_liberal(nf_conn) is added which
> sets this flag for both the directions of the nf_conn.
> 
> openvswitch makes use of this feature so that any out of window tcp
> packets are not marked invalid. Prior to this there was no easy way
> to distinguish if conntracked packet is marked invalid because of
> tcp_in_window() check error or because it doesn't belong to an
> existing connection.
> 
> An earlier attempt (see the link) tried to solve this problem for
> openvswitch in a different way. Florian Westphal instead suggested
> to be liberal in openvswitch for tcp packets.
> 
> Link: 
> https://patchwork.ozlabs.org/project/netdev/patch/20201006083355.121018-1-nusid...@redhat.com/
> 
> Suggested-by: Florian Westphal 
> Signed-off-by: Numan Siddique 

Please repost Ccing Pablo & netfilter-devel.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net v2] net: openvswitch: silence suspicious RCU usage warning

2020-11-03 Thread Jakub Kicinski

On Tue,  3 Nov 2020 09:25:49 +0100 Eelco Chaudron wrote:
> Silence suspicious RCU usage warning in ovs_flow_tbl_masks_cache_resize()
> by replacing rcu_dereference() with rcu_dereference_ovsl().
> 
> In addition, when creating a new datapath, make sure it's configured under
> the ovs_lock.
> 
> Fixes: 9bf24f594c6a ("net: openvswitch: make masks cache size configurable")
> Reported-by: syzbot+9a8f8bfcc56e85780...@syzkaller.appspotmail.com
> Signed-off-by: Eelco Chaudron 
> ---
> v2: - Moved local variable initialization above lock
> - Renamed jump label to indicate unlocking

Applied, thanks!
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net-next] openvswitch: Use IS_ERR instead of IS_ERR_OR_NULL

2020-11-02 Thread Jakub Kicinski

On Sat, 31 Oct 2020 14:01:53 +0800 YueHaibing wrote:
> Fix smatch warning:
> 
> net/openvswitch/meter.c:427 ovs_meter_cmd_set() warn: passing zero to 
> 'PTR_ERR'
> 
> dp_meter_create() never returns NULL, use IS_ERR
> instead of IS_ERR_OR_NULL to fix this.
> 
> Signed-off-by: YueHaibing 

Applied, thanks!
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net] net: openvswitch: silence suspicious RCU usage warning

2020-11-02 Thread Jakub Kicinski

On Mon, 02 Nov 2020 09:52:19 +0100 Eelco Chaudron wrote:
> On 30 Oct 2020, at 22:28, Jakub Kicinski wrote:
> >> @@ -1695,6 +1695,9 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, 
> >> struct genl_info *info)
> >>if (err)
> >>goto err_destroy_ports;
> >>
> >> +  /* So far only local changes have been made, now need the lock. */
> >> +  ovs_lock();  
> >
> > Should we move the lock below assignments to param?
> >
> > Looks a little strange to protect stack variables with a global lock.  
> 
> You are right, I should have moved it down after the assignment. I will 
> send out a v2.
> 
> > Let's update the name of the label.  
> 
> Guess now it is, unlock and destroy meters, so what label are you 
> looking for?
> 
> err_unlock_and_destroy_meters: which looks a bit long, or just 
> err_unlock:

I feel like I saw some names like err_unlock_and_destroy_meters in OvS
code, but can't find them in this file right now.

I'd personally go for kist err_unlock, or maybe err_unlock_ovs as is
used in other functions in this file.

But as long as it starts with err_unlock it's fine by me :)
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net] net: openvswitch: silence suspicious RCU usage warning

2020-10-30 Thread Jakub Kicinski

On Thu, 29 Oct 2020 15:53:21 +0100 Eelco Chaudron wrote:
> Silence suspicious RCU usage warning in ovs_flow_tbl_masks_cache_resize()
> by replacing rcu_dereference() with rcu_dereference_ovsl().
> 
> In addition, when creating a new datapath, make sure it's configured under
> the ovs_lock.
> 
> Fixes: 9bf24f594c6a ("net: openvswitch: make masks cache size configurable")
> Reported-by: syzbot+9a8f8bfcc56e85780...@syzkaller.appspotmail.com
> Signed-off-by: Eelco Chaudron 
> ---
>  net/openvswitch/datapath.c   |8 
>  net/openvswitch/flow_table.c |2 +-
>  2 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index 832f898edb6a..020f8539fede 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -1695,6 +1695,9 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct 
> genl_info *info)
>   if (err)
>   goto err_destroy_ports;
>  
> + /* So far only local changes have been made, now need the lock. */
> + ovs_lock();

Should we move the lock below assignments to param?

Looks a little strange to protect stack variables with a global lock.

>   /* Set up our datapath device. */
>   parms.name = nla_data(a[OVS_DP_ATTR_NAME]);
>   parms.type = OVS_VPORT_TYPE_INTERNAL;
> @@ -1707,9 +1710,6 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct 
> genl_info *info)
>   if (err)
>   goto err_destroy_meters;
>  
> - /* So far only local changes have been made, now need the lock. */
> - ovs_lock();
> -
>   vport = new_vport();
>   if (IS_ERR(vport)) {
>   err = PTR_ERR(vport);
> @@ -1725,7 +1725,6 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct 
> genl_info *info)
>   ovs_dp_reset_user_features(skb, info);
>   }
>  
> - ovs_unlock();
>   goto err_destroy_meters;
>   }
>  
> @@ -1742,6 +1741,7 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct 
> genl_info *info)
>   return 0;
>  
>  err_destroy_meters:

Let's update the name of the label.

> + ovs_unlock();
>   ovs_meters_exit(dp);
>  err_destroy_ports:
>   kfree(dp->ports);
> diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
> index f3486a37361a..c89c8da99f1a 100644
> --- a/net/openvswitch/flow_table.c
> +++ b/net/openvswitch/flow_table.c
> @@ -390,7 +390,7 @@ static struct mask_cache *tbl_mask_cache_alloc(u32 size)
>  }
>  int ovs_flow_tbl_masks_cache_resize(struct flow_table *table, u32 size)
>  {
> - struct mask_cache *mc = rcu_dereference(table->mask_cache);
> + struct mask_cache *mc = rcu_dereference_ovsl(table->mask_cache);
>   struct mask_cache *new;
>  
>   if (size == mc->cache_size)
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net v4] net: openvswitch: fix to make sure flow_lookup() is not preempted

2020-10-18 Thread Jakub Kicinski

On Sat, 17 Oct 2020 20:24:51 +0200 Eelco Chaudron wrote:
> The flow_lookup() function uses per CPU variables, which must be called
> with BH disabled. However, this is fine in the general NAPI use case
> where the local BH is disabled. But, it's also called from the netlink
> context. The below patch makes sure that even in the netlink path, the
> BH is disabled.
> 
> In addition, u64_stats_update_begin() requires a lock to ensure one writer
> which is not ensured here. Making it per-CPU and disabling NAPI (softirq)
> ensures that there is always only one writer.
> 
> Fixes: eac87c413bf9 ("net: openvswitch: reorder masks array based on usage")
> Reported-by: Juri Lelli 
> Signed-off-by: Eelco Chaudron 

Applied and queued for 5.9.2. Thanks Eelco!
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net v3] net: openvswitch: fix to make sure flow_lookup() is not preempted

2020-10-16 Thread Jakub Kicinski

On Thu, 15 Oct 2020 19:09:33 +0200 Eelco Chaudron wrote:
> The flow_lookup() function uses per CPU variables, which must be called
> with BH disabled. However, this is fine in the general NAPI use case
> where the local BH is disabled. But, it's also called from the netlink
> context. The below patch makes sure that even in the netlink path, the
> BH is disabled.
> 
> In addition, u64_stats_update_begin() requires a lock to ensure one writer
> which is not ensured here. Making it per-CPU and disabling NAPI (softirq)
> ensures that there is always only one writer.
> 
> Fixes: eac87c413bf9 ("net: openvswitch: reorder masks array based on usage")
> Reported-by: Juri Lelli 
> Signed-off-by: Eelco Chaudron 

Hi Eelco, looks like this doesn't apply after the 5.10 merges.

Please respin on top of the current net.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH 1/9 net-next] net: netdevice.h: sw_netstats_rx_add helper

2020-10-05 Thread Jakub Kicinski

On Mon,  5 Oct 2020 22:34:18 +0200 Fabian Frederick wrote:
> +static inline void dev_sw_netstats_rx_add(struct net_device *dev, unsigned 
> int len)
> +{
> + struct pcpu_sw_netstats *tstats = this_cpu_ptr(dev->tstats);
> +
> + u64_stats_update_begin(>syncp);
> + tstats->rx_bytes += len;
> + tstats->rx_packets++;
> + u64_stats_update_end(>syncp);
> +

checkpatch points out there is an unnecessary empty line here

> +}
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net-next v2 2/2] net: openvswitch: make masks cache size configurable

2020-07-23 Thread Jakub Kicinski

On Thu, 23 Jul 2020 14:58:31 +0200 Eelco Chaudron wrote:
> + if ((size & (size - 1)) != 0 ||

is_power_of_2() ?

> + (size * sizeof(struct mask_cache_entry)) > PCPU_MIN_UNIT_SIZE)
> + return NULL;

> + new->cache_size = size;
> + if (new->cache_size > 0) {
> + cache = __alloc_percpu(sizeof(struct mask_cache_entry) *
> +new->cache_size,

array_size() ?

> +__alignof__(struct mask_cache_entry));
> +

No need for the new line here

> + if (!cache) {
> + kfree(new);
> + return NULL;
> + }
> + }

> + if (size == mc->cache_size || (size & (size - 1)) != 0)

why check "is power of 2" twice?

> @@ -454,7 +516,7 @@ void ovs_flow_tbl_destroy(struct flow_table *table)
>   struct table_instance *ti = rcu_dereference_raw(table->ti);
>   struct table_instance *ufid_ti = rcu_dereference_raw(table->ufid_ti);
>  
> - free_percpu(table->mask_cache);
> + call_rcu(>mask_cache->rcu, mask_cache_rcu_cb);
>   call_rcu(>mask_array->rcu, mask_array_rcu_cb);
>   table_instance_destroy(table, ti, ufid_ti, false);
>  }

This adds a new warning :(

net/openvswitch/flow_table.c:519:24: warning: incorrect type in argument 1 
(different address spaces)
net/openvswitch/flow_table.c:519:24:expected struct callback_head *head
net/openvswitch/flow_table.c:519:24:got struct callback_head [noderef] 
__rcu *
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net-next 2/2] net: openvswitch: make masks cache size configurable

2020-07-22 Thread Jakub Kicinski

On Wed, 22 Jul 2020 10:27:52 +0200 Eelco Chaudron wrote:
> This patch makes the masks cache size configurable, or with
> a size of 0, disable it.
> 
> Reviewed-by: Paolo Abeni 
> Signed-off-by: Eelco Chaudron 

Hi Elco!

This patch adds a bunch of new sparse warnings:

net/openvswitch/flow_table.c:376:23: warning: incorrect type in assignment 
(different address spaces)
net/openvswitch/flow_table.c:376:23:expected struct mask_cache_entry *cache
net/openvswitch/flow_table.c:376:23:got void [noderef] __percpu *
net/openvswitch/flow_table.c:386:25: warning: incorrect type in assignment 
(different address spaces)
net/openvswitch/flow_table.c:386:25:expected struct mask_cache_entry 
[noderef] __percpu *mask_cache
net/openvswitch/flow_table.c:386:25:got struct mask_cache_entry *cache
net/openvswitch/flow_table.c:411:27: warning: incorrect type in assignment 
(different address spaces)
net/openvswitch/flow_table.c:411:27:expected struct mask_cache [noderef] 
__rcu *mask_cache
net/openvswitch/flow_table.c:411:27:got struct mask_cache *
net/openvswitch/flow_table.c:440:35: warning: incorrect type in argument 1 
(different address spaces)
net/openvswitch/flow_table.c:440:35:expected struct mask_cache *mc
net/openvswitch/flow_table.c:440:35:got struct mask_cache [noderef] __rcu 
*mask_cache
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH net 08/16] openvswitch: add missing attribute validation for hash

2020-03-02 Thread Jakub Kicinski

Add missing attribute validation for OVS_PACKET_ATTR_HASH
to the netlink policy.

Fixes: bd1903b7c459 ("net: openvswitch: add hash info to upcall")
Signed-off-by: Jakub Kicinski 
---
CC: Pravin B Shelar 
CC: Tonghao Zhang 
CC: d...@openvswitch.org
---
 net/openvswitch/datapath.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index c047afd12116..07a7dd185995 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -645,6 +645,7 @@ static const struct nla_policy 
packet_policy[OVS_PACKET_ATTR_MAX + 1] = {
[OVS_PACKET_ATTR_ACTIONS] = { .type = NLA_NESTED },
[OVS_PACKET_ATTR_PROBE] = { .type = NLA_FLAG },
[OVS_PACKET_ATTR_MRU] = { .type = NLA_U16 },
+   [OVS_PACKET_ATTR_HASH] = { .type = NLA_U64 },
 };
 
 static const struct genl_ops dp_packet_genl_ops[] = {
-- 
2.24.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH net] net: openvswitch: free vport unless register_netdevice() succeeds

2019-10-22 Thread Jakub Kicinski

On Mon, 21 Oct 2019 12:01:57 +0200, Stefano Brivio wrote:
> From: Hillf Danton 
> 
> syzbot found the following crash on:
 
> The function in net core, register_netdevice(), may fail with vport's
> destruction callback either invoked or not. After commit 309b66970ee2,

I've added the correct commit quote here, please heed checkpatch'es
warnings.

> the duty to destroy vport is offloaded from the driver OTOH, which ends
> up in the memory leak reported.
> 
> It is fixed by releasing vport unless device is registered successfully.
> To do that, the callback assignment is defered until device is registered.
> 
> Reported-by: syzbot+13210896153522fe1...@syzkaller.appspotmail.com
> Fixes: 309b66970ee2 ("net: openvswitch: do not free vport if 
> register_netdevice() is failed.")
> Cc: Taehee Yoo 
> Cc: Greg Rose 
> Cc: Eric Dumazet 
> Cc: Marcelo Ricardo Leitner 
> Cc: Ying Xue 
> Cc: Andrey Konovalov 
> Signed-off-by: Hillf Danton 
> Acked-by: Pravin B Shelar 
> [sbrivio: this was sent to d...@openvswitch.org and never made its way
>  to netdev -- resending original patch]
> Signed-off-by: Stefano Brivio 
> ---
> This patch was sent to d...@openvswitch.org and appeared on netdev
> only as Pravin replied to it, giving his Acked-by. I contacted the
> original author one month ago requesting to resend this to netdev,
> but didn't get an answer, so I'm now resending the original patch.

Applied and queued for 4.14+, thanks!
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] AF_XDP with QoS support question

2019-06-08 Thread Jakub Kicinski

On Fri, 7 Jun 2019 19:55:34 -0700, William Tu wrote:
> Hi,
> 
> When using AF_XDP, the TC qdisc layer is by-passed and packets go to
> userspace directly. One problem is that there is no QoS support when
> using AF_XDP.
> 
> For egress shaping, I'm thinking about using tc-mqprio, which has
> hardware offload support. And for OVS, we can add tc-mqprio support.

What is your end game?  Once upon the time Simon was explaining the QoS
stuff in OvS to me, but IIRC it used CBQ and/or HTB.  The XDP TX queues
are not exposed to the stack, so we can't set per-queue QoS, setting a
root Qdisc (like mqprio) and expecting the XDP queues to have the same
settings would be very limiting (then again even with mqprio IDK how
you'd select the prio? by using the TX queue ID? hm..).

> For ingress policing, I don't know how to do it. Is there an hardware
> offload ingress policing support?

There is support for act_police in a couple drivers.  Although using it
per queue could be a challenge...  (At least we do have a real queue ID
on the RX, hopefully the mlx5 fake queues never get merged.)
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

94 matches

Mail list logo