From: Leon Schuermann <leon@is.currently.online>

Add an optional function `ndo_lookup_mtu` to `struct net_device_ops`.
It lets other parts of the network stack ask the destination netdevice
for the MTU permitted for a packet. The lookup is done on a per-packet
basis, with the `struct sk_buff` holding the packet contents passed to
the device.

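As a rough illustration of how a caller might consult such a hook, here
is a minimal userspace sketch. The structures are simplified stand-ins
for the kernel types, and skb_effective_mtu() as well as the two
example_* hooks are hypothetical names that are not part of this patch.
The sketch assumes the caller falls back to the device MTU when the hook
is missing or returns a non-positive value such as -ENODATA.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified userspace stand-ins, not the real kernel layouts. */
struct sk_buff {
	unsigned int len;
};

struct net_device;

struct net_device_ops {
	/* Optional hook with the signature proposed here; may be NULL. */
	int (*ndo_lookup_mtu)(const struct sk_buff *skb,
			      const struct net_device *dev);
};

struct net_device {
	unsigned int mtu;			/* default device MTU */
	const struct net_device_ops *netdev_ops;
};

/* Hypothetical hook: makes no statement about the MTU. */
static int example_lookup_nodata(const struct sk_buff *skb,
				 const struct net_device *dev)
{
	(void)skb;
	(void)dev;
	return -ENODATA;
}

/* Hypothetical hook: suggests a fixed MTU of 1280 for every packet. */
static int example_lookup_1280(const struct sk_buff *skb,
			       const struct net_device *dev)
{
	(void)skb;
	(void)dev;
	return 1280;
}

/*
 * Hypothetical caller-side helper: return the MTU to apply to skb when
 * it is sent through dev. Assumption of this sketch: any non-positive
 * return value is treated like -ENODATA.
 */
static unsigned int skb_effective_mtu(const struct sk_buff *skb,
				      const struct net_device *dev)
{
	const struct net_device_ops *ops = dev->netdev_ops;
	int mtu;

	if (!ops || !ops->ndo_lookup_mtu)
		return dev->mtu;	/* hook not implemented */

	mtu = ops->ndo_lookup_mtu(skb, dev);
	if (mtu <= 0)
		return dev->mtu;	/* no statement about the MTU */

	return (unsigned int)mtu;
}
```

Because a caller is free to cache the result, a helper like this would
not necessarily run for every packet.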
The information obtained through this method may be cached by other
parts of the network stack, for instance by the path MTU discovery
(PMTUD) mechanism. It is not guaranteed that this function will be
called for every packet, nor even that it is called for any single
packet of a given flow. When this function is not implemented, or when
it returns -ENODATA, no statement about the permitted MTU is made and
the networking stack falls back to the device MTU values.

These properties make this mechanism capable of providing a "suggestion"
for a packet's MTU that deviates from the default device MTU. The device
is allowed to announce MTU values lower or higher than the minimum and
maximum device MTU respectively. Whether such MTU values will be
respected is up to the implementation.

Still, even though this mechanism is mandatory neither to implement nor
to respect, it has some interesting consequences. Being able to inspect
the entire packet buffer, the destination netdevice implementation can
control MTUs at flow granularity. For instance, it could be used to
allow two devices on a shared Ethernet segment to communicate with each
other using a large (> 1500 byte) MTU, while using a lower MTU for other
devices.

The immediate motivation for these changes provides another example of
this mechanism being useful: when using WireGuard, peers can reside
behind paths of varying MTU restrictions. PMTUD does not work across
these tunnel links, however, as WireGuard cannot accept unauthenticated
ICMP responses. Thus it will continue to send packets that are too large
over lower-MTU links. With this mechanism WireGuard can reduce the MTU
at per-peer granularity, without limiting the overall device MTU.
Furthermore, it can employ in-band PMTUD mechanisms to resolve these
values automatically.

While an MTU metric can be set for specific FIB routes and thus lower
the MTU for individual peers, as a consequence this completely disables
PMTUD on the entire route.
While regular PMTUD does not work over the tunnel link, it should still
be usable on the rest of the route. Furthermore, when employing an
in-band per-peer PMTUD mechanism, modifying the FIB to store the
detected MTU is inelegant at best.

Signed-off-by: Leon Schuermann <leon@is.currently.online>
---
 include/linux/netdevice.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 7c3da0e1ea9d..d9d59b756f57 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1279,6 +1279,16 @@ struct netdev_net_notifier {
  * struct net_device *(*ndo_get_peer_dev)(struct net_device *dev);
  *	If a device is paired with a peer device, return the peer instance.
  *	The caller must be under RCU read context.
+ * int (*ndo_lookup_mtu)(const struct sk_buff *skb,
+ *			  const struct net_device *dev);
+ *	For devices supporting dynamic lookup of the MTU for individual
+ *	skb packets, this function returns the MTU for the passed skb.
+ *	A return value of -ENODATA must be treated as if the device does
+ *	not support this feature. It is not guaranteed that this function will
+ *	be called for every packet presented to the ndo_start_xmit function.
+ *	A device must always accept packets of the announced min/max device MTU.
+ *	This function may be used to potentially allow MTU sizes lower/higher
+ *	than the min/max device MTU respectively.
  */
 struct net_device_ops {
 	int			(*ndo_init)(struct net_device *dev);
@@ -1487,6 +1497,8 @@ struct net_device_ops {
 	int			(*ndo_tunnel_ctl)(struct net_device *dev,
 						  struct ip_tunnel_parm *p, int cmd);
 	struct net_device *	(*ndo_get_peer_dev)(struct net_device *dev);
+	int			(*ndo_lookup_mtu)(const struct sk_buff *skb,
+						  const struct net_device *dev);
 };
 
 /**
-- 
2.33.1
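
For illustration only, the per-peer use case from the commit message
could look roughly like the following on the driver side. This is a
userspace sketch under loud assumptions: the structures are simplified
stand-ins, the pre-parsed `daddr` field replaces real header parsing,
and `example_peer`, `example_tunnel` and example_tunnel_lookup_mtu() are
hypothetical names. It is not WireGuard code and not part of this patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins; the real kernel structures look different. */
struct sk_buff {
	uint32_t daddr;		/* hypothetical: pre-parsed IPv4 destination */
};

struct net_device {
	unsigned int mtu;	/* default device MTU */
	const void *priv;	/* driver-private data */
};

/* Hypothetical per-peer state of a tunnel device. */
struct example_peer {
	uint32_t addr;		/* address the peer is reachable under */
	int mtu;		/* discovered path MTU, 0 if unknown */
};

struct example_tunnel {
	const struct example_peer *peers;
	size_t n_peers;
};

/*
 * Hypothetical hook matching the proposed ndo_lookup_mtu signature:
 * return the MTU recorded for the peer the packet is destined to, or
 * -ENODATA when no peer matches or no MTU has been discovered yet.
 */
static int example_tunnel_lookup_mtu(const struct sk_buff *skb,
				     const struct net_device *dev)
{
	const struct example_tunnel *tun = dev->priv;
	size_t i;

	for (i = 0; i < tun->n_peers; i++) {
		const struct example_peer *peer = &tun->peers[i];

		if (peer->addr == skb->daddr && peer->mtu > 0)
			return peer->mtu;
	}

	return -ENODATA;	/* caller falls back to the device MTU */
}
```

Returning -ENODATA for unknown peers keeps the device MTU in effect for
them, so only peers with a discovered path MTU are affected.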