Eelco Chaudron <[email protected]> writes: > On 22 Mar 2024, at 20:06, Aaron Conole wrote: > >> Open vSwitch is originally intended to switch at layer 2, only dealing with >> Ethernet frames. With the introduction of l3 tunnels support, it crossed >> into the realm of needing to care a bit about some routing details when >> making forwarding decisions. If an oversized packet would need to be >> fragmented during this forwarding decision, there is a chance for pmtu >> to get involved and generate a routing exception. This is gated by the >> skbuff->pkt_type field. >> >> When a flow is already loaded into the openvswitch module this field is >> set up and transitioned properly as a packet moves from one port to >> another. In the case that a packet execute is invoked after a flow is >> newly installed this field is not properly initialized. This causes the >> pmtud mechanism to omit sending the required exception messages across >> the tunnel boundary and a second attempt needs to be made to make sure >> that the routing exception is properly setup. To fix this, we set the >> outgoing packet's pkt_type to PACKET_OUTGOING, since it can only get >> to the openvswitch module via a port device or packet command. > > Is this not a problem when the packet comes from the bridge port in the > kernel?
It very well may be an issue there as well, but the recommendation is to operate with the bridge port down as far as I know, so I don't know if this issue has been observed happening from the bridge port. Since I will spin a v2 with a comment, do you want me to mention something about the bridge port? >> This issue is periodically encountered in complex setups, such as large >> openshift deployments, where multiple sets of tunnel traversal occurs. >> A way to recreate this is with the ovn-heater project that can setup >> a networking environment which mimics such large deployments. In that >> environment, without this patch, we can see: >> >> ./ovn_cluster.sh start >> podman exec ovn-chassis-1 ip r a 170.168.0.5/32 dev eth1 mtu 1200 >> podman exec ovn-chassis-1 ip netns exec sw01p1 ip r flush cache >> podman exec ovn-chassis-1 ip netns exec sw01p1 ping 21.0.0.3 -M do -s 1300 >> -c2 >> PING 21.0.0.3 (21.0.0.3) 1300(1328) bytes of data. >> From 21.0.0.3 icmp_seq=2 Frag needed and DF set (mtu = 1142) >> >> --- 21.0.0.3 ping statistics --- >> 2 packets transmitted, 0 received, +1 errors, 100% packet loss, time 1017ms >> >> Using tcpdump, we can also see the expected ICMP FRAG_NEEDED message is not >> sent into the server. >> >> With this patch, setting the pkt_type, we see the following: >> >> podman exec ovn-chassis-1 ip netns exec sw01p1 ping 21.0.0.3 -M do -s 1300 >> -c2 >> PING 21.0.0.3 (21.0.0.3) 1300(1328) bytes of data. >> From 21.0.0.3 icmp_seq=1 Frag needed and DF set (mtu = 1222) >> ping: local error: message too long, mtu=1222 >> >> --- 21.0.0.3 ping statistics --- >> 2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1061ms >> >> In this case, the first ping request receives the FRAG_NEEDED message and >> a local routing exception is created. >> >> Reported-at: https://issues.redhat.com/browse/FDP-164 >> Fixes: 58264848a5a7 ("openvswitch: Add vxlan tunneling support.") >> Signed-off-by: Aaron Conole <[email protected]> >> --- >> NOTE: An alternate approach would be to add a netlink attribute to preserve >> pkt_type across the kernel->user boundary, but that does require some >> userspace cooperation. > > I prefer the method in this patch, as it requires no userspace change, > i.e. it will work even with older versions of OVS without the need for > backports. Yes - that was my thinking as well. >> net/openvswitch/actions.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c >> index 6fcd7e2ca81fe..952c6292100d0 100644 >> --- a/net/openvswitch/actions.c >> +++ b/net/openvswitch/actions.c >> @@ -936,6 +936,8 @@ static void do_output(struct datapath *dp, struct >> sk_buff *skb, int out_port, >> pskb_trim(skb, ovs_mac_header_len(key)); >> } >> >> + skb->pkt_type = PACKET_OUTGOING; >> + > > Maybe add a comment based on the large explanation above? Okay - I can add one. >> if (likely(!mru || >> (skb->len <= mru + vport->dev->hard_header_len))) { >> ovs_vport_send(vport, skb, ovs_key_mac_proto(key)); >> -- >> 2.41.0 _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
