On 3/22/24 17:36, Mike Pattrick wrote: > On Fri, Mar 22, 2024 at 10:41 AM Ilya Maximets <[email protected]> wrote: >> >> Recirculation involves re-parsing the packet from scratch and that >> process is not aware of multiple header levels nor the inner/outer >> offsets. So, it overwrites offsets with new ones from the outermost >> headers and sets offloading flags that change their meaning when >> the packet is marked for tunnel offloading. >> >> For example: >> >> 1. TCP packet enters OVS. >> 2. TCP packet gets encapsulated into UDP tunnel. >> 3. Recirculation happens. >> 4. Packet is re-parsed after recirculation with miniflow_extract() >> or similar function. >> 5. Packet is marked for UDP checksumming because we parse the >> outermost set of headers. But since it is tunneled, it means >> inner UDP checksumming. And that makes no sense, because the >> inner packet is TCP. >> >> This is causing packet drops due to malformed packets or even >> assertions and crashes in the code that is trying to fixup checksums >> for packets using incorrect metadata: >> >> SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior >> >> lib/packets.c:2061:15: runtime error: >> member access within null pointer of type 'struct udp_header' >> >> 0 0xbe5221 in packet_udp_complete_csum lib/packets.c:2061:15 >> 1 0x7e5662 in dp_packet_ol_send_prepare lib/dp-packet.c:638:9 >> 2 0x96ef89 in netdev_send lib/netdev.c:940:9 >> 3 0x818e94 in dp_netdev_pmd_flush_output_on_port lib/dpif-netdev.c:5577:9 >> 4 0x817606 in dp_netdev_pmd_flush_output_packets lib/dpif-netdev.c:5618:27 >> 5 0x81cfa5 in dp_netdev_process_rxq_port lib/dpif-netdev.c:5677:9 >> 6 0x7eefe4 in dpif_netdev_run lib/dpif-netdev.c:7001:25 >> 7 0x610e87 in type_run ofproto/ofproto-dpif.c:367:9 >> 8 0x5b9e80 in ofproto_type_run ofproto/ofproto.c:1879:31 >> 9 0x55bbb4 in bridge_run__ vswitchd/bridge.c:3281:9 >> 10 0x558b6b in bridge_run vswitchd/bridge.c:3346:5 >> 11 0x591dc5 in main vswitchd/ovs-vswitchd.c:130:9 >> 12 0x172b89 in __libc_start_call_main (/lib64/libc.so.6+0x27b89) >> 13 0x172c4a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x27c4a) >> 14 0x47eff4 in _start (vswitchd/ovs-vswitchd+0x47eff4) >> >> Tests added for both IPv4 and IPv6 cases. Though IPv6 test doesn't >> trigger the issue it's better to have a symmetric test. >> >> Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.") >> Reported-at: >> https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/053014.html >> Signed-off-by: Ilya Maximets <[email protected]> >> --- > > I have tested this, and it does fix the segfault here. > > Acked-by: Mike Pattrick <[email protected]> >
Thanks! Applied and backported to 3.3. We can think of alternative solutions on top of this fix, but I don't see any easy ones that would cover all the cases for now. Best regards, Ilya Maximets. _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
