On a system running a 3.10-derived kernel along with out-of-tree OVS from around May 2017, we're getting "nf_ct_tcp: bad TCP checksum" errors when handling network traffic from certain applications on another system.
One thing all the error packets have in common is that they are short. With a zero-length TCP payload and no IP options, they are shorter than the minimum Ethernet frame of 64 bytes (including FCS). For example, a bare ACK without TCP options is 14 bytes of Ethernet header + 20 of IP + 20 of TCP + 4 of FCS = 58 bytes, so the sender pads it with 6 bytes. So on the system running OVS, they arrive with some trailing padding (0x00 bytes).

In the normal receive path, ip_rcv() trims the packet to iph->tot_len before invoking the NF_INET_PRE_ROUTING hooks (including conntrack). Any subsequent L3+ processing, such as nf_checksum(), can then simply use skb->len as the length of the packet rather than referring back to iph->tot_len.

This trimming does not seem to happen in the OVS conntrack path, so the checksum verification in tcp_error() fails. (The extra 0x00 bytes themselves don't change the checksum, but the length in the IP pseudoheader does. That length is derived from skb->len, so without trimming it doesn't match the length the sender used when computing the checksum.)

This wasn't an issue until OVS conntrack started passing hooknum==NF_INET_PRE_ROUTING to nf_conntrack_in(), specifically to validate L4 checksums (https://github.com/openvswitch/ovs/commit/4a777f56ca). Conntrack only verifies the TCP checksum when hooknum is NF_INET_PRE_ROUTING, so before that change the padded packets were never checked. It's possible that conntrack in newer kernels isn't confused by padding, or removes it some other way, but unfortunately we're stuck with 3.10.

Does this diagnosis make sense? Any suggestions on the right place to trim padding in the OVS conntrack path?

--Ed

_______________________________________________
dev mailing list
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
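A small userspace sketch of the length arithmetic and the pseudoheader effect described in the post. It is a model, not kernel code: all function names (eth_pad_len(), tcp_csum(), tcp_csum_ok(), trim_to_tot_len(), build_padded_ack()) are hypothetical, the checksum is computed the plain RFC 1071 way rather than with the kernel's csum helpers, and the addresses and ports are arbitrary. It builds a bare ACK with 6 bytes of Ethernet padding left on the end, checks the TCP checksum using the buffer length (as conntrack effectively does via skb->len), and then repeats the check after trimming to iph->tot_len the way ip_rcv() does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Header sizes for the packets in question: no IP options, no TCP options, no payload. */
enum {
    ETH_HDR_LEN = 14,   /* dst MAC + src MAC + EtherType (untagged) */
    IPV4_HDR_LEN = 20,  /* IHL = 5 */
    TCP_HDR_LEN = 20,   /* data offset = 5 */
    ETH_FCS_LEN = 4,
    ETH_MIN_FRAME = 64  /* minimum Ethernet frame, FCS included */
};

static uint16_t get16(const uint8_t *p) { return (uint16_t)(p[0] << 8 | p[1]); }
static uint32_t get32(const uint8_t *p) { return (uint32_t)get16(p) << 16 | get16(p + 2); }
static void put16(uint8_t *p, uint16_t v) { p[0] = (uint8_t)(v >> 8); p[1] = (uint8_t)v; }
static void put32(uint8_t *p, uint32_t v) { put16(p, (uint16_t)(v >> 16)); put16(p + 2, (uint16_t)v); }

/* Zero bytes the sender must append so the frame reaches ETH_MIN_FRAME. */
static size_t eth_pad_len(size_t l3_len)
{
    size_t frame = ETH_HDR_LEN + l3_len + ETH_FCS_LEN;
    return frame < ETH_MIN_FRAME ? ETH_MIN_FRAME - frame : 0;
}

/* Accumulate bytes as big-endian 16-bit words (RFC 1071 style). */
static uint32_t csum_add_bytes(uint32_t sum, const uint8_t *p, size_t len)
{
    size_t i;
    for (i = 0; i + 1 < len; i += 2)
        sum += get16(p + i);
    if (len & 1)
        sum += (uint32_t)p[len - 1] << 8; /* odd trailing byte */
    return sum;
}

/* Fold the carries back in and take the one's complement. */
static uint16_t csum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* TCP checksum over pseudoheader + segment. l4_len is used both as the
 * pseudoheader length and as the number of bytes summed. */
static uint16_t tcp_csum(uint32_t saddr, uint32_t daddr, const uint8_t *seg, size_t l4_len)
{
    uint32_t sum = (saddr >> 16) + (saddr & 0xffff) + (daddr >> 16) + (daddr & 0xffff);
    sum += 6;                /* IPPROTO_TCP */
    sum += (uint32_t)l4_len; /* pseudoheader TCP length */
    return csum_fold(csum_add_bytes(sum, seg, l4_len));
}

/* Receive-side check that takes the L4 length from the buffer length
 * (skb->len) instead of iph->tot_len. Returns 1 if the checksum verifies. */
static int tcp_csum_ok(const uint8_t *l3, size_t l3_len)
{
    assert(l3_len >= (size_t)(IPV4_HDR_LEN + TCP_HDR_LEN));
    return tcp_csum(get32(l3 + 12), get32(l3 + 16), l3 + IPV4_HDR_LEN,
                    l3_len - IPV4_HDR_LEN) == 0;
}

/* Model of ip_rcv()'s trim: cut the buffer to iph->tot_len. (ip_rcv() drops
 * packets shorter than tot_len; this sketch just never grows them.) */
static size_t trim_to_tot_len(const uint8_t *l3, size_t l3_len)
{
    size_t tot_len = get16(l3 + 2);
    return tot_len < l3_len ? tot_len : l3_len;
}

/* Build a bare ACK (40 bytes of IPv4+TCP) followed by pad zero bytes. buf must
 * hold IPV4_HDR_LEN + TCP_HDR_LEN + pad bytes. Returns the buffer length. */
static size_t build_padded_ack(uint8_t *buf, uint32_t saddr, uint32_t daddr, size_t pad)
{
    const size_t tot_len = IPV4_HDR_LEN + TCP_HDR_LEN;
    uint8_t *tcp = buf + IPV4_HDR_LEN;

    memset(buf, 0, tot_len + pad);     /* also zeroes the trailing padding */
    buf[0] = 0x45;                     /* version 4, IHL 5 */
    put16(buf + 2, (uint16_t)tot_len); /* iph->tot_len excludes the padding */
    buf[8] = 64;                       /* TTL */
    buf[9] = 6;                        /* IPPROTO_TCP */
    put32(buf + 12, saddr);
    put32(buf + 16, daddr);
    put16(buf + 10, csum_fold(csum_add_bytes(0, buf, IPV4_HDR_LEN)));

    put16(tcp + 0, 12345);             /* source port (arbitrary) */
    put16(tcp + 2, 80);                /* destination port (arbitrary) */
    tcp[12] = 0x50;                    /* data offset 5 */
    tcp[13] = 0x10;                    /* ACK */
    put16(tcp + 14, 65535);            /* window */
    /* The sender sums exactly TCP_HDR_LEN bytes, with the field still zero. */
    put16(tcp + 16, tcp_csum(saddr, daddr, tcp, TCP_HDR_LEN));
    return tot_len + pad;
}
```

Driven from a small main(), eth_pad_len(40) gives 6, tcp_csum_ok() rejects the 46-byte padded buffer, and it accepts the same buffer once cut to the 40 bytes returned by trim_to_tot_len(). In the kernel, the trimming step corresponds to the pskb_trim_rcsum() call in ip_rcv(); the sketch ignores skb->ip_summed, which pskb_trim_rcsum() also has to handle.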
