Ilya Maximets <[email protected]> writes:

> On 6/14/23 21:08, Ilya Maximets wrote:
>> On 6/14/23 20:11, Paolo Valerio wrote:
>>> Ilya Maximets <[email protected]> writes:
>>>
>>>> On 6/12/23 16:57, Aaron Conole wrote:
>>>>> Paolo Valerio <[email protected]> writes:
>>>>>
>>>>>> since a27d70a89 ("conntrack: add generic IP protocol support") all
>>>>>> the unrecognized IP protocols get handled using ct_proto_other ops
>>>>>> and are managed as L3 using 3 tuples.
>>>>>>
>>>>>> This patch stores L4 information for SCTP in the conn_key so that
>>>>>> multiple conn instances, instead of one with ports zeroed, will be
>>>>>> created when there are multiple SCTP connections between two hosts.
>>>>>> It also performs crc32c check when not offloaded, and adds SCTP to
>>>>>> pat_enabled.
>>>>>>
>>>>>> With this patch, given two SCTP associations between two hosts,
>>>>>> tracking the connection will result in:
>>>>>>
>>>>>> sctp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=55884,dport=5201),reply=(src=10.1.1.1,dst=10.1.1.2,sport=5201,dport=12345),zone=1
>>>>>> sctp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=59874,dport=5202),reply=(src=10.1.1.1,dst=10.1.1.2,sport=5202,dport=12346),zone=1
>>>>>>
>>>>>> instead of:
>>>>>>
>>>>>> sctp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=0,dport=0),reply=(src=10.1.1.1,dst=10.1.1.2,sport=0,dport=0),zone=1
>>>>>>
>>>>>> Signed-off-by: Paolo Valerio <[email protected]>
>>>>>> ---
>>>>>
>>>>> Thanks for this work - I think it looks good.
>>>>>
>>>>> Perhaps it should have a NEWS item mentioning that the userspace
>>>>> conntrack now supports matching SCTP l4 data.
>>>>>
>>>>> If you do spin a v4 with that change, you can keep my:
>>>>>
>>>>> Acked-by: Aaron Conole <[email protected]>
>>>>
>>>> Hi, Paolo and Aaron.
>>>>
>>>> I'm getting a consistent test failure while running check-kernel
>>>> on Ubuntu 22.10 with 5.19 kernel:
>>>>
>>>>
>>>> ./system-traffic.at:4754: cat ofctl_monitor.log
>>>> --- - 2023-06-14 11:26:41.958591125 +0000
>>>> +++ /root/ovs/tests/system-kmod-testsuite.dir/at-groups/105/stdout
>>>> 2023-06-14 11:26:41.952000000 +0000
>>>> @@ -12,8 +12,6 @@
>>>>
>>>> sctp,vlan_tci=0x0000,dl_src=e6:66:c1:22:22:22,dl_dst=e6:66:c1:11:11:11,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=2,nw_ttl=64,nw_frag=no,tp_src=12345,tp_dst=54969
>>>> sctp_csum:9b67e853
>>>> NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=54 in_port=1 (via action)
>>>> data_len=54 (unbuffered)
>>>>
>>>> sctp,vlan_tci=0x0000,dl_src=e6:66:c1:11:11:11,dl_dst=e6:66:c1:22:22:22,nw_src=10.1.1.240,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=2,nw_ttl=64,nw_frag=no,tp_src=34567,tp_dst=12345
>>>> sctp_csum:bc0e5463
>>>> -NXT_PACKET_IN2 (xid=0x0): table_id=1 cookie=0x0 total_len=50
>>>> ct_state=est|rpl|trk|dnat,ct_zone=1,ct_nw_src=10.1.1.1,ct_nw_dst=10.1.1.2,ct_nw_proto=132,ct_tp_src=54969,ct_tp_dst=12345,ip,in_port=2
>>>> (via action) data_len=50 (unbuffered)
>>>> -sctp,vlan_tci=0x0000,dl_src=e6:66:c1:22:22:22,dl_dst=e6:66:c1:11:11:11,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=2,nw_ttl=64,nw_frag=no,tp_src=12345,tp_dst=54969
>>>> sctp_csum:d6ce6b9e
>>>> NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=50 in_port=1 (via action)
>>>> data_len=50 (unbuffered)
>>>> -sctp,vlan_tci=0x0000,dl_src=e6:66:c1:11:11:11,dl_dst=e6:66:c1:22:22:22,nw_src=10.1.1.240,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=2,nw_ttl=64,nw_frag=no,tp_src=34567,tp_dst=12345
>>>> sctp_csum:add7db93
>>>> +sctp,vlan_tci=0x0000,dl_src=e6:66:c1:11:11:11,dl_dst=e6:66:c1:22:22:22,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=2,nw_ttl=64,nw_frag=no,tp_src=54969,tp_dst=12345
>>>> sctp_csum:5db68ce
>>>>
>>>>
>>>> Do you know what can be a problem here?
>>>>
>>>> Test is passing on Fedora 38 with 6.3 kernel and on rhel 9.2.
>>>>
>>>
>>> Hi Ilya,
>>>
>>> Uhm, it seems there's a problem with the shutdown sequence.
>>> I just ran the test on a VM:
>>>
>>> vagrant@ubuntu2210:~/ovs$ grep CONFIG_NF_CT_PROTO_SCTP
>>> /boot/config-5.19.0-38-generic
>>> CONFIG_NF_CT_PROTO_SCTP=y
>>>
>>> vagrant@ubuntu2210:~/ovs$ grep VERSION /etc/os-release
>>> VERSION_ID="22.10"
>>> VERSION="22.10 (Kinetic Kudu)"
>>> VERSION_CODENAME=kinetic
>>>
>>> vagrant@ubuntu2210:~/ovs$ uname -r
>>> 5.19.0-38-generic
>>
>> The only difference with my VM is that I have -43-generic kernel.
>>
>>>
>>> but I can't see the failure.
>>> Any chance to see if they are marked for some reason as invalid?
>>
>> I dumped conntrack after every packet and here is what I see:
>>
>> On RHEL9, where test is working:
>>
>> 1. sctp 132 9 CLOSED src=10.1.1.1 dst=10.1.1.2 sport=54969
>> dport=12345 [UNREPLIED] src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567
>> mark=0 zone=1 use=1
>> 2. sctp 132 2 COOKIE_WAIT src=10.1.1.1 dst=10.1.1.2 sport=54969
>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 mark=0
>> zone=1 use=1
>> 3. sctp 132 2 COOKIE_ECHOED src=10.1.1.1 dst=10.1.1.2 sport=54969
>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 mark=0
>> zone=1 use=1
>> 4. sctp 132 431999 ESTABLISHED src=10.1.1.1 dst=10.1.1.2 sport=54969
>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 [ASSURED]
>> mark=0 zone=1 use=1
>> 5. sctp 132 431999 ESTABLISHED src=10.1.1.1 dst=10.1.1.2 sport=54969
>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 [ASSURED]
>> mark=0 zone=1 use=1
>> 6. sctp 132 431999 ESTABLISHED src=10.1.1.1 dst=10.1.1.2 sport=54969
>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 [ASSURED]
>> mark=0 zone=1 use=1
>> 7. sctp 132 0 SHUTDOWN_SENT src=10.1.1.1 dst=10.1.1.2 sport=54969
>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 [ASSURED]
>> mark=0 zone=1 use=1
>> 8. sctp 132 2 SHUTDOWN_ACK_SENT src=10.1.1.1 dst=10.1.1.2 sport=54969
>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 [ASSURED]
>> mark=0 zone=1 use=1
>> 9. sctp 132 9 CLOSED src=10.1.1.1 dst=10.1.1.2 sport=54969
>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 [ASSURED]
>> mark=0 zone=1 use=1
>
> Here if I monitor conntrack during the test, I get:
>
> # conntrack -E --proto=sctp
> [NEW] sctp 132 10 CLOSED src=10.1.1.1 dst=10.1.1.2 sport=54969
> dport=12345 [UNREPLIED] src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567
> zone=1
> [DESTROY] sctp 132 9 CLOSED src=10.1.1.1 dst=10.1.1.2 sport=54969
> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 [ASSURED]
> zone=1 [USERSPACE] portid=3715
>
>
>>
>> On Ubuntu, where it doesn't work:
>>
>> 1. sctp 132 9 CLOSED src=10.1.1.1 dst=10.1.1.2 sport=54969
>> dport=12345 [UNREPLIED] src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567
>> zone=1 use=1
>> 2. sctp 132 2 COOKIE_WAIT src=10.1.1.1 dst=10.1.1.2 sport=54969
>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 zone=1 use=1
>> 3. sctp 132 2 COOKIE_ECHOED src=10.1.1.1 dst=10.1.1.2 sport=54969
>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 zone=1 use=1
>> 4. sctp 132 209 ESTABLISHED src=10.1.1.1 dst=10.1.1.2 sport=54969
>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 [ASSURED]
>> zone=1 use=1
>> 5. sctp 132 209 ESTABLISHED src=10.1.1.1 dst=10.1.1.2 sport=54969
>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 [ASSURED]
>> zone=1 use=1
>> 6. sctp 132 209 ESTABLISHED src=10.1.1.1 dst=10.1.1.2 sport=54969
>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 [ASSURED]
>> zone=1 use=1
>> 7. sctp 132 0 SHUTDOWN_SENT src=10.1.1.1 dst=10.1.1.2 sport=54969
>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 [ASSURED]
>> zone=1 use=1
>> 8. NO ENTRY!
>> 9. NO ENTRY!
>
> But here I have:
>
> # conntrack -E --proto=sctp
> [NEW] sctp 132 10 CLOSED src=10.1.1.1 dst=10.1.1.2 sport=54969
> dport=12345 [UNREPLIED] src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567
> zone=1
> [DESTROY] sctp 132 SHUTDOWN_SENT src=10.1.1.1 dst=10.1.1.2 sport=54969
> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 [ASSURED]
> zone=1
>
> So, the connection indeed is getting destroyed while in SHUTDOWN_SENT state.
> Sounds like a kernel bug in Ubuntu...
>

Thanks, Ilya, for digging further into it.

It seemed so to me as well, but looking at the logs you provided, the
timeout for SHUTDOWN_SENT is 0 (third column of the dump), so it seems
to be a matter of how quickly the reply arrives.

❯ cat /proc/sys/net/netfilter/nf_conntrack_sctp_timeout_shutdown_sent
0

I'm a bit surprised by this.

Just to confirm, I upgraded the kernel of my VM to 5.19.0-43-generic
and the test succeeded. Sleeping for 1 second in SHUTDOWN_SENT before
sending the SHUTDOWN_ACK makes the test fail.
I would expect the same on RHEL 9 and Fedora.

Paolo

>>
>> So, after sending SHUTDOWN_ACK, there is no conntrack entry in the kernel
>> anymore.
>>
>>
>>>
>>>> Best regards, Ilya Maximets.
>>>
>>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
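
For anyone reproducing this, here is a minimal sketch of how the remaining
timeout (the third column) could be pulled out of dump lines like the ones
quoted in this thread. The helper names are hypothetical, and the sketch
assumes the field layout shown above (protocol name, protocol number,
timeout in seconds, state). It does not handle event lines such as
[DESTROY], which may omit the timeout.

```python
# Hypothetical helpers, not part of OVS or conntrack-tools.
# Assumed layout of a dump line, as seen in this thread:
#   <proto> <proto-number> <timeout-seconds> <state> <tuples and flags...>

def parse_conntrack_line(line):
    """Return (proto, timeout, state) for one dump line."""
    fields = line.split()
    proto = fields[0]
    timeout = int(fields[2])  # third column: remaining timeout in seconds
    state = fields[3]
    return proto, timeout, state


def zero_timeout_states(lines):
    """List the states of entries whose remaining timeout is already 0."""
    states = []
    for line in lines:
        _, timeout, state = parse_conntrack_line(line)
        if timeout == 0:
            states.append(state)
    return states


# Two entries copied from the Ubuntu dump above, with the wrapped lines
# joined and the "N." list numbering dropped.
SAMPLE_DUMP = [
    "sctp 132 209 ESTABLISHED src=10.1.1.1 dst=10.1.1.2 sport=54969 "
    "dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 "
    "[ASSURED] zone=1 use=1",
    "sctp 132 0 SHUTDOWN_SENT src=10.1.1.1 dst=10.1.1.2 sport=54969 "
    "dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 "
    "[ASSURED] zone=1 use=1",
]

if __name__ == "__main__":
    print(zero_timeout_states(SAMPLE_DUMP))  # ['SHUTDOWN_SENT']
```

An entry reported with timeout 0 is a candidate for being dropped before
the next packet arrives, which matches the "NO ENTRY!" lines in the
Ubuntu dump.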
