Ilya Maximets <[email protected]> writes:

> On 6/15/23 19:49, Paolo Valerio wrote:
>> Ilya Maximets <[email protected]> writes:
>> 
>>> On 6/14/23 21:08, Ilya Maximets wrote:
>>>> On 6/14/23 20:11, Paolo Valerio wrote:
>>>>> Ilya Maximets <[email protected]> writes:
>>>>>
>>>>>> On 6/12/23 16:57, Aaron Conole wrote:
>>>>>>> Paolo Valerio <[email protected]> writes:
>>>>>>>
>>>>>>>> Since a27d70a89 ("conntrack: add generic IP protocol support") all
>>>>>>>> the unrecognized IP protocols get handled using ct_proto_other ops
>>>>>>>> and are managed as L3 using 3 tuples.
>>>>>>>>
>>>>>>>> This patch stores L4 information for SCTP in the conn_key so that
>>>>>>>> multiple conn instances, instead of one with ports zeroed, will be
>>>>>>>> created when there are multiple SCTP connections between two hosts.
>>>>>>>> It also performs crc32c check when not offloaded, and adds SCTP to
>>>>>>>> pat_enabled.
>>>>>>>>
>>>>>>>> With this patch, given two SCTP associations between two hosts,
>>>>>>>> tracking the connection will result in:
>>>>>>>>
>>>>>>>> sctp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=55884,dport=5201),reply=(src=10.1.1.1,dst=10.1.1.2,sport=5201,dport=12345),zone=1
>>>>>>>> sctp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=59874,dport=5202),reply=(src=10.1.1.1,dst=10.1.1.2,sport=5202,dport=12346),zone=1
>>>>>>>>
>>>>>>>> instead of:
>>>>>>>>
>>>>>>>> sctp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=0,dport=0),reply=(src=10.1.1.1,dst=10.1.1.2,sport=0,dport=0),zone=1
>>>>>>>>
>>>>>>>> Signed-off-by: Paolo Valerio <[email protected]>
>>>>>>>> ---
>>>>>>>
>>>>>>> Thanks for this work - I think it looks good.
>>>>>>>
>>>>>>> Perhaps it should have a NEWS item mentioning that the userspace
>>>>>>> conntrack now supports matching SCTP L4 data.
>>>>>>>
>>>>>>> If you do spin a v4 with that change, you can keep my:
>>>>>>>
>>>>>>> Acked-by: Aaron Conole <[email protected]>
>>>>>>
>>>>>> Hi, Paolo and Aaron.
>>>>>>
>>>>>> I'm getting a consistent test failure while running check-kernel
>>>>>> on Ubuntu 22.10 with 5.19 kernel:
>>>>>>
>>>>>>
>>>>>> ./system-traffic.at:4754: cat ofctl_monitor.log
>>>>>> --- -   2023-06-14 11:26:41.958591125 +0000
>>>>>> +++ /root/ovs/tests/system-kmod-testsuite.dir/at-groups/105/stdout 2023-06-14 11:26:41.952000000 +0000
>>>>>> @@ -12,8 +12,6 @@
>>>>>>  
>>>>>> sctp,vlan_tci=0x0000,dl_src=e6:66:c1:22:22:22,dl_dst=e6:66:c1:11:11:11,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=2,nw_ttl=64,nw_frag=no,tp_src=12345,tp_dst=54969
>>>>>> sctp_csum:9b67e853
>>>>>>  NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=54 in_port=1
>>>>>> (via action) data_len=54 (unbuffered)
>>>>>>  
>>>>>> sctp,vlan_tci=0x0000,dl_src=e6:66:c1:11:11:11,dl_dst=e6:66:c1:22:22:22,nw_src=10.1.1.240,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=2,nw_ttl=64,nw_frag=no,tp_src=34567,tp_dst=12345
>>>>>> sctp_csum:bc0e5463
>>>>>> -NXT_PACKET_IN2 (xid=0x0): table_id=1 cookie=0x0 total_len=50
>>>>>> ct_state=est|rpl|trk|dnat,ct_zone=1,ct_nw_src=10.1.1.1,ct_nw_dst=10.1.1.2,ct_nw_proto=132,ct_tp_src=54969,ct_tp_dst=12345,ip,in_port=2
>>>>>> (via action) data_len=50 (unbuffered)
>>>>>> -sctp,vlan_tci=0x0000,dl_src=e6:66:c1:22:22:22,dl_dst=e6:66:c1:11:11:11,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=2,nw_ttl=64,nw_frag=no,tp_src=12345,tp_dst=54969
>>>>>> sctp_csum:d6ce6b9e
>>>>>>  NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=50 in_port=1
>>>>>> (via action) data_len=50 (unbuffered)
>>>>>> -sctp,vlan_tci=0x0000,dl_src=e6:66:c1:11:11:11,dl_dst=e6:66:c1:22:22:22,nw_src=10.1.1.240,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=2,nw_ttl=64,nw_frag=no,tp_src=34567,tp_dst=12345
>>>>>> sctp_csum:add7db93
>>>>>> +sctp,vlan_tci=0x0000,dl_src=e6:66:c1:11:11:11,dl_dst=e6:66:c1:22:22:22,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=2,nw_ttl=64,nw_frag=no,tp_src=54969,tp_dst=12345
>>>>>> sctp_csum:5db68ce
>>>>>>
>>>>>>
>>>>>> Do you know what the problem could be here?
>>>>>>
>>>>>> The test is passing on Fedora 38 with a 6.3 kernel and on RHEL 9.2.
>>>>>>
>>>>>
>>>>> Hi Ilya,
>>>>>
>>>>> Uhm, it seems there's a problem with the shutdown sequence.
>>>>> I just ran the test on a VM:
>>>>>
>>>>> vagrant@ubuntu2210:~/ovs$ grep CONFIG_NF_CT_PROTO_SCTP 
>>>>> /boot/config-5.19.0-38-generic 
>>>>> CONFIG_NF_CT_PROTO_SCTP=y
>>>>>
>>>>> vagrant@ubuntu2210:~/ovs$ grep VERSION /etc/os-release 
>>>>> VERSION_ID="22.10"
>>>>> VERSION="22.10 (Kinetic Kudu)"
>>>>> VERSION_CODENAME=kinetic
>>>>>
>>>>> vagrant@ubuntu2210:~/ovs$ uname -r
>>>>> 5.19.0-38-generic
>>>>
>>>> The only difference with my VM is that I have -43-generic kernel.
>>>>
>>>>>
>>>>> but I can't see the failure.
>>>>> Any chance to see if they are marked for some reason as invalid?
>>>>
>>>> I dumped conntrack after every packet and here is what I see:
>>>>
>>>> On RHEL9, where test is working:
>>>>
>>>> 1. sctp 132 9 CLOSED src=10.1.1.1 dst=10.1.1.2 sport=54969
>>>> dport=12345 [UNREPLIED] src=10.1.1.2 dst=10.1.1.240 sport=12345
>>>> dport=34567 mark=0 zone=1 use=1
>>>> 2. sctp 132 2 COOKIE_WAIT src=10.1.1.1 dst=10.1.1.2 sport=54969
>>>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567
>>>> mark=0 zone=1 use=1
>>>> 3. sctp 132 2 COOKIE_ECHOED src=10.1.1.1 dst=10.1.1.2 sport=54969
>>>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567
>>>> mark=0 zone=1 use=1
>>>> 4. sctp 132 431999 ESTABLISHED src=10.1.1.1 dst=10.1.1.2
>>>> sport=54969 dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345
>>>> dport=34567 [ASSURED] mark=0 zone=1 use=1
>>>> 5. sctp 132 431999 ESTABLISHED src=10.1.1.1 dst=10.1.1.2
>>>> sport=54969 dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345
>>>> dport=34567 [ASSURED] mark=0 zone=1 use=1
>>>> 6. sctp 132 431999 ESTABLISHED src=10.1.1.1 dst=10.1.1.2
>>>> sport=54969 dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345
>>>> dport=34567 [ASSURED] mark=0 zone=1 use=1
>>>> 7. sctp 132 0 SHUTDOWN_SENT src=10.1.1.1 dst=10.1.1.2 sport=54969
>>>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567
>>>> [ASSURED] mark=0 zone=1 use=1
>>>> 8. sctp 132 2 SHUTDOWN_ACK_SENT src=10.1.1.1 dst=10.1.1.2
>>>> sport=54969 dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345
>>>> dport=34567 [ASSURED] mark=0 zone=1 use=1
>>>> 9. sctp 132 9 CLOSED src=10.1.1.1 dst=10.1.1.2 sport=54969
>>>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567
>>>> [ASSURED] mark=0 zone=1 use=1
>>>
>>> Here if I monitor conntrack during the test, I get:
>>>
>>> # conntrack -E --proto=sctp
>>>     [NEW] sctp 132 10 CLOSED src=10.1.1.1 dst=10.1.1.2 sport=54969
>>> dport=12345 [UNREPLIED] src=10.1.1.2 dst=10.1.1.240 sport=12345
>>> dport=34567 zone=1
>>> [DESTROY] sctp 132 9 CLOSED src=10.1.1.1 dst=10.1.1.2 sport=54969
>>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567
>>> [ASSURED] zone=1 [USERSPACE] portid=3715
>>>
>>>
>>>>
>>>> On Ubuntu, where it doesn't work:
>>>>
>>>> 1. sctp 132 9 CLOSED src=10.1.1.1 dst=10.1.1.2 sport=54969
>>>> dport=12345 [UNREPLIED] src=10.1.1.2 dst=10.1.1.240 sport=12345
>>>> dport=34567 zone=1 use=1
>>>> 2. sctp 132 2 COOKIE_WAIT src=10.1.1.1 dst=10.1.1.2 sport=54969
>>>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567
>>>> zone=1 use=1
>>>> 3. sctp 132 2 COOKIE_ECHOED src=10.1.1.1 dst=10.1.1.2 sport=54969
>>>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567
>>>> zone=1 use=1
>>>> 4. sctp 132 209 ESTABLISHED src=10.1.1.1 dst=10.1.1.2 sport=54969
>>>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567
>>>> [ASSURED] zone=1 use=1
>>>> 5. sctp 132 209 ESTABLISHED src=10.1.1.1 dst=10.1.1.2 sport=54969
>>>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567
>>>> [ASSURED] zone=1 use=1
>>>> 6. sctp 132 209 ESTABLISHED src=10.1.1.1 dst=10.1.1.2 sport=54969
>>>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567
>>>> [ASSURED] zone=1 use=1
>>>> 7. sctp 132 0 SHUTDOWN_SENT src=10.1.1.1 dst=10.1.1.2 sport=54969
>>>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567
>>>> [ASSURED] zone=1 use=1
>>>> 8. NO ENTRY!
>>>> 9. NO ENTRY!
>>>
>>> But here I have:
>>>
>>> # conntrack -E --proto=sctp
>>>     [NEW] sctp 132 10 CLOSED src=10.1.1.1 dst=10.1.1.2 sport=54969
>>> dport=12345 [UNREPLIED] src=10.1.1.2 dst=10.1.1.240 sport=12345
>>> dport=34567 zone=1
>>> [DESTROY] sctp 132 SHUTDOWN_SENT src=10.1.1.1 dst=10.1.1.2
>>> sport=54969 dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345
>>> dport=34567 [ASSURED] zone=1
>>>
>>> So, the connection indeed is getting destroyed while in SHUTDOWN_SENT state.
>>> Sounds like a kernel bug in Ubuntu...
>>>
>> 
>> Thanks Ilya for digging more into it.
>> It seemed so to me as well, but looking at the logs you provided, the
>> timeout for SHUTDOWN_SENT is 0 (third column of the dump), so it seems
>> it's a matter of speed in the reply.
>> 
>> ❯ cat /proc/sys/net/netfilter/nf_conntrack_sctp_timeout_shutdown_sent
>> 0
>> 
>> I'm a bit surprised by this.
>> 
>> Just to confirm that, I upgraded the kernel of my VM to
>> 5.19.0-43-generic and the test succeeded. Sleeping for 1 second in
>> SHUTDOWN_SENT before sending the SHUTDOWN_ACK makes the test fail. I
>> would expect the same on RHEL 9 and Fedora.
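
As an aside on reading the dumps: the timeout Paolo points at is the
third whitespace-separated column of each entry. A small Python sketch
of pulling it out is below. It assumes one entry per line (unwrapped,
without the "7." numbering Ilya added), and parse_ct_entry is a made-up
helper name, not something from the test suite.

```python
def parse_ct_entry(line):
    """Split one conntrack dump entry into (proto, timeout, state).

    Assumes the column layout seen in the dumps in this thread:
    <proto name> <proto number> <timeout in seconds> <state> ...
    """
    fields = line.split()
    return fields[0], int(fields[2]), fields[3]


# Entry 7 from the Ubuntu dump, joined back onto a single line.
entry = (
    "sctp 132 0 SHUTDOWN_SENT src=10.1.1.1 dst=10.1.1.2 sport=54969 "
    "dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 "
    "[ASSURED] zone=1 use=1"
)

proto, timeout, state = parse_ct_entry(entry)
if timeout == 0:
    # A zero timeout means the entry can expire before the next packet
    # of the shutdown sequence arrives.
    print(f"{proto} entry in {state} has no time left")
```
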
>
> Hmm, good point.
> The difference between my tests on Ubuntu and RHEL is that I tested
> with -O1 and sanitizers on Ubuntu, so it was a tiny bit slower.
> I just tried to run with sanitizers on RHEL and I'm getting the same
> failure as I have in Ubuntu.
>
> So, the test seems to be extremely time-sensitive.  Is there a way
> to make it more stable?

I guess one alternative would be to move away from dumping the ct
information directly: check for the 'conntrack' utility, use it to log
the events, and then sweep the ct events log.  The ofctl monitor output
is showing itself to be a bit racy, but relying on the ct events log
might still give us confidence that it is working, without the raciness
of the ofctl monitor?
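
Roughly what I mean by sweeping the log, as a Python sketch rather than
actual test-suite code. It assumes the events were captured with
'conntrack -E --proto=sctp' into a log with one event per line; the
sample lines are the ones Ilya posted, unwrapped. The helper names
(sweep_events, destroyed_in) are made up for illustration.

```python
import re

# [EVENT] <proto> <proto number> [<timeout>] <STATE> ...
# The timeout column is optional: it is absent in the Ubuntu DESTROY event.
EVENT_RE = re.compile(
    r"^\s*\[(NEW|UPDATE|DESTROY)\]\s+(\w+)\s+\d+\s+(?:\d+\s+)?([A-Z_]+)\s"
)


def sweep_events(log_text, proto="sctp"):
    """Return (event, state) pairs for every matching event line."""
    events = []
    for line in log_text.splitlines():
        m = EVENT_RE.match(line)
        if m and m.group(2) == proto:
            events.append((m.group(1), m.group(3)))
    return events


def destroyed_in(events):
    """State of the last DESTROY event, or None if there was none."""
    states = [state for event, state in events if event == "DESTROY"]
    return states[-1] if states else None


RHEL_LOG = """\
    [NEW] sctp 132 10 CLOSED src=10.1.1.1 dst=10.1.1.2 sport=54969 dport=12345 [UNREPLIED] src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 zone=1
[DESTROY] sctp 132 9 CLOSED src=10.1.1.1 dst=10.1.1.2 sport=54969 dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 [ASSURED] zone=1 [USERSPACE] portid=3715
"""

UBUNTU_LOG = """\
    [NEW] sctp 132 10 CLOSED src=10.1.1.1 dst=10.1.1.2 sport=54969 dport=12345 [UNREPLIED] src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 zone=1
[DESTROY] sctp 132 SHUTDOWN_SENT src=10.1.1.1 dst=10.1.1.2 sport=54969 dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 [ASSURED] zone=1
"""

for name, log in (("rhel", RHEL_LOG), ("ubuntu", UBUNTU_LOG)):
    print(name, "destroyed in", destroyed_in(sweep_events(log)))
```

A check along these lines only looks at which state the entry was in
when it was destroyed, so it says nothing about the packet-ins that the
ofctl monitor comparison currently covers.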

>> 
>> Paolo
>> 
>>>>
>>>> So, after sending SHUTDOWN_ACK, there is no conntrack entry in the kernel 
>>>> anymore.
>>>>
>>>>
>>>>>
>>>>>> Best regards, Ilya Maximets.
>>>>>
>>>>
>> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
