On 6/15/23 19:49, Paolo Valerio wrote:
> Ilya Maximets <[email protected]> writes:
> 
>> On 6/14/23 21:08, Ilya Maximets wrote:
>>> On 6/14/23 20:11, Paolo Valerio wrote:
>>>> Ilya Maximets <[email protected]> writes:
>>>>
>>>>> On 6/12/23 16:57, Aaron Conole wrote:
>>>>>> Paolo Valerio <[email protected]> writes:
>>>>>>
>>>>>>> since a27d70a89 ("conntrack: add generic IP protocol support") all
>>>>>>> the unrecognized IP protocols get handled using ct_proto_other ops
>>>>>>> and are managed as L3 using 3 tuples.
>>>>>>>
>>>>>>> This patch stores L4 information for SCTP in the conn_key so that
>>>>>>> multiple conn instances, instead of one with ports zeroed, will be
>>>>>>> created when there are multiple SCTP connections between two hosts.
>>>>>>> It also performs crc32c check when not offloaded, and adds SCTP to
>>>>>>> pat_enabled.
>>>>>>>
>>>>>>> With this patch, given two SCTP associations between two hosts,
>>>>>>> tracking the connection will result in:
>>>>>>>
>>>>>>> sctp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=55884,dport=5201),reply=(src=10.1.1.1,dst=10.1.1.2,sport=5201,dport=12345),zone=1
>>>>>>> sctp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=59874,dport=5202),reply=(src=10.1.1.1,dst=10.1.1.2,sport=5202,dport=12346),zone=1
>>>>>>>
>>>>>>> instead of:
>>>>>>>
>>>>>>> sctp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=0,dport=0),reply=(src=10.1.1.1,dst=10.1.1.2,sport=0,dport=0),zone=1
>>>>>>>
>>>>>>> Signed-off-by: Paolo Valerio <[email protected]>
>>>>>>> ---
>>>>>>
>>>>>> Thanks for this work - I think it looks good.
>>>>>>
>>>>>> Perhaps it should have a NEWS item mentioning that the userspace
>>>>>> conntrack now supports matching SCTP l4 data.
>>>>>>
>>>>>> If you do spin a v4 with that change, you can keep my:
>>>>>>
>>>>>> Acked-by: Aaron Conole <[email protected]>
>>>>>
>>>>> Hi, Paolo and Aaron.
>>>>>
>>>>> I'm getting a consistent test failure while running check-kernel
>>>>> on Ubuntu 22.10 with 5.19 kernel:
>>>>>
>>>>>
>>>>> ./system-traffic.at:4754: cat ofctl_monitor.log
>>>>> --- -   2023-06-14 11:26:41.958591125 +0000
>>>>> +++ /root/ovs/tests/system-kmod-testsuite.dir/at-groups/105/stdout      
>>>>> 2023-06-14 11:26:41.952000000 +0000
>>>>> @@ -12,8 +12,6 @@
>>>>>  
>>>>> sctp,vlan_tci=0x0000,dl_src=e6:66:c1:22:22:22,dl_dst=e6:66:c1:11:11:11,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=2,nw_ttl=64,nw_frag=no,tp_src=12345,tp_dst=54969
>>>>>  sctp_csum:9b67e853
>>>>>  NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=54 in_port=1 (via action) 
>>>>> data_len=54 (unbuffered)
>>>>>  
>>>>> sctp,vlan_tci=0x0000,dl_src=e6:66:c1:11:11:11,dl_dst=e6:66:c1:22:22:22,nw_src=10.1.1.240,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=2,nw_ttl=64,nw_frag=no,tp_src=34567,tp_dst=12345
>>>>>  sctp_csum:bc0e5463
>>>>> -NXT_PACKET_IN2 (xid=0x0): table_id=1 cookie=0x0 total_len=50 
>>>>> ct_state=est|rpl|trk|dnat,ct_zone=1,ct_nw_src=10.1.1.1,ct_nw_dst=10.1.1.2,ct_nw_proto=132,ct_tp_src=54969,ct_tp_dst=12345,ip,in_port=2
>>>>>  (via action) data_len=50 (unbuffered)
>>>>> -sctp,vlan_tci=0x0000,dl_src=e6:66:c1:22:22:22,dl_dst=e6:66:c1:11:11:11,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=2,nw_ttl=64,nw_frag=no,tp_src=12345,tp_dst=54969
>>>>>  sctp_csum:d6ce6b9e
>>>>>  NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=50 in_port=1 (via action) 
>>>>> data_len=50 (unbuffered)
>>>>> -sctp,vlan_tci=0x0000,dl_src=e6:66:c1:11:11:11,dl_dst=e6:66:c1:22:22:22,nw_src=10.1.1.240,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=2,nw_ttl=64,nw_frag=no,tp_src=34567,tp_dst=12345
>>>>>  sctp_csum:add7db93
>>>>> +sctp,vlan_tci=0x0000,dl_src=e6:66:c1:11:11:11,dl_dst=e6:66:c1:22:22:22,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=2,nw_ttl=64,nw_frag=no,tp_src=54969,tp_dst=12345
>>>>>  sctp_csum:5db68ce
>>>>>
>>>>>
>>>>> Do you know what can be a problem here?
>>>>>
>>>>> Test is passing on Fedora 38 with 6.3 kernel and on rhel 9.2.
>>>>>
>>>>
>>>> Hi Ilya,
>>>>
>>>> Uhm, it seems there's a problem with the shutdown sequence.
>>>> I just ran the test on a VM:
>>>>
>>>> vagrant@ubuntu2210:~/ovs$ grep CONFIG_NF_CT_PROTO_SCTP 
>>>> /boot/config-5.19.0-38-generic 
>>>> CONFIG_NF_CT_PROTO_SCTP=y
>>>>
>>>> vagrant@ubuntu2210:~/ovs$ grep VERSION /etc/os-release 
>>>> VERSION_ID="22.10"
>>>> VERSION="22.10 (Kinetic Kudu)"
>>>> VERSION_CODENAME=kinetic
>>>>
>>>> vagrant@ubuntu2210:~/ovs$ uname -r
>>>> 5.19.0-38-generic
>>>
>>> The only difference with my VM is that I have -43-generic kernel.
>>>
>>>>
>>>> but I can't see the failure.
>>>> Any chance to see if they are marked for some reason as invalid?
>>>
>>> I dumped conntrack after every packet and here is what I see:
>>>
>>> On RHEL9, where test is working:
>>>
>>> 1. sctp     132 9 CLOSED            src=10.1.1.1 dst=10.1.1.2 sport=54969 
>>> dport=12345 [UNREPLIED] src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 
>>> mark=0 zone=1 use=1
>>> 2. sctp     132 2 COOKIE_WAIT       src=10.1.1.1 dst=10.1.1.2 sport=54969 
>>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 mark=0 
>>> zone=1 use=1
>>> 3. sctp     132 2 COOKIE_ECHOED     src=10.1.1.1 dst=10.1.1.2 sport=54969 
>>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 mark=0 
>>> zone=1 use=1
>>> 4. sctp     132 431999 ESTABLISHED  src=10.1.1.1 dst=10.1.1.2 sport=54969 
>>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 [ASSURED] 
>>> mark=0 zone=1 use=1
>>> 5. sctp     132 431999 ESTABLISHED  src=10.1.1.1 dst=10.1.1.2 sport=54969 
>>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 [ASSURED] 
>>> mark=0 zone=1 use=1
>>> 6. sctp     132 431999 ESTABLISHED  src=10.1.1.1 dst=10.1.1.2 sport=54969 
>>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 [ASSURED] 
>>> mark=0 zone=1 use=1
>>> 7. sctp     132 0 SHUTDOWN_SENT     src=10.1.1.1 dst=10.1.1.2 sport=54969 
>>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 [ASSURED] 
>>> mark=0 zone=1 use=1
>>> 8. sctp     132 2 SHUTDOWN_ACK_SENT src=10.1.1.1 dst=10.1.1.2 sport=54969 
>>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 [ASSURED] 
>>> mark=0 zone=1 use=1
>>> 9. sctp     132 9 CLOSED            src=10.1.1.1 dst=10.1.1.2 sport=54969 
>>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 [ASSURED] 
>>> mark=0 zone=1 use=1
>>
>> Here if I monitor conntrack during the test, I get:
>>
>> # conntrack -E --proto=sctp
>>     [NEW] sctp     132 10 CLOSED src=10.1.1.1 dst=10.1.1.2 sport=54969 
>> dport=12345 [UNREPLIED] src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 
>> zone=1
>> [DESTROY] sctp     132 9 CLOSED src=10.1.1.1 dst=10.1.1.2 sport=54969 
>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 [ASSURED] 
>> zone=1 [USERSPACE] portid=3715
>>
>>
>>>
>>> On Ubuntu, where it doesn't work:
>>>
>>> 1. sctp     132 9 CLOSED            src=10.1.1.1 dst=10.1.1.2 sport=54969 
>>> dport=12345 [UNREPLIED] src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 
>>> zone=1 use=1
>>> 2. sctp     132 2 COOKIE_WAIT       src=10.1.1.1 dst=10.1.1.2 sport=54969 
>>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 zone=1 use=1
>>> 3. sctp     132 2 COOKIE_ECHOED     src=10.1.1.1 dst=10.1.1.2 sport=54969 
>>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 zone=1 use=1
>>> 4. sctp     132 209 ESTABLISHED     src=10.1.1.1 dst=10.1.1.2 sport=54969 
>>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 [ASSURED] 
>>> zone=1 use=1
>>> 5. sctp     132 209 ESTABLISHED     src=10.1.1.1 dst=10.1.1.2 sport=54969 
>>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 [ASSURED] 
>>> zone=1 use=1
>>> 6. sctp     132 209 ESTABLISHED     src=10.1.1.1 dst=10.1.1.2 sport=54969 
>>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 [ASSURED] 
>>> zone=1 use=1
>>> 7. sctp     132 0 SHUTDOWN_SENT     src=10.1.1.1 dst=10.1.1.2 sport=54969 
>>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 [ASSURED] 
>>> zone=1 use=1
>>> 8. NO ENTRY!
>>> 9. NO ENTRY!
>>
>> But here I have:
>>
>> # conntrack -E --proto=sctp
>>     [NEW] sctp     132 10 CLOSED src=10.1.1.1 dst=10.1.1.2 sport=54969 
>> dport=12345 [UNREPLIED] src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 
>> zone=1
>> [DESTROY] sctp     132 SHUTDOWN_SENT src=10.1.1.1 dst=10.1.1.2 sport=54969 
>> dport=12345 src=10.1.1.2 dst=10.1.1.240 sport=12345 dport=34567 [ASSURED] 
>> zone=1
>>
>> So, the connection indeed is getting destroyed while in SHUTDOWN_SENT state.
>> Sounds like a kernel bug in Ubuntu...
>>
> 
> Thanks Ilya for digging more into it.
> It seemed so to me as well, but looking at the logs you provided, the
> timeout for SHUTDOWN_SENT is 0 (third column of the dump), so it seems
> it's a matter of speed in the reply.
> 
> ❯ cat /proc/sys/net/netfilter/nf_conntrack_sctp_timeout_shutdown_sent
> 0
> 
> I'm a bit surprised by this.
> 
> Just to confirm that, I upgraded the kernel of my vm to
> 5.19.0-43-generic and the test succeeded. Sleeping for 1 second in
> SHUTDOWN_SENT before sending the SHUTDOWN_ACK_SENT makes the test fail. I
> would expect the same on RHEL 9 and Fedora.

Hmm, good point.
The difference between my tests on Ubuntu and RHEL is that I tested
with -O1 and sanitizers on Ubuntu, so it was a tiny bit slower.
I just tried to run with sanitizers on RHEL and I'm getting the same
failure as I have on Ubuntu.

So, the test seems to be extremely time-sensitive.  Is there a way
to make it more stable?
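
For reference, the "timeout is 0" observation can be checked mechanically
against the dumps quoted in this thread. Below is a minimal sketch, not
part of the patch or the testsuite, that assumes the entry layout shown in
those dumps (protocol name, protocol number, remaining timeout in seconds,
state). The helper names parse_entry and short_lived are made up for
illustration.

```python
import re

# Assumed layout, taken from the dumps in this thread:
#   sctp     132 <timeout-seconds> <STATE> src=... dst=...
ENTRY_RE = re.compile(r"sctp\s+132\s+(?P<timeout>\d+)\s+(?P<state>[A-Z_]+)\s")


def parse_entry(line):
    """Return (timeout, state) for a conntrack dump line, or None."""
    m = ENTRY_RE.search(line)
    if m is None:
        return None
    return int(m.group("timeout")), m.group("state")


def short_lived(lines):
    """Return the states whose remaining timeout is reported as 0 seconds."""
    parsed = filter(None, map(parse_entry, lines))
    return [state for timeout, state in parsed if timeout == 0]


dump = [
    "sctp     132 209 ESTABLISHED     src=10.1.1.1 dst=10.1.1.2 sport=54969",
    "sctp     132 0 SHUTDOWN_SENT     src=10.1.1.1 dst=10.1.1.2 sport=54969",
]
print(short_lived(dump))
```

Run against the per-packet dumps quoted in this thread, only the
SHUTDOWN_SENT entries are flagged, on both RHEL and Ubuntu, which matches
the idea that the failure depends on how quickly the reply arrives rather
than on the distribution.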

> 
> Paolo
> 
>>>
>>> So, after sending SHUTDOWN_ACK, there is no conntrack entry in the kernel 
>>> anymore.
>>>
>>>
>>>>
>>>>> Best regards, Ilya Maximets.
>>>>
>>>
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
