The last ACK from the 3-WAY handshake in this test goes to userspace (miss upcall) after going through conntrack because +est traffic is handled differently from +new in this set of OpenFlow rules.
The sender though proceeds with sending the data that may end up re-ordered with the aforementioned ACK. Since connection is very short, it is possible that this one ACK will be delivered even after the connection is already closed, i.e., after all the data is sent and the connection termination sequence (FIN-ACK-FIN-ACK) is done. Delivery in this case triggers an RST reply. And RST transitions TIME_WAIT into CLOSE (CLOSING in OVS terms), causing the test failure. It's not really possible to fully avoid the packet re-ordering as it is part of the upcall mechanism. But there is a couple ways to avoid it in this particular case, e.g., if we predict how the +est packet will look like and install the datapath flow for +est while processing the original +new, or by storing the upcall PID in the kernel for the whole time of action execution and not only for one packet we're executing actions for (TCP handshake replies are happening in the context of the initiator, which is OVS handler thread in our case). But these are large changes that will not help with test failures on older branches / older kernels. For now, fixing the test instead. The intention in the test was to check that the state is terminal at the end of the connection, i.e., that our actions do not leave established or incomplete conntrack entries. So it should be fine to check for both TIME_WAIT and CLOSING, as both of these are terminal states. Fixes: 8d48d5f39436 ("system-traffic: Add conntrack floating IP test") Signed-off-by: Ilya Maximets <i.maxim...@ovn.org> --- tests/system-traffic.at | 43 +++++++++++++++++++++++++++++------------ 1 file changed, 31 insertions(+), 12 deletions(-) diff --git a/tests/system-traffic.at b/tests/system-traffic.at index 0f3acef3f..b12e5633c 100644 --- a/tests/system-traffic.at +++ b/tests/system-traffic.at @@ -9110,20 +9110,39 @@ table=21 ip,nw_dst=10.1.1.2 action=set_field:f0:00:00:01:01:02->eth_ AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt]) -dnl non-FIP case +dnl non-FIP case. +dnl +dnl The last ACK from the 3-WAY handshake will go via upcall after going +dnl through conntrack because +est traffic is handled differently from +new. +dnl The sender though will proceed sending data. Since connection is very +dnl short, it is possible that this one ACK will be delivered after the +dnl connection is already closed, i.e. after all the data is sent and the +dnl connection termination sequence (FIN-ACK-FIN-ACK) is done. Delivery +dnl in this case will trigger RST reply. And RST will transition TIME_WAIT +dnl into CLOSE, hence the need to look for both states below. NS_CHECK_EXEC([at_ns1], [echo "foobar" |nc $NC_EOF_OPT 10.1.1.1 1234]) -OVS_WAIT_UNTIL([[ovs-appctl dpctl/dump-conntrack | sed -e 's/port=[0-9]*/port=<cleared>/g' -e 's/id=[0-9]*/id=<cleared>/g' | -grep "tcp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=<cleared>,dport=<cleared>),reply=(src=10.1.1.1,dst=10.1.1.2,sport=<cleared>,dport=<cleared>),protoinfo=(state=TIME_WAIT)" -]]) - -dnl Check that the full session ends as expected (i.e. TIME_WAIT). Otherwise it -dnl means the datapath didn't process the ct_clear action. Ending in SYN_RECV -dnl (OVS maps to ESTABLISHED) means the initial frame was committed, but not a -dnl second time after the FIP translation (because ct_clear didn't occur). +OVS_WAIT_UNTIL_EQUAL([ovs-appctl dpctl/dump-conntrack \ + | grep -E '(TIME_WAIT|CLOSING)' | FORMAT_CT(10.1.1.2)], +[tcp,dnl +orig=(src=10.1.1.2,dst=10.1.1.1,sport=<cleared>,dport=<cleared>),dnl +reply=(src=10.1.1.1,dst=10.1.1.2,sport=<cleared>,dport=<cleared>),dnl +protoinfo=(state=<cleared>)]) + +dnl Check that the full session ends as expected (i.e. TIME_WAIT/CLOSING). +dnl Otherwise it means the datapath didn't process the ct_clear action. +dnl Ending in SYN_RECV (OVS maps to ESTABLISHED) means the initial frame +dnl was committed, but not a second time after the FIP translation (because +dnl ct_clear didn't occur). +dnl +dnl Same considerations about packet reordering apply, hence looking for +dnl both states. NS_CHECK_EXEC([at_ns1], [echo "foobar" |nc $NC_EOF_OPT 10.254.254.1 1234]) -OVS_WAIT_UNTIL([[ovs-appctl dpctl/dump-conntrack | sed -e 's/port=[0-9]*/port=<cleared>/g' -e 's/id=[0-9]*/id=<cleared>/g' | -grep "tcp,orig=(src=10.254.254.2,dst=10.1.1.1,sport=<cleared>,dport=<cleared>),reply=(src=10.1.1.1,dst=10.254.254.2,sport=<cleared>,dport=<cleared>),protoinfo=(state=TIME_WAIT)" -]]) +OVS_WAIT_UNTIL_EQUAL([ovs-appctl dpctl/dump-conntrack \ + | grep -E '(TIME_WAIT|CLOSING)' | FORMAT_CT(10.254.254.2)], +[tcp,dnl +orig=(src=10.254.254.2,dst=10.1.1.1,sport=<cleared>,dport=<cleared>),dnl +reply=(src=10.1.1.1,dst=10.254.254.2,sport=<cleared>,dport=<cleared>),dnl +protoinfo=(state=<cleared>)]) OVS_TRAFFIC_VSWITCHD_STOP AT_CLEANUP -- 2.50.1 _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev