On 11/16/23 12:49, Xavier Simonart wrote:
> Hi Dumitru
> 

Hi Xavier,

> Thanks for the patch. It clearly decreases the failure count.
> I am afraid however it does not fix all potential issues.
> My understanding is the following:
> The test goal was to ensure ovs does not disconnect the connections on
> inactivity probe.There are 4 connections with inactivity probe (3 in
> earlier OVN versions):
> - feature
> - pinctrl
> - statctrl
> - ofctrl
> For those connections, ovs will send ECHO REQUEST and expects
> ovn-controller to reply with ECHO REPLY, within 60 seconds.
> The patch you sent ensures ofctrl runs between the warp, so ensures
> that ovn-controller replied with ECHO REPLY to the ECHO_REQUEST for
> ofctrl.
> However, ECHO REQUEST for pinctrl connection is handled by the pinctrl
> thread, and nothing guarantees that pinctrl had time to run between
> two warps, so that it handled the ECHO REQUEST within those
> (simulated) 60 seconds.
> Similar issue happens with statctrl.

I had missed those cases..

> Then, another issue can occur with the "feature" connection:
> ovn-controller handles the ECHO_REQUEST for that connection in the
> main ovn-controller loop, but only if there is an ovs idl transaction
> available ("ovs db is not read-only").
> So, if ovsdb is very slow (and still handling the ovn-installed-ts
> insertion for instance), ovn-controller will not get any ovs idl
> transaction available, and won't handle the ECHO_REQUEST for the
> "feature" connection, or at least not in time.
> I am not sure how to fix those issues without making the test overly complex.
> 

+1

Also, the test itself is quite artificial, I suggest we remove it all
together.  If you agree with that, I can post v2.

Thanks,
Dumitru

> Thanks
> Xavier
> 
> On Thu, Nov 16, 2023 at 10:39 AM Ales Musil <[email protected]> wrote:
>>
>> On Wed, Nov 15, 2023 at 2:40 PM Dumitru Ceara <[email protected]> wrote:
>>
>>> The test incorrectly assumed that "ovn-nbctl --wait=hv sync" will always
>>> send an OpenFlow barrier to ovs-vswitchd.  That doesn't happen unless
>>> there are other OpenFlow (rule or group) changes that need to be
>>> programmed in the datapath.
>>>
>>> Fix the test by actually installing new rules in between warp
>>> invocations.
>>>
>>> Spotted in CI, mostly on oversubscribed systems.  A log sample that
>>> shows that ovn-controller didn't generate any barrier request for the
>>> nbctl sync request:
>>>
>>>   2023-11-15T12:12:22.937Z|00084|vconn|DBG|unix#4: received:
>>> OFPT_BARRIER_REQUEST (OF1.5) (xid=0x13):
>>>   2023-11-15T12:12:22.937Z|00085|vconn|DBG|unix#4: sent (Success):
>>> OFPT_BARRIER_REPLY (OF1.5) (xid=0x13):
>>>   ...
>>>   2023-11-15T12:12:23.032Z|00090|unixctl|DBG|received request
>>> time/warp["60000"], id=0
>>>   2023-11-15T12:12:23.032Z|00091|unixctl|DBG|replying with success, id=0:
>>> "warped"
>>>   2023-11-15T12:12:23.042Z|00094|vconn|DBG|unix#3: sent (Success):
>>> OFPT_ECHO_REQUEST (OF1.5) (xid=0x0): 0 bytes of payload
>>>   2023-11-15T12:12:23.042Z|00095|vconn|DBG|unix#4: sent (Success):
>>> OFPT_ECHO_REQUEST (OF1.5) (xid=0x0): 0 bytes of payload
>>>   2023-11-15T12:12:23.042Z|00097|vconn|DBG|unix#5: sent (Success):
>>> OFPT_ECHO_REQUEST (OF1.5) (xid=0x0): 0 bytes of payload
>>>   2023-11-15T12:12:23.042Z|00098|unixctl|DBG|received request
>>> time/warp["60000"], id=0
>>>   2023-11-15T12:12:23.042Z|00099|unixctl|DBG|replying with success, id=0:
>>> "warped"
>>>   2023-11-15T12:12:23.052Z|00100|rconn|ERR|br-int<->unix#3: no response to
>>> inactivity probe after 60 seconds, disconnecting
>>>   2023-11-15T12:12:23.052Z|00101|rconn|ERR|br-int<->unix#4: no response to
>>> inactivity probe after 60 seconds, disconnecting
>>>   2023-11-15T12:12:23.052Z|00102|rconn|ERR|br-int<->unix#5: no response to
>>> inactivity probe after 60 seconds, disconnecting
>>>   2023-11-15T12:12:23.052Z|00103|unixctl|DBG|received request
>>> time/warp["60000"], id=0
>>>
>>> Fixes: ff00142808dc ("controller: Fixed ovs/ovn(features) connection lost
>>> when running more than 120 seconds")
>>> Signed-off-by: Dumitru Ceara <[email protected]>
>>> ---
>>>  tests/ovn.at | 15 +++++++++++++++
>>>  1 file changed, 15 insertions(+)
>>>
>>> diff --git a/tests/ovn.at b/tests/ovn.at
>>> index b8c61f87fb..face39e5c2 100644
>>> --- a/tests/ovn.at
>>> +++ b/tests/ovn.at
>>> @@ -35352,20 +35352,35 @@ as hv1
>>>  check ovs-vsctl add-br br-phys
>>>  ovn_attach n1 br-phys 192.168.0.1
>>>
>>> +check ovn-nbctl ls-add ls
>>> +
>>>  dnl Ensure that there are 4 openflow connections.
>>>  OVS_WAIT_UNTIL([test "$(grep -c 'negotiated OpenFlow version'
>>> hv1/ovs-vswitchd.log)" -eq "4"])
>>>
>>> +as hv1 check ovs-vsctl -- add-port br-int lsp0       \
>>> +    -- set Interface lsp0 external-ids:iface-id=lsp0 \
>>> +    -- add-port br-int lsp1                          \
>>> +    -- set Interface lsp1 external-ids:iface-id=lsp1 \
>>> +    -- add-port br-int lsp2                          \
>>> +    -- set Interface lsp2 external-ids:iface-id=lsp2
>>> +
>>>  dnl "Wait" 3 times 60 seconds and ensure ovn-controller writes to the
>>>  dnl openflow connections in the meantime.  This should allow ovs-vswitchd
>>>  dnl to probe the openflow connections at least twice.
>>>
>>>  as hv1 ovs-appctl time/warp 60000
>>> +check ovn-nbctl lsp-add ls lsp0
>>> +wait_for_ports_up
>>>  check ovn-nbctl --wait=hv sync
>>>
>>>  as hv1 ovs-appctl time/warp 60000
>>> +check ovn-nbctl lsp-add ls lsp1
>>> +wait_for_ports_up
>>>  check ovn-nbctl --wait=hv sync
>>>
>>>  as hv1 ovs-appctl time/warp 60000
>>> +check ovn-nbctl lsp-add ls lsp2
>>> +wait_for_ports_up
>>>  check ovn-nbctl --wait=hv sync
>>>
>>>  AT_CHECK([test -z "`grep disconnecting hv1/ovs-vswitchd.log`"])
>>> --
>>> 2.39.3
>>>
>>> _______________________________________________
>>> dev mailing list
>>> [email protected]
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>
>>>
>>
>> Looks good to me, thanks.
>>
>> Acked-by: Ales Musil <[email protected]>
>>
>> --
>>
>> Ales Musil
>>
>> Senior Software Engineer - OVN Core
>>
>> Red Hat EMEA <https://www.redhat.com>
>>
>> [email protected]
>> <https://red.ht/sig>
>> _______________________________________________
>> dev mailing list
>> [email protected]
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to