On 4/26/23 10:13, Stefan Hoffmann wrote:
> On Tue, 2023-04-25 at 18:13 +0200, Ilya Maximets wrote:
>> On 4/25/23 15:08, Stefan Hoffmann wrote:
>>> On Mon, 2023-04-24 at 16:42 -0500, Terry Wilson wrote:
>>>>
>>>> On Fri, Apr 21, 2023 at 3:29 AM Stefan Hoffmann
>>>> <[email protected]> wrote:
>>>>> In some cases ovsdb server or relay gets restarted, ovsdb python
>>>>> clients
>>>>> may keep the local socket open. Instead of reconnecting a lot of
>>>>> failures
>>>>> will be logged.
>>>>> This can be reproduced with ssl connections to the server/relay and
>>>>> restarting it, so it has the same IP after restart.
>>>>>
>>>>> This patch catches the Exceptions at do_handshake to recreate the
>>>>> connection on the client side.
>>>>>
>>>>
>>>>
>>>> Is this something we can add some tests for? I know we have some
>>>> other tests that involve restarts, but I don't think they involve
>>>> python clients or SSL.
>>>
>>> I can try to find a proper place to add such an test. Can you point me
>>> to the correct place to add python clients to that test?
>>> But I'm not sure, if we run always into that issue at a restart.
>>
>> Usually, such tests live somewhere around tests/ovsdb-idl.at.
>> There is a basic CHECK_STREAM_OPEN_BLOCK test there, but it
>> doesn't check SSL connections, maybe it's worth it to extend
>> the test to be able to connect via SSL.  This will require
>> changing the tests/test-stream.py test script.
>>
>> On the subject, I'm not sure if the issue can be reliably
>> reproduced, especially because it requires OpenSSL 3.0, AFAICT.
>>
>> I have one question for the code below.
>>
>>>>  
>>>>> Reviewed-by: Simon Horman <[email protected]>
>>>>> Signed-off-by: Stefan Hoffmann <[email protected]>
>>>>> Signed-off-by: Luca Czesla <[email protected]>
>>>>> Signed-off-by: Max Lamprecht <[email protected]>
>>>>> Co-authored-by: Luca Czesla <[email protected]>
>>>>> Co-authored-by: Max Lamprecht <[email protected]>
>>>>> ---
>>>>>  python/ovs/stream.py | 3 ++-
>>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/python/ovs/stream.py b/python/ovs/stream.py
>>>>> index ac5b0fd0c..b32341076 100644
>>>>> --- a/python/ovs/stream.py
>>>>> +++ b/python/ovs/stream.py
>>>>> @@ -824,7 +824,8 @@ class SSLStream(Stream):
>>>>>              self.socket.do_handshake()
>>>>>          except ssl.SSLWantReadError:
>>>>>              return errno.EAGAIN
>>>>> -        except ssl.SSLSyscallError as e:
>>>>> +        except (ssl.SSLSyscallError, ssl.SSLZeroReturnError,
>>>>> +                ssl.SSLEOFError, OSError) as e:
>>
>> Do we actually need to catch the generic OSError here?
>> It looks a bit odd to catch non-ssl exception.
> 
> We got an "Transport endpoint is not connected" sometimes.
> It tries getpeername() at a socket that is not connected anymore or
> something like that. That results in the OSError.
> We also saw, that the file descriptor was still there but not
> connected.
> 
> Here the traceback (I hope you can still read that after line wrap):
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.9/site-
> packages/ovsdbapp/backend/ovs_idl/connection.py", line 107, in run
>     self.idl.run()
>  File "/usr/local/lib/python3.9/site-packages/ovs-3.1.0-
> py3.9.egg/ovs/db/idl.py", line 433, in run
>     self._session.run()
>   File "/usr/local/lib/python3.9/site-packages/ovs-3.1.0-
> py3.9.egg/ovs/jsonrpc.py", line 519, in run
>     error = self.stream.connect()
>   File "/usr/local/lib/python3.9/site-packages/ovs-3.1.0-
> py3.9.egg/ovs/stream.py", line 824, in connect
>     self.socket.do_handshake()
>   File "/usr/local/lib/python3.9/site-packages/eventlet/green/ssl.py",
> line 312, in do_handshake
>     return self._call_trampolining(
>   File "/usr/local/lib/python3.9/site-packages/eventlet/green/ssl.py",
> line 158, in _call_trampolining
>     return func(*a, **kw)
>   File "/usr/local/lib/python3.9/ssl.py", line 1305, in do_handshake
>     self._check_connected()
>   File "/usr/local/lib/python3.9/ssl.py", line 1089, in
> _check_connected
>     self.getpeername()
> 
> OSError: [Errno 107] Transport endpoint is not connected

OK.  Yeah, looking through the ssl.py implementation, it can indeed
throw a generic OSError.  That's unfortunate, but OK, I guess.

Thanks for clarification!

>>
>> Best regards, Ilya Maximets.
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to