New submission from ghost43 <somber.ni...@protonmail.com>:
This is about a potential DOS vector that can get an asyncio server serving SSL connections to enter a busy loop and hang. To recover the server (python process) needs to be restarted. See downstream report at https://github.com/spesmilo/electrumx/issues/92 ElectrumX is server software that serves JSON-RPC requests either directly over plaintext TCP or over SSL/TLS (no HTTP involved). The server code is written using asyncio. The most popular client is also written using python+asyncio. However, there are other server implementations (e.g. in Rust), and other clients (e.g. in javascript, scala, java). The servers are part of a federated network and peer with each other. In the last 1+ year, server operators have started reporting that the python process (of ElectrumX) sporadically hangs: it uses 100% CPU and stops serving clients or accepting new connections. The only way we could find to recover is restarting the server (the python process); waiting for even days did not help, python was stuck. The hang seemed to mostly happen on busy servers that serve several thousands of concurrent sessions, and even on those only happened e.g. once a month. So the hang was difficult to reproduce. Nevertheless we observed that the hang only happens if it is the asyncio server that handles SSL, i.e. if the server operator put nginx in front of the python process and handles SSL termination in nginx, they would be unaffected. One server operator whose server at one point hanged multiple times a day confirmed this, and reported that nginx subsequently started logging lines like this: ``` 2021/01/11 02:28:30 [crit] 21#21: *437620 SSL_shutdown() failed (SSL: error:14094123:SSL routines:ssl3_read_bytes:application data after close notify) while proxying connection, client: REDACTED, server: 0.0.0.0:50002, upstream: "127.0.0.1:50213", bytes from/to client:81/205, bytes from/to upstream:205/0 ``` Over these last months, the hang has been reproduced on many different python versions by different people, e.g. 3.7.1, 3.7.5, 3.8.5, 3.9.1, 3.9.4. A few days ago, many servers hanged almost simultaneously, and when restarted, they would soon hang again at most a few hours later. Presumably someone either flooded the network with buggy clients, or it might have been an attack. Anyway, this allowed me to look into this better. I set up gdb and waited. This was on ubuntu 20.04 with python 3.8.5 and openssl=1.1.1f-1ubuntu2.3. It seems the event loop thread is stuck in a busy loop. The deepest common stack frame looks to be at https://github.com/python/cpython/blob/v3.8.5/Lib/asyncio/sslproto.py#L675 ``` (gdb) py-bt Traceback (most recent call first): File "/usr/lib/python3.8/asyncio/sslproto.py", line 535, in feed_appdata File "/usr/lib/python3.8/asyncio/sslproto.py", line 675, in _process_write_backlog ssldata, offset = self._sslpipe.feed_appdata(data, offset) File "/usr/lib/python3.8/asyncio/sslproto.py", line 599, in _write_appdata self._process_write_backlog() File "/usr/lib/python3.8/asyncio/sslproto.py", line 387, in write self._ssl_protocol._write_appdata(data) File "/usr/local/lib/python3.8/dist-packages/aiorpcx/rawsocket.py", line 118, in write self._asyncio_transport.write(framed_message) File "/usr/local/lib/python3.8/dist-packages/aiorpcx/session.py", line 153, in _send_message await self.transport.write(message) File "/usr/local/lib/python3.8/dist-packages/aiorpcx/session.py", line 496, in _throttled_request await self._send_message(message) <built-in method run of Context object at remote 0x7f09e30eb6c0> File "/usr/lib/python3.8/asyncio/events.py", line 81, in _run self._context.run(self._callback, *self._args) File "/usr/lib/python3.8/asyncio/base_events.py", line 2627, in _run_once --Type <RET> for more, q to quit, c to continue without paging-- File "/usr/lib/python3.8/asyncio/base_events.py", line 826, in run_forever None, getaddr_func, host, port, family, type, proto, flags) File "/usr/lib/python3.8/asyncio/base_events.py", line 603, in run_until_complete self.run_forever() File "/usr/lib/python3.8/asyncio/runners.py", line 299, in run File "/usr/local/bin/electrumx_server", line 35, in main asyncio.run(controller.run()) File "/usr/local/bin/electrumx_server", line 43, in <module> main() ``` ``` (gdb) py-list 530 except (SystemExit, KeyboardInterrupt): 531 raise 532 except BaseException as e: 533 self._fatal_error(e, 'SSL error in data received') 534 return >535 536 for chunk in ssldata: 537 self._transport.write(chunk) 538 539 for chunk in appdata: 540 if chunk: ``` ``` (gdb) py-locals self = <_SSLPipe(_context=<SSLContext at remote 0x7f0a30dc00c0>, _server_side=True, _server_hostname=None, _state='WRAPPED', _incoming=<_ssl.MemoryBIO at remote 0x7f09cc0925b0>, _outgoing=<_ssl.MemoryBIO at remote 0x7f09b919c430>, _sslobj=<SSLObject(_sslobj=<_ssl._SSLSocket at remote 0x7f09cc8c1630>) at remote 0x7f09db428910>, _need_ssldata=False, _handshake_cb=<method at remote 0x7f09c0177a00>, _shutdown_cb=None) at remote 0x7f09db43f160> data = b'{"jsonrpc": "2.0", "result": null, "id": 18}\n' offset = 0 ssldata = [b'\x17\x03\x03\x00\x13<\x8fh\x93\xd3\xd7M\xb9\xbd\xb6\xad\x08\x1bZ\x857\x0c\xe4\xac'] view = <memoryview at remote 0x7f09d295be80> exc_errno = 5 ``` ``` (gdb) cont Continuing. ^C Thread 1 "electrumx_serve" received signal SIGINT, Interrupt. memory_getbuf (self=0x7f09b69a3880, view=0x7ffcb6829200, flags=0) at ../Objects/memoryobject.c:1441 1441 ../Objects/memoryobject.c: No such file or directory. (gdb) py-bt Traceback (most recent call first): File "/usr/lib/python3.8/ssl.py", line 897, in write return self._sslobj.write(data) File "/usr/lib/python3.8/asyncio/sslproto.py", line 262, in feed_appdata offset += self._sslobj.write(view[offset:]) File "/usr/lib/python3.8/asyncio/sslproto.py", line 675, in _process_write_backlog ssldata, offset = self._sslpipe.feed_appdata(data, offset) File "/usr/lib/python3.8/asyncio/sslproto.py", line 599, in _write_appdata self._process_write_backlog() File "/usr/lib/python3.8/asyncio/sslproto.py", line 387, in write self._ssl_protocol._write_appdata(data) File "/usr/local/lib/python3.8/dist-packages/aiorpcx/rawsocket.py", line 118, in write self._asyncio_transport.write(framed_message) File "/usr/local/lib/python3.8/dist-packages/aiorpcx/session.py", line 153, in _send_message await self.transport.write(message) File "/usr/local/lib/python3.8/dist-packages/aiorpcx/session.py", line 496, in _throttled_request await self._send_message(message) <built-in method run of Context object at remote 0x7f09e30eb6c0> File "/usr/lib/python3.8/asyncio/events.py", line 81, in _run --Type <RET> for more, q to quit, c to continue without paging-- self._context.run(self._callback, *self._args) File "/usr/lib/python3.8/asyncio/base_events.py", line 2627, in _run_once File "/usr/lib/python3.8/asyncio/base_events.py", line 826, in run_forever None, getaddr_func, host, port, family, type, proto, flags) File "/usr/lib/python3.8/asyncio/base_events.py", line 603, in run_until_complete self.run_forever() File "/usr/lib/python3.8/asyncio/runners.py", line 299, in run File "/usr/local/bin/electrumx_server", line 35, in main asyncio.run(controller.run()) File "/usr/local/bin/electrumx_server", line 43, in <module> main() (gdb) (gdb) cont Continuing. ^C (gdb) py-bt Traceback (most recent call first): <built-in method len of module object at remote 0x7f0a34f8e0e0> File "/usr/lib/python3.8/asyncio/sslproto.py", line 535, in feed_appdata File "/usr/lib/python3.8/asyncio/sslproto.py", line 675, in _process_write_backlog ssldata, offset = self._sslpipe.feed_appdata(data, offset) File "/usr/lib/python3.8/asyncio/sslproto.py", line 599, in _write_appdata self._process_write_backlog() File "/usr/lib/python3.8/asyncio/sslproto.py", line 387, in write self._ssl_protocol._write_appdata(data) File "/usr/local/lib/python3.8/dist-packages/aiorpcx/rawsocket.py", line 118, in write self._asyncio_transport.write(framed_message) File "/usr/local/lib/python3.8/dist-packages/aiorpcx/session.py", line 153, in _send_message await self.transport.write(message) File "/usr/local/lib/python3.8/dist-packages/aiorpcx/session.py", line 496, in _throttled_request await self._send_message(message) <built-in method run of Context object at remote 0x7f09e30eb6c0> File "/usr/lib/python3.8/asyncio/events.py", line 81, in _run self._context.run(self._callback, *self._args) --Type <RET> for more, q to quit, c to continue without paging-- File "/usr/lib/python3.8/asyncio/base_events.py", line 2627, in _run_once File "/usr/lib/python3.8/asyncio/base_events.py", line 826, in run_forever None, getaddr_func, host, port, family, type, proto, flags) File "/usr/lib/python3.8/asyncio/base_events.py", line 603, in run_until_complete self.run_forever() File "/usr/lib/python3.8/asyncio/runners.py", line 299, in run File "/usr/local/bin/electrumx_server", line 35, in main asyncio.run(controller.run()) File "/usr/local/bin/electrumx_server", line 43, in <module> main() main() (gdb) cont Continuing. ^C Thread 1 "electrumx_serve" received signal SIGINT, Interrupt. PySlice_New (start=0, stop=None, step=0x0) at ../Objects/sliceobject.c:123 123 ../Objects/sliceobject.c: No such file or directory. (gdb) Continuing. ^C Thread 1 "electrumx_serve" received signal SIGINT, Interrupt. 0x00000000005d4390 in long_richcompare ( self=<SSLErrorNumber(_value_=5, _name_='SSL_ERROR_SYSCALL', __objclass__=<EnumMeta(_generate_next_value_=<function at remote 0x7f0a34e119d0>, __doc__='An enumeration.', __module__='ssl', _member_names_=['SSL_ERROR_SSL', 'SSL_ERROR_WANT_READ', 'SSL_ERROR_WANT_WRITE', 'SSL_ERROR_WANT_X509_LOOKUP', 'SSL_ERROR_SYSCALL', 'SSL_ERROR_ZERO_RETURN', 'SSL_ERROR_WANT_CONNECT', 'SSL_ERROR_EOF', 'SSL_ERROR_INVALID_ERROR_CODE'], _member_map_={'SSL_ERROR_SSL': <SSLErrorNumber(_value_=1, _name_='SSL_ERROR_SSL', __objclass__=<...>) at remote 0x7f0a3487fe00>, 'SSL_ERROR_WANT_READ': <SSLErrorNumber(_value_=2, _name_='SSL_ERROR_WANT_READ', __objclass__=<...>) at remote 0x7f0a3487fe80>, 'SSL_ERROR_WANT_WRITE': <SSLErrorNumber(_value_=3, _name_='SSL_ERROR_WANT_WRITE', __objclass__=<...>) at remote 0x7f0a3487ff00>, 'SSL_ERROR_WANT_X509_LOOKUP': <SSLErrorNumber(_value_=4, _name_='SSL_ERROR_WANT_X509_LOOKUP', __objclass__=<...>) at remote 0x7f0a3487ff80>, 'SSL_ERROR_SYSCALL': <...>, 'SSL_ERROR_ZERO_RETUR N': <SSLErrorNumber(_value_=6, _nam...(truncated), other=5, op=2) at ../Objects/longobject.c:3083 3083 ../Objects/longobject.c: No such file or directory. (gdb) py-bt Traceback (most recent call first): File "/usr/lib/python3.8/asyncio/sslproto.py", line 526, in feed_appdata return File "/usr/lib/python3.8/asyncio/sslproto.py", line 675, in _process_write_backlog ssldata, offset = self._sslpipe.feed_appdata(data, offset) File "/usr/lib/python3.8/asyncio/sslproto.py", line 599, in _write_appdata self._process_write_backlog() File "/usr/lib/python3.8/asyncio/sslproto.py", line 387, in write self._ssl_protocol._write_appdata(data) File "/usr/local/lib/python3.8/dist-packages/aiorpcx/rawsocket.py", line 118, in write self._asyncio_transport.write(framed_message) File "/usr/local/lib/python3.8/dist-packages/aiorpcx/session.py", line 153, in _send_message await self.transport.write(message) File "/usr/local/lib/python3.8/dist-packages/aiorpcx/session.py", line 496, in _throttled_request await self._send_message(message) <built-in method run of Context object at remote 0x7f09e30eb6c0> File "/usr/lib/python3.8/asyncio/events.py", line 81, in _run self._context.run(self._callback, *self._args) File "/usr/lib/python3.8/asyncio/base_events.py", line 2627, in _run_once --Type <RET> for more, q to quit, c to continue without paging-- (gdb) (gdb) cont Continuing. ^C Thread 1 "electrumx_serve" received signal SIGINT, Interrupt. 0x00000000004f7fa0 in builtin_len (module=<optimized out>, obj=<optimized out>) at ../Objects/longobject.c:61 61 ../Objects/longobject.c: No such file or directory. (gdb) py-bt Traceback (most recent call first): <built-in method len of module object at remote 0x7f0a34f8e0e0> File "/usr/lib/python3.8/asyncio/sslproto.py", line 535, in feed_appdata File "/usr/lib/python3.8/asyncio/sslproto.py", line 675, in _process_write_backlog ssldata, offset = self._sslpipe.feed_appdata(data, offset) File "/usr/lib/python3.8/asyncio/sslproto.py", line 599, in _write_appdata self._process_write_backlog() File "/usr/lib/python3.8/asyncio/sslproto.py", line 387, in write self._ssl_protocol._write_appdata(data) File "/usr/local/lib/python3.8/dist-packages/aiorpcx/rawsocket.py", line 118, in write self._asyncio_transport.write(framed_message) File "/usr/local/lib/python3.8/dist-packages/aiorpcx/session.py", line 153, in _send_message await self.transport.write(message) File "/usr/local/lib/python3.8/dist-packages/aiorpcx/session.py", line 496, in _throttled_request await self._send_message(message) <built-in method run of Context object at remote 0x7f09e30eb6c0> File "/usr/lib/python3.8/asyncio/events.py", line 81, in _run self._context.run(self._callback, *self._args) --Type <RET> for more, q to quit, c to continue without paging-- (gdb) cont Continuing. ^C Thread 1 "electrumx_serve" received signal SIGINT, Interrupt. PyType_IsSubtype (b=0x907020 <_PyNone_Type>, a=0x90a860 <PyUnicode_Type>) at ../Objects/typeobject.c:1370 1370 ../Objects/typeobject.c: No such file or directory. (gdb) py-bt Traceback (most recent call first): File "/usr/lib/python3.8/asyncio/sslproto.py", line 268, in feed_appdata if exc.reason == 'PROTOCOL_IS_SHUTDOWN': File "/usr/lib/python3.8/asyncio/sslproto.py", line 675, in _process_write_backlog ssldata, offset = self._sslpipe.feed_appdata(data, offset) File "/usr/lib/python3.8/asyncio/sslproto.py", line 599, in _write_appdata self._process_write_backlog() File "/usr/lib/python3.8/asyncio/sslproto.py", line 387, in write self._ssl_protocol._write_appdata(data) File "/usr/local/lib/python3.8/dist-packages/aiorpcx/rawsocket.py", line 118, in write self._asyncio_transport.write(framed_message) File "/usr/local/lib/python3.8/dist-packages/aiorpcx/session.py", line 153, in _send_message await self.transport.write(message) File "/usr/local/lib/python3.8/dist-packages/aiorpcx/session.py", line 496, in _throttled_request await self._send_message(message) <built-in method run of Context object at remote 0x7f09e30eb6c0> File "/usr/lib/python3.8/asyncio/events.py", line 81, in _run self._context.run(self._callback, *self._args) File "/usr/lib/python3.8/asyncio/base_events.py", line 2627, in _run_once --Type <RET> for more, q to quit, c to continue without paging-- (gdb) cont Continuing. ^C Thread 1 "electrumx_serve" received signal SIGINT, Interrupt. __GI___errno_location () at errno-loc.c:25 25 errno-loc.c: No such file or directory. (gdb) py-bt Traceback (most recent call first): File "/usr/lib/python3.8/ssl.py", line 897, in write return self._sslobj.write(data) File "/usr/lib/python3.8/asyncio/sslproto.py", line 262, in feed_appdata offset += self._sslobj.write(view[offset:]) File "/usr/lib/python3.8/asyncio/sslproto.py", line 675, in _process_write_backlog ssldata, offset = self._sslpipe.feed_appdata(data, offset) File "/usr/lib/python3.8/asyncio/sslproto.py", line 599, in _write_appdata self._process_write_backlog() File "/usr/lib/python3.8/asyncio/sslproto.py", line 387, in write self._ssl_protocol._write_appdata(data) File "/usr/local/lib/python3.8/dist-packages/aiorpcx/rawsocket.py", line 118, in write self._asyncio_transport.write(framed_message) File "/usr/local/lib/python3.8/dist-packages/aiorpcx/session.py", line 153, in _send_message await self.transport.write(message) File "/usr/local/lib/python3.8/dist-packages/aiorpcx/session.py", line 496, in _throttled_request await self._send_message(message) <built-in method run of Context object at remote 0x7f09e30eb6c0> File "/usr/lib/python3.8/asyncio/events.py", line 81, in _run ``` Note that around line https://github.com/python/cpython/blob/v3.8.5/Lib/asyncio/sslproto.py#L262, exc_errno = 5 corresponds to `ssl.SSL_ERROR_SYSCALL`. Considering the gdb backtraces above, it seems to me the execution is stuck in that while loop, always getting `ssl.SSL_ERROR_SYSCALL` and suppressing it. Note that this code is not async, so the whole event loop is blocked. I have patched my python interpreter to not suppress that exception but instead re-raise it, and recompiled; and the hangs have gone away; I can no longer reproduce them. I used the following patch on top of 3.9.4 (but note the code in this file has not changed for years): ``` Lib/asyncio/sslproto.py | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/Lib/asyncio/sslproto.py b/Lib/asyncio/sslproto.py index cad25b2653..f67898a777 100644 --- a/Lib/asyncio/sslproto.py +++ b/Lib/asyncio/sslproto.py @@ -267,8 +267,7 @@ class _SSLPipe(object): if exc.reason == 'PROTOCOL_IS_SHUTDOWN': exc_errno = exc.errno = ssl.SSL_ERROR_WANT_READ if exc_errno not in (ssl.SSL_ERROR_WANT_READ, - ssl.SSL_ERROR_WANT_WRITE, - ssl.SSL_ERROR_SYSCALL): + ssl.SSL_ERROR_WANT_WRITE): raise self._need_ssldata = (exc_errno == ssl.SSL_ERROR_WANT_READ) ``` Frankly I don't understand why the `ssl.SSL_ERROR_SYSCALL` error is special cased in the try-except, hence this might break something else. The git blame or the python docs were of no help either (to understand that exception). However with this patch my server seems to be working correctly and is no longer getting stuck. Unfortunately I don't have code that triggers the hang. ---------- assignee: christian.heimes components: SSL, asyncio messages: 392923 nosy: asvetlov, christian.heimes, ghost43, yselivanov priority: normal severity: normal status: open title: asyncio SSL server can be DOSed, event loop gets blocked: busy loops and uses 100% CPU type: security versions: Python 3.7, Python 3.8, Python 3.9 _______________________________________ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue44036> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com