= hirsute verification =
# Start by showing we can still reproduce the problem w/o the -proposed packages:
ubuntu@avoton02:~$ sudo iptables -A INPUT -p tcp -s 91.189.88.136 -m string --string maas.io --algo bm -j DROP
ubuntu@avoton02:~$ python3 ./repro.py & sleep 60
[1] 3386
# 60 seconds have passed, still hung:
ubuntu@avoton02:~$ sudo strace -p 3386
strace: Process 3386 attached
read(3, ^Cstrace: Process 3386 detached
<detached ...>
ubuntu@avoton02:~$ fg
python3 ./repro.py
^CTraceback (most recent call last):
File "/home/ubuntu/./repro.py", line 6, in <module>
r = RequestsUrlReader(url)
File "/usr/lib/python3/dist-packages/simplestreams/contentsource.py", line
381, in __init__
self.req = requests.get(url, stream=True, auth=auth, headers=headers)
File "/usr/lib/python3/dist-packages/requests/api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "/usr/lib/python3/dist-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 542, in
request
resp = self.send(prep, **send_kwargs)
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/usr/lib/python3/dist-packages/requests/adapters.py", line 439, in send
resp = conn.urlopen(
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 699, in
urlopen
httplib_response = self._make_request(
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 382, in
_make_request
self._validate_conn(conn)
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 1012,
in _validate_conn
conn.connect()
File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 411, in
connect
self.sock = ssl_wrap_socket(
File "/usr/lib/python3/dist-packages/urllib3/util/ssl_.py", line 428, in
ssl_wrap_socket
ssl_sock = _ssl_wrap_socket_impl(
File "/usr/lib/python3/dist-packages/urllib3/util/ssl_.py", line 472, in
_ssl_wrap_socket_impl
return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
File "/usr/lib/python3.9/ssl.py", line 500, in wrap_socket
return self.sslsocket_class._create(
File "/usr/lib/python3.9/ssl.py", line 1040, in _create
self.do_handshake()
File "/usr/lib/python3.9/ssl.py", line 1309, in do_handshake
self._sslobj.do_handshake()
KeyboardInterrupt
# Now upgrade and demonstrate the problem is fixed
ubuntu@avoton02:~$ sudo apt install python3-simplestreams simplestreams -y
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following packages will be upgraded:
python3-simplestreams simplestreams
2 upgraded, 0 newly installed, 0 to remove and 68 not upgraded.
Need to get 31.8 kB/37.8 kB of archives.
After this operation, 0 B of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu hirsute-proposed/main amd64
python3-simplestreams all 0.1.0-30-g3cc8988a-0ubuntu1.21.04.1 [31.8 kB]
Fetched 31.8 kB in 0s (119 kB/s)
(Reading database ... 79414 files and directories currently installed.)
Preparing to unpack
.../python3-simplestreams_0.1.0-30-g3cc8988a-0ubuntu1.21.04.1_all.deb ...
Unpacking python3-simplestreams (0.1.0-30-g3cc8988a-0ubuntu1.21.04.1) over
(0.1.0-30-g3cc8988a-0ubuntu1) ...
Preparing to unpack
.../simplestreams_0.1.0-30-g3cc8988a-0ubuntu1.21.04.1_all.deb ...
Unpacking simplestreams (0.1.0-30-g3cc8988a-0ubuntu1.21.04.1) over
(0.1.0-30-g3cc8988a-0ubuntu1) ...
Setting up python3-simplestreams (0.1.0-30-g3cc8988a-0ubuntu1.21.04.1) ...
Setting up simplestreams (0.1.0-30-g3cc8988a-0ubuntu1.21.04.1) ...
Scanning processes...
Scanning processor microcode...
Scanning linux images...
Running kernel seems to be up-to-date.
The processor microcode seems to be up-to-date.
No services need to be restarted.
No containers need to be restarted.
No user sessions are running outdated binaries.
ubuntu@avoton02:~$ python3 ./repro.py & sleep 60
[1] 3605
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 382, in
_make_request
self._validate_conn(conn)
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 1012,
in _validate_conn
conn.connect()
File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 411, in
connect
self.sock = ssl_wrap_socket(
File "/usr/lib/python3/dist-packages/urllib3/util/ssl_.py", line 428, in
ssl_wrap_socket
ssl_sock = _ssl_wrap_socket_impl(
File "/usr/lib/python3/dist-packages/urllib3/util/ssl_.py", line 472, in
_ssl_wrap_socket_impl
return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
File "/usr/lib/python3.9/ssl.py", line 500, in wrap_socket
return self.sslsocket_class._create(
File "/usr/lib/python3.9/ssl.py", line 1040, in _create
self.do_handshake()
File "/usr/lib/python3.9/ssl.py", line 1309, in do_handshake
self._sslobj.do_handshake()
socket.timeout: _ssl.c:1106: The handshake operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/requests/adapters.py", line 439, in send
resp = conn.urlopen(
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 755, in
urlopen
retries = retries.increment(
File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 531, in
increment
raise six.reraise(type(error), error, _stacktrace)
File "/usr/lib/python3/dist-packages/six.py", line 703, in reraise
raise value
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 699, in
urlopen
httplib_response = self._make_request(
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 385, in
_make_request
self._raise_timeout(err=e, url=url, timeout_value=conn.timeout)
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 336, in
_raise_timeout
raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='images.maas.io',
port=443): Read timed out. (read timeout=10)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ubuntu/./repro.py", line 6, in <module>
r = RequestsUrlReader(url)
File "/usr/lib/python3/dist-packages/simplestreams/contentsource.py", line
382, in __init__
self.req = requests.get(
File "/usr/lib/python3/dist-packages/requests/api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "/usr/lib/python3/dist-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 542, in
request
resp = self.send(prep, **send_kwargs)
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/usr/lib/python3/dist-packages/requests/adapters.py", line 529, in send
raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='images.maas.io',
port=443): Read timed out. (read timeout=10)
[1]+ Exit 1 python3 ./repro.py
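
The difference between the two runs above can be illustrated without iptables. This is only a minimal sketch, not the simplestreams change itself: it assumes a local listener that never answers as a stand-in for the blocked server, and the helper name read_with_timeout is made up for illustration. With a timeout set, the blocking read gives up instead of hanging in read() as in the first strace output.

```python
import socket


def read_with_timeout(host, port, timeout):
    """Connect and wait for data, giving up after `timeout` seconds."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        try:
            return conn.recv(1024)
        except socket.timeout:
            # Without a timeout this recv() would block indefinitely,
            # which mirrors the hang seen before the upgrade.
            return None


# Hypothetical stand-in for the unreachable server: a listener that
# never accepts the connection and never sends a byte.
silent = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
silent.bind(("127.0.0.1", 0))
silent.listen(1)
port = silent.getsockname()[1]

result = read_with_timeout("127.0.0.1", port, timeout=0.5)
print("timed out" if result is None else result)
silent.close()
```

The 0.5s value is only there to keep the demo short; the traceback above shows the fixed package using a 10s read timeout.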
** Description changed:
[Impact]
The bug is about simplestreams possibly getting stuck waiting forever
for an HTTP response that never comes, e.g. because of networking
issues. This can potentially affect any package depending on
simplestreams, but specifically it was reported affecting MAAS, where it
causes server deployments to time out.
[Test Plan]
+ Install an iptables rule to block SSL handshaking w/ the MAAS simplestreams repo:
- Ideally this should be tested by building a MAAS snap with the
- simplestreams package including the fix, verifying that is works as
- expected.
+ -------------------------
+ $ sudo iptables -A INPUT -p tcp -s 91.189.88.136 -m string --string maas.io --algo bm -j DROP
+ -------------------------
+
+ Run the reproducer described below, and verify that it hangs
+ indefinitely (I recommend waiting 60s):
+
+ -------------------------
+ $ cat repro.py
+ #!/usr/bin/env python3
+
+ from simplestreams.contentsource import RequestsUrlReader
+
+ url = "https://images.maas.io/ephemeral-v3/stable/streams/v1/index.sjson"
+ r = RequestsUrlReader(url)
+ -------------------------
+
+ With the fix applied, verify that it times out in ~10s.
[Regression Potential]
- Very little. Scenarios where it takes more than 10s for a remote server
- to provide simplestreams with the data it requested are unlikely, but
- can't be fully excluded.
+ Scenarios where it takes more than 10s to initiate a connection are
+ unlikely, but possible. Code that does not properly handle a timeout
+ exception in these situations may begin to fail.
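
One way a caller might cope with the new timeout exception is to retry a bounded number of times. The following is a rough sketch under stated assumptions, not code from simplestreams or MAAS: fetch_with_retry and flaky_fetch are hypothetical names, and socket.timeout is used only to keep the example stdlib-only. A caller of the fixed package would more likely see requests.exceptions.ReadTimeout, as in the verification traceback.

```python
import socket
import time


def fetch_with_retry(fetch, attempts=3, delay=0.0):
    """Call fetch(); retry when it times out, at most `attempts` times."""
    last_error = None
    for _ in range(attempts):
        try:
            return fetch()
        except socket.timeout as err:
            # Remember the failure and try again after a short pause.
            last_error = err
            time.sleep(delay)
    # Every attempt timed out: surface the last error to the caller.
    raise last_error


# Hypothetical fetcher that times out twice and then succeeds.
calls = {"n": 0}


def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise socket.timeout("simulated timeout")
    return "index contents"


outcome = fetch_with_retry(flaky_fetch)
print(outcome, "after", calls["n"], "attempts")
```

Whether retrying is appropriate depends on the caller; the point is only that the timeout now surfaces as an exception that has to be handled somewhere.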
[Original Description]
= How to determine you are seeing this problem =
Does your MAAS server seem to get "hung up", where deployments suddenly start
failing w/ lots of connection timeouts to the MAAS server?
Get a list of pids of your regiond processes:
$ ps -ef | grep regiond
Run strace on each one to see if one is stuck in a connect() or recv() call:
$ sudo strace -p $pid
recv(...
(normally you should see a lot of epoll_ctl() calls go by if not hung)
If one is hung, use lsof to see what it is connected to:
$ sudo lsof -i -a -p $pid
If you see an open connection to your images server, then this may be
your problem. Running sudo kill -9 on the hung pid will cause it to
respawn and recover.
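
The pid lookup in the steps above could also be scripted. This is a small sketch assuming a Linux /proc filesystem; find_pids is a made-up helper name, and it only prints the strace commands to run rather than running them.

```python
import os


def find_pids(pattern):
    """Return pids whose command line contains `pattern` (Linux /proc only)."""
    pids = []
    if not os.path.isdir("/proc"):
        return pids
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join("/proc", entry, "cmdline"), "rb") as fh:
                # Arguments are NUL-separated; join them with spaces.
                cmdline = fh.read().replace(b"\0", b" ").decode(errors="replace")
        except OSError:
            # The process exited or is not readable; skip it.
            continue
        if pattern in cmdline:
            pids.append(int(entry))
    return sorted(pids)


for pid in find_pids("regiond"):
    print(f"sudo strace -p {pid}")
```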
** Tags removed: verification-needed-hirsute
** Tags added: verification-done-hirsute
https://bugs.launchpad.net/bugs/1908452
Title:
MAAS stops working and deployment fails after `Loading ephemeral` step