Hi Andreas (comment #34) and SRU team,
I'd like to share my views on this SRU after some research on the topic.
TL;DR: I think this is the 'least bad' way forward, and agree with it,
provided there is enough testing with WSGI and non-WSGI services on
-proposed.
1) First and foremost, I initially thought 'this doesn't fix the issue;
the threading problem must be fixed, instead'.
However, there is more context on this, which isn't directly reflected
in the bug, and it is complicated. (I followed some links in the bug.)
The OpenStack community is aware of this and other threading problems,
which stem from the different approaches to threading in use, how they
interoperate (or not), and the different contexts services run in
(e.g., served via WSGI in HTTP servers, or directly running Python).
Apparently, this will take 2-4 OpenStack releases to be worked out
(moving to asyncio), and requires much cross-project coordination.
The source for this is issue [1], particularly comment [2] and later.
Earlier comments have details of related technical issues/options.
Therefore, properly "fixing" this seems neither easy nor short-term.
2) The proposed revert (i.e., 'workaround') has actually been
merged in the OpenStack devel branch _and_ stable branch 'yoga' [3,4],
which is the OpenStack release shipped in Jammy.
It happened 8 months ago (~2024-01), but python-oslo.messaging in
jammy-updates is older (2022-11-15).
So, there is upstream acknowledgement/"support" for the revert.
3) The proposed revert actually makes Jammy consistent with the
other Ubuntu LTSes, as it's the only one with a different value.
The prior/later LTS releases have the same value (False),
and the later interim too (prior interim had True), as in [6].
This means Jammy is the inconsistent one across release upgrades,
and the revert would make it consistent.
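
The comparison in [6] could also be scripted. The following is only a
sketch, assuming the same unpacked-source layout as the grep in [6].
The helper names (heartbeat_default, defaults_by_release) are mine,
not part of oslo.messaging, and the regex only recognizes the literal
form shown in that grep output.

```python
import re
from pathlib import Path

# Matches the declaration as it appears in the grep output in [6]:
#   cfg.BoolOpt('heartbeat_in_pthread',
#               default=True,
_PATTERN = re.compile(
    r"cfg\.BoolOpt\(\s*['\"]heartbeat_in_pthread['\"],\s*default=(True|False)"
)


def heartbeat_default(source_text):
    """Return the declared default as a bool, or None if not found."""
    match = _PATTERN.search(source_text)
    if match is None:
        return None
    return match.group(1) == "True"


def defaults_by_release(root):
    """Map release directory name -> declared default under `root`.

    Assumes the <release>/python-oslo.messaging-<version>/ layout
    used for the grep in [6].
    """
    pattern = "*/python-oslo.messaging-*/oslo_messaging/_drivers/impl_rabbit.py"
    results = {}
    for path in sorted(Path(root).glob(pattern)):
        release = path.relative_to(root).parts[0]
        results[release] = heartbeat_default(path.read_text())
    return results
```

Running defaults_by_release(".") in the directory holding the unpacked
sources should list one entry per release, which makes the odd one out
easy to spot.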
4) The behavior change associated with the revert does exist,
but the trade-off seems worth it, weighing the stability
impact to affected OpenStack components (nova-compute manages
virtual machines; cinder-backup is also listed in [5])
against the _potential_ performance impact (unknown?) to
non-affected ones.
Current:
- WSGI services: WORK
- non-WSGI services: FAIL
Proposed:
- WSGI services: WORK, possibly slower (to be determined)
- non-WSGI services: WORK
It's not nice that, if there are non-affected users that
possibly get a performance impact, they have to figure it
out and change the (new) default to the previous (True).
*However*, it seems _worse_ that, by default, users may
be affected by a functional / stability impact.
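
For non-affected users who want the previous behavior back after the
revert, that should be a one-line config override. Below is a minimal
sketch for checking what a given config file resolves to. It assumes
the option is read from the [oslo_messaging_rabbit] section (my
assumption, not stated in this bug), the function name is hypothetical,
and it uses plain configparser rather than oslo.config, so it only
approximates how the service itself parses the file.

```python
import configparser

# The option name is taken from the grep output in [6]; the section
# name is an assumption about where the rabbit driver options live.
SECTION = "oslo_messaging_rabbit"
OPTION = "heartbeat_in_pthread"


def effective_heartbeat_in_pthread(conf_text, package_default):
    """Return the explicitly configured value, else the package default."""
    # interpolation=None: values may contain '%' characters.
    # strict=False: tolerate repeated sections/options in the file.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    if parser.has_option(SECTION, OPTION):
        return parser.getboolean(SECTION, OPTION)
    return package_default
```

With the proposed package (default False), a file that sets
'heartbeat_in_pthread = true' in that section would keep the current
behavior; a file that doesn't mention the option would pick up the
new default.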
5) It's worth mentioning that, despite the term 'crash' being
mentioned, apparently the nova-compute service keeps running
(see `systemctl` output in comment #23), so systemd does not
restart the service, and this apparently leaves nova-compute
non-operational:
:~$ sudo systemctl status nova-compute.service
...
Active: active (running) since Mon 2023-11-06 16:56:26 PST; 4 days ago
...
Nov 11 06:18:24 os-vm-6 nova-compute[1621]: greenlet.error: cannot switch to a different thread
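
Since systemd won't notice this state on its own, it has to be
detected some other way. As a rough illustration (a heuristic sketch
only; the function name is made up, and a matching error line may be
old and unrelated to the service's current state), one could flag the
combination seen above, 'active (running)' plus the greenlet error,
in captured `systemctl status` text:

```python
GREENLET_ERROR = "greenlet.error: cannot switch to a different thread"


def looks_wedged(status_text):
    """True if the text shows a running unit that logged the greenlet error.

    Whitespace is collapsed first, so output that was re-wrapped
    (e.g. by a mail client) still matches.
    """
    flat = " ".join(status_text.split())
    return "Active: active (running)" in flat and GREENLET_ERROR in flat
```

This only says the symptom appeared in the captured output; whether
the service is actually non-operational still needs checking.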
6) The development of a dynamic detection method for this
might take a considerable amount of time, and would mean
switching to an approach that deviates from how upstream (and
previous/later Ubuntu releases) have done it.
It might be a welcome improvement, but maybe not as the way
to address this problem right now.
Thanks!
[1] https://github.com/eventlet/eventlet/issues/432
[2] https://github.com/eventlet/eventlet/issues/432#issuecomment-2025104893
[3] https://review.opendev.org/q/topic:%22disable-green-threads%22
[4] https://opendev.org/openstack/oslo.messaging/commit/44b3427bc9efea9f341edfb8ea7aea38f25d1a5a
[5] https://github.com/eventlet/eventlet/issues/432#issuecomment-1983373808
[6]
$ grep -A1 "cfg.BoolOpt('heartbeat_in_pthread'" */python-oslo.messaging-*/oslo_messaging/_drivers/impl_rabbit.py
focal/python-oslo.messaging-12.1.6/oslo_messaging/_drivers/impl_rabbit.py: cfg.BoolOpt('heartbeat_in_pthread',
focal/python-oslo.messaging-12.1.6/oslo_messaging/_drivers/impl_rabbit.py- default=False,
--
impish/python-oslo.messaging-12.9.1/oslo_messaging/_drivers/impl_rabbit.py: cfg.BoolOpt('heartbeat_in_pthread',
impish/python-oslo.messaging-12.9.1/oslo_messaging/_drivers/impl_rabbit.py- default=True,
--
jammy/python-oslo.messaging-12.13.0/oslo_messaging/_drivers/impl_rabbit.py: cfg.BoolOpt('heartbeat_in_pthread',
jammy/python-oslo.messaging-12.13.0/oslo_messaging/_drivers/impl_rabbit.py- default=True,
--
kinetic/python-oslo.messaging-14.0.0/oslo_messaging/_drivers/impl_rabbit.py: cfg.BoolOpt('heartbeat_in_pthread',
kinetic/python-oslo.messaging-14.0.0/oslo_messaging/_drivers/impl_rabbit.py- default=False,
--
noble/python-oslo.messaging-14.6.0/oslo_messaging/_drivers/impl_rabbit.py: cfg.BoolOpt('heartbeat_in_pthread',
noble/python-oslo.messaging-14.6.0/oslo_messaging/_drivers/impl_rabbit.py- default=False,
$ head -n1 */python-oslo.messaging-*/debian/changelog
==> focal/python-oslo.messaging-12.1.6/debian/changelog <==
python-oslo.messaging (12.1.6-0ubuntu1) focal; urgency=medium
==> impish/python-oslo.messaging-12.9.1/debian/changelog <==
python-oslo.messaging (12.9.1-0ubuntu4) impish; urgency=medium
==> jammy/python-oslo.messaging-12.13.0/debian/changelog <==
python-oslo.messaging (12.13.0-0ubuntu1.1) jammy; urgency=medium
==> kinetic/python-oslo.messaging-14.0.0/debian/changelog <==
python-oslo.messaging (14.0.0-0ubuntu1.1) kinetic; urgency=medium
==> noble/python-oslo.messaging-14.6.0/debian/changelog <==
python-oslo.messaging (14.6.0-0ubuntu1) noble; urgency=medium
** Bug watch added: github.com/eventlet/eventlet/issues #432
https://github.com/eventlet/eventlet/issues/432
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1934937
Title:
[SRU] Heartbeat in pthreads in nova-wallaby crashes with greenlet
error
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1934937/+subscriptions