@mruffell thanks! Only a few comments below:

> Note because of this bug, groovy and upstream has now been changed to 10 min timeout, down from 1hr.

We should decide whether this is really what we want. If the default should go back to the longer 1 hour timeout, that should be proposed upstream. I don't really know which default is better, 10 minutes or 1 hour. @nicolasbock did you have specific reasoning for the upstream reduction to 10 minutes? If 10 minutes is what we want, then we should be ok upstream and in Groovy, and in Focal with the current code in -proposed.

> On Eoan: Assuming same behaviour as focal due to systemd service file. Untested.

Yeah, it FTBFS in Eoan unfortunately; there is bug 1843761, and I also detailed why it fails in the description for bug 1773324. As Eoan is almost EOL, my opinion is that it's safer to simply leave it untouched there.

> If this ExecStartPost script times out (which it does after 90 seconds it seems, even though documentation suggests infinite timeout)

Yep, systemd has DefaultTimeoutStartSec set to 90s (see man systemd-system.conf for more details), so if TimeoutStartSec isn't specified for a service unit, it defaults to 90 seconds. I believe the timeout period includes the ExecStartPre, ExecStart, and ExecStartPost actions, but I'd have to check the code to verify that.

> we need to add a dependency to the package, socat

Well, this is usually a problem for SRU releases. Adding new deps in an SRU causes 'sudo apt-get upgrade' to *not* upgrade any package that pulls in new (not currently installed) deps. While 'sudo apt upgrade' *does* pull in new deps, the ~ubuntu-sru team typically rejects adding new runtime deps to any SRU without a very strong reason.

Instead of pulling the entire service file back into Bionic, I think it might be enough to only add 'TimeoutStartSec=600', which should cover the timeout for the ExecStart= and ExecStartPost= actions. It may also be worth adding the Restart=on-failure and RestartSec=10 params. Could you test with the TimeoutStartSec param in bionic to see if that's enough to SRU?

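For reference, here is a minimal Python sketch that only renders the three [Service] settings mentioned above as unit-file text. The function name `render_dropin` is hypothetical, and whether this text ends up in the packaged service file or in a local drop-in is left open; nothing here has been tested against the Bionic package.

```python
import configparser
import io


def render_dropin(timeout_start_sec: int = 600,
                  restart: str = "on-failure",
                  restart_sec: int = 10) -> str:
    """Render a [Service] fragment with the settings suggested in this comment."""
    parser = configparser.ConfigParser()
    # Keep systemd's CamelCase keys instead of configparser's default lowercasing.
    parser.optionxform = str
    parser["Service"] = {
        "TimeoutStartSec": str(timeout_start_sec),
        "Restart": restart,
        "RestartSec": str(restart_sec),
    }
    buf = io.StringIO()
    # systemd unit files conventionally use key=value without surrounding spaces.
    parser.write(buf, space_around_delimiters=False)
    return buf.getvalue()


if __name__ == "__main__":
    print(render_dropin(), end="")
```

Dropping the `restart` and `restart_sec` entries would give the TimeoutStartSec-only variant that the Bionic test is meant to check first.
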
If pulling back only the TimeoutStartSec=600 param to Bionic works, that will hopefully be enough for Xenial, too.

Thanks!

--
You received this bug notification because you are a member of STS Sponsors, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1874075

Title:
  rabbitmq-server startup timeouts differ between SysV and systemd

Status in rabbitmq-server package in Ubuntu:
  Fix Released
Status in rabbitmq-server source package in Xenial:
  Fix Committed
Status in rabbitmq-server source package in Bionic:
  Fix Committed
Status in rabbitmq-server source package in Eoan:
  Won't Fix
Status in rabbitmq-server source package in Focal:
  Fix Committed
Status in rabbitmq-server source package in Groovy:
  Fix Released
Status in rabbitmq-server package in Debian:
  New

Bug description:
  The startup timeouts were recently adjusted and synchronized between the SysV and systemd startup files.

  https://github.com/rabbitmq/rabbitmq-server-release/pull/129

  The new startup files should be included in this package.

  [Impact]

  After starting the RabbitMQ server process, the startup script will wait for the server to start by calling `rabbitmqctl wait` and will time out after 10 s. The startup time of the server depends on how quickly the Mnesia database becomes available and the server will time out after `mnesia_table_loading_retry_timeout` ms times `mnesia_table_loading_retry_limit` retries. By default this wait is 30,000 ms times 10 retries, i.e. 300 s.

  The mismatch between these two timeout values might lead to the startup script failing prematurely while the server is still waiting for the Mnesia tables.

  This change introduces variable `RABBITMQ_STARTUP_TIMEOUT` and the `--timeout` option into the startup script. The default value for this timeout is set to 10 minutes (600 seconds).

  This change also updates the systemd service file to match the timeout values between the two service management methods.

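The timeout mismatch described under [Impact] can be checked with a little arithmetic. This is only an illustrative sketch: the function names are hypothetical, and the numbers are the defaults quoted in the description (30,000 ms times 10 retries, a 10 s wait in the old startup script, and the new 600 s default).

```python
def mnesia_wait_seconds(retry_timeout_ms: int = 30_000,
                        retry_limit: int = 10) -> float:
    """Longest time the server keeps retrying the Mnesia tables, in seconds."""
    return retry_timeout_ms * retry_limit / 1000


def gives_up_early(waiter_timeout_s: float, server_wait_s: float) -> bool:
    """True if the waiting script times out while the server may still be retrying."""
    return waiter_timeout_s < server_wait_s


if __name__ == "__main__":
    server_wait = mnesia_wait_seconds()
    for label, timeout_s in [("old startup script", 10), ("new default", 600)]:
        print(f"{label}: timeout={timeout_s}s, "
              f"gives up early={gives_up_early(timeout_s, server_wait)}")
```

With the default Mnesia settings the server may keep retrying for 300 s, so any waiter timeout below that can fail the service while the server is still starting.
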
  [Scope]

  Upstream patch: https://github.com/rabbitmq/rabbitmq-server-release/pull/129

  * Fix is not included in the Debian package
  * Fix is not included in any Ubuntu series
  * Groovy and Focal can apply the upstream patch as is
  * Bionic and Xenial need an additional fix in the systemd service file to set the `RABBITMQ_STARTUP_TIMEOUT` variable for the `rabbitmq-server-wait` helper script.

  [Test Case]

  In a clustered setup with two nodes, A and B:

  1. create queue on A
  2. shut down B
  3. shut down A
  4. boot B

  The broker on B will wait for A. The systemd service will wait for 10 seconds and then fail. Boot A and the rabbitmq-server process on B will complete startup.

  [Regression Potential]

  This change alters the behavior of the startup scripts when the Mnesia database takes long to become available. This might lead to failures further down the service dependency chain.

