On Bionic, the rabbitmq-server process exits with code 0 regardless of whether it managed to start the server. Presumably this behavior is the reason for the ExecStartPost step. If I change the service to Type=notify, systemd notices when the ExecStart command fails to start the server, and the ExecStartPost step becomes unnecessary (it is never executed in that case). However, the end result seems to be the same, so let's leave the type as simple.
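Because the exit code cannot be trusted, readiness has to be checked separately after the process is launched, presumably by the ExecStartPost step. The Python sketch below illustrates that kind of bounded readiness wait under stated assumptions. It is not the actual `rabbitmq-server-wait` helper (the startup script described in the bug calls `rabbitmqctl wait`), and `startup_timeout_s`, `wait_until_ready` and the caller-supplied `is_ready` probe are hypothetical names. The timeout is read from `RABBITMQ_STARTUP_TIMEOUT`, assumed to be in seconds, falling back to the 600 s default that the upstream change introduces.

```python
import time
from typing import Callable, Mapping


def startup_timeout_s(env: Mapping[str, str], default: int = 600) -> int:
    """Hypothetical helper: read RABBITMQ_STARTUP_TIMEOUT (assumed seconds).

    Falls back to `default` when the variable is unset or blank.
    """
    value = env.get("RABBITMQ_STARTUP_TIMEOUT", "")
    return int(value) if value.strip() else default


def wait_until_ready(is_ready: Callable[[], bool], timeout_s: float,
                     poll_interval_s: float = 1.0,
                     clock: Callable[[], float] = time.monotonic,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Hypothetical sketch: poll is_ready() until it succeeds or time runs out.

    Returns True as soon as is_ready() reports success, or False once
    timeout_s has elapsed without success.
    """
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)
```

In practice `is_ready` would wrap whatever readiness probe is available, e.g. `wait_until_ready(probe, startup_timeout_s(os.environ))`. Injecting `clock` and `sleep` keeps the loop testable without real waiting.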
When I add Restart=on-failure, the call to `systemctl start rabbitmq-server` returns with an error message, but systemd still restarts the rabbitmq-server service every 10 seconds. I don't know whether `systemctl start` returning before the server is actually up will cause issues, but the cluster now recovers as soon as the other broker is started. The relevant changes to the service file are:

    [Service]
    TimeoutStartSec=600
    Restart=on-failure
    RestartSec=10

--
https://bugs.launchpad.net/bugs/1874075

Title:
  rabbitmq-server startup timeouts differ between SysV and systemd

Status in rabbitmq-server package in Ubuntu:
  Fix Released
Status in rabbitmq-server source package in Xenial:
  Fix Committed
Status in rabbitmq-server source package in Bionic:
  Fix Committed
Status in rabbitmq-server source package in Eoan:
  Won't Fix
Status in rabbitmq-server source package in Focal:
  Fix Committed
Status in rabbitmq-server source package in Groovy:
  Fix Released
Status in rabbitmq-server package in Debian:
  New

Bug description:

The startup timeouts were recently adjusted and synchronized between the SysV and systemd startup files.

https://github.com/rabbitmq/rabbitmq-server-release/pull/129

The new startup files should be included in this package.

[Impact]
After starting the RabbitMQ server process, the startup script waits for the server to start by calling `rabbitmqctl wait` and times out after 10 s. The server's startup time depends on how quickly the Mnesia database becomes available; the server itself gives up after `mnesia_table_loading_retry_limit` retries of `mnesia_table_loading_retry_timeout` ms each. By default this is 30,000 ms times 10 retries, i.e. 300 s. Because of this mismatch between the two timeouts, the startup script can fail prematurely while the server is still waiting for the Mnesia tables.
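To make the mismatch concrete, the following Python sketch computes the worst-case Mnesia table wait from the two settings named above and checks whether a given startup-script timeout covers it. The 30,000 ms and 10-retry values are the defaults quoted above; the function names are hypothetical illustrations, not part of RabbitMQ.

```python
def mnesia_wait_budget_s(retry_timeout_ms: int, retry_limit: int) -> float:
    """Worst-case time (s) the server may spend waiting for Mnesia tables:
    mnesia_table_loading_retry_timeout (ms) times mnesia_table_loading_retry_limit."""
    return retry_timeout_ms * retry_limit / 1000


def script_timeout_covers_wait(script_timeout_s: float,
                               retry_timeout_ms: int = 30_000,
                               retry_limit: int = 10) -> bool:
    """True if the startup script waits at least as long as the server might."""
    return script_timeout_s >= mnesia_wait_budget_s(retry_timeout_ms, retry_limit)


if __name__ == "__main__":
    budget = mnesia_wait_budget_s(30_000, 10)
    print(f"default Mnesia wait budget: {budget:.0f} s")
    # Old 10 s script timeout versus the new 600 s default.
    for timeout in (10, 600):
        print(timeout, script_timeout_covers_wait(timeout))
```

Running it prints a 300 s budget, `False` for the old 10 s timeout and `True` for the new 600 s default.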

This change introduces the `RABBITMQ_STARTUP_TIMEOUT` variable and the `--timeout` option into the startup script, with a default timeout of 10 minutes (600 seconds). It also updates the systemd service file so that the SysV and systemd service management methods use matching timeout values.

[Scope]
Upstream patch: https://github.com/rabbitmq/rabbitmq-server-release/pull/129

* Fix is not included in the Debian package
* Fix is not included in any Ubuntu series
* Groovy and Focal can apply the upstream patch as is
* Bionic and Xenial need an additional fix in the systemd service file to set the `RABBITMQ_STARTUP_TIMEOUT` variable for the `rabbitmq-server-wait` helper script

[Test Case]
In a clustered setup with two nodes, A and B:

1. Create a queue on A.
2. Shut down B.
3. Shut down A.
4. Boot B.

The broker on B will wait for A. The systemd service waits for 10 seconds and then fails. Boot A, and the rabbitmq-server process on B will complete startup.

[Regression Potential]
This change alters the behavior of the startup scripts when the Mnesia database takes a long time to become available. This might lead to failures further down the service dependency chain.
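To see why the [Test Case] fails with a 10 s timeout, and how either the longer timeout or automatic restarts change the outcome, here is a deliberately simplified timing model in Python. It is a hypothetical sketch, not systemd's or RabbitMQ's actual logic: it ignores the server's own Mnesia retry limit and any process start-up overhead, and the function name and parameters are made up for illustration. `restart=True` stands in for `Restart=on-failure` with `RestartSec=restart_sec`.

```python
from typing import Optional


def time_until_active(peer_up_s: float, startup_timeout_s: float,
                      restart: bool = False, restart_sec: float = 10.0,
                      max_attempts: int = 100) -> Optional[float]:
    """Simplified, hypothetical model of node B's unit in the test case.

    Assumptions:
    * B's first start attempt begins at t=0; node A comes up at t=peer_up_s.
    * An attempt succeeds as soon as A is up, provided that happens within
      startup_timeout_s of the attempt starting; otherwise it fails.
    * With restart=True, a failed attempt is retried restart_sec later.
    Returns the time at which B becomes active, or None if it never does.
    """
    start = 0.0
    for _ in range(max_attempts):
        if peer_up_s - start <= startup_timeout_s:
            # A is already up, or comes up before this attempt times out.
            return max(start, peer_up_s)
        if not restart:
            return None
        # Attempt timed out; the next one starts after the restart delay.
        start += startup_timeout_s + restart_sec
    return None
```

For example, if A boots 120 s after B, `time_until_active(120, 10)` returns `None` (the unit fails, as in the test case), while `time_until_active(120, 600)` and `time_until_active(120, 10, restart=True)` both report 120 s, consistent with the observation that the cluster recovers as soon as the other broker is started.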

