[Sts-sponsors] [Bug 1874075] Re: rabbitmq-server startup timeouts differ between SysV and systemd

2020-06-26 Thread Nicolas Bock
> I guess this all means that 1) the version in focal-proposed is
> correct, and 2) the xenial and bionic versions need the addition of
> TimeoutStartSec=600 and Restart=on-failure to their service file,
> right? Is that all that's needed?

I agree with your assessment regarding Focal and Groovy. However, I
would like to verify this thesis for Bionic and Xenial to make sure
that rabbitmq-server and systemd behave that way there as well. Note
that the systemd service file on Bionic has an ExecStartPost calling a
wrapper script in addtion to the ExecStart which wouldn't be necessary
if the rabbitmq-server behaved the same way it does in Focal and
Groovy. In addition I am uncertain why the service doesn't use
`Type=notify` as in the later versions. Rabbitmq-server-3.6.10
understands what `sd_notify` is and I would have thought that this
implies that we should be able to use `Type=notify` on Bionic.

-- 
You received this bug notification because you are a member of STS
Sponsors, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1874075

Title:
  rabbitmq-server startup timeouts differ between SysV and systemd

Status in rabbitmq-server package in Ubuntu:
  Fix Released
Status in rabbitmq-server source package in Xenial:
  Fix Committed
Status in rabbitmq-server source package in Bionic:
  Fix Committed
Status in rabbitmq-server source package in Eoan:
  Won't Fix
Status in rabbitmq-server source package in Focal:
  Fix Committed
Status in rabbitmq-server source package in Groovy:
  Fix Released
Status in rabbitmq-server package in Debian:
  New

Bug description:
  The startup timeouts were recently adjusted and synchronized between
  the SysV and systemd startup files.

  https://github.com/rabbitmq/rabbitmq-server-release/pull/129

  The new startup files should be included in this package.

  [Impact]

  After starting the RabbitMQ server process, the startup script will
  wait for the server to start by calling `rabbitmqctl wait` and will
  time out after 10 s.

  The startup time of the server depends on how quickly the Mnesia
  database becomes available and the server will time out after
  `mnesia_table_loading_retry_timeout` ms times
  `mnesia_table_loading_retry_limit` retries. By default this wait is
  30,000 ms times 10 retries, i.e. 300 s.

  The mismatch between these two timeout values might lead to the
  startup script failing prematurely while the server is still waiting
  for the Mnesia tables.

  This change introduces variable `RABBITMQ_STARTUP_TIMEOUT` and the
  `--timeout` option into the startup script. The default value for this
  timeout is set to 10 minutes (600 seconds).

  This change also updates the systemd service file to match the timeout
  values between the two service management methods.

  [Scope]

  Upstream patch: https://github.com/rabbitmq/rabbitmq-server-
  release/pull/129

  * Fix is not included in the Debian package
  * Fix is not included in any Ubuntu series

  * Groovy and Focal can apply the upstream patch as is
  * Bionic and Xenial need an additional fix in the systemd service file
to set the `RABBITMQ_STARTUP_TIMEOUT` variable for the
`rabbitmq-server-wait` helper script.

  [Test Case]

  In a clustered setup with two nodes, A and B.

  1. create queue on A
  2. shut down B
  3. shut down A
  4. boot B

  The broker on B will wait for A. The systemd service will wait for 10
  seconds and then fail. Boot A and the rabbitmq-server process on B
  will complete startup.

  [Regression Potential]

  This change alters the behavior of the startup scripts when the Mnesia
  database takes long to become available. This might lead to failures
  further down the service dependency chain.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/rabbitmq-server/+bug/1874075/+subscriptions

-- 
Mailing list: https://launchpad.net/~sts-sponsors
Post to : sts-sponsors@lists.launchpad.net
Unsubscribe : https://launchpad.net/~sts-sponsors
More help   : https://help.launchpad.net/ListHelp


[Sts-sponsors] [Bug 1874075] Re: rabbitmq-server startup timeouts differ between SysV and systemd

2020-06-26 Thread Dan Streetman
> In the context of the issues we are seeing with charms the 10 minute
> timeout should be sufficient.

right, but this isn't just changing rabbitmq-server used by charms, this
is changing the behavior for *all* Ubuntu users of rabbitmq-server, as
well as upstream. Since upstream did accept it, my *assumption* is yes,
10 minutes is a good default, but since mismatched timeouts is
essentially the cause of this entire problem, I thought it was worth
just re-checking again, to make sure we all thought about it carefully
with *all users* in mind, before leaving it at that.

To poke the thought button further, note that since the upstream (and
f/g) service files also have 'Restart=on-failure' set, and will go 10
minutes (as configured with the TimeoutStartSec=600 param), the service
is *effectively* set to never, ever timeout, since it will just restart
itself each time it times out; as the StartLimitIntervalSec= and
StartLimitBurst= will never be exceeded (since they default to 10s and
5, respectively).

So, I suppose since the effective result is that in F and later
(including upstream), the service will wait forever, with restart-on-
failure happening every 10 minutes, until it successfully is able to
start.  With that in mind, I don't think the actual TimeoutStartSec=
setting makes any difference at all (as long as it's long enough to
avoid reaching the restart StartLimit settings), besides controlling how
often the service logs a failure and then restart.

I guess this all means that 1) the version in focal-proposed is correct,
and 2) the xenial and bionic versions need the addition of
TimeoutStartSec=600 and Restart=on-failure to their service file, right?
Is that all that's needed?

-- 
You received this bug notification because you are a member of STS
Sponsors, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1874075

Title:
  rabbitmq-server startup timeouts differ between SysV and systemd

Status in rabbitmq-server package in Ubuntu:
  Fix Released
Status in rabbitmq-server source package in Xenial:
  Fix Committed
Status in rabbitmq-server source package in Bionic:
  Fix Committed
Status in rabbitmq-server source package in Eoan:
  Won't Fix
Status in rabbitmq-server source package in Focal:
  Fix Committed
Status in rabbitmq-server source package in Groovy:
  Fix Released
Status in rabbitmq-server package in Debian:
  New

Bug description:
  The startup timeouts were recently adjusted and synchronized between
  the SysV and systemd startup files.

  https://github.com/rabbitmq/rabbitmq-server-release/pull/129

  The new startup files should be included in this package.

  [Impact]

  After starting the RabbitMQ server process, the startup script will
  wait for the server to start by calling `rabbitmqctl wait` and will
  time out after 10 s.

  The startup time of the server depends on how quickly the Mnesia
  database becomes available and the server will time out after
  `mnesia_table_loading_retry_timeout` ms times
  `mnesia_table_loading_retry_limit` retries. By default this wait is
  30,000 ms times 10 retries, i.e. 300 s.

  The mismatch between these two timeout values might lead to the
  startup script failing prematurely while the server is still waiting
  for the Mnesia tables.

  This change introduces variable `RABBITMQ_STARTUP_TIMEOUT` and the
  `--timeout` option into the startup script. The default value for this
  timeout is set to 10 minutes (600 seconds).

  This change also updates the systemd service file to match the timeout
  values between the two service management methods.

  [Scope]

  Upstream patch: https://github.com/rabbitmq/rabbitmq-server-
  release/pull/129

  * Fix is not included in the Debian package
  * Fix is not included in any Ubuntu series

  * Groovy and Focal can apply the upstream patch as is
  * Bionic and Xenial need an additional fix in the systemd service file
to set the `RABBITMQ_STARTUP_TIMEOUT` variable for the
`rabbitmq-server-wait` helper script.

  [Test Case]

  In a clustered setup with two nodes, A and B.

  1. create queue on A
  2. shut down B
  3. shut down A
  4. boot B

  The broker on B will wait for A. The systemd service will wait for 10
  seconds and then fail. Boot A and the rabbitmq-server process on B
  will complete startup.

  [Regression Potential]

  This change alters the behavior of the startup scripts when the Mnesia
  database takes long to become available. This might lead to failures
  further down the service dependency chain.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/rabbitmq-server/+bug/1874075/+subscriptions

-- 
Mailing list: https://launchpad.net/~sts-sponsors
Post to : sts-sponsors@lists.launchpad.net
Unsubscribe : https://launchpad.net/~sts-sponsors
More help   : https://help.launchpad.net/ListHelp