Apologies for the delay, I've been busy with +1.

** Description changed:
+ [ Impact ]
+ 
  This bug tracks an update for the rabbitmq-server package in Ubuntu.
  
- This bug tracks an update to the following versions:
- * Focal (20.04): rabbitmq-server 3.8.3
  * Jammy (22.04): rabbitmq-server 3.9.27
  
- (NOTE) - Jammy is only updating to 3.9.27 because 3.9.28 requires Erlang 24.3. If Erlang updates in the future, then we can upgrade further.
  (NOTE) - Focal is only updating to 3.8.3 from 3.8.2 because 3.8.4 requires etcd v3.4.
  
+ This is the first MRE of rabbitmq-server.
+ 
+ Upstream has a very rapid release cadence with micro releases that contain many bug fixes that would be good to bring into our LTS releases.
+ 
+ One major hurdle with this is the lack of proper dep8 tests, so a limited suite of dep8 tests was created for this MRE, which is planned to be integrated into newer releases once approved.
+ 
+ rabbitmq-server is a complicated package and the new dep8 tests will not be able to cover everything, so our OpenStack charms CI/CD ran the new version to provide more confidence in the package and to at least verify that our workflow works. The results of these runs can be found at https://review.opendev.org/c/openstack/charm-rabbitmq-server/+/915836.
+ 
+ In addition to this, only Jammy has GitHub workflows to build and test the package; the results can be found at https://github.com/mitchdz/rabbitmq-server-3-9-27-tests/actions/runs/8955069098/job/24595393599.
- This is the first MRE of rabbitmq-server.
+ 
+ Changelogs can be found at https://github.com/rabbitmq/rabbitmq-server/tree/main/release-notes
- Upstream has a very rapid release cadence with micro releases that
- contain many bug fixes that would be good to bring into our LTS
- releases.
+ 
+ [ Test Plan ]
+ 
+ The test plan for rabbitmq-server involves 3 different types of tests.
- One major hurdle with this is the lack of proper dep8 tests, which a
- limited suite of dep8 tests were created for this MRE, which is planned
- to get integrated into newer releases once approved.
+ 
+ 1. OpenStack CI/CD
+ This is what we run for CI/CD. Testing the newer version in CI/CD exercises real-world use cases and is the minimum that should be done to ensure our own tooling works. The tester will need to request that the new version be run by the OpenStack team. An example of such a run, as mentioned before, is:
- rabbitmq-server is a complicated package that the new dep8 tests will
- not be able to cover everything, therefore our openstack charms CI/CD
- ran the new version to provide more confidence in the package, and to at
- least verify that our workflow works. The results of these runs can be
- found at https://review.opendev.org/c/openstack/charm-rabbitmq-
- server/+/915836.
+ 
+ https://review.opendev.org/c/openstack/charm-rabbitmq-server/+/915836
+ 
+ 2. dep8 tests
+ New dep8 tests were added to the package and must pass. These cover simple but real use cases.
- In addition to this, only Jammy has github workflows to build+test the
- package, where the results can be found at
- https://github.com/mitchdz/rabbitmq-
- server-3-9-27-tests/actions/runs/8955069098/job/24595393599.
- 
- Reviewing the changes, there is only one change that I want to bring to attention. That is, version 3.9.23 (https://github.com/rabbitmq/rabbitmq-server/releases/tag/v3.9.23) introduces the following change:
- Nodes now default to 65536 concurrent client connections instead of using the effective kernel open file handle limit
+ 
+ 3. Upgrade testing
+  1. lxc launch ubuntu:jammy j-vm --vm
+  2. lxc shell j-vm
+  3. sudo apt install -y rabbitmq-server
+  4. Enable proposed (see the sketch below)
+  5. sudo apt install -y rabbitmq-server
+     # ensure no errors or issues during upgrade
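For step 4, one way to enable -proposed is the following minimal sketch. It assumes an amd64 jammy VM and the default archive mirror (other architectures use ports.ubuntu.com); pulling rabbitmq-server explicitly from the -proposed suite keeps the rest of the system on the release pocket:

``` sh
# Add the jammy-proposed pocket (rabbitmq-server lives in main).
echo "deb http://archive.ubuntu.com/ubuntu jammy-proposed main universe" | \
  sudo tee /etc/apt/sources.list.d/ubuntu-jammy-proposed.list
sudo apt update

# Upgrade only rabbitmq-server from -proposed rather than everything.
sudo apt install -y rabbitmq-server/jammy-proposed
```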
- ------------------------------------------------------------------------------
- 
- Jammy Changes:
- 
- Notices:
- + Nodes now default to 65536 concurrent client connections instead of using the effective kernel open file handle limit. Users who want to override this default, that is, have nodes that should support more concurrent connections and open files, now have to perform an additional configuration step:
- 
-   1. Pick a new limit value they would like to use, for instance, 100K
-   2. Set the maximum open file handle limit (for example, via `systemd` or similar tooling) for the OS user used by RabbitMQ to 100K
-   3. Set the ERL_MAX_PORTS environment variable to 100K
- 
- This change was introduced because of a change in several Linux distributions: they now use a default open file handle limit so high that it causes a significant amount (say, 1.5 GiB) of memory to be preallocated by the Erlang runtime.
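To illustrate the three steps above, and the ERL_MAX_PORTS/LimitNOFILE check the test plan asks for, here is a minimal systemd drop-in sketch. The 100000 value and the drop-in file name are arbitrary examples, not package defaults:

``` sh
# Hypothetical drop-in; 100000 is an example value, not a package default.
sudo mkdir -p /etc/systemd/system/rabbitmq-server.service.d
cat <<'EOF' | sudo tee /etc/systemd/system/rabbitmq-server.service.d/limits.conf
[Service]
LimitNOFILE=100000
Environment=ERL_MAX_PORTS=100000
EOF

sudo systemctl daemon-reload
sudo systemctl restart rabbitmq-server

# Verify both values were honored by the running unit/node:
systemctl show rabbitmq-server -p LimitNOFILE
sudo rabbitmqctl eval 'erlang:system_info(port_limit).'
```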
- 
- Updates:
- + Free disk space monitor robustness improvements.
- + `raft.adaptive_failure_detector.poll_interval` exposes aten's poll_interval setting to RabbitMQ users. Increasing it can reduce the probability of false positives in clusters where inter-node communication links are used at close to maximum capacity. The default is `5000` (5 seconds).
- + When both `disk_free_limit.relative` and `disk_free_limit.absolute`, or both `vm_memory_high_watermark.relative` and `vm_memory_high_watermark.absolute` are set, the absolute settings will now take precedence.
- + New key supported by `rabbitmqctl list_queues`: `effective_policy_definition` that returns merged definitions of regular and operator policies effective for the queue.
- + New HTTP API endpoint, `GET /api/config/effective`, returns effective node configuration. This is an HTTP API counterpart of `rabbitmq-diagnostics environment` (see the sketch after this list).
- + Force GC after definition import to reduce peak memory load by mostly idle nodes that import a lot of definitions.
- + A way to configure an authentication timeout, much like in some other protocols RabbitMQ supports.
- + Windows installer: service startup is now optional. More environment variables are respected by the installer.
- + In environments where DNS resolution is not yet available at the time RabbitMQ nodes boot and try to perform peer discovery, such as CoreDNS with a default caching interval of 30s on Kubernetes, nodes will now retry hostname resolution (including of their own host) several times with a wait interval.
- + Prometheus plugin now exposes one more metric, `process_start_time_seconds`, the moment of node process startup in seconds.
- + Reduce log noise when `sysctl` cannot be accessed by the node memory monitor.
- + Shovels now handle consumer delivery timeouts gracefully and restart.
- + Optimization: internal message GUID is no longer generated for quorum queues and streams, as they are specific to classic queues.
- + Two more AMQP 1.0 connection lifecycle events are now logged.
- + TLS configuration for inter-node stream replication connections can now use function references and definitions.
- + Stream protocol connection logging is now less verbose.
- + Max stream segment size is now limited to 3 GiB to avoid a potential stream position overflow.
- + Logging messages that use microseconds now use "us" for the SI symbol to be compatible with more tools.
- + Consul peer discovery now supports client-side TLS options, much like its Kubernetes and etcd peers.
- + A minor quorum queue optimization.
- + 40 to 50% throughput improvement for some workloads where AMQP 0-9-1 clients consumed from a [stream](https://rabbitmq.com/stream.html).
- + Configuration of fallback secrets for Shovel and Federation credential obfuscation. This feature allows for secret rotation during rolling cluster node restarts.
- + Reduced memory footprint of individual consumer acknowledgements of quorum queue consumers.
- + `rabbitmq-diagnostics status` now reports the crypto library (OpenSSL, LibreSSL, etc.) used by the runtime, as well as its version details.
- + With a lot of busy quorum queues, nodes hosting a moderate number of leader replicas could experience a growing memory footprint of one of the Raft implementation processes.
- + Re-introduced key file log rotation settings. Some log rotation settings were left behind during the migration to the standard runtime logger starting with 3.9.0; now some key settings have been re-introduced.
- + Cleaned up some compiler options that are no longer relevant.
- + Quorum queues: better forward compatibility with RabbitMQ 3.10.
- + Significantly faster queue re-import from definitions on subsequent node restarts. Initial definition import still takes the same amount of time as before.
- + Significantly faster exchange re-import from definitions on subsequent node restarts. Initial definition import still takes the same amount of time as before.
- + RabbitMQ nodes will now filter out certain log messages related to connections, channels, and queue leader replicas receiving internal protocol messages sent to this node before a restart. These messages usually raise more questions and cause confusion than help.
- + More Erlang 24.3 `eldap` library compatibility improvements.
- + Restart of a node that hosted one or more stream leaders resulted in their consumers not "re-attaching" to the newly elected leader.
- + Large fanouts experienced a performance regression when streams were not enabled using a feature flag.
- + Stream management plugin did not support mixed-version clusters.
- + Stream deletion did not result in a `basic.cancel` being sent to AMQP 0-9-1 consumers.
- + Stream clients did not receive a correct stream unavailability error in some cases.
- + It is again possible to clear user tags and update the password in a single operation.
- + Forward compatibility with Erlang 25.
- + File handle cache efficiency improvements.
- + Unknown stream properties (e.g. those requested by a node that runs a newer version) are now handled gracefully.
- + Temporary hostname resolution issues (attempts that fail with `nxdomain`) are now handled more gracefully and with a delay of several seconds.
- + Build time compatibility with Elixir 1.13.
- + `auth_oauth2.additional_scopes_key` in `rabbitmq.conf` was not converted correctly during configuration translation and thus had no effect.
- + Adapt to a breaking Erlang 24.3 LDAP client change.
- + Shovels can now be declared with the `delete-after` parameter set to `0`. Such shovels will immediately stop instead of erroring and failing to start after a node restart.
- + Support for Consul 1.1 response code changes when an operation is attempted on a non-existent health check.
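A minimal sketch of the two inspection features mentioned in the list above. The localhost:15672 endpoint and guest/guest credentials are assumed management-plugin defaults (valid only on localhost), not something this update changes:

``` sh
# Fetch the effective node configuration over the new HTTP API endpoint.
curl -u guest:guest http://localhost:15672/api/config/effective

# The new list_queues key from the same release series:
sudo rabbitmqctl list_queues name effective_policy_definition
```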
- 
- Bug Fixes:
- + Classic queues with Single Active Consumer enabled could run into an exception.
- + When a global parameter was cleared, nodes emitted an internal event of the wrong type.
- + Fixed a type analyzer definition.
- + LDAP server password could end up in the logs in certain types of exceptions.
- + `rabbitmq-diagnostics status` now handles server responses where free disk space is not yet computed. This is the case with nodes early in the boot process.
- + Management UI links now include "noopener" and "noreferrer" attributes to protect them against reverse tabnabbing. Note that since the management UI only includes a small number of external links to trusted resources, reverse tabnabbing is unlikely to affect most users. However, it can show up in security scanner results and become an issue in environments where a modified version of RabbitMQ is offered as a service.
- + Plugin could stop in environments where no static Shovels were defined and a specific sequence of events happens at the same time.
- + When the installation directory was overridden, the plugins directory did not respect the updated base installation path.
- + Intra-cluster communication link metric collector could run into an exception when the peer connection had just been re-established, e.g. after a peer node restart.
- + When a node was put into maintenance mode, it closed all MQTT client connections cluster-wide instead of just local client connections.
- + Reduced log noise from exceptions connections could run into when a client was closing its connection end concurrently with other activity.
- + `rabbitmq-env-conf.bat` on Windows could fail to load when its path contained spaces.
- + Stream declaration could run into an exception when stream parameters failed validation.
- + Some counters on the Overview page have been moved to global counters introduced in RabbitMQ 3.9.
- + Avoid an exception when an MQTT client closes its TCP connection before the server could fully process a `CONNECT` frame sent earlier by the same client.
- + Channels on connections to mixed clusters that had 3.8 nodes in them could run into an exception.
- + Inter-node cluster link statistics did not have any data when TLS was enabled for them.
- + Quorum queues now correctly propagate errors when a `basic.get` (polling consumption) operation hits a timeout.
- + Stream consumer that used AMQP 0-9-1 instead of a stream protocol client, and disconnected, leaked a file handle.
- + Max frame size and client heartbeat parameters for RabbitMQ stream clients were not correctly set when taken from `rabbitmq.conf`.
- + Removed a duplicate exchange decorator set operation.
- + Node restarts could result in a hashing ring inconsistency.
- + Avoid seeding the default user in old clusters that still use the deprecated `management.load_definitions` option.
- + Streams could run into an exception or fetch stale stream position data in some scenarios.
- + `rabbitmqctl set_log_level` did not have any effect on logging via `amq.rabbitmq.log`.
- + `rabbitmq-diagnostics status` is now more resilient and won't fail if free disk space monitoring repeatedly fails (gets disabled) on the node.
- + CLI tools failed to run on Erlang 25 because an old version of Elixir (compiled on Erlang 21) was used in the release pipeline. Erlang 25 no longer loads modules compiled on Erlang 21 or older.
- + Default log level used a four-character severity abbreviation instead of the more common longer format, for example, `warn` instead of `warning`.
- + `rabbitmqctl set_log_level` documentation clarification.
- + Nodes now make sure that the maintenance mode status table exists after node boot as long as the feature flag is enabled.
- + "In flight" messages directed to an exchange that has just been deleted will be silently dropped or returned back to the publisher instead of causing an exception.
- + `rabbitmq-upgrade await_online_synchronized_mirror` is now a no-op in single node clusters.
- + One metric that was exposed via CLI tools and the management plugin's HTTP API was not exposed via the Prometheus scraping API.
- + Stream delivery rate could drop if concurrent stream consumers consumed in a way that made them reach the end of the stream often.
- + If a cluster that had streams enabled was upgraded with a jump of multiple patch releases, stream state could fail an upgrade.
- + Significantly faster queue re-import from definitions on subsequent node restarts. Initial definition import still takes the same amount of time as before.
- + When a policy contained keys unsupported by a particular queue type, and was later updated or superseded by a higher priority policy, the effective optional argument list could become inconsistent (the policy would not have the expected effect).
- + Priority queues could run into an exception in some cases.
- + Maintenance mode could run into a timeout during queue leadership transfer.
- + Prometheus collector could run into an exception early on a node's schema database sync.
- + Connection data transfer rate units were incorrectly displayed when the rate was less than 1 kiB per second.
- + `rabbitmqadmin` now correctly loads TLS-related keys from its configuration file.
- + Corrected a help message for the node memory usage tooltip.
- 
- * Added new dep8 tests:
-   - d/t/hello-world
-   - d/t/publish-subscribe
-   - d/t/rpc
-   - d/t/work-queue
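For reference, the d/t/* suites listed above can be exercised locally with autopkgtest. A minimal sketch, assuming the package source tree is the current directory and LXD is already initialised:

``` sh
sudo apt install -y autopkgtest
autopkgtest-build-lxd images:ubuntu/jammy/amd64

# Run the package's d/t/* tests against the image built above.
autopkgtest . -- lxd autopkgtest/ubuntu/jammy/amd64
```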
- * Remove patches fixed upstream:
-   - d/p/lp1999816-fix-rabbitmqctl-status-disk-free-timeout.patch
- 
- ------------------------------------------------------------------------------
- 
- Focal Changes:
- * New upstream version 3.8.3 (LP: #2060248).
- 
- Updates:
- + Some proxy protocol errors are now logged at debug level. This reduces log noise in environments where TCP load balancers and proxies perform health checks by opening a TCP connection but never sending any data.
- + Quorum queue deletion operation no longer supports the "if unused" and "if empty" options. They are typically used for transient queues and don't make much sense for quorum ones.
- + Do not treat applications that do not depend on rabbit as plugins. This is especially important for applications that should not be stopped before rabbit is stopped.
- + RabbitMQ nodes will now gracefully shut down when receiving a `SIGTERM` signal. Previously the runtime would invoke a default handler that terminates the VM, giving RabbitMQ no chance to execute its shutdown steps.
- + Every cluster now features a persistent internal cluster ID that can be used by core features or plugins. Unlike the human-readable cluster name, the value cannot be overridden by the user.
- + Speed up execution of boot steps by a factor of 2N, where N is the number of attributes per step.
- + New health checks that can be used to determine if it's a good moment to shut down a node for an upgrade:
- 
- ``` sh
- # Exits with a non-zero code if target node hosts leader replica of at
- # least one queue that has out-of-sync mirror.
- rabbitmq-diagnostics check_if_node_is_mirror_sync_critical
- 
- # Exits with a non-zero code if one or more quorum queues will lose
- # online quorum should target node be shut down
- rabbitmq-diagnostics check_if_node_is_quorum_critical
- ```
- 
- + Management and Management Agent Plugins:
-   * An undocumented "automagic login" feature on the login form was removed.
-   * A new `POST /login` endpoint can be used by custom management UI login forms to authenticate the user and set the cookie.
-   * A new `POST /rebalance/queues` endpoint that is the HTTP API equivalent of `rabbitmq-queues rebalance` (see the curl sketch after this list).
-   * Warning about a missing `handle.exe` in `PATH` on Windows is now only logged every 10 minutes.
-   * `rabbitmqadmin declare queue` now supports a new `queue_type` parameter to simplify declaration of quorum queues.
-   * HTTP API request log entries now include the acting user.
-   * Content Security Policy headers are now also set for static assets such as JavaScript files.
- + Prometheus Plugin:
-   * Add option to aggregate metrics for channels, queues & connections. Metrics are now aggregated by default (safe by default). This new behaviour can be disabled via the `prometheus.return_per_object_metrics = true` config.
- + Kubernetes Peer Discovery Plugin:
-   * The plugin will now notify the Kubernetes API of node startup and peer stop/unavailability events.
- + Federation Plugin:
-   * "Command" operations such as binding propagation now use a separate channel for all links, preventing latency spikes for asynchronous operations (such as message publishing) (a head-of-line blocking problem).
- + Auth Backend OAuth 2 Plugin:
-   * Additional scopes can be fetched from a predefined JWT token field. Those scopes will be combined with the standard scopes field.
- + Trust Store Plugin:
-   * HTTPS certificate provider will no longer terminate if the upstream service response contains invalid JSON.
- + MQTT Plugin:
-   * Avoid blocking when registering or unregistering a client ID.
- + AMQP 1.0 Client Plugin:
-   * Handle heartbeat in `close_sent/2`.
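A minimal sketch of the rebalance endpoint mentioned above and its CLI equivalent. The /api prefix, localhost:15672 endpoint, and guest/guest credentials are assumptions based on the management plugin's usual layout, not something stated in the changelog entry:

``` sh
# Trigger a queue rebalance over the management HTTP API.
curl -u guest:guest -X POST http://localhost:15672/api/rebalance/queues

# CLI equivalent named in the changelog entry:
sudo rabbitmq-queues rebalance all
```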
- 
- Bug Fixes:
- + Reduced scheduled GC activity in the connection socket writer to one run per 1 GiB of data transferred, with an option to change the value or disable scheduled runs entirely.
- + Eliminated an inefficiency in recovery of quorum queues with a backlog of messages.
- + In a case where a node hosting a quorum queue replica went offline and was removed from the cluster, and later came back, quorum queues could enter a loop of Raft leader elections.
- + Quorum queues with dead lettering could fail to recover.
- + The node can now recover even if the virtual host recovery terms file was corrupted.
- + Autoheal could fail to finish if one of its state transitions initiated by a remote node timed out.
- + Syslog client is now started even when Syslog logging is configured only for some log sinks.
- + Policies that quorum queues ignored were still listed as applied to them.
- + If a quorum queue leader rebalancing operation timed out, CLI tools failed with an exception instead of a sensible internal API response.
- + Handle timeout error on the rebalance function.
- + Handle and raise protocol error for absent queues assumed to be alive.
- + `rabbitmq-diagnostics status` failed to display the results when executed against a node that had a high VM watermark set as an absolute value (using `vm_memory_high_watermark.absolute`).
- + Management and Management Agent Plugins:
-   * Consumer section on individual page was unintentionally hidden.
-   * Fix queue-type select by adding unsafe-inline CSP policy.
- + Etcd Peer Discovery Plugin:
-   * Only run healthcheck when backend is configured.
- + Federation Plugin:
-   * Use vhost to delete federated exchange.
- 
- * Added new dep8 tests:
-   - d/t/smoke-test
-   - d/t/hello-world
-   - d/t/publish-subscribe
-   - d/t/rpc
-   - d/t/work-queue
+ 
+ For jammy, also ensure ERL_MAX_PORTS and LimitNOFILE are correctly honored on upgrade (the systemd sketch above shows one way to check).
+ 
+ [ Where problems could occur ]
+ 
+ * This is the first MRE of this package, so extra caution should be taken.
+ * Upgrading the server may cause downtime during the upgrade.
+ * Upgrade failures can occur if users have misconfigured rabbitmq-server and the maintainer scripts attempt to stop/start the server.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2060248

Title:
  MRE updates of rabbitmq-server for Jammy,Focal

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/rabbitmq-server/+bug/2060248/+subscriptions
