Hi,

I'm newly subscribed to the list, hoping to find some pointers. I can't seem to find much about rabbitmq and logind, so I wanted to ask the list if anyone has encountered the same and if so, how they dealt with it.

We're supporting a Victoria cluster (installed with our own deployment method) that is mostly controlled by pacemaker. On two of the three control nodes I constantly see this warning:

---snip---
2024-07-29T14:09:23.552576+02:00 control01 su: pam_unix(su:session): session opened for user rabbitmq by (uid=0)
2024-07-29T14:09:24.450657+02:00 control01 su: pam_unix(su:session): session closed for user rabbitmq
2024-07-29T14:09:24.500356+02:00 control01 su: (to rabbitmq) root on none
2024-07-29T14:09:24.502370+02:00 control01 su: pam_systemd(su:session): Failed to create session: Maximum number of sessions (8192) reached, refusing further sessions.
2024-07-29T14:09:24.502681+02:00 control01 su: pam_unix(su:session): session opened for user rabbitmq by (uid=0)
2024-07-29T14:09:25.565203+02:00 control01 su: pam_unix(su:session): session closed for user rabbitmq
2024-07-29T14:09:25.609613+02:00 control01 su: (to rabbitmq) root on none
---snip---

This is obviously initiated by pacemaker (just grabbed newer logs):

Aug 27 13:16:06 control03 lrmd[297534]: INFO: rabbitmq[296363]: su_rabbit_cmd(): the invoked command exited 0: /usr/sbin/rabbitmqctl node_health_check -t 128
Aug 27 13:16:06 control03 lrmd[297542]: INFO: rabbitmq[296363]: get_monitor(): get_monitor function ready to return 0

In the output of loginctl list-sessions, almost all sessions belong to rabbitmq, and they have very old timestamps (2023). I'm aware that older systemd versions can't close sessions correctly [0], but we already run a version newer than the one required according to [0]. I increased SessionsMax to 16384 on one of the nodes, and again rabbitmq uses almost all available sessions:

control03:~ # loginctl list-sessions | grep -c rabbit
16325
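
To see how the sessions are spread across users rather than counting only rabbitmq, something like the following could help. It is a minimal sketch: the function name is made up for illustration, and it assumes the user name is the third column of the `loginctl list-sessions --no-legend` output.

```shell
#!/bin/sh
# Hypothetical helper: count logind sessions per user.
# Assumes input in the format of `loginctl list-sessions --no-legend`,
# where the third whitespace-separated column is the user name.
count_sessions_by_user() {
    awk 'NF >= 3 { count[$3]++ }
         END { for (user in count) print count[user], user }' | sort -rn
}

# Usage on a node (not executed here):
#   loginctl list-sessions --no-legend | count_sessions_by_user
```

The output is one line per user, highest count first, which makes it easy to compare the three control nodes side by side.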

But everything seems to be working okay; apparently it's just filling up the logs. And it seems that all new sessions are closed properly:

control03:~ # journalctl --since 2024-08-14 | grep -c "session opened for user rabbitmq"
7679
control03:~ # journalctl --since 2024-08-14 | grep -c "session closed for user rabbitmq"
7679
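
The two counts above can also be taken in a single pass over the journal. This is only a sketch under the assumption that the pam_unix message wording matches the log lines quoted earlier; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: tally pam_unix "session opened"/"session closed"
# messages for one user in a single pass over log text read from stdin.
# $1 is the user name to look for.
session_balance() {
    awk -v user="$1" '
        index($0, "session opened for user " user) { opened++ }
        index($0, "session closed for user " user) { closed++ }
        END { printf "opened=%d closed=%d diff=%d\n", opened, closed, opened - closed }'
}

# Usage on a node (not executed here):
#   journalctl --since 2024-08-14 | session_balance rabbitmq
```

Note that this only tallies the pam_unix messages; by itself it says nothing about the sessions logind is still tracking.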

What I'm wondering is why only two of the three control nodes reach the SessionsMax limit, while the third (which joined the cluster later) only has 2 rabbitmq sessions. I seem to be overlooking something, but I don't know what it is yet. I'm also curious whether this is working "as designed". This is a cluster with 3 control nodes and 36 compute nodes. What do other operators see in their HA clouds regarding rabbitmq?

Or could this be a rabbitmq issue since the ocf ha resource is from the rabbitmq-server package?

rpm -qf /usr/lib/ocf/resource.d/rabbitmq/rabbitmq-server-ha
rabbitmq-server-3.8.3-lp152.2.3.1.x86_64

Thanks for any pointers!
Eugen

[0] https://www.suse.com/support/kb/doc/?id=000020549


_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/