Since c761053 ("Check packets come from the correct interface
https://github.com/corosync/corosync/issues/750") in kronosnet,
corosync will produce log messages in certain broken network setups.
See inner patch for details. Drawing attention to such setups is
desirable because they may experience whole-cluster fences if the
watchdog is active, see [1].

However, the log volume in such broken setups can be inconveniently
high. In such a setup, when running the following on node 1:

    # for i in $(seq 100); do dd if=/dev/urandom bs=1M of=/etc/pve/test.bin count=1; done

On node 2, bursts of ~1300 messages per second are observed:

    # journalctl --since="1min ago" -u corosync.service \
        | cut -d' ' -f 1-3 | uniq -c | sort -n | tail -n 10
          8 Apr 04 09:51:20
          8 Apr 04 09:51:24
          8 Apr 04 09:51:30
          8 Apr 04 09:51:34
          8 Apr 04 09:51:40
         12 Apr 04 09:51:00
        196 Apr 04 09:51:46
       1283 Apr 04 09:51:44
       1329 Apr 04 09:51:43
       1370 Apr 04 09:51:45

To avoid cluttering the journal, rate-limit log messages to 200 per
second. See inner patch for details.

[1] https://github.com/corosync/corosync/issues/750

Signed-off-by: Friedrich Weber <f.we...@proxmox.com>
---

Notes:
    I'm a little confused about the rate limit, as with this patch I do
    see that systemd suppresses messages:

    Apr 04 09:52:54 coro3 systemd-journald[303]: Suppressed 196 messages from corosync.service

    but I still see way more than 200 messages per second:

         11 Apr 04 09:52:00
         11 Apr 04 09:52:45
         13 Apr 04 09:52:13
         14 Apr 04 09:52:12
         19 Apr 04 09:52:07
         67 Apr 04 09:52:08
        400 Apr 04 09:52:54
        695 Apr 04 09:52:52
        715 Apr 04 09:52:53
        835 Apr 04 09:52:51

    Any idea why?
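
The per-second counts above come from a cut/uniq/sort pipeline over the
journal output. A minimal sketch of the same counting, wrapped in a
helper so it can be tried on any input: the function name
count_per_second and the sample lines are made up for illustration (the
sample is synthetic, not real corosync output), and the sketch assumes
journalctl's default short output format, where the first three
space-separated fields are month, day and HH:MM:SS.

```shell
# Count log lines per second and print the N busiest seconds (default 10).
# Assumes chronologically ordered input in journalctl's short format.
# count_per_second is a hypothetical helper name, not part of any tool.
count_per_second() {
    cut -d' ' -f 1-3 | uniq -c | sort -n | tail -n "${1:-10}"
}

# Synthetic sample input, only to show the shape of the result.
sample='Apr 04 09:51:43 node2 corosync[1]: message a
Apr 04 09:51:43 node2 corosync[1]: message b
Apr 04 09:51:44 node2 corosync[1]: message c'

printf '%s\n' "$sample" | count_per_second
```

On a real node the helper would be fed from the journal instead, e.g.
journalctl --since="1min ago" -u corosync.service | count_per_second.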

 ...-rate-limit-log-messages-to-200-per-.patch | 54 +++++++++++++++++++
 debian/patches/series                         |  1 +
 2 files changed, 55 insertions(+)
 create mode 100644 debian/patches/0004-corosync.service-rate-limit-log-messages-to-200-per-.patch

diff --git a/debian/patches/0004-corosync.service-rate-limit-log-messages-to-200-per-.patch b/debian/patches/0004-corosync.service-rate-limit-log-messages-to-200-per-.patch
new file mode 100644
index 0000000..0f91b42
--- /dev/null
+++ b/debian/patches/0004-corosync.service-rate-limit-log-messages-to-200-per-.patch
@@ -0,0 +1,54 @@
+From 5470f01296a3bd8f47fd4bd97939b3a68f00d309 Mon Sep 17 00:00:00 2001
+From: Friedrich Weber <f.we...@proxmox.com>
+Date: Fri, 4 Apr 2025 09:14:21 +0200
+Subject: [PATCH] corosync.service: rate limit log messages to 200 per second
+
+Since c761053 ("Check packets come from the correct interface
+https://github.com/corosync/corosync/issues/750") in kronosnet,
+corosync will log a message like the following every time a packet is
+received at the wrong interface, i.e., not the interface on which the
+corresponding IP is configured:
+
+> [KNET ] udp: Received packet from 10.8.1.1 to 10.8.1.3 on i/f ens20 when expected ens19
+
+This is to draw attention to broken network setups that
+appear to work fine as long as all corosync links are online, but once
+a link goes down, may go into a state of "asymmetric connectivity"
+which is problematic for corosync. See [1] for more details.
+
+While it is desirable to draw attention to broken setups, the volume
+of log messages in such clusters can get very high and clutter the
+journal. In extreme scenarios, occasional bursts of more than 1000
+messages per second were observed. If we approximate each message with
+100 bytes, logging 1000 messages per second will produce ~8 GiB of raw
+logs per day. While this should be a worst case scenario and the
+logs probably compress well, the volume is still inconveniently high.
+
+Hence, use systemd log rate limiting to limit corosync log messages to
+200 per second, which brings the logs in above scenario down to 1.6
+GiB/day and should still provide enough headroom to avoid suppressing
+benign log messages in non-broken setups.
+
+[1] https://github.com/corosync/corosync/issues/750
+
+Signed-off-by: Friedrich Weber <f.we...@proxmox.com>
+---
+ init/corosync.service.in | 2 ++
+ 1 file changed, 2 insertions(+)
+
+diff --git a/init/corosync.service.in b/init/corosync.service.in
+index bd2a48a9..3d7ea2db 100644
+--- a/init/corosync.service.in
++++ b/init/corosync.service.in
+@@ -10,6 +10,8 @@ EnvironmentFile=-@INITCONFIGDIR@/corosync
+ ExecStart=@SBINDIR@/corosync -f $COROSYNC_OPTIONS
+ ExecStop=@SBINDIR@/corosync-cfgtool -H --force
+ Type=notify
++LogRateLimitIntervalSec=1s
++LogRateLimitBurst=200
+ 
+ # In typical systemd deployments, both standard outputs are forwarded to
+ # journal (stderr is what's relevant in the pristine corosync configuration),
+-- 
+2.39.5
+
diff --git a/debian/patches/series b/debian/patches/series
index 147e793..7a796c4 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -1,3 +1,4 @@
 0001-Enable-PrivateTmp-in-the-systemd-service-files.patch
 0002-only-start-corosync.service-if-conf-exists.patch
 0003-totemsrp-Check-size-of-orf_token-msg.patch
+0004-corosync.service-rate-limit-log-messages-to-200-per-.patch
-- 
2.39.5
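
The volume estimate in the inner patch description (about 8 GiB/day at
1000 messages per second, 1.6 GiB/day at 200) can be reproduced with a
short calculation. This is only a sketch of that arithmetic: the helper
name log_gib_per_day is made up for illustration, and the 100-byte
average message size is the approximation used in the patch
description, not a measured value.

```shell
# Estimate raw log volume per day in GiB.
#   $1: messages per second
#   $2: assumed average message size in bytes
# log_gib_per_day is a hypothetical helper name.
log_gib_per_day() {
    awk -v rate="$1" -v size="$2" \
        'BEGIN { printf "%.1f\n", rate * size * 86400 / 1073741824 }'
}

log_gib_per_day 1000 100   # observed burst rate, sustained for a full day
log_gib_per_day 200 100    # ceiling with LogRateLimitBurst=200 per 1s interval
```

Both figures are upper bounds for a rate sustained over 24 hours; the
bursts shown in the commit message last only a few seconds at a time.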