Bug#974563: corosync unable to communicate with pacemaker 1.1.16-1+deb9u1 which contains the fix for CVE-2020-25654
On 17/11/2020 18:41, Alejandro Taboada wrote:
> Thank you Markus,
>
> I just updated to deb9u2 and it works fine. Let me know when you have new
> updates and I can test them.
>
> Regards,
> Alejandro
>
>> On 17 Nov 2020, at 05:16, Markus Koschany wrote:
>>
>> Control: severity -1 normal
>>
>> On Monday, 16.11.2020 at 09:22 -0300, Alejandro Taboada wrote:
>>> Hi Markus,
>>>
>>> Sorry for the delay. With this patch it works when applied to only one
>>> node. The services restart and the arm resources are up.
>>> The problem appears again when I install the patch on a second node.
>>> Then the resources stopped again.
>>
>> Hello Alejandro,
>>
>> thanks for your feedback. At the moment I cannot reproduce the problem,
>> hence I have reverted the patch and uploaded a new revision of pacemaker,
>> 1.1.16-1+deb9u2, to stretch-security which restores the old behavior. The
>> regression tests shipped with pacemaker also don't report anything
>> unusual. I will keep this bug report open for discussion and work on
>> another update. This time I intend to upgrade pacemaker to the latest
>> upstream release in the 1.1.x branch, which is currently 1.1.24~rc1. This
>> one also includes fixes for CVE-2018-16878 and CVE-2018-16877. I expect no
>> big changes in terms of existing features, but I will send new packages
>> for testing before I upload a new upstream release.
>>
>> Regards,
>>
>> Markus

I can confirm that 1.1.16-1+deb9u2 works as expected, thanks for the fix.

Kind regards,
Louis
Bug#974563: corosync unable to communicate with pacemaker 1.1.16-1+deb9u1 which contains the fix for CVE-2020-25654
Thank you Markus,

I just updated to deb9u2 and it works fine. Let me know when you have new updates and I can test them.

Regards,
Alejandro

> On 17 Nov 2020, at 05:16, Markus Koschany wrote:
>
> Control: severity -1 normal
>
> On Monday, 16.11.2020 at 09:22 -0300, Alejandro Taboada wrote:
>> Hi Markus,
>>
>> Sorry for the delay. With this patch it works when applied to only one
>> node. The services restart and the arm resources are up.
>> The problem appears again when I install the patch on a second node.
>> Then the resources stopped again.
>
> Hello Alejandro,
>
> thanks for your feedback. At the moment I cannot reproduce the problem,
> hence I have reverted the patch and uploaded a new revision of pacemaker,
> 1.1.16-1+deb9u2, to stretch-security which restores the old behavior. The
> regression tests shipped with pacemaker also don't report anything unusual.
> I will keep this bug report open for discussion and work on another update.
> This time I intend to upgrade pacemaker to the latest upstream release in
> the 1.1.x branch, which is currently 1.1.24~rc1. This one also includes
> fixes for CVE-2018-16878 and CVE-2018-16877. I expect no big changes in
> terms of existing features, but I will send new packages for testing before
> I upload a new upstream release.
>
> Regards,
>
> Markus
Bug#974563: corosync unable to communicate with pacemaker 1.1.16-1+deb9u1 which contains the fix for CVE-2020-25654
On Tue, 17 Nov 2020 09:16:48 +0100 Markus Koschany wrote:
> This time I intend to upgrade pacemaker to the latest upstream release
> in the 1.1.x branch which is currently 1.1.24~rc1. This one also
> includes fixes for CVE-2018-16878 and CVE-2018-16877.

Hi Markus,

Please close #927714 if you fix those CVEs. Unfortunately I forgot to upload the prepared package after getting the blessing of the Security Team, so it slept in my local packaging repo until I noticed it again while importing your 1.1.16-1+deb9u1 upload. It is tagged as wferi/1.1.16-1+deb9u1 and pushed to Salsa in case you want to have a look; its absence might even be the reason behind the current IPC problems, I don't know.

--
Cheers,
Feri
Bug#974563: corosync unable to communicate with pacemaker 1.1.16-1+deb9u1 which contains the fix for CVE-2020-25654
Control: severity -1 normal

On Monday, 16.11.2020 at 09:22 -0300, Alejandro Taboada wrote:
> Hi Markus,
>
> Sorry for the delay. With this patch it works when applied to only one
> node. The services restart and the arm resources are up.
> The problem appears again when I install the patch on a second node.
> Then the resources stopped again.

Hello Alejandro,

thanks for your feedback. At the moment I cannot reproduce the problem, hence I have reverted the patch and uploaded a new revision of pacemaker, 1.1.16-1+deb9u2, to stretch-security which restores the old behavior. The regression tests shipped with pacemaker also don't report anything unusual. I will keep this bug report open for discussion and work on another update. This time I intend to upgrade pacemaker to the latest upstream release in the 1.1.x branch, which is currently 1.1.24~rc1. This one also includes fixes for CVE-2018-16878 and CVE-2018-16877. I expect no big changes in terms of existing features, but I will send new packages for testing before I upload a new upstream release.

Regards,

Markus
Bug#974563: corosync unable to communicate with pacemaker 1.1.16-1+deb9u1 which contains the fix for CVE-2020-25654
On Sat, 14 Nov 2020 04:02:40 +0100 Markus Koschany wrote:
> On Friday, 13.11.2020 at 23:13 -0300, Alejandro Taboada wrote:
>> Hello Markus,
>>
>> It doesn't work. The output log is quite different. It throws a timeout
>> and only at the end the "unprivileged client crmd" message.
>> See attached log.
>
> I'm sorry, but I uploaded an older version that missed a do_reply line.
> That's why you are seeing timeouts now. I have now uploaded the correct
> version from my test server to
>
> https://people.debian.org/~apo/lts/pacemaker/

This update to buster went out overnight and didn't cause the same issues:

Start-Date: 2020-11-14 06:02:48
Commandline: /usr/bin/unattended-upgrade
Upgrade: pacemaker:amd64 (2.0.1-5, 2.0.1-5+deb10u1)
End-Date: 2020-11-14 06:03:13
Bug#974563: corosync unable to communicate with pacemaker 1.1.16-1+deb9u1 which contains the fix for CVE-2020-25654
On Friday, 13.11.2020 at 23:13 -0300, Alejandro Taboada wrote:
> Hello Markus,
>
> It doesn't work. The output log is quite different. It throws a timeout
> and only at the end the "unprivileged client crmd" message.
> See attached log.

I'm sorry, but I uploaded an older version that missed a do_reply line. That's why you are seeing timeouts now. I have now uploaded the correct version from my test server to

https://people.debian.org/~apo/lts/pacemaker/

Please try again.

Regards,
Markus
Bug#974563: corosync unable to communicate with pacemaker 1.1.16-1+deb9u1 which contains the fix for CVE-2020-25654
On Thursday, 12.11.2020 at 15:50 -0300, Alejandro Taboada wrote:
> Hi!
>
> Just tested v1.1 and the issue persists. The problem is with the local
> connection when used with corosync.

Hello,

I believe I have found and fixed the problem. The refactored code in lrmd.c caused the regression. Since this commit is not strictly needed to fix CVE-2020-25654, I have reverted the changes. On my local setup I don't see any error messages, but I would appreciate a final test from you before I upload, to rule out other possible issues. New source and binary packages are available at

https://people.debian.org/~apo/lts/pacemaker/

Regards,
Markus
Bug#974563: corosync unable to communicate with pacemaker 1.1.16-1+deb9u1 which contains the fix for CVE-2020-25654
On 13/11/2020 12:23, Alejandro Taboada wrote:
> Maybe Corosync is not using peer communication? Could you check the packet
> source address somehow? If it's from localhost just allow it, otherwise
> check permissions.
> I know it's not ideal but it would solve a lot of production issues in the
> meanwhile.
>
>> On 12 Nov 2020, at 23:20, Alejandro Taboada wrote:
>>
>>

I'm not sure I understand what we need to look for. Aren't they communicating via UNIX sockets in the abstract namespace (@cib_rw@, @attrd@, etc.)? That's what I see when I strace calls to "crm resource cleanup " which also fails with the patched version.
Bug#974563: corosync unable to communicate with pacemaker 1.1.16-1+deb9u1 which contains the fix for CVE-2020-25654
Maybe Corosync is not using peer communication? Could you check the packet source address somehow? If it's from localhost just allow it, otherwise check permissions.
I know it's not ideal but it would solve a lot of production issues in the meanwhile.

> On 12 Nov 2020, at 23:20, Alejandro Taboada wrote:
>
>
Bug#974563: corosync unable to communicate with pacemaker 1.1.16-1+deb9u1 which contains the fix for CVE-2020-25654
Hi,

On Thursday, 12.11.2020 at 18:21 +0100, Pallai Roland wrote:
> Hi Markus,
>
> The problem is still the same here:

Thanks for your debug log. I have looked at every line of code again and compared the original upstream patch from

https://bugzilla.redhat.com/attachment.cgi?id=1722701

with the released fix from

https://github.com/ClusterLabs/pacemaker/pull/2210/commits/7babd406e7195fcce57850a8589b06e095642c33

There is only one thing that stands out: in fencing/commands.c, if client == NULL, they now assume the request comes from a peer, which is always allowed to interact. For me it is the only explanation at the moment why you still see

Rejecting IPC request 'lrmd_rsc_info' from unprivileged client crmd

If you take a closer look at the patch, the allowed variable must be true in lrmd/lrmd.c, but in your case it is (incorrectly) false. Since crmd is part of pacemaker it should not be rejected.

Please try the new version at

https://people.debian.org/~apo/lts/pacemaker/

and report back if that addresses the problem.

Thanks,
Markus
Bug#974563: corosync unable to communicate with pacemaker 1.1.16-1+deb9u1 which contains the fix for CVE-2020-25654
Hi!

Just tested v1.1 and the issue persists. The problem is with the local connection when used with corosync.

Thanks,
Alejandro

> On 12 Nov 2020, at 14:21, Pallai Roland wrote:
>
> Hi Markus,
>
> The problem is still the same here:
> Nov 12 18:14:46 srv1 lrmd[990]: warning: Rejecting IPC request 'lrmd_rsc_info' from unprivileged client crmd
> Nov 12 18:14:46 srv1 lrmd[990]: warning: Rejecting IPC request 'lrmd_rsc_register' from unprivileged client crmd
> Nov 12 18:14:46 srv1 crmd[993]: error: Could not add resource dummy_activenode to LRM nmsrv1
> Nov 12 18:14:46 srv1 crmd[993]: error: Invalid resource definition for dummy_activenode
>
> [root@srv1 root]# dpkg -l pacemaker
> ii pacemaker 1.1.16-1+deb9u1.1 amd64 cluster resource manager
>
> Downgrading to "pacemaker=1.1.16-1" fixed it again.
>
> On Thursday, 12 November 2020 17:51:28 CET, Markus Koschany wrote:
>> Thanks for reporting. This is a permission problem. I assume your clients
>> are local and not remote and you don't use the tls_backend. I have
>> prepared another update that should grant the local hacluster clients the
>> necessary privileges. You can download the source and binary files from
>>
>> https://people.debian.org/~apo/lts/pacemaker/
>>
>> Please report back if this fixes the problem. If not, please send me your
>> log file via private email after you have set the logfile_priority to
>> debug in corosync.conf.
Bug#974563: corosync unable to communicate with pacemaker 1.1.16-1+deb9u1 which contains the fix for CVE-2020-25654
Hi Markus,

The problem is still the same here:

Nov 12 18:14:46 srv1 lrmd[990]: warning: Rejecting IPC request 'lrmd_rsc_info' from unprivileged client crmd
Nov 12 18:14:46 srv1 lrmd[990]: warning: Rejecting IPC request 'lrmd_rsc_register' from unprivileged client crmd
Nov 12 18:14:46 srv1 crmd[993]: error: Could not add resource dummy_activenode to LRM nmsrv1
Nov 12 18:14:46 srv1 crmd[993]: error: Invalid resource definition for dummy_activenode

[root@srv1 root]# dpkg -l pacemaker
ii pacemaker 1.1.16-1+deb9u1.1 amd64 cluster resource manager

Downgrading to "pacemaker=1.1.16-1" fixed it again.

On Thursday, 12 November 2020 17:51:28 CET, Markus Koschany wrote:
> Thanks for reporting. This is a permission problem. I assume your clients
> are local and not remote and you don't use the tls_backend. I have prepared
> another update that should grant the local hacluster clients the necessary
> privileges. You can download the source and binary files from
>
> https://people.debian.org/~apo/lts/pacemaker/
>
> Please report back if this fixes the problem. If not, please send me your
> log file via private email after you have set the logfile_priority to debug
> in corosync.conf.
Bug#974563: corosync unable to communicate with pacemaker 1.1.16-1+deb9u1 which contains the fix for CVE-2020-25654
Thanks for reporting. This is a permission problem. I assume your clients are local and not remote and you don't use the tls_backend. I have prepared another update that should grant the local hacluster clients the necessary privileges. You can download the source and binary files from

https://people.debian.org/~apo/lts/pacemaker/

Please report back if this fixes the problem. If not, please send me your log file via private email after you have set the logfile_priority to debug in corosync.conf.

Regards,
Markus
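[Editorial note: enabling the requested debug logging would look roughly like this in the logging section of corosync.conf. This is a sketch based on the standard corosync.conf(5) directives; the logfile path is an example and should match your existing configuration:]

```
logging {
        to_logfile: yes
        logfile: /var/log/corosync/corosync.log
        # log messages of priority "debug" and above to the file
        logfile_priority: debug
        # also enable debug output in general
        debug: on
}
```

A restart of corosync (or a `corosync-cfgtool -R` configuration reload on newer versions) is needed for the change to take effect.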
Bug#974563: corosync unable to communicate with pacemaker 1.1.16-1+deb9u1 which contains the fix for CVE-2020-25654
Package: pacemaker
Version: 1.1.16-1+deb9u1
Severity: grave
X-Debbugs-CC: a...@debian.org

Hi,

I am running corosync 2.4.2-3+deb9u1 with pacemaker, and the last run of unattended-upgrades broke the cluster (downgrading pacemaker to 1.1.16-1 fixed it immediately). The logs contain a lot of warnings that seem to point to a permission problem, such as "Rejecting IPC request 'lrmd_rsc_info' from unprivileged client crmd". I am not using ACLs, so the patch should not impact my system.

Here is an excerpt from the logs after the upgrade:

Nov 12 06:26:05 cluster-1 crmd[20868]: notice: State transition S_PENDING -> S_NOT_DC
Nov 12 06:26:05 cluster-1 crmd[20868]: notice: State transition S_NOT_DC -> S_PENDING
Nov 12 06:26:05 cluster-1 attrd[20866]: notice: Defaulting to uname -n for the local corosync node name
Nov 12 06:26:05 cluster-1 crmd[20868]: notice: State transition S_PENDING -> S_NOT_DC
Nov 12 06:26:06 cluster-1 lrmd[20865]: warning: Rejecting IPC request 'lrmd_rsc_info' from unprivileged client crmd
Nov 12 06:26:06 cluster-1 lrmd[20865]: warning: Rejecting IPC request 'lrmd_rsc_info' from unprivileged client crmd
Nov 12 06:26:06 cluster-1 lrmd[20865]: warning: Rejecting IPC request 'lrmd_rsc_register' from unprivileged client crmd
Nov 12 06:26:06 cluster-1 lrmd[20865]: warning: Rejecting IPC request 'lrmd_rsc_info' from unprivileged client crmd
Nov 12 06:26:06 cluster-1 crmd[20868]: error: Could not add resource service to LRM cluster-1
Nov 12 06:26:06 cluster-1 crmd[20868]: error: Invalid resource definition for service
Nov 12 06:26:06 cluster-1 crmd[20868]: warning: bad input
Nov 12 06:26:06 cluster-1 crmd[20868]: warning: bad input
Nov 12 06:26:06 cluster-1 crmd[20868]: warning: bad input
Nov 12 06:26:06 cluster-1 crmd[20868]: warning: bad input
Nov 12 06:26:06 cluster-1 crmd[20868]: warning: bad input
Nov 12 06:26:06 cluster-1 crmd[20868]: warning: bad input
Nov 12 06:26:06 cluster-1 crmd[20868]: warning: bad input
Nov 12 06:26:06 cluster-1 crmd[20868]: warning: bad input
Nov 12 06:26:06 cluster-1 lrmd[20865]: warning: Rejecting IPC request 'lrmd_rsc_info' from unprivileged client crmd
Nov 12 06:26:06 cluster-1 crmd[20868]: warning: Resource service no longer exists in the lrmd
Nov 12 06:26:06 cluster-1 crmd[20868]: error: Result of probe operation for service on cluster-1: Error
Nov 12 06:26:06 cluster-1 crmd[20868]: warning: Input I_FAIL received in state S_NOT_DC from get_lrm_resource
Nov 12 06:26:06 cluster-1 crmd[20868]: notice: State transition S_NOT_DC -> S_RECOVERY
Nov 12 06:26:06 cluster-1 crmd[20868]: warning: Fast-tracking shutdown in response to errors
Nov 12 06:26:06 cluster-1 crmd[20868]: error: Input I_TERMINATE received in state S_RECOVERY from do_recover
Nov 12 06:26:06 cluster-1 crmd[20868]: notice: Disconnected from the LRM
Nov 12 06:26:06 cluster-1 crmd[20868]: notice: Disconnected from Corosync
Nov 12 06:26:06 cluster-1 crmd[20868]: error: Could not recover from internal error
Nov 12 06:26:06 cluster-1 pacemakerd[20857]: error: The crmd process (20868) exited: Generic Pacemaker error (201)
Nov 12 06:26:06 cluster-1 pacemakerd[20857]: notice: Respawning failed child process: crmd

My corosync.conf is quite standard:

totem {
        version: 2
        cluster_name: debian
        token: 0
        token_retransmits_before_loss_const: 10
        clear_node_high_bit: yes
        crypto_cipher: aes256
        crypto_hash: sha256
        interface {
                ringnumber: 0
                bindnetaddr: xxx
                mcastaddr: yyy
                mcastport: 5405
                ttl: 1
        }
}

logging {
        fileline: off
        to_stderr: yes
        to_logfile: yes
        logfile: /var/log/corosync/corosync.log
        to_syslog: yes
        syslog_facility: daemon
        debug: off
        timestamp: on
        logger_subsys {
                subsys: QUORUM
                debug: off
        }
}

quorum {
        provider: corosync_votequorum
        expected_votes: 2
}

So is my crm configuration:

node xxx: cluster-1 \
        attributes standby=off
node xxx: cluster-2 \
        attributes standby=off
primitive service systemd:service \
        meta failure-timeout=30 \
        op monitor interval=5 on-fail=restart timeout=15s
primitive vip-1 IPaddr2 \
        params ip=xxx cidr_netmask=32 \
        op monitor interval=10s
primitive vip-2 IPaddr2 \
        params ip=xxx cidr_netmask=32 \
        op monitor interval=10s
clone clone_service service
colocation service_vip-1 inf: vip-1 clone_service
colocation service_vip-2 inf: vip-2 clone_service
order kot_before_vip-1 inf: clone_service vip-1
order kot_before_vip-2 inf: clone_service vip-2
location prefer-cluster1-vip-1 vip-1 1: cluster-1
location prefer-cluster2-vip-2 vip-2 1: cluster-2
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.16-94ff4df \
        cluster-infrastructure=corosync \