Bug#974563: corosync unable to communicate with pacemaker 1.1.16-1+deb9u1 which contains the fix for CVE-2020-25654

2020-11-18 Thread Louis Sautier
On 17/11/2020 18:41, Alejandro Taboada wrote:
> Thank you Markus,
> 
> I just updated to deb9u2 and it works fine. Let me know when you have new
> updates and I can test them.
> 
> Regards,
> Alejandro
> 
>> On 17 Nov 2020, at 05:16, Markus Koschany  wrote:
>>
>> Control: severity -1 normal
>>
>> On Monday, 16 November 2020 at 09:22 -0300, Alejandro Taboada wrote:
>>> Hi Markus,
>>>
>>> Sorry for the delay. The patch works when it is applied to only one node:
>>> the services restart and the resources come up. The problem appears again
>>> when I install the patch on a second node; the resources then stop again.
>>
>> Hello Alejandro,
>>
>> Thanks for your feedback. At the moment I cannot reproduce the problem,
>> hence I have reverted the patch and uploaded a new revision,
>> 1.1.16-1+deb9u2, of pacemaker to stretch-security which restores the old
>> behavior. The regression tests shipped with pacemaker also don't report
>> anything unusual. I will keep this bug report open for discussion and work
>> on another update. This time I intend to upgrade pacemaker to the latest
>> upstream release in the 1.1.x branch, which is currently 1.1.24~rc1. This
>> one also includes fixes for CVE-2018-16878 and CVE-2018-16877. I expect no
>> big changes in terms of existing features, but I will send new packages
>> for testing before I upload a new upstream release.
>>
>> Regards,
>>
>> Markus
> 
> 
I can confirm that 1.1.16-1+deb9u2 works as expected, thanks for the fix.

Kind regards,

Louis





Bug#974563: corosync unable to communicate with pacemaker 1.1.16-1+deb9u1 which contains the fix for CVE-2020-25654

2020-11-17 Thread Alejandro Taboada
Thank you Markus,

I just updated to deb9u2 and it works fine. Let me know when you have new
updates and I can test them.

Regards,
Alejandro

> On 17 Nov 2020, at 05:16, Markus Koschany  wrote:
> 
> Control: severity -1 normal
> 
> On Monday, 16 November 2020 at 09:22 -0300, Alejandro Taboada wrote:
>> Hi Markus,
>> 
>> Sorry for the delay. The patch works when it is applied to only one node:
>> the services restart and the resources come up. The problem appears again
>> when I install the patch on a second node; the resources then stop again.
> 
> Hello Alejandro,
> 
> Thanks for your feedback. At the moment I cannot reproduce the problem, hence I
> have reverted the patch and uploaded a new revision, 1.1.16-1+deb9u2, of
> pacemaker to stretch-security which restores the old behavior. The regression
> tests shipped with pacemaker also don't report anything unusual. I will keep
> this bug report open for discussion and work on another update. This time I
> intend to upgrade pacemaker to the latest upstream release in the 1.1.x branch,
> which is currently 1.1.24~rc1. This one also includes fixes for CVE-2018-16878
> and CVE-2018-16877. I expect no big changes in terms of existing features, but I
> will send new packages for testing before I upload a new upstream release.
> 
> Regards,
> 
> Markus



Bug#974563: corosync unable to communicate with pacemaker 1.1.16-1+deb9u1 which contains the fix for CVE-2020-25654

2020-11-17 Thread wferi
On Tue, 17 Nov 2020 09:16:48 +0100 Markus Koschany  wrote:

> This time I intend to upgrade pacemaker to the latest upstream release
> in the 1.1.x branch which is currently 1.1.24~rc1. This one also
> includes fixes for CVE-2018-16878 and CVE-2018-16877.

Hi Markus,

Please close #927714 if you fix those CVEs.  Unfortunately I forgot to
upload the prepared package after getting the blessing of the Security
Team, so it slept in my local packaging repo until I noticed it again
while importing your 1.1.16-1+deb9u1 upload.  It is tagged as
wferi/1.1.16-1+deb9u1 and pushed to Salsa in case you want to have a look;
its absence might even be the reason behind the current IPC problems, I
don't know.
-- 
Cheers,
Feri



Bug#974563: corosync unable to communicate with pacemaker 1.1.16-1+deb9u1 which contains the fix for CVE-2020-25654

2020-11-17 Thread Markus Koschany
Control: severity -1 normal

On Monday, 16 November 2020 at 09:22 -0300, Alejandro Taboada wrote:
> Hi Markus,
> 
> Sorry for the delay. The patch works when it is applied to only one node:
> the services restart and the resources come up. The problem appears again
> when I install the patch on a second node; the resources then stop again.

Hello Alejandro,

Thanks for your feedback. At the moment I cannot reproduce the problem, hence I
have reverted the patch and uploaded a new revision, 1.1.16-1+deb9u2, of
pacemaker to stretch-security which restores the old behavior. The regression
tests shipped with pacemaker also don't report anything unusual. I will keep
this bug report open for discussion and work on another update. This time I
intend to upgrade pacemaker to the latest upstream release in the 1.1.x branch,
which is currently 1.1.24~rc1. This one also includes fixes for CVE-2018-16878
and CVE-2018-16877. I expect no big changes in terms of existing features, but I
will send new packages for testing before I upload a new upstream release.

Regards,

Markus




Bug#974563: corosync unable to communicate with pacemaker 1.1.16-1+deb9u1 which contains the fix for CVE-2020-25654

2020-11-14 Thread Luke Hall

On Sat, 14 Nov 2020 04:02:40 +0100 Markus Koschany  wrote:

On Friday, 13 November 2020 at 23:13 -0300, Alejandro Taboada wrote:
> Hello Markus,
> 
> It doesn’t work. The output log is quite different: it throws a timeout, and
> only at the end shows the “unprivileged client crmd” message.
> See attached log.

I'm sorry, but I uploaded an older version that was missing a do_reply line.
That's why you are seeing timeouts now. I have now uploaded the correct version
from my test server to https://people.debian.org/~apo/lts/pacemaker/


This update to buster went out overnight and didn't cause the same issues.

Start-Date: 2020-11-14  06:02:48
Commandline: /usr/bin/unattended-upgrade
Upgrade: pacemaker:amd64 (2.0.1-5, 2.0.1-5+deb10u1)
End-Date: 2020-11-14  06:03:13




Bug#974563: corosync unable to communicate with pacemaker 1.1.16-1+deb9u1 which contains the fix for CVE-2020-25654

2020-11-13 Thread Markus Koschany
On Friday, 13 November 2020 at 23:13 -0300, Alejandro Taboada wrote:
> Hello Markus,
> 
> It doesn’t work. The output log is quite different: it throws a timeout, and
> only at the end shows the “unprivileged client crmd” message.
> See attached log.

I'm sorry, but I uploaded an older version that was missing a do_reply line.
That's why you are seeing timeouts now. I have now uploaded the correct version
from my test server to https://people.debian.org/~apo/lts/pacemaker/

Please try again.

Regards,

Markus





Bug#974563: corosync unable to communicate with pacemaker 1.1.16-1+deb9u1 which contains the fix for CVE-2020-25654

2020-11-13 Thread Markus Koschany
On Thursday, 12 November 2020 at 15:50 -0300, Alejandro Taboada wrote:
> Hi !
> 
> Just tested v1.1 and the issue persists. The problem is with the local
> connection when used with corosync.

Hello,

I believe I have found and fixed the problem. The refactored code in lrmd.c
caused the regression. Since this commit is not strictly needed to fix
CVE-2020-25654, I have reverted the changes. On my local setup I don't see any
error messages, but I would appreciate a final test from you before I upload,
to rule out other possible issues. New source and binary packages are available at

https://people.debian.org/~apo/lts/pacemaker/

Regards,

Markus




Bug#974563: corosync unable to communicate with pacemaker 1.1.16-1+deb9u1 which contains the fix for CVE-2020-25654

2020-11-13 Thread Louis Sautier
On 13/11/2020 12:23, Alejandro Taboada wrote:
> Maybe Corosync is not using peer communication? Could you check the packet
> source address somehow? If it's from localhost just allow it, otherwise check
> permissions.
> I know it is not ideal, but it would solve a lot of production issues in the
> meantime.
> 
>> On 12 Nov 2020, at 23:20, Alejandro Taboada  wrote:
I'm not sure I understand what we need to look for.

Aren't they communicating via UNIX sockets in the abstract namespace
(@cib_rw@, @attrd@, etc.)? That's what I see when I strace calls to
"crm resource cleanup", which also fails with the patched version.

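For reference, here is a minimal, self-contained C sketch of how such an
abstract-namespace socket can be probed outside of strace. This is not
pacemaker code; the socket name "cib_rw" and the use of SOCK_STREAM are
assumptions for illustration only, since the exact names and socket types
that libqb creates can differ between systems.

/* Probe an abstract-namespace UNIX socket (no filesystem entry; the name
 * starts with a NUL byte). "cib_rw" is a hypothetical placeholder. */
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

int main(void)
{
    const char *name = "cib_rw";                 /* hypothetical socket name */
    struct sockaddr_un addr;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);    /* actual type may differ */
    if (fd < 0) { perror("socket"); return 1; }

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    addr.sun_path[0] = '\0';                     /* abstract namespace marker */
    strncpy(addr.sun_path + 1, name, sizeof(addr.sun_path) - 2);

    socklen_t len = offsetof(struct sockaddr_un, sun_path) + 1 + strlen(name);
    if (connect(fd, (struct sockaddr *)&addr, len) == 0)
        printf("connected to @%s\n", name);
    else
        perror("connect");                       /* e.g. ENOENT or ECONNREFUSED */

    close(fd);
    return 0;
}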




Bug#974563: corosync unable to communicate with pacemaker 1.1.16-1+deb9u1 which contains the fix for CVE-2020-25654

2020-11-13 Thread Alejandro Taboada
Maybe Corosync is not using peer communication? Could you check the packet
source address somehow? If it's from localhost just allow it, otherwise check
permissions.
I know it is not ideal, but it would solve a lot of production issues in the
meantime.


> On 12 Nov 2020, at 23:20, Alejandro Taboada  wrote:
> 
> 



Bug#974563: corosync unable to communicate with pacemaker 1.1.16-1+deb9u1 which contains the fix for CVE-2020-25654

2020-11-12 Thread Markus Koschany
Hi,

On Thursday, 12 November 2020 at 18:21 +0100, Pallai Roland wrote:
> Hi Markus,
> 
> The problem is still the same here:

Thanks for your debug log. I have looked at every line of code again and
compared the original upstream patch from here


https://bugzilla.redhat.com/attachment.cgi?id=1722701

with the released fix from here

https://github.com/ClusterLabs/pacemaker/pull/2210/commits/7babd406e7195fcce57850a8589b06e095642c33

There is only one thing that stands out, in fencing/commands.c:

if client == NULL, they now assume it is a peer, which is always allowed to
interact. For me this is the only explanation at the moment for why you still see

Rejecting IPC request 'lrmd_rsc_info' from unprivileged client crmd

If you take a closer look at the patch, the allowed variable must be true
in lrmd/lrmd.c, but in your case it is (incorrectly) false. Since crmd is part
of pacemaker it should not be rejected. Please try the new version at

https://people.debian.org/~apo/lts/pacemaker/

and report back if that addresses the problem.
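To make the reasoning above easier to follow, here is a simplified sketch of
the kind of check being discussed. This is not the actual pacemaker code; the
type and function names (ipc_client_t, request_allowed) are hypothetical. It
only illustrates the idea that a NULL client is treated as a cluster peer
(always allowed), while a local IPC client must map to root or the cluster
user to count as privileged.

#include <pwd.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

typedef struct {
    uid_t uid;                       /* peer uid taken from the IPC connection */
} ipc_client_t;

static bool request_allowed(const ipc_client_t *client)
{
    if (client == NULL) {
        return true;                 /* request from a cluster peer: always allowed */
    }
    if (client->uid == 0) {
        return true;                 /* root is privileged */
    }
    struct passwd *pw = getpwnam("hacluster");
    if (pw != NULL && client->uid == pw->pw_uid) {
        return true;                 /* the cluster user is privileged */
    }
    /* anything else would be rejected, as in the
     * "Rejecting IPC request ... from unprivileged client crmd" warnings */
    return false;
}

int main(void)
{
    ipc_client_t local = { .uid = getuid() };   /* pretend the caller is us */
    printf("peer request allowed:  %d\n", request_allowed(NULL));
    printf("local request allowed: %d\n", request_allowed(&local));
    return 0;
}

If the allowed flag ends up false for crmd even though crmd runs as the
cluster user, that would produce exactly the rejections quoted above.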

Thanks,

Markus




Bug#974563: corosync unable to communicate with pacemaker 1.1.16-1+deb9u1 which contains the fix for CVE-2020-25654

2020-11-12 Thread Alejandro Taboada
Hi !

Just tested v1.1 and the issue persists. The problem is with the local
connection when used with corosync.

Thanks,
Alejandro

> On 12 Nov 2020, at 14:21, Pallai Roland  wrote:
> 
> Hi Markus,
> 
> The problem is still the same here:
> Nov 12 18:14:46 srv1 lrmd[990]:  warning: Rejecting IPC request 'lrmd_rsc_info' from unprivileged client crmd
> Nov 12 18:14:46 srv1 lrmd[990]:  warning: Rejecting IPC request 'lrmd_rsc_register' from unprivileged client crmd
> Nov 12 18:14:46 srv1 crmd[993]:    error: Could not add resource dummy_activenode to LRM nmsrv1
> Nov 12 18:14:46 srv1 crmd[993]:    error: Invalid resource definition for dummy_activenode
> 
> [root@srv1 root]# dpkg -l pacemaker
> ii  pacemaker  1.1.16-1+deb9u1.1  amd64  cluster resource manager
> 
> Downgrading to "pacemaker=1.1.16-1" fixed it again.
> 
> 
> On Thursday, 12 November 2020 at 17:51:28 CET, Markus Koschany wrote:
>> Thanks for reporting. This is a permission problem. I assume your clients are
>> local and not remote and you don't use the tls_backend. I have prepared another
>> update that should grant the local hacluster clients the necessary privileges.
>> You can download the source and binary files from
>> 
>> https://people.debian.org/~apo/lts/pacemaker/
>> 
>> Please report back if this fixes the problem. If not, please send me your log
>> file via private email after you have set the logfile_priority to debug in
>> corosync.conf.
> 



Bug#974563: corosync unable to communicate with pacemaker 1.1.16-1+deb9u1 which contains the fix for CVE-2020-25654

2020-11-12 Thread Pallai Roland

Hi Markus,

The problem is still the same here:
Nov 12 18:14:46 srv1 lrmd[990]:  warning: Rejecting IPC request 'lrmd_rsc_info' from unprivileged client crmd
Nov 12 18:14:46 srv1 lrmd[990]:  warning: Rejecting IPC request 'lrmd_rsc_register' from unprivileged client crmd
Nov 12 18:14:46 srv1 crmd[993]:    error: Could not add resource dummy_activenode to LRM nmsrv1
Nov 12 18:14:46 srv1 crmd[993]:    error: Invalid resource definition for dummy_activenode


[root@srv1 root]# dpkg -l pacemaker
ii  pacemaker  1.1.16-1+deb9u1.1  amd64  cluster resource manager


Downgrading to "pacemaker=1.1.16-1" fixed it again.


On Thursday, 12 November 2020 at 17:51:28 CET, Markus Koschany wrote:
> Thanks for reporting. This is a permission problem. I assume your clients are
> local and not remote and you don't use the tls_backend. I have prepared another
> update that should grant the local hacluster clients the necessary privileges.
> You can download the source and binary files from
> 
> https://people.debian.org/~apo/lts/pacemaker/
> 
> Please report back if this fixes the problem. If not, please send me your log
> file via private email after you have set the logfile_priority to debug in
> corosync.conf.




Bug#974563: corosync unable to communicate with pacemaker 1.1.16-1+deb9u1 which contains the fix for CVE-2020-25654

2020-11-12 Thread Markus Koschany
Thanks for reporting. This is a permission problem. I assume your clients are
local and not remote and you don't use the tls_backend. I have prepared another
update that should grant the local hacluster clients the necessary privileges.
You can download the source and binary files from

https://people.debian.org/~apo/lts/pacemaker/

Please report back if this fixes the problem. If not, please send me your log
file via private email after you have set the logfile_priority to debug in
corosync.conf.
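For reference, a minimal sketch of what that could look like in the logging
section of corosync.conf, assuming a setup like the one reported in this bug
(the logfile path and other options are illustrative; see corosync.conf(5)
for the authoritative description of logfile_priority):

logging {
        to_logfile: yes
        logfile: /var/log/corosync/corosync.log
        logfile_priority: debug
}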


Regards,

Markus




Bug#974563: corosync unable to communicate with pacemaker 1.1.16-1+deb9u1 which contains the fix for CVE-2020-25654

2020-11-12 Thread Louis Sautier
Package: pacemaker
Version: 1.1.16-1+deb9u1
Severity: grave
X-Debbugs-CC: a...@debian.org

Hi,
I am running corosync 2.4.2-3+deb9u1 with pacemaker and the last run of
unattended-upgrades broke the cluster (downgrading pacemaker to 1.1.16-1
fixed it immediately).
The logs contain a lot of warnings that seem to point to a permission
problem, such as "Rejecting IPC request 'lrmd_rsc_info' from
unprivileged client crmd". I am not using ACLs so the patch should not
impact my system.

Here is an excerpt from the logs after the upgrade:
Nov 12 06:26:05 cluster-1 crmd[20868]:   notice: State transition S_PENDING -> S_NOT_DC
Nov 12 06:26:05 cluster-1 crmd[20868]:   notice: State transition S_NOT_DC -> S_PENDING
Nov 12 06:26:05 cluster-1 attrd[20866]:   notice: Defaulting to uname -n for the local corosync node name
Nov 12 06:26:05 cluster-1 crmd[20868]:   notice: State transition S_PENDING -> S_NOT_DC
Nov 12 06:26:06 cluster-1 lrmd[20865]:  warning: Rejecting IPC request 'lrmd_rsc_info' from unprivileged client crmd
Nov 12 06:26:06 cluster-1 lrmd[20865]:  warning: Rejecting IPC request 'lrmd_rsc_info' from unprivileged client crmd
Nov 12 06:26:06 cluster-1 lrmd[20865]:  warning: Rejecting IPC request 'lrmd_rsc_register' from unprivileged client crmd
Nov 12 06:26:06 cluster-1 lrmd[20865]:  warning: Rejecting IPC request 'lrmd_rsc_info' from unprivileged client crmd
Nov 12 06:26:06 cluster-1 crmd[20868]:    error: Could not add resource service to LRM cluster-1
Nov 12 06:26:06 cluster-1 crmd[20868]:    error: Invalid resource definition for service
Nov 12 06:26:06 cluster-1 crmd[20868]:  warning: bad input
Nov 12 06:26:06 cluster-1 crmd[20868]:  warning: bad input
Nov 12 06:26:06 cluster-1 crmd[20868]:  warning: bad input
Nov 12 06:26:06 cluster-1 crmd[20868]:  warning: bad input
Nov 12 06:26:06 cluster-1 crmd[20868]:  warning: bad input
Nov 12 06:26:06 cluster-1 crmd[20868]:  warning: bad input
Nov 12 06:26:06 cluster-1 crmd[20868]:  warning: bad input
Nov 12 06:26:06 cluster-1 crmd[20868]:  warning: bad input
Nov 12 06:26:06 cluster-1 lrmd[20865]:  warning: Rejecting IPC request 'lrmd_rsc_info' from unprivileged client crmd
Nov 12 06:26:06 cluster-1 crmd[20868]:  warning: Resource service no longer exists in the lrmd
Nov 12 06:26:06 cluster-1 crmd[20868]:    error: Result of probe operation for service on cluster-1: Error
Nov 12 06:26:06 cluster-1 crmd[20868]:  warning: Input I_FAIL received in state S_NOT_DC from get_lrm_resource
Nov 12 06:26:06 cluster-1 crmd[20868]:   notice: State transition S_NOT_DC -> S_RECOVERY
Nov 12 06:26:06 cluster-1 crmd[20868]:  warning: Fast-tracking shutdown in response to errors
Nov 12 06:26:06 cluster-1 crmd[20868]:    error: Input I_TERMINATE received in state S_RECOVERY from do_recover
Nov 12 06:26:06 cluster-1 crmd[20868]:   notice: Disconnected from the LRM
Nov 12 06:26:06 cluster-1 crmd[20868]:   notice: Disconnected from Corosync
Nov 12 06:26:06 cluster-1 crmd[20868]:    error: Could not recover from internal error
Nov 12 06:26:06 cluster-1 pacemakerd[20857]:    error: The crmd process (20868) exited: Generic Pacemaker error (201)
Nov 12 06:26:06 cluster-1 pacemakerd[20857]:   notice: Respawning failed child process: crmd

My corosync.conf is quite standard:
totem {
        version: 2
        cluster_name: debian
        token: 0
        token_retransmits_before_loss_const: 10
        clear_node_high_bit: yes
        crypto_cipher: aes256
        crypto_hash: sha256
        interface {
                ringnumber: 0
                bindnetaddr: xxx
                mcastaddr: yyy
                mcastport: 5405
                ttl: 1
        }
}
logging {
        fileline: off
        to_stderr: yes
        to_logfile: yes
        logfile: /var/log/corosync/corosync.log
        to_syslog: yes
        syslog_facility: daemon
        debug: off
        timestamp: on
        logger_subsys {
                subsys: QUORUM
                debug: off
        }
}
quorum {
        provider: corosync_votequorum
        expected_votes: 2
}

So is my crm configuration:
node xxx: cluster-1 \
        attributes standby=off
node xxx: cluster-2 \
        attributes standby=off
primitive service systemd:service \
        meta failure-timeout=30 \
        op monitor interval=5 on-fail=restart timeout=15s
primitive vip-1 IPaddr2 \
        params ip=xxx cidr_netmask=32 \
        op monitor interval=10s
primitive vip-2 IPaddr2 \
        params ip=xxx cidr_netmask=32 \
        op monitor interval=10s
clone clone_service service
colocation service_vip-1 inf: vip-1 clone_service
colocation service_vip-2 inf: vip-2 clone_service
order kot_before_vip-1 inf: clone_service vip-1
order kot_before_vip-2 inf: clone_service vip-2
location prefer-cluster1-vip-1 vip-1 1: cluster-1
location prefer-cluster2-vip-2 vip-2 1: cluster-2
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.16-94ff4df \
        cluster-infrastructure=corosync \