[Bug 1890491] Re: A pacemaker node fails monitor (probe) and stop /start operations on a resource because it returns "rc=189

Jorge Niedbalski Thu, 13 Aug 2020 15:16:10 -0700

I am able to reproduce a similar issue with the following bundle:
https://paste.ubuntu.com/p/VJ3m7nMN79/


Resource created with
sudo pcs resource create test2 ocf:pacemaker:Dummy op_sleep=10 op monitor 
interval=30s timeout=30s op start timeout=30s op stop timeout=30s

juju ssh nova-cloud-controller/2 "sudo pcs constraint location test2 prefers 
juju-acda3d-pacemaker-remote-10.cloud.sts"
juju ssh nova-cloud-controller/2 "sudo pcs constraint location test2 prefers 
juju-acda3d-pacemaker-remote-11.cloud.sts"
juju ssh nova-cloud-controller/2 "sudo pcs constraint location test2 prefers 
juju-acda3d-pacemaker-remote-12.cloud.sts"


Online: [ juju-acda3d-pacemaker-remote-7 juju-acda3d-pacemaker-remote-8 
juju-acda3d-pacemaker-remote-9 ]
RemoteOnline: [ juju-acda3d-pacemaker-remote-10.cloud.sts 
juju-acda3d-pacemaker-remote-11.cloud.sts 
juju-acda3d-pacemaker-remote-12.cloud.sts ]

Full list of resources:

Resource Group: grp_nova_vips
res_nova_bf9661e_vip (ocf::heartbeat:IPaddr2): Started 
juju-acda3d-pacemaker-remote-7
Clone Set: cl_nova_haproxy [res_nova_haproxy]
Started: [ juju-acda3d-pacemaker-remote-7 juju-acda3d-pacemaker-remote-8 
juju-acda3d-pacemaker-remote-9 ]
juju-acda3d-pacemaker-remote-10.cloud.sts (ocf::pacemaker:remote): Started 
juju-acda3d-pacemaker-remote-8
juju-acda3d-pacemaker-remote-12.cloud.sts (ocf::pacemaker:remote): Started 
juju-acda3d-pacemaker-remote-8
juju-acda3d-pacemaker-remote-11.cloud.sts (ocf::pacemaker:remote): Started 
juju-acda3d-pacemaker-remote-7

test2 (ocf::pacemaker:Dummy): Started juju-acda3d-pacemaker-
remote-10.cloud.sts

## After running the following commands on juju-acda3d-pacemaker-
remote-10.cloud.sts

1) sudo systemctl stop pacemaker_remote
2) forcedfully shutdown (openstack server stop xxxx) in less than 10 seconds 
after the pacemaker_remote gets
executed.

Remote is shutdown

RemoteOFFLINE: [ juju-acda3d-pacemaker-remote-10.cloud.sts ]

The resource status remains as stopped across the 3 machines, and
doesn't recovers.

$ juju run --application nova-cloud-controller "sudo pcs resource show | grep 
-i test2"
- Stdout: " test2\t(ocf::pacemaker:Dummy):\tStopped\n"
UnitId: nova-cloud-controller/0
- Stdout: " test2\t(ocf::pacemaker:Dummy):\tStopped\n"
UnitId: nova-cloud-controller/1
- Stdout: " test2\t(ocf::pacemaker:Dummy):\tStopped\n"
UnitId: nova-cloud-controller/2

However, If I do a clean shutdown (without interrupting the pacemaker_remote 
fence), that ends up
with the resource migrated correctly to another node.

6 nodes configured
9 resources configured

Online: [ juju-acda3d-pacemaker-remote-7 juju-acda3d-pacemaker-remote-8 
juju-acda3d-pacemaker-remote-9 ]
RemoteOnline: [ juju-acda3d-pacemaker-remote-11.cloud.sts 
juju-acda3d-pacemaker-remote-12.cloud.sts ]
RemoteOFFLINE: [ juju-acda3d-pacemaker-remote-10.cloud.sts ]

Full list of resources:

[...]
test2 (ocf::pacemaker:Dummy): Started juju-acda3d-pacemaker-remote-12.cloud.sts

I will keep investigating this behavior and determine is this is linked
to the bug reported.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1890491

Title:
  A pacemaker node fails monitor (probe) and stop /start operations on a
  resource because it returns "rc=189

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1890491/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1890491] Re: A pacemaker node fails monitor (probe) and stop /start operations on a resource because it returns "rc=189

Reply via email to