Public bug reported:
The following situation was brought to my attention (~inaddy):
""""""
* Environment
Ubuntu 14.04 LTS
Pacemaker 1.1.10+git20130802-1ubuntu2
* Priority
High
* Issue
I used "crm node standby" and the resource (haproxy) was stopped successfully.
However, lrmd keeps monitoring it, which causes "Failed actions".
---------------------------------------
Node A1LB101 (167969461): standby
Online: [ A1LB102 ]
Resource Group: grpHaproxy
vip-internal (ocf::heartbeat:IPaddr2): Started A1LB102
vip-external (ocf::heartbeat:IPaddr2): Started A1LB102
vip-nfs (ocf::heartbeat:IPaddr2): Started A1LB102
vip-iscsi (ocf::heartbeat:IPaddr2): Started A1LB102
Resource Group: grpStonith1
prmStonith1-1 (stonith:external/stonith-helper): Started A1LB102
Clone Set: clnHaproxy [haproxy]
Started: [ A1LB102 ]
Stopped: [ A1LB101 ]
Clone Set: clnPing [ping]
Started: [ A1LB102 ]
Stopped: [ A1LB101 ]
Node Attributes:
* Node A1LB101:
* Node A1LB102:
+ default_ping_set : 400
Migration summary:
* Node A1LB101:
haproxy: migration-threshold=1 fail-count=18 last-failure='Mon Jul 7 20:28:58 2014'
* Node A1LB102:
Failed actions:
haproxy_monitor_10000 (node=A1LB101, call=2332, rc=7, status=complete, last-rc-change=Mon Jul 7 20:28:58 2014, queued=0ms, exec=0ms): not running
---------------------------------------
Excerpt from log (ha-log.node1)
Jul 7 20:28:50 A1LB101 crmd[6364]: notice: te_rsc_command: Initiating action 42: stop haproxy_stop_0 on A1LB101 (local)
Jul 7 20:28:50 A1LB101 crmd[6364]: info: match_graph_event: Action haproxy_stop_0 (42) confirmed on A1LB101 (rc=0)
Jul 7 20:28:58 A1LB101 crmd[6364]: notice: process_lrm_event: A1LB101-haproxy_monitor_10000:1372 [ haproxy not running.\n ]
""""""
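The ordering in the log excerpt above (stop confirmed, then a recurring monitor result for the same resource) can be checked mechanically. The following is only a rough sketch for triaging similar logs: the regular expressions, the helper name `stale_monitors`, and the assumption that node and resource names contain no hyphens are mine, not part of Pacemaker or of the reporter's tooling. The year is taken from the crm_mon output in the report.

```python
import re
from datetime import datetime

# Log lines copied from the report, with the mail wrapping undone.
LOG = r"""
Jul 7 20:28:50 A1LB101 crmd[6364]: notice: te_rsc_command: Initiating action 42: stop haproxy_stop_0 on A1LB101 (local)
Jul 7 20:28:50 A1LB101 crmd[6364]: info: match_graph_event: Action haproxy_stop_0 (42) confirmed on A1LB101 (rc=0)
Jul 7 20:28:58 A1LB101 crmd[6364]: notice: process_lrm_event: A1LB101-haproxy_monitor_10000:1372 [ haproxy not running.\n ]
"""

# Hypothetical patterns, written only against the three lines above.
LINE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ crmd\[\d+\]: +\w+: \w+: (.*)$")
STOP_OK = re.compile(r"Action (\w+)_stop_0 \(\d+\) confirmed on (\S+) \(rc=0\)")
MONITOR = re.compile(r"(\S+?)-(\w+)_monitor_(\d+):\d+ \[")


def stale_monitors(lines, year=2014):
    """Return (node, resource, interval_ms, seconds_after_stop) for every
    recurring monitor event logged after a confirmed stop of that resource."""
    stopped = {}  # (node, resource) -> time the stop was confirmed
    hits = []
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue
        # syslog timestamps carry no year, so it is supplied by the caller
        when = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        msg = m.group(2)
        stop = STOP_OK.search(msg)
        if stop:
            stopped[(stop.group(2), stop.group(1))] = when
            continue
        mon = MONITOR.match(msg)
        if mon:
            key = (mon.group(1), mon.group(2))
            if key in stopped:
                delay = (when - stopped[key]).total_seconds()
                hits.append((*key, int(mon.group(3)), delay))
    return hits


if __name__ == "__main__":
    for node, rsc, interval_ms, delay in stale_monitors(LOG.strip().splitlines()):
        print(f"{rsc} on {node}: {interval_ms} ms monitor fired {delay:.0f}s after stop")
```

On the excerpt above this flags the haproxy monitor with a 10000 ms interval, 8 seconds after the stop was confirmed, which is consistent with a recurring monitor that was never cancelled.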
I have not been able to reproduce this error so far, but the fix appears to be
a straightforward cherry-pick of the following upstream patch set:
c72bfea664bd04656c306409381cef824679ea06
[PATCH 1/3] Fix: services: Do not allow duplicate recurring op entries.
7a02cd7745d56009ac65251c77d0fe052008224f
[PATCH 2/3] High: lrmd: Merge duplicate recurring monitor operations.
7e37f9bb35534102b83e2bc45941036361e33214
[PATCH 3/3] Fix: lrmd: Cancel recurring operations before stop action is executed
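To make the intent of the three patch titles concrete, here is a toy model of the behaviour they describe: duplicate recurring operations are merged into one entry, and a stop cancels the resource's recurring operations before it runs. This is only my reading of the titles, not the upstream code; the class `ToyExecutor` and all of its methods are hypothetical and do not correspond to any lrmd function.

```python
from dataclasses import dataclass, field


@dataclass
class ToyExecutor:
    """Hypothetical stand-in for a resource executor; not Pacemaker code."""

    # (resource, action, interval_ms) -> call id of the single live entry
    recurring: dict = field(default_factory=dict)
    next_call_id: int = 1

    def schedule_recurring(self, rsc, action, interval_ms):
        """Register a recurring op; a duplicate request reuses the existing
        entry instead of creating a second timer (idea of patches 1/3, 2/3)."""
        key = (rsc, action, interval_ms)
        if key in self.recurring:
            return self.recurring[key]
        call_id = self.next_call_id
        self.next_call_id += 1
        self.recurring[key] = call_id
        return call_id

    def stop(self, rsc):
        """Cancel every recurring op of the resource first; only then would
        the stop action itself be executed (idea of patch 3/3)."""
        cancelled = [key for key in self.recurring if key[0] == rsc]
        for key in cancelled:
            del self.recurring[key]
        # ... the real stop action would run here, with no monitor left behind
        return cancelled

    def pending(self, rsc):
        """Recurring ops still scheduled for the resource."""
        return sorted(key for key in self.recurring if key[0] == rsc)


if __name__ == "__main__":
    ex = ToyExecutor()
    first = ex.schedule_recurring("haproxy", "monitor", 10000)
    second = ex.schedule_recurring("haproxy", "monitor", 10000)  # duplicate
    print(first == second, ex.pending("haproxy"))
    ex.stop("haproxy")
    print(ex.pending("haproxy"))
```

In this model, a duplicate monitor entry cannot outlive the stop, because only one entry exists per (resource, action, interval) key and the stop removes it. Whether a leftover duplicate is really what happened on A1LB101 is still an assumption until the patched package is tested.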
I am assuming (and currently testing) that this will fix the issue.
I am opening this public bug to track the fix I will provide after testing,
and to ask others to test it as well.
** Affects: pacemaker (Ubuntu)
Importance: Undecided
Assignee: Rafael David Tinoco (inaddy)
Status: Confirmed
** Changed in: pacemaker (Ubuntu)
Assignee: (unassigned) => Rafael David Tinoco (inaddy)
** Changed in: pacemaker (Ubuntu)
Status: New => Confirmed
--
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to pacemaker in Ubuntu.
https://bugs.launchpad.net/bugs/1353473
Title:
Trusty Pacemaker "crm node standby" stops resource successfully, but
lrmd still monitors it and causes "Failed actions"
Status in “pacemaker” package in Ubuntu:
Confirmed
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1353473/+subscriptions