[Ubuntu-ha] [Bug 1353473] Re: Trusty Pacemaker "crm node standby" stops resource successfully, but lrmd still monitors it and causes "Failed actions"

Rafael David Tinoco Wed, 06 Aug 2014 06:46:41 -0700

## After applying the fix, I could successfully put one node on standby.
Resources migrated correctly.


root@trustycluster02:~# crm_mon
Connection to the CIB terminated
Reconnecting...root@trustycluster02:~# crm_mon -1
Last updated: Wed Aug  6 10:27:35 2014
Last change: Tue Aug  5 15:42:11 2014 via crm_attribute on trustycluster04
Stack: corosync
Current DC: trustycluster02 (739246088) - partition with quorum
Version: 1.1.10-42f2063
4 Nodes configured
5 Resources configured

Node trustycluster01 (739246087): standby
Online: [ trustycluster02 trustycluster03 trustycluster04 ]

 p_fence_cluster01      (stonith:external/vcenter):     Started trustycluster02
 p_fence_cluster02      (stonith:external/vcenter):     Started trustycluster03
 p_fence_cluster03      (stonith:external/vcenter):     Started trustycluster04
 p_fence_cluster04      (stonith:external/vcenter):     Started trustycluster02
 clusterip      (ocf::heartbeat:IPaddr2):       Started trustycluster03
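One way to double-check "Resources migrated correctly" without reading the status by eye is to parse the `crm_mon -1` output and confirm that no primitive is reported as Started on a node marked standby. The Python below is a minimal sketch that assumes the plain one-shot text layout shown above. The helper names (`parse_crm_mon`, `resources_on_standby`) are hypothetical, and groups and clones are not handled.

```python
import re

# Primitive lines, e.g. " clusterip      (ocf::heartbeat:IPaddr2):       Started trustycluster03"
RESOURCE_RE = re.compile(r"^\s*(\S+)\s+\(([^)]+)\):\s+Started\s+(\S+)\s*$")
# Standby node lines, e.g. "Node trustycluster01 (739246087): standby"
STANDBY_RE = re.compile(r"^Node\s+(\S+)\s+\(\d+\):\s+standby\s*$")


def parse_crm_mon(text):
    """Return (standby_nodes, placement) from plain `crm_mon -1` text.

    placement maps each primitive id to the node it is Started on.
    Groups and clones are not handled by this sketch.
    """
    standby, placement = set(), {}
    for line in text.splitlines():
        m = STANDBY_RE.match(line)
        if m:
            standby.add(m.group(1))
            continue
        m = RESOURCE_RE.match(line)
        if m:
            placement[m.group(1)] = m.group(3)
    return standby, placement


def resources_on_standby(text):
    """Names of primitives reported as Started on a standby node."""
    standby, placement = parse_crm_mon(text)
    return sorted(rsc for rsc, node in placement.items() if node in standby)


# Abbreviated copy of the first snapshot above.
snapshot = """\
Node trustycluster01 (739246087): standby
Online: [ trustycluster02 trustycluster03 trustycluster04 ]

 p_fence_cluster01      (stonith:external/vcenter):     Started trustycluster02
 p_fence_cluster04      (stonith:external/vcenter):     Started trustycluster02
 clusterip      (ocf::heartbeat:IPaddr2):       Started trustycluster03
"""

print(resources_on_standby(snapshot))  # []
```

With the snapshot above the check returns an empty list; a primitive still Started on trustycluster01 would be listed by name.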

## With trustycluster03 also put on standby, resources were active on the other nodes:

root@trustycluster01:~# crm_mon -1
Last updated: Wed Aug  6 10:29:48 2014
Last change: Wed Aug  6 10:27:47 2014 via crm_attribute on trustycluster01
Stack: corosync
Current DC: trustycluster02 (739246088) - partition with quorum
Version: 1.1.10-42f2063
4 Nodes configured
5 Resources configured

Node trustycluster01 (739246087): standby
Node trustycluster03 (739246089): standby
Online: [ trustycluster02 trustycluster04 ]

 p_fence_cluster01      (stonith:external/vcenter):     Started trustycluster02
 p_fence_cluster02      (stonith:external/vcenter):     Started trustycluster04
 p_fence_cluster03      (stonith:external/vcenter):     Started trustycluster04
 p_fence_cluster04      (stonith:external/vcenter):     Started trustycluster02
 clusterip      (ocf::heartbeat:IPaddr2):       Started trustycluster02
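Comparing the two snapshots above shows exactly what moved when trustycluster03 also went to standby. A small sketch, with the placements transcribed by hand from the outputs; the function name `moved_resources` is hypothetical:

```python
def moved_resources(before, after):
    """Return {resource: (old_node, new_node)} for resources whose node changed.

    `before` and `after` map a resource id to the node it is Started on.
    """
    return {
        rsc: (before[rsc], after[rsc])
        for rsc in before.keys() & after.keys()
        if before[rsc] != after[rsc]
    }


# Placements transcribed from the two crm_mon -1 snapshots above.
one_standby = {
    "p_fence_cluster01": "trustycluster02",
    "p_fence_cluster02": "trustycluster03",
    "p_fence_cluster03": "trustycluster04",
    "p_fence_cluster04": "trustycluster02",
    "clusterip": "trustycluster03",
}
two_standby = {
    "p_fence_cluster01": "trustycluster02",
    "p_fence_cluster02": "trustycluster04",
    "p_fence_cluster03": "trustycluster04",
    "p_fence_cluster04": "trustycluster02",
    "clusterip": "trustycluster02",
}

for rsc, (old, new) in sorted(moved_resources(one_standby, two_standby).items()):
    print(f"{rsc}: {old} -> {new}")
# clusterip: trustycluster03 -> trustycluster02
# p_fence_cluster02: trustycluster03 -> trustycluster04
```

Only the two resources that had been running on trustycluster03 relocated, and none of the resulting placements is on a standby node.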

## After putting the nodes back online:

root@trustycluster01:~# crm_mon -1
Last updated: Wed Aug  6 10:30:42 2014
Last change: Wed Aug  6 10:30:36 2014 via crm_attribute on trustycluster01
Stack: corosync
Current DC: trustycluster02 (739246088) - partition with quorum
Version: 1.1.10-42f2063
4 Nodes configured
5 Resources configured

Online: [ trustycluster01 trustycluster02 trustycluster03 trustycluster04 ]

 p_fence_cluster01      (stonith:external/vcenter):     Started trustycluster02
 p_fence_cluster02      (stonith:external/vcenter):     Started trustycluster04
 p_fence_cluster03      (stonith:external/vcenter):     Started trustycluster01
 clusterip      (ocf::heartbeat:IPaddr2):       Started trustycluster01

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to pacemaker in Ubuntu.
https://bugs.launchpad.net/bugs/1353473

Title:
  Trusty Pacemaker "crm node standby" stops resource successfully, but
  lrmd still monitors it and causes "Failed actions"

Status in “pacemaker” package in Ubuntu:
  Confirmed

Bug description:
  The following situation was brought to me (~inaddy):

  """"""

  * Environment
  Ubuntu 14.04 LTS
  Pacemaker 1.1.10+git20130802-1ubuntu2

  * Priority
  High

  * Issue
  I used "crm node standby" and the resource (haproxy) was stopped successfully, but lrmd still monitors it and causes "Failed actions".

  ---------------------------------------
  Node A1LB101 (167969461): standby
  Online: [ A1LB102 ]

  Resource Group: grpHaproxy
      vip-internal (ocf::heartbeat:IPaddr2): Started A1LB102
      vip-external (ocf::heartbeat:IPaddr2): Started A1LB102
      vip-nfs (ocf::heartbeat:IPaddr2): Started A1LB102
      vip-iscsi (ocf::heartbeat:IPaddr2): Started A1LB102
  Resource Group: grpStonith1
      prmStonith1-1 (stonith:external/stonith-helper): Started A1LB102
  Clone Set: clnHaproxy [haproxy]
      Started: [ A1LB102 ]
      Stopped: [ A1LB101 ]
  Clone Set: clnPing [ping]
      Started: [ A1LB102 ]
      Stopped: [ A1LB101 ]

  Node Attributes:
  * Node A1LB101:
  * Node A1LB102:
      + default_ping_set : 400

  Migration summary:
  * Node A1LB101:
      haproxy: migration-threshold=1 fail-count=18 last-failure='Mon Jul 7 20:28:58 2014'
  * Node A1LB102:

  Failed actions:
  haproxy_monitor_10000 (node=A1LB101, call=2332, rc=7, status=complete, last-rc-change=Mon Jul 7 20:28:58 2014
  , queued=0ms, exec=0ms
  ): not running
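The failed operation key packs the resource, the action and the interval together, and rc=7 is the OCF "not running" code. A minimal Python sketch for decoding it, assuming the `<resource>_<action>_<interval in ms>` layout seen here; `parse_op_key` and the `OCF_RC` table are illustrative helpers written for this note, not Pacemaker APIs:

```python
# OCF resource agent exit codes, as listed in the Pacemaker documentation.
OCF_RC = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    2: "OCF_ERR_ARGS",
    3: "OCF_ERR_UNIMPLEMENTED",
    4: "OCF_ERR_PERM",
    5: "OCF_ERR_INSTALLED",
    6: "OCF_ERR_CONFIGURED",
    7: "OCF_NOT_RUNNING",
    8: "OCF_RUNNING_MASTER",
    9: "OCF_FAILED_MASTER",
}


def parse_op_key(op_key):
    """Split '<resource>_<action>_<interval_ms>' into its three parts.

    Splitting from the right keeps resource ids that contain underscores
    intact; actions that themselves contain an underscore (for example
    migrate_to) would need extra handling that this sketch omits.
    """
    rsc, action, interval_ms = op_key.rsplit("_", 2)
    return rsc, action, int(interval_ms)


rsc, action, interval_ms = parse_op_key("haproxy_monitor_10000")
rc = 7  # from "rc=7" in the Failed actions entry
print(f"{rsc}: {action} every {interval_ms // 1000}s -> {OCF_RC[rc]}")
# haproxy: monitor every 10s -> OCF_NOT_RUNNING
```

Read this way, the failure is a 10-second recurring monitor on A1LB101 reporting that haproxy is not running, on the node where it had just been stopped on purpose.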
  ---------------------------------------

  Excerpt from the log (ha-log.node1):
  Jul 7 20:28:50 A1LB101 crmd[6364]: notice: te_rsc_command: Initiating action 42: stop haproxy_stop_0 on A1LB101 (local)
  Jul 7 20:28:50 A1LB101 crmd[6364]: info: match_graph_event: Action haproxy_stop_0 (42) confirmed on A1LB101 (rc=0)
  Jul 7 20:28:58 A1LB101 crmd[6364]: notice: process_lrm_event: A1LB101-haproxy_monitor_10000:1372 [ haproxy not running.\n ]
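The timestamps make the ordering explicit: the stop is confirmed at 20:28:50 and a monitor result for the same resource on the same node arrives at 20:28:58. A rough sketch that flags this pattern in a log excerpt, assuming the crmd message formats shown above and supplying the year (2014, taken from the crm_mon output) because syslog stamps omit it. The function name `monitors_after_stop` is hypothetical.

```python
import re
from datetime import datetime

LOG = r"""Jul 7 20:28:50 A1LB101 crmd[6364]: notice: te_rsc_command: Initiating action 42: stop haproxy_stop_0 on A1LB101 (local)
Jul 7 20:28:50 A1LB101 crmd[6364]: info: match_graph_event: Action haproxy_stop_0 (42) confirmed on A1LB101 (rc=0)
Jul 7 20:28:58 A1LB101 crmd[6364]: notice: process_lrm_event: A1LB101-haproxy_monitor_10000:1372 [ haproxy not running.\n ]"""

# Assumed message formats, copied from the excerpt above.
STOP_OK = re.compile(r"Action (\S+)_stop_0 \(\d+\) confirmed on (\S+) \(rc=0\)")
MONITOR = re.compile(r"process_lrm_event: (\S+?)-(\S+)_monitor_(\d+):")


def monitors_after_stop(log_text, year=2014):
    """Yield (resource, node, seconds_after_stop) for every monitor result
    logged for a resource after its stop was confirmed on the same node."""
    stopped = {}  # (resource, node) -> time the stop was confirmed
    for line in log_text.splitlines():
        # Syslog stamps carry no year, so one is supplied explicitly.
        stamp = datetime.strptime(f"{year} {' '.join(line.split()[:3])}",
                                  "%Y %b %d %H:%M:%S")
        m = STOP_OK.search(line)
        if m:
            stopped[(m.group(1), m.group(2))] = stamp
            continue
        m = MONITOR.search(line)
        if m:
            node, rsc = m.group(1), m.group(2)
            if (rsc, node) in stopped:
                yield rsc, node, (stamp - stopped[(rsc, node)]).total_seconds()


for rsc, node, delay in monitors_after_stop(LOG):
    print(f"{rsc} on {node}: monitor ran {delay:.0f}s after stop")
# haproxy on A1LB101: monitor ran 8s after stop
```

An 8-second gap is shorter than the 10000 ms monitor interval, which is consistent with a monitor that was already scheduled when the stop ran and was not cancelled.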

  """"""

  I haven't been able to reproduce this error so far, but the fix seems to
  be a straightforward cherry-pick of the following upstream patch set:

  c72bfea664bd04656c306409381cef824679ea06
  [PATCH 1/3] Fix: services: Do not allow duplicate recurring op entries.

  7a02cd7745d56009ac65251c77d0fe052008224f
  [PATCH 2/3] High: lrmd: Merge duplicate recurring monitor operations.

  7e37f9bb35534102b83e2bc45941036361e33214
  [PATCH 3/3] Fix: lrmd: Cancel recurring operations before stop action is executed

  So I'm assuming (and testing right now) that this will fix the issue.
  I'm opening this public bug for the fix I'll provide after testing, and
  to ask others to test the fix as well.
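To make the intent of the three patches concrete, here is a toy Python model of a recurring-operation table. It is not lrmd's code: `ToyLrmd` and `simulate` are hypothetical, and the unpatched behaviour modelled here (a duplicate monitor entry surviving a single cancel) is an assumption read off the patch titles, not a confirmed root cause.

```python
class ToyLrmd:
    """Hypothetical, much-simplified model of a recurring-operation table."""

    def __init__(self, patched=True):
        self.patched = patched
        self.recurring = []   # list of (resource, action, interval_ms)
        self.running = set()  # resources currently started

    def start(self, rsc):
        self.running.add(rsc)

    def add_recurring(self, rsc, action, interval_ms):
        op = (rsc, action, interval_ms)
        # Patched: merge duplicate recurring entries (cf. patches 1/3 and 2/3).
        if self.patched and op in self.recurring:
            return
        self.recurring.append(op)

    def stop(self, rsc):
        if self.patched:
            # Patched: cancel every recurring op of rsc before stopping (cf. 3/3).
            self.recurring = [op for op in self.recurring if op[0] != rsc]
        else:
            # Unpatched (assumed): one cancel per distinct op;
            # list.remove drops only the first match, so a duplicate survives.
            for op in {op for op in self.recurring if op[0] == rsc}:
                self.recurring.remove(op)
        self.running.discard(rsc)

    def tick(self):
        """Run every recurring monitor once; return failures as (rsc, rc)."""
        failures = []
        for rsc, action, _interval in self.recurring:
            if action == "monitor" and rsc not in self.running:
                failures.append((rsc, 7))  # 7 = OCF_NOT_RUNNING
        return failures


def simulate(patched):
    lrmd = ToyLrmd(patched)
    lrmd.start("haproxy")
    lrmd.add_recurring("haproxy", "monitor", 10000)
    lrmd.add_recurring("haproxy", "monitor", 10000)  # duplicate registration
    lrmd.stop("haproxy")  # e.g. triggered by "crm node standby"
    return lrmd.tick()


print(simulate(patched=False))  # [('haproxy', 7)]
print(simulate(patched=True))   # []
```

In the patched model the duplicate registration is merged and the stop cancels every recurring operation of the resource first, so nothing is left to report "not running" afterwards.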

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1353473/+subscriptions

_______________________________________________
Mailing list: https://launchpad.net/~ubuntu-ha
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp
