[Bug 1353473] Re: Trusty Pacemaker "crm node standby" stops resource successfully, but lrmd still monitors it and causes "Failed actions"

Rafael David Tinoco Fri, 08 Aug 2014 10:06:43 -0700

Uploading fix for Trusty.

** Description changed:


  [Impact]
  
   * When a user runs "crm node standby", lrmd can keep monitoring a
     resource that was stopped on the standby node, which causes
     "Failed actions" error messages.
  
  [Test Case]
  
   * Run "crm node standby" and check that lrmd stops monitoring the
     resources stopped on the standby node and does not stop monitoring
     resources not set to stand-by.
  
  [Regression Potential]
  
-  * users already tested and are using in production.
+  * users already tested and are using in production.
   * based on upstream fixes for lrmd monitoring.
   * potential race conditions (based on upstream history).
  
  [Other Info]
  
   * Original bug description:
  
  ----------------
  
  The following situation was brought to me (~inaddy):
  
  """"""
  
  * Environment
  Ubuntu 14.04 LTS
  Pacemaker 1.1.10+git20130802-1ubuntu2
  
  * Priority
  High
  
  * Issue
  I used "crm node standby" and the resource (haproxy) was stopped successfully. But lrmd still monitors it and causes "Failed actions".
  
  ---------------------------------------
  Node A1LB101 (167969461): standby
  Online: [ A1LB102 ]
  
  Resource Group: grpHaproxy
  vip-internal (ocf::heartbeat:IPaddr2): Started A1LB102
  vip-external (ocf::heartbeat:IPaddr2): Started A1LB102
  vip-nfs (ocf::heartbeat:IPaddr2): Started A1LB102
  vip-iscsi (ocf::heartbeat:IPaddr2): Started A1LB102
  Resource Group: grpStonith1
  prmStonith1-1 (stonith:external/stonith-helper): Started A1LB102
  Clone Set: clnHaproxy [haproxy]
  Started: [ A1LB102 ]
  Stopped: [ A1LB101 ]
  Clone Set: clnPing [ping]
  Started: [ A1LB102 ]
  Stopped: [ A1LB101 ]
  
  Node Attributes:
  * Node A1LB101:
  * Node A1LB102:
  + default_ping_set : 400
  
  Migration summary:
  * Node A1LB101:
  haproxy: migration-threshold=1 fail-count=18 last-failure='Mon Jul 7 20:28:58 2014'
  * Node A1LB102:
  
  Failed actions:
  haproxy_monitor_10000 (node=A1LB101, call=2332, rc=7, status=complete, last-rc-change=Mon Jul 7 20:28:58 2014
  , queued=0ms, exec=0ms
  ): not running
  ---------------------------------------
  
  Excerpt from log (ha-log.node1)
  Jul 7 20:28:50 A1LB101 crmd[6364]: notice: te_rsc_command: Initiating action 42: stop haproxy_stop_0 on A1LB101 (local)
  Jul 7 20:28:50 A1LB101 crmd[6364]: info: match_graph_event: Action haproxy_stop_0 (42) confirmed on A1LB101 (rc=0)
  Jul 7 20:28:58 A1LB101 crmd[6364]: notice: process_lrm_event: A1LB101-haproxy_monitor_10000:1372 [ haproxy not running.\n ]
  
  """"""
  
  I haven't been able to reproduce this error so far, but the fix appears
  to be a straightforward cherry-pick of the following upstream patch set:
  
- c72bfea664bd04656c306409381cef824679ea06
- [PATCH 1/3] Fix: services: Do not allow duplicate recurring op entries.
- 
- 7a02cd7745d56009ac65251c77d0fe052008224f
- [PATCH 2/3] High: lrmd: Merge duplicate recurring monitor operations.
- 
- 7e37f9bb35534102b83e2bc45941036361e33214
- [PATCH 3/3] Fix: lrmd: Cancel recurring operations before stop action is executed
+ c72bfea Fix: services: Do not allow duplicate recurring op entries
+ 7a02cd7 High: lrmd: Merge duplicate recurring monitor operations
+ 7e37f9b Fix: lrmd: Cancel recurring operations before stop action is executed
  
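To make the intent of the three patches easier to follow, here is a toy
model in Python. It is only a sketch: lrmd is written in C and its real
data structures differ. The class name RecurringOps and its methods are
hypothetical, and which entry survives a merge is an arbitrary choice here.

```python
# Toy model only: names and structure are illustrative assumptions,
# not the actual lrmd implementation.
class RecurringOps:
    """Tracks recurring operations keyed by (resource, action, interval_ms)."""

    def __init__(self):
        self._ops = {}  # (rsc, action, interval_ms) -> call id

    def schedule(self, rsc, action, interval_ms, call_id):
        """Register a recurring op; a duplicate is merged into the existing entry."""
        key = (rsc, action, interval_ms)
        if key in self._ops:
            return self._ops[key]  # duplicate: keep the single existing entry
        self._ops[key] = call_id
        return call_id

    def stop(self, rsc):
        """Cancel every recurring op for rsc before the stop action would run."""
        cancelled = [key for key in self._ops if key[0] == rsc]
        for key in cancelled:
            del self._ops[key]
        return cancelled

    def pending(self, rsc):
        """Return the recurring ops still registered for rsc."""
        return sorted(key for key in self._ops if key[0] == rsc)


ops = RecurringOps()
ops.schedule("haproxy", "monitor", 10000, call_id=1)
ops.schedule("haproxy", "monitor", 10000, call_id=2)  # duplicate, merged
ops.schedule("ping", "monitor", 10000, call_id=3)
ops.stop("haproxy")
print(ops.pending("haproxy"))  # no haproxy monitor is left to fail after the stop
```

In this model a stopped resource has no recurring monitor left, so a later
monitor run cannot report "not running" for it.
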
  So I'm assuming (and testing right now) that this will fix the issue.
  I'm opening this public bug for the fix I'll provide after the tests,
  and to ask others to test the fix as well.
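
For anyone helping to test, a small script along these lines could flag the
symptom automatically. It is a sketch that assumes status text in the same
layout as the report quoted above (presumably crm_mon output, which is an
assumption; the report does not name the tool). The function names are
hypothetical, and the sample text is trimmed from the report.

```python
import re


def standby_nodes(status):
    """Return names from lines such as 'Node A1LB101 (167969461): standby'."""
    return set(re.findall(r"^\s*Node (\S+) \(\d+\): standby\s*$", status, re.M))


def failed_monitors(status):
    """Return (operation, node) pairs listed under 'Failed actions:'."""
    _, found, tail = status.partition("Failed actions:")
    if not found:
        return []
    return re.findall(r"^\s*(\S+_monitor_\d+) \(node=([^,\s]+),", tail, re.M)


def standby_monitor_failures(status):
    """Return failed monitor operations that ran on a node in standby."""
    standby = standby_nodes(status)
    return [(op, node) for op, node in failed_monitors(status) if node in standby]


# Trimmed from the status output quoted in the original bug description.
SAMPLE = """\
Node A1LB101 (167969461): standby
Online: [ A1LB102 ]

Failed actions:
haproxy_monitor_10000 (node=A1LB101, call=2332, rc=7, status=complete, last-rc-change=Mon Jul 7 20:28:58 2014
, queued=0ms, exec=0ms
): not running
"""

print(standby_monitor_failures(SAMPLE))
```

An empty list after putting a node into standby would suggest the fix
works; a non-empty list reproduces the behaviour reported here.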

-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to pacemaker in Ubuntu.
https://bugs.launchpad.net/bugs/1353473

Title:
  Trusty Pacemaker "crm node standby" stops resource successfully, but
  lrmd still monitors it and causes "Failed actions"

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1353473/+subscriptions

