Following up on my last comment, I have created a PPA so that users can test
the fix for this specific case and give us feedback:

https://launchpad.net/~inaddy/+archive/ubuntu/lp1368737

Instructions on how to use:

# add-apt-repository ppa:inaddy/lp1368737 
# apt-get update 
# apt-get install pacemaker 

This PPA contains both fixes:

Pacemaker: haproxy monitor NG 
Pacemaker: Pacemaker's lrmd process crashed. 

With the following changelog:

pacemaker (1.1.10+git20130802-1ubuntu4~lp1368737~1) trusty; urgency=medium

* Fix: services: Prevent use-of-NULL when executing service actions - 1/2 (LP: #1368737)
* Fix: services: Fix the executing of synchronous actions - 2/2 (LP: #1368737)

-- Rafael David Tinoco <[email protected]> Fri, 12 Sep 2014 15:52:14 -0300

pacemaker (1.1.10+git20130802-1ubuntu3) trusty; urgency=medium

* Fix: services: Do not allow duplicate recurring op entries - 1/3 (LP: #1353473)
* High: lrmd: Merge duplicate recurring monitor operations - 2/3 (LP: #1353473)
* Fix: lrmd: Cancel recurring operations before stop action is executed - 3/3 (LP: #1353473)

-- Rafael David Tinoco <[email protected]> Wed, 06 Aug 2014 09:24:13 -0300

Both fixes are included because the Stable Release Update for pacemaker on
Trusty that I was waiting for had not yet been released when this fix was
prepared.

If this fix solves the issue, I'll submit both SRUs (for the two cases above)
to our sponsor team so they can upload them for Trusty.

I am waiting on community feedback before requesting the Stable Release Update.

Thank you in advance.

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to pacemaker in Ubuntu.
https://bugs.launchpad.net/bugs/1368737

Title:
  Pacemaker can seg fault on crm node online/standby

Status in “pacemaker” package in Ubuntu:
  New

Bug description:
  The following situation was brought to my attention:

  """
  [Issue] 

  lrmd process crashed when repeating "crm node standby" and "crm node
  online"

  ---------------- 
  # grep pacemakerd ha-log.k1pm101 | grep core 
  Aug 27 17:47:06 k1pm101 pacemakerd[49271]: error: child_waitpid: Managed process 49275 (lrmd) dumped core
  Aug 27 17:47:06 k1pm101 pacemakerd[49271]: notice: pcmk_child_exit: Child process lrmd terminated with signal 11 (pid=49275, core=1)
  Aug 27 18:27:14 k1pm101 pacemakerd[49271]: error: child_waitpid: Managed process 1471 (lrmd) dumped core
  Aug 27 18:27:14 k1pm101 pacemakerd[49271]: notice: pcmk_child_exit: Child process lrmd terminated with signal 11 (pid=1471, core=1)
  Aug 27 18:56:41 k1pm101 pacemakerd[49271]: error: child_waitpid: Managed process 35771 (lrmd) dumped core
  Aug 27 18:56:41 k1pm101 pacemakerd[49271]: notice: pcmk_child_exit: Child process lrmd terminated with signal 11 (pid=35771, core=1)
  Aug 27 19:44:09 k1pm101 pacemakerd[49271]: error: child_waitpid: Managed process 60709 (lrmd) dumped core
  Aug 27 19:44:09 k1pm101 pacemakerd[49271]: notice: pcmk_child_exit: Child process lrmd terminated with signal 11 (pid=60709, core=1)
  Aug 27 20:00:53 k1pm101 pacemakerd[49271]: error: child_waitpid: Managed process 35838 (lrmd) dumped core
  Aug 27 20:00:53 k1pm101 pacemakerd[49271]: notice: pcmk_child_exit: Child process lrmd terminated with signal 11 (pid=35838, core=1)
  Aug 27 21:33:52 k1pm101 pacemakerd[49271]: error: child_waitpid: Managed process 49249 (lrmd) dumped core
  Aug 27 21:33:52 k1pm101 pacemakerd[49271]: notice: pcmk_child_exit: Child process lrmd terminated with signal 11 (pid=49249, core=1)
  Aug 27 22:01:16 k1pm101 pacemakerd[49271]: error: child_waitpid: Managed process 65358 (lrmd) dumped core
  Aug 27 22:01:16 k1pm101 pacemakerd[49271]: notice: pcmk_child_exit: Child process lrmd terminated with signal 11 (pid=65358, core=1)
  Aug 27 22:28:02 k1pm101 pacemakerd[49271]: error: child_waitpid: Managed process 22693 (lrmd) dumped core
  Aug 27 22:28:02 k1pm101 pacemakerd[49271]: notice: pcmk_child_exit: Child process lrmd terminated with signal 11 (pid=22693, core=1)
  ---------------- 

  ---------------- 
  # grep pacemakerd ha-log.k1pm102 | grep core 
  Aug 27 15:32:48 k1pm102 pacemakerd[5808]: error: child_waitpid: Managed process 5812 (lrmd) dumped core
  Aug 27 15:32:48 k1pm102 pacemakerd[5808]: notice: pcmk_child_exit: Child process lrmd terminated with signal 11 (pid=5812, core=1)
  Aug 27 15:52:52 k1pm102 pacemakerd[5808]: error: child_waitpid: Managed process 35781 (lrmd) dumped core
  Aug 27 15:52:52 k1pm102 pacemakerd[5808]: notice: pcmk_child_exit: Child process lrmd terminated with signal 11 (pid=35781, core=1)
  Aug 27 16:02:54 k1pm102 pacemakerd[5808]: error: child_waitpid: Managed process 51984 (lrmd) dumped core
  Aug 27 16:02:54 k1pm102 pacemakerd[5808]: notice: pcmk_child_exit: Child process lrmd terminated with signal 11 (pid=51984, core=1)
  """

  Analyzing the core file with debug symbols, I could see that this frame:

  #0  0x00007f7184a45983 in services_action_sync (op=0x7f7185b605d0) at services.c:434
  434           crm_trace(" >  stdout: %s", op->stdout_data);

  is responsible for the core dump.

  I've checked upstream code and there might be 2 important commits that
  could be cherry-picked to fix this behavior:

  commit f2a637cc553cb7aec59bdcf05c5e1d077173419f
  Author: Andrew Beekhof <[email protected]>
  Date:   Fri Sep 20 12:20:36 2013 +1000

      Fix: services: Prevent use-of-NULL when executing service actions
        
  commit 11473a5a8c88eb17d5e8d6cd1d99dc497e817aac
  Author: Gao,Yan <[email protected]>
  Date:   Sun Sep 29 12:40:18 2013 +0800

      Fix: services: Fix the executing of synchronous actions

  The core dump can be caused by the absence of a check such as this:

  if (op == NULL) {
      crm_trace("No operation to execute");
      return FALSE;
  }

  at the beginning of the "services_action_sync(svc_action_t * op)"
  function.

  This is further improved by commit 11473a5.
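
The NULL guard described above can be sketched in isolation. This is only a minimal illustration under stated assumptions, not the upstream patch: `struct sketch_action` and `sketch_action_sync` are hypothetical stand-ins for `svc_action_t` and `services_action_sync`, and `fprintf` stands in for `crm_trace`. The extra check on `stdout_data` is also an assumption of this sketch, not something taken from the commits.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for Pacemaker's svc_action_t; only the field
 * that appears in the backtrace is modelled here. */
struct sketch_action {
    const char *stdout_data;
};

/* Hypothetical stand-in for services_action_sync(): reject a NULL
 * operation before any of its fields is read. */
static bool sketch_action_sync(const struct sketch_action *op)
{
    if (op == NULL) {
        fprintf(stderr, "No operation to execute\n");
        return false;
    }

    /* Dereferencing op is safe only because of the check above.
     * stdout_data may still be NULL, so print a placeholder rather
     * than passing NULL to %s (an assumption of this sketch). */
    fprintf(stderr, " >  stdout: %s\n",
            op->stdout_data != NULL ? op->stdout_data : "(none)");
    return true;
}
```

Calling `sketch_action_sync(NULL)` returns `false` instead of dereferencing the pointer, while a valid action is traced and returns `true`.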

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1368737/+subscriptions

_______________________________________________
Mailing list: https://launchpad.net/~ubuntu-ha
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp