The verification of the Stable Release Update for pacemaker has
completed successfully and the package has now been released to
-updates.  Subsequently, the Ubuntu Stable Release Updates Team is being
unsubscribed and will not receive messages about this bug report.  In
the event that you encounter a regression using the package from
-updates please report a new bug using ubuntu-bug and tag the bug report
regression-update so we can easily find any regressions.

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to pacemaker in Ubuntu.
https://bugs.launchpad.net/bugs/1312156

Title:
  [Precise] Potential for data corruption

Status in “pacemaker” package in Ubuntu:
  Fix Released
Status in “pacemaker” source package in Precise:
  Fix Released

Bug description:
  [Impact]

   * The pacemaker designated controller (DC) can make wrong decisions
  based on uncleared node status in a rare, specific situation. This
  can cause the same resource to start on two nodes at the same time,
  resulting in data corruption.

  [Test Case]

   * The bug trigger is very hard to achieve:

  1) If stonith successfully fenced a node (any node was fenced).
  2) If the target and origin are the same (the node killed itself).
  3) If we do not have a DC, or the fenced node was our DC (our DC killed itself).
  4) If the executioner is not this node (requires at least 3 nodes).
  5) If this node is elected the new DC at any time in the future.
  6) If the policy engine was not yet scheduled.
  7) If the takeover runs before the policy engine.

   * The bug could not be reproduced so far: the patch was made based
  on a community report
  (https://www.mail-archive.com/[email protected]/msg19509.html)
  analyzed by the upstream developer (Andrew Beekhof).

  [Regression Potential]

   * In the logic before upstream commit 82aa2d8d17, the node
  responsible for fencing the DC (the executioner) was also responsible
  for updating the CIB. If this update failed (because the executioner
  itself failed, for example), the DC would be fenced a second time,
  since the cluster would not know the fencing result. Commit
  82aa2d8d17 introduced logic to avoid this second DC fencing; that
  logic is itself buggy.

   * To minimize any kind of regression, instead of moving forward to a
  newer pacemaker version, it was decided to go backwards and remove
  only this piece of code.

   * For an SRU it is much more acceptable to restore the old behavior,
  known to be safe even if it implies fencing the DC twice, than to
  backport several pieces of code to implement logic that was not
  present in the stable release.

  [Other Info / Original Description]

  Under certain conditions there is faulty logic in the function
  tengine_stonith_notify() which can incorrectly add successfully
  fenced nodes to a list, causing pacemaker to subsequently erase that
  node's status section when the next DC (Designated Controller)
  election occurs. With the status section erased, the cluster
  considers that node down and starts the corresponding services on
  other nodes. Multiple active instances of the same service can cause
  data corruption.

  Conditions:

  1. The fenced node must have been the previous DC and sufficiently
     functional to request its own fencing.
  2. The fencing notification must arrive after the new DC has been
     elected, but before it invokes the policy engine.

  Pacemaker versions affected:

  1.1.6 - 1.1.9

  Stable Ubuntu releases affected:

  Ubuntu 12.04 LTS
  Ubuntu 12.10 (EOL?)

  Fix:

  https://github.com/ClusterLabs/pacemaker/commit/f30e1e43

  References:

  https://www.mail-archive.com/[email protected]/msg19509.html
  
  http://blog.clusterlabs.org/blog/2014/potential-for-data-corruption-in-pacemaker-1-dot-1-6-through-1-dot-1-9/

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1312156/+subscriptions

_______________________________________________
Mailing list: https://launchpad.net/~ubuntu-ha
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp