[Ubuntu-ha] [Bug 1890491] Re: A pacemaker node fails monitor (probe) and stop/start operations on a resource because it returns "rc=189"

2020-08-18 Thread Jorge Niedbalski
** Changed in: pacemaker (Ubuntu Bionic)
   Status: New => In Progress

** Changed in: pacemaker (Ubuntu Bionic)
 Assignee: (unassigned) => Jorge Niedbalski (niedbalski)

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to pacemaker in Ubuntu.
https://bugs.launchpad.net/bugs/1890491

Title:
  A pacemaker node fails monitor (probe) and stop/start operations on a
  resource because it returns "rc=189"

Status in pacemaker package in Ubuntu:
  Fix Released
Status in pacemaker source package in Bionic:
  In Progress
Status in pacemaker source package in Focal:
  Fix Released
Status in pacemaker source package in Groovy:
  Fix Released

Bug description:
  Cause: Pacemaker implicitly ordered all stops needed on a Pacemaker
  Remote node before the stop of the node's Pacemaker Remote connection,
  including stops that were implied by fencing of the node. Also,
  Pacemaker scheduled actions on Pacemaker Remote nodes with a failed
  connection so that the actions could be done once the connection is
  recovered, even if the connection wasn't being recovered (for example,
  if the node was shutting down when the failure occurred).

  Consequence: If a Pacemaker Remote node needed to be fenced while it
  was in the process of shutting down, once the fencing completed
  pacemaker scheduled probes on the node. The probes failed because the
  connection was not actually active. Due to the failed probe, a stop was
  scheduled, which also failed, leading to fencing of the node again, and
  the situation repeated itself indefinitely.

  Fix: Pacemaker Remote connection stops are no longer ordered after
  implied stops, and actions are not scheduled on Pacemaker Remote nodes
  when the connection is failed and not being started again.

  Result: A Pacemaker Remote node that needs to be fenced while it is in
  the process of shutting down is fenced once, without repeating
  indefinitely.

  Upstream, this appears to be fixed in pacemaker-1.1.21-1.el7.

  Related to https://bugzilla.redhat.com/show_bug.cgi?id=1704870

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1890491/+subscriptions

___
Mailing list: https://launchpad.net/~ubuntu-ha
Post to : ubuntu-ha@lists.launchpad.net
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp


[Ubuntu-ha] [Bug 1890491] Re: A pacemaker node fails monitor (probe) and stop/start operations on a resource because it returns "rc=189"

2020-08-14 Thread Jorge Niedbalski
Hello,

I am testing a couple of patches (both imported from master), through
this PPA: https://launchpad.net/~niedbalski/+archive/ubuntu/fix-1890491

c20f8920 - don't order implied stops relative to a remote connection
938e99f2 - remote state is failed if node is shutting down with connection failure
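
For anyone who wants to help testing, a minimal sketch of pulling the packages from that PPA on an affected Bionic node (standard add-apt-repository workflow; the exact binary package names to upgrade are an assumption on my side):

$ sudo add-apt-repository ppa:niedbalski/fix-1890491
$ sudo apt-get update
$ sudo apt-get install --only-upgrade pacemaker pacemaker-remote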

I'll report back here if these patches fix the behavior described in my
previous comment.

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to pacemaker in Ubuntu.
https://bugs.launchpad.net/bugs/1890491

Title:
  A pacemaker node fails monitor (probe) and stop/start operations on a
  resource because it returns "rc=189"

Status in pacemaker package in Ubuntu:
  Fix Released
Status in pacemaker source package in Bionic:
  New
Status in pacemaker source package in Focal:
  Fix Released
Status in pacemaker source package in Groovy:
  Fix Released

Bug description:
  Cause: Pacemaker implicitly ordered all stops needed on a Pacemaker
  Remote node before the stop of the node's Pacemaker Remote connection,
  including stops that were implied by fencing of the node. Also,
  Pacemaker scheduled actions on Pacemaker Remote nodes with a failed
  connection so that the actions could be done once the connection is
  recovered, even if the connection wasn't being recovered (for example,
  if the node was shutting down when the failure occurred).

  Consequence: If a Pacemaker Remote node needed to be fenced while it
  was in the process of shutting down, once the fencing completed
  pacemaker scheduled probes on the node. The probes failed because the
  connection was not actually active. Due to the failed probe, a stop was
  scheduled, which also failed, leading to fencing of the node again, and
  the situation repeated itself indefinitely.

  Fix: Pacemaker Remote connection stops are no longer ordered after
  implied stops, and actions are not scheduled on Pacemaker Remote nodes
  when the connection is failed and not being started again.

  Result: A Pacemaker Remote node that needs to be fenced while it is in
  the process of shutting down is fenced once, without repeating
  indefinitely.

  Upstream, this appears to be fixed in pacemaker-1.1.21-1.el7.

  Related to https://bugzilla.redhat.com/show_bug.cgi?id=1704870

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1890491/+subscriptions

___
Mailing list: https://launchpad.net/~ubuntu-ha
Post to : ubuntu-ha@lists.launchpad.net
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp


[Ubuntu-ha] [Bug 1890491] Re: A pacemaker node fails monitor (probe) and stop/start operations on a resource because it returns "rc=189"

2020-08-13 Thread Jorge Niedbalski
I am able to reproduce a similar issue with the following bundle:
https://paste.ubuntu.com/p/VJ3m7nMN79/

Resource created with:
sudo pcs resource create test2 ocf:pacemaker:Dummy op_sleep=10 op monitor interval=30s timeout=30s op start timeout=30s op stop timeout=30s

juju ssh nova-cloud-controller/2 "sudo pcs constraint location test2 prefers juju-acda3d-pacemaker-remote-10.cloud.sts"
juju ssh nova-cloud-controller/2 "sudo pcs constraint location test2 prefers juju-acda3d-pacemaker-remote-11.cloud.sts"
juju ssh nova-cloud-controller/2 "sudo pcs constraint location test2 prefers juju-acda3d-pacemaker-remote-12.cloud.sts"
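
To double-check that the constraints landed (not part of the original transcript, just the usual pcs queries):

juju ssh nova-cloud-controller/2 "sudo pcs constraint"
juju ssh nova-cloud-controller/2 "sudo pcs status resources"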


Online: [ juju-acda3d-pacemaker-remote-7 juju-acda3d-pacemaker-remote-8 juju-acda3d-pacemaker-remote-9 ]
RemoteOnline: [ juju-acda3d-pacemaker-remote-10.cloud.sts juju-acda3d-pacemaker-remote-11.cloud.sts juju-acda3d-pacemaker-remote-12.cloud.sts ]

Full list of resources:

Resource Group: grp_nova_vips
res_nova_bf9661e_vip (ocf::heartbeat:IPaddr2): Started juju-acda3d-pacemaker-remote-7
Clone Set: cl_nova_haproxy [res_nova_haproxy]
Started: [ juju-acda3d-pacemaker-remote-7 juju-acda3d-pacemaker-remote-8 juju-acda3d-pacemaker-remote-9 ]
juju-acda3d-pacemaker-remote-10.cloud.sts (ocf::pacemaker:remote): Started juju-acda3d-pacemaker-remote-8
juju-acda3d-pacemaker-remote-12.cloud.sts (ocf::pacemaker:remote): Started juju-acda3d-pacemaker-remote-8
juju-acda3d-pacemaker-remote-11.cloud.sts (ocf::pacemaker:remote): Started juju-acda3d-pacemaker-remote-7

test2 (ocf::pacemaker:Dummy): Started juju-acda3d-pacemaker-remote-10.cloud.sts

## After running the following commands on juju-acda3d-pacemaker-remote-10.cloud.sts

1) sudo systemctl stop pacemaker_remote
2) forcefully shut the instance down (openstack server stop ) less than 10 seconds after the pacemaker_remote stop is issued (see the sketch just below).
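
Roughly, the timing is the following (a sketch only; the instance name is a hypothetical placeholder, not taken from the environment above):

# 1) on juju-acda3d-pacemaker-remote-10.cloud.sts: begin a graceful stop
sudo systemctl stop pacemaker_remote
# 2) within ~10 seconds, from a host with OpenStack credentials,
#    force the guest off ($REMOTE_INSTANCE is a placeholder)
openstack server stop "$REMOTE_INSTANCE"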

The remote is now shut down:

RemoteOFFLINE: [ juju-acda3d-pacemaker-remote-10.cloud.sts ]

The resource status remains Stopped across the 3 machines and doesn't
recover.

$ juju run --application nova-cloud-controller "sudo pcs resource show | grep -i test2"
- Stdout: " test2\t(ocf::pacemaker:Dummy):\tStopped\n"
UnitId: nova-cloud-controller/0
- Stdout: " test2\t(ocf::pacemaker:Dummy):\tStopped\n"
UnitId: nova-cloud-controller/1
- Stdout: " test2\t(ocf::pacemaker:Dummy):\tStopped\n"
UnitId: nova-cloud-controller/2

However, if I do a clean shutdown (without interrupting the pacemaker_remote
fence), the resource ends up migrated correctly to another node.

6 nodes configured
9 resources configured

Online: [ juju-acda3d-pacemaker-remote-7 juju-acda3d-pacemaker-remote-8 juju-acda3d-pacemaker-remote-9 ]
RemoteOnline: [ juju-acda3d-pacemaker-remote-11.cloud.sts juju-acda3d-pacemaker-remote-12.cloud.sts ]
RemoteOFFLINE: [ juju-acda3d-pacemaker-remote-10.cloud.sts ]

Full list of resources:

[...]
test2 (ocf::pacemaker:Dummy): Started juju-acda3d-pacemaker-remote-12.cloud.sts

I will keep investigating this behavior to determine whether it is linked
to the reported bug.
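
While retesting, something like the following is handy for watching fail counts and resource state from one of the cluster nodes (plain crm_mon flags, nothing specific to this bug):

$ sudo crm_mon -rf -1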

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to pacemaker in Ubuntu.
https://bugs.launchpad.net/bugs/1890491

Title:
  A pacemaker node fails monitor (probe) and stop/start operations on a
  resource because it returns "rc=189"

Status in pacemaker package in Ubuntu:
  Fix Released
Status in pacemaker source package in Bionic:
  New
Status in pacemaker source package in Focal:
  Fix Released
Status in pacemaker source package in Groovy:
  Fix Released

Bug description:
  Cause: Pacemaker implicitly ordered all stops needed on a Pacemaker
  Remote node before the stop of the node's Pacemaker Remote connection,
  including stops that were implied by fencing of the node. Also,
  Pacemaker scheduled actions on Pacemaker Remote nodes with a failed
  connection so that the actions could be done once the connection is
  recovered, even if the connection wasn't being recovered (for example,
  if the node was shutting down when the failure occurred).

  Consequence: If a Pacemaker Remote node needed to be fenced while it
  was in the process of shutting down, once the fencing completed
  pacemaker scheduled probes on the node. The probes failed because the
  connection was not actually active. Due to the failed probe, a stop was
  scheduled, which also failed, leading to fencing of the node again, and
  the situation repeated itself indefinitely.

  Fix: Pacemaker Remote connection stops are no longer ordered after
  implied stops, and actions are not scheduled on Pacemaker Remote nodes
  when the connection is failed and not being started again.

  Result: A Pacemaker Remote node that needs to be fenced while it is in
  the process of shutting down is fenced once, without repeating
  indefinitely.

  Upstream, this appears to be fixed in pacemaker-1.1.21-1.el7.

  Related to https://bugzilla.redhat.com/show_bug.cgi?id=1704870

To manage notifications about this bug go to:

[Ubuntu-ha] [Bug 1890491] Re: A pacemaker node fails monitor (probe) and stop/start operations on a resource because it returns "rc=189"

2020-08-05 Thread Jorge Niedbalski
** Also affects: pacemaker (Ubuntu Groovy)
   Importance: Undecided
   Status: New

** Also affects: pacemaker (Ubuntu Bionic)
   Importance: Undecided
   Status: New

** Also affects: pacemaker (Ubuntu Focal)
   Importance: Undecided
   Status: New

** Changed in: pacemaker (Ubuntu Groovy)
   Status: New => Fix Released

** Changed in: pacemaker (Ubuntu Focal)
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to pacemaker in Ubuntu.
https://bugs.launchpad.net/bugs/1890491

Title:
  A pacemaker node fails monitor (probe) and stop/start operations on a
  resource because it returns "rc=189"

Status in pacemaker package in Ubuntu:
  Fix Released
Status in pacemaker source package in Bionic:
  New
Status in pacemaker source package in Focal:
  Fix Released
Status in pacemaker source package in Groovy:
  Fix Released

Bug description:
  Cause: Pacemaker implicitly ordered all stops needed on a Pacemaker
  Remote node before the stop of the node's Pacemaker Remote connection,
  including stops that were implied by fencing of the node. Also,
  Pacemaker scheduled actions on Pacemaker Remote nodes with a failed
  connection so that the actions could be done once the connection is
  recovered, even if the connection wasn't being recovered (for example,
  if the node was shutting down when the failure occurred).

  Consequence: If a Pacemaker Remote node needed to be fenced while it
  was in the process of shutting down, once the fencing completed
  pacemaker scheduled probes on the node. The probes failed because the
  connection was not actually active. Due to the failed probe, a stop was
  scheduled, which also failed, leading to fencing of the node again, and
  the situation repeated itself indefinitely.

  Fix: Pacemaker Remote connection stops are no longer ordered after
  implied stops, and actions are not scheduled on Pacemaker Remote nodes
  when the connection is failed and not being started again.

  Result: A Pacemaker Remote node that needs to be fenced while it is in
  the process of shutting down is fenced once, without repeating
  indefinitely.

  Upstream, this appears to be fixed in pacemaker-1.1.21-1.el7.

  Related to https://bugzilla.redhat.com/show_bug.cgi?id=1704870

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1890491/+subscriptions

___
Mailing list: https://launchpad.net/~ubuntu-ha
Post to : ubuntu-ha@lists.launchpad.net
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp


[Ubuntu-ha] [Bug 1644152] Re: Pacemaker hang during upgrade to 9.2

2017-04-28 Thread Jorge Niedbalski
** Also affects: pacemaker (Ubuntu)
   Importance: Undecided
   Status: New

** No longer affects: pacemaker (Ubuntu)

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to pacemaker in Ubuntu.
https://bugs.launchpad.net/bugs/1644152

Title:
  Pacemaker hang during upgrade to 9.2

Status in Fuel for OpenStack:
  Fix Released

Bug description:
  During an upgrade from pacemaker version 1.1.14-2~u14.04+mos1 to version
  1.1.14-2~u14.04+mos2, the lrmd process hangs and does not allow pacemaker to
  recover from a corosync outage.

  Long way to reproduce:
  1. Install 9.1 with one controller node in HA mode.
  2. Try to upgrade to 9.2.

  Expected result:
  Upgrade finishes without problems.

  Result:
  The upgrade fails on some random component outage.

  There are errors in the pacemaker log:
  error: mainloop_add_ipc_server: Could not start pengine IPC server: Address already in use (-98)
  error: main:Failed to create IPC server: shutting down and inhibiting respawn

  The pacemaker process restarts every 2-3 minutes.

  For an example, see https://bugs.launchpad.net/fuel/+bug/1641947

  Fast way to reproduce:
  1. Install 9.0 or 9.1 with one controller node in HA mode.
  2. Log in to the controller over ssh.
  3. service corosync stop
  4. Update the packages pacemaker-cli-utils, pacemaker-common, pacemaker-resource-agents and pacemaker to 1.1.14-2~u14.04+mos2.
  5. service corosync start
  6. Wait 60 seconds for pacemaker to respawn.
  7. service pacemaker restart
  (steps 3-7 are sketched as a shell snippet below)
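
  Steps 3-7 as a shell sketch (assuming the repository that provides
  1.1.14-2~u14.04+mos2 is already configured on the controller):

  service corosync stop
  apt-get install pacemaker=1.1.14-2~u14.04+mos2 pacemaker-cli-utils=1.1.14-2~u14.04+mos2 pacemaker-common=1.1.14-2~u14.04+mos2 pacemaker-resource-agents=1.1.14-2~u14.04+mos2
  service corosync start
  sleep 60   # give pacemaker time to respawn
  service pacemaker restart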

  Expected result:
  Pacemaker recovers from the corosync outage.

  Result:
  Pacemaker fails to communicate with the zombie lrmd and constantly restarts.

To manage notifications about this bug go to:
https://bugs.launchpad.net/fuel/+bug/1644152/+subscriptions

___
Mailing list: https://launchpad.net/~ubuntu-ha
Post to : ubuntu-ha@lists.launchpad.net
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp


[Ubuntu-ha] [Bug 1677684] Re: /usr/bin/corosync-blackbox: 34: /usr/bin/corosync-blackbox: qb-blackbox: not found

2017-04-25 Thread Jorge Niedbalski
** Patch added: "lp1677684-trusty.debdiff"
   
https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1677684/+attachment/4867853/+files/lp1677684-trusty.debdiff

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to corosync in Ubuntu.
https://bugs.launchpad.net/bugs/1677684

Title:
  /usr/bin/corosync-blackbox: 34: /usr/bin/corosync-blackbox: qb-
  blackbox: not found

Status in corosync package in Ubuntu:
  In Progress
Status in corosync source package in Trusty:
  In Progress
Status in corosync source package in Xenial:
  In Progress
Status in corosync source package in Yakkety:
  In Progress

Bug description:
  [Environment]

  Ubuntu Xenial 16.04
  Amd64

  [Test Case]

  1) sudo apt-get install corosync
  2) sudo corosync-blackbox.

  root@juju-niedbalski-xenial-machine-5:/home/ubuntu# dpkg -L corosync |grep 
black
  /usr/bin/corosync-blackbox

  Expected results: corosync-blackbox runs OK.

  Current results:

  $ sudo corosync-blackbox
  /usr/bin/corosync-blackbox: 34: /usr/bin/corosync-blackbox: qb-blackbox: not 
found

  [Impact]

   * Cannot run corosync-blackbox

  [Regression Potential]

  * None identified.

  [Fix]
  Make the package depend on libqb-dev

  root@juju-niedbalski-xenial-machine-5:/home/ubuntu# dpkg -L libqb-dev | grep 
qb-bl
  /usr/sbin/qb-blackbox

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1677684/+subscriptions

___
Mailing list: https://launchpad.net/~ubuntu-ha
Post to : ubuntu-ha@lists.launchpad.net
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp


[Ubuntu-ha] [Bug 1677684] Re: /usr/bin/corosync-blackbox: 34: /usr/bin/corosync-blackbox: qb-blackbox: not found

2017-04-25 Thread Jorge Niedbalski
** Patch removed: "lp1677684-zesty.debdiff"
   
https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1677684/+attachment/4850712/+files/lp1677684-zesty.debdiff

** Patch removed: "lp1677684-xenial.debdiff"
   
https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1677684/+attachment/4850713/+files/lp1677684-xenial.debdiff

** Patch removed: "lp1677684-yakkety.debdiff"
   
https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1677684/+attachment/4851293/+files/lp1677684-yakkety.debdiff

** Patch removed: "lp1677684-trusty.debdiff"
   
https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1677684/+attachment/4851294/+files/lp1677684-trusty.debdiff

** Patch added: "lp1677684-zesty.debdiff"
   
https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1677684/+attachment/4867850/+files/lp1677684-zesty.debdiff

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to corosync in Ubuntu.
https://bugs.launchpad.net/bugs/1677684

Title:
  /usr/bin/corosync-blackbox: 34: /usr/bin/corosync-blackbox: qb-
  blackbox: not found

Status in corosync package in Ubuntu:
  In Progress
Status in corosync source package in Trusty:
  In Progress
Status in corosync source package in Xenial:
  In Progress
Status in corosync source package in Yakkety:
  In Progress

Bug description:
  [Environment]

  Ubuntu Xenial 16.04
  Amd64

  [Test Case]

  1) sudo apt-get install corosync
  2) sudo corosync-blackbox.

  root@juju-niedbalski-xenial-machine-5:/home/ubuntu# dpkg -L corosync |grep 
black
  /usr/bin/corosync-blackbox

  Expected results: corosync-blackbox runs OK.

  Current results:

  $ sudo corosync-blackbox
  /usr/bin/corosync-blackbox: 34: /usr/bin/corosync-blackbox: qb-blackbox: not 
found

  [Impact]

   * Cannot run corosync-blackbox

  [Regression Potential]

  * None identified.

  [Fix]
  Make the package depend on libqb-dev

  root@juju-niedbalski-xenial-machine-5:/home/ubuntu# dpkg -L libqb-dev | grep 
qb-bl
  /usr/sbin/qb-blackbox

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1677684/+subscriptions

___
Mailing list: https://launchpad.net/~ubuntu-ha
Post to : ubuntu-ha@lists.launchpad.net
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp


[Ubuntu-ha] [Bug 1677684] Re: /usr/bin/corosync-blackbox: 34: /usr/bin/corosync-blackbox: qb-blackbox: not found

2017-04-05 Thread Jorge Niedbalski
Hello Christian,

Thanks for looking into this. I just followed what the build dependency
suggested (>= 0.12); there is no strict dependency on it.

Do you want me to just leave it as libqb-dev, or is this something you can
fix when merging?

Let me know how to proceed.

Thanks.
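
For reference while deciding, the libqb-dev versions available per series can be listed with the usual archive tools (just a convenience; the debdiffs do not rely on this):

$ rmadison libqb-dev
$ apt-cache policy libqb-dev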

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to corosync in Ubuntu.
https://bugs.launchpad.net/bugs/1677684

Title:
  /usr/bin/corosync-blackbox: 34: /usr/bin/corosync-blackbox: qb-
  blackbox: not found

Status in corosync package in Ubuntu:
  In Progress
Status in corosync source package in Trusty:
  In Progress
Status in corosync source package in Xenial:
  In Progress
Status in corosync source package in Yakkety:
  In Progress

Bug description:
  [Environment]

  Ubuntu Xenial 16.04
  Amd64

  [Test Case]

  1) sudo apt-get install corosync
  2) sudo corosync-blackbox.

  root@juju-niedbalski-xenial-machine-5:/home/ubuntu# dpkg -L corosync |grep 
black
  /usr/bin/corosync-blackbox

  Expected results: corosync-blackbox runs OK.

  Current results:

  $ sudo corosync-blackbox
  /usr/bin/corosync-blackbox: 34: /usr/bin/corosync-blackbox: qb-blackbox: not 
found

  [Impact]

   * Cannot run corosync-blackbox

  [Regression Potential]

  * None identified.

  [Fix]
  Make the package depend on libqb-dev

  root@juju-niedbalski-xenial-machine-5:/home/ubuntu# dpkg -L libqb-dev | grep 
qb-bl
  /usr/sbin/qb-blackbox

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1677684/+subscriptions

___
Mailing list: https://launchpad.net/~ubuntu-ha
Post to : ubuntu-ha@lists.launchpad.net
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp


[Ubuntu-ha] [Bug 1677684] Re: /usr/bin/corosync-blackbox: 34: /usr/bin/corosync-blackbox: qb-blackbox: not found

2017-03-31 Thread Jorge Niedbalski
Hello Christian,

I've attached the updated debdiff for trusty (checking for libqb-dev >=
0.12) as well as the requested Yakkety debdiff.

Thanks for looking into this.

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to corosync in Ubuntu.
https://bugs.launchpad.net/bugs/1677684

Title:
  /usr/bin/corosync-blackbox: 34: /usr/bin/corosync-blackbox: qb-
  blackbox: not found

Status in corosync package in Ubuntu:
  In Progress
Status in corosync source package in Trusty:
  In Progress
Status in corosync source package in Xenial:
  In Progress
Status in corosync source package in Yakkety:
  In Progress

Bug description:
  [Environment]

  Ubuntu Xenial 16.04
  Amd64

  [Test Case]

  1) sudo apt-get install corosync
  2) sudo corosync-blackbox.

  root@juju-niedbalski-xenial-machine-5:/home/ubuntu# dpkg -L corosync |grep 
black
  /usr/bin/corosync-blackbox

  Expected results: corosync-blackbox runs OK.

  Current results:

  $ sudo corosync-blackbox
  /usr/bin/corosync-blackbox: 34: /usr/bin/corosync-blackbox: qb-blackbox: not 
found

  [Impact]

   * Cannot run corosync-blackbox

  [Regression Potential]

  * None identified.

  [Fix]
  Make the package depend on libqb-dev

  root@juju-niedbalski-xenial-machine-5:/home/ubuntu# dpkg -L libqb-dev | grep 
qb-bl
  /usr/sbin/qb-blackbox

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1677684/+subscriptions

___
Mailing list: https://launchpad.net/~ubuntu-ha
Post to : ubuntu-ha@lists.launchpad.net
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp


[Ubuntu-ha] [Bug 1677684] Re: /usr/bin/corosync-blackbox: 34: /usr/bin/corosync-blackbox: qb-blackbox: not found

2017-03-31 Thread Jorge Niedbalski
** Patch removed: "lp1677684-trusty.debdiff"
   
https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1677684/+attachment/4850762/+files/lp1677684-trusty.debdiff

** Patch added: "lp1677684-yakkety.debdiff"
   
https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1677684/+attachment/4851293/+files/lp1677684-yakkety.debdiff

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to corosync in Ubuntu.
https://bugs.launchpad.net/bugs/1677684

Title:
  /usr/bin/corosync-blackbox: 34: /usr/bin/corosync-blackbox: qb-
  blackbox: not found

Status in corosync package in Ubuntu:
  In Progress
Status in corosync source package in Trusty:
  In Progress
Status in corosync source package in Xenial:
  In Progress
Status in corosync source package in Yakkety:
  In Progress

Bug description:
  [Environment]

  Ubuntu Xenial 16.04
  Amd64

  [Test Case]

  1) sudo apt-get install corosync
  2) sudo corosync-blackbox.

  root@juju-niedbalski-xenial-machine-5:/home/ubuntu# dpkg -L corosync |grep 
black
  /usr/bin/corosync-blackbox

  Expected results: corosync-blackbox runs OK.

  Current results:

  $ sudo corosync-blackbox
  /usr/bin/corosync-blackbox: 34: /usr/bin/corosync-blackbox: qb-blackbox: not 
found

  [Impact]

   * Cannot run corosync-blackbox

  [Regression Potential]

  * None identified.

  [Fix]
  Make the package depend on libqb-dev

  root@juju-niedbalski-xenial-machine-5:/home/ubuntu# dpkg -L libqb-dev | grep 
qb-bl
  /usr/sbin/qb-blackbox

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1677684/+subscriptions

___
Mailing list: https://launchpad.net/~ubuntu-ha
Post to : ubuntu-ha@lists.launchpad.net
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp


[Ubuntu-ha] [Bug 1677684] Re: /usr/bin/corosync-blackbox: 34: /usr/bin/corosync-blackbox: qb-blackbox: not found

2017-03-31 Thread Jorge Niedbalski
** Changed in: corosync (Ubuntu Yakkety)
   Status: Confirmed => In Progress

** Changed in: corosync (Ubuntu Yakkety)
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to corosync in Ubuntu.
https://bugs.launchpad.net/bugs/1677684

Title:
  /usr/bin/corosync-blackbox: 34: /usr/bin/corosync-blackbox: qb-
  blackbox: not found

Status in corosync package in Ubuntu:
  In Progress
Status in corosync source package in Trusty:
  In Progress
Status in corosync source package in Xenial:
  In Progress
Status in corosync source package in Yakkety:
  In Progress

Bug description:
  [Environment]

  Ubuntu Xenial 16.04
  Amd64

  [Test Case]

  1) sudo apt-get install corosync
  2) sudo corosync-blackbox.

  root@juju-niedbalski-xenial-machine-5:/home/ubuntu# dpkg -L corosync |grep 
black
  /usr/bin/corosync-blackbox

  Expected results: corosync-blackbox runs OK.

  Current results:

  $ sudo corosync-blackbox
  /usr/bin/corosync-blackbox: 34: /usr/bin/corosync-blackbox: qb-blackbox: not 
found

  [Impact]

   * Cannot run corosync-blackbox

  [Regression Potential]

  * None identified.

  [Fix]
  Make the package depend on libqb-dev

  root@juju-niedbalski-xenial-machine-5:/home/ubuntu# dpkg -L libqb-dev | grep 
qb-bl
  /usr/sbin/qb-blackbox

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1677684/+subscriptions

___
Mailing list: https://launchpad.net/~ubuntu-ha
Post to : ubuntu-ha@lists.launchpad.net
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp


[Ubuntu-ha] [Bug 1677684] Re: /usr/bin/corosync-blackbox: 34: /usr/bin/corosync-blackbox: qb-blackbox: not found

2017-03-30 Thread Jorge Niedbalski
** Patch added: "lp1677684-trusty.debdiff"
   
https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1677684/+attachment/4850762/+files/lp1677684-trusty.debdiff

** Description changed:

  [Environment]
  
  Ubuntu Xenial 16.04
  Amd64
  
- [Reproduction]
+ [Test Case]
  
- - Install corosync
- - Run the corosync-blackbox executable.
+ 1) sudo apt-get install corosync
+ 2) sudo corosync-blackbox.
  
  root@juju-niedbalski-xenial-machine-5:/home/ubuntu# dpkg -L corosync |grep 
black
  /usr/bin/corosync-blackbox
  
  Expected results: corosync-blackbox runs OK.
+ 
  Current results:
  
  $ sudo corosync-blackbox
  /usr/bin/corosync-blackbox: 34: /usr/bin/corosync-blackbox: qb-blackbox: not 
found
  
- Fix:
+ [Impact]
  
+  * Cannot run corosync-blackbox
+ 
+ [Regression Potential]
+ 
+ * None identified.
+ 
+ [Fix]
  Make the package depend on libqb-dev
  
  root@juju-niedbalski-xenial-machine-5:/home/ubuntu# dpkg -L libqb-dev | grep 
qb-bl
  /usr/sbin/qb-blackbox

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to corosync in Ubuntu.
https://bugs.launchpad.net/bugs/1677684

Title:
  /usr/bin/corosync-blackbox: 34: /usr/bin/corosync-blackbox: qb-
  blackbox: not found

Status in corosync package in Ubuntu:
  In Progress
Status in corosync source package in Trusty:
  In Progress
Status in corosync source package in Xenial:
  In Progress

Bug description:
  [Environment]

  Ubuntu Xenial 16.04
  Amd64

  [Test Case]

  1) sudo apt-get install corosync
  2) sudo corosync-blackbox.

  root@juju-niedbalski-xenial-machine-5:/home/ubuntu# dpkg -L corosync |grep 
black
  /usr/bin/corosync-blackbox

  Expected results: corosync-blackbox runs OK.

  Current results:

  $ sudo corosync-blackbox
  /usr/bin/corosync-blackbox: 34: /usr/bin/corosync-blackbox: qb-blackbox: not 
found

  [Impact]

   * Cannot run corosync-blackbox

  [Regression Potential]

  * None identified.

  [Fix]
  Make the package depend on libqb-dev

  root@juju-niedbalski-xenial-machine-5:/home/ubuntu# dpkg -L libqb-dev | grep 
qb-bl
  /usr/sbin/qb-blackbox

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1677684/+subscriptions

___
Mailing list: https://launchpad.net/~ubuntu-ha
Post to : ubuntu-ha@lists.launchpad.net
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp


[Ubuntu-ha] [Bug 1677684] Re: /usr/bin/corosync-blackbox: 34: /usr/bin/corosync-blackbox: qb-blackbox: not found

2017-03-30 Thread Jorge Niedbalski
** Patch added: "lp1677684-xenial.debdiff"
   
https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1677684/+attachment/4850713/+files/lp1677684-xenial.debdiff

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to corosync in Ubuntu.
https://bugs.launchpad.net/bugs/1677684

Title:
  /usr/bin/corosync-blackbox: 34: /usr/bin/corosync-blackbox: qb-
  blackbox: not found

Status in corosync package in Ubuntu:
  In Progress
Status in corosync source package in Trusty:
  In Progress
Status in corosync source package in Xenial:
  In Progress

Bug description:
  [Environment]

  Ubuntu Xenial 16.04
  Amd64

  [Reproduction]

  - Install corosync
  - Run the corosync-blackbox executable.

  root@juju-niedbalski-xenial-machine-5:/home/ubuntu# dpkg -L corosync |grep 
black
  /usr/bin/corosync-blackbox

  Expected results: corosync-blackbox runs OK.
  Current results:

  $ sudo corosync-blackbox
  /usr/bin/corosync-blackbox: 34: /usr/bin/corosync-blackbox: qb-blackbox: not 
found

  Fix:

  Make the package depend on libqb-dev

  root@juju-niedbalski-xenial-machine-5:/home/ubuntu# dpkg -L libqb-dev | grep 
qb-bl
  /usr/sbin/qb-blackbox

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1677684/+subscriptions

___
Mailing list: https://launchpad.net/~ubuntu-ha
Post to : ubuntu-ha@lists.launchpad.net
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp


[Ubuntu-ha] [Bug 1677684] Re: /usr/bin/corosync-blackbox: 34: /usr/bin/corosync-blackbox: qb-blackbox: not found

2017-03-30 Thread Jorge Niedbalski
** Tags removed: sts
** Tags added: sts-sponsor

** Patch added: "lp1677684-zesty.debdiff"
   
https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1677684/+attachment/4850712/+files/lp1677684-zesty.debdiff

** Changed in: corosync (Ubuntu)
   Status: New => In Progress

** Changed in: corosync (Ubuntu)
   Importance: Undecided => Medium

** Changed in: corosync (Ubuntu)
 Assignee: (unassigned) => Jorge Niedbalski (niedbalski)

** Changed in: corosync (Ubuntu Trusty)
   Status: New => In Progress

** Changed in: corosync (Ubuntu Trusty)
   Importance: Undecided => Medium

** Changed in: corosync (Ubuntu Trusty)
 Assignee: (unassigned) => Jorge Niedbalski (niedbalski)

** Changed in: corosync (Ubuntu Xenial)
   Status: New => In Progress

** Changed in: corosync (Ubuntu Xenial)
   Importance: Undecided => Medium

** Changed in: corosync (Ubuntu Xenial)
     Assignee: (unassigned) => Jorge Niedbalski (niedbalski)

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to corosync in Ubuntu.
https://bugs.launchpad.net/bugs/1677684

Title:
  /usr/bin/corosync-blackbox: 34: /usr/bin/corosync-blackbox: qb-
  blackbox: not found

Status in corosync package in Ubuntu:
  In Progress
Status in corosync source package in Trusty:
  In Progress
Status in corosync source package in Xenial:
  In Progress

Bug description:
  [Environment]

  Ubuntu Xenial 16.04
  Amd64

  [Reproduction]

  - Install corosync
  - Run the corosync-blackbox executable.

  root@juju-niedbalski-xenial-machine-5:/home/ubuntu# dpkg -L corosync |grep 
black
  /usr/bin/corosync-blackbox

  Expected results: corosync-blackbox runs OK.
  Current results:

  $ sudo corosync-blackbox
  /usr/bin/corosync-blackbox: 34: /usr/bin/corosync-blackbox: qb-blackbox: not 
found

  Fix:

  Make the package depend on libqb-dev

  root@juju-niedbalski-xenial-machine-5:/home/ubuntu# dpkg -L libqb-dev | grep 
qb-bl
  /usr/sbin/qb-blackbox

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1677684/+subscriptions

___
Mailing list: https://launchpad.net/~ubuntu-ha
Post to : ubuntu-ha@lists.launchpad.net
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp


[Ubuntu-ha] [Bug 1677684] [NEW] /usr/bin/corosync-blackbox: 34: /usr/bin/corosync-blackbox: qb-blackbox: not found

2017-03-30 Thread Jorge Niedbalski
Public bug reported:

[Environment]

Ubuntu Xenial 16.04
Amd64

[Reproduction]

- Install corosync
- Run the corosync-blackbox executable.

root@juju-niedbalski-xenial-machine-5:/home/ubuntu# dpkg -L corosync |grep black
/usr/bin/corosync-blackbox

Expected results: corosync-blackbox runs OK.
Current results:

$ sudo corosync-blackbox
/usr/bin/corosync-blackbox: 34: /usr/bin/corosync-blackbox: qb-blackbox: not 
found

Fix:

Make the package depend on libqb-dev

root@juju-niedbalski-xenial-machine-5:/home/ubuntu# dpkg -L libqb-dev | grep 
qb-bl
/usr/sbin/qb-blackbox
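
Once a package carrying the extra dependency is installed, a quick sanity check could look like this (a sketch based on the description above, not output from an actual run):

$ apt-cache depends corosync | grep -i qb
$ command -v qb-blackbox && sudo corosync-blackbox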

** Affects: corosync (Ubuntu)
 Importance: Undecided
 Status: New


** Tags: sts

** Tags added: sts

** Description changed:

  [Environment]
  
  Ubuntu Xenial 16.04
  Amd64
  
  [Reproduction]
  
  - Install corosync
  - Run the corosync-blackbox executable.
  
  root@juju-niedbalski-xenial-machine-5:/home/ubuntu# dpkg -L corosync |grep 
black
  /usr/bin/corosync-blackbox
  
  Expected results: corosync-blackbox runs OK.
  Current results:
  
  $ sudo corosync-blackbox
  /usr/bin/corosync-blackbox: 34: /usr/bin/corosync-blackbox: qb-blackbox: not 
found
+ 
+ Fix:
+ 
+ Make the package depend on libqb-dev
+ 
+ root@juju-niedbalski-xenial-machine-5:/home/ubuntu# dpkg -L libqb-dev | grep 
qb-bl
+ /usr/sbin/qb-blackbox

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to corosync in Ubuntu.
https://bugs.launchpad.net/bugs/1677684

Title:
  /usr/bin/corosync-blackbox: 34: /usr/bin/corosync-blackbox: qb-
  blackbox: not found

Status in corosync package in Ubuntu:
  New

Bug description:
  [Environment]

  Ubuntu Xenial 16.04
  Amd64

  [Reproduction]

  - Install corosync
  - Run the corosync-blackbox executable.

  root@juju-niedbalski-xenial-machine-5:/home/ubuntu# dpkg -L corosync |grep 
black
  /usr/bin/corosync-blackbox

  Expected results: corosync-blackbox runs OK.
  Current results:

  $ sudo corosync-blackbox
  /usr/bin/corosync-blackbox: 34: /usr/bin/corosync-blackbox: qb-blackbox: not 
found

  Fix:

  Make the package depend on libqb-dev

  root@juju-niedbalski-xenial-machine-5:/home/ubuntu# dpkg -L libqb-dev | grep 
qb-bl
  /usr/sbin/qb-blackbox

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1677684/+subscriptions

___
Mailing list: https://launchpad.net/~ubuntu-ha
Post to : ubuntu-ha@lists.launchpad.net
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp


[Ubuntu-ha] [Bug 1563089] Re: Memory Leak when new cluster configuration is formed.

2016-04-06 Thread Jorge Niedbalski
Hello,

I ran the verification for the Trusty version.

root@juju-niedbalski-sec-machine-15:/home/ubuntu# dpkg -l|grep corosync
ii  corosync             2.3.3-1ubuntu3    amd64    Standards-based cluster framework (daemon and modules)
ii  libcorosync-common4  2.3.3-1ubuntu3    amd64    Standards-based cluster framework, common library

I configured a 3-node nova-cloud-controller environment related to
hacluster.

ubuntu@niedbalski-sec-bastion:~/openstack-charm-testing/bundles/dev$ juju run --service nova-cloud-controller "sudo corosync-quorumtool -s|grep votes"
- MachineId: "15"
  Stdout: |
Expected votes:   3
Total votes:  3
  UnitId: nova-cloud-controller/0
- MachineId: "28"
  Stdout: |
Expected votes:   3
Total votes:  3
  UnitId: nova-cloud-controller/1
- MachineId: "29"
  Stdout: |
Expected votes:   3
Total votes:  3
  UnitId: nova-cloud-controller/2

I changed the transport mode to UDP unicast (udpu) by setting:

$ juju set hacluster-ncc corosync_transport=udpu
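
To confirm the transport actually switched on the units, the rendered configuration (and, where the key is present, the runtime cmap value) can be checked; this is an extra step on my side, not part of the original verification output:

$ sudo grep -i transport /etc/corosync/corosync.conf
$ sudo corosync-cmapctl | grep -i totem.transport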

After this, I moved to the primary node (the one that holds the virtual IP
address) and applied the tc rules while monitoring the memory usage of the
corosync process (multiple times).

root@juju-niedbalski-sec-machine-15:/home/ubuntu# tc qdisc add dev eth0 root netem delay 550ms
root@juju-niedbalski-sec-machine-15:/home/ubuntu# tc qdisc del dev eth0 root netem

Apr  6 17:57:37 juju-niedbalski-sec-machine-15 cib[14387]:  warning: cib_process_request: Completed cib_apply_diff operation for section 'all': Application of an update diff failed (rc=-206, origin=local/cibadmin/2, version=0.27.1)
Apr  6 18:04:12 juju-niedbalski-sec-machine-15 corosync[14376]:  [MAIN  ] Completed service synchronization, ready to provide service.
Apr  6 18:04:13 juju-niedbalski-sec-machine-15 corosync[18645]:  [MAIN  ] Completed service synchronization, ready to provide service.
Apr  6 18:06:27 juju-niedbalski-sec-machine-15 corosync[18645]:  [MAIN  ] Completed service synchronization, ready to provide service.
Apr  6 18:06:28 juju-niedbalski-sec-machine-15 corosync[19528]:  [MAIN  ] Completed service synchronization, ready to provide service.
Apr  6 18:07:48 juju-niedbalski-sec-machine-15 corosync[19985]:  [MAIN  ] Completed service synchronization, ready to provide service.
Apr  6 18:07:49 juju-niedbalski-sec-machine-15 corosync[19985]:  [MAIN  ] Completed service synchronization, ready to provide service.
Apr  6 18:08:16 juju-niedbalski-sec-machine-15 corosync[19985]:  [MAIN  ] Completed service synchronization, ready to provide service.
Apr  6 18:08:59 juju-niedbalski-sec-machine-15 corosync[19985]:  [MAIN  ] Completed service synchronization, ready to provide service.
Apr  6 18:09:38 juju-niedbalski-sec-machine-15 corosync[19985]:  [MAIN  ] Completed service synchronization, ready to provide service.

After 5 minutes of observing the corosync process with:

 $ while true; do ps -o vsz,rss -p $(pgrep corosync) 2>&1 | grep -E '.*[0-9]+.*' | tee -a memory-usage.log && sleep 1; done

I don't see any substantial memory usage increase.

root@juju-niedbalski-sec-machine-15:/home/ubuntu# more memory-usage.log 
135584  3928
135584  3928
135584  3928
135584  3928
135584  3928
135584  3928
135584  3928
135584  3928
135584  3928
135584  3928
135584  3928
135584  3928
135584  3928
135584  3928
135584  3928
135584  3928
135584  3928
135584  3928
135584  3928
135584  3928

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to corosync in Ubuntu.
https://bugs.launchpad.net/bugs/1563089

Title:
  Memory Leak when new cluster configuration is formed.

Status in corosync package in Ubuntu:
  Fix Released
Status in corosync source package in Trusty:
  Fix Committed
Status in corosync source package in Wily:
  Fix Committed

Bug description:
  [Environment]

  Trusty 14.04.3

  Packages:

  ii  corosync 2.3.3-1ubuntu1
amd64Standards-based cluster framework (daemon and modules)
  ii  libcorosync-common4  2.3.3-1ubuntu1
amd64Standards-based cluster framework, common library

  [Reproducer]

  1) I deployed an HA environment using this bundle 
(http://bazaar.launchpad.net/~ost-maintainers/openstack-charm-testing/trunk/view/head:/bundles/dev/next-ha.yaml)
  with a 3 nodes installation of cinder related to an HACluster subordinate 
unit.

  $ juju-deployer -c next-ha.yaml -w 600 trusty-kilo

  2) I changed the default corosync transport mode to unicast.

  $ juju set cinder-hacluster corosync_transport=udpu

  3) I assured that the 3 units were quorated

  cinder/0# corosync-quorumtool
  Votequorum information
  --
  Expected votes:   3
  Highest expected: 3
  Total votes:  3
  Quorum:   2
  Flags:Quorate

  Membership information
  

[Ubuntu-ha] [Bug 1563089] Re: Memory Leak when new cluster configuration is formed.

2016-04-06 Thread Jorge Niedbalski
Based on my latest comment, I am marking the Trusty version as
verification-done-trusty

** Tags removed: verification-needed
** Tags added: verification-done-trusty verification-needed-wily

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to corosync in Ubuntu.
https://bugs.launchpad.net/bugs/1563089

Title:
  Memory Leak when new cluster configuration is formed.

Status in corosync package in Ubuntu:
  Fix Released
Status in corosync source package in Trusty:
  Fix Committed
Status in corosync source package in Wily:
  Fix Committed

Bug description:
  [Environment]

  Trusty 14.04.3

  Packages:

  ii  corosync 2.3.3-1ubuntu1
amd64Standards-based cluster framework (daemon and modules)
  ii  libcorosync-common4  2.3.3-1ubuntu1
amd64Standards-based cluster framework, common library

  [Reproducer]

  1) I deployed an HA environment using this bundle 
(http://bazaar.launchpad.net/~ost-maintainers/openstack-charm-testing/trunk/view/head:/bundles/dev/next-ha.yaml)
  with a 3 nodes installation of cinder related to an HACluster subordinate 
unit.

  $ juju-deployer -c next-ha.yaml -w 600 trusty-kilo

  2) I changed the default corosync transport mode to unicast.

  $ juju set cinder-hacluster corosync_transport=udpu

  3) I assured that the 3 units were quorated

  cinder/0# corosync-quorumtool
  Votequorum information
  --
  Expected votes:   3
  Highest expected: 3
  Total votes:  3
  Quorum:   2
  Flags:Quorate

  Membership information
  --
  Nodeid  Votes Name
    1002  1 10.5.1.57 (local)
    1001  1 10.5.1.58
    1000  1 10.5.1.59

  The primary unit was holding the VIP resource 10.5.105.1/16

  root@juju-niedbalski-sec-machine-4:/home/ubuntu# ip addr
  2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc netem state UP 
group default qlen 1000
  link/ether fa:16:3e:d2:19:6f brd ff:ff:ff:ff:ff:ff
  inet 10.5.1.57/16 brd 10.5.255.255 scope global eth0
     valid_lft forever preferred_lft forever
  inet 10.5.105.1/16 brd 10.5.255.255 scope global secondary eth0
     valid_lft forever preferred_lft forever

  4) I manually added a TC queue for the eth0 interface on the node
  holding the VIP resource, introducing a 350 ms delay.

  $ sudo tc qdisc add dev eth0 root netem delay 350ms

  5) Right after adding the 350ms on the cinder/0 unit, the corosync process 
informs that one of the processors failed, and is forming a new
  cluster configuration.

  Mar 28 21:57:41 juju-niedbalski-sec-machine-5 corosync[4584]:  [TOTEM ] A 
processor failed, forming new configuration.
  Mar 28 22:00:48 juju-niedbalski-sec-machine-5 corosync[4584]:  [TOTEM ] A new 
membership (10.5.1.57:11628) was formed. Members
  Mar 28 22:00:48 juju-niedbalski-sec-machine-5 corosync[4584]:  [QUORUM] 
Members[3]: 1002 1001 1000
  Mar 28 22:00:48 juju-niedbalski-sec-machine-5 corosync[4584]:  [MAIN  ] 
Completed service synchronization, ready to provide service.

  This happens on all of the units.

  6) After receiving this message, I remove the queue from eth0:

  $ sudo tc qdisc del dev eth0 root netem

  Then, the following statement is written in the master node:

  Mar 28 22:00:48 juju-niedbalski-sec-machine-4 corosync[9630]:  [TOTEM ] A new 
membership (10.5.1.57:11628) was formed. Members
  Mar 28 22:00:48 juju-niedbalski-sec-machine-4 corosync[9630]:  [QUORUM] 
Members[3]: 1002 1001 1000
  Mar 28 22:00:48 juju-niedbalski-sec-machine-4 corosync[9630]:  [MAIN  ] 
Completed service synchronization, ready to provide service.

  7) While executing 5 and 6 repeatedly, I ran the following command to track 
the VSZ and RSS memory usage of the
  corosync process:

  root@juju-niedbalski-sec-machine-4:/home/ubuntu# tc qdisc add dev eth0 root 
netem delay 350ms
  root@juju-niedbalski-sec-machine-4:/home/ubuntu# tc qdisc del dev eth0 root 
netem

  $ while true; do ps -o vsz,rss -p $(pgrep corosync) 2>&1 | grep
  -E '.*[0-9]+.*' | tee -a memory-usage.log && sleep 1; done

  The results show that both VSZ and RSS increase over time at a
  high rate.

  25476 4036

  ... (after 5 minutes).

  135644 10352

  [Fix]

  So preliminary based on this reproducer, I think that this commit 
(https://github.com/corosync/corosync/commit/600fb4084adcbfe7678b44a83fa8f3d3550f48b9)
  is a good candidate to be backported in Ubuntu Trusty.

  [Test Case]

  * See reproducer

  [Backport Impact]

  * Not identified

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1563089/+subscriptions

___
Mailing list: https://launchpad.net/~ubuntu-ha
Post to : ubuntu-ha@lists.launchpad.net
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : 

[Ubuntu-ha] [Bug 1563089] Re: Memory Leak when new cluster configuration is formed.

2016-04-01 Thread Jorge Niedbalski
** Patch added: "Wily Patch"
   
https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1563089/+attachment/4619421/+files/fix-lp-1563089-wily.debdiff

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to corosync in Ubuntu.
https://bugs.launchpad.net/bugs/1563089

Title:
  Memory Leak when new cluster configuration is formed.

Status in corosync package in Ubuntu:
  In Progress
Status in corosync source package in Trusty:
  In Progress
Status in corosync source package in Wily:
  In Progress

Bug description:
  [Environment]

  Trusty 14.04.3

  Packages:

  ii  corosync 2.3.3-1ubuntu1
amd64Standards-based cluster framework (daemon and modules)
  ii  libcorosync-common4  2.3.3-1ubuntu1
amd64Standards-based cluster framework, common library

  [Reproducer]

  1) I deployed an HA environment using this bundle 
(http://bazaar.launchpad.net/~ost-maintainers/openstack-charm-testing/trunk/view/head:/bundles/dev/next-ha.yaml)
  with a 3 nodes installation of cinder related to an HACluster subordinate 
unit.

  $ juju-deployer -c next-ha.yaml -w 600 trusty-kilo

  2) I changed the default corosync transport mode to unicast.

  $ juju set cinder-hacluster corosync_transport=udpu

  3) I assured that the 3 units were quorated

  cinder/0# corosync-quorumtool
  Votequorum information
  --
  Expected votes:   3
  Highest expected: 3
  Total votes:  3
  Quorum:   2
  Flags:Quorate

  Membership information
  --
  Nodeid  Votes Name
    1002  1 10.5.1.57 (local)
    1001  1 10.5.1.58
    1000  1 10.5.1.59

  The primary unit was holding the VIP resource 10.5.105.1/16

  root@juju-niedbalski-sec-machine-4:/home/ubuntu# ip addr
  2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc netem state UP 
group default qlen 1000
  link/ether fa:16:3e:d2:19:6f brd ff:ff:ff:ff:ff:ff
  inet 10.5.1.57/16 brd 10.5.255.255 scope global eth0
     valid_lft forever preferred_lft forever
  inet 10.5.105.1/16 brd 10.5.255.255 scope global secondary eth0
     valid_lft forever preferred_lft forever

  4) I manually added a TC queue for the eth0 interface on the node
  holding the VIP resource, introducing a 350 ms delay.

  $ sudo tc qdisc add dev eth0 root netem delay 350ms

  5) Right after adding the 350ms on the cinder/0 unit, the corosync process 
informs that one of the processors failed, and is forming a new
  cluster configuration.

  Mar 28 21:57:41 juju-niedbalski-sec-machine-5 corosync[4584]:  [TOTEM ] A 
processor failed, forming new configuration.
  Mar 28 22:00:48 juju-niedbalski-sec-machine-5 corosync[4584]:  [TOTEM ] A new 
membership (10.5.1.57:11628) was formed. Members
  Mar 28 22:00:48 juju-niedbalski-sec-machine-5 corosync[4584]:  [QUORUM] 
Members[3]: 1002 1001 1000
  Mar 28 22:00:48 juju-niedbalski-sec-machine-5 corosync[4584]:  [MAIN  ] 
Completed service synchronization, ready to provide service.

  This happens on all of the units.

  6) After receiving this message, I remove the queue from eth0:

  $ sudo tc qdisc del dev eth0 root netem

  Then, the following statement is written in the master node:

  Mar 28 22:00:48 juju-niedbalski-sec-machine-4 corosync[9630]:  [TOTEM ] A new 
membership (10.5.1.57:11628) was formed. Members
  Mar 28 22:00:48 juju-niedbalski-sec-machine-4 corosync[9630]:  [QUORUM] 
Members[3]: 1002 1001 1000
  Mar 28 22:00:48 juju-niedbalski-sec-machine-4 corosync[9630]:  [MAIN  ] 
Completed service synchronization, ready to provide service.

  7) While executing 5 and 6 repeatedly, I ran the following command to track 
the VSZ and RSS memory usage of the
  corosync process:

  root@juju-niedbalski-sec-machine-4:/home/ubuntu# tc qdisc add dev eth0 root 
netem delay 350ms
  root@juju-niedbalski-sec-machine-4:/home/ubuntu# tc qdisc del dev eth0 root 
netem

  $ while true; do ps -o vsz,rss -p $(pgrep corosync) 2>&1 | grep
  -E '.*[0-9]+.*' | tee -a memory-usage.log && sleep 1; done

  The results show that both VSZ and RSS increase over time at a
  high rate.

  25476 4036

  ... (after 5 minutes).

  135644 10352

  [Fix]

  So preliminary based on this reproducer, I think that this commit 
(https://github.com/corosync/corosync/commit/600fb4084adcbfe7678b44a83fa8f3d3550f48b9)
  is a good candidate to be backported in Ubuntu Trusty.

  [Test Case]

  * See reproducer

  [Backport Impact]

  * Not identified

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1563089/+subscriptions

___
Mailing list: https://launchpad.net/~ubuntu-ha
Post to : ubuntu-ha@lists.launchpad.net
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp


[Ubuntu-ha] [Bug 1564250] Re: Corosync upgrade to 2.3.3-1ubuntu2 leaves pacemaker in a stopped state

2016-03-31 Thread Jorge Niedbalski
** Changed in: pacemaker (Ubuntu)
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to pacemaker in Ubuntu.
https://bugs.launchpad.net/bugs/1564250

Title:
  Corosync upgrade to 2.3.3-1ubuntu2 leaves pacemaker in a stopped state

Status in pacemaker package in Ubuntu:
  Confirmed

Bug description:
  Using pacemaker version: 1.1.10+git20130802-1ubuntu2.3

  On ubuntu: Ubuntu 14.04.4 LTS

  A corosync upgrade to 2.3.3-1ubuntu2 leaves pacemaker in a stopped
  state, specifically when upgrading from version 2.3.3-1ubuntu1.

  I have attached logs from such upgrade.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1564250/+subscriptions

___
Mailing list: https://launchpad.net/~ubuntu-ha
Post to : ubuntu-ha@lists.launchpad.net
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp


[Ubuntu-ha] [Bug 1563089] Re: Memory Leak when new cluster configuration is formed.

2016-03-31 Thread Jorge Niedbalski
** Patch removed: "Xenial Patch"
   
https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1563089/+attachment/4617458/+files/fix-lp-1563089-xenial.debdiff

** Patch added: "Xenial Patch"
   
https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1563089/+attachment/4618461/+files/fix-lp-1563089-xenial.debdiff

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to corosync in Ubuntu.
https://bugs.launchpad.net/bugs/1563089

Title:
  Memory Leak when new cluster configuration is formed.

Status in corosync package in Ubuntu:
  In Progress
Status in corosync source package in Trusty:
  In Progress
Status in corosync source package in Wily:
  In Progress

Bug description:
  [Environment]

  Trusty 14.04.3

  Packages:

  ii  corosync 2.3.3-1ubuntu1
amd64Standards-based cluster framework (daemon and modules)
  ii  libcorosync-common4  2.3.3-1ubuntu1
amd64Standards-based cluster framework, common library

  [Reproducer]

  1) I deployed an HA environment using this bundle 
(http://bazaar.launchpad.net/~ost-maintainers/openstack-charm-testing/trunk/view/head:/bundles/dev/next-ha.yaml)
  with a 3 nodes installation of cinder related to an HACluster subordinate 
unit.

  $ juju-deployer -c next-ha.yaml -w 600 trusty-kilo

  2) I changed the default corosync transport mode to unicast.

  $ juju set cinder-hacluster corosync_transport=udpu

  3) I assured that the 3 units were quorated

  cinder/0# corosync-quorumtool
  Votequorum information
  --
  Expected votes:   3
  Highest expected: 3
  Total votes:  3
  Quorum:   2
  Flags:Quorate

  Membership information
  --
  Nodeid  Votes Name
    1002  1 10.5.1.57 (local)
    1001  1 10.5.1.58
    1000  1 10.5.1.59

  The primary unit was holding the VIP resource 10.5.105.1/16

  root@juju-niedbalski-sec-machine-4:/home/ubuntu# ip addr
  2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc netem state UP 
group default qlen 1000
  link/ether fa:16:3e:d2:19:6f brd ff:ff:ff:ff:ff:ff
  inet 10.5.1.57/16 brd 10.5.255.255 scope global eth0
     valid_lft forever preferred_lft forever
  inet 10.5.105.1/16 brd 10.5.255.255 scope global secondary eth0
     valid_lft forever preferred_lft forever

  4) I manually added a TC queue for the eth0 interface on the node
  holding the VIP resource, introducing a 350 ms delay.

  $ sudo tc qdisc add dev eth0 root netem delay 350ms

  5) Right after adding the 350ms on the cinder/0 unit, the corosync process 
informs that one of the processors failed, and is forming a new
  cluster configuration.

  Mar 28 21:57:41 juju-niedbalski-sec-machine-5 corosync[4584]:  [TOTEM ] A 
processor failed, forming new configuration.
  Mar 28 22:00:48 juju-niedbalski-sec-machine-5 corosync[4584]:  [TOTEM ] A new 
membership (10.5.1.57:11628) was formed. Members
  Mar 28 22:00:48 juju-niedbalski-sec-machine-5 corosync[4584]:  [QUORUM] 
Members[3]: 1002 1001 1000
  Mar 28 22:00:48 juju-niedbalski-sec-machine-5 corosync[4584]:  [MAIN  ] 
Completed service synchronization, ready to provide service.

  This happens on all of the units.

  6) After receiving this message, I remove the queue from eth0:

  $ sudo tc qdisc del dev eth0 root netem

  Then, the following statement is written in the master node:

  Mar 28 22:00:48 juju-niedbalski-sec-machine-4 corosync[9630]:  [TOTEM ] A new 
membership (10.5.1.57:11628) was formed. Members
  Mar 28 22:00:48 juju-niedbalski-sec-machine-4 corosync[9630]:  [QUORUM] 
Members[3]: 1002 1001 1000
  Mar 28 22:00:48 juju-niedbalski-sec-machine-4 corosync[9630]:  [MAIN  ] 
Completed service synchronization, ready to provide service.

  7) While executing 5 and 6 repeatedly, I ran the following command to track 
the VSZ and RSS memory usage of the
  corosync process:

  root@juju-niedbalski-sec-machine-4:/home/ubuntu# tc qdisc add dev eth0 root 
netem delay 350ms
  root@juju-niedbalski-sec-machine-4:/home/ubuntu# tc qdisc del dev eth0 root 
netem

  $ while true; do ps -o vsz,rss -p $(pgrep corosync) 2>&1 | grep
  -E '.*[0-9]+.*' | tee -a memory-usage.log && sleep 1; done

  The results show that both VSZ and RSS increase over time at a
  high rate.

  25476 4036

  ... (after 5 minutes).

  135644 10352

  [Fix]

  So preliminary based on this reproducer, I think that this commit 
(https://github.com/corosync/corosync/commit/600fb4084adcbfe7678b44a83fa8f3d3550f48b9)
  is a good candidate to be backported in Ubuntu Trusty.

  [Test Case]

  * See reproducer

  [Backport Impact]

  * Not identified

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1563089/+subscriptions

___
Mailing list: https://launchpad.net/~ubuntu-ha

[Ubuntu-ha] [Bug 1563089] Re: Memory Leak when new cluster configuration is formed.

2016-03-30 Thread Jorge Niedbalski
** Patch added: "Xenial Patch"
   
https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1563089/+attachment/4617458/+files/fix-lp-1563089-xenial.debdiff

** Changed in: corosync (Ubuntu Wily)
   Status: New => In Progress

** Changed in: corosync (Ubuntu Wily)
   Importance: Undecided => High

** Changed in: corosync (Ubuntu Wily)
 Assignee: (unassigned) => Jorge Niedbalski (niedbalski)

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to corosync in Ubuntu.
https://bugs.launchpad.net/bugs/1563089

Title:
  Memory Leak when new cluster configuration is formed.

Status in corosync package in Ubuntu:
  In Progress
Status in corosync source package in Trusty:
  In Progress
Status in corosync source package in Wily:
  In Progress

Bug description:
  [Environment]

  Trusty 14.04.3

  Packages:

  ii  corosync 2.3.3-1ubuntu1
amd64Standards-based cluster framework (daemon and modules)
  ii  libcorosync-common4  2.3.3-1ubuntu1
amd64Standards-based cluster framework, common library

  [Reproducer]

  1) I deployed an HA environment using this bundle 
(http://bazaar.launchpad.net/~ost-maintainers/openstack-charm-testing/trunk/view/head:/bundles/dev/next-ha.yaml)
  with a 3 nodes installation of cinder related to an HACluster subordinate 
unit.

  $ juju-deployer -c next-ha.yaml -w 600 trusty-kilo

  2) I changed the default corosync transport mode to unicast.

  $ juju set cinder-hacluster corosync_transport=udpu

  3) I verified that the 3 units were quorate

  cinder/0# corosync-quorumtool
  Votequorum information
  --
  Expected votes:   3
  Highest expected: 3
  Total votes:  3
  Quorum:   2
  Flags:Quorate

  Membership information
  --
  Nodeid  Votes Name
    1002  1 10.5.1.57 (local)
    1001  1 10.5.1.58
    1000  1 10.5.1.59

  The primary unit was holding the VIP resource 10.5.105.1/16

  root@juju-niedbalski-sec-machine-4:/home/ubuntu# ip addr
  2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc netem state UP 
group default qlen 1000
  link/ether fa:16:3e:d2:19:6f brd ff:ff:ff:ff:ff:ff
  inet 10.5.1.57/16 brd 10.5.255.255 scope global eth0
     valid_lft forever preferred_lft forever
  inet 10.5.105.1/16 brd 10.5.255.255 scope global secondary eth0
     valid_lft forever preferred_lft forever

  4) I manually added a TC queue for the eth0 interface on the node
  holding the VIP resource, introducing a 350 ms delay.

  $ sudo tc qdisc add dev eth0 root netem delay 350ms

  5) Right after adding the 350ms on the cinder/0 unit, the corosync process 
informs that one of the processors failed, and is forming a new
  cluster configuration.

  Mar 28 21:57:41 juju-niedbalski-sec-machine-5 corosync[4584]:  [TOTEM ] A 
processor failed, forming new configuration.
  Mar 28 22:00:48 juju-niedbalski-sec-machine-5 corosync[4584]:  [TOTEM ] A new 
membership (10.5.1.57:11628) was formed. Members
  Mar 28 22:00:48 juju-niedbalski-sec-machine-5 corosync[4584]:  [QUORUM] 
Members[3]: 1002 1001 1000
  Mar 28 22:00:48 juju-niedbalski-sec-machine-5 corosync[4584]:  [MAIN  ] 
Completed service synchronization, ready to provide service.

  This happens on all of the units.

  6) After receiving this message, I remove the queue from eth0:

  $ sudo tc qdisk del dev eth0 root netem

  Then, the following statement is written in the master node:

  Mar 28 22:00:48 juju-niedbalski-sec-machine-4 corosync[9630]:  [TOTEM ] A new 
membership (10.5.1.57:11628) was formed. Members
  Mar 28 22:00:48 juju-niedbalski-sec-machine-4 corosync[9630]:  [QUORUM] 
Members[3]: 1002 1001 1000
  Mar 28 22:00:48 juju-niedbalski-sec-machine-4 corosync[9630]:  [MAIN  ] 
Completed service synchronization, ready to provide service.

  7) While executing 5 and 6 repeatedly, I ran the following command to track 
the VSZ and RSS memory usage of the
  corosync process:

  root@juju-niedbalski-sec-machine-4:/home/ubuntu# tc qdisc add dev eth0 root 
netem delay 350ms
  root@juju-niedbalski-sec-machine-4:/home/ubuntu# tc qdisc del dev eth0 root 
netem

  $ sudo while true; do ps -o vsz,rss -p $(pgrep corosync) 2>&1 | grep
  -E '.*[0-9]+.*' | tee -a memory-usage.log && sleep 1; done

  The results show that both vsz and rss increase over time at a
  high rate.

  25476 4036

  ... (after 5 minutes).

  135644 10352

  [Fix]

  Based on this reproducer, my preliminary view is that this commit
(https://github.com/corosync/corosync/commit/600fb4084adcbfe7678b44a83fa8f3d3550f48b9)
  is a good candidate for backporting to Ubuntu Trusty.

  [Test Case]

  * See reproducer

  [Backport Impact]

  * Not identified

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+sou

[Ubuntu-ha] [Bug 1563089] Re: Memory Leak when new cluster configuration is formed.

2016-03-30 Thread Jorge Niedbalski
** Changed in: corosync (Ubuntu)
   Status: New => In Progress

** Changed in: corosync (Ubuntu Trusty)
   Status: New => In Progress

** Changed in: corosync (Ubuntu)
   Importance: Undecided => High

** Changed in: corosync (Ubuntu Trusty)
   Importance: Undecided => High

** Changed in: corosync (Ubuntu)
 Assignee: (unassigned) => Jorge Niedbalski (niedbalski)

** Changed in: corosync (Ubuntu Trusty)
 Assignee: (unassigned) => Jorge Niedbalski (niedbalski)

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to corosync in Ubuntu.
https://bugs.launchpad.net/bugs/1563089

Title:
  Memory Leak when new cluster configuration is formed.

Status in corosync package in Ubuntu:
  In Progress
Status in corosync source package in Trusty:
  In Progress
Status in corosync source package in Wily:
  New

Bug description:
  [Environment]

  Trusty 14.04.3

  Packages:

  ii  corosync 2.3.3-1ubuntu1
amd64Standards-based cluster framework (daemon and modules)
  ii  libcorosync-common4  2.3.3-1ubuntu1
amd64Standards-based cluster framework, common library

  [Reproducer]

  1) I deployed an HA environment using this bundle 
(http://bazaar.launchpad.net/~ost-maintainers/openstack-charm-testing/trunk/view/head:/bundles/dev/next-ha.yaml)
  with a 3 nodes installation of cinder related to an HACluster subordinate 
unit.

  $ juju-deployer -c next-ha.yaml -w 600 trusty-kilo

  2) I changed the default corosync transport mode to unicast.

  $ juju set cinder-hacluster corosync_transport=udpu

  3) I verified that the 3 units were quorate

  cinder/0# corosync-quorumtool
  Votequorum information
  --
  Expected votes:   3
  Highest expected: 3
  Total votes:  3
  Quorum:   2
  Flags:Quorate

  Membership information
  --
  Nodeid  Votes Name
    1002  1 10.5.1.57 (local)
    1001  1 10.5.1.58
    1000  1 10.5.1.59

  The primary unit was holding the VIP resource 10.5.105.1/16

  root@juju-niedbalski-sec-machine-4:/home/ubuntu# ip addr
  2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc netem state UP 
group default qlen 1000
  link/ether fa:16:3e:d2:19:6f brd ff:ff:ff:ff:ff:ff
  inet 10.5.1.57/16 brd 10.5.255.255 scope global eth0
     valid_lft forever preferred_lft forever
  inet 10.5.105.1/16 brd 10.5.255.255 scope global secondary eth0
     valid_lft forever preferred_lft forever

  4) I manually added a TC queue for the eth0 interface on the node
  holding the VIP resource, introducing a 350 ms delay.

  $ sudo tc qdisc add dev eth0 root netem delay 350ms

  5) Right after adding the 350ms on the cinder/0 unit, the corosync process 
informs that one of the processors failed, and is forming a new
  cluster configuration.

  Mar 28 21:57:41 juju-niedbalski-sec-machine-5 corosync[4584]:  [TOTEM ] A 
processor failed, forming new configuration.
  Mar 28 22:00:48 juju-niedbalski-sec-machine-5 corosync[4584]:  [TOTEM ] A new 
membership (10.5.1.57:11628) was formed. Members
  Mar 28 22:00:48 juju-niedbalski-sec-machine-5 corosync[4584]:  [QUORUM] 
Members[3]: 1002 1001 1000
  Mar 28 22:00:48 juju-niedbalski-sec-machine-5 corosync[4584]:  [MAIN  ] 
Completed service synchronization, ready to provide service.

  This happens on all of the units.

  6) After receiving this message, I remove the queue from eth0:

  $ sudo tc qdisk del dev eth0 root netem

  Then, the following statement is written in the master node:

  Mar 28 22:00:48 juju-niedbalski-sec-machine-4 corosync[9630]:  [TOTEM ] A new 
membership (10.5.1.57:11628) was formed. Members
  Mar 28 22:00:48 juju-niedbalski-sec-machine-4 corosync[9630]:  [QUORUM] 
Members[3]: 1002 1001 1000
  Mar 28 22:00:48 juju-niedbalski-sec-machine-4 corosync[9630]:  [MAIN  ] 
Completed service synchronization, ready to provide service.

  7) While executing 5 and 6 repeatedly, I ran the following command to track 
the VSZ and RSS memory usage of the
  corosync process:

  root@juju-niedbalski-sec-machine-4:/home/ubuntu# tc qdisc add dev eth0 root 
netem delay 350ms
  root@juju-niedbalski-sec-machine-4:/home/ubuntu# tc qdisc del dev eth0 root 
netem

  $ sudo while true; do ps -o vsz,rss -p $(pgrep corosync) 2>&1 | grep
  -E '.*[0-9]+.*' | tee -a memory-usage.log && sleep 1; done

  The results show that both vsz and rss increase over time at a
  high rate.

  25476 4036

  ... (after 5 minutes).

  135644 10352

  [Fix]

  Based on this reproducer, my preliminary view is that this commit
(https://github.com/corosync/corosync/commit/600fb4084adcbfe7678b44a83fa8f3d3550f48b9)
  is a good candidate for backporting to Ubuntu Trusty.

  [Test Case]

  * See reproducer

  [Backport Impact]

  * Not identified

To manage no

[Ubuntu-ha] [Bug 1563089] Re: Memory Leak when new cluster configuration is formed.

2016-03-30 Thread Jorge Niedbalski
** Description changed:

  [Environment]
  
  Trusty 14.04.3
  
  Packages:
  
  ii  corosync 2.3.3-1ubuntu1
amd64Standards-based cluster framework (daemon and modules)
  ii  libcorosync-common4  2.3.3-1ubuntu1
amd64Standards-based cluster framework, common library
  
  [Reproducer]
- 
  
  1) I deployed an HA environment using this bundle 
(http://bazaar.launchpad.net/~ost-maintainers/openstack-charm-testing/trunk/view/head:/bundles/dev/next-ha.yaml)
  with a 3 nodes installation of cinder related to an HACluster subordinate 
unit.
  
  $ juju-deployer -c next-ha.yaml -w 600 trusty-kilo
  
  2) I changed the default corosync transport mode to unicast.
  
  $ juju set cinder-hacluster corosync_transport=udpu
  
  3) I assured that the 3 units were quorated
  
- cinder/0# corosync-quorumtool 
+ cinder/0# corosync-quorumtool
  Votequorum information
  --
  Expected votes:   3
  Highest expected: 3
  Total votes:  3
- Quorum:   2  
- Flags:Quorate 
+ Quorum:   2
+ Flags:Quorate
  
  Membership information
  --
- Nodeid  Votes Name
-   1002  1 10.5.1.57 (local)
-   1001  1 10.5.1.58
-   1000  1 10.5.1.59
+ Nodeid  Votes Name
+   1002  1 10.5.1.57 (local)
+   1001  1 10.5.1.58
+   1000  1 10.5.1.59
  
  The primary unit was holding the VIP resource 10.5.105.1/16
  
- root@juju-niedbalski-sec-machine-4:/home/ubuntu# ip addr 
+ root@juju-niedbalski-sec-machine-4:/home/ubuntu# ip addr
  2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc netem state UP 
group default qlen 1000
- link/ether fa:16:3e:d2:19:6f brd ff:ff:ff:ff:ff:ff
- inet 10.5.1.57/16 brd 10.5.255.255 scope global eth0
-valid_lft forever preferred_lft forever
- inet 10.5.105.1/16 brd 10.5.255.255 scope global secondary eth0
-valid_lft forever preferred_lft forever
+ link/ether fa:16:3e:d2:19:6f brd ff:ff:ff:ff:ff:ff
+ inet 10.5.1.57/16 brd 10.5.255.255 scope global eth0
+    valid_lft forever preferred_lft forever
+ inet 10.5.105.1/16 brd 10.5.255.255 scope global secondary eth0
+    valid_lft forever preferred_lft forever
  
  4) I manually added a TC queue for the eth0 interface on the node
  holding the VIP resource, introducing a 350 ms delay.
  
  $ sudo tc qdisc add dev eth0 root netem delay 350ms
  
- 5) Right after adding the 350ms on the cinder/0 unit, the corosync process 
informs that one of the processors failed, and is forming a new 
+ 5) Right after adding the 350ms on the cinder/0 unit, the corosync process 
informs that one of the processors failed, and is forming a new
  cluster configuration.
-  
+ 
  Mar 28 21:57:41 juju-niedbalski-sec-machine-5 corosync[4584]:  [TOTEM ] A 
processor failed, forming new configuration.
  Mar 28 22:00:48 juju-niedbalski-sec-machine-5 corosync[4584]:  [TOTEM ] A new 
membership (10.5.1.57:11628) was formed. Members
  Mar 28 22:00:48 juju-niedbalski-sec-machine-5 corosync[4584]:  [QUORUM] 
Members[3]: 1002 1001 1000
  Mar 28 22:00:48 juju-niedbalski-sec-machine-5 corosync[4584]:  [MAIN  ] 
Completed service synchronization, ready to provide service.
  
  This happens on all of the units.
  
  6) After receiving this message, I remove the queue from eth0:
  
  $ sudo tc qdisk del dev eth0 root netem
  
  Then, the following statement is written in the master node:
  
  Mar 28 22:00:48 juju-niedbalski-sec-machine-4 corosync[9630]:  [TOTEM ] A new 
membership (10.5.1.57:11628) was formed. Members
  Mar 28 22:00:48 juju-niedbalski-sec-machine-4 corosync[9630]:  [QUORUM] 
Members[3]: 1002 1001 1000
  Mar 28 22:00:48 juju-niedbalski-sec-machine-4 corosync[9630]:  [MAIN  ] 
Completed service synchronization, ready to provide service.
  
- 
- 7) While executing 5 and 6 repeatedly, I ran the following command to track 
the SZ and RSS memory usage of the
+ 7) While executing 5 and 6 repeatedly, I ran the following command to track 
the VSZ and RSS memory usage of the
  corosync process:
  
  root@juju-niedbalski-sec-machine-4:/home/ubuntu# tc qdisc add dev eth0 root 
netem delay 350ms
  root@juju-niedbalski-sec-machine-4:/home/ubuntu# tc qdisc del dev eth0 root 
netem
  
- $ sudo while true; do ps -o sz,rss -p $(pgrep corosync) 2>&1 | grep -E
+ $ sudo while true; do ps -o vsz,rss -p $(pgrep corosync) 2>&1 | grep -E
  '.*[0-9]+.*' | tee -a memory-usage.log && sleep 1; done
  
- The results shows that both sz and rss are increased over time at a high
- ratio.
+ The results shows that both vsz and rss are increased over time at a
+ high ratio.
  
  25476 4036
  
  ... (after 5 minutes).
  
  135644 10352
  
  [Fix]
  
- So preliminary based on this reproducer, I think that this commit 
(https://github.com/corosync/corosync/commit/600fb4084adcbfe7678b44a83fa8f3d3550f48b9)
 
+ So preliminary 

[Ubuntu-ha] [Bug 1530837] Re: Logsys file leaks in /dev/shm after sigabrt, sigsegv and when running corosync -v

2016-03-29 Thread Jorge Niedbalski
I just ran a verification on this package.

root@juju-niedbalski-sec-machine-27:/home/ubuntu# for file in $(strace -e open 
-i corosync -v 2>&1 | grep -E '.*shm.*' |grep -Po '".*?"'| sed -e s/\"//g); do 
du -sh $file; done
12K /dev/shm/qb-corosync-blackbox-header
8.1M    /dev/shm/qb-corosync-blackbox-data

After enabling proposed

root@juju-niedbalski-sec-machine-27:/home/ubuntu# dpkg -l | grep corosync
ii  corosync 2.3.3-1ubuntu2   amd64 
   Standards-based cluster framework (daemon and modules)
ii  libcorosync-common4  2.3.3-1ubuntu2   amd64 
   Standards-based cluster framework, common library

root@juju-niedbalski-sec-machine-27:/home/ubuntu# for file in $(strace -e open 
-i corosync -v 2>&1 | grep -E '.*shm.*' |grep -Po '".*?"'| sed -e s/\"//g); do 
du -sh $file; done
du: cannot access ‘/dev/shm/qb-corosync-blackbox-header’: No such file or 
directory
du: cannot access ‘/dev/shm/qb-corosync-blackbox-data’: No such file or 
directory

So, seems to be fixed.
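
A simpler spot check, assuming the default libqb blackbox paths shown
above:

corosync -v >/dev/null
ls -lh /dev/shm/qb-corosync-blackbox-* 2>/dev/null \
    && echo "blackbox files left behind" \
    || echo "no leftover blackbox files"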

** Tags removed: verification-needed
** Tags added: verification-done

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to corosync in Ubuntu.
https://bugs.launchpad.net/bugs/1530837

Title:
  Logsys file leaks in /dev/shm after sigabrt, sigsegv and when running
  corosync -v

Status in corosync package in Ubuntu:
  Fix Released
Status in corosync source package in Trusty:
  Fix Committed

Bug description:
  [Impact]

   * corosync has a memory leak problem with multiple calls to corosync -v
   * corosync has a memory leak problem by not properly handling signals

  [Test Case]

   * run "corosync -v" multiple times
   * some cloud tools do that

  [Regression Potential] 
   
   * minor code changes on not-core code
   * based on upstream changes
   * based on a redhat fix

  [Other Info]

  # Original BUG Description

  It was brought to my attention that Ubuntu also suffers from:

  https://bugzilla.redhat.com/show_bug.cgi?id=1117911

  And corosync should include the following fixes:

  

  commit dfaca4b10a005681230a81e229384b6cd239b4f6
  Author: Jan Friesse 
  Date: Wed Jul 9 15:52:14 2014 +0200

  Fix compiler warning introduced by previous patch

  QB loop signal handler prototype differs from signal(2) prototype.
  Solution is to create wrapper functions.

  Signed-off-by: Jan Friesse 

  commit 384760cb670836dc37e243f594612c6e68f44351
  Author: zouyu 
  Date: Thu Jul 3 10:56:02 2014 +0800

  Handle SIGSEGV and SIGABRT signals

  SIGSEGV and SIGABRT signals are now correctly handled (blackbox is
  dumped and logsys is finalized).

  Signed-off-by: zouyu 
  Reviewed-by: Jan Friesse 

  commit cc80c8567d6eec1d136f9e85d2f8dfb957337eef
  Author: zouyu 
  Date: Wed Jul 2 10:00:53 2014 +0800

  fix memory leak produced by 'corosync -v'

  Signed-off-by: zouyu 
  Reviewed-by: Jan Friesse 

  

  Description from Red Hat bug:

  """
  Description of problem:
  When corosync receives sigabrt or sigsegv it doesn't delete libqb blackbox 
file (/dev/shm one). Same happens when corosync is executed with -v parameter 
(this shows only version, so it shouldn't cause leak in /dev/shm).

  Version-Release number of selected component (if applicable):
  7.0

  How reproducible:
  100%

  Steps to Reproduce 1:
  1. Start corosync
  2. Send sigabrt to corosync

  Steps to Reproduce 1:
  1. Execute corosync -v

  Actual results:
  File like qb-corosync-*-blackbox-data|header exists results in leak of 
/dev/shm space.

  Expected results:
  No leak

  Additional info:
  """

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1530837/+subscriptions

___
Mailing list: https://launchpad.net/~ubuntu-ha
Post to : ubuntu-ha@lists.launchpad.net
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp


[Ubuntu-ha] [Bug 1563089] Re: Memory Leak when new cluster configuration is formed.

2016-03-29 Thread Jorge Niedbalski
** Summary changed:

- Memory Leak when new configuration is formed.
+ Memory Leak when new cluster configuration is formed.

** Tags added: sts-needs-review

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to corosync in Ubuntu.
https://bugs.launchpad.net/bugs/1563089

Title:
  Memory Leak when new cluster configuration is formed.

Status in corosync package in Ubuntu:
  New
Status in corosync source package in Trusty:
  New

Bug description:
  [Environment]

  Trusty 14.04.3

  Packages:

  ii  corosync 2.3.3-1ubuntu1
amd64Standards-based cluster framework (daemon and modules)
  ii  libcorosync-common4  2.3.3-1ubuntu1
amd64Standards-based cluster framework, common library

  [Reproducer]

  
  1) I deployed an HA environment using this bundle 
(http://bazaar.launchpad.net/~ost-maintainers/openstack-charm-testing/trunk/view/head:/bundles/dev/next-ha.yaml)
  with a 3 nodes installation of cinder related to an HACluster subordinate 
unit.

  $ juju-deployer -c next-ha.yaml -w 600 trusty-kilo

  2) I changed the default corosync transport mode to unicast.

  $ juju set cinder-hacluster corosync_transport=udpu

  3) I verified that the 3 units were quorate

  cinder/0# corosync-quorumtool 
  Votequorum information
  --
  Expected votes:   3
  Highest expected: 3
  Total votes:  3
  Quorum:   2  
  Flags:Quorate 

  Membership information
  --
  Nodeid  Votes Name
1002  1 10.5.1.57 (local)
1001  1 10.5.1.58
1000  1 10.5.1.59

  The primary unit was holding the VIP resource 10.5.105.1/16

  root@juju-niedbalski-sec-machine-4:/home/ubuntu# ip addr 
  2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc netem state UP 
group default qlen 1000
  link/ether fa:16:3e:d2:19:6f brd ff:ff:ff:ff:ff:ff
  inet 10.5.1.57/16 brd 10.5.255.255 scope global eth0
 valid_lft forever preferred_lft forever
  inet 10.5.105.1/16 brd 10.5.255.255 scope global secondary eth0
 valid_lft forever preferred_lft forever

  4) I manually added a TC queue for the eth0 interface on the node
  holding the VIP resource, introducing a 350 ms delay.

  $ sudo tc qdisc add dev eth0 root netem delay 350ms

  5) Right after adding the 350ms on the cinder/0 unit, the corosync process 
informs that one of the processors failed, and is forming a new 
  cluster configuration.
   
  Mar 28 21:57:41 juju-niedbalski-sec-machine-5 corosync[4584]:  [TOTEM ] A 
processor failed, forming new configuration.
  Mar 28 22:00:48 juju-niedbalski-sec-machine-5 corosync[4584]:  [TOTEM ] A new 
membership (10.5.1.57:11628) was formed. Members
  Mar 28 22:00:48 juju-niedbalski-sec-machine-5 corosync[4584]:  [QUORUM] 
Members[3]: 1002 1001 1000
  Mar 28 22:00:48 juju-niedbalski-sec-machine-5 corosync[4584]:  [MAIN  ] 
Completed service synchronization, ready to provide service.

  This happens on all of the units.

  6) After receiving this message, I remove the queue from eth0:

  $ sudo tc qdisk del dev eth0 root netem

  Then, the following statement is written in the master node:

  Mar 28 22:00:48 juju-niedbalski-sec-machine-4 corosync[9630]:  [TOTEM ] A new 
membership (10.5.1.57:11628) was formed. Members
  Mar 28 22:00:48 juju-niedbalski-sec-machine-4 corosync[9630]:  [QUORUM] 
Members[3]: 1002 1001 1000
  Mar 28 22:00:48 juju-niedbalski-sec-machine-4 corosync[9630]:  [MAIN  ] 
Completed service synchronization, ready to provide service.

  
  7) While executing 5 and 6 repeatedly, I ran the following command to track 
the SZ and RSS memory usage of the
  corosync process:

  root@juju-niedbalski-sec-machine-4:/home/ubuntu# tc qdisc add dev eth0 root 
netem delay 350ms
  root@juju-niedbalski-sec-machine-4:/home/ubuntu# tc qdisc del dev eth0 root 
netem

  $ sudo while true; do ps -o sz,rss -p $(pgrep corosync) 2>&1 | grep -E
  '.*[0-9]+.*' | tee -a memory-usage.log && sleep 1; done

  The results show that both sz and rss increase over time at a
  high rate.

  25476 4036

  ... (after 5 minutes).

  135644 10352

  [Fix]

  Based on this reproducer, my preliminary view is that this commit
(https://github.com/corosync/corosync/commit/600fb4084adcbfe7678b44a83fa8f3d3550f48b9)
  is a good candidate for backporting to Ubuntu Trusty.

  [Test Case]

  * See reproducer

  [Backport Impact]

  * Not identified

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1563089/+subscriptions

___
Mailing list: https://launchpad.net/~ubuntu-ha
Post to : ubuntu-ha@lists.launchpad.net
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp


[Ubuntu-ha] [Bug 1563089] [NEW] Memory Leak when new configuration is formed.

2016-03-28 Thread Jorge Niedbalski
Public bug reported:

[Environment]

Trusty 14.04.3

Packages:

ii  corosync 2.3.3-1ubuntu1
amd64Standards-based cluster framework (daemon and modules)
ii  libcorosync-common4  2.3.3-1ubuntu1
amd64Standards-based cluster framework, common library

[Reproducer]


1) I deployed an HA environment using this bundle 
(http://bazaar.launchpad.net/~ost-maintainers/openstack-charm-testing/trunk/view/head:/bundles/dev/next-ha.yaml)
with a 3 nodes installation of cinder related to an HACluster subordinate unit.

$ juju-deployer -c next-ha.yaml -w 600 trusty-kilo

2) I changed the default corosync transport mode to unicast.

$ juju set cinder-hacluster corosync_transport=udpu

3) I verified that the 3 units were quorate

cinder/0# corosync-quorumtool 
Votequorum information
--
Expected votes:   3
Highest expected: 3
Total votes:  3
Quorum:   2  
Flags:Quorate 

Membership information
--
Nodeid  Votes Name
  1002  1 10.5.1.57 (local)
  1001  1 10.5.1.58
  1000  1 10.5.1.59

The primary unit was holding the VIP resource 10.5.105.1/16

root@juju-niedbalski-sec-machine-4:/home/ubuntu# ip addr 
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc netem state UP group 
default qlen 1000
link/ether fa:16:3e:d2:19:6f brd ff:ff:ff:ff:ff:ff
inet 10.5.1.57/16 brd 10.5.255.255 scope global eth0
   valid_lft forever preferred_lft forever
inet 10.5.105.1/16 brd 10.5.255.255 scope global secondary eth0
   valid_lft forever preferred_lft forever

4) I manually added a TC queue for the eth0 interface on the node
holding the VIP resource, introducing a 350 ms delay.

$ sudo tc qdisc add dev eth0 root netem delay 350ms

5) Right after adding the 350ms on the cinder/0 unit, the corosync process 
informs that one of the processors failed, and is forming a new 
cluster configuration.
 
Mar 28 21:57:41 juju-niedbalski-sec-machine-5 corosync[4584]:  [TOTEM ] A 
processor failed, forming new configuration.
Mar 28 22:00:48 juju-niedbalski-sec-machine-5 corosync[4584]:  [TOTEM ] A new 
membership (10.5.1.57:11628) was formed. Members
Mar 28 22:00:48 juju-niedbalski-sec-machine-5 corosync[4584]:  [QUORUM] 
Members[3]: 1002 1001 1000
Mar 28 22:00:48 juju-niedbalski-sec-machine-5 corosync[4584]:  [MAIN  ] 
Completed service synchronization, ready to provide service.

This happens on all of the units.

6) After receiving this message, I remove the queue from eth0:

$ sudo tc qdisk del dev eth0 root netem

Then, the following statement is written in the master node:

Mar 28 22:00:48 juju-niedbalski-sec-machine-4 corosync[9630]:  [TOTEM ] A new 
membership (10.5.1.57:11628) was formed. Members
Mar 28 22:00:48 juju-niedbalski-sec-machine-4 corosync[9630]:  [QUORUM] 
Members[3]: 1002 1001 1000
Mar 28 22:00:48 juju-niedbalski-sec-machine-4 corosync[9630]:  [MAIN  ] 
Completed service synchronization, ready to provide service.


7) While executing 5 and 6 repeatedly, I ran the following command to track the 
SZ and RSS memory usage of the
corosync process:

root@juju-niedbalski-sec-machine-4:/home/ubuntu# tc qdisc add dev eth0 root 
netem delay 350ms
root@juju-niedbalski-sec-machine-4:/home/ubuntu# tc qdisc del dev eth0 root 
netem

$ sudo while true; do ps -o sz,rss -p $(pgrep corosync) 2>&1 | grep -E
'.*[0-9]+.*' | tee -a memory-usage.log && sleep 1; done

The results show that both sz and rss increase over time at a high
rate.

25476 4036

... (after 5 minutes).

135644 10352

[Fix]

Based on this reproducer, my preliminary view is that this commit
(https://github.com/corosync/corosync/commit/600fb4084adcbfe7678b44a83fa8f3d3550f48b9)
is a good candidate for backporting to Ubuntu Trusty.

[Test Case]

* See reproducer

[Backport Impact]

* Not identified

** Affects: corosync (Ubuntu)
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to corosync in Ubuntu.
https://bugs.launchpad.net/bugs/1563089

Title:
  Memory Leak when new configuration is formed.

Status in corosync package in Ubuntu:
  New

Bug description:
  [Environment]

  Trusty 14.04.3

  Packages:

  ii  corosync 2.3.3-1ubuntu1
amd64Standards-based cluster framework (daemon and modules)
  ii  libcorosync-common4  2.3.3-1ubuntu1
amd64Standards-based cluster framework, common library

  [Reproducer]

  
  1) I deployed an HA environment using this bundle 
(http://bazaar.launchpad.net/~ost-maintainers/openstack-charm-testing/trunk/view/head:/bundles/dev/next-ha.yaml)
  with a 3 nodes installation of cinder related to an HACluster subordinate 
unit.

  $ juju-deployer -c next-ha.yaml -w 600 trusty-kilo

  2) I changed the default corosync 

[Ubuntu-ha] [Bug 1477198] Re: Stop doesn't works on Trusty

2015-07-23 Thread Jorge Niedbalski
I have verified that the -proposed package fixes the issue. Thanks.

root@juju-testing-machine-18:/home/ubuntu# service haproxy restart
 * Restarting haproxy haproxy   

   [ OK ] 
root@juju-testing-machine-18:/home/ubuntu# ps aux|grep haproxy
haproxy   8530  0.0  0.0  20300   636 ?Ss   19:47   0:00 
/usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid
root@juju-testing-machine-18:/home/ubuntu# service haproxy stop
 * Stopping haproxy haproxy 

   [ OK ] 
root@juju-testing-machine-18:/home/ubuntu# ps aux|grep haproxy
root@juju-testing-machine-18:/home/ubuntu# service haproxy start
 * Starting haproxy haproxy 

   [ OK ] 
root@juju-testing-machine-18:/home/ubuntu# ps aux|grep haproxy
haproxy   8567  0.0  0.0  20300   632 ?Ss   19:47   0:00 
/usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid
root@juju-testing-machine-18:/home/ubuntu# service haproxy restart
 * Restarting haproxy haproxy   

   [ OK ] 
root@juju-testing-machine-18:/home/ubuntu# service haproxy restart
 * Restarting haproxy haproxy   

   [ OK ] 
root@juju-testing-machine-18:/home/ubuntu# ps aux|grep haproxy
haproxy   8607  0.0  0.0  20300   636 ?Ss   19:47   0:00 
/usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid
root  8611  0.0  0.0  10432   624 pts/0S+   19:47   0:00 grep 
--color=auto haproxy
root@juju-testing-machine-18:/home/ubuntu# 


** Tags removed: verification-needed
** Tags added: verification-done

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to haproxy in Ubuntu.
https://bugs.launchpad.net/bugs/1477198

Title:
  Stop doesn't works on Trusty

Status in haproxy package in Ubuntu:
  Fix Released
Status in haproxy source package in Trusty:
  Fix Committed

Bug description:
  [Description]

  The stop method is not working properly. When I removed the --oknodo
  and --quiet options, it returned "No /usr/sbin/haproxy found running;
  none killed".

  I think this is a regression caused by the incorporation of these lines
  in the stop method:

  + for pid in $(cat $PIDFILE); do
  + start-stop-daemon --quiet --oknodo --stop \
  + --retry 5 --pid $pid --exec $HAPROXY || ret=$?

  root@juju-machine-1-lxc-0:~# service haproxy status
  haproxy is running.
  root@juju-machine-1-lxc-0:~# ps -ef| grep haproxy
  haproxy 1269   1  0 14:33 ?00:00:00 /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid
  root1513 906  0 14:33 pts/600:00:00 grep --color=auto haproxy
  root@juju-machine-1-lxc-0:~# service haproxy restart
   * Restarting haproxy haproxy
     ...done.
  root@juju-machine-1-lxc-0:~# ps -ef| grep haproxy
  haproxy 1269   1  0 14:33 ?00:00:00 /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid
  haproxy 2169   1  0 14:33 ?00:00:00 /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid
  root2277 906  0 14:33 pts/600:00:00 grep --color=auto haproxy
  root@juju-machine-1-lxc-0:~# service haproxy restart
   * Restarting haproxy haproxy
     ...done.
  root@juju-machine-1-lxc-0:~# ps -ef| grep haproxy
  haproxy 1269   1  0 14:33 ?00:00:00 /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid
  haproxy 2169   1  0 14:33 ?00:00:00 /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid
  haproxy 2505   1  0 14:33 ?00:00:00 /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid
  root2523 906  0 14:33 pts/600:00:00 grep --color=auto haproxy
  root@juju-machine-1-lxc-0:~# service haproxy stop
   * Stopping haproxy haproxy
     ...done.
  root@juju-machine-1-lxc-0:~# ps -ef| grep haproxy
  haproxy 1269   1  0 14:33 ?00:00:00 /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid
  haproxy 2169   1  0 14:33 ?00:00:00 /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid
  haproxy 2505   1  0 14:33 ?00:00:00 /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid
  root2584 906  0 14:34 pts/600:00:00 grep 

[Ubuntu-ha] [Bug 1477198] Re: Stop doesn't works on Trusty

2015-07-22 Thread Jorge Niedbalski
** Patch added: Trusty Patch
   
https://bugs.launchpad.net/ubuntu/+source/haproxy/+bug/1477198/+attachment/4432608/+files/fix-lp-1477198-trusty.patch

** Tags added: sts

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to haproxy in Ubuntu.
https://bugs.launchpad.net/bugs/1477198

Title:
  Stop doesn't works on Trusty

Status in haproxy package in Ubuntu:
  Fix Released
Status in haproxy source package in Trusty:
  In Progress

Bug description:
  [Description]

  The stop method is not working properly. When I removed the --oknodo
  and --quiet options, it returned "No /usr/sbin/haproxy found running;
  none killed".

  I think this is a regression caused by the incorporation of these lines
  in the stop method:

  + for pid in $(cat $PIDFILE); do
  + start-stop-daemon --quiet --oknodo --stop \
  + --retry 5 --pid $pid --exec $HAPROXY || ret=$?

  root@juju-machine-1-lxc-0:~# service haproxy status
  haproxy is running.
  root@juju-machine-1-lxc-0:~# ps -ef| grep haproxy
  haproxy 1269   1  0 14:33 ?00:00:00 /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid
  root1513 906  0 14:33 pts/600:00:00 grep --color=auto haproxy
  root@juju-machine-1-lxc-0:~# service haproxy restart
   * Restarting haproxy haproxy
     ...done.
  root@juju-machine-1-lxc-0:~# ps -ef| grep haproxy
  haproxy 1269   1  0 14:33 ?00:00:00 /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid
  haproxy 2169   1  0 14:33 ?00:00:00 /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid
  root2277 906  0 14:33 pts/600:00:00 grep --color=auto haproxy
  root@juju-machine-1-lxc-0:~# service haproxy restart
   * Restarting haproxy haproxy
     ...done.
  root@juju-machine-1-lxc-0:~# ps -ef| grep haproxy
  haproxy 1269   1  0 14:33 ?00:00:00 /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid
  haproxy 2169   1  0 14:33 ?00:00:00 /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid
  haproxy 2505   1  0 14:33 ?00:00:00 /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid
  root2523 906  0 14:33 pts/600:00:00 grep --color=auto haproxy
  root@juju-machine-1-lxc-0:~# service haproxy stop
   * Stopping haproxy haproxy
     ...done.
  root@juju-machine-1-lxc-0:~# ps -ef| grep haproxy
  haproxy 1269   1  0 14:33 ?00:00:00 /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid
  haproxy 2169   1  0 14:33 ?00:00:00 /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid
  haproxy 2505   1  0 14:33 ?00:00:00 /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid
  root2584 906  0 14:34 pts/600:00:00 grep --color=auto haproxy
  root@juju-machine-1-lxc-0:~# service haproxy start
   * Starting haproxy haproxy
     ...done.
  root@juju-machine-1-lxc-0:~# ps -ef| grep haproxy
  haproxy 1269   1  0 14:33 ?00:00:00 /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid
  haproxy 2169   1  0 14:33 ?00:00:00 /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid
  haproxy 2505   1  0 14:33 ?00:00:00 /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid
  haproxy 2591   1  0 14:34 ?00:00:00 /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid
  root2610 906  0 14:34 pts/600:00:00 grep --color=auto haproxy
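
  A quick way to confirm the symptom, assuming the default pidfile path
  /var/run/haproxy.pid (the process count should stay at 1 across
  restarts once stop works again):

  pgrep -c -x haproxy          # number of haproxy processes currently running
  service haproxy restart
  pgrep -c -x haproxy          # with the regression this count keeps growing
  cat /var/run/haproxy.pid     # the pidfile only records the most recent daemon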

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/haproxy/+bug/1477198/+subscriptions

___
Mailing list: https://launchpad.net/~ubuntu-ha
Post to : ubuntu-ha@lists.launchpad.net
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp


[Ubuntu-ha] [Bug 1477198] Re: Stop doesn't works on Trusty

2015-07-22 Thread Jorge Niedbalski
** Changed in: haproxy (Ubuntu Trusty)
   Status: New => In Progress

** Changed in: haproxy (Ubuntu Trusty)
 Assignee: (unassigned) => Jorge Niedbalski (niedbalski)

** Changed in: haproxy (Ubuntu Trusty)
   Importance: Undecided => Critical

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to haproxy in Ubuntu.
https://bugs.launchpad.net/bugs/1477198

Title:
  Stop doesn't works on Trusty

Status in haproxy package in Ubuntu:
  Fix Released
Status in haproxy source package in Trusty:
  In Progress

Bug description:
  [Description]

  The stop method is not working properly. When I removed the --oknodo
  and --quiet options, it returned "No /usr/sbin/haproxy found running;
  none killed".

  I think this is a regression caused by the incorporation of these lines
  in the stop method:

  + for pid in $(cat $PIDFILE); do
  + start-stop-daemon --quiet --oknodo --stop \
  + --retry 5 --pid $pid --exec $HAPROXY || ret=$?

  root@juju-machine-1-lxc-0:~# service haproxy status
  haproxy is running.
  root@juju-machine-1-lxc-0:~# ps -ef| grep haproxy
  haproxy 1269   1  0 14:33 ?00:00:00 /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid
  root1513 906  0 14:33 pts/600:00:00 grep --color=auto haproxy
  root@juju-machine-1-lxc-0:~# service haproxy restart
   * Restarting haproxy haproxy
     ...done.
  root@juju-machine-1-lxc-0:~# ps -ef| grep haproxy
  haproxy 1269   1  0 14:33 ?00:00:00 /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid
  haproxy 2169   1  0 14:33 ?00:00:00 /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid
  root2277 906  0 14:33 pts/600:00:00 grep --color=auto haproxy
  root@juju-machine-1-lxc-0:~# service haproxy restart
   * Restarting haproxy haproxy
     ...done.
  root@juju-machine-1-lxc-0:~# ps -ef| grep haproxy
  haproxy 1269   1  0 14:33 ?00:00:00 /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid
  haproxy 2169   1  0 14:33 ?00:00:00 /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid
  haproxy 2505   1  0 14:33 ?00:00:00 /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid
  root2523 906  0 14:33 pts/600:00:00 grep --color=auto haproxy
  root@juju-machine-1-lxc-0:~# service haproxy stop
   * Stopping haproxy haproxy
     ...done.
  root@juju-machine-1-lxc-0:~# ps -ef| grep haproxy
  haproxy 1269   1  0 14:33 ?00:00:00 /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid
  haproxy 2169   1  0 14:33 ?00:00:00 /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid
  haproxy 2505   1  0 14:33 ?00:00:00 /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid
  root2584 906  0 14:34 pts/600:00:00 grep --color=auto haproxy
  root@juju-machine-1-lxc-0:~# service haproxy start
   * Starting haproxy haproxy
     ...done.
  root@juju-machine-1-lxc-0:~# ps -ef| grep haproxy
  haproxy 1269   1  0 14:33 ?00:00:00 /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid
  haproxy 2169   1  0 14:33 ?00:00:00 /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid
  haproxy 2505   1  0 14:33 ?00:00:00 /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid
  haproxy 2591   1  0 14:34 ?00:00:00 /usr/sbin/haproxy -f 
/etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid
  root2610 906  0 14:34 pts/600:00:00 grep --color=auto haproxy

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/haproxy/+bug/1477198/+subscriptions

___
Mailing list: https://launchpad.net/~ubuntu-ha
Post to : ubuntu-ha@lists.launchpad.net
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp


[Ubuntu-ha] [Bug 1468879] Re: Haproxy doesn't checks for configuration on start/reload

2015-07-15 Thread Jorge Niedbalski
After enabling -proposed and installing haproxy:

Edit /etc/default/haproxy and set ENABLED=1.

I modified an entry in the configuration file /etc/haproxy/haproxy.cfg:
global
log /dev/log local0
log /dev/log local1 notice
chroot /var/lib/haproxy
user haproxy
group haproxy
daemon

defaults
log globalss

service haproxy restart
 * Restarting haproxy haproxy   
   

[ALERT] 195/191904 (5640) : parsing [/etc/haproxy/haproxy.cfg:10] : 'log' 
expects either address[:port] and facility or 'global' as arguments.
[ALERT] 195/191904 (5640) : Error(s) found in configuration file : 
/etc/haproxy/haproxy.cfg
[WARNING] 195/191904 (5640) : config : log format ignored for proxy 'http-in' 
since it has no log address.
[ALERT] 195/191904 (5640) : Fatal errors found in configuration.

[fail]
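
The same check can be run by hand against the broken configuration
(standard haproxy options: -c checks the configuration, -f selects the
file):

haproxy -c -f /etc/haproxy/haproxy.cfg
echo "exit code: $?"   # non-zero is what the patched init script now catches before (re)starting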

** Tags removed: verification-needed
** Tags added: verification-done

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to haproxy in Ubuntu.
https://bugs.launchpad.net/bugs/1468879

Title:
  Haproxy doesn't checks for configuration on start/reload

Status in haproxy package in Ubuntu:
  Fix Released
Status in haproxy source package in Trusty:
  Fix Released

Bug description:
  [Environment]

  Trusty 14.04.2

  [Description]

  Configuration is not tested before service start or reload

  [Suggested Fix]

  Backport current check_haproxy_config function from utopic+.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/haproxy/+bug/1468879/+subscriptions

___
Mailing list: https://launchpad.net/~ubuntu-ha
Post to : ubuntu-ha@lists.launchpad.net
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp


[Ubuntu-ha] [Bug 1468879] Re: Haproxy doesn't checks for configuration on start/reload

2015-06-25 Thread Jorge Niedbalski
** Changed in: haproxy (Ubuntu Trusty)
   Status: New => In Progress

** Changed in: haproxy (Ubuntu Trusty)
 Assignee: (unassigned) => Jorge Niedbalski (niedbalski)

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to haproxy in Ubuntu.
https://bugs.launchpad.net/bugs/1468879

Title:
  Haproxy doesn't checks for configuration on start/reload

Status in haproxy package in Ubuntu:
  Fix Released
Status in haproxy source package in Trusty:
  In Progress

Bug description:
  [Environment]

  Trusty 14.04.2

  [Description]

  Configuration is not tested before service start or reload

  [Suggested Fix]

  Backport current check_haproxy_config function from utopic+.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/haproxy/+bug/1468879/+subscriptions

___
Mailing list: https://launchpad.net/~ubuntu-ha
Post to : ubuntu-ha@lists.launchpad.net
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp


[Ubuntu-ha] [Bug 1468879] [NEW] Haproxy doesn't checks for configuration on start/reload

2015-06-25 Thread Jorge Niedbalski
Public bug reported:

[Environment]

Trusty 14.04.2

[Description]

Configuration is not tested before service start or reload

[Suggested Fix]

Backport current check_haproxy_config function from utopic+.

** Affects: haproxy (Ubuntu)
 Importance: Undecided
 Status: Fix Released

** Changed in: haproxy (Ubuntu)
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to haproxy in Ubuntu.
https://bugs.launchpad.net/bugs/1468879

Title:
  Haproxy doesn't checks for configuration on start/reload

Status in haproxy package in Ubuntu:
  Fix Released

Bug description:
  [Environment]

  Trusty 14.04.2

  [Description]

  Configuration is not tested before service start or reload

  [Suggested Fix]

  Backport current check_haproxy_config function from utopic+.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/haproxy/+bug/1468879/+subscriptions

___
Mailing list: https://launchpad.net/~ubuntu-ha
Post to : ubuntu-ha@lists.launchpad.net
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp


[Ubuntu-ha] [Bug 1462495] Re: Init file does not respect the LSB spec.

2015-06-08 Thread Jorge Niedbalski
** Description changed:

  [Environment]
  
- Trusty 14.02
+ Trusty 14.04.2
  
  [Description]
  
  Looking in the /etc/init.d/haproxy script, particularly the stop method,
  is returning 4 in case of the pidfile doesn't exists.
  
-  /bin/kill $pid || return 4.
+  /bin/kill $pid || return 4.
  
  According to the spec that means 'insufficient privileges' which is not 
correct. This is causing pacemaker and other
  system monitoring tools to fail because it doesn't complains with LSB.
  
  An example:
  
  Jun  2 12:52:13 glance02 crmd[2518]:   notice: process_lrm_event: 
glance02-res_glance_haproxy_monitor_5000:22 [ haproxy dead, but 
/var/run/haproxy.pid exists.\n ]
  Jun  2 12:52:13 glance02 crmd[2518]:   notice: process_lrm_event: LRM 
operation res_glance_haproxy_stop_0 (call=33, rc=4, cib-update=19, 
confirmed=true) insufficient privileges
  
  Reference:
  
  haproxy_stop()
  {
- if [ ! -f $PIDFILE ] ; then
- # This is a success according to LSB
- return 0
- fi
- for pid in $(cat $PIDFILE) ; do
- /bin/kill $pid || return 4
- done
- rm -f $PIDFILE
- return 0
+ if [ ! -f $PIDFILE ] ; then
+ # This is a success according to LSB
+ return 0
+ fi
+ for pid in $(cat $PIDFILE) ; do
+ /bin/kill $pid || return 4
+ done
+ rm -f $PIDFILE
+ return 0
  }
  
  [Proposed solution]
  
  Backport the current devel (wily) init.

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to haproxy in Ubuntu.
https://bugs.launchpad.net/bugs/1462495

Title:
  Init file does not respect the LSB spec.

Status in haproxy package in Ubuntu:
  In Progress
Status in haproxy source package in Trusty:
  New
Status in haproxy source package in Utopic:
  New

Bug description:
  [Environment]

  Trusty 14.04.2

  [Description]

  Looking at the /etc/init.d/haproxy script, the stop method in
  particular returns 4 when killing a pid read from the pidfile fails:

   /bin/kill $pid || return 4

  According to the spec that means 'insufficient privileges', which is not
  correct. This is causing pacemaker and other system monitoring tools to
  fail because the script does not comply with the LSB spec.

  An example:

  Jun  2 12:52:13 glance02 crmd[2518]:   notice: process_lrm_event: 
glance02-res_glance_haproxy_monitor_5000:22 [ haproxy dead, but 
/var/run/haproxy.pid exists.\n ]
  Jun  2 12:52:13 glance02 crmd[2518]:   notice: process_lrm_event: LRM 
operation res_glance_haproxy_stop_0 (call=33, rc=4, cib-update=19, 
confirmed=true) insufficient privileges

  Reference:

  haproxy_stop()
  {
  if [ ! -f $PIDFILE ] ; then
  # This is a success according to LSB
  return 0
  fi
  for pid in $(cat $PIDFILE) ; do
  /bin/kill $pid || return 4
  done
  rm -f $PIDFILE
  return 0
  }

  [Proposed solution]

  Backport the current devel (wily) init.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/haproxy/+bug/1462495/+subscriptions

___
Mailing list: https://launchpad.net/~ubuntu-ha
Post to : ubuntu-ha@lists.launchpad.net
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp


[Ubuntu-ha] [Bug 1462495] [NEW] Init file does not respect the LSB spec.

2015-06-05 Thread Jorge Niedbalski
Public bug reported:

[Environment]

Trusty 14.02

[Description]

Looking at the /etc/init.d/haproxy script, the stop method in particular
returns 4 when killing a pid read from the pidfile fails:

 /bin/kill $pid || return 4

According to the spec that means 'insufficient privileges', which is not
correct. This is causing pacemaker and other system monitoring tools to
fail because the script does not comply with the LSB spec.

An example:

Jun  2 12:52:13 glance02 crmd[2518]:   notice: process_lrm_event: 
glance02-res_glance_haproxy_monitor_5000:22 [ haproxy dead, but 
/var/run/haproxy.pid exists.\n ]
Jun  2 12:52:13 glance02 crmd[2518]:   notice: process_lrm_event: LRM operation 
res_glance_haproxy_stop_0 (call=33, rc=4, cib-update=19, confirmed=true) 
insufficient privileges

Reference:

haproxy_stop()
{
if [ ! -f $PIDFILE ] ; then
# This is a success according to LSB
return 0
fi
for pid in $(cat $PIDFILE) ; do
/bin/kill $pid || return 4
done
rm -f $PIDFILE
return 0
}

[Proposed solution]

Backport the current devel (wily) init.
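
For illustration only (this is not the actual wily init script), a stop
function along these lines would match the LSB expectation that stopping
an already-stopped service is still a success:

haproxy_stop()
{
        if [ ! -f "$PIDFILE" ]; then
                return 0                        # nothing to stop is a success per LSB
        fi
        ret=0
        for pid in $(cat "$PIDFILE"); do
                if kill -0 "$pid" 2>/dev/null; then
                        kill "$pid" || ret=1    # a real kill failure is a generic error (1), not 4
                fi                              # a stale pid (process already gone) is not an error
        done
        [ "$ret" -eq 0 ] && rm -f "$PIDFILE"
        return "$ret"
}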

** Affects: haproxy (Ubuntu)
 Importance: High
 Assignee: Jorge Niedbalski (niedbalski)
 Status: In Progress

** Changed in: haproxy (Ubuntu)
   Status: New => In Progress

** Changed in: haproxy (Ubuntu)
   Importance: Undecided => High

** Changed in: haproxy (Ubuntu)
 Assignee: (unassigned) => Jorge Niedbalski (niedbalski)

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to haproxy in Ubuntu.
https://bugs.launchpad.net/bugs/1462495

Title:
  Init file does not respect the LSB spec.

Status in haproxy package in Ubuntu:
  In Progress

Bug description:
  [Environment]

  Trusty 14.02

  [Description]

  Looking at the /etc/init.d/haproxy script, the stop method in
  particular returns 4 when killing a pid read from the pidfile fails:

   /bin/kill $pid || return 4

  According to the spec that means 'insufficient privileges', which is not
  correct. This is causing pacemaker and other system monitoring tools to
  fail because the script does not comply with the LSB spec.

  An example:

  Jun  2 12:52:13 glance02 crmd[2518]:   notice: process_lrm_event: 
glance02-res_glance_haproxy_monitor_5000:22 [ haproxy dead, but 
/var/run/haproxy.pid exists.\n ]
  Jun  2 12:52:13 glance02 crmd[2518]:   notice: process_lrm_event: LRM 
operation res_glance_haproxy_stop_0 (call=33, rc=4, cib-update=19, 
confirmed=true) insufficient privileges

  Reference:

  haproxy_stop()
  {
  if [ ! -f $PIDFILE ] ; then
  # This is a success according to LSB
  return 0
  fi
  for pid in $(cat $PIDFILE) ; do
  /bin/kill $pid || return 4
  done
  rm -f $PIDFILE
  return 0
  }

  [Proposed solution]

  Backport the current devel (wily) init.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/haproxy/+bug/1462495/+subscriptions

___
Mailing list: https://launchpad.net/~ubuntu-ha
Post to : ubuntu-ha@lists.launchpad.net
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp


[Ubuntu-ha] [Bug 1353473] Re: Pacemaker crm node standby stops resource successfully, but lrmd still monitors it and causes Failed actions

2014-10-10 Thread Jorge Niedbalski
** Tags added: cts

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to pacemaker in Ubuntu.
https://bugs.launchpad.net/bugs/1353473

Title:
  Pacemaker crm node standby stops resource successfully, but lrmd
  still monitors it and causes Failed actions

Status in “pacemaker” package in Ubuntu:
  Fix Released
Status in “pacemaker” source package in Trusty:
  Fix Released
Status in “pacemaker” package in Debian:
  New

Bug description:
  [Impact]

   * Whenever a user runs crm node standby, lrmd can keep trying to
     monitor the resources put into stand-by, causing error messages.

  [Test Case]

   * Use crm node standby and check that lrmd stops monitoring the
     resources on the node put into stand-by, while resources on nodes
     not set to stand-by keep being monitored (rough sketch below).
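
   A rough sketch of that flow (crmsh syntax; the node name and log file
   are taken from this report and may differ on other deployments):

   crm node standby A1LB101                 # resources on A1LB101 should stop
   crm_mon -1                               # with the bug, stale haproxy monitor ops show as Failed actions
   grep haproxy_monitor /var/log/ha-log     # lrmd should no longer log monitor results for the standby node
   crm node online A1LB101                  # bring the node back when done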

  [Regression Potential]

   * users already tested and are using in production.
   * based on upstream fixes for lrmd monitoring.
   * potential race conditions (based on upstream history).

  [Other Info]

   * Original bug description:

  

  It was brought to me (~inaddy) the following situation:

  

  * Environment
  Ubuntu 14.04 LTS
  Pacemaker 1.1.10+git20130802-1ubuntu2

  * Priority
  High

  * Issue
  I used crm node standby and the resource(haproxy) was stopped successfully. 
But lrmd still monitors it and causes Failed actions.

  ---
  Node A1LB101 (167969461): standby
  Online: [ A1LB102 ]

  Resource Group: grpHaproxy
  vip-internal (ocf::heartbeat:IPaddr2): Started A1LB102
  vip-external (ocf::heartbeat:IPaddr2): Started A1LB102
  vip-nfs (ocf::heartbeat:IPaddr2): Started A1LB102
  vip-iscsi (ocf::heartbeat:IPaddr2): Started A1LB102
  Resource Group: grpStonith1
  prmStonith1-1 (stonith:external/stonith-helper): Started A1LB102
  Clone Set: clnHaproxy [haproxy]
  Started: [ A1LB102 ]
  Stopped: [ A1LB101 ]
  Clone Set: clnPing [ping]
  Started: [ A1LB102 ]
  Stopped: [ A1LB101 ]

  Node Attributes:
  * Node A1LB101:
  * Node A1LB102:
  + default_ping_set : 400

  Migration summary:
  * Node A1LB101:
  haproxy: migration-threshold=1 fail-count=18 last-failure='Mon Jul 7 20:28:58 
2014'
  * Node A1LB102:

  Failed actions:
  haproxy_monitor_1 (node=A1LB101, call=2332, rc=7, status=complete, 
last-rc-change=Mon Jul 7 20:28:58 2014
  , queued=0ms, exec=0ms
  ): not running
  ---

  Abstract from log (ha-log.node1)
  Jul 7 20:28:50 A1LB101 crmd[6364]: notice: te_rsc_command: Initiating action 
42: stop haproxy_stop_0 on A1LB101 (local)
  Jul 7 20:28:50 A1LB101 crmd[6364]: info: match_graph_event: Action 
haproxy_stop_0 (42) confirmed on A1LB101 (rc=0)
  Jul 7 20:28:58 A1LB101 crmd[6364]: notice: process_lrm_event: 
A1LB101-haproxy_monitor_1:1372 [ haproxy not running.\n ]

  

  I wasn't able to reproduce this error so far but the fix seems a
  straightforward cherry-picking from upstream patch set fix:

  48f90f6 Fix: services: Do not allow duplicate recurring op entries
  c29ab27 High: lrmd: Merge duplicate recurring monitor operations
  348bb51 Fix: lrmd: Cancel recurring operations before stop action is executed

  So I'm assuming (and testing right now) this will fix the issue...
  Opening the public bug for the fix I'll provide after tests, and to
  ask others to test the fix also.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1353473/+subscriptions

___
Mailing list: https://launchpad.net/~ubuntu-ha
Post to : ubuntu-ha@lists.launchpad.net
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp


[Ubuntu-ha] [Bug 1368737] Re: Pacemaker can seg fault on crm node online/standy

2014-10-10 Thread Jorge Niedbalski
** Tags added: cts

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to pacemaker in Ubuntu.
https://bugs.launchpad.net/bugs/1368737

Title:
  Pacemaker can seg fault on crm node online/standy

Status in “pacemaker” package in Ubuntu:
  Confirmed

Bug description:
  It was brought to my attention the following situation:

  
  [Issue] 

  lrmd process crashed when repeating crm node standby and crm node
  online

   
  # grep pacemakerd ha-log.k1pm101 | grep core 
  Aug 27 17:47:06 k1pm101 pacemakerd[49271]: error: child_waitpid: Managed 
process 49275 (lrmd) dumped core 
  Aug 27 17:47:06 k1pm101 pacemakerd[49271]: notice: pcmk_child_exit: Child 
process lrmd terminated with signal 11 (pid=49275, core=1) 
  Aug 27 18:27:14 k1pm101 pacemakerd[49271]: error: child_waitpid: Managed 
process 1471 (lrmd) dumped core 
  Aug 27 18:27:14 k1pm101 pacemakerd[49271]: notice: pcmk_child_exit: Child 
process lrmd terminated with signal 11 (pid=1471, core=1) 
  Aug 27 18:56:41 k1pm101 pacemakerd[49271]: error: child_waitpid: Managed 
process 35771 (lrmd) dumped core 
  Aug 27 18:56:41 k1pm101 pacemakerd[49271]: notice: pcmk_child_exit: Child 
process lrmd terminated with signal 11 (pid=35771, core=1) 
  Aug 27 19:44:09 k1pm101 pacemakerd[49271]: error: child_waitpid: Managed 
process 60709 (lrmd) dumped core 
  Aug 27 19:44:09 k1pm101 pacemakerd[49271]: notice: pcmk_child_exit: Child 
process lrmd terminated with signal 11 (pid=60709, core=1) 
  Aug 27 20:00:53 k1pm101 pacemakerd[49271]: error: child_waitpid: Managed 
process 35838 (lrmd) dumped core 
  Aug 27 20:00:53 k1pm101 pacemakerd[49271]: notice: pcmk_child_exit: Child 
process lrmd terminated with signal 11 (pid=35838, core=1) 
  Aug 27 21:33:52 k1pm101 pacemakerd[49271]: error: child_waitpid: Managed 
process 49249 (lrmd) dumped core 
  Aug 27 21:33:52 k1pm101 pacemakerd[49271]: notice: pcmk_child_exit: Child 
process lrmd terminated with signal 11 (pid=49249, core=1) 
  Aug 27 22:01:16 k1pm101 pacemakerd[49271]: error: child_waitpid: Managed 
process 65358 (lrmd) dumped core 
  Aug 27 22:01:16 k1pm101 pacemakerd[49271]: notice: pcmk_child_exit: Child 
process lrmd terminated with signal 11 (pid=65358, core=1) 
  Aug 27 22:28:02 k1pm101 pacemakerd[49271]: error: child_waitpid: Managed 
process 22693 (lrmd) dumped core 
  Aug 27 22:28:02 k1pm101 pacemakerd[49271]: notice: pcmk_child_exit: Child 
process lrmd terminated with signal 11 (pid=22693, core=1) 
   

   
  # grep pacemakerd ha-log.k1pm102 | grep core 
  Aug 27 15:32:48 k1pm102 pacemakerd[5808]: error: child_waitpid: Managed 
process 5812 (lrmd) dumped core 
  Aug 27 15:32:48 k1pm102 pacemakerd[5808]: notice: pcmk_child_exit: Child 
process lrmd terminated with signal 11 (pid=5812, core=1) 
  Aug 27 15:52:52 k1pm102 pacemakerd[5808]: error: child_waitpid: Managed 
process 35781 (lrmd) dumped core 
  Aug 27 15:52:52 k1pm102 pacemakerd[5808]: notice: pcmk_child_exit: Child 
process lrmd terminated with signal 11 (pid=35781, core=1) 
  Aug 27 16:02:54 k1pm102 pacemakerd[5808]: error: child_waitpid: Managed 
process 51984 (lrmd) dumped core 
  Aug 27 16:02:54 k1pm102 pacemakerd[5808]: notice: pcmk_child_exit: Child 
process lrmd terminated with signal 11 (pid=51984, core=1) 
  

  Analyzing core file with dbgsyms I could see that:

  #0  0x7f7184a45983 in services_action_sync (op=0x7f7185b605d0) at 
services.c:434
  434   crm_trace("   stdout: %s", op->stdout_data);

  Is responsible for the core.

  I've checked upstream code and there might be 2 important commits that
  could be cherry-picked to fix this behavior:

  commit f2a637cc553cb7aec59bdcf05c5e1d077173419f
  Author: Andrew Beekhof and...@beekhof.net
  Date:   Fri Sep 20 12:20:36 2013 +1000

  Fix: services: Prevent use-of-NULL when executing service actions

  commit 11473a5a8c88eb17d5e8d6cd1d99dc497e817aac
  Author: Gao,Yan y...@suse.com
  Date:   Sun Sep 29 12:40:18 2013 +0800

  Fix: services: Fix the executing of synchronous actions

  The core can be caused by things such as this missing code:

  if (op == NULL) { 
  crm_trace("No operation to execute");
  return FALSE; 

  on the beginning of services_action_sync(svc_action_t * op)
  function.

  And improved by commit #11473a5.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1368737/+subscriptions

___
Mailing list: https://launchpad.net/~ubuntu-ha
Post to : ubuntu-ha@lists.launchpad.net
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp