@nacc:

The error condition was that when corosync restarted, pacemaker
disconnected (as was normal) and then tried reconnecting, but when
reconnecting ran into this error:

error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library
error (2)

So, pacemaker is trying to re-handshake with the revived corosync, and
when it does, the api fails due to a library error.  Given that it's the
CPG API, and the libcpg4 package was updated, I'd guess that there was
an incompatible patch added to the libcpg4 library was incompatible with
the previous version of libcpg4 that was in-memory linked into the
running pacemaker binary.  Once we restarted the dead pacemaker service,
pacemaker reloaded the new library and was able to connect to the CPG
API as normal.

I don't know if that's a library failure or a change to the CPG API that
was not version-compatible with the previously running version of
libcpg4 whenever the dying pacemaker had been started.

The issue occured in trusty and xenial clouds across Mitaka and Ocata
cloud archives.

** Changed in: corosync (Ubuntu)
       Status: Incomplete => In Progress

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to pacemaker in Ubuntu.
https://bugs.launchpad.net/bugs/1740892

Title:
  corosync upgrade on 2018-01-02 caused pacemaker to fail

Status in OpenStack hacluster charm:
  Invalid
Status in corosync package in Ubuntu:
  In Progress
Status in pacemaker package in Ubuntu:
  New

Bug description:
  During upgrades on 2018-01-02, corosync and it's libs were upgraded:

  (from a trusty/mitaka cloud)

  Upgrade: libcmap4:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4),
  corosync:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libcfg6:amd64
  (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libcpg4:amd64 (2.3.3-1ubuntu3,
  2.3.3-1ubuntu4), libquorum5:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4),
  libcorosync-common4:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4),
  libsam4:amd64 (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libvotequorum6:amd64
  (2.3.3-1ubuntu3, 2.3.3-1ubuntu4), libtotem-pg5:amd64 (2.3.3-1ubuntu3,
  2.3.3-1ubuntu4)

  During this process, it appears that pacemaker service is restarted
  and it errors:

  syslog:Jan  2 16:09:33 juju-machine-0-lxc-4 pacemakerd[1994]:   notice: 
crm_update_peer_state: pcmk_quorum_notification: Node 
juju-machine-1-lxc-3[1001] - state is now lost (was member)
  syslog:Jan  2 16:09:34 juju-machine-0-lxc-4 pacemakerd[1994]:   notice: 
crm_update_peer_state: pcmk_quorum_notification: Node 
juju-machine-1-lxc-3[1001] - state is now member (was lost)
  syslog:Jan  2 16:14:32 juju-machine-0-lxc-4 pacemakerd[1994]:    error: 
cfg_connection_destroy: Connection destroyed
  syslog:Jan  2 16:14:32 juju-machine-0-lxc-4 pacemakerd[1994]:   notice: 
pcmk_shutdown_worker: Shuting down Pacemaker
  syslog:Jan  2 16:14:32 juju-machine-0-lxc-4 pacemakerd[1994]:   notice: 
stop_child: Stopping crmd: Sent -15 to process 2050
  syslog:Jan  2 16:14:32 juju-machine-0-lxc-4 pacemakerd[1994]:    error: 
pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
  syslog:Jan  2 16:14:32 juju-machine-0-lxc-4 pacemakerd[1994]:    error: 
mcp_cpg_destroy: Connection destroyed

  
  Also affected xenial/ocata

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-hacluster/+bug/1740892/+subscriptions

_______________________________________________
Mailing list: https://launchpad.net/~ubuntu-ha
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp

Reply via email to