I'm seeing this problem in another environment with a similar deployment (3 LXC containers):
Apr 20 16:39:26 juju-machine-3-lxc-4 crm_verify[31774]: notice: crm_log_args: Invoked: crm_verify -V -p
Apr 20 16:39:27 juju-machine-3-lxc-4 cibadmin[31786]: notice: crm_log_args: Invoked: cibadmin -p -P
Apr 20 16:50:01 juju-machine-3-lxc-4 cib[780]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Apr 20 16:50:01 juju-machine-3-lxc-4 cib[780]: error: cib_cs_destroy: Corosync connection lost! Exiting.
Apr 20 16:50:01 juju-machine-3-lxc-4 crmd[785]: error: crmd_quorum_destroy: connection terminated
Apr 20 16:50:01 juju-machine-3-lxc-4 attrd[783]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Apr 20 16:50:01 juju-machine-3-lxc-4 stonith-ng[781]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Apr 20 16:50:01 juju-machine-3-lxc-4 crmd[785]: notice: crmd_exit: Forcing immediate exit: Link has been severed (67)
Apr 20 16:50:01 juju-machine-3-lxc-4 lrmd[782]: warning: qb_ipcs_event_sendv: new_event_notification (782-785-6): Bad file descriptor (9)
Apr 20 16:50:01 juju-machine-3-lxc-4 lrmd[782]: warning: send_client_notify: Notification of client crmd/8ad990ba-cf09-4ba3-b74b-a7d05d377a1b failed
Apr 20 16:50:01 juju-machine-3-lxc-4 lrmd[782]: error: crm_abort: crm_glib_handler: Forked child 760 to record non-fatal assert at logging.c:63 : Source ID 4601370 was not found when attempting to remove it
Apr 20 16:50:01 juju-machine-3-lxc-4 pacemakerd[773]: error: pcmk_child_exit: Child process cib (780) exited: Invalid argument (22)
Apr 20 16:50:01 juju-machine-3-lxc-4 pacemakerd[773]: notice: pcmk_process_exit: Respawning failed child process: cib
Apr 20 16:50:01 juju-machine-3-lxc-4 pacemakerd[773]: error: pcmk_child_exit: Child process crmd (785) exited: Link has been severed (67)
Apr 20 16:50:01 juju-machine-3-lxc-4 pacemakerd[773]: notice: pcmk_process_exit: Respawning failed child process: crmd
Apr 20 16:50:01 juju-machine-3-lxc-4 attrd[783]: crit: attrd_cs_destroy: Lost connection to Corosync service!
Apr 20 16:50:01 juju-machine-3-lxc-4 attrd[783]: notice: main: Exiting...
Apr 20 16:50:01 juju-machine-3-lxc-4 attrd[783]: notice: main: Disconnecting client 0x7ff985e478e0, pid=785...
Apr 20 16:50:01 juju-machine-3-lxc-4 pacemakerd[773]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Apr 20 16:50:01 juju-machine-3-lxc-4 pacemakerd[773]: error: mcp_cpg_destroy: Connection destroyed
Apr 20 16:50:01 juju-machine-3-lxc-4 attrd[783]: error: attrd_cib_connection_destroy: Connection to the CIB terminated...
Apr 20 16:50:01 juju-machine-3-lxc-4 cib[761]: debug: crm_update_callsites: Enabling callsites based on priority=7, files=(null), functions=(null), formats=(null), tags=(null)
Apr 20 16:50:01 juju-machine-3-lxc-4 crmd[767]: debug: crm_update_callsites: Enabling callsites based on priority=7, files=(null), functions=(null), formats=(null), tags=(null)
Apr 20 16:50:01 juju-machine-3-lxc-4 crmd[767]: notice: main: CRM Git Version: 42f2063
Apr 20 16:50:01 juju-machine-3-lxc-4 stonith-ng[781]: error: stonith_peer_cs_destroy: Corosync connection terminated
Apr 20 16:50:01 juju-machine-3-lxc-4 cib[761]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
Apr 20 16:50:01 juju-machine-3-lxc-4 cib[761]: error: cluster_connect_cpg: Could not connect to the Cluster Process Group API: 2
Apr 20 16:50:01 juju-machine-3-lxc-4 cib[761]: crit: cib_init: Cannot sign in to the cluster... terminating
Apr 20 16:50:02 juju-machine-3-lxc-4 crmd[767]: warning: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
Apr 20 16:50:05 juju-machine-3-lxc-4 crmd[767]: warning: do_cib_control: Couldn't complete CIB registration 2 times... pause and retry

These are the only related processes still running on one of the nodes:

root      782  0.0  0.0  81464  1828 ?  Ss   Feb12  25:13 /usr/lib/pacemaker/lrmd
haclust+  784  0.0  0.0  73920   776 ?  Ss   Feb12   8:25 /usr/lib/pacemaker/pengine
root      780  0.8  0.0 130256  4152 ?  Ssl  16:50   0:00 /usr/sbin/corosync

A possible explanation could be:
http://thread.gmane.org/gmane.linux.highavailability.corosync/592/focus=639

I only have logs for one of the nodes; I'm trying to get the logs from the other two nodes to better understand what was happening with the communication.
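For what it's worth, this is roughly what I run on the affected container to confirm that corosync itself came back up while the pacemaker daemons are stuck unable to rejoin the CPG. The tools are the standard ones from the trusty corosync/pacemaker packages; the final "service pacemaker restart" is only a guess at a workaround (it assumes the problem is just that the pacemaker children cannot re-register with the CPG API after corosync restarts underneath them), and I haven't confirmed it is safe on a node that still holds resources:

# which corosync/pacemaker processes survived
ps -eo pid,user,cmd | grep -E '[c]orosync|[p]acemaker'

# corosync's own view: ring status and quorum
corosync-cfgtool -s
corosync-quorumtool -s

# pacemaker's view; on the affected node this fails or hangs
crm_mon -1

# possible workaround (untested here): restart pacemaker so its
# daemons reconnect to the freshly started corosync
service pacemaker restart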
