I'm seeing this problem in another environment with a similar deployment (3 LXC containers):
Apr 20 16:39:26 juju-machine-3-lxc-4 crm_verify[31774]: notice: crm_log_args: Invoked: crm_verify -V -p
Apr 20 16:39:27 juju-machine-3-lxc-4 cibadmin[31786]: notice: crm_log_args: Invoked: cibadmin -p -P
Apr 20 16:50:01 juju-machine-3-lxc-4 cib[780]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Apr 20 16:50:01 juju-machine-3-lxc-4 cib[780]: error: cib_cs_destroy: Corosync connection lost! Exiting.
Apr 20 16:50:01 juju-machine-3-lxc-4 crmd[785]: error: crmd_quorum_destroy: connection terminated
Apr 20 16:50:01 juju-machine-3-lxc-4 attrd[783]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Apr 20 16:50:01 juju-machine-3-lxc-4 stonith-ng[781]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Apr 20 16:50:01 juju-machine-3-lxc-4 crmd[785]: notice: crmd_exit: Forcing immediate exit: Link has been severed (67)
Apr 20 16:50:01 juju-machine-3-lxc-4 lrmd[782]: warning: qb_ipcs_event_sendv: new_event_notification (782-785-6): Bad file descriptor (9)
Apr 20 16:50:01 juju-machine-3-lxc-4 lrmd[782]: warning: send_client_notify: Notification of client crmd/8ad990ba-cf09-4ba3-b74b-a7d05d377a1b failed
Apr 20 16:50:01 juju-machine-3-lxc-4 lrmd[782]: error: crm_abort: crm_glib_handler: Forked child 760 to record non-fatal assert at logging.c:63 : Source ID 4601370 was not found when attempting to remove it
Apr 20 16:50:01 juju-machine-3-lxc-4 pacemakerd[773]: error: pcmk_child_exit: Child process cib (780) exited: Invalid argument (22)
Apr 20 16:50:01 juju-machine-3-lxc-4 pacemakerd[773]: notice: pcmk_process_exit: Respawning failed child process: cib
Apr 20 16:50:01 juju-machine-3-lxc-4 pacemakerd[773]: error: pcmk_child_exit: Child process crmd (785) exited: Link has been severed (67)
Apr 20 16:50:01 juju-machine-3-lxc-4 pacemakerd[773]: notice: pcmk_process_exit: Respawning failed child process: crmd
Apr 20 16:50:01 juju-machine-3-lxc-4 attrd[783]: crit: attrd_cs_destroy: Lost connection to Corosync service!
Apr 20 16:50:01 juju-machine-3-lxc-4 attrd[783]: notice: main: Exiting...
Apr 20 16:50:01 juju-machine-3-lxc-4 attrd[783]: notice: main: Disconnecting client 0x7ff985e478e0, pid=785...
Apr 20 16:50:01 juju-machine-3-lxc-4 pacemakerd[773]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Apr 20 16:50:01 juju-machine-3-lxc-4 pacemakerd[773]: error: mcp_cpg_destroy: Connection destroyed
Apr 20 16:50:01 juju-machine-3-lxc-4 attrd[783]: error: attrd_cib_connection_destroy: Connection to the CIB terminated...
Apr 20 16:50:01 juju-machine-3-lxc-4 cib[761]: debug: crm_update_callsites: Enabling callsites based on priority=7, files=(null), functions=(null), formats=(null), tags=(null)
Apr 20 16:50:01 juju-machine-3-lxc-4 crmd[767]: debug: crm_update_callsites: Enabling callsites based on priority=7, files=(null), functions=(null), formats=(null), tags=(null)
Apr 20 16:50:01 juju-machine-3-lxc-4 crmd[767]: notice: main: CRM Git Version: 42f2063
Apr 20 16:50:01 juju-machine-3-lxc-4 stonith-ng[781]: error: stonith_peer_cs_destroy: Corosync connection terminated
Apr 20 16:50:01 juju-machine-3-lxc-4 cib[761]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
Apr 20 16:50:01 juju-machine-3-lxc-4 cib[761]: error: cluster_connect_cpg: Could not connect to the Cluster Process Group API: 2
Apr 20 16:50:01 juju-machine-3-lxc-4 cib[761]: crit: cib_init: Cannot sign in to the cluster... terminating
Apr 20 16:50:02 juju-machine-3-lxc-4 crmd[767]: warning: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
Apr 20 16:50:05 juju-machine-3-lxc-4 crmd[767]: warning: do_cib_control: Couldn't complete CIB registration 2 times... pause and retry

These are the only related processes still running on one of the nodes:

root      782  0.0  0.0  81464  1828 ?  Ss   Feb12  25:13 /usr/lib/pacemaker/lrmd
haclust+  784  0.0  0.0  73920   776 ?  Ss   Feb12   8:25 /usr/lib/pacemaker/pengine
root      780  0.8  0.0 130256  4152 ?  Ssl  16:50   0:00 /usr/sbin/corosync

A possible explanation could be:
http://thread.gmane.org/gmane.linux.highavailability.corosync/592/focus=639

I only have logs for one of the nodes; I'm trying to get the logs from the other two nodes to better understand what was happening with the communication.
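For what it's worth, this is roughly what I run on the affected container to confirm that corosync itself came back up while the pacemaker daemons are stuck unable to rejoin the CPG. The tools are the standard ones from the trusty corosync/pacemaker packages; the final "service pacemaker restart" is only a guess at a workaround (it assumes the problem is just that the pacemaker children cannot re-register with the CPG API after corosync restarts underneath them), and I haven't confirmed it is safe on a node that still holds resources:

# which corosync/pacemaker processes survived
ps -eo pid,user,cmd | grep -E '[c]orosync|[p]acemaker'

# corosync's own view: ring status and quorum
corosync-cfgtool -s
corosync-quorumtool -s

# pacemaker's view; on the affected node this fails or hangs
crm_mon -1

# possible workaround (untested here): restart pacemaker so its
# daemons reconnect to the freshly started corosync
service pacemaker restart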
