Public bug reported:

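We've seen this a few times with three-node clusters, all running in LXC
containers: pacemaker fails to restart correctly because it can't communicate
with corosync, resulting in a down cluster. Rebooting the containers
resolves the issue, so we suspect some sort of bad state in either corosync
or pacemaker.

In the log below, pacemakerd's attempt to let group 115 connect to corosync
fails with "Library error (2)". Corosync then denies IPC connections from
109:115, so attrd and cib cannot sign in to the cluster and exit, and
pacemakerd shuts itself down.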

Apr  2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]:   notice: 
mcp_read_config: Configured corosync to accept connections from group 115: 
Library error (2)
Apr  2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]:   notice: main: 
Starting Pacemaker 1.1.10 (Build: 42f2063):  generated-manpages agent-manpages 
ncurses libqb-logging libqb-ipc lha-fencing upstart nagios  heartbeat 
corosync-native snmp libesmtp
Apr  2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]:   notice: 
cluster_connect_quorum: Quorum acquired
Apr  2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]:   notice: 
corosync_node_name: Unable to get node name for nodeid 1000
Apr  2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]:   notice: 
corosync_node_name: Unable to get node name for nodeid 1001
Apr  2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]:   notice: 
corosync_node_name: Unable to get node name for nodeid 1003
Apr  2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]:   notice: 
corosync_node_name: Unable to get node name for nodeid 1001
Apr  2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]:   notice: 
get_node_name: Defaulting to uname -n for the local corosync node name
Apr  2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]:   notice: 
crm_update_peer_state: pcmk_quorum_notification: Node 
juju-machine-4-lxc-4[1001] - state is now member (was (null))
Apr  2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]:   notice: 
corosync_node_name: Unable to get node name for nodeid 1003
Apr  2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]:   notice: 
crm_update_peer_state: pcmk_quorum_notification: Node (null)[1003] - state is 
now member (was (null))
Apr  2 11:41:32 juju-machine-4-lxc-4 crmd[1033748]:   notice: main: CRM Git 
Version: 42f2063
Apr  2 11:41:32 juju-machine-4-lxc-4 stonith-ng[1033744]:   notice: 
crm_cluster_connect: Connecting to cluster infrastructure: corosync
Apr  2 11:41:32 juju-machine-4-lxc-4 stonith-ng[1033744]:   notice: 
corosync_node_name: Unable to get node name for nodeid 1001
Apr  2 11:41:32 juju-machine-4-lxc-4 stonith-ng[1033744]:   notice: 
get_node_name: Defaulting to uname -n for the local corosync node name
Apr  2 11:41:32 juju-machine-4-lxc-4 attrd[1033746]:   notice: 
crm_cluster_connect: Connecting to cluster infrastructure: corosync
Apr  2 11:41:32 juju-machine-4-lxc-4 corosync[1033732]:  [MAIN  ] Denied 
connection attempt from 109:115
Apr  2 11:41:32 juju-machine-4-lxc-4 corosync[1033732]:  [QB    ] Invalid IPC 
credentials (1033732-1033746).
Apr  2 11:41:32 juju-machine-4-lxc-4 attrd[1033746]:    error: 
cluster_connect_cpg: Could not connect to the Cluster Process Group API: 11
Apr  2 11:41:32 juju-machine-4-lxc-4 attrd[1033746]:    error: main: HA Signon 
failed
Apr  2 11:41:32 juju-machine-4-lxc-4 attrd[1033746]:    error: main: Aborting 
startup
Apr  2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]:    error: 
pcmk_child_exit: Child process attrd (1033746) exited: Network is down (100)
Apr  2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]:  warning: 
pcmk_child_exit: Pacemaker child process attrd no longer wishes to be 
respawned. Shutting ourselves down.
Apr  2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]:   notice: 
pcmk_shutdown_worker: Shuting down Pacemaker
Apr  2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]:   notice: stop_child: 
Stopping crmd: Sent -15 to process 1033748
Apr  2 11:41:32 juju-machine-4-lxc-4 crmd[1033748]:  warning: do_cib_control: 
Couldn't complete CIB registration 1 times... pause and retry
Apr  2 11:41:32 juju-machine-4-lxc-4 crmd[1033748]:   notice: crm_shutdown: 
Requesting shutdown, upper limit is 1200000ms
Apr  2 11:41:32 juju-machine-4-lxc-4 crmd[1033748]:  warning: do_log: FSA: 
Input I_SHUTDOWN from crm_shutdown() received in state S_STARTING
Apr  2 11:41:32 juju-machine-4-lxc-4 crmd[1033748]:   notice: 
do_state_transition: State transition S_STARTING -> S_STOPPING [ 
input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
Apr  2 11:41:32 juju-machine-4-lxc-4 cib[1033743]:   notice: 
crm_cluster_connect: Connecting to cluster infrastructure: corosync
Apr  2 11:41:32 juju-machine-4-lxc-4 crmd[1033748]:   notice: 
terminate_cs_connection: Disconnecting from Corosync
Apr  2 11:41:32 juju-machine-4-lxc-4 corosync[1033732]:  [MAIN  ] Denied 
connection attempt from 109:115
Apr  2 11:41:32 juju-machine-4-lxc-4 corosync[1033732]:  [QB    ] Invalid IPC 
credentials (1033732-1033743).
Apr  2 11:41:32 juju-machine-4-lxc-4 cib[1033743]:    error: 
cluster_connect_cpg: Could not connect to the Cluster Process Group API: 11
Apr  2 11:41:32 juju-machine-4-lxc-4 cib[1033743]:     crit: cib_init: Cannot 
sign in to the cluster... terminating
Apr  2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]:   notice: stop_child: 
Stopping pengine: Sent -15 to process 1033747
Apr  2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]:    error: 
pcmk_child_exit: Child process cib (1033743) exited: Network is down (100)
Apr  2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]:  warning: 
pcmk_child_exit: Pacemaker child process cib no longer wishes to be respawned. 
Shutting ourselves down.
Apr  2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]:   notice: stop_child: 
Stopping lrmd: Sent -15 to process 1033745
Apr  2 11:41:32 juju-machine-4-lxc-4 pacemakerd[1033741]:   notice: stop_child: 
Stopping stonith-ng: Sent -15 to process 1033744
Apr  2 11:41:34 juju-machine-4-lxc-4 corosync[1033732]:  [TOTEM ] A new 
membership (10.245.160.62:284) was formed. Members joined: 1000
Apr  2 11:41:41 juju-machine-4-lxc-4 stonith-ng[1033744]:    error: setup_cib: 
Could not connect to the CIB service: Transport endpoint is not connected (-107)
Apr  2 11:41:41 juju-machine-4-lxc-4 pacemakerd[1033741]:   notice: 
pcmk_shutdown_worker: Shutdown complete
Apr  2 11:41:41 juju-machine-4-lxc-4 pacemakerd[1033741]:   notice: 
pcmk_shutdown_worker: Attempting to inhibit respawning after fatal error
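Checking other containers for the same failure by hand is tedious, so here is a sketch of a helper that scans syslog for this signature. It is hypothetical and not part of pacemaker or corosync. It makes three assumptions: syslog entries are unwrapped, one per line, in the format shown above; the trailing number in the mcp_read_config message is a corosync cs_error_t code where 1 means CS_OK; and a host matches when pacemakerd failed to authorize its group and corosync later denied IPC from that same gid.

```python
"""Hypothetical helper: scan syslog for the pacemaker/corosync IPC failure
signature seen in this report. Not part of pacemaker or corosync."""
import re
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Tuple

CS_OK = 1  # assumed: corosync cs_error_t success value

# pacemakerd reports the result of authorizing its group as "<text> (<rc>)".
GID_CONFIG_RE = re.compile(
    r"mcp_read_config: Configured corosync to accept connections "
    r"from group (\d+): (.+) \((\d+)\)\s*$"
)
# corosync logs the uid:gid of each client whose IPC connection it denies.
DENIED_RE = re.compile(r"Denied connection attempt from (\d+):(\d+)")
# pacemakerd logs each child that exits, with the exit code in parentheses.
CHILD_EXIT_RE = re.compile(
    r"pcmk_child_exit: Child process (\S+) \((\d+)\) exited: (.+) \((\d+)\)"
)


@dataclass
class Signature:
    configured_gid: Optional[int] = None
    configure_rc: Optional[int] = None
    denied: List[Tuple[int, int]] = field(default_factory=list)  # (uid, gid)
    child_exits: List[Tuple[str, int]] = field(default_factory=list)  # (name, code)

    def matches(self) -> bool:
        """True if pacemakerd failed to authorize its group with corosync
        and corosync denied IPC from that same gid."""
        return (
            self.configured_gid is not None
            and self.configure_rc != CS_OK
            and any(gid == self.configured_gid for _, gid in self.denied)
        )


def scan(lines: Iterable[str]) -> Signature:
    """Collect the relevant events from unwrapped syslog lines."""
    sig = Signature()
    for line in lines:
        m = GID_CONFIG_RE.search(line)
        if m:
            # Keep the most recent attempt if pacemakerd started several times.
            sig.configured_gid = int(m.group(1))
            sig.configure_rc = int(m.group(3))
            continue
        m = DENIED_RE.search(line)
        if m:
            sig.denied.append((int(m.group(1)), int(m.group(2))))
            continue
        m = CHILD_EXIT_RE.search(line)
        if m:
            sig.child_exits.append((m.group(1), int(m.group(4))))
    return sig
```

Fed the entries from the log above (one per line), this records gid 115 with rc 2, two denials from 109:115, and attrd and cib exiting with code 100, so `matches()` returns True. It compares gids rather than uids because pacemakerd authorizes a group, while corosync's denial message reports the client's uid:gid.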

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: pacemaker 1.1.10+git20130802-1ubuntu2.3
ProcVersionSignature: Ubuntu 3.16.0-33.44~14.04.1-generic 3.16.7-ckt7
Uname: Linux 3.16.0-33-generic x86_64
NonfreeKernelModules: vhost_net vhost macvtap macvlan xt_conntrack ipt_REJECT 
ip6table_filter ip6_tables ebtable_nat ebtables veth 8021q garp xt_CHECKSUM mrp 
iptable_mangle ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 
nf_nat_ipv4 nf_nat nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables nbd 
ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp 
libiscsi scsi_transport_iscsi openvswitch gre vxlan dm_crypt bridge 
dm_multipath intel_rapl stp scsi_dh x86_pkg_temp_thermal llc intel_powerclamp 
coretemp ioatdma kvm_intel ipmi_si joydev sb_edac kvm hpwdt hpilo dca 
ipmi_msghandler acpi_power_meter edac_core lpc_ich shpchp serio_raw mac_hid xfs 
libcrc32c btrfs xor raid6_pq hid_generic usbhid hid crct10dif_pclmul 
crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul 
glue_helper ablk_helper cryptd psmouse tg3 ptp pata_acpi hpsa pps_core
ApportVersion: 2.14.1-0ubuntu3.7
Architecture: amd64
Date: Thu Apr  2 11:42:18 2015
SourcePackage: pacemaker
UpgradeStatus: No upgrade log present (probably fresh install)

** Affects: pacemaker (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: amd64 apport-bug trusty uec-images

-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to pacemaker in Ubuntu.
https://bugs.launchpad.net/bugs/1439649

Title:
  Pacemaker unable to communicate with corosync on restart