** Also affects: pacemaker (Ubuntu)
Importance: Undecided
Status: New
** Description changed:
I'm running a two-node HA cluster with pacemaker/corosync and a pretty
simple configuration: only an IP address, one service and two clone
sets of resources are managed (see below). However, I run into constant
crashes of pacemaker (it looked like corosync at first) on both nodes.
At the moment this behaviour makes the cluster unusable.
I attached the cluster config, cib.xml and the crashdumps to the bug;
hopefully someone can make something of it.
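For anyone triaging this, the attached logs and CIB can be regenerated in one go with pacemaker's `crm_report` tool. A sketch only: the time window and output path below are placeholders chosen to bracket the timestamps in corosync.log, not values from the original report.

```shell
# Collect logs, the CIB, and crash data from both nodes for the
# period around the crash (adjust the window to your own timestamps).
crm_report -f "2014-06-06 15:00:00" -t "2014-06-06 15:30:00" \
    -n "lbsrv51 lbsrv52" /tmp/lp1327222-report
```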
~# crm_mon -1
Last updated: Fri Jun 6 15:43:14 2014
Last change: Fri Jun 6 10:28:17 2014 via cibadmin on lbsrv52
Stack: corosync
Current DC: lbsrv51 (1) - partition with quorum
Version: 1.1.10-42f2063
2 Nodes configured
6 Resources configured
Online: [ lbsrv51 lbsrv52 ]
Resource Group: grp_HAProxy-Front-IPs
    res_IPaddr2_Test (ocf::heartbeat:IPaddr2): Started lbsrv51
    res_pdnsd_pdnsd (lsb:pdnsd): Started lbsrv51
Clone Set: cl_isc-dhcp-server_1 [res_isc-dhcp-server_1]
    Started: [ lbsrv51 lbsrv52 ]
Clone Set: cl_tftpd-hpa_1 [res_tftpd-hpa_1]
    Started: [ lbsrv51 lbsrv52 ]
== corosync.log: ==
Jun 06 15:14:56 [2324] lbsrv51 cib: error: pcmk_cpg_dispatch:
Connection to the CPG API failed: Library error (2)
Jun 06 15:14:56 [2327] lbsrv51 attrd: error: pcmk_cpg_dispatch:
Connection to the CPG API failed: Library error (2)
Jun 06 15:14:56 [2327] lbsrv51 attrd: crit: attrd_cs_destroy: Lost
connection to Corosync service!
Jun 06 15:14:56 [2327] lbsrv51 attrd: notice: main: Exiting...
Jun 06 15:14:56 [2324] lbsrv51 cib: error: cib_cs_destroy:
Corosync connection lost! Exiting.
Jun 06 15:14:56 [2327] lbsrv51 attrd: notice: main: Disconnecting
client 0x7f1f86244a10, pid=2329...
Jun 06 15:14:56 [2324] lbsrv51 cib: info: terminate_cib:
cib_cs_destroy: Exiting fast...
Jun 06 15:14:56 [2324] lbsrv51 cib: info: crm_client_destroy:
Destroying 0 events
Jun 06 15:14:56 [2327] lbsrv51 attrd: error:
attrd_cib_connection_destroy: Connection to the CIB terminated...
Jun 06 15:14:56 [2324] lbsrv51 cib: info: qb_ipcs_us_withdraw:
withdrawing server sockets
Jun 06 15:14:56 [2324] lbsrv51 cib: info: crm_client_destroy:
Destroying 0 events
Jun 06 15:14:56 [2324] lbsrv51 cib: info: crm_client_destroy:
Destroying 0 events
Jun 06 15:14:56 [2325] lbsrv51 stonith-ng: error: crm_ipc_read:
Connection to cib_rw failed
Jun 06 15:14:56 [2325] lbsrv51 stonith-ng: error: mainloop_gio_callback:
Connection to cib_rw[0x7f52f2d82c10] closed (I/O condition=17)
Jun 06 15:14:56 [2324] lbsrv51 cib: info: qb_ipcs_us_withdraw:
withdrawing server sockets
Jun 06 15:14:56 [2324] lbsrv51 cib: info: crm_client_destroy:
Destroying 0 events
Jun 06 15:14:56 [2324] lbsrv51 cib: info: qb_ipcs_us_withdraw:
withdrawing server sockets
Jun 06 15:14:56 [2324] lbsrv51 cib: info: crm_xml_cleanup:
Cleaning up memory from libxml2
Jun 06 15:14:56 [2325] lbsrv51 stonith-ng: notice: cib_connection_destroy:
Connection to the CIB terminated. Shutting down.
Jun 06 15:14:56 [2325] lbsrv51 stonith-ng: info: stonith_shutdown:
Terminating with 1 clients
Jun 06 15:14:56 [2325] lbsrv51 stonith-ng: info: crm_client_destroy:
Destroying 0 events
Jun 06 15:14:56 [2325] lbsrv51 stonith-ng: info: qb_ipcs_us_withdraw:
withdrawing server sockets
Jun 06 15:14:56 [2325] lbsrv51 stonith-ng: info: main: Done
Jun 06 15:14:56 [2325] lbsrv51 stonith-ng: info: crm_xml_cleanup:
Cleaning up memory from libxml2
Jun 06 15:14:56 [2329] lbsrv51 crmd: error: crm_ipc_read:
Connection to cib_shm failed
Jun 06 15:14:56 [2329] lbsrv51 crmd: error: mainloop_gio_callback:
Connection to cib_shm[0x7f97ed1f6980] closed (I/O condition=17)
Jun 06 15:14:56 [2329] lbsrv51 crmd: error:
crmd_cib_connection_destroy: Connection to the CIB terminated...
Jun 06 15:14:56 [2329] lbsrv51 crmd: error: do_log: FSA: Input
I_ERROR from crmd_cib_connection_destroy() received in state S_IDLE
Jun 06 15:14:56 [2329] lbsrv51 crmd: notice: do_state_transition:
State transition S_IDLE -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL
origin=crmd_cib_connection_destroy ]
Jun 06 15:14:56 [2329] lbsrv51 crmd: warning: do_recover:
Fast-tracking shutdown in response to errors
Jun 06 15:14:56 [2329] lbsrv51 crmd: warning: do_election_vote: Not
voting in election, we're in state S_RECOVERY
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: do_dc_release: DC
role released
Jun 06 15:14:56 [2322] lbsrv51 pacemakerd: info: pcmk_child_exit: Child
process stonith-ng (2325) exited: OK (0)
Jun 06 15:14:56 [2322] lbsrv51 pacemakerd: info: crm_cs_flush: Sent
0 CPG messages (1 remaining, last=10): Library error (2)
Jun 06 15:14:56 [2322] lbsrv51 pacemakerd: notice: pcmk_process_exit:
Respawning failed child process: stonith-ng
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: pe_ipc_destroy:
Connection to the Policy Engine released
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: do_te_control:
Transitioner is now inactive
Jun 06 15:14:56 [2329] lbsrv51 crmd: error: do_log: FSA: Input
I_TERMINATE from do_recover() received in state S_RECOVERY
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: do_state_transition:
State transition S_RECOVERY -> S_TERMINATE [ input=I_TERMINATE
cause=C_FSA_INTERNAL origin=do_recover ]
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: do_shutdown:
Disconnecting STONITH...
Jun 06 15:14:56 [2329] lbsrv51 crmd: info:
tengine_stonith_connection_destroy: Fencing daemon disconnected
Jun 06 15:14:56 [2322] lbsrv51 pacemakerd: info: start_child:
Forked child 59988 for process stonith-ng
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: stop_recurring_actions:
Cancelling op 27 for res_tftpd-hpa_1 (res_tftpd-hpa_1:27)
Jun 06 15:14:56 [2322] lbsrv51 pacemakerd: error: pcmk_child_exit: Child
process attrd (2327) exited: Transport endpoint is not connected (107)
Jun 06 15:14:56 [2322] lbsrv51 pacemakerd: notice: pcmk_process_exit:
Respawning failed child process: attrd
Jun 06 15:14:56 [2328] lbsrv51 pengine: info: crm_client_destroy:
Destroying 0 events
Jun 06 15:14:56 [2322] lbsrv51 pacemakerd: info: start_child: Using
uid=111 and group=119 for process attrd
Jun 06 15:14:56 [2322] lbsrv51 pacemakerd: info: start_child:
Forked child 59989 for process attrd
Jun 06 15:14:56 [2322] lbsrv51 pacemakerd: info: mcp_quorum_destroy:
connection closed
Jun 06 15:14:56 [2322] lbsrv51 pacemakerd: error: pcmk_cpg_dispatch:
Connection to the CPG API failed: Library error (2)
Jun 06 15:14:56 [2322] lbsrv51 pacemakerd: error: mcp_cpg_destroy:
Connection destroyed
Jun 06 15:14:56 [2322] lbsrv51 pacemakerd: info: crm_xml_cleanup:
Cleaning up memory from libxml2
Jun 06 15:14:56 [2326] lbsrv51 lrmd: info: cancel_recurring_action:
Cancelling operation res_tftpd-hpa_1_status_15000
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: stop_recurring_actions:
Cancelling op 35 for res_IPaddr2_Test (res_IPaddr2_Test:35)
Jun 06 15:14:56 [2326] lbsrv51 lrmd: info: cancel_recurring_action:
Cancelling operation res_IPaddr2_Test_monitor_10000
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: stop_recurring_actions:
Cancelling op 41 for res_pdnsd_pdnsd (res_pdnsd_pdnsd:41)
Jun 06 15:14:56 [2326] lbsrv51 lrmd: info: cancel_recurring_action:
Cancelling operation res_pdnsd_pdnsd_status_15000
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: stop_recurring_actions:
Cancelling op 47 for res_isc-dhcp-server_1 (res_isc-dhcp-server_1:47)
Jun 06 15:14:56 [2326] lbsrv51 lrmd: info: cancel_recurring_action:
Cancelling operation res_isc-dhcp-server_1_status_15000
Jun 06 15:14:56 [59989] lbsrv51 attrd: notice: crm_cluster_connect:
Connecting to cluster infrastructure: corosync
Jun 06 15:14:56 [59989] lbsrv51 attrd: error: cluster_connect_cpg:
Could not connect to the Cluster Process Group API: 2
Jun 06 15:14:56 [59989] lbsrv51 attrd: error: main: HA Signon
failed
Jun 06 15:14:56 [2329] lbsrv51 crmd: notice:
lrm_state_verify_stopped: Stopped 4 recurring operations at (null) (3942893656
ops remaining)
Jun 06 15:14:56 [59989] lbsrv51 attrd: error: main: Aborting
startup
Jun 06 15:14:56 [2329] lbsrv51 crmd: notice:
lrm_state_verify_stopped: Recurring action res_pdnsd_pdnsd:41
(res_pdnsd_pdnsd_monitor_15000) incomplete at shutdown
Jun 06 15:14:56 [2329] lbsrv51 crmd: notice:
lrm_state_verify_stopped: Recurring action res_isc-dhcp-server_1:47
(res_isc-dhcp-server_1_monitor_15000) incomplete at shutdown
Jun 06 15:14:56 [2329] lbsrv51 crmd: notice:
lrm_state_verify_stopped: Recurring action res_IPaddr2_Test:35
(res_IPaddr2_Test_monitor_10000) incomplete at shutdown
Jun 06 15:14:56 [2329] lbsrv51 crmd: error:
lrm_state_verify_stopped: 3 resources were active at shutdown.
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: do_lrm_control:
Disconnecting from the LRM
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: lrmd_api_disconnect:
Disconnecting from lrmd service
Jun 06 15:14:56 [2329] lbsrv51 crmd: info:
lrmd_ipc_connection_destroy: IPC connection destroyed
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: lrm_connection_destroy:
LRM Connection disconnected
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: lrmd_api_disconnect:
Disconnecting from lrmd service
Jun 06 15:14:56 [2329] lbsrv51 crmd: notice: do_lrm_control:
Disconnected from the LRM
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: crm_cluster_disconnect:
Disconnecting from cluster infrastructure: corosync
Jun 06 15:14:56 [2329] lbsrv51 crmd: notice: terminate_cs_connection:
Disconnecting from Corosync
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: crm_cluster_disconnect:
Disconnected from corosync
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: do_ha_control:
Disconnected from the cluster
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: do_cib_control:
Disconnecting CIB
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: qb_ipcs_us_withdraw:
withdrawing server sockets
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: do_exit: Performing
A_EXIT_0 - gracefully exiting the CRMd
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: do_exit: [crmd]
stopped (0)
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: crmd_exit:
Dropping I_PENDING: [ state=S_TERMINATE cause=C_FSA_INTERNAL
origin=do_election_vote ]
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: crmd_exit:
Dropping I_RELEASE_SUCCESS: [ state=S_TERMINATE cause=C_FSA_INTERNAL
origin=do_dc_release ]
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: crmd_exit:
Dropping I_TERMINATE: [ state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_stop ]
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: crmd_quorum_destroy:
connection closed
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: crmd_cs_destroy:
connection closed
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: crmd_init: 2329
stopped: OK (0)
Jun 06 15:14:56 [2329] lbsrv51 crmd: error: crmd_fast_exit: Could
not recover from internal error
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: crm_xml_cleanup:
Cleaning up memory from libxml2
Jun 06 15:14:56 [2326] lbsrv51 lrmd: info: crm_client_destroy:
Destroying 0 events
Jun 06 15:14:56 [59988] lbsrv51 stonith-ng: info: crm_log_init:
Changed active directory to /var/lib/heartbeat/cores/root
Jun 06 15:14:56 [59988] lbsrv51 stonith-ng: info: get_cluster_type:
Verifying cluster type: 'corosync'
Jun 06 15:14:56 [59988] lbsrv51 stonith-ng: info: get_cluster_type:
Assuming an active 'corosync' cluster
Jun 06 15:14:56 [59988] lbsrv51 stonith-ng: notice: crm_cluster_connect:
Connecting to cluster infrastructure: corosync
Jun 06 15:14:56 [59988] lbsrv51 stonith-ng: error: cluster_connect_cpg:
Could not connect to the Cluster Process Group API: 2
Jun 06 15:14:56 [59988] lbsrv51 stonith-ng: crit: main: Cannot sign
in to the cluster... terminating
Jun 06 15:14:56 [59988] lbsrv51 stonith-ng: info: crm_xml_cleanup:
Cleaning up memory from libxml2
== dmesg: ==
[60379.304488] show_signal_msg: 18 callbacks suppressed
[60379.304493] crm_resource[19768]: segfault at 0 ip 00007f276681c0aa sp
00007fffe49ea2a8 error 4 in libc-2.19.so[7f27666db000+1bc000]
[60379.858371] cib[2234]: segfault at 0 ip 00007f59013760aa sp
00007fff0e21a0d8 error 4 in libc-2.19.so[7f5901235000+1bc000]
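The dmesg entries show cib and crm_resource dying inside libc-2.19, so a symbolized backtrace from the core files would narrow this down considerably. A sketch, assuming cores land under the /var/lib/heartbeat/cores/root directory mentioned by crm_log_init above; the debug-symbol package names are an assumption and should be checked against the trusty archive.

```shell
# Make sure future crashes leave core files behind.
ulimit -c unlimited

# Install gdb and debug symbols so the frames resolve
# (package names are an assumption; verify before filing).
sudo apt-get install -y gdb pacemaker-dbg libc6-dbg

# Pull a full backtrace from the cib daemon's core; <core-file> is
# whatever core the next segfault leaves behind.
sudo gdb -batch -ex 'thread apply all bt full' \
    /usr/lib/pacemaker/cib /var/lib/heartbeat/cores/root/<core-file>
```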
== syslog: ==
Jun 6 15:14:56 lbsrv51 cibmon[15100]: error: crm_ipc_read: Connection to
cib_ro failed
Jun 6 15:14:56 lbsrv51 cibmon[15100]: error: mainloop_gio_callback:
Connection to cib_ro[0x7f188c76f240] closed (I/O condition=17)
Jun 6 15:14:56 lbsrv51 cibmon[15100]: error: cib_connection_destroy:
Connection to the CIB terminated... exiting
Jun 6 15:14:56 lbsrv51 attrd[59989]: notice: crm_add_logfile: Additional
logging available in /var/log/corosync/corosync.log
Jun 6 15:14:56 lbsrv51 crm_simulate[59990]: notice: crm_log_args: Invoked:
crm_simulate -s -S -VVVVV -L
Jun 6 15:14:56 lbsrv51 stonith-ng[59988]: notice: crm_add_logfile:
Additional logging available in /var/log/corosync/corosync.log
Jun 6 15:14:56 lbsrv51 crm_simulate[60012]: notice: crm_log_args: Invoked:
crm_simulate -s -S -VVVVV -L
Jun 6 15:14:56 lbsrv51 crm_simulate[60038]: notice: crm_log_args: Invoked:
crm_simulate -s -S -VVVVV -L
** Summary changed:
- Segfault: corosync segfaults randomly on Ubuntu trusty 14.04
+ Segfault: pacemaker segfaults randomly on Ubuntu trusty 14.04
--
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to corosync in Ubuntu.
https://bugs.launchpad.net/bugs/1327222
Title:
Segfault: pacemaker segfaults randomly on Ubuntu trusty 14.04
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1327222/+subscriptions
--
Ubuntu-server-bugs mailing list
[email protected]
Modify settings or unsubscribe at:
https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs