** Also affects: pacemaker (Ubuntu)
   Importance: Undecided
       Status: New

** Description changed:

  I'm running a two-node HA cluster with pacemaker/corosync and a fairly
  simple configuration - only an IP address, one service, and two clone
  sets of resources are managed (see below). However, I run into constant
- crashes of corosync on both nodes. At the moment this behaviour makes
- the cluster unusable.
+ crashes of pacemaker (it looked like corosync at first) on both nodes.
+ At the moment this behaviour makes the cluster unusable.
  
  I attached the cluster config, cib.xml and the crashdumps to the bug,
  hopefully someone can make something of it.
- 
  
  ~# crm_mon -1
  Last updated: Fri Jun  6 15:43:14 2014
  Last change: Fri Jun  6 10:28:17 2014 via cibadmin on lbsrv52
  Stack: corosync
  Current DC: lbsrv51 (1) - partition with quorum
  Version: 1.1.10-42f2063
  2 Nodes configured
  6 Resources configured
  
  Online: [ lbsrv51 lbsrv52 ]
  
-  Resource Group: grp_HAProxy-Front-IPs
-      res_IPaddr2_Test   (ocf::heartbeat:IPaddr2):       Started lbsrv51 
-  res_pdnsd_pdnsd        (lsb:pdnsd):    Started lbsrv51 
-  Clone Set: cl_isc-dhcp-server_1 [res_isc-dhcp-server_1]
-      Started: [ lbsrv51 lbsrv52 ]
-  Clone Set: cl_tftpd-hpa_1 [res_tftpd-hpa_1]
-      Started: [ lbsrv51 lbsrv52 ]
- 
+  Resource Group: grp_HAProxy-Front-IPs
+      res_IPaddr2_Test   (ocf::heartbeat:IPaddr2):       Started lbsrv51
+  res_pdnsd_pdnsd        (lsb:pdnsd):    Started lbsrv51
+  Clone Set: cl_isc-dhcp-server_1 [res_isc-dhcp-server_1]
+      Started: [ lbsrv51 lbsrv52 ]
+  Clone Set: cl_tftpd-hpa_1 [res_tftpd-hpa_1]
+      Started: [ lbsrv51 lbsrv52 ]
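
  For reference, the crm_mon output above implies roughly the following
  configuration. This is a hypothetical crm-shell reconstruction, not the
  real config (that one and cib.xml are attached to the bug); the IP
  address is a placeholder, and the lsb agent classes are assumptions.
  Only the monitor intervals are taken from the log (10s/15s):

```
# Hypothetical reconstruction; see the attached config for real values.
primitive res_IPaddr2_Test ocf:heartbeat:IPaddr2 \
    params ip=192.0.2.10 op monitor interval=10s   # placeholder address
primitive res_pdnsd_pdnsd lsb:pdnsd op monitor interval=15s
primitive res_isc-dhcp-server_1 lsb:isc-dhcp-server op monitor interval=15s
primitive res_tftpd-hpa_1 lsb:tftpd-hpa op monitor interval=15s
group grp_HAProxy-Front-IPs res_IPaddr2_Test
clone cl_isc-dhcp-server_1 res_isc-dhcp-server_1
clone cl_tftpd-hpa_1 res_tftpd-hpa_1
```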
  
  == corosync.log: ==
  Jun 06 15:14:56 [2324] lbsrv51        cib:    error: pcmk_cpg_dispatch:       
  Connection to the CPG API failed: Library error (2)
  Jun 06 15:14:56 [2327] lbsrv51      attrd:    error: pcmk_cpg_dispatch:       
  Connection to the CPG API failed: Library error (2)
  Jun 06 15:14:56 [2327] lbsrv51      attrd:     crit: attrd_cs_destroy:  Lost 
connection to Corosync service!
  Jun 06 15:14:56 [2327] lbsrv51      attrd:   notice: main:      Exiting...
  Jun 06 15:14:56 [2324] lbsrv51        cib:    error: cib_cs_destroy:    
Corosync connection lost!  Exiting.
  Jun 06 15:14:56 [2327] lbsrv51      attrd:   notice: main:      Disconnecting 
client 0x7f1f86244a10, pid=2329...
  Jun 06 15:14:56 [2324] lbsrv51        cib:     info: terminate_cib:     
cib_cs_destroy: Exiting fast...
  Jun 06 15:14:56 [2324] lbsrv51        cib:     info: crm_client_destroy:      
  Destroying 0 events
  Jun 06 15:14:56 [2327] lbsrv51      attrd:    error: 
attrd_cib_connection_destroy:      Connection to the CIB terminated...
  Jun 06 15:14:56 [2324] lbsrv51        cib:     info: qb_ipcs_us_withdraw:     
  withdrawing server sockets
  Jun 06 15:14:56 [2324] lbsrv51        cib:     info: crm_client_destroy:      
  Destroying 0 events
  Jun 06 15:14:56 [2324] lbsrv51        cib:     info: crm_client_destroy:      
  Destroying 0 events
  Jun 06 15:14:56 [2325] lbsrv51 stonith-ng:    error: crm_ipc_read:      
Connection to cib_rw failed
  Jun 06 15:14:56 [2325] lbsrv51 stonith-ng:    error: mainloop_gio_callback:   
  Connection to cib_rw[0x7f52f2d82c10] closed (I/O condition=17)
  Jun 06 15:14:56 [2324] lbsrv51        cib:     info: qb_ipcs_us_withdraw:     
  withdrawing server sockets
  Jun 06 15:14:56 [2324] lbsrv51        cib:     info: crm_client_destroy:      
  Destroying 0 events
  Jun 06 15:14:56 [2324] lbsrv51        cib:     info: qb_ipcs_us_withdraw:     
  withdrawing server sockets
  Jun 06 15:14:56 [2324] lbsrv51        cib:     info: crm_xml_cleanup:   
Cleaning up memory from libxml2
  Jun 06 15:14:56 [2325] lbsrv51 stonith-ng:   notice: cib_connection_destroy:  
  Connection to the CIB terminated. Shutting down.
  Jun 06 15:14:56 [2325] lbsrv51 stonith-ng:     info: stonith_shutdown:  
Terminating with  1 clients
  Jun 06 15:14:56 [2325] lbsrv51 stonith-ng:     info: crm_client_destroy:      
  Destroying 0 events
  Jun 06 15:14:56 [2325] lbsrv51 stonith-ng:     info: qb_ipcs_us_withdraw:     
  withdrawing server sockets
  Jun 06 15:14:56 [2325] lbsrv51 stonith-ng:     info: main:      Done
  Jun 06 15:14:56 [2325] lbsrv51 stonith-ng:     info: crm_xml_cleanup:   
Cleaning up memory from libxml2
  Jun 06 15:14:56 [2329] lbsrv51       crmd:    error: crm_ipc_read:      
Connection to cib_shm failed
  Jun 06 15:14:56 [2329] lbsrv51       crmd:    error: mainloop_gio_callback:   
  Connection to cib_shm[0x7f97ed1f6980] closed (I/O condition=17)
  Jun 06 15:14:56 [2329] lbsrv51       crmd:    error: 
crmd_cib_connection_destroy:       Connection to the CIB terminated...
  Jun 06 15:14:56 [2329] lbsrv51       crmd:    error: do_log:    FSA: Input 
I_ERROR from crmd_cib_connection_destroy() received in state S_IDLE
  Jun 06 15:14:56 [2329] lbsrv51       crmd:   notice: do_state_transition:     
  State transition S_IDLE -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL 
origin=crmd_cib_connection_destroy ]
  Jun 06 15:14:56 [2329] lbsrv51       crmd:  warning: do_recover:        
Fast-tracking shutdown in response to errors
  Jun 06 15:14:56 [2329] lbsrv51       crmd:  warning: do_election_vote:  Not 
voting in election, we're in state S_RECOVERY
  Jun 06 15:14:56 [2329] lbsrv51       crmd:     info: do_dc_release:     DC 
role released
  Jun 06 15:14:56 [2322] lbsrv51 pacemakerd:     info: pcmk_child_exit:   Child 
process stonith-ng (2325) exited: OK (0)
  Jun 06 15:14:56 [2322] lbsrv51 pacemakerd:     info: crm_cs_flush:      Sent 
0 CPG messages  (1 remaining, last=10): Library error (2)
  Jun 06 15:14:56 [2322] lbsrv51 pacemakerd:   notice: pcmk_process_exit:       
  Respawning failed child process: stonith-ng
  Jun 06 15:14:56 [2329] lbsrv51       crmd:     info: pe_ipc_destroy:    
Connection to the Policy Engine released
  Jun 06 15:14:56 [2329] lbsrv51       crmd:     info: do_te_control:     
Transitioner is now inactive
  Jun 06 15:14:56 [2329] lbsrv51       crmd:    error: do_log:    FSA: Input 
I_TERMINATE from do_recover() received in state S_RECOVERY
  Jun 06 15:14:56 [2329] lbsrv51       crmd:     info: do_state_transition:     
  State transition S_RECOVERY -> S_TERMINATE [ input=I_TERMINATE 
cause=C_FSA_INTERNAL origin=do_recover ]
  Jun 06 15:14:56 [2329] lbsrv51       crmd:     info: do_shutdown:       
Disconnecting STONITH...
  Jun 06 15:14:56 [2329] lbsrv51       crmd:     info: 
tengine_stonith_connection_destroy:        Fencing daemon disconnected
  Jun 06 15:14:56 [2322] lbsrv51 pacemakerd:     info: start_child:       
Forked child 59988 for process stonith-ng
  Jun 06 15:14:56 [2329] lbsrv51       crmd:     info: stop_recurring_actions:  
  Cancelling op 27 for res_tftpd-hpa_1 (res_tftpd-hpa_1:27)
  Jun 06 15:14:56 [2322] lbsrv51 pacemakerd:    error: pcmk_child_exit:   Child 
process attrd (2327) exited: Transport endpoint is not connected (107)
  Jun 06 15:14:56 [2322] lbsrv51 pacemakerd:   notice: pcmk_process_exit:       
  Respawning failed child process: attrd
  Jun 06 15:14:56 [2328] lbsrv51    pengine:     info: crm_client_destroy:      
  Destroying 0 events
  Jun 06 15:14:56 [2322] lbsrv51 pacemakerd:     info: start_child:       Using 
uid=111 and group=119 for process attrd
  Jun 06 15:14:56 [2322] lbsrv51 pacemakerd:     info: start_child:       
Forked child 59989 for process attrd
  Jun 06 15:14:56 [2322] lbsrv51 pacemakerd:     info: mcp_quorum_destroy:      
  connection closed
  Jun 06 15:14:56 [2322] lbsrv51 pacemakerd:    error: pcmk_cpg_dispatch:       
  Connection to the CPG API failed: Library error (2)
  Jun 06 15:14:56 [2322] lbsrv51 pacemakerd:    error: mcp_cpg_destroy:   
Connection destroyed
  Jun 06 15:14:56 [2322] lbsrv51 pacemakerd:     info: crm_xml_cleanup:   
Cleaning up memory from libxml2
  Jun 06 15:14:56 [2326] lbsrv51       lrmd:     info: cancel_recurring_action: 
  Cancelling operation res_tftpd-hpa_1_status_15000
  Jun 06 15:14:56 [2329] lbsrv51       crmd:     info: stop_recurring_actions:  
  Cancelling op 35 for res_IPaddr2_Test (res_IPaddr2_Test:35)
  Jun 06 15:14:56 [2326] lbsrv51       lrmd:     info: cancel_recurring_action: 
  Cancelling operation res_IPaddr2_Test_monitor_10000
  Jun 06 15:14:56 [2329] lbsrv51       crmd:     info: stop_recurring_actions:  
  Cancelling op 41 for res_pdnsd_pdnsd (res_pdnsd_pdnsd:41)
  Jun 06 15:14:56 [2326] lbsrv51       lrmd:     info: cancel_recurring_action: 
  Cancelling operation res_pdnsd_pdnsd_status_15000
  Jun 06 15:14:56 [2329] lbsrv51       crmd:     info: stop_recurring_actions:  
  Cancelling op 47 for res_isc-dhcp-server_1 (res_isc-dhcp-server_1:47)
  Jun 06 15:14:56 [2326] lbsrv51       lrmd:     info: cancel_recurring_action: 
  Cancelling operation res_isc-dhcp-server_1_status_15000
  Jun 06 15:14:56 [59989] lbsrv51      attrd:   notice: crm_cluster_connect:    
  Connecting to cluster infrastructure: corosync
  Jun 06 15:14:56 [59989] lbsrv51      attrd:    error: cluster_connect_cpg:    
  Could not connect to the Cluster Process Group API: 2
  Jun 06 15:14:56 [59989] lbsrv51      attrd:    error: main:     HA Signon 
failed
  Jun 06 15:14:56 [2329] lbsrv51       crmd:   notice: 
lrm_state_verify_stopped:  Stopped 4 recurring operations at (null) (3942893656 
ops remaining)
  Jun 06 15:14:56 [59989] lbsrv51      attrd:    error: main:     Aborting 
startup
  Jun 06 15:14:56 [2329] lbsrv51       crmd:   notice: 
lrm_state_verify_stopped:  Recurring action res_pdnsd_pdnsd:41 
(res_pdnsd_pdnsd_monitor_15000) incomplete at shutdown
  Jun 06 15:14:56 [2329] lbsrv51       crmd:   notice: 
lrm_state_verify_stopped:  Recurring action res_isc-dhcp-server_1:47 
(res_isc-dhcp-server_1_monitor_15000) incomplete at shutdown
  Jun 06 15:14:56 [2329] lbsrv51       crmd:   notice: 
lrm_state_verify_stopped:  Recurring action res_IPaddr2_Test:35 
(res_IPaddr2_Test_monitor_10000) incomplete at shutdown
  Jun 06 15:14:56 [2329] lbsrv51       crmd:    error: 
lrm_state_verify_stopped:  3 resources were active at shutdown.
  Jun 06 15:14:56 [2329] lbsrv51       crmd:     info: do_lrm_control:    
Disconnecting from the LRM
  Jun 06 15:14:56 [2329] lbsrv51       crmd:     info: lrmd_api_disconnect:     
  Disconnecting from lrmd service
  Jun 06 15:14:56 [2329] lbsrv51       crmd:     info: 
lrmd_ipc_connection_destroy:       IPC connection destroyed
  Jun 06 15:14:56 [2329] lbsrv51       crmd:     info: lrm_connection_destroy:  
  LRM Connection disconnected
  Jun 06 15:14:56 [2329] lbsrv51       crmd:     info: lrmd_api_disconnect:     
  Disconnecting from lrmd service
  Jun 06 15:14:56 [2329] lbsrv51       crmd:   notice: do_lrm_control:    
Disconnected from the LRM
  Jun 06 15:14:56 [2329] lbsrv51       crmd:     info: crm_cluster_disconnect:  
  Disconnecting from cluster infrastructure: corosync
  Jun 06 15:14:56 [2329] lbsrv51       crmd:   notice: terminate_cs_connection: 
  Disconnecting from Corosync
  Jun 06 15:14:56 [2329] lbsrv51       crmd:     info: crm_cluster_disconnect:  
  Disconnected from corosync
  Jun 06 15:14:56 [2329] lbsrv51       crmd:     info: do_ha_control:     
Disconnected from the cluster
  Jun 06 15:14:56 [2329] lbsrv51       crmd:     info: do_cib_control:    
Disconnecting CIB
  Jun 06 15:14:56 [2329] lbsrv51       crmd:     info: qb_ipcs_us_withdraw:     
  withdrawing server sockets
  Jun 06 15:14:56 [2329] lbsrv51       crmd:     info: do_exit:   Performing 
A_EXIT_0 - gracefully exiting the CRMd
  Jun 06 15:14:56 [2329] lbsrv51       crmd:     info: do_exit:   [crmd] 
stopped (0)
  Jun 06 15:14:56 [2329] lbsrv51       crmd:     info: crmd_exit:         
Dropping I_PENDING: [ state=S_TERMINATE cause=C_FSA_INTERNAL 
origin=do_election_vote ]
  Jun 06 15:14:56 [2329] lbsrv51       crmd:     info: crmd_exit:         
Dropping I_RELEASE_SUCCESS: [ state=S_TERMINATE cause=C_FSA_INTERNAL 
origin=do_dc_release ]
  Jun 06 15:14:56 [2329] lbsrv51       crmd:     info: crmd_exit:         
Dropping I_TERMINATE: [ state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_stop ]
  Jun 06 15:14:56 [2329] lbsrv51       crmd:     info: crmd_quorum_destroy:     
  connection closed
  Jun 06 15:14:56 [2329] lbsrv51       crmd:     info: crmd_cs_destroy:   
connection closed
  Jun 06 15:14:56 [2329] lbsrv51       crmd:     info: crmd_init:         2329 
stopped: OK (0)
  Jun 06 15:14:56 [2329] lbsrv51       crmd:    error: crmd_fast_exit:    Could 
not recover from internal error
  Jun 06 15:14:56 [2329] lbsrv51       crmd:     info: crm_xml_cleanup:   
Cleaning up memory from libxml2
  Jun 06 15:14:56 [2326] lbsrv51       lrmd:     info: crm_client_destroy:      
  Destroying 0 events
  Jun 06 15:14:56 [59988] lbsrv51 stonith-ng:     info: crm_log_init:     
Changed active directory to /var/lib/heartbeat/cores/root
  Jun 06 15:14:56 [59988] lbsrv51 stonith-ng:     info: get_cluster_type:       
  Verifying cluster type: 'corosync'
  Jun 06 15:14:56 [59988] lbsrv51 stonith-ng:     info: get_cluster_type:       
  Assuming an active 'corosync' cluster
  Jun 06 15:14:56 [59988] lbsrv51 stonith-ng:   notice: crm_cluster_connect:    
  Connecting to cluster infrastructure: corosync
  Jun 06 15:14:56 [59988] lbsrv51 stonith-ng:    error: cluster_connect_cpg:    
  Could not connect to the Cluster Process Group API: 2
  Jun 06 15:14:56 [59988] lbsrv51 stonith-ng:     crit: main:     Cannot sign 
in to the cluster... terminating
  Jun 06 15:14:56 [59988] lbsrv51 stonith-ng:     info: crm_xml_cleanup:  
Cleaning up memory from libxml2
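
  Every Pacemaker daemon above loses its CPG connection with "Library
  error (2)" at the same instant, so the first thing worth establishing
  is whether corosync itself was still alive at that moment. A quick
  check on the affected node (a sketch using the standard corosync
  tools; output obviously depends on the live cluster):

```shell
pidof corosync            # is the corosync process still up?
corosync-cfgtool -s       # ring status; fails if corosync died or restarted
corosync-quorumtool -s    # membership and quorum view
```

  If these still work while pacemakerd cannot reconnect, the fault is on
  the pacemaker side, which matches the retargeting of this bug.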
  
- 
  == dmesg: ==
  [60379.304488] show_signal_msg: 18 callbacks suppressed
  [60379.304493] crm_resource[19768]: segfault at 0 ip 00007f276681c0aa sp 
00007fffe49ea2a8 error 4 in libc-2.19.so[7f27666db000+1bc000]
  [60379.858371] cib[2234]: segfault at 0 ip 00007f59013760aa sp 
00007fff0e21a0d8 error 4 in libc-2.19.so[7f5901235000+1bc000]
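
  Those two dmesg lines carry more information than they look: "segfault
  at 0" is a NULL-pointer read, and subtracting the mapping base from the
  instruction pointer gives the offset of the faulting instruction inside
  libc-2.19.so (plain shell arithmetic, values copied from the lines
  above):

```shell
# offset of the faulting instruction inside libc = ip - mapping base
printf 'crm_resource: libc offset %#x\n' $(( 0x7f276681c0aa - 0x7f27666db000 ))
printf 'cib:          libc offset %#x\n' $(( 0x7f59013760aa - 0x7f5901235000 ))
```

  Both crashes land at the same offset, 0x1410aa: two different Pacemaker
  processes die in the same libc routine, presumably called with a NULL
  argument, which points at one shared code path rather than random
  memory corruption.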
- 
  
  == syslog: ==
  Jun  6 15:14:56 lbsrv51 cibmon[15100]:    error: crm_ipc_read: Connection to 
cib_ro failed
  Jun  6 15:14:56 lbsrv51 cibmon[15100]:    error: mainloop_gio_callback: 
Connection to cib_ro[0x7f188c76f240] closed (I/O condition=17)
  Jun  6 15:14:56 lbsrv51 cibmon[15100]:    error: cib_connection_destroy: 
Connection to the CIB terminated... exiting
  Jun  6 15:14:56 lbsrv51 attrd[59989]:   notice: crm_add_logfile: Additional 
logging available in /var/log/corosync/corosync.log
- Jun  6 15:14:56 lbsrv51 crm_simulate[59990]:   notice: crm_log_args: Invoked: 
crm_simulate -s -S -VVVVV -L 
+ Jun  6 15:14:56 lbsrv51 crm_simulate[59990]:   notice: crm_log_args: Invoked: 
crm_simulate -s -S -VVVVV -L
  Jun  6 15:14:56 lbsrv51 stonith-ng[59988]:   notice: crm_add_logfile: 
Additional logging available in /var/log/corosync/corosync.log
- Jun  6 15:14:56 lbsrv51 crm_simulate[60012]:   notice: crm_log_args: Invoked: 
crm_simulate -s -S -VVVVV -L 
+ Jun  6 15:14:56 lbsrv51 crm_simulate[60012]:   notice: crm_log_args: Invoked: 
crm_simulate -s -S -VVVVV -L
  Jun  6 15:14:56 lbsrv51 crm_simulate[60038]:   notice: crm_log_args: Invoked: 
crm_simulate -s -S -VVVVV -L
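
  Finally, the attached crashdumps should pin the faulting function down.
  A hypothetical gdb session follows; the binary and core paths are
  guesses (the crm_log_init line above suggests cores land under
  /var/lib/heartbeat/cores/), so adjust them to the actual files:

```shell
gdb /usr/lib/pacemaker/cib /var/lib/heartbeat/cores/root/core
# then, at the (gdb) prompt:
#   bt full          - full backtrace of the crashing thread
#   info registers   - confirms the NULL pointer from "segfault at 0"
```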

** Summary changed:

- Segfault: corosync segfaults randomly on Ubuntu trusty 14.04
+ Segfault: pacemaker segfaults randomly on Ubuntu trusty 14.04

-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to corosync in Ubuntu.
https://bugs.launchpad.net/bugs/1327222

Title:
  Segfault: pacemaker segfaults randomly on Ubuntu trusty 14.04

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1327222/+subscriptions

-- 
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs
