Hi, I previously asked about this error in another thread, but since that thread was old and received no answer, I thought the question might have been missed. Sorry for repeating it.
Regarding the problem: once my cluster grows larger (around 40 remote nodes), some remote nodes go offline after a while. Their logs report message errors (see below). The logs on the other nodes show no sign of anything wrong.

Details:
- 40 EC2 m3.xlarge nodes: 1 corosync ring member, 39 remote nodes
- possibly irrelevant, but either the "cib" or the "pengine" process goes to ~100% CPU
- it does not happen immediately
- a smaller cluster (~20 remote nodes) does not show any problems
- pacemaker: 1.1.15-1.1f8e642.git.el6.x86_64
- corosync: 2.4.1-1.2.0da1.el6.x86_64
- libqb-1.0.0-1.28.4dff.el6.x86_64
- CentOS 6

Logs:
[...]
Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted: error: crm_abort: crm_remote_header: Triggered assert at remote.c:119 : endian == ENDIAN_LOCAL
Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted: error: crm_remote_header: Invalid message detected, endian mismatch: badadbbd is neither 63646330 nor the swab'd 30636463
Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted: error: crm_abort: crm_remote_header: Triggered assert at remote.c:119 : endian == ENDIAN_LOCAL
Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted: error: crm_remote_header: Invalid message detected, endian mismatch: badadbbd is neither 63646330 nor the swab'd 30636463
Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted: error: crm_abort: crm_remote_header: Triggered assert at remote.c:119 : endian == ENDIAN_LOCAL
Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted: error: crm_remote_header: Invalid message detected, endian mismatch: badadbbd is neither 63646330 nor the swab'd 30636463
Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted: info: lrmd_remote_client_msg: Client disconnect detected in tls msg dispatcher.
Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted: info: ipc_proxy_remove_provider: ipc proxy connection for client ca8df213-6da7-4c42-8cb3-b8bc0887f2ce pid 21815 destroyed because cluster node disconnected.
Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted: info: cancel_recurring_action: Cancelling ocf operation monitor_all_monitor_191000
Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted: error: crm_send_tls: Connection terminated rc = -53
Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted: error: crm_send_tls: Connection terminated rc = -10
Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted: error: crm_remote_send: Failed to send remote msg, rc = -10
Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted: error: lrmd_tls_send_msg: Failed to send remote lrmd tls msg, rc = -10
Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted: warning: send_client_notify: Notification of client remote-lrmd-ip-10-237-223-67:3121/b6034d3a-e296-492f-b296-725735d17e22 failed
Sep 27 17:18:31 [19626] ip-10-237-223-67 pacemaker_remoted: notice: lrmd_remote_client_destroy: LRMD client disconnecting remote client - name: remote-lrmd-ip-10-237-223-67:3121 id: b6034d3a-e296-492f-b296-725735d17e22
Sep 27 17:19:35 [19626] ip-10-237-223-67 pacemaker_remoted: error: ipc_proxy_accept: No ipc providers available for uid 0 gid 0
Sep 27 17:19:35 [19626] ip-10-237-223-67 pacemaker_remoted: error: handle_new_connection: Error in connection setup (19626-21815-14): Remote I/O error (121)
Sep 27 17:19:50 [19626] ip-10-237-223-67 pacemaker_remoted: error: ipc_proxy_accept: No ipc providers available for uid 0 gid 0
Sep 27 17:19:50 [19626] ip-10-237-223-67 pacemaker_remoted: error: handle_new_connection: Error in connection setup (19626-21815-14): Remote I/O error (121)
[...]

--
Best Regards,
Radoslaw Garbacz
XtremeData Incorporation
_______________________________________________ Users mailing list: [email protected] http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
