Hi, I am seeing messages like these in my logs:
Jan 29 07:00:41 B5-2U-205-LS lrmd[3012]: notice: operation_finished: diskManager_monitor_30000:18807:stderr [ Failed to get properties: Connection timed out ]
Jan 29 07:00:41 B5-2U-205-LS lrmd[3012]: notice: operation_finished: pmdh_monitor_30000:18817:stderr [ Failed to get properties: Connection timed out ]
Jan 29 07:00:41 B5-2U-205-LS lrmd[3012]: notice: operation_finished: sddh_monitor_30000:18818:stderr [ Failed to get properties: Connection timed out ]
Jan 29 07:00:41 B5-2U-205-LS lrmd[3012]: notice: operation_finished: sm0_monitor_30000:18821:stderr [ Failed to get properties: Connection timed out ]
Jan 29 07:00:43 B5-2U-205-LS corosync[2742]: [MAIN ] Corosync main process was not scheduled for 12483.7363 ms (threshold is 800.0000 ms). Consider token timeout increase.
Jan 29 07:00:44 B5-2U-205-LS crmd[3015]: notice: process_lrm_event: Operation sm0dh_monitor_30000: not running (node=node-0, call=59, rc=7, cib-update=261, confirmed=false)
Jan 29 07:00:44 B5-2U-205-LS crmd[3015]: notice: process_lrm_event: node-0-sm0dh_monitor_30000:59 [ Failed to get properties: Connection timed out\n ]
Jan 29 07:01:02 B5-2U-205-LS corosync[2742]: [TOTEM ] Process pause detected for 17843 ms, flushing membership messages.
Jan 29 07:01:04 B5-2U-205-LS indexServer(indexServer)[18891]: WARNING: RA: [monitor] : got rc=1
Jan 29 07:01:04 B5-2U-205-LS diskHelper(dmdh)[18892]: WARNING: RA: [monitor] : got rc=1
Jan 29 07:01:19 B5-2U-205-LS adminServer(adminServer)[18911]: WARNING: RA: [monitor] : got rc=1
Jan 29 07:01:36 B5-2U-205-LS lrmd[3012]: notice: operation_finished: indexServer_monitor_30000:18828:stderr [ Failed to get properties: Connection timed out ]
Jan 29 07:01:41 B5-2U-205-LS corosync[2742]: [MAIN ] Corosync main process was not scheduled for 55969.9180 ms (threshold is 800.0000 ms). Consider token timeout increase.
Jan 29 07:02:01 B5-2U-205-LS lrmd[3012]: notice: operation_finished: dmdh_monitor_30000:18830:stderr [ Failed to get properties: Connection timed out ]
Jan 29 07:03:39 B5-2U-205-LS corosync[2742]: [MAIN ] Corosync main process was not scheduled for 115935.2266 ms (threshold is 800.0000 ms). Consider token timeout increase.
Jan 29 07:03:47 B5-2U-205-LS notificationService(notificationService)[18959]: WARNING: RA: [monitor] : got rc=1
Jan 29 07:03:47 B5-2U-205-LS storageManager(sm0)[18958]: WARNING: RA: [monitor] : got rc=1
Jan 29 07:03:47 B5-2U-205-LS diskManager(diskManager)[18960]: WARNING: RA: [monitor] : got rc=1
Jan 29 07:03:58 B5-2U-205-LS diskHelper(pmdh)[18964]: WARNING: RA: [monitor] : got rc=1
Jan 29 07:04:00 B5-2U-205-LS lrmd[3012]: notice: operation_finished: adminServer_monitor_30000:18853:stderr [ Failed to get properties: Connection timed out ]
Jan 29 07:04:04 B5-2U-205-LS diskHelper(sm0dh)[18968]: WARNING: RA: [monitor] : got rc=1
Jan 29 07:04:16 B5-2U-205-LS diskHelper(sddh)[18987]: WARNING: RA: [monitor] : got rc=1
Jan 29 07:04:31 B5-2U-205-LS corosync[2742]: [TOTEM ] Process pause detected for 109635 ms, flushing membership messages.

What is happening to the cluster here?
Why does Corosync say "Corosync main process was not scheduled for ..."?
Why does lrmd say "... _monitor_30000:18828:stderr [ Failed to get properties: Connection timed out ]"?

It is worth mentioning that the system was under heavy I/O load. Also, I am not sure whether this has something to do with load-threshold="400%".

Thank you,
Kostia
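To put numbers on the corosync stalls in the excerpt, here is a small Python sketch that pulls the "not scheduled" and "Process pause detected" durations out of the log text. It assumes the relevant lines are pasted into a string; the `stalls` helper is hypothetical and only for illustration, not part of corosync or Pacemaker.

```python
import re

# Corosync lines copied from the log excerpt in this message.
LOG = """\
Jan 29 07:00:43 B5-2U-205-LS corosync[2742]: [MAIN ] Corosync main process was not scheduled for 12483.7363 ms (threshold is 800.0000 ms). Consider token timeout increase.
Jan 29 07:01:02 B5-2U-205-LS corosync[2742]: [TOTEM ] Process pause detected for 17843 ms, flushing membership messages.
Jan 29 07:01:41 B5-2U-205-LS corosync[2742]: [MAIN ] Corosync main process was not scheduled for 55969.9180 ms (threshold is 800.0000 ms). Consider token timeout increase.
Jan 29 07:03:39 B5-2U-205-LS corosync[2742]: [MAIN ] Corosync main process was not scheduled for 115935.2266 ms (threshold is 800.0000 ms). Consider token timeout increase.
Jan 29 07:04:31 B5-2U-205-LS corosync[2742]: [TOTEM ] Process pause detected for 109635 ms, flushing membership messages.
"""

# Timestamp ("Jan 29 07:00:43") plus the stall duration in milliseconds.
NOT_SCHEDULED = re.compile(
    r"^(\w{3} \d+ [\d:]+) .*not scheduled for ([\d.]+) ms \(threshold is [\d.]+ ms\)"
)
PAUSE = re.compile(r"^(\w{3} \d+ [\d:]+) .*Process pause detected for (\d+) ms")


def stalls(log):
    """Hypothetical helper: return (timestamp, kind, seconds) for each stall line."""
    out = []
    for line in log.splitlines():
        m = NOT_SCHEDULED.match(line)
        if m:
            out.append((m.group(1), "not scheduled", float(m.group(2)) / 1000))
            continue
        m = PAUSE.match(line)
        if m:
            out.append((m.group(1), "pause", int(m.group(2)) / 1000))
    return out


if __name__ == "__main__":
    for ts, kind, secs in stalls(LOG):
        print(f"{ts}  {kind:<14} {secs:8.1f} s")
```

On this excerpt it lists stalls growing from about 12.5 s to about 116 s within four minutes, against the 800 ms threshold that corosync reports.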
_______________________________________________
Users mailing list: [email protected]
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
