On Wed, 2018-06-20 at 13:30 +0200, Stefan Krueger wrote:
> Hi Ken,
> 
> I don't see any issues in the logs; periodically this is in the logs:
> 
> Jun 20 11:52:19 [5613] zfs-serv3 crmd: info: crm_timer_popped: PEngine Recheck Timer (I_PE_CALC) just popped (900000ms)
> Jun 20 11:52:19 [5613] zfs-serv3 crmd: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE | input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped
> Jun 20 11:52:19 [5613] zfs-serv3 crmd: info: do_state_transition: Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
> Jun 20 11:52:19 [5612] zfs-serv3 pengine: info: process_pe_message: Input has not changed since last time, not saving to disk
> Jun 20 11:52:19 [5612] zfs-serv3 pengine: info: determine_online_status: Node zfs-serv3 is online
> Jun 20 11:52:19 [5612] zfs-serv3 pengine: info: determine_online_status: Node zfs-serv4 is online
> Jun 20 11:52:19 [5612] zfs-serv3 pengine: info: native_print: vm_storage (ocf::heartbeat:ZFS): Stopped
> Jun 20 11:52:19 [5612] zfs-serv3 pengine: info: native_print: ha-ip (ocf::heartbeat:IPaddr2): Stopped
> Jun 20 11:52:19 [5612] zfs-serv3 pengine: info: native_print: resIPMI-zfs4 (stonith:external/ipmi): Started zfs-serv3
> Jun 20 11:52:19 [5612] zfs-serv3 pengine: info: native_print: resIPMI-zfs3 (stonith:external/ipmi): Started zfs-serv4
> Jun 20 11:52:19 [5612] zfs-serv3 pengine: info: native_print: nfs-server (systemd:nfs-server): Stopped
> Jun 20 11:52:19 [5612] zfs-serv3 pengine: info: RecurringOp: Start recurring monitor (5s) for vm_storage on zfs-serv3
> Jun 20 11:52:19 [5612] zfs-serv3 pengine: info: RecurringOp: Start recurring monitor (10s) for ha-ip on zfs-serv3
> Jun 20 11:52:19 [5612] zfs-serv3 pengine: info: RecurringOp: Start recurring monitor (60s) for nfs-server on zfs-serv3
> Jun 20 11:52:19 [5612] zfs-serv3 pengine: notice: LogActions: Start vm_storage (zfs-serv3)
> Jun 20 11:52:19 [5612] zfs-serv3 pengine: notice: LogActions: Start ha-ip (zfs-serv3)
> Jun 20 11:52:19 [5612] zfs-serv3 pengine: info: LogActions: Leave resIPMI-zfs4 (Started zfs-serv3)
> Jun 20 11:52:19 [5612] zfs-serv3 pengine: info: LogActions: Leave resIPMI-zfs3 (Started zfs-serv4)
> Jun 20 11:52:19 [5612] zfs-serv3 pengine: notice: LogActions: Start nfs-server (zfs-serv3)
> Jun 20 11:52:19 [5612] zfs-serv3 pengine: notice: process_pe_message: Calculated transition 80, saving inputs in /var/lib/pacemaker/pengine/pe-input-228.bz2
> Jun 20 11:52:19 [5613] zfs-serv3 crmd: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE | input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response
> Jun 20 11:52:19 [5613] zfs-serv3 crmd: info: do_te_invoke: Processing graph 80 (ref=pe_calc-dc-1529488339-113) derived from /var/lib/pacemaker/pengine/pe-input-228.bz2
> Jun 20 11:52:19 [5613] zfs-serv3 crmd: warning: run_graph: Transition 80 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=6, Source=/var/lib/pacemaker/pengine/pe-input-228.bz2): Terminated
> Jun 20 11:52:19 [5613] zfs-serv3 crmd: warning: te_graph_trigger: Transition failed: terminated
Well, this is why nothing starts, but I don't see any reason why it happens. :-/

Can you attach the pe-input file listed just above here?

> Jun 20 11:52:19 [5613] zfs-serv3 crmd: notice: print_graph: Graph 80 with 6 actions: batch-limit=0 jobs, network-delay=60000ms
> Jun 20 11:52:19 [5613] zfs-serv3 crmd: notice: print_synapse: [Action 5]: Pending rsc op vm_storage_monitor_5000 on zfs-serv3 (priority: 0, waiting: 4)
> Jun 20 11:52:19 [5613] zfs-serv3 crmd: notice: print_synapse: [Action 4]: Pending rsc op vm_storage_start_0 on zfs-serv3 (priority: 0, waiting: 12)
> Jun 20 11:52:19 [5613] zfs-serv3 crmd: notice: print_synapse: [Action 7]: Pending rsc op ha-ip_monitor_10000 on zfs-serv3 (priority: 0, waiting: 6)
> Jun 20 11:52:19 [5613] zfs-serv3 crmd: notice: print_synapse: [Action 6]: Pending rsc op ha-ip_start_0 on zfs-serv3 (priority: 0, waiting: 4 12)
> Jun 20 11:52:19 [5613] zfs-serv3 crmd: notice: print_synapse: [Action 13]: Pending rsc op nfs-server_monitor_60000 on zfs-serv3 (priority: 0, waiting: 12)
> Jun 20 11:52:19 [5613] zfs-serv3 crmd: notice: print_synapse: [Action 12]: Pending rsc op nfs-server_start_0 on zfs-serv3 (priority: 0, waiting: 4)
> Jun 20 11:52:19 [5613] zfs-serv3 crmd: info: do_log: Input I_TE_SUCCESS received in state S_TRANSITION_ENGINE from notify_crmd
> Jun 20 11:52:19 [5613] zfs-serv3 crmd: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE | input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd
> Jun 20 12:07:19 [5613] zfs-serv3 crmd: info: crm_timer_popped: PEngine Recheck Timer (I_PE_CALC) just popped (900000ms)
> [... the identical sequence repeats at 12:07:19: transition 81 is calculated from the unchanged input, saved again in /var/lib/pacemaker/pengine/pe-input-228.bz2, and again terminates with the same 6 actions pending ...]
> Jun 20 12:07:19 [5613] zfs-serv3 crmd: warning: te_graph_trigger: Transition failed: terminated
> 
> > Then there should be messages from the crmd about "Initiating" the
> > command and obtaining its "Result".
> 
> After a restart of the cluster there is nothing useful in the logs:
> 
> grep crmd /var/log/corosync/corosync.log | egrep 'Initi|Result'
> Jun 20 12:16:53 [27559] zfs-serv3 crmd: notice: te_rsc_command: Initiating monitor operation vm_storage_monitor_0 locally on zfs-serv3 | action 2
> Jun 20 12:16:53 [27559] zfs-serv3 crmd: notice: te_rsc_command: Initiating monitor operation ha-ip_monitor_0 locally on zfs-serv3 | action 3
> Jun 20 12:16:53 [27559] zfs-serv3 crmd: notice: te_rsc_command: Initiating monitor operation resIPMI-zfs4_monitor_0 locally on zfs-serv3 | action 4
> Jun 20 12:16:53 [27559] zfs-serv3 crmd: notice: te_rsc_command: Initiating monitor operation resIPMI-zfs3_monitor_0 locally on zfs-serv3 | action 5
> Jun 20 12:16:53 [27559] zfs-serv3 crmd: notice: te_rsc_command: Initiating monitor operation nfs-server_monitor_0 locally on zfs-serv3 | action 6
> Jun 20 12:16:53 [27559] zfs-serv3 crmd: notice: process_lrm_event: Result of probe operation for resIPMI-zfs4 on zfs-serv3: 7 (not running) | call=13 key=resIPMI-zfs4_monitor_0 confirmed=true cib-update=31
> Jun 20 12:16:53 [27559] zfs-serv3 crmd: notice: process_lrm_event: Result of probe operation for resIPMI-zfs3 on zfs-serv3: 7 (not running) | call=17 key=resIPMI-zfs3_monitor_0 confirmed=true cib-update=32
> Jun 20 12:16:53 [27559] zfs-serv3 crmd: notice: process_lrm_event: Result of probe operation for nfs-server on zfs-serv3: 7 (not running) | call=21 key=nfs-server_monitor_0 confirmed=true cib-update=33
> Jun 20 12:16:53 [27559] zfs-serv3 crmd: notice: process_lrm_event: Result of probe operation for vm_storage on zfs-serv3: 7 (not running) | call=5 key=vm_storage_monitor_0 confirmed=true cib-update=34
> Jun 20 12:16:53 [27559] zfs-serv3 crmd: notice: process_lrm_event: Result of probe operation for ha-ip on zfs-serv3: 7 (not running) | call=9 key=ha-ip_monitor_0 confirmed=true cib-update=35
> Jun 20 12:17:06 [27559] zfs-serv3 crmd: notice: te_rsc_command: Initiating monitor operation vm_storage_monitor_0 on zfs-serv4 | action 2
> Jun 20 12:17:06 [27559] zfs-serv3 crmd: notice: te_rsc_command: Initiating monitor operation ha-ip_monitor_0 on zfs-serv4 | action 3
> Jun 20 12:17:06 [27559] zfs-serv3 crmd: notice: te_rsc_command: Initiating monitor operation resIPMI-zfs4_monitor_0 on zfs-serv4 | action 4
> Jun 20 12:17:06 [27559] zfs-serv3 crmd: notice: te_rsc_command: Initiating monitor operation resIPMI-zfs3_monitor_0 on zfs-serv4 | action 5
> Jun 20 12:17:06 [27559] zfs-serv3 crmd: notice: te_rsc_command: Initiating monitor operation nfs-server_monitor_0 on zfs-serv4 | action 6
> Jun 20 12:17:06 [27559] zfs-serv3 crmd: notice: te_rsc_command: Initiating start operation resIPMI-zfs4_start_0 locally on zfs-serv3 | action 11
> Jun 20 12:17:06 [27559] zfs-serv3 crmd: notice: te_rsc_command: Initiating start operation resIPMI-zfs3_start_0 on zfs-serv4 | action 13
> Jun 20 12:17:07 [27559] zfs-serv3 crmd: notice: process_lrm_event: Result of start operation for resIPMI-zfs4 on zfs-serv3: 0 (ok) | call=22 key=resIPMI-zfs4_start_0 confirmed=true cib-update=61
> Jun 20 12:17:07 [27559] zfs-serv3 crmd: notice: te_rsc_command: Initiating monitor operation resIPMI-zfs3_monitor_60000 on zfs-serv4 | action 14
> Jun 20 12:17:07 [27559] zfs-serv3 crmd: notice: te_rsc_command: Initiating monitor operation resIPMI-zfs4_monitor_60000 locally on zfs-serv3 | action 12
> Jun 20 12:17:08 [27559] zfs-serv3 crmd: info: process_lrm_event: Result of monitor operation for resIPMI-zfs4 on zfs-serv3: 0 (ok) | call=23 key=resIPMI-zfs4_monitor_60000 confirmed=false cib-update=62
> 
> Now I will try this: https://wiki.clusterlabs.org/wiki/Debugging_Resource_Failures, maybe it helps.
> 
> best regards
> Stefan
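For reference, the decisions recorded in a saved pe-input file can usually be replayed offline with Pacemaker's crm_simulate tool, which shows what the policy engine would schedule from that input. A minimal sketch, assuming the standard Pacemaker CLI tools are installed and using the path from the logs above; the guard is only there so the snippet degrades gracefully on a host without Pacemaker:

```shell
# Path of the saved policy-engine input, taken from the log messages above
PE_INPUT=/var/lib/pacemaker/pengine/pe-input-228.bz2

if command -v crm_simulate >/dev/null 2>&1 && [ -f "$PE_INPUT" ]; then
    # -S (--simulate) replays the transition; -x (--xml-file) reads the saved input
    crm_simulate -S -x "$PE_INPUT"
else
    echo "crm_simulate or $PE_INPUT not available on this host"
fi
```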
-- 
Ken Gaillot <kgail...@redhat.com>
_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org