Dear Reid, Thanks for the reply. Yes, lots of pacemaker logs - I have included just over a minute of them below and 5m of them as an attached .log file. The same behaviour occurs for a period of roughly 6 minutes before the corosync shutdown happens and can be summarised like so:
This leads to several more questions…
Any help is much appreciated! Cheers Alex |
pacemaker.log.gz
Description: GNU Zip compressed data
Apr 23 11:04:59.462 testcluster-c1 pacemaker-execd [1295872] (async_action_complete) warning: virtual_ip_monitor_30000[724966] timed out after 20000ms Apr 23 11:05:20.190 testcluster-c1 pacemaker-execd [1295872] (async_action_complete) warning: PingChecks_monitor_10000[724970] timed out after 30000ms Apr 23 11:05:24.754 testcluster-c1 pacemaker-controld [1295875] (throttle_check_thresholds) notice: High CPU load detected: 108.660004 Apr 23 11:05:29.558 testcluster-c1 pacemaker-controld [1295875] (qb_ipcs_us_withdraw) info: withdrawing server sockets Apr 23 11:05:30.270 testcluster-c1 pacemaker-controld [1295875] (pe_ipc_destroy) info: Connection to the scheduler released Apr 23 11:05:34.982 testcluster-c1 pacemaker-controld [1295875] (tengine_stonith_connection_destroy) info: Fencing daemon disconnected Apr 23 11:05:35.350 testcluster-c1 pacemaker-controld [1295875] (crmd_exit) notice: Forcing immediate exit with status 100 (Fatal error occurred, will not respawn) Apr 23 11:05:35.538 testcluster-c1 pacemaker-controld [1295875] (crm_xml_cleanup) info: Cleaning up memory from libxml2 Apr 23 11:05:35.850 testcluster-c1 pacemaker-controld [1295875] (crm_exit) info: Exiting pacemaker-controld | with status 100 Apr 23 11:05:38.630 testcluster-c1 pacemaker-execd [1295872] (cancel_recurring_action) info: Cancelling ocf operation virtual_ip_monitor_30000 Apr 23 11:05:38.630 testcluster-c1 pacemaker-execd [1295872] (services_action_cancel) info: Terminating in-flight op virtual_ip_monitor_30000[724993] early because it was cancelled Apr 23 11:05:38.610 testcluster-c1 pacemakerd [1295869] (pcmk_child_exit) warning: Shutting cluster down because pacemaker-controld[1295875] had fatal failure Apr 23 11:05:38.630 testcluster-c1 pacemakerd [1295869] (pcmk_shutdown_worker) notice: Shutting down Pacemaker Apr 23 11:05:38.630 testcluster-c1 pacemakerd [1295869] (stop_child) notice: Stopping pacemaker-schedulerd | sent signal 15 to process 1295874 Apr 23 11:05:38.634 testcluster-c1 pacemaker-execd [1295872] (async_action_complete) info: virtual_ip_monitor_30000[724993] terminated with signal 9 (Killed) Apr 23 11:05:38.634 testcluster-c1 pacemaker-execd [1295872] (cancel_recurring_action) info: Cancelling ocf operation virtual_ip_monitor_30000 Apr 23 11:05:38.650 testcluster-c1 pacemaker-execd [1295872] (send_client_notify) warning: Could not notify client crmd: Bad file descriptor | rc=9 Apr 23 11:05:38.650 testcluster-c1 pacemaker-execd [1295872] (cancel_recurring_action) info: Cancelling systemd operation docker-services_status_60000 Apr 23 11:05:38.650 testcluster-c1 pacemaker-execd [1295872] (cancel_recurring_action) info: Cancelling ocf operation PingChecks_monitor_10000 Apr 23 11:05:38.650 testcluster-c1 pacemaker-execd [1295872] (services_action_cancel) info: Terminating in-flight op PingChecks_monitor_10000[724994] early because it was cancelled Apr 23 11:05:38.650 testcluster-c1 pacemaker-execd [1295872] (async_action_complete) info: PingChecks_monitor_10000[724994] terminated with signal 9 (Killed) Apr 23 11:05:38.650 testcluster-c1 pacemaker-execd [1295872] (cancel_recurring_action) info: Cancelling ocf operation PingChecks_monitor_10000 Apr 23 11:05:38.654 testcluster-c1 pacemaker-execd [1295872] (cancel_recurring_action) info: Cancelling ocf operation DrbdFS_monitor_20000 Apr 23 11:05:38.654 testcluster-c1 pacemaker-execd [1295872] (services_action_cancel) info: Terminating in-flight op DrbdFS_monitor_20000[724990] early because it was cancelled Apr 23 11:05:38.662 testcluster-c1 pacemaker-execd [1295872] (async_action_complete) info: DrbdFS_monitor_20000[724990] terminated with signal 9 (Killed) Apr 23 11:05:38.662 testcluster-c1 pacemaker-execd [1295872] (cancel_recurring_action) info: Cancelling ocf operation DrbdFS_monitor_20000 Apr 23 11:05:38.802 testcluster-c1 pacemaker-schedulerd[1295874] (pcmk__set_instance_roles) info: Promoting DrbdData:0 (Promoted testcluster-c1) Apr 23 11:05:38.802 testcluster-c1 pacemaker-schedulerd[1295874] (pcmk__set_instance_roles) info: DrbdData-clone: Promoted 1 instances of a possible 1 Apr 23 11:05:38.806 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: Diff: --- 0.30.149 2 Apr 23 11:05:38.806 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: Diff: +++ 0.30.150 c0591b944f7de4d8a5b8ce02555e8983 Apr 23 11:05:38.806 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: -- /cib/status/node_state[@id='1']/transient_attributes[@id='1'] Apr 23 11:05:38.806 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: + /cib: @num_updates=150 Apr 23 11:05:38.806 testcluster-c1 pacemaker-schedulerd[1295874] (RecurringOp) info: Start recurring monitor (20s) for DrbdFS on testcluster-c1 Apr 23 11:05:38.806 testcluster-c1 pacemaker-schedulerd[1295874] (rsc_action_default) info: Leave DrbdData:0 (Promoted testcluster-c1) Apr 23 11:05:38.806 testcluster-c1 pacemaker-schedulerd[1295874] (rsc_action_default) info: Leave DrbdData:1 (Unpromoted testcluster-c2) Apr 23 11:05:38.806 testcluster-c1 pacemaker-schedulerd[1295874] (log_list_item) notice: Actions: Recover DrbdFS ( testcluster-c1 ) Apr 23 11:05:38.806 testcluster-c1 pacemaker-schedulerd[1295874] (log_list_item) notice: Actions: Restart docker-services ( testcluster-c1 ) due to required DrbdFS start Apr 23 11:05:38.806 testcluster-c1 pacemaker-schedulerd[1295874] (log_list_item) notice: Actions: Restart virtual_ip ( testcluster-c1 ) due to required docker-services start Apr 23 11:05:38.806 testcluster-c1 pacemaker-schedulerd[1295874] (rsc_action_default) info: Leave PingChecks:0 (Started testcluster-c1) Apr 23 11:05:38.810 testcluster-c1 pacemaker-schedulerd[1295874] (rsc_action_default) info: Leave PingChecks:1 (Started testcluster-c2) Apr 23 11:05:39.186 testcluster-c1 pacemaker-schedulerd[1295874] (handle_pecalc_op) error: Couldn't send transition graph to peer, writing to /var/lib/pacemaker/pengine/pengine.graph.3KGsrP instead Apr 23 11:05:39.442 testcluster-c1 pacemaker-schedulerd[1295874] (pcmk__log_transition_summary) notice: Calculated transition 495, saving inputs in /var/lib/pacemaker/pengine/pe-input-28.bz2 Apr 23 11:05:39.570 testcluster-c1 pacemaker-schedulerd[1295874] (handle_pecalc_op) info: Input has not changed since last time, not saving to disk Apr 23 11:05:41.426 testcluster-c1 pacemaker-based [1295870] (cib_process_request) info: Completed cib_delete operation for section //node_state[@uname='testcluster-c1']/transient_attributes: OK (rc=0, origin=testcluster-c2/crmd/27, version=0.30.150) Apr 23 11:05:41.998 testcluster-c1 pacemaker-based [1295870] (cib_process_request) info: Completed cib_modify operation for section cib: OK (rc=0, origin=testcluster-c2/crmd/29, version=0.30.150) Apr 23 11:05:42.002 testcluster-c1 pacemaker-schedulerd[1295874] (unpack_config) notice: On loss of quorum: Ignore Apr 23 11:05:42.002 testcluster-c1 pacemaker-schedulerd[1295874] (determine_online_status) info: Node testcluster-c1 is online Apr 23 11:05:42.002 testcluster-c1 pacemaker-schedulerd[1295874] (determine_online_status) info: Node testcluster-c2 is online Apr 23 11:05:42.002 testcluster-c1 pacemaker-schedulerd[1295874] (pe_get_failcount) info: virtual_ip has failed 1 times on testcluster-c1 Apr 23 11:05:42.002 testcluster-c1 pacemaker-schedulerd[1295874] (pe__clear_failcount) notice: Clearing failure of virtual_ip on testcluster-c1 because it expired | virtual_ip_clear_failcount_0 Apr 23 11:05:42.002 testcluster-c1 pacemaker-schedulerd[1295874] (pe_get_failcount) info: virtual_ip has failed 1 times on testcluster-c1 Apr 23 11:05:42.002 testcluster-c1 pacemaker-schedulerd[1295874] (pe__clear_failcount) notice: Clearing failure of virtual_ip on testcluster-c1 because it expired | virtual_ip_clear_failcount_0 Apr 23 11:05:42.002 testcluster-c1 pacemaker-schedulerd[1295874] (pe_get_failcount) info: virtual_ip has failed 1 times on testcluster-c1 Apr 23 11:05:42.002 testcluster-c1 pacemaker-schedulerd[1295874] (pe__clear_failcount) notice: Clearing failure of virtual_ip on testcluster-c1 because it expired | virtual_ip_clear_failcount_0 Apr 23 11:05:42.002 testcluster-c1 pacemaker-schedulerd[1295874] (unpack_rsc_op) notice: Rescheduling virtual_ip_monitor_30000 after failure expired on testcluster-c1 | actual=1 expected=0 magic=2:1;43:2:0:3f9900ac-185f-4dc1-972d-934d41ca6dc9 Apr 23 11:05:42.002 testcluster-c1 pacemaker-schedulerd[1295874] (pe_get_failcount) info: PingChecks:0 has failed 1 times on testcluster-c1 Apr 23 11:05:42.002 testcluster-c1 pacemaker-schedulerd[1295874] (pe__clear_failcount) notice: Clearing failure of PingChecks:0 on testcluster-c1 because it expired | PingChecks:0_clear_failcount_0 Apr 23 11:05:42.002 testcluster-c1 pacemaker-schedulerd[1295874] (pe_get_failcount) info: PingChecks:0 has failed 1 times on testcluster-c1 Apr 23 11:05:42.002 testcluster-c1 pacemaker-schedulerd[1295874] (pe__clear_failcount) notice: Clearing failure of PingChecks:0 on testcluster-c1 because it expired | PingChecks:0_clear_failcount_0 Apr 23 11:05:42.002 testcluster-c1 pacemaker-schedulerd[1295874] (pe_get_failcount) info: PingChecks:0 has failed 1 times on testcluster-c1 Apr 23 11:05:42.002 testcluster-c1 pacemaker-schedulerd[1295874] (pe__clear_failcount) notice: Clearing failure of PingChecks:0 on testcluster-c1 because it expired | PingChecks:0_clear_failcount_0 Apr 23 11:05:42.002 testcluster-c1 pacemaker-schedulerd[1295874] (unpack_rsc_op) notice: Rescheduling PingChecks_monitor_10000 after failure expired on testcluster-c1 | actual=1 expected=0 magic=2:1;51:4:0:a085676a-5b2b-4928-83f2-c01315e5aec5 Apr 23 11:05:42.002 testcluster-c1 pacemaker-schedulerd[1295874] (unpack_rsc_op_failure) warning: Unexpected result (error: Process did not exit within specified timeout) was recorded for monitor of DrbdFS on testcluster-c1 at Apr 23 11:02:43 2024 | rc=1 id=DrbdFS_last_failure _0 Apr 23 11:05:43.334 testcluster-c1 pacemaker-based [1295870] (cib_process_request) info: Completed cib_modify operation for section crm_config: OK (rc=0, origin=testcluster-c2/crmd/31, version=0.30.150) Apr 23 11:05:43.334 testcluster-c1 pacemaker-schedulerd[1295874] (pe_get_failcount) info: DrbdFS has failed 1 times on testcluster-c1 Apr 23 11:05:43.334 testcluster-c1 pacemaker-schedulerd[1295874] (pcmk__threshold_reached) info: DrbdFS can fail 999999 more times on testcluster-c1 before reaching migration threshold (1000000) Apr 23 11:05:44.022 testcluster-c1 pacemaker-based [1295870] (cib_process_request) info: Completed cib_modify operation for section crm_config: OK (rc=0, origin=testcluster-c2/crmd/33, version=0.30.150) Apr 23 11:05:46.818 testcluster-c1 pacemaker-based [1295870] (cib_process_request) info: Completed cib_modify operation for section crm_config: OK (rc=0, origin=testcluster-c2/crmd/35, version=0.30.150) Apr 23 11:05:47.342 testcluster-c1 pacemaker-based [1295870] (cib_process_replace) info: Digest matched on replace from testcluster-c2: c0591b944f7de4d8a5b8ce02555e8983 Apr 23 11:05:47.414 testcluster-c1 pacemaker-based [1295870] (cib_process_replace) info: Replaced 0.30.150 with 0.30.150 from testcluster-c2 Apr 23 11:05:48.786 testcluster-c1 pacemaker-based [1295870] (cib_process_request) info: Completed cib_replace operation for section 'all': OK (rc=0, origin=testcluster-c2/crmd/38, version=0.30.150) Apr 23 11:05:48.794 testcluster-c1 pacemaker-schedulerd[1295874] (pcmk__set_instance_roles) info: Promoting DrbdData:0 (Promoted testcluster-c1) Apr 23 11:05:48.794 testcluster-c1 pacemaker-schedulerd[1295874] (pcmk__set_instance_roles) info: DrbdData-clone: Promoted 1 instances of a possible 1 Apr 23 11:05:48.794 testcluster-c1 pacemaker-schedulerd[1295874] (RecurringOp) info: Start recurring monitor (20s) for DrbdFS on testcluster-c1 Apr 23 11:05:48.798 testcluster-c1 pacemaker-schedulerd[1295874] (rsc_action_default) info: Leave DrbdData:0 (Promoted testcluster-c1) Apr 23 11:05:48.798 testcluster-c1 pacemaker-schedulerd[1295874] (rsc_action_default) info: Leave DrbdData:1 (Unpromoted testcluster-c2) Apr 23 11:05:48.798 testcluster-c1 pacemaker-schedulerd[1295874] (log_list_item) notice: Actions: Recover DrbdFS ( testcluster-c1 ) Apr 23 11:05:48.798 testcluster-c1 pacemaker-schedulerd[1295874] (log_list_item) notice: Actions: Restart docker-services ( testcluster-c1 ) due to required DrbdFS start Apr 23 11:05:48.798 testcluster-c1 pacemaker-schedulerd[1295874] (log_list_item) notice: Actions: Restart virtual_ip ( testcluster-c1 ) due to required docker-services start Apr 23 11:05:48.798 testcluster-c1 pacemaker-schedulerd[1295874] (rsc_action_default) info: Leave PingChecks:0 (Started testcluster-c1) Apr 23 11:05:48.798 testcluster-c1 pacemaker-schedulerd[1295874] (rsc_action_default) info: Leave PingChecks:1 (Started testcluster-c2) Apr 23 11:05:48.802 testcluster-c1 pacemaker-schedulerd[1295874] (pcmk__log_transition_summary) notice: Calculated transition 496, saving inputs in /var/lib/pacemaker/pengine/pe-input-29.bz2 Apr 23 11:05:48.806 testcluster-c1 pacemaker-based [1295870] (cib_process_request) info: Completed cib_modify operation for section nodes: OK (rc=0, origin=testcluster-c2/crmd/39, version=0.30.150) Apr 23 11:05:48.818 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: Diff: --- 0.30.150 2 Apr 23 11:05:48.818 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: Diff: +++ 0.30.151 (null) Apr 23 11:05:48.818 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: -- /cib/status/node_state[@id='2']/lrm[@id='2'] Apr 23 11:05:48.818 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: + /cib: @num_updates=151 Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_process_request) info: Completed cib_delete operation for section //node_state[@uname='testcluster-c2']/lrm: OK (rc=0, origin=testcluster-c2/crmd/40, version=0.30.151) Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: Diff: --- 0.30.151 2 Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: Diff: +++ 0.30.152 (null) Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: + /cib: @num_updates=152 Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: + /cib/status/node_state[@id='2']: @crm-debug-origin=do_lrm_query_internal Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ /cib/status/node_state[@id='2']: <lrm id="2"/> Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ <lrm_resources> Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ <lrm_resource id="virtual_ip" type="IPaddr2" class="ocf" provider="heartbeat"> Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ <lrm_rsc_op id="virtual_ip_last_0" operation_key="virtual_ip_monitor_0" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3. 11.0" transition-key="5:2:7:3f9900ac-185f-4dc1-972d-934d41ca6dc9" transition-magic="0:7;5:2:7:3f9900ac-185f-4dc1-972d-934d41ca6dc9" exit-reason="" _on_node_="testcluster-c2" call-id="21" rc-code="7" op-status="0" interval="0" last-rc-change="1713422197" exec-time="89" queue-time="0" op -digest="3a768873ca9f51044dbe8d63366fb7ce"/> Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ </lrm_resource> Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ <lrm_resource id="DrbdData" type="drbd" class="ocf" provider="linbit"> Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ <lrm_rsc_op id="DrbdData_last_0" operation_key="DrbdData_start_0" operation="start" crm-debug-origin="build_active_RAs" crm_feature_set="3.11.0" tr ansition-key="12:1:0:3f9900ac-185f-4dc1-972d-934d41ca6dc9" transition-magic="0:0;12:1:0:3f9900ac-185f-4dc1-972d-934d41ca6dc9" exit-reason="" _on_node_="testcluster-c2" call-id="11" rc-code="0" op-status="0" interval="0" last-rc-change="1713422196" exec-time="985" queue-time="0" op-dige st="c0e018b73fdf522b6cdd355e125af15e" op-force-restart=" drbd_resource " op-restart-digest="c0e018b73fdf522b6cdd355e125af15e"/> Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ <lrm_rsc_op id="DrbdData_monitor_60000" operation_key="DrbdData_monitor_60000" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_ set="3.11.0" transition-key="13:2:0:3f9900ac-185f-4dc1-972d-934d41ca6dc9" transition-magic="0:0;13:2:0:3f9900ac-185f-4dc1-972d-934d41ca6dc9" exit-reason="" _on_node_="testcluster-c2" call-id="13" rc-code="0" op-status="0" interval="60000" last-rc-change="1713422197" exec-time="129" que ue-time="0" op-digest="db2fb3216fa7207fdc0b2bf31bbd09c1"/> Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ </lrm_resource> Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ <lrm_resource id="docker-services" type="docker-services" class="systemd"> Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ <lrm_rsc_op id="docker-services_last_0" operation_key="docker-services_monitor_0" operation="monitor" crm-debug-origin="build_active_RAs" crm_featu re_set="3.11.0" transition-key="4:2:7:3f9900ac-185f-4dc1-972d-934d41ca6dc9" transition-magic="0:7;4:2:7:3f9900ac-185f-4dc1-972d-934d41ca6dc9" exit-reason="" _on_node_="testcluster-c2" call-id="17" rc-code="7" op-status="0" interval="0" last-rc-change="1713422197" exec-time="6" queue-ti me="0" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/> Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ </lrm_resource> Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ <lrm_resource id="PingChecks" type="ping" class="ocf" provider="pacemaker"> Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ <lrm_rsc_op id="PingChecks_last_0" operation_key="PingChecks_start_0" operation="start" crm-debug-origin="build_active_RAs" crm_feature_set="3.11.0 " transition-key="50:2:0:3f9900ac-185f-4dc1-972d-934d41ca6dc9" transition-magic="0:0;50:2:0:3f9900ac-185f-4dc1-972d-934d41ca6dc9" exit-reason="" _on_node_="testcluster-c2" call-id="27" rc-code="0" op-status="0" interval="0" last-rc-change="1713422197" exec-time="9196" queue-time="0" op -digest="1ac9a59a0519ceb6f9188233ae443791" op-force-restart=" use_fping failure_score options host_list multiplier name pidfile " op-restart-digest="e460710e452f12aebd0abcae1b26e5f9"/> Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ <lrm_rsc_op id="PingChecks_monitor_10000" operation_key="PingChecks_monitor_10000" operation="monitor" crm-debug-origin="build_active_RAs" crm_feat ure_set="3.11.0" transition-key="51:2:0:3f9900ac-185f-4dc1-972d-934d41ca6dc9" transition-magic="0:0;51:2:0:3f9900ac-185f-4dc1-972d-934d41ca6dc9" exit-reason="" _on_node_="testcluster-c2" call-id="28" rc-code="0" op-status="0" interval="10000" last-rc-change="1713422207" exec-time="9281 " queue-time="0" op-digest="56e4de959c35addb7fded1aeb1ae4e30"/> Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ </lrm_resource> Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ <lrm_resource id="DrbdFS" type="Filesystem" class="ocf" provider="heartbeat"> Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ <lrm_rsc_op id="DrbdFS_last_0" operation_key="DrbdFS_monitor_0" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.11.0" tr ansition-key="4:1:7:3f9900ac-185f-4dc1-972d-934d41ca6dc9" transition-magic="0:7;4:1:7:3f9900ac-185f-4dc1-972d-934d41ca6dc9" exit-reason="" _on_node_="testcluster-c2" call-id="10" rc-code="7" op-status="0" interval="0" last-rc-change="1713422196" exec-time="121" queue-time="0" op-digest ="40b189eb6bf31d9b108771148daa913b"/> Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ </lrm_resource> Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ </lrm_resources> Apr 23 11:05:48.822 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: ++ </lrm> Apr 23 11:05:48.826 testcluster-c1 pacemaker-based [1295870] (cib_process_request) info: Completed cib_modify operation for section status: OK (rc=0, origin=testcluster-c2/crmd/41, version=0.30.152) Apr 23 11:05:48.830 testcluster-c1 pacemaker-based [1295870] (cib_process_request) info: Completed cib_modify operation for section nodes: OK (rc=0, origin=testcluster-c2/crmd/44, version=0.30.152) Apr 23 11:05:48.834 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: Diff: --- 0.30.152 2 Apr 23 11:05:48.834 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: Diff: +++ 0.30.153 (null) Apr 23 11:05:48.834 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: + /cib: @num_updates=153 Apr 23 11:05:48.834 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: + /cib/status/node_state[@id='1']: @crmd=offline, @crm-debug-origin=do_state_transition, @join=down Apr 23 11:05:48.834 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: + /cib/status/node_state[@id='2']: @crm-debug-origin=do_state_transition Apr 23 11:05:48.834 testcluster-c1 pacemaker-based [1295870] (cib_process_request) info: Completed cib_modify operation for section status: OK (rc=0, origin=testcluster-c2/crmd/45, version=0.30.153) Apr 23 11:05:49.346 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: Diff: --- 0.30.153 2 Apr 23 11:05:49.350 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: Diff: +++ 0.30.154 (null) Apr 23 11:05:49.350 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: + /cib: @num_updates=154, @dc-uuid=2 Apr 23 11:05:50.922 testcluster-c1 pacemaker-based [1295870] (cib_process_request) info: Completed cib_modify operation for section cib: OK (rc=0, origin=testcluster-c2/crmd/46, version=0.30.154) Apr 23 11:05:51.146 testcluster-c1 pacemaker-schedulerd[1295874] (unpack_config) notice: On loss of quorum: Ignore Apr 23 11:05:51.150 testcluster-c1 pacemaker-schedulerd[1295874] (determine_online_status) info: Node testcluster-c1 is online Apr 23 11:05:51.150 testcluster-c1 pacemaker-schedulerd[1295874] (determine_online_status) info: Node testcluster-c2 is online Apr 23 11:05:51.154 testcluster-c1 pacemaker-schedulerd[1295874] (pe_get_failcount) info: virtual_ip has failed 1 times on testcluster-c1 Apr 23 11:05:51.154 testcluster-c1 pacemaker-schedulerd[1295874] (pe__clear_failcount) notice: Clearing failure of virtual_ip on testcluster-c1 because it expired | virtual_ip_clear_failcount_0 Apr 23 11:05:51.154 testcluster-c1 pacemaker-schedulerd[1295874] (pe_get_failcount) info: virtual_ip has failed 1 times on testcluster-c1 Apr 23 11:05:51.154 testcluster-c1 pacemaker-schedulerd[1295874] (pe__clear_failcount) notice: Clearing failure of virtual_ip on testcluster-c1 because it expired | virtual_ip_clear_failcount_0 Apr 23 11:05:51.158 testcluster-c1 pacemaker-schedulerd[1295874] (pe_get_failcount) info: virtual_ip has failed 1 times on testcluster-c1 Apr 23 11:05:51.158 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: Diff: --- 0.30.154 2 Apr 23 11:05:51.158 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: Diff: +++ 0.30.155 (null) Apr 23 11:05:51.158 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: -- /cib/status/node_state[@id='2']/lrm[@id='2']/lrm_resources/lrm_resource[@id='DrbdData']/lrm_rsc_op[@id='DrbdData_monitor_60000'] Apr 23 11:05:51.158 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: + /cib: @num_updates=155 Apr 23 11:05:51.158 testcluster-c1 pacemaker-schedulerd[1295874] (pe__clear_failcount) notice: Clearing failure of virtual_ip on testcluster-c1 because it expired | virtual_ip_clear_failcount_0 Apr 23 11:05:51.158 testcluster-c1 pacemaker-schedulerd[1295874] (unpack_rsc_op) notice: Rescheduling virtual_ip_monitor_30000 after failure expired on testcluster-c1 | actual=1 expected=0 magic=2:1;43:2:0:3f9900ac-185f-4dc1-972d-934d41ca6dc9 Apr 23 11:05:51.162 testcluster-c1 pacemaker-schedulerd[1295874] (pe_get_failcount) info: PingChecks:0 has failed 1 times on testcluster-c1 Apr 23 11:05:51.162 testcluster-c1 pacemaker-schedulerd[1295874] (pe__clear_failcount) notice: Clearing failure of PingChecks:0 on testcluster-c1 because it expired | PingChecks:0_clear_failcount_0 Apr 23 11:05:51.162 testcluster-c1 pacemaker-based [1295870] (cib_process_request) info: Completed cib_delete operation for section status: OK (rc=0, origin=testcluster-c2/crmd/49, version=0.30.155) Apr 23 11:05:51.162 testcluster-c1 pacemaker-schedulerd[1295874] (pe_get_failcount) info: PingChecks:0 has failed 1 times on testcluster-c1 Apr 23 11:05:51.162 testcluster-c1 pacemaker-schedulerd[1295874] (pe__clear_failcount) notice: Clearing failure of PingChecks:0 on testcluster-c1 because it expired | PingChecks:0_clear_failcount_0 Apr 23 11:05:51.162 testcluster-c1 pacemaker-schedulerd[1295874] (pe_get_failcount) info: PingChecks:0 has failed 1 times on testcluster-c1 Apr 23 11:05:51.162 testcluster-c1 pacemaker-schedulerd[1295874] (pe__clear_failcount) notice: Clearing failure of PingChecks:0 on testcluster-c1 because it expired | PingChecks:0_clear_failcount_0 Apr 23 11:05:51.162 testcluster-c1 pacemaker-schedulerd[1295874] (unpack_rsc_op) notice: Rescheduling PingChecks_monitor_10000 after failure expired on testcluster-c1 | actual=1 expected=0 magic=2:1;51:4:0:a085676a-5b2b-4928-83f2-c01315e5aec5 Apr 23 11:05:51.162 testcluster-c1 pacemaker-schedulerd[1295874] (unpack_rsc_op_failure) warning: Unexpected result (error: Process did not exit within specified timeout) was recorded for monitor of DrbdFS on testcluster-c1 at Apr 23 11:02:43 2024 | rc=1 id=DrbdFS_last_failure _0 Apr 23 11:05:51.942 testcluster-c1 pacemaker-schedulerd[1295874] (pe_get_failcount) info: DrbdFS has failed 1 times on testcluster-c1 Apr 23 11:05:51.942 testcluster-c1 pacemaker-schedulerd[1295874] (pcmk__threshold_reached) info: DrbdFS can fail 999999 more times on testcluster-c1 before reaching migration threshold (1000000) Apr 23 11:05:51.962 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: Diff: --- 0.30.155 2 Apr 23 11:05:51.962 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: Diff: +++ 0.30.156 (null) Apr 23 11:05:51.962 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: + /cib: @num_updates=156 Apr 23 11:05:51.962 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: + /cib/status/node_state[@id='2']: @crm-debug-origin=do_update_resource Apr 23 11:05:51.962 testcluster-c1 pacemaker-based [1295870] (cib_perform_op) info: + /cib/status/node_state[@id='2']/lrm[@id='2']/lrm_resources/lrm_resource[@id='DrbdData']/lrm_rsc_op[@id='DrbdData_last_0']: @operation_key=DrbdData_promote_0, @operation=promote, @crm-deb ug-origin=do_update_resource, @transition-key=6:0:0:91e178a7-4098-4729-92b3-d7cf8fdc6e64, @transition-magic=-1:193;6:0:0:91e178a7-4098-4729-92b3-d7cf8fdc6e64, @call-id=-1, @rc-code=193, @op-status=-1, @last-rc-change=1713863139, @exec-time=0 Apr 23 11:05:51.974 testcluster-c1 pacemaker-based [1295870] (cib_process_request) info: Completed cib_modify operation for section status: OK (rc=0, origin=testcluster-c2/crmd/50, version=0.30.156) Apr 23 11:05:52.674 testcluster-c1 pacemaker-based [1295870] (cib_process_ping) info: Reporting our current digest to testcluster-c2: fddf1766600ffe74873bf40cd3a0cbcb for 0.30.156 (0x564d66568980 0) Apr 23 11:05:53.074 testcluster-c1 pacemaker-schedulerd[1295874] (pcmk__set_instance_roles) info: Promoting DrbdData:0 (Promoted testcluster-c1) Apr 23 11:05:53.074 testcluster-c1 pacemaker-schedulerd[1295874] (pcmk__set_instance_roles) info: DrbdData-clone: Promoted 1 instances of a possible 1 Apr 23 11:05:53.082 testcluster-c1 pacemaker-schedulerd[1295874] (RecurringOp) info: Start recurring monitor (20s) for DrbdFS on testcluster-c1 Apr 23 11:05:53.946 testcluster-c1 pacemaker-schedulerd[1295874] (rsc_action_default) info: Leave DrbdData:0 (Promoted testcluster-c1) Apr 23 11:05:54.086 testcluster-c1 pacemaker-schedulerd[1295874] (rsc_action_default) info: Leave DrbdData:1 (Unpromoted testcluster-c2) Apr 23 11:05:54.334 testcluster-c1 pacemaker-schedulerd[1295874] (log_list_item) notice: Actions: Recover DrbdFS ( testcluster-c1 ) Apr 23 11:05:55.130 testcluster-c1 pacemaker-schedulerd[1295874] (log_list_item) notice: Actions: Restart docker-services ( testcluster-c1 ) due to required DrbdFS start Apr 23 11:05:55.906 testcluster-c1 pacemaker-schedulerd[1295874] (log_list_item) notice: Actions: Restart virtual_ip ( testcluster-c1 ) due to required docker-services start Apr 23 11:05:56.162 testcluster-c1 pacemaker-schedulerd[1295874] (rsc_action_default) info: Leave PingChecks:0 (Started testcluster-c1) Apr 23 11:05:56.422 testcluster-c1 pacemaker-schedulerd[1295874] (rsc_action_default) info: Leave PingChecks:1 (Started testcluster-c2) Apr 23 11:05:56.986 testcluster-c1 pacemaker-based [1295870] (cib_file_backup) info: Archived previous version as /var/lib/pacemaker/cib/cib-35.raw Apr 23 11:05:57.446 testcluster-c1 pacemaker-based [1295870] (cib_file_write_with_digest) info: Wrote version 0.30.0 of the CIB to disk (digest: 6559c9de6a8bfa2291405312f21fe540) Apr 23 11:05:57.590 testcluster-c1 pacemaker-based [1295870] (cib_file_write_with_digest) info: Reading cluster configuration file /var/lib/pacemaker/cib/cib.SOWrXW (digest: /var/lib/pacemaker/cib/cib.pDuq4R) Apr 23 11:05:58.734 testcluster-c1 pacemaker-based [1295870] (cib_process_ping) info: Reporting our current digest to testcluster-c1: fddf1766600ffe74873bf40cd3a0cbcb for 0.30.156 (0x564d66568980 0) Apr 23 11:05:59.138 testcluster-c1 pacemaker-schedulerd[1295874] (pcmk__log_transition_summary) notice: Calculated transition 497, saving inputs in /var/lib/pacemaker/pengine/pe-input-30.bz2 Apr 23 11:05:59.166 testcluster-c1 pacemaker-schedulerd[1295874] (crm_signal_dispatch) notice: Caught 'Terminated' signal | 15 (invoking handler) Apr 23 11:05:59.174 testcluster-c1 pacemaker-schedulerd[1295874] (qb_ipcs_us_withdraw) info: withdrawing server sockets Apr 23 11:05:59.178 testcluster-c1 pacemaker-schedulerd[1295874] (crm_xml_cleanup) info: Cleaning up memory from libxml2 Apr 23 11:05:59.198 testcluster-c1 pacemaker-schedulerd[1295874] (crm_exit) info: Exiting pacemaker-schedulerd | with status 0 Apr 23 11:06:00.486 testcluster-c1 pacemakerd [1295869] (pcmk_child_exit) info: pacemaker-schedulerd[1295874] exited with status 0 (OK) Apr 23 11:06:00.546 testcluster-c1 pacemaker-attrd [1295873] (crm_signal_dispatch) notice: Caught 'Terminated' signal | 15 (invoking handler) Apr 23 11:06:00.550 testcluster-c1 pacemakerd [1295869] (stop_child) notice: Stopping pacemaker-attrd | sent signal 15 to process 1295873 Apr 23 11:06:00.570 testcluster-c1 pacemaker-attrd [1295873] (main) info: Shutting down attribute manager Apr 23 11:06:00.582 testcluster-c1 pacemaker-attrd [1295873] (qb_ipcs_us_withdraw) info: withdrawing server sockets Apr 23 11:06:01.078 testcluster-c1 pacemaker-attrd [1295873] (attrd_cib_destroy_cb) info: Connection disconnection complete Apr 23 11:06:01.126 testcluster-c1 pacemaker-attrd [1295873] (crm_xml_cleanup) info: Cleaning up memory from libxml2 Apr 23 11:06:01.382 testcluster-c1 pacemaker-attrd [1295873] (crm_exit) info: Exiting pacemaker-attrd | with status 0 Apr 23 11:06:02.926 testcluster-c1 pacemakerd [1295869] (pcmk_child_exit) info: pacemaker-attrd[1295873] exited with status 0 (OK) Apr 23 11:06:02.958 testcluster-c1 pacemakerd [1295869] (stop_child) notice: Stopping pacemaker-execd | sent signal 15 to process 1295872 Apr 23 11:06:03.006 testcluster-c1 pacemaker-execd [1295872] (crm_signal_dispatch) notice: Caught 'Terminated' signal | 15 (invoking handler) Apr 23 11:06:03.010 testcluster-c1 pacemaker-execd [1295872] (lrmd_exit) info: Terminating with 0 clients Apr 23 11:06:03.014 testcluster-c1 pacemaker-execd [1295872] (qb_ipcs_us_withdraw) info: withdrawing server sockets Apr 23 11:06:03.022 testcluster-c1 pacemaker-execd [1295872] (crm_xml_cleanup) info: Cleaning up memory from libxml2 Apr 23 11:06:03.206 testcluster-c1 pacemaker-execd [1295872] (crm_exit) info: Exiting pacemaker-execd | with status 0 Apr 23 11:06:03.382 testcluster-c1 pacemaker-based [1295870] (cib_process_request) info: Completed cib_modify operation for section status: OK (rc=0, origin=testcluster-c2/attrd/5, version=0.30.156) Apr 23 11:06:03.438 testcluster-c1 pacemaker-based [1295870] (cib_process_request) info: Completed cib_modify operation for section status: OK (rc=0, origin=testcluster-c2/attrd/6, version=0.30.156) Apr 23 11:06:03.966 testcluster-c1 pacemakerd [1295869] (pcmk_child_exit) info: pacemaker-execd[1295872] exited with status 0 (OK) Apr 23 11:06:03.970 testcluster-c1 pacemakerd [1295869] (stop_child) notice: Stopping pacemaker-fenced | sent signal 15 to process 1295871 Apr 23 11:06:03.978 testcluster-c1 pacemaker-fenced [1295871] (crm_signal_dispatch) notice: Caught 'Terminated' signal | 15 (invoking handler) Apr 23 11:06:04.010 testcluster-c1 pacemaker-fenced [1295871] (stonith_shutdown) info: Terminating with 0 clients Apr 23 11:06:04.630 testcluster-c1 pacemaker-fenced [1295871] (cib_connection_destroy) info: Connection to the CIB manager closed Apr 23 11:06:04.702 testcluster-c1 pacemaker-fenced [1295871] (qb_ipcs_us_withdraw) info: withdrawing server sockets Apr 23 11:06:04.778 testcluster-c1 pacemaker-fenced [1295871] (crm_xml_cleanup) info: Cleaning up memory from libxml2 Apr 23 11:06:04.842 testcluster-c1 pacemaker-fenced [1295871] (crm_exit) info: Exiting pacemaker-fenced | with status 0 Apr 23 11:06:06.258 testcluster-c1 pacemakerd [1295869] (pcmk_child_exit) info: pacemaker-fenced[1295871] exited with status 0 (OK) Apr 23 11:06:06.266 testcluster-c1 pacemaker-based [1295870] (crm_signal_dispatch) notice: Caught 'Terminated' signal | 15 (invoking handler) Apr 23 11:06:06.266 testcluster-c1 pacemakerd [1295869] (stop_child) notice: Stopping pacemaker-based | sent signal 15 to process 1295870 Apr 23 11:06:06.274 testcluster-c1 pacemaker-based [1295870] (cib_shutdown) info: Disconnected 0 clients Apr 23 11:06:06.274 testcluster-c1 pacemaker-based [1295870] (cib_shutdown) info: All clients disconnected (0) Apr 23 11:06:06.282 testcluster-c1 pacemaker-based [1295870] (initiate_exit) info: Sending disconnect notification to 2 peers... Apr 23 11:06:06.334 testcluster-c1 pacemaker-based [1295870] (cib_process_shutdown_req) info: Peer testcluster-c1 is requesting to shut down Apr 23 11:06:06.346 testcluster-c1 pacemaker-based [1295870] (cib_process_shutdown_req) info: Peer testcluster-c2 has acknowledged our shutdown request Apr 23 11:06:06.346 testcluster-c1 pacemaker-based [1295870] (terminate_cib) info: cib_process_shutdown_req: Exiting from mainloop... Apr 23 11:06:06.350 testcluster-c1 pacemaker-based [1295870] (crm_cluster_disconnect) info: Disconnecting from corosync cluster infrastructure Apr 23 11:06:06.358 testcluster-c1 pacemaker-based [1295870] (pcmk__corosync_disconnect) notice: Disconnected from Corosync Apr 23 11:06:06.366 testcluster-c1 pacemaker-based [1295870] (crm_get_peer) info: Created entry 48e34a90-0596-48f1-b2b2-171d1692aac5/0x564d66683e40 for node testcluster-c2/0 (1 total) Apr 23 11:06:06.406 testcluster-c1 pacemaker-based [1295870] (cib_peer_update_callback) info: No more peers Apr 23 11:06:06.422 testcluster-c1 pacemaker-based [1295870] (terminate_cib) info: cib_peer_update_callback: Exiting from mainloop... Apr 23 11:06:06.426 testcluster-c1 pacemaker-based [1295870] (crm_cluster_disconnect) info: Disconnecting from corosync cluster infrastructure Apr 23 11:06:06.426 testcluster-c1 pacemaker-based [1295870] (cluster_disconnect_cpg) info: No CPG connection Apr 23 11:06:06.426 testcluster-c1 pacemaker-based [1295870] (pcmk__corosync_disconnect) notice: Disconnected from Corosync Apr 23 11:06:06.426 testcluster-c1 pacemaker-based [1295870] (qb_ipcs_us_withdraw) info: withdrawing server sockets Apr 23 11:06:06.442 testcluster-c1 pacemaker-based [1295870] (qb_ipcs_us_withdraw) info: withdrawing server sockets Apr 23 11:06:06.442 testcluster-c1 pacemaker-based [1295870] (qb_ipcs_us_withdraw) info: withdrawing server sockets Apr 23 11:06:06.458 testcluster-c1 pacemaker-based [1295870] (crm_xml_cleanup) info: Cleaning up memory from libxml2 Apr 23 11:06:09.682 testcluster-c1 pacemaker-based [1295870] (crm_exit) info: Exiting pacemaker-based | with status 0 Apr 23 11:06:10.450 testcluster-c1 pacemakerd [1295869] (pcmk_child_exit) info: pacemaker-based[1295870] exited with status 0 (OK) Apr 23 11:06:10.458 testcluster-c1 pacemakerd [1295869] (pcmk_shutdown_worker) notice: Shutdown complete Apr 23 11:06:10.458 testcluster-c1 pacemakerd [1295869] (pcmk_shutdown_worker) notice: Shutting down and staying down after fatal error Apr 23 11:06:10.458 testcluster-c1 pacemakerd [1295869] (pcmkd_shutdown_corosync) info: Asking Corosync to shut down Apr 23 11:06:10.530 testcluster-c1 pacemakerd [1295869] (crm_xml_cleanup) info: Cleaning up memory from libxml2 Apr 23 11:06:10.554 testcluster-c1 pacemakerd [1295869] (crm_exit) info: Exiting pacemakerd | with status 100
|
_______________________________________________ Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home: https://www.clusterlabs.org/