Hi,

a few days ago one of my nodes was fenced and I don't know why, which is something I really don't like.

What I did: I put one node (ha-idg-1) into standby. The resources on it (mostly virtual domains) were migrated to ha-idg-2, except one domain (vm_nextcloud): on ha-idg-2 a mountpoint was missing that the XML of the domain points to. The cluster then tried to start vm_nextcloud on ha-idg-2, which of course also failed. Then ha-idg-1 was fenced.
I did a "crm history" over the respective time period; you can find it here:
https://hmgubox2.helmholtz-muenchen.de/index.php/s/529dfcXf5a72ifF

Here are, from my point of view, the most interesting parts of the logs:

ha-idg-1:

Jul 20 16:59:33 [23763] ha-idg-1 cib: info: cib_perform_op: Diff: --- 2.16196.19 2
Jul 20 16:59:33 [23763] ha-idg-1 cib: info: cib_perform_op: Diff: +++ 2.16197.0 bc9a558dfbe6d7196653ce56ad1ee758
Jul 20 16:59:33 [23763] ha-idg-1 cib: info: cib_perform_op: + /cib: @epoch=16197, @num_updates=0
Jul 20 16:59:33 [23763] ha-idg-1 cib: info: cib_perform_op: + /cib/configuration/nodes/node[@id='1084777482']/instance_attributes[@id='nodes-1084777482']/nvpair[@id='nodes-1084777482-standby']: @value=on
    <== ha-idg-1 set to standby

Jul 20 16:59:34 [23768] ha-idg-1 crmd: notice: process_lrm_event: ha-idg-1-vm_nextcloud_migrate_to_0:3169 [ error: Cannot access storage file '/mnt/mcd/AG_BioInformatik/Technik/software_und_treiber/linux/ubuntu/ubuntu-18.04.4-live-server-amd64.iso': No such file or directory\nocf-exit-reason:vm_nextcloud: live migration to ha-idg-2 failed: 1\n ]
    <== migration failed

Jul 20 17:04:01 [23767] ha-idg-1 pengine: error: native_create_actions: Resource vm_nextcloud is active on 2 nodes (attempting recovery)
    <== ???
Jul 20 17:04:01 [23767] ha-idg-1 pengine: notice: LogAction: * Recover vm_nextcloud ( ha-idg-2 )
Jul 20 17:04:01 [23768] ha-idg-1 crmd: notice: te_rsc_command: Initiating stop operation vm_nextcloud_stop_0 on ha-idg-2 | action 106
Jul 20 17:04:01 [23768] ha-idg-1 crmd: notice: te_rsc_command: Initiating stop operation vm_nextcloud_stop_0 locally on ha-idg-1 | action 2
Jul 20 17:04:01 [23768] ha-idg-1 crmd: info: match_graph_event: Action vm_nextcloud_stop_0 (106) confirmed on ha-idg-2 (rc=0)
Jul 20 17:04:06 [23768] ha-idg-1 crmd: notice: process_lrm_event: Result of stop operation for vm_nextcloud on ha-idg-1: 0 (ok) | call=3197 key=vm_nextcloud_stop_0 confirmed=true cib-update=5960

Jul 20 17:05:29 [23761] ha-idg-1 pacemakerd: notice: crm_signal_dispatch: Caught 'Terminated' signal | 15 (invoking handler)
    <== systemctl stop pacemaker.service

ha-idg-2:

Jul 20 17:04:03 [10691] ha-idg-2 crmd: notice: process_lrm_event: Result of stop operation for vm_nextcloud on ha-idg-2: 0 (ok) | call=157 key=vm_nextcloud_stop_0 confirmed=true cib-update=57
    <== the log from ha-idg-2 is two seconds ahead of ha-idg-1

Jul 20 17:04:08 [10688] ha-idg-2 lrmd: notice: log_execute: executing - rsc:vm_nextcloud action:start call_id:192
Jul 20 17:04:09 [10688] ha-idg-2 lrmd: notice: operation_finished: vm_nextcloud_start_0:29107:stderr [ error: Failed to create domain from /mnt/share/vm_nextcloud.xml ]
Jul 20 17:04:09 [10688] ha-idg-2 lrmd: notice: operation_finished: vm_nextcloud_start_0:29107:stderr [ error: Cannot access storage file '/mnt/mcd/AG_BioInformatik/Technik/software_und_treiber/linux/ubuntu/ubuntu-18.04.4-live-server-amd64.iso': No such file or directory ]
Jul 20 17:04:09 [10688] ha-idg-2 lrmd: notice: operation_finished: vm_nextcloud_start_0:29107:stderr [ ocf-exit-reason:Failed to start virtual domain vm_nextcloud. ]
Jul 20 17:04:09 [10688] ha-idg-2 lrmd: notice: log_finished: finished - rsc:vm_nextcloud action:start call_id:192 pid:29107 exit-code:1 exec-time:581ms queue-time:0ms
    <== start on ha-idg-2 failed

Jul 20 17:05:32 [10691] ha-idg-2 crmd: info: do_dc_takeover: Taking over DC status for this partition
    <== ha-idg-1 stopped pacemaker

Jul 20 17:05:33 [10690] ha-idg-2 pengine: warning: unpack_rsc_op_failure: Processing failed migrate_to of vm_nextcloud on ha-idg-1: unknown error | rc=1
Jul 20 17:05:33 [10690] ha-idg-2 pengine: warning: unpack_rsc_op_failure: Processing failed start of vm_nextcloud on ha-idg-2: unknown error | rc
Jul 20 17:05:33 [10690] ha-idg-2 pengine: info: native_color: Resource vm_nextcloud cannot run anywhere
    <== logical
Jul 20 17:05:33 [10690] ha-idg-2 pengine: warning: custom_action: Action vm_nextcloud_stop_0 on ha-idg-1 is unrunnable (pending)
    <== ???
Jul 20 17:05:35 [10690] ha-idg-2 pengine: warning: custom_action: Action vm_nextcloud_stop_0 on ha-idg-1 is unrunnable (offline)
Jul 20 17:05:35 [10690] ha-idg-2 pengine: warning: pe_fence_node: Cluster node ha-idg-1 will be fenced: resource actions are unrunnable
Jul 20 17:05:35 [10690] ha-idg-2 pengine: warning: stage6: Scheduling Node ha-idg-1 for STONITH
Jul 20 17:05:35 [10690] ha-idg-2 pengine: info: native_stop_constraints: vm_nextcloud_stop_0 is implicit after ha-idg-1 is fenced
Jul 20 17:05:35 [10690] ha-idg-2 pengine: notice: LogNodeActions: * Fence (Off) ha-idg-1 'resource actions are unrunnable'

Why does it say

"Jul 20 17:05:35 [10690] ha-idg-2 pengine: warning: custom_action: Action vm_nextcloud_stop_0 on ha-idg-1 is unrunnable (offline)"

although

"Jul 20 17:04:06 [23768] ha-idg-1 crmd: notice: process_lrm_event: Result of stop operation for vm_nextcloud on ha-idg-1: 0 (ok) | call=3197 key=vm_nextcloud_stop_0 confirmed=true cib-update=5960"

says that the stop was ok?
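In case it helps when reading the two logs side by side: a minimal Python sketch that merges lines from both nodes into one timeline, correcting for the two-second clock offset mentioned above. The function names, the regular expression and the year are my own assumptions for illustration (the log lines carry no year, so 2020 is only a placeholder); this is not part of crmsh or Pacemaker.

```python
import re
from datetime import datetime, timedelta

# Assumption: ha-idg-2's clock is two seconds ahead of ha-idg-1's,
# so two seconds are subtracted from its timestamps.
CLOCK_OFFSET = {"ha-idg-1": timedelta(0), "ha-idg-2": timedelta(seconds=2)}
# Assumption: the logs do not contain a year; this value is a placeholder.
ASSUMED_YEAR = 2020

# Matches lines such as:
# "Jul 20 17:04:01 [23768] ha-idg-1 crmd: notice: ..."
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) \[\d+\] (?P<node>\S+) +(?P<rest>.*)$"
)

def parse(line):
    """Return (corrected_time, node, rest), or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(f"{ASSUMED_YEAR} {m['ts']}", "%Y %b %d %H:%M:%S")
    node = m["node"]
    return ts - CLOCK_OFFSET.get(node, timedelta(0)), node, m["rest"]

def merged_timeline(lines):
    """Sort all parseable lines by corrected time (stable for equal times)."""
    events = [e for e in map(parse, lines) if e is not None]
    return sorted(events, key=lambda e: e[0])

if __name__ == "__main__":
    sample = [
        "Jul 20 17:04:03 [10691] ha-idg-2 crmd: notice: process_lrm_event: "
        "Result of stop operation for vm_nextcloud on ha-idg-2: 0 (ok)",
        "Jul 20 17:04:06 [23768] ha-idg-1 crmd: notice: process_lrm_event: "
        "Result of stop operation for vm_nextcloud on ha-idg-1: 0 (ok)",
    ]
    for when, node, rest in merged_timeline(sample):
        print(when.strftime("%H:%M:%S"), node, rest)
```

With the offset applied, the stop result logged on ha-idg-2 at 17:04:03 lines up with the confirmation ha-idg-1 logged at 17:04:01.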
Bernd

--
Bernd Lentes
Systemadministration
Institute for Metabolism and Cell Death (MCD)
HelmholtzZentrum München
