----- On Aug 9, 2020, at 10:17 PM, Bernd Lentes wrote:
>> So this appears to be the problem. From these logs I would guess the
>> successful stop on ha-idg-1 did not get written to the CIB for some
>> reason. I'd look at the pe input from this transition on ha-idg-2 to
>> confirm that.
>>
>> Without the DC knowing about the stop, it tries to schedule a new one,
>> but the node is shutting down so it can't do it, which means it has to
>> be fenced.

I checked all relevant pe files from this time period. This is what I found (I have kept only the important entries):

ha-idg-1:~/why-fenced/ha-idg-1/pengine # crm_simulate -S -x pe-input-3116 -G transition-3116.xml -D transition-3116.dot
Current cluster status:
...
 vm_nextcloud (ocf::heartbeat:VirtualDomain): Started ha-idg-1

Transition Summary:
...
 * Migrate vm_nextcloud ( ha-idg-1 -> ha-idg-2 )

Executing cluster transition:
 * Resource action: vm_nextcloud migrate_from on ha-idg-2 <======= migrate vm_nextcloud
 * Resource action: vm_nextcloud stop on ha-idg-1
 * Pseudo action: vm_nextcloud_start_0

Revised cluster status:
Node ha-idg-1 (1084777482): standby
Online: [ ha-idg-2 ]

 vm_nextcloud (ocf::heartbeat:VirtualDomain): Started ha-idg-2


ha-idg-1:~/why-fenced/ha-idg-1/pengine # crm_simulate -S -x pe-error-48 -G transition-4514.xml -D transition-4514.dot
Current cluster status:
Node ha-idg-1 (1084777482): standby
Online: [ ha-idg-2 ]
...
 vm_nextcloud (ocf::heartbeat:VirtualDomain): FAILED[ ha-idg-2 ha-idg-1 ] <====== migration failed

Transition Summary:
...
 * Recover vm_nextcloud ( ha-idg-2 )

Executing cluster transition:
 * Resource action: vm_nextcloud stop on ha-idg-2
 * Resource action: vm_nextcloud stop on ha-idg-1
 * Resource action: vm_nextcloud start on ha-idg-2
 * Resource action: vm_nextcloud monitor=30000 on ha-idg-2

Revised cluster status:
 vm_nextcloud (ocf::heartbeat:VirtualDomain): Started ha-idg-2


ha-idg-1:~/why-fenced/ha-idg-1/pengine # crm_simulate -S -x pe-input-3117 -G transition-3117.xml -D transition-3117.dot
Current cluster status:
Node ha-idg-1 (1084777482): standby
Online: [ ha-idg-2 ]

 vm_nextcloud (ocf::heartbeat:VirtualDomain): FAILED ha-idg-2 <====== start on ha-idg-2 failed

Transition Summary:
 * Stop vm_nextcloud ( ha-idg-2 ) due to node availability <==== stop vm_nextcloud (what does "due to node availability" mean?)

Executing cluster transition:
 * Resource action: vm_nextcloud stop on ha-idg-2

Revised cluster status:
 vm_nextcloud (ocf::heartbeat:VirtualDomain): Stopped


ha-idg-1:~/why-fenced/ha-idg-1/pengine # crm_simulate -S -x pe-input-3118 -G transition-4516.xml -D transition-4516.dot
Current cluster status:
Node ha-idg-1 (1084777482): standby
Online: [ ha-idg-2 ]

 vm_nextcloud (ocf::heartbeat:VirtualDomain): Stopped <============== vm_nextcloud is stopped

Transition Summary:
 * Shutdown ha-idg-1

Executing cluster transition:
 * Resource action: vm_nextcloud stop on ha-idg-1 <==== why stop? It is already stopped.

Revised cluster status:
 vm_nextcloud (ocf::heartbeat:VirtualDomain): Stopped


ha-idg-1:~/why-fenced/ha-idg-2/pengine # crm_simulate -S -x pe-input-3545 -G transition-0.xml -D transition-0.dot
Current cluster status:
Node ha-idg-1 (1084777482): pending
Online: [ ha-idg-2 ]

 vm_nextcloud (ocf::heartbeat:VirtualDomain): Stopped <====== vm_nextcloud is stopped

Transition Summary:

Executing cluster transition:
Using the original execution date of: 2020-07-20 15:05:33Z

Revised cluster status:
 vm_nextcloud (ocf::heartbeat:VirtualDomain): Stopped


ha-idg-1:~/why-fenced/ha-idg-2/pengine # crm_simulate -S -x pe-warn-749 -G transition-1.xml -D transition-1.dot
Current cluster status:
Node ha-idg-1 (1084777482): OFFLINE (standby)
Online: [ ha-idg-2 ]

 vm_nextcloud (ocf::heartbeat:VirtualDomain): Stopped <======= vm_nextcloud is stopped

Transition Summary:
 * Fence (Off) ha-idg-1 'resource actions are unrunnable'

Executing cluster transition:
 * Fencing ha-idg-1 (Off)
 * Pseudo action: vm_nextcloud_stop_0 <======= why stop? It is already stopped.

Revised cluster status:
Node ha-idg-1 (1084777482): OFFLINE (standby)
Online: [ ha-idg-2 ]

 vm_nextcloud (ocf::heartbeat:VirtualDomain): Stopped


I don't understand why the cluster tries to stop a resource that is already stopped.

Bernd
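Replaying the saved scheduler inputs one by one, as done above, can be scripted. The following is a minimal POSIX shell sketch, assuming crm_simulate is on PATH and the pe files sit in the current directory. The function names resource_lines and replay_pe_files are hypothetical helpers, not part of Pacemaker. The sketch reruns crm_simulate -S -x on each file and keeps only the lines that mention one resource, plus the fence and shutdown lines, so the status of vm_nextcloud can be followed across transitions.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers for filtering crm_simulate output.
# Assumes crm_simulate is installed and the pe-* files are in the current directory.

# Read crm_simulate output on stdin and print only:
#  - lines mentioning the resource named in $1 (status, summary, actions)
#  - "* Fence", "* Fencing" and "* Shutdown" lines
resource_lines() {
    grep -E -e "$1" -e '\* (Fence|Fencing|Shutdown) '
}

# Replay each scheduler input file given after the resource name
# and print the filtered output under a per-file header.
replay_pe_files() {
    resource=$1
    shift
    for f in "$@"; do
        echo "=== $f"
        crm_simulate -S -x "$f" | resource_lines "$resource"
    done
}
```

For example, running replay_pe_files vm_nextcloud pe-input-3116 pe-error-48 pe-input-3117 pe-input-3118 in the ha-idg-1 pengine directory would print only the vm_nextcloud lines and the fencing/shutdown lines of those four transitions. Filtering with grep keeps the sketch independent of the exact column alignment crm_simulate uses, at the cost of matching any line that merely contains the resource name.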
