>>> Digimer <[email protected]> schrieb am 18.01.2021 um 03:11 in Nachricht <[email protected]>: > Hi all, > > Mind the slew of questions, well into testing now and finding lots of > issues. This one is two questions... :) > > I set a server to be unamaged in pacemaker while the server was > running. Then I tried to remove the resource, and it refused saying it > couldn't stop it, and to use '--force'. So I did, and the node got > fenced. Now, the resource was setup with;
My guess is you shouldn't do it that way: Why not stop the resource, unconfigure it in the cluster, then start it manually? > > pcs resource create srv07-el6 ocf:alteeve:server name="srv07-el6" \ > meta allow-migrate="true" target-role="started" \ > op monitor interval="60" start timeout="INFINITY" \ > on-fail="block" stop timeout="INFINITY" on-fail="block" \ > migrate_to timeout="INFINITY" > > I would have expected the 'stop timeout="INFINITY" on-fail="block"' to > prevent fencing if the server failed to stop (question 1) and that if a > resource was unmanaged, that the resource wouldn't even try to stop > (question 2). > > Can someone help me understand what happened here? Fencing reason was " srv01-test_stop_0 process (PID 113779) timed out". Did have a failutre before your actions? The logs indicate such it seems: "Clearing failure of srv01-test on el8-a01n02 because resource parameters have changed" Haveing the cluster in a clean state before configuring it highly desirable IMHO. I use this command frequently to check: "crm_mon -1Arfj" The logs should help to explain! Regards, Ulrich > > digimer > > More below; > > ==== > [root@el8-a01n01 ~]# pcs resource remove srv01-test > Attempting to stop: srv01-test... Warning: 'srv01-test' is unmanaged > Error: Unable to stop: srv01-test before deleting (re-run with --force > to force deletion) > [root@el8-a01n01 ~]# pcs resource remove srv01-test --force > Deleting Resource - srv01-test > [root@el8-a01n01 ~]# client_loop: send disconnect: Broken pipe > ==== > > As you can see, the node was fenced. The logs on that node were; > > ==== > Jan 18 02:03:55 el8-a01n01.alteeve.ca pacemaker-execd[1872]: warning: > srv01-test_stop_0 process (PID 113779) timed out > Jan 18 02:03:55 el8-a01n01.alteeve.ca pacemaker-execd[1872]: warning: > srv01-test_stop_0[113779] timed out after 20000ms > Jan 18 02:03:55 el8-a01n01.alteeve.ca pacemaker-controld[1875]: error: > Result of stop operation for srv01-test on el8-a01n01: Timed Out > Jan 18 02:03:55 el8-a01n01.alteeve.ca pacemaker-controld[1875]: notice: > el8-a01n01-srv01-test_stop_0:37 [ The server: [srv01-test] is indeed > running. It will be shut down now.\n ] > Jan 18 02:03:55 el8-a01n01.alteeve.ca pacemaker-attrd[1873]: notice: > Setting fail-count-srv01-test#stop_0[el8-a01n01]: (unset) -> INFINITY > Jan 18 02:03:55 el8-a01n01.alteeve.ca pacemaker-attrd[1873]: notice: > Setting last-failure-srv01-test#stop_0[el8-a01n01]: (unset) -> 1610935435 > Jan 18 02:03:55 el8-a01n01.alteeve.ca pacemaker-attrd[1873]: notice: > Setting fail-count-srv01-test#stop_0[el8-a01n01]: INFINITY -> (unset) > Jan 18 02:03:55 el8-a01n01.alteeve.ca pacemaker-attrd[1873]: notice: > Setting last-failure-srv01-test#stop_0[el8-a01n01]: 1610935435 -> (unset) > client_loop: send disconnect: Broken pipe > ==== > > On the peer node, the logs showed; > > ==== > Jan 18 02:03:13 el8-a01n02.alteeve.ca pacemaker-controld[490050]: > notice: State transition S_IDLE -> S_POLICY_ENGINE > Jan 18 02:03:13 el8-a01n02.alteeve.ca pacemaker-schedulerd[490049]: > notice: Calculated transition 58, saving inputs in > /var/lib/pacemaker/pengine/pe-input-100.bz2 > Jan 18 02:03:13 el8-a01n02.alteeve.ca pacemaker-controld[490050]: > notice: Transition 58 (Complete=0, Pending=0, Fired=0, Skipped=0, > Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-100.bz2): Complete > Jan 18 02:03:13 el8-a01n02.alteeve.ca pacemaker-controld[490050]: > notice: State transition S_TRANSITION_ENGINE -> S_IDLE > Jan 18 02:03:18 el8-a01n02.alteeve.ca pacemaker-controld[490050]: > notice: State transition S_IDLE -> S_POLICY_ENGINE > Jan 18 02:03:18 el8-a01n02.alteeve.ca pacemaker-schedulerd[490049]: > notice: Calculated transition 59, saving inputs in > /var/lib/pacemaker/pengine/pe-input-101.bz2 > Jan 18 02:03:18 el8-a01n02.alteeve.ca pacemaker-controld[490050]: > notice: Transition 59 (Complete=0, Pending=0, Fired=0, Skipped=0, > Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-101.bz2): Complete > Jan 18 02:03:18 el8-a01n02.alteeve.ca pacemaker-controld[490050]: > notice: State transition S_TRANSITION_ENGINE -> S_IDLE > Jan 18 02:03:35 el8-a01n02.alteeve.ca pacemaker-controld[490050]: > notice: State transition S_IDLE -> S_POLICY_ENGINE > Jan 18 02:03:35 el8-a01n02.alteeve.ca pacemaker-schedulerd[490049]: > warning: Detected active orphan srv01-test running on el8-a01n01 > Jan 18 02:03:35 el8-a01n02.alteeve.ca pacemaker-schedulerd[490049]: > notice: Clearing failure of srv01-test on el8-a01n02 because resource > parameters have changed > Jan 18 02:03:35 el8-a01n02.alteeve.ca pacemaker-schedulerd[490049]: > notice: Removing srv01-test from el8-a01n01 > Jan 18 02:03:35 el8-a01n02.alteeve.ca pacemaker-schedulerd[490049]: > notice: Removing srv01-test from el8-a01n02 > Jan 18 02:03:35 el8-a01n02.alteeve.ca pacemaker-schedulerd[490049]: > notice: * Stop srv01-test ( el8-a01n01 > ) due to node availability > Jan 18 02:03:35 el8-a01n02.alteeve.ca pacemaker-schedulerd[490049]: > notice: Calculated transition 60, saving inputs in > /var/lib/pacemaker/pengine/pe-input-102.bz2 > Jan 18 02:03:35 el8-a01n02.alteeve.ca pacemaker-controld[490050]: > notice: Initiating stop operation srv01-test_stop_0 on el8-a01n01 > Jan 18 02:03:35 el8-a01n02.alteeve.ca pacemaker-controld[490050]: > notice: Transition 60 aborted by deletion of > lrm_rsc_op[@id='srv01-test_last_failure_0']: Resource operation removal > Jan 18 02:03:55 el8-a01n02.alteeve.ca pacemaker-controld[490050]: > notice: Transition 60 action 11 (srv01-test_stop_0 on el8-a01n01): > expected 'ok' but got 'error' > Jan 18 02:03:55 el8-a01n02.alteeve.ca pacemaker-controld[490050]: > notice: Transition 60 (Complete=2, Pending=0, Fired=0, Skipped=0, > Incomplete=2, Source=/var/lib/pacemaker/pengine/pe-input-102.bz2): Complete > Jan 18 02:03:55 el8-a01n02.alteeve.ca pacemaker-attrd[490048]: notice: > Setting fail-count-srv01-test#stop_0[el8-a01n01]: (unset) -> INFINITY > Jan 18 02:03:55 el8-a01n02.alteeve.ca pacemaker-attrd[490048]: notice: > Setting last-failure-srv01-test#stop_0[el8-a01n01]: (unset) -> 1610935435 > Jan 18 02:03:55 el8-a01n02.alteeve.ca pacemaker-schedulerd[490049]: > warning: Unexpected result (error) was recorded for stop of srv01-test > on el8-a01n01 at Jan 18 02:03:35 2021 > Jan 18 02:03:55 el8-a01n02.alteeve.ca pacemaker-schedulerd[490049]: > warning: Unexpected result (error) was recorded for stop of srv01-test > on el8-a01n01 at Jan 18 02:03:35 2021 > Jan 18 02:03:55 el8-a01n02.alteeve.ca pacemaker-schedulerd[490049]: > warning: Cluster node el8-a01n01 will be fenced: srv01-test failed there > Jan 18 02:03:55 el8-a01n02.alteeve.ca pacemaker-schedulerd[490049]: > warning: Detected active orphan srv01-test running on el8-a01n01 > Jan 18 02:03:55 el8-a01n02.alteeve.ca pacemaker-schedulerd[490049]: > warning: Scheduling Node el8-a01n01 for STONITH > Jan 18 02:03:55 el8-a01n02.alteeve.ca pacemaker-schedulerd[490049]: > notice: Stop of failed resource srv01-test is implicit after el8-a01n01 > is fenced > Jan 18 02:03:55 el8-a01n02.alteeve.ca pacemaker-schedulerd[490049]: > notice: * Fence (reboot) el8-a01n01 'srv01-test failed there' > Jan 18 02:03:55 el8-a01n02.alteeve.ca pacemaker-schedulerd[490049]: > notice: * Move virsh_node2_pulsar ( el8-a01n01 -> el8-a01n02 ) > Jan 18 02:03:55 el8-a01n02.alteeve.ca pacemaker-schedulerd[490049]: > notice: * Stop srv01-test ( el8-a01n01 > ) due to node availability > Jan 18 02:03:55 el8-a01n02.alteeve.ca pacemaker-schedulerd[490049]: > warning: Calculated transition 61 (with warnings), saving inputs in > /var/lib/pacemaker/pengine/pe-warn-1.bz2 > Jan 18 02:03:55 el8-a01n02.alteeve.ca pacemaker-schedulerd[490049]: > warning: Unexpected result (error) was recorded for stop of srv01-test > on el8-a01n01 at Jan 18 02:03:35 2021 > Jan 18 02:03:55 el8-a01n02.alteeve.ca pacemaker-schedulerd[490049]: > warning: Unexpected result (error) was recorded for stop of srv01-test > on el8-a01n01 at Jan 18 02:03:35 2021 > Jan 18 02:03:55 el8-a01n02.alteeve.ca pacemaker-schedulerd[490049]: > warning: Cluster node el8-a01n01 will be fenced: srv01-test failed there > Jan 18 02:03:55 el8-a01n02.alteeve.ca pacemaker-schedulerd[490049]: > warning: Detected active orphan srv01-test running on el8-a01n01 > Jan 18 02:03:55 el8-a01n02.alteeve.ca pacemaker-schedulerd[490049]: > warning: Forcing srv01-test away from el8-a01n01 after 1000000 failures > (max=1000000) > Jan 18 02:03:55 el8-a01n02.alteeve.ca pacemaker-schedulerd[490049]: > notice: Clearing failure of srv01-test on el8-a01n01 because it is orphaned > Jan 18 02:03:55 el8-a01n02.alteeve.ca pacemaker-schedulerd[490049]: > warning: Scheduling Node el8-a01n01 for STONITH > Jan 18 02:03:55 el8-a01n02.alteeve.ca pacemaker-schedulerd[490049]: > notice: Stop of failed resource srv01-test is implicit after el8-a01n01 > is fenced > Jan 18 02:03:55 el8-a01n02.alteeve.ca pacemaker-schedulerd[490049]: > notice: * Fence (reboot) el8-a01n01 'srv01-test failed there' > Jan 18 02:03:55 el8-a01n02.alteeve.ca pacemaker-schedulerd[490049]: > notice: * Move virsh_node2_pulsar ( el8-a01n01 -> el8-a01n02 ) > Jan 18 02:03:55 el8-a01n02.alteeve.ca pacemaker-schedulerd[490049]: > notice: * Stop srv01-test ( el8-a01n01 > ) due to node availability > Jan 18 02:03:55 el8-a01n02.alteeve.ca pacemaker-schedulerd[490049]: > warning: Calculated transition 62 (with warnings), saving inputs in > /var/lib/pacemaker/pengine/pe-warn-2.bz2 > Jan 18 02:03:55 el8-a01n02.alteeve.ca pacemaker-controld[490050]: > notice: Requesting fencing (reboot) of node el8-a01n01 > Jan 18 02:03:55 el8-a01n02.alteeve.ca pacemaker-controld[490050]: > notice: Initiating start operation virsh_node2_pulsar_start_0 locally on > el8-a01n02 > Jan 18 02:03:55 el8-a01n02.alteeve.ca pacemaker-fenced[490046]: notice: > Client pacemaker-controld.490050.72911c98 wants to fence (reboot) > 'el8-a01n01' with device '(any)' > Jan 18 02:03:55 el8-a01n02.alteeve.ca pacemaker-fenced[490046]: notice: > Requesting peer fencing (reboot) targeting el8-a01n01 > Jan 18 02:03:55 el8-a01n02.alteeve.ca pacemaker-attrd[490048]: notice: > Setting fail-count-srv01-test#stop_0[el8-a01n01]: INFINITY -> (unset) > Jan 18 02:03:55 el8-a01n02.alteeve.ca pacemaker-attrd[490048]: notice: > Setting last-failure-srv01-test#stop_0[el8-a01n01]: 1610935435 -> (unset) > Jan 18 02:03:55 el8-a01n02.alteeve.ca pacemaker-fenced[490046]: notice: > virsh_node2_pulsar is not eligible to fence (reboot) el8-a01n01: static-list > Jan 18 02:03:55 el8-a01n02.alteeve.ca pacemaker-fenced[490046]: notice: > virsh_node1_pulsar is eligible to fence (reboot) el8-a01n01: static-list > Jan 18 02:03:55 el8-a01n02.alteeve.ca pacemaker-controld[490050]: > notice: Transition 62 aborted by deletion of > lrm_rsc_op[@id='srv01-test_last_failure_0']: Resource operation removal > Jan 18 02:03:55 el8-a01n02.alteeve.ca pacemaker-fenced[490046]: notice: > Requesting that el8-a01n02 perform 'reboot' action targeting el8-a01n01 > using 'virsh_node1_pulsar' > Jan 18 02:03:56 el8-a01n02.alteeve.ca pacemaker-controld[490050]: > notice: Result of start operation for virsh_node2_pulsar on el8-a01n02: ok > Jan 18 02:03:57 el8-a01n02.alteeve.ca pacemaker-fenced[490046]: notice: > Operation 'reboot' [646769] (call 4 from pacemaker-controld.490050) for > host 'el8-a01n01' with device 'virsh_node1_pulsar' returned: 0 (OK) > Jan 18 02:03:58 el8-a01n02.alteeve.ca pacemaker-attrd[490048]: notice: > Node el8-a01n01 state is now lost > Jan 18 02:03:58 el8-a01n02.alteeve.ca pacemaker-attrd[490048]: notice: > Removing all el8-a01n01 attributes for peer loss > Jan 18 02:03:58 el8-a01n02.alteeve.ca pacemaker-controld[490050]: > notice: Node el8-a01n01 state is now lost > Jan 18 02:03:58 el8-a01n02.alteeve.ca pacemaker-based[490045]: notice: > Node el8-a01n01 state is now lost > Jan 18 02:03:58 el8-a01n02.alteeve.ca pacemaker-based[490045]: notice: > Purged 1 peer with id=1 and/or uname=el8-a01n01 from the membership cache > Jan 18 02:03:58 el8-a01n02.alteeve.ca pacemaker-fenced[490046]: notice: > Node el8-a01n01 state is now lost > Jan 18 02:03:58 el8-a01n02.alteeve.ca pacemaker-fenced[490046]: notice: > Purged 1 peer with id=1 and/or uname=el8-a01n01 from the membership cache > Jan 18 02:03:58 el8-a01n02.alteeve.ca pacemaker-attrd[490048]: notice: > Purged 1 peer with id=1 and/or uname=el8-a01n01 from the membership cache > Jan 18 02:03:58 el8-a01n02.alteeve.ca pacemaker-fenced[490046]: notice: > Action 'reboot' targeting el8-a01n01 using virsh_node1_pulsar on behalf > of pacemaker-controld.490050@el8-a01n02: OK > Jan 18 02:03:58 el8-a01n02.alteeve.ca pacemaker-fenced[490046]: notice: > Operation 'reboot' targeting el8-a01n01 on el8-a01n02 for > [email protected]: OK > Jan 18 02:03:58 el8-a01n02.alteeve.ca pacemaker-controld[490050]: > notice: Stonith operation 4/2:62:0:e827eea0-dedc-4200-a207-c4095621b3c6: > OK (0) > Jan 18 02:03:58 el8-a01n02.alteeve.ca pacemaker-controld[490050]: > notice: Peer el8-a01n01 was terminated (reboot) by el8-a01n02 on behalf > of pacemaker-controld.490050: OK > Jan 18 02:03:58 el8-a01n02.alteeve.ca pacemaker-controld[490050]: > notice: Transition 62 (Complete=5, Pending=0, Fired=0, Skipped=1, > Incomplete=1, Source=/var/lib/pacemaker/pengine/pe-warn-2.bz2): Stopped > Jan 18 02:03:59 el8-a01n02.alteeve.ca pacemaker-schedulerd[490049]: > notice: Removing srv01-test from el8-a01n02 > Jan 18 02:03:59 el8-a01n02.alteeve.ca pacemaker-schedulerd[490049]: > notice: Calculated transition 63, saving inputs in > /var/lib/pacemaker/pengine/pe-input-103.bz2 > Jan 18 02:03:59 el8-a01n02.alteeve.ca pacemaker-controld[490050]: > notice: Initiating monitor operation virsh_node2_pulsar_monitor_60000 > locally on el8-a01n02 > Jan 18 02:03:59 el8-a01n02.alteeve.ca pacemaker-controld[490050]: > notice: Initiating delete operation srv01-test_delete_0 locally on > el8-a01n02 > Jan 18 02:03:59 el8-a01n02.alteeve.ca pacemaker-controld[490050]: > notice: Transition 63 aborted by deletion of > lrm_resource[@id='srv01-test']: Resource state removal > Jan 18 02:04:00 el8-a01n02.alteeve.ca pacemaker-controld[490050]: > notice: Result of monitor operation for virsh_node2_pulsar on el8-a01n02: ok > Jan 18 02:04:00 el8-a01n02.alteeve.ca pacemaker-controld[490050]: > notice: Transition 63 (Complete=2, Pending=0, Fired=0, Skipped=0, > Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-103.bz2): Complete > Jan 18 02:04:00 el8-a01n02.alteeve.ca pacemaker-schedulerd[490049]: > notice: Calculated transition 64, saving inputs in > /var/lib/pacemaker/pengine/pe-input-104.bz2 > Jan 18 02:04:00 el8-a01n02.alteeve.ca pacemaker-controld[490050]: > notice: Transition 64 (Complete=0, Pending=0, Fired=0, Skipped=0, > Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-104.bz2): Complete > Jan 18 02:04:00 el8-a01n02.alteeve.ca pacemaker-controld[490050]: > notice: State transition S_TRANSITION_ENGINE -> S_IDLE > ==== > > -- > Digimer > Papers and Projects: https://alteeve.com/w/ > "I am, somehow, less interested in the weight and convolutions of > Einstein’s brain than in the near certainty that people of equal talent > have lived and died in cotton fields and sweatshops." - Stephen Jay Gould > _______________________________________________ > Manage your subscription: > https://lists.clusterlabs.org/mailman/listinfo/users > > ClusterLabs home: https://www.clusterlabs.org/ _______________________________________________ Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users ClusterLabs home: https://www.clusterlabs.org/
