On Thu, 2017-09-28 at 10:49 +0000, Roberto Muñoz Gomez wrote: > > > > > > It is common to get a "Transition aborted" error when try to > > > disable > > > > a resource? > > > > > > Yes, "Transition aborted" is not an error (notice the log is at > > > "info:" > > > level), just an indication that something in the configuration or > > > status changed, and requires a new transition calculation. > > > > > > > > > If its not an error, then why the resource is not disable > > > afterwards? > > > The command seems to be forgotten. > > > > It is very well possible that there is an error. > > The transition being aborted is not an error because a > > configuration > > change would mean that pacemaker (pengine in particular) has to > > reconsider the decisions made before. The calculated transition is > > kind of a write-up was has to be done next. > > The fact that the transition is being aborted makes it actually > > very > > reasonable that the configuration had been updated. > > > > Did you check the config (like target-role) or did you just see > > the > > service still running by other means (systemctl status, ps, ...)? > > > > Possible errors that prevent a service from being stopped should > > appear in the logs after pacemaker has calculated a new transition > > and faces problems executing it (e.g. resource-agent having > > problems to stop the resource). > > > > > > > > > > Maybe I miss something, this is the full log when disabling the > > resource. We have several "disable" commands at the same time > > because we need to stop the application to perform some End-Of-Day > > tasks, so in the middle of one "disable" another comes and then I > > guess the "Transtion aborted" happens. > > > > > > Sep 25 23:50:07 [4492] vttwinformlrz1 cib: info: > > crm_compress_string: Compressed 1106686 bytes into 72331 (ratio > > 15:1) in 171ms > > Sep 25 23:50:08 [4492] vttwinformlrz1 cib: info: > > cib_perform_op: Diff: --- 0.18638.0 2 > > Sep 25 23:50:08 [4492] vttwinformlrz1 cib: info: > > cib_perform_op: Diff: +++ 0.18639.0 (null) > > Sep 25 23:50:08 [4492] vttwinformlrz1 cib: info: > > cib_perform_op: -- > > /cib/configuration/resources/group[@id='ctpinet']/primitive[@id='ct > > pinetfh']/meta_attributes[@id='ctpinetfh-meta_attributes'] > > Sep 25 23:50:08 [4492] vttwinformlrz1 cib: info: > > cib_perform_op: + /cib: @epoch=18639 > > Sep 25 23:50:08 [4492] vttwinformlrz1 cib: info: > > cib_perform_op: ++ > > /cib/configuration/resources/group[@id='cmecp2']/primitive[@id='cme > > cp2dv']/meta_attributes[@id='cmecp2dv-meta_attributes']: <nvpair > > id="cmecp2dv-meta_attributes-target-role" name="target-role" > > value="Stopped"/>
FYI, this line above is what "pcs resource disable" does ... set target-role to Stopped > > Sep 25 23:50:08 [4492] vttwinformlrz1 cib: info: > > cib_process_request: Completed cib_replace operation for section > > configuration: OK (rc=0, origin=vttwinformlrz1/cibadmin/2, > > version=0.18639.0) > > Sep 25 23:50:08 [4501] vttwinformlrz1 crmd: info: > > abort_transition_graph: Transition aborted by deletion of > > meta_attributes[@id='ctpinetfh-meta_attributes']: Configuration > > change | cib=0.18639.0 source=te_update_diff:456 > > path=/cib/configuration/resources/group[@id='ctpinet']/primitive[@i > > d='ctpinetfh']/meta_attributes[@id='ctpinetfh-meta_attributes'] > > complete=false > > Sep 25 23:50:08 [4495] vttwinformlrz1 stonith-ng: info: > > stonith_device_remove: Device 'ctpinetfh' not found (1 active > > devices) Yes, these stonith_device_remove messages annoy me, but I haven't gotten around to improving them yet. They mean nothing significant ... stonithd just has to check if any resource modified in the CIB was a fencing device. If it wasn't, it logs this "not found" message. > > Sep 25 23:50:08 [4495] vttwinformlrz1 stonith-ng: info: > > update_cib_stonith_devices_v2: Updating device list from the > > cib: create meta_attributes[@id='cmecp2dv-meta_attributes'] > > Sep 25 23:50:08 [4495] vttwinformlrz1 stonith-ng: info: > > cib_devices_update: Updating devices to version 0.18639.0 > > Sep 25 23:50:08 [4495] vttwinformlrz1 stonith-ng: info: > > cib_device_update: Device fence1 has been disabled on > > vttwinformlrz1: score=-INFINITY > > Sep 25 23:50:08 [4492] vttwinformlrz1 cib: info: > > cib_process_request: Forwarding cib_replace operation for section > > configuration to all (origin=local/cibadmin/2) > > Sep 25 23:50:08 [4492] vttwinformlrz1 cib: info: > > crm_compress_string: Compressed 1106837 bytes into 72490 (ratio > > 15:1) in 171ms > > Sep 25 23:50:08 [4492] vttwinformlrz1 cib: info: > > cib_perform_op: Diff: --- 0.18639.0 2 > > Sep 25 23:50:08 [4492] vttwinformlrz1 cib: info: > > cib_perform_op: Diff: +++ 0.18640.0 (null) > > Sep 25 23:50:08 [4492] vttwinformlrz1 cib: info: > > cib_perform_op: -- > > /cib/configuration/resources/group[@id='cmecp2']/primitive[@id='cme > > cp2dv']/meta_attributes[@id='cmecp2dv- > > meta_attributes']/nvpair[@id='cmecp2dv-meta_attributes-target- > > role'] This is interesting: something just deleted the target-role that was set earlier in the same second. So, now the resource won't be stopped. > > Sep 25 23:50:08 [4492] vttwinformlrz1 cib: info: > > cib_perform_op: + /cib: @epoch=18640 > > Sep 25 23:50:08 [4492] vttwinformlrz1 cib: info: > > cib_perform_op: ++ > > /cib/configuration/resources/group[@id='cftmd1']/primitive[@id='cft > > md1s1']: <meta_attributes id="cftmd1s1-meta_attributes"/> > > Sep 25 23:50:08 [4492] vttwinformlrz1 cib: info: > > cib_perform_op: > > ++ > > <nvpair id="cftmd1s1-meta_attributes-target-role" > > name="target-role" value="Stopped"/> Here, a different resource is disabled. > > Sep 25 23:50:08 [4492] vttwinformlrz1 cib: info: > > cib_perform_op: > > ++ > > </meta_attributes> > > Sep 25 23:50:08 [4492] vttwinformlrz1 cib: info: > > cib_perform_op: ++ > > /cib/configuration/resources/group[@id='cinet2']/primitive[@id='cin > > et2dv']/meta_attributes[@id='cinet2dv-meta_attributes']: <nvpair > > id="cinet2dv-meta_attributes-target-role" name="target-role" > > value="Stopped"/> > > Sep 25 23:50:08 [4492] vttwinformlrz1 cib: info: > > cib_process_request: Completed cib_replace operation for section > > configuration: OK (rc=0, origin=vttwinformlrz1/cibadmin/2, > > version=0.18640.0) > > Sep 25 23:50:08 [4501] vttwinformlrz1 crmd: info: > > abort_transition_graph: Transition aborted by deletion of > > nvpair[@id='cmecp2dv-meta_attributes-target-role']: Configuration > > change | cib=0.18640.0 source=te_update_diff:456 > > path=/cib/configuration/resources/group[@id='cmecp2']/primitive[@id > > ='cmecp2dv']/meta_attributes[@id='cmecp2dv- > > meta_attributes']/nvpair[@id='cmecp2dv-meta_attributes-target- > > role'] complete=false Then the original resource's target-role is deleted again. This is what to investigate -- what is deleting the target-role. It could be something calling "pcs resource enable". > > Sep 25 23:50:08 [4495] vttwinformlrz1 stonith-ng: info: > > stonith_device_remove: Device 'cmecp2dv' not found (1 active > > devices) > > Sep 25 23:50:08 [4495] vttwinformlrz1 stonith-ng: info: > > update_cib_stonith_devices_v2: Updating device list from the > > cib: create primitive[@id='cftmd1s1'] > > Sep 25 23:50:08 [4495] vttwinformlrz1 stonith-ng: info: > > cib_devices_update: Updating devices to version 0.18640.0 > > Sep 25 23:50:08 [4495] vttwinformlrz1 stonith-ng: info: > > cib_device_update: Device fence1 has been disabled on > > vttwinformlrz1: score=-INFINITY > > Sep 25 23:50:08 [4492] vttwinformlrz1 cib: info: > > cib_process_request: Forwarding cib_replace operation for section > > configuration to all (origin=local/cibadmin/2) > > Sep 25 23:50:08 [4492] vttwinformlrz1 cib: info: > > crm_compress_string: Compressed 1106885 bytes into 72394 (ratio > > 15:1) in 171ms > > Sep 25 23:50:08 [4497] vttwinformlrz1 lrmd: info: > > log_finished: finished - rsc:ctpinetibfh action:stop > > call_id:6205 pid:10248 exit-code:0 exec-time:1234ms queue-time:0ms > > Sep 25 23:50:08 [4501] vttwinformlrz1 crmd: notice: > > process_lrm_event: Result of stop operation for ctpinetibfh on > > vttwinformlrz1: 0 (ok) | call=6205 key=ctpinetibfh_stop_0 > > confirmed=true cib-update=24192 > > Sep 25 23:50:08 [4492] vttwinformlrz1 cib: info: > > cib_perform_op: Diff: --- 0.18640.0 2 > > Sep 25 23:50:08 [4492] vttwinformlrz1 cib: info: > > cib_perform_op: Diff: +++ 0.18640.1 (null) > > Sep 25 23:50:08 [4492] vttwinformlrz1 cib: info: > > cib_perform_op: + /cib: @num_updates=1 > > Sep 25 23:50:08 [4492] vttwinformlrz1 cib: info: > > cib_perform_op: + > > /cib/status/node_state[@id='2']/lrm[@id='2']/lrm_resources/lrm_reso > > urce[@id='cftmd1s1']/lrm_rsc_op[@id='cftmd1s1_last_0']: > > @operation_key=cftmd1s1_start_0, @operation=start, @transition- > > key=513:17221:0:d060d698-76d6-4a95-8f54-b0cd908aa999, @transition- > > magic=0:0;513:17221:0:d060d698-76d6-4a95-8f54-b0cd908aa999, @call- > > id=10460, @last-run=1506376207, @last-rc-change=1506376207, @exec- > > time=1190 > > Sep 25 23:50:08 [4492] vttwinformlrz1 cib: info: > > cib_process_request: Completed cib_modify operation for section > > status: OK (rc=0, origin=vttwinformlrz2/crmd/9922, > > version=0.18640.1) > > Sep 25 23:50:08 [4501] vttwinformlrz1 crmd: info: > > match_graph_event: Action cftmd1s1_start_0 (513) confirmed on > > vttwinformlrz2 (rc=0) > > Sep 25 23:50:08 [4495] vttwinformlrz1 stonith-ng: info: > > update_cib_stonith_devices_v2: Updating device list from the > > cib: modify lrm_rsc_op[@id='cftmd1s1_last_0'] > > Sep 25 23:50:08 [4495] vttwinformlrz1 stonith-ng: info: > > cib_devices_update: Updating devices to version 0.18640.1 > > > > > > > > > > > > > > > > I noticed the service is still running because the EOD taks fails > > and the next day the program is still running and it should not. > > The forward lines in the log do not show anything regarding this > > resource, it only shows it is 'Started' > > Well we see that the target-role is properly set to 'Stopped' in the > CIB. > As we don't see the pengine running triggered by this config change I > would > suppose that the logs you have attached are not from the DC where the > pengine would be running. (e.g. 'Current DC' shown by crm_mon) > > > Yes it was the DC becasuse the log in the other node is even shorter. > I have noticed the line: "Sep 25 23:50:08 [4495] vttwinformlrz1 > stonith-ng: info: stonith_device_remove: Device 'ctpinetfh' > not found (1 active devices)" > > The resource is not a stonith device. Why try to remove it? > > I have set up some logs in the resource agent to trace if the stop > method is called. > > > Regards, > Klaus > > > Regards > > Roberto -- Ken Gaillot <kgail...@redhat.com> _______________________________________________ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org