On Thu, 2018-01-18 at 16:15 +0100, Jehan-Guillaume de Rorthais wrote:
> Hi list,
> 
> I was explaining how to use crm_simulate to a colleague when he
> pointed out to me an unexpected and buggy output.
> 
> Here are some simple steps to reproduce:
> 
>   $ pcs cluster setup --name usecase srv1 srv2 srv3
>   $ pcs cluster start --all
>   $ pcs property set stonith-enabled=false
>   $ pcs resource create dummy1 ocf:heartbeat:Dummy \
>     state=/tmp/dummy1.state \
>     op monitor interval=10s \
>     meta migration-threshold=3 resource-stickiness=1
> 
> Now, we inject 2 monitor soft errors, triggering 2 local recoveries
> (stop/start):
> 
>   $ crm_simulate -S -L -i dummy1_monitor_10@srv1=1 -O /tmp/step1.xml
>   $ crm_simulate -S -x /tmp/step1.xml -i dummy1_monitor_10@srv1=1 -O /tmp/step2.xml
> 
> So far so good. A third soft error on monitor pushes dummy1 out of
> srv1, as expected. However, the final status of the cluster shows
> dummy1 as started on both srv1 and srv2!
> 
>   $ crm_simulate -S -x /tmp/step2.xml -i dummy1_monitor_10@srv1=1 -O /tmp/step3.xml
> 
>   Current cluster status:
>   Online: [ srv1 srv2 srv3 ]
> 
>    dummy1 (ocf::heartbeat:Dummy): Started srv1
> 
>   Performing requested modifications
>    + Injecting dummy1_monitor_10@srv1=1 into the configuration
>    + Injecting attribute fail-count-dummy1=value++ into /node_state '1'
>    + Injecting attribute last-failure-dummy1=1516287891 into /node_state '1'
> 
>   Transition Summary:
>    * Recover dummy1 ( srv1 -> srv2 )
> 
>   Executing cluster transition:
>    * Cluster action: clear_failcount for dummy1 on srv1
>    * Resource action: dummy1 stop on srv1
>    * Resource action: dummy1 cancel=10 on srv1
>    * Pseudo action: all_stopped
>    * Resource action: dummy1 start on srv2
>    * Resource action: dummy1 monitor=10000 on srv2
> 
>   Revised cluster status:
>   Online: [ srv1 srv2 srv3 ]
> 
>    dummy1 (ocf::heartbeat:Dummy): Started[ srv1 srv2 ]
> 
> I suppose this is a bug in crm_simulate? Why does it consider dummy1
> to be started on srv1 when the transition execution stopped it on
> srv1?

It's definitely a bug, either in crm_simulate or the policy engine
itself. Can you attach step2.xml?

> 
> Taking the step3.xml output of this weird result forces the cluster
> to stop dummy1 everywhere and start it on srv2 only:
> 
>   $ crm_simulate -S -x /tmp/step3.xml
> 
>   Current cluster status:
>   Online: [ srv1 srv2 srv3 ]
> 
>    dummy1 (ocf::heartbeat:Dummy): Started[ srv1 srv2 ]
> 
>   Transition Summary:
>    * Move dummy1 ( srv1 -> srv2 )
> 
>   Executing cluster transition:
>    * Resource action: dummy1 stop on srv2
>    * Resource action: dummy1 stop on srv1
>    * Pseudo action: all_stopped
>    * Resource action: dummy1 start on srv2
>    * Resource action: dummy1 monitor=10000 on srv2
> 
>   Revised cluster status:
>   Online: [ srv1 srv2 srv3 ]
> 
>    dummy1 (ocf::heartbeat:Dummy): Started srv2
> 
> Thoughts?
> 
> _______________________________________________
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
-- 
Ken Gaillot <kgail...@redhat.com>