On Fri, 2018-01-19 at 00:37 +0100, Jehan-Guillaume de Rorthais wrote:
> On Thu, 18 Jan 2018 10:54:33 -0600
> Ken Gaillot <[email protected]> wrote:
>
> > On Thu, 2018-01-18 at 16:15 +0100, Jehan-Guillaume de Rorthais wrote:
> > > Hi list,
> > >
> > > I was explaining how to use crm_simulate to a colleague when he
> > > pointed to me a non expected and buggy output.
> > >
> > > Here are some simple steps to reproduce:
> > >
> > >   $ pcs cluster setup --name usecase srv1 srv2 srv3
> > >   $ pcs cluster start --all
> > >   $ pcs property set stonith-enabled=false
> > >   $ pcs resource create dummy1 ocf:heartbeat:Dummy \
> > >       state=/tmp/dummy1.state \
> > >       op monitor interval=10s \
> > >       meta migration-threshold=3 resource-stickiness=1
> > >
> > > Now, we are injecting 2 monitor soft errors, triggering 2 local
> > > recovery (stop/start):
> > >
> > >   $ crm_simulate -S -L -i dummy1_monitor_10@srv1=1 -O /tmp/step1.xml
> > >   $ crm_simulate -S -x /tmp/step1.xml -i dummy1_monitor_10@srv1=1 -O /tmp/step2.xml
> > >
> > > So far so good. A third soft error on monitor push dummy1 out of
> > > srv1, this was expected. However, the final status of the cluster
> > > shows dummy1 as started on both srv1 and srv2!
> > >
> > >   $ crm_simulate -S -x /tmp/step2.xml -i dummy1_monitor_10@srv1=1 -O /tmp/step3.xml
> > >
> > >   Current cluster status:
> > >   Online: [ srv1 srv2 srv3 ]
> > >
> > >    dummy1 (ocf::heartbeat:Dummy): Started srv1
> > >
> > >   Performing requested modifications
> > >    + Injecting dummy1_monitor_10@srv1=1 into the configuration
> > >    + Injecting attribute fail-count-dummy1=value++ into /node_state '1'
> > >    + Injecting attribute last-failure-dummy1=1516287891 into /node_state '1'
> > >
> > >   Transition Summary:
> > >    * Recover dummy1 ( srv1 -> srv2 )
> > >
> > >   Executing cluster transition:
> > >    * Cluster action: clear_failcount for dummy1 on srv1
> > >    * Resource action: dummy1 stop on srv1
> > >    * Resource action: dummy1 cancel=10 on srv1
> > >    * Pseudo action: all_stopped
> > >    * Resource action: dummy1 start on srv2
> > >    * Resource action: dummy1 monitor=10000 on srv2
> > >
> > >   Revised cluster status:
> > >   Online: [ srv1 srv2 srv3 ]
> > >
> > >    dummy1 (ocf::heartbeat:Dummy): Started[ srv1 srv2 ]
> > >
> > > I suppose this is a bug from crm_simulate? Why is it considering
> > > dummy1 is started on srv1 when the transition execution stopped it
> > > on srv1?
> >
> > It's definitely a bug, either in crm_simulate or the policy engine
> > itself. Can you attach step2.xml?
>
> Sure, please, find in attachment step2.xml.

I can reproduce the issue with 1.1.16 but not 1.1.17 or later, so
whatever it was, it got fixed.
-- 
Ken Gaillot <[email protected]>
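
For anyone wanting to check whether their own Pacemaker version still shows the symptom, here is a rough sketch of a filter for the crm_simulate output. The function name find_multi_started is made up for illustration and is not part of Pacemaker. It assumes the text layout shown in the report above, where a resource active on several nodes is printed as "Started[ node1 node2 ]" under "Revised cluster status:". Other versions may format this differently.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Pacemaker): read crm_simulate text
# output on stdin and print each resource that the "Revised cluster status"
# section lists as started on more than one node.
# Assumes the "Started[ nodeA nodeB ]" layout quoted in this thread.
find_multi_started() {
    awk '
        /^Revised cluster status:/ { revised = 1; next }
        revised && /Started\[/ {
            line = $0
            # The node list may be wrapped; join lines until the closing "]".
            while (line !~ /\]/ && (getline more) > 0) line = line " " more
            # Resource name: first word of the line.
            name = line
            sub(/^[ \t]+/, "", name)
            sub(/[ \t(].*$/, "", name)
            # Node list: text between "Started[" and "]".
            nodes = line
            sub(/^.*Started\[[ \t]*/, "", nodes)
            sub(/[ \t]*\].*$/, "", nodes)
            gsub(/[ \t]+/, " ", nodes)
            if (split(nodes, n) > 1) print name ": " nodes
        }
    '
}
```

With the function defined in the current shell, the third step from the report could be piped through it, e.g. "crm_simulate -S -x /tmp/step2.xml -i dummy1_monitor_10@srv1=1 | find_multi_started". No output would mean the revised status does not list any resource on several nodes.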
