On 11/10/2016 19:37, Ken Gaillot wrote:
> On 11/09/2016 12:27 PM, CART Andreas wrote:
>> I again started with all resources located at ventsi-clst1 and issued a
>> 'pcs resource move DRBD_global_clst' (the resource collocated next
>> to the DRBDClone).
>> 
>> With that I end up with all primitive resources stopped and the
>> DRBDClone resource still being master at ventsi-clst1.
>> 
>> Transition Summary:
>> * Start   IPaddrNFS    (ventsi-clst2-sync)
>> * Start   NFSServer    (ventsi-clst2-sync)
>> * Demote  DRBD:0       (Master -> Slave ventsi-clst1-sync)    <=== this demote never happens
>> * Promote DRBD:1       (Slave -> Master ventsi-clst2-sync)
>> * Start   DRBD_global_clst     (ventsi-clst2-sync)
>> * Start   NFS_global_clst      (ventsi-clst1-sync)
>> * Start   BIND_global_clst     (ventsi-clst2-sync)
>
> Strangely, this sequence appears to be ignoring the constraint "start
> DRBD_global_clst then start IPaddrNFS".
>
> Can you open a bug report at http://bugs.clusterlabs.org/ and attach the
> CIB (or pe-input file) in use at this time?
>
> For testing purposes, you may want to try replacing the "start
> DRBD_global_clst then start IPaddrNFS" constraint with "promote
> DRBDClone then start IPaddrNFS" to see whether that makes a difference.

I reproduced the problem in a test environment and hopefully can now provide 
some more information.

The problem seems to be the same: the resources are stopped, but the master
resource is not demoted (in time).
But this time I noticed that the cluster cleans up the problem by itself after
15 minutes.
Again the original transition contained too few actions. (This time not only
the demote is missing, but also the stopping of other resources.)
(Additionally, I had some files open on the mounted filesystems, so the unmount
did not succeed immediately; it took some time to disconnect.)

Exactly the same behavior occurred on the attempt to move the NFS server back
(without any open files).
This time I tried 'pcs resource cleanup' afterwards, which resolved things to
the correct state immediately.
(Just to note: in contrast, if I perform a 'pcs resource clear DRBD_global_clst'
in the problematic interim state, everything returns to the original state,
i.e. as if no 'move' command had been applied.)

Furthermore, I tried to add order constraints for the complete "stop" chain,
but unfortunately this didn't help either.

In another attempt I added colocation constraints to make every other
individual resource depend on the master role of the DRBD clone, again without
any change in behavior.
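
(For reference, a rough sketch of what such colocation constraints could look
like with pcs. This is a dry run that only prints the commands. The resource
names are taken from the transition summary quoted above; the helper name
colocation_cmds, the INFINITY score and the exact 'with master' syntax are
assumptions that depend on the pcs version, not necessarily the commands that
were actually used.)

```shell
#!/bin/sh
# Dry run: prints one pcs colocation command per resource, applies nothing.
# Assumption: pcs 0.9-style syntax ("with master <clone>"); newer pcs
# versions may expect a different role keyword.
MASTER=DRBDClone
RESOURCES="IPaddrNFS NFSServer DRBD_global_clst NFS_global_clst BIND_global_clst"

# Hypothetical helper: emit the constraint command for every resource.
colocation_cmds() {
    for res in $RESOURCES; do
        echo "pcs constraint colocation add $res with master $MASTER INFINITY"
    done
}

colocation_cmds
```

After reviewing the printed commands, they could be applied by piping the
output to sh on one cluster node.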

All resources move immediately and successfully if I delete the two filesystem
resources on top of the NFS server and then move the NFS server.
(But the DRBD master is still not demoted if I try to move the filesystem on
top of it, i.e. the one below the IP address and the NFS server.)

If I try to move either of the two filesystem resources on top of the NFS
server, only this filesystem is stopped but no other resource.
Even stranger, in this case 'crm_simulate -Ls' does not show any missing
actions in the transition summary. This state does not resolve to the intended
state even after 15 minutes.


So I have finally filed a bug for this behavior: "Bug 5305 - part of the
resource chain not being considered (in time)"

Kind regards
Andi


_______________________________________________
Users mailing list: [email protected]
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
