Re: [ClusterLabs] Antw: Re: Antw: Re: Early VM resource migration
Here is what pacemaker says right after node1 comes back from standby:

Dec 16 16:11:41 [4512] CLUSTER-2 pengine: debug: native_assign_node: All nodes for resource VM_VM1 are unavailable, unclean or shutting down (CLUSTER-1: 1, -100)
Dec 16 16:11:41 [4512] CLUSTER-2 pengine: debug: native_assign_node: Could not allocate a node for VM_VM1
Dec 16 16:11:41 [4512] CLUSTER-2 pengine: debug: native_assign_node: Processing VM_VM1_monitor_1
Dec 16 16:11:41 [4512] CLUSTER-2 pengine: info: native_color: Resource VM_VM1 cannot run anywhere

VM_VM1 gets stopped immediately as soon as node1 reappears, and stays down until its "order/colocation AA resource" comes up on node1.

The curious part is that in the opposite case (node2 coming back from standby) the failback is ok.

Any ideas?

Regards,

On 17.12.2015 14:51:21 Ulrich Windl wrote:
> >>> Klechomir wrote on 17.12.2015 at 14:16 in message <2102747.TPh6pTdk8c@bobo>:
> > Hi Ulrich,
> > This is only the part of the config which concerns the problem.
> > Even with dummy resources the behaviour is identical, so I don't think
> > that the dlm/clvmd resource config will help solve the problem.
>
> You could send logs with the actual startup sequence then.
>
> > Regards,
> > Klecho
> >
> > On 17.12.2015 08:19:43 Ulrich Windl wrote:
> >> >>> Klechomir wrote on 16.12.2015 at 17:30 in message <5671918e.40...@gmail.com>:
> >> > On 16.12.2015 17:52, Ken Gaillot wrote:
> >> >> On 12/16/2015 02:09 AM, Klechomir wrote:
> >> >>> Hi list,
> >> >>> I have a cluster with VM resources on a cloned active-active storage.
> >> >>>
> >> >>> The VirtualDomain resource migrates properly during failover (node standby),
> >> >>> but tries to migrate back too early during failback, ignoring the
> >> >>> "order" constraint telling it to start when the cloned storage is up.
> >> >>> This causes an unnecessary VM restart.
> >> >>>
> >> >>> Is there any way to make it wait until its storage resource is up?
> >> >>
> >> >> Hi Klecho,
> >> >>
> >> >> If you have an order constraint, the cluster will not try to start the
> >> >> VM until the storage resource agent returns success for its start. If
> >> >> the storage isn't fully up at that point, then the agent is faulty and
> >> >> should be modified to wait until the storage is truly available before
> >> >> returning success.
> >> >>
> >> >> If you post all your constraints, I can look for anything that might
> >> >> affect the behavior.
> >> >
> >> > Thanks for the reply, Ken
> >> >
> >> > It seems to me that the constraints for cloned resources act a bit
> >> > differently.
> >> >
> >> > Here is my config:
> >> >
> >> > primitive p_AA_Filesystem_CDrive1 ocf:heartbeat:Filesystem \
> >> >     params device="/dev/CSD_CDrive1/AA_CDrive1" directory="/volumes/AA_CDrive1" fstype="ocfs2" options="rw,noatime"
> >> > primitive VM_VM1 ocf:heartbeat:VirtualDomain \
> >> >     params config="/volumes/AA_CDrive1/VM_VM1/VM1.xml" hypervisor="qemu:///system" migration_transport="tcp" \
> >> >     meta allow-migrate="true" target-role="Started"
> >> > clone AA_Filesystem_CDrive1 p_AA_Filesystem_CDrive1 \
> >> >     meta interleave="true" resource-stickiness="0" target-role="Started"
> >> > order VM_VM1_after_AA_Filesystem_CDrive1 inf: AA_Filesystem_CDrive1 VM_VM1
> >> >
> >> > Every time a node comes back from standby, the VM tries to live
> >> > migrate to it long before the filesystem is up.
> >>
> >> Hi!
> >> To me your config looks rather incomplete: What about DLM, O2CB, cLVM,
> >> etc.?
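One point worth noting about the config quoted above: an order constraint only controls the sequence of actions, not placement, so with allow-migrate="true" and no colocation the VM can still be assigned to a node whose filesystem clone instance is not active yet. If a colocation is not already part of the full configuration (the poster's mention of an "order/colocation AA resource" suggests it may be), the usual companion looks roughly like the following. This is only a sketch in crm shell syntax, reusing the resource names from the quoted config; the constraint id is made up and this has not been tested against this particular failback race:

    # hypothetical addition: keep VM_VM1 only on nodes where the
    # AA_Filesystem_CDrive1 clone instance is active, alongside the
    # existing start ordering
    colocation VM_VM1_with_AA_Filesystem_CDrive1 inf: VM_VM1 AA_Filesystem_CDrive1

With both the order and the colocation in place, the policy engine should not schedule the VM (or a migration of it) onto a node until the filesystem clone is started there, although behaviour around clones and live migration can differ between Pacemaker versions.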
Re: [ClusterLabs] Antw: Re: Antw: Re: Early VM resource migration
On 01/08/2016 07:03 AM, Klechomir wrote:
> Here is what pacemaker says right after node1 comes back from standby:
>
> Dec 16 16:11:41 [4512] CLUSTER-2 pengine: debug: native_assign_node: All nodes for resource VM_VM1 are unavailable, unclean or shutting down (CLUSTER-1: 1, -100)
> Dec 16 16:11:41 [4512] CLUSTER-2 pengine: debug: native_assign_node: Could not allocate a node for VM_VM1
> Dec 16 16:11:41 [4512] CLUSTER-2 pengine: debug: native_assign_node: Processing VM_VM1_monitor_1
> Dec 16 16:11:41 [4512] CLUSTER-2 pengine: info: native_color: Resource VM_VM1 cannot run anywhere
>
> VM_VM1 gets immediately stopped as soon as node1 reappears and stays
> down until its "order/colocation AA resource" comes up on node1.
>
> The curious part is that in the opposite case (node2 comes back from
> standby), the failback is ok.
>
> Any ideas?

This might be a bug. Can you open a report at http://bugs.clusterlabs.org/
and attach your full CIB and logs from all nodes, both when the issue
occurs and when node2 handles it correctly?

> Regards,
>
> On 17.12.2015 14:51:21 Ulrich Windl wrote:
>> >>> Klechomir wrote on 17.12.2015 at 14:16 in message <2102747.TPh6pTdk8c@bobo>:
>> > Hi Ulrich,
>> > This is only the part of the config which concerns the problem.
>> > Even with dummy resources the behaviour is identical, so I don't think
>> > that the dlm/clvmd resource config will help solve the problem.
>>
>> You could send logs with the actual startup sequence then.
>>
>> > Regards,
>> > Klecho
>> >
>> > On 17.12.2015 08:19:43 Ulrich Windl wrote:
>> >> >>> Klechomir wrote on 16.12.2015 at 17:30 in message <5671918e.40...@gmail.com>:
>> >> > On 16.12.2015 17:52, Ken Gaillot wrote:
>> >> >> On 12/16/2015 02:09 AM, Klechomir wrote:
>> >> >>> Hi list,
>> >> >>> I have a cluster with VM resources on a cloned active-active storage.
>> >> >>>
>> >> >>> The VirtualDomain resource migrates properly during failover (node standby),
>> >> >>> but tries to migrate back too early during failback, ignoring the
>> >> >>> "order" constraint telling it to start when the cloned storage is up.
>> >> >>> This causes an unnecessary VM restart.
>> >> >>>
>> >> >>> Is there any way to make it wait until its storage resource is up?
>> >> >>
>> >> >> Hi Klecho,
>> >> >>
>> >> >> If you have an order constraint, the cluster will not try to start the
>> >> >> VM until the storage resource agent returns success for its start. If
>> >> >> the storage isn't fully up at that point, then the agent is faulty and
>> >> >> should be modified to wait until the storage is truly available before
>> >> >> returning success.
>> >> >>
>> >> >> If you post all your constraints, I can look for anything that might
>> >> >> affect the behavior.
>> >> >
>> >> > Thanks for the reply, Ken
>> >> >
>> >> > It seems to me that the constraints for cloned resources act a bit
>> >> > differently.
>> >> >
>> >> > Here is my config:
>> >> >
>> >> > primitive p_AA_Filesystem_CDrive1 ocf:heartbeat:Filesystem \
>> >> >     params device="/dev/CSD_CDrive1/AA_CDrive1" directory="/volumes/AA_CDrive1" fstype="ocfs2" options="rw,noatime"
>> >> > primitive VM_VM1 ocf:heartbeat:VirtualDomain \
>> >> >     params config="/volumes/AA_CDrive1/VM_VM1/VM1.xml" hypervisor="qemu:///system" migration_transport="tcp" \
>> >> >     meta allow-migrate="true" target-role="Started"
>> >> > clone AA_Filesystem_CDrive1 p_AA_Filesystem_CDrive1 \
>> >> >     meta interleave="true" resource-stickiness="0" target-role="Started"
>> >> > order VM_VM1_after_AA_Filesystem_CDrive1 inf: AA_Filesystem_CDrive1 VM_VM1
>> >> >
>> >> > Every time when a node comes back from standby, the VM tries to live
>> >> > migrate to it long before the filesystem is up.
>> >>
>> >> Hi!
>> >>
>> >> To me your config looks rather incomplete: What about DLM, O2CB, cLVM,
>> >> etc.?
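To put together the data Ken asks for (full CIB plus logs from all nodes), commands along these lines are the usual route. This is only a sketch: the output file name, report name, and time window are placeholders to adapt to when the failback actually happened, and exact options can vary between Pacemaker versions:

    # dump the full current CIB to a file
    cibadmin --query > cib.xml

    # collect logs and cluster state from all nodes for the window
    # around the incident into a single report tarball
    crm_report --from "2015-12-16 16:00:00" --to "2015-12-16 16:30:00" early-migration-failback

Attaching the resulting cib.xml and the crm_report tarball to the bug report gives the developers both the configuration and the transition logs from each node.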