Re: [ClusterLabs] Antw: Re: Antw: Re: Early VM resource migration

2016-01-08 Thread Klechomir

Here is what Pacemaker says right after node1 comes back from standby:

Dec 16 16:11:41 [4512] CLUSTER-2 pengine: debug: native_assign_node: All nodes for resource VM_VM1 are unavailable, unclean or shutting down (CLUSTER-1: 1, -100)

Dec 16 16:11:41 [4512] CLUSTER-2 pengine: debug: native_assign_node: Could not allocate a node for VM_VM1

Dec 16 16:11:41 [4512] CLUSTER-2 pengine: debug: native_assign_node: Processing VM_VM1_monitor_1

Dec 16 16:11:41 [4512] CLUSTER-2 pengine: info: native_color: Resource VM_VM1 cannot run anywhere

VM_VM1 is stopped immediately as soon as node1 reappears, and stays
down until the AA resource it is ordered/colocated with comes up on node1.


The curious part is that in the opposite case (node2 coming back from
standby), the failback works fine.


Any ideas?
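
For reference, the allocation scores behind these pengine decisions can be inspected with crm_simulate, which ships with Pacemaker. A minimal sketch (the pe-input filename is only a placeholder; real ones are referenced in the pengine logs):

    # Show allocation scores against the live cluster
    crm_simulate --live-check --show-scores

    # Or replay a saved policy-engine input file
    crm_simulate --show-scores --xml-file /var/lib/pacemaker/pengine/pe-input-0.bz2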

Regards,

On 17.12.2015 14:51:21 Ulrich Windl wrote:

>>> Klechomir wrote on 17.12.2015 at 14:16 in message

<2102747.TPh6pTdk8c@bobo>:
> Hi Ulrich,
> This is only the part of the config that concerns the problem.
> Even with dummy resources the behaviour is identical, so I don't think
> the dlm/clvmd resource config will help solve the problem.

You could send logs with the actual startup sequence then.

> Regards,
> Klecho
>
> On 17.12.2015 08:19:43 Ulrich Windl wrote:
>> >>> Klechomir wrote on 16.12.2015 at 17:30 in message
>>
>> <5671918e.40...@gmail.com>:
>> > On 16.12.2015 17:52, Ken Gaillot wrote:
>> >> On 12/16/2015 02:09 AM, Klechomir wrote:
>> >>> Hi list,
>> >>> I have a cluster with VM resources on a cloned active-active storage.
>> >>>
>> >>> The VirtualDomain resource migrates properly during failover (node
>> >>> standby), but tries to migrate back too early during failback,
>> >>> ignoring the "order" constraint that tells it to start only once the
>> >>> cloned storage is up. This causes an unnecessary VM restart.
>> >>>
>> >>> Is there any way to make it wait until its storage resource is up?
>> >>
>> >> Hi Klecho,
>> >>
>> >> If you have an order constraint, the cluster will not try to start the
>> >> VM until the storage resource agent returns success for its start. If
>> >> the storage isn't fully up at that point, then the agent is faulty, and
>> >> should be modified to wait until the storage is truly available before
>> >> returning success.
>> >>
>> >> If you post all your constraints, I can look for anything that might
>> >> affect the behavior.
>> >
>> > Thanks for the reply, Ken
>> >
>> > It seems to me that constraints for cloned resources act a bit
>> > differently.
>> >
>> > Here is my config:
>> >
>> > primitive p_AA_Filesystem_CDrive1 ocf:heartbeat:Filesystem \
>> >     params device="/dev/CSD_CDrive1/AA_CDrive1" \
>> >     directory="/volumes/AA_CDrive1" fstype="ocfs2" options="rw,noatime"
>> > primitive VM_VM1 ocf:heartbeat:VirtualDomain \
>> >     params config="/volumes/AA_CDrive1/VM_VM1/VM1.xml" \
>> >     hypervisor="qemu:///system" migration_transport="tcp" \
>> >     meta allow-migrate="true" target-role="Started"
>> > clone AA_Filesystem_CDrive1 p_AA_Filesystem_CDrive1 \
>> >     meta interleave="true" resource-stickiness="0" target-role="Started"
>> > order VM_VM1_after_AA_Filesystem_CDrive1 inf: AA_Filesystem_CDrive1 VM_VM1
>> >
>> > Every time a node comes back from standby, the VM tries to live
>> > migrate to it long before the filesystem is up.
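
A side note on the config quoted above: an "order" constraint only sequences actions; on its own it does not keep VM_VM1 on a node where the clone instance is actually running. If a colocation constraint is not already in place elsewhere, adding one might be worth trying; a sketch in crm shell syntax:

    colocation VM_VM1_with_AA_Filesystem_CDrive1 inf: VM_VM1 AA_Filesystem_CDrive1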
>>
>> Hi!
>>
>> To me your config looks rather incomplete: What about DLM, O2CB, cLVM,
>> etc.?

Re: [ClusterLabs] Antw: Re: Antw: Re: Early VM resource migration

2016-01-08 Thread Ken Gaillot
On 01/08/2016 07:03 AM, Klechomir wrote:
> Here is what Pacemaker says right after node1 comes back from standby:
> 
> Dec 16 16:11:41 [4512] CLUSTER-2 pengine: debug: native_assign_node: All
> nodes for resource VM_VM1 are unavailable, unclean or shutting down
> (CLUSTER-1: 1, -100)
>
> Dec 16 16:11:41 [4512] CLUSTER-2 pengine: debug: native_assign_node:
> Could not allocate a node for VM_VM1
>
> Dec 16 16:11:41 [4512] CLUSTER-2 pengine: debug: native_assign_node:
> Processing VM_VM1_monitor_1
>
> Dec 16 16:11:41 [4512] CLUSTER-2 pengine: info: native_color: Resource
> VM_VM1 cannot run anywhere
>
> VM_VM1 is stopped immediately as soon as node1 reappears, and stays
> down until the AA resource it is ordered/colocated with comes up on node1.
> 
> The curious part is that in the opposite case (node2 coming back from
> standby), the failback works fine.
> 
> Any ideas?

This might be a bug. Can you open a report at
http://bugs.clusterlabs.org/ and attach your full CIB and logs from all
nodes both when the issue occurs and when node2 handles it correctly?
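
For gathering that, something along these lines should work (a sketch; adjust the time window to when the incident happened):

    # Save the full CIB from any node
    cibadmin --query > cib.xml

    # Bundle logs and cluster state from all nodes around the failback
    crm_report --from "2015-12-16 16:00:00" --to "2015-12-16 16:30:00" failback-report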

> Regards,
> 
> On 17.12.2015 14:51:21 Ulrich Windl wrote:
>> >>> Klechomir wrote on 17.12.2015 at 14:16 in message
>>
>> <2102747.TPh6pTdk8c@bobo>:
>> > Hi Ulrich,
>> > This is only the part of the config that concerns the problem.
>> > Even with dummy resources the behaviour is identical, so I don't think
>> > the dlm/clvmd resource config will help solve the problem.
>>
>> You could send logs with the actual startup sequence then.
>>
>> > Regards,
>> > Klecho
>> >
>> > On 17.12.2015 08:19:43 Ulrich Windl wrote:
>> >> >>> Klechomir wrote on 16.12.2015 at 17:30 in message
>> >>
>> >> <5671918e.40...@gmail.com>:
>> >> > On 16.12.2015 17:52, Ken Gaillot wrote:
>> >> >> On 12/16/2015 02:09 AM, Klechomir wrote:
>> >> >>> Hi list,
>> >> >>> I have a cluster with VM resources on a cloned active-active storage.
>> >> >>>
>> >> >>> The VirtualDomain resource migrates properly during failover (node
>> >> >>> standby), but tries to migrate back too early during failback,
>> >> >>> ignoring the "order" constraint that tells it to start only once the
>> >> >>> cloned storage is up. This causes an unnecessary VM restart.
>> >> >>>
>> >> >>> Is there any way to make it wait until its storage resource is up?
>> >> >>
>> >> >> Hi Klecho,
>> >> >>
>> >> >> If you have an order constraint, the cluster will not try to start the
>> >> >> VM until the storage resource agent returns success for its start. If
>> >> >> the storage isn't fully up at that point, then the agent is faulty, and
>> >> >> should be modified to wait until the storage is truly available before
>> >> >> returning success.
>> >> >>
>> >> >> If you post all your constraints, I can look for anything that might
>> >> >> affect the behavior.
>> >> >
>> >> > Thanks for the reply, Ken
>> >> >
>> >> > It seems to me that constraints for cloned resources act a bit
>> >> > differently.
>> >> >
>> >> > Here is my config:
>> >> >
>> >> > primitive p_AA_Filesystem_CDrive1 ocf:heartbeat:Filesystem \
>> >> >     params device="/dev/CSD_CDrive1/AA_CDrive1" \
>> >> >     directory="/volumes/AA_CDrive1" fstype="ocfs2" options="rw,noatime"
>> >> > primitive VM_VM1 ocf:heartbeat:VirtualDomain \
>> >> >     params config="/volumes/AA_CDrive1/VM_VM1/VM1.xml" \
>> >> >     hypervisor="qemu:///system" migration_transport="tcp" \
>> >> >     meta allow-migrate="true" target-role="Started"
>> >> > clone AA_Filesystem_CDrive1 p_AA_Filesystem_CDrive1 \
>> >> >     meta interleave="true" resource-stickiness="0" target-role="Started"
>> >> > order VM_VM1_after_AA_Filesystem_CDrive1 inf: AA_Filesystem_CDrive1 VM_VM1
>> >> >
>> >> > Every time a node comes back from standby, the VM tries to live
>> >> > migrate to it long before the filesystem is up.
>> >>
>> >> Hi!
>> >>
>> >> To me your config looks rather incomplete: What about DLM, O2CB, cLVM,
>> >> etc.?
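
As an aside on the point above about agents returning success only once storage is truly available: a minimal sketch of such a wait loop in a resource agent's start action. The mount point and the 60-second timeout are only illustrative, do_raw_start is a hypothetical helper, and the OCF_* codes come from the usual ocf-shellfuncs:

    start() {
        do_raw_start    # hypothetical: whatever actually brings the storage up

        # Poll until the filesystem is really mounted and writable
        # before reporting success back to Pacemaker.
        i=0
        while [ $i -lt 60 ]; do
            if mountpoint -q /volumes/AA_CDrive1 \
               && touch /volumes/AA_CDrive1/.probe 2>/dev/null; then
                rm -f /volumes/AA_CDrive1/.probe
                return $OCF_SUCCESS
            fi
            sleep 1
            i=$((i + 1))
        done
        return $OCF_ERR_GENERIC    # storage never became available
    }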


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org

