Re: [ClusterLabs] Antw: Re: Antw: Unexpected Resource movement after failover

2016-10-24 Thread Vladislav Bogdanov

24.10.2016 14:22, Nikhil Utane wrote:

I had set resource utilization to 1. Even then it scheduled 2 resources.
Doesn't it honor utilization resources if it doesn't find a free node?


To make utilization work you need to set both:
* node overall capacity (per-node utilization attribute)
* capacity usage by a resource (per-resource utilization attribute)
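
For illustration, a minimal sketch of both pieces in CIB XML (IDs are
arbitrary, the attribute name "capacity" is just an example, and node and
resource names are borrowed from this thread). placement-strategy must also
be changed from its default, otherwise utilization is ignored:

<!-- cluster option: any strategy other than "default" honours utilization -->
<nvpair id="opt-placement-strategy" name="placement-strategy" value="utilization"/>

<!-- per-node overall capacity -->
<node id="1" uname="Redund_CU1_WB30">
  <utilization id="Redund_CU1_WB30-util">
    <nvpair id="Redund_CU1_WB30-util-capacity" name="capacity" value="1"/>
  </utilization>
</node>

<!-- per-resource capacity usage -->
<primitive id="cu_2" class="ocf" provider="redundancy" type="RedundancyRA">
  <utilization id="cu_2-util">
    <nvpair id="cu_2-util-capacity" name="capacity" value="1"/>
  </utilization>
</primitive>

With capacity=1 on every node and every resource, no node is ever asked to
run two of them; a resource that cannot fit anywhere simply stays stopped.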



-Nikhil

On Mon, Oct 24, 2016 at 4:43 PM, Vladislav Bogdanov
<bub...@hoster-ok.com> wrote:

24.10.2016 14:04, Nikhil Utane wrote:

That is what happened here :(.
When 2 nodes went down, two resources got scheduled on a single node.
Isn't there any way to stop this from happening? Colocation constraint
is not helping.


If it is ok to have some instances not running in such outage cases,
you can limit them to 1-per-node with utilization attributes (as was
suggested earlier). Then, when nodes return, resource instances will
return with (and on!) them.



-Regards
Nikhil

On Sat, Oct 22, 2016 at 12:57 AM, Vladislav Bogdanov
<bub...@hoster-ok.com> wrote:

21.10.2016 19:34, Andrei Borzenkov wrote:

14.10.2016 10:39, Vladislav Bogdanov wrote:


use of utilization (balanced strategy) has one caveat: resources are
not moved just because the utilization of one node is lower, when nodes
have the same allocation score for the resource. So, after the
simultaneous outage of two nodes in a 5-node cluster, it may appear
that one node runs two resources and two recovered nodes run nothing.


I call this a feature. Every resource move potentially means service
outage, so it should not happen without explicit action.


In the case I describe, those moves could easily be prevented by using
stickiness (it increases the allocation score on the current node).
The issue is that it is impossible to "re-balance" resources even in
time-frames when stickiness is zero (an over-night maintenance window).



Original 'utilization' strategy only limits resource placement; it is
not considered when choosing a node for a resource.





Re: [ClusterLabs] Antw: Re: Antw: Unexpected Resource movement after failover

2016-10-24 Thread Nikhil Utane
I had set resource utilization to 1. Even then it scheduled 2 resources.
Doesn't it honor utilization resources if it doesn't find a free node?

-Nikhil

On Mon, Oct 24, 2016 at 4:43 PM, Vladislav Bogdanov 
wrote:

> 24.10.2016 14:04, Nikhil Utane wrote:
>
>> That is what happened here :(.
>> When 2 nodes went down, two resources got scheduled on single node.
>> Isn't there any way to stop this from happening. Colocation constraint
>> is not helping.
>>
>
> If it is ok to have some instances not running in such outage cases, you
> can limit them to 1-per-node with utilization attributes (as was suggested
> earlier). Then, when nodes return, resource instances will return with (and
> on!) them.
>
>
>
>> -Regards
>> Nikhil
>>
>> On Sat, Oct 22, 2016 at 12:57 AM, Vladislav Bogdanov
>> <bub...@hoster-ok.com> wrote:
>>
>> 21.10.2016 19:34, Andrei Borzenkov wrote:
>>
>> 14.10.2016 10:39, Vladislav Bogdanov wrote:
>>
>>
>> use of utilization (balanced strategy) has one caveat:
>> resources are
>> not moved just because of utilization of one node is less,
>> when nodes
>> have the same allocation score for the resource. So, after the
>> simultaneous outage of two nodes in a 5-node cluster, it may
>> appear
>> that one node runs two resources and two recovered nodes run
>> nothing.
>>
>>
>> I call this a feature. Every resource move potentially means
>> service
>> outage, so it should not happen without explicit action.
>>
>>
>> In a case I describe that moves could be easily prevented by using
>> stickiness (it increases allocation score on a current node).
>> The issue is that it is impossible to "re-balance" resources in
>> time-frames when stickiness is zero (over-night maintenance window).
>>
>>
>>
>> Original 'utilization' strategy only limits resource
>> placement, it is
>> not considered when choosing a node for a resource.
>>
>>
>>


Re: [ClusterLabs] Antw: Re: Antw: Unexpected Resource movement after failover

2016-10-24 Thread Vladislav Bogdanov

24.10.2016 14:04, Nikhil Utane wrote:

That is what happened here :(.
When 2 nodes went down, two resources got scheduled on a single node.
Isn't there any way to stop this from happening? Colocation constraint
is not helping.


If it is ok to have some instances not running in such outage cases, you 
can limit them to 1-per-node with utilization attributes (as was 
suggested earlier). Then, when nodes return, resource instances will 
return with (and on!) them.





-Regards
Nikhil

On Sat, Oct 22, 2016 at 12:57 AM, Vladislav Bogdanov
<bub...@hoster-ok.com> wrote:

21.10.2016 19:34, Andrei Borzenkov wrote:

14.10.2016 10:39, Vladislav Bogdanov wrote:


use of utilization (balanced strategy) has one caveat:
resources are
not moved just because of utilization of one node is less,
when nodes
have the same allocation score for the resource. So, after the
simultaneous outage of two nodes in a 5-node cluster, it may
appear
that one node runs two resources and two recovered nodes run
nothing.


I call this a feature. Every resource move potentially means service
outage, so it should not happen without explicit action.


In a case I describe that moves could be easily prevented by using
stickiness (it increases allocation score on a current node).
The issue is that it is impossible to "re-balance" resources in
time-frames when stickiness is zero (over-night maintenance window).



Original 'utilization' strategy only limits resource placement; it is
not considered when choosing a node for a resource.





Re: [ClusterLabs] Antw: Re: Antw: Unexpected Resource movement after failover

2016-10-24 Thread Nikhil Utane
That is what happened here :(.
When 2 nodes went down, two resources got scheduled on a single node.
Isn't there any way to stop this from happening? Colocation constraint is
not helping.

-Regards
Nikhil

On Sat, Oct 22, 2016 at 12:57 AM, Vladislav Bogdanov 
wrote:

> 21.10.2016 19:34, Andrei Borzenkov wrote:
>
>> 14.10.2016 10:39, Vladislav Bogdanov wrote:
>>
>>>
>>> use of utilization (balanced strategy) has one caveat: resources are
>>> not moved just because of utilization of one node is less, when nodes
>>> have the same allocation score for the resource. So, after the
>>> simultaneous outage of two nodes in a 5-node cluster, it may appear
>>> that one node runs two resources and two recovered nodes run
>>> nothing.
>>>
>>>
>> I call this a feature. Every resource move potentially means service
>> outage, so it should not happen without explicit action.
>>
>>
> In a case I describe that moves could be easily prevented by using
> stickiness (it increases allocation score on a current node).
> The issue is that it is impossible to "re-balance" resources in
> time-frames when stickiness is zero (over-night maintenance window).
>
>
>
> Original 'utilization' strategy only limits resource placement, it is
>>> not considered when choosing a node for a resource.
>>>
>>>
>>


Re: [ClusterLabs] Antw: Re: Antw: Unexpected Resource movement after failover

2016-10-21 Thread Vladislav Bogdanov

21.10.2016 19:34, Andrei Borzenkov wrote:

14.10.2016 10:39, Vladislav Bogdanov wrote:


use of utilization (balanced strategy) has one caveat: resources are
not moved just because the utilization of one node is lower, when nodes
have the same allocation score for the resource. So, after the
simultaneous outage of two nodes in a 5-node cluster, it may appear
that one node runs two resources and two recovered nodes run
nothing.



I call this a feature. Every resource move potentially means service
outage, so it should not happen without explicit action.



In the case I describe, those moves could easily be prevented by using
stickiness (it increases the allocation score on the current node).
The issue is that it is impossible to "re-balance" resources even in
time-frames when stickiness is zero (an over-night maintenance window).
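
(For completeness: one way to carve out such a window declaratively is a
time-based rule on the stickiness default, roughly as sketched below,
patterned on the rules example in Pacemaker Explained with placeholder IDs
and hours. It only provides the zero-stickiness window; as noted above, the
balanced strategy still will not move resources while allocation scores are
tied.)

<rsc_defaults>
  <!-- the higher-score block wins while its rule matches: sticky outside the window -->
  <meta_attributes id="daytime-options" score="2">
    <rule id="daytime-rule" score="0">
      <date_expression id="daytime-hours" operation="date_spec">
        <date_spec id="daytime-hours-spec" hours="6-23"/>
      </date_expression>
    </rule>
    <nvpair id="daytime-stickiness" name="resource-stickiness" value="INFINITY"/>
  </meta_attributes>
  <!-- fallback over-night: stickiness drops to zero -->
  <meta_attributes id="overnight-options" score="1">
    <nvpair id="overnight-stickiness" name="resource-stickiness" value="0"/>
  </meta_attributes>
</rsc_defaults>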




Original 'utilization' strategy only limits resource placement; it is
not considered when choosing a node for a resource.






Re: [ClusterLabs] Antw: Re: Antw: Unexpected Resource movement after failover

2016-10-21 Thread Andrei Borzenkov
14.10.2016 10:39, Vladislav Bogdanov wrote:
> 
> use of utilization (balanced strategy) has one caveat: resources are
> not moved just because of utilization of one node is less, when nodes
> have the same allocation score for the resource. So, after the
> simultaneous outage of two nodes in a 5-node cluster, it may appear
> that one node runs two resources and two recovered nodes run
> nothing.
> 

I call this a feature. Every resource move potentially means service
outage, so it should not happen without explicit action.

> Original 'utilization' strategy only limits resource placement, it is
> not considered when choosing a node for a resource.
> 




Re: [ClusterLabs] Antw: Re: Antw: Unexpected Resource movement after failover

2016-10-18 Thread Ken Gaillot
On 10/17/2016 11:29 PM, Nikhil Utane wrote:
> Thanks Ken.
> I will give it a shot.
> 
> http://oss.clusterlabs.org/pipermail/pacemaker/2011-August/011271.html
> On this thread, if I interpret it correctly, his problem was solved when
> he swapped the anti-location constraint 
> 
> From (mapping to my example)
> cu_2 with cu_4 (score:-INFINITY)
> cu_3 with cu_4 (score:-INFINITY)
> cu_2 with cu_3 (score:-INFINITY)
> 
> To
> cu_2 with cu_4 (score:-INFINITY)
> cu_4 with cu_3 (score:-INFINITY)
> cu_3 with cu_2 (score:-INFINITY)
> 
> Do you think that would make any difference? The way you explained it,
> sounds to me it might.

It would create a dependency loop:

cu_2 must be placed before cu_3
cu_3 must be placed before cu_4
cu_4 must be placed before cu_2
(loop)

The cluster tries to detect and break such loops, but I wouldn't rely on
that resulting in a particular behavior.
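
In CIB terms (constraint IDs made up here), the swapped set would be the
fragment below; since the "with-rsc" side of a colocation constraint is the
one placed first, the three constraints chase each other in a circle:

<rsc_colocation id="cu2-not-with-cu4" rsc="cu_2" with-rsc="cu_4" score="-INFINITY"/> <!-- cu_4 before cu_2 -->
<rsc_colocation id="cu4-not-with-cu3" rsc="cu_4" with-rsc="cu_3" score="-INFINITY"/> <!-- cu_3 before cu_4 -->
<rsc_colocation id="cu3-not-with-cu2" rsc="cu_3" with-rsc="cu_2" score="-INFINITY"/> <!-- cu_2 before cu_3: loop -->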

> -Regards
> Nikhil
> 
> On Mon, Oct 17, 2016 at 11:36 PM, Ken Gaillot wrote:
> 
> On 10/17/2016 09:55 AM, Nikhil Utane wrote:
> > I see these prints.
> >
> > pengine: info: rsc_merge_weights:cu_4: Rolling back scores from cu_3
> > pengine:debug: native_assign_node:Assigning Redun_CU4_Wb30 to cu_4
> > pengine: info: rsc_merge_weights:cu_3: Rolling back scores from cu_2
> > pengine:debug: native_assign_node:Assigning Redund_CU5_WB30 to cu_3
> >
> > Looks like rolling back the scores is causing the new decision to
> > relocate the resources.
> > Am I using the scores incorrectly?
> 
> No, I think this is expected.
> 
> Your anti-colocation constraints place cu_2 and cu_3 relative to cu_4,
> so that means the cluster will place cu_4 first if possible, before
> deciding where the others should go. Similarly, cu_2 has a constraint
> relative to cu_3, so cu_3 gets placed next, and cu_2 is the one left
> out.
> 
> The anti-colocation scores of -INFINITY outweigh the stickiness of 100.
> I'm not sure whether setting stickiness to INFINITY would change
> anything; hopefully, it would stop cu_3 from moving, but cu_2 would
> still be stopped.
> 
> I don't see a good way around this. The cluster has to place some
> resource first, in order to know not to place some other resource on the
> same node. I don't think there's a way to make them "equal", because
> then none of them could be placed to begin with -- unless you went with
> utilization attributes, as someone else suggested, with
> placement-strategy=balanced:
> 
> 
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm140521708557280
> 
> 
> 
> >
> > [root@Redund_CU5_WB30 root]# pcs constraint
> > Location Constraints:
> >   Resource: cu_2
> > Enabled on: Redun_CU4_Wb30 (score:0)
> > Enabled on: Redund_CU5_WB30 (score:0)
> > Enabled on: Redund_CU3_WB30 (score:0)
> > Enabled on: Redund_CU1_WB30 (score:0)
> >   Resource: cu_3
> > Enabled on: Redun_CU4_Wb30 (score:0)
> > Enabled on: Redund_CU5_WB30 (score:0)
> > Enabled on: Redund_CU3_WB30 (score:0)
> > Enabled on: Redund_CU1_WB30 (score:0)
> >   Resource: cu_4
> > Enabled on: Redun_CU4_Wb30 (score:0)
> > Enabled on: Redund_CU5_WB30 (score:0)
> > Enabled on: Redund_CU3_WB30 (score:0)
> > Enabled on: Redund_CU1_WB30 (score:0)
> > Ordering Constraints:
> > Colocation Constraints:
> >   cu_2 with cu_4 (score:-INFINITY)
> >   cu_3 with cu_4 (score:-INFINITY)
> >   cu_2 with cu_3 (score:-INFINITY)
> >
> >
> > On Mon, Oct 17, 2016 at 8:16 PM, Nikhil Utane
> > <nikhil.subscri...@gmail.com> wrote:
> >
> > This is driving me insane.
> >
> > This is how the resources were started. Redund_CU1_WB30  was the DC
> > which I rebooted.
> >  cu_4(ocf::redundancy:RedundancyRA):Started Redund_CU1_WB30
> >  cu_2(ocf::redundancy:RedundancyRA):Started Redund_CU5_WB30
> >  cu_3(ocf::redundancy:RedundancyRA):Started Redun_CU4_Wb30
> >
> > Since the standby node was not UP. I was expecting resource cu_4 to
> > be waiting to be scheduled.
> > But then it re-arranged everything as below.
> >  cu_4(ocf::redundancy:RedundancyRA):Started Redun_CU4_Wb30
> >  cu_2(ocf::redundancy:RedundancyRA):Stopped
> >  cu_3(ocf::redundancy:RedundancyRA):Started Redund_CU5_WB30
> >
> > There is not much information available in the logs on new DC. It
> > just shows what it has decided to do but nothing to suggest why it
> > did it that way.
> >
> > notice: Start   cu_4(Redun_CU4_Wb30)

Re: [ClusterLabs] Antw: Re: Antw: Unexpected Resource movement after failover

2016-10-17 Thread Nikhil Utane
Thanks Ken.
I will give it a shot.

http://oss.clusterlabs.org/pipermail/pacemaker/2011-August/011271.html
On this thread, if I interpret it correctly, his problem was solved when he
swapped the anti-location constraint

From (mapping to my example)
cu_2 with cu_4 (score:-INFINITY)
cu_3 with cu_4 (score:-INFINITY)
cu_2 with cu_3 (score:-INFINITY)

To
cu_2 with cu_4 (score:-INFINITY)
cu_4 with cu_3 (score:-INFINITY)
cu_3 with cu_2 (score:-INFINITY)

Do you think that would make any difference? The way you explained it,
sounds to me it might.

-Regards
Nikhil

On Mon, Oct 17, 2016 at 11:36 PM, Ken Gaillot  wrote:

> On 10/17/2016 09:55 AM, Nikhil Utane wrote:
> > I see these prints.
> >
> > pengine: info: rsc_merge_weights:cu_4: Rolling back scores from cu_3
> > pengine:debug: native_assign_node:Assigning Redun_CU4_Wb30 to cu_4
> > pengine: info: rsc_merge_weights:cu_3: Rolling back scores from cu_2
> > pengine:debug: native_assign_node:Assigning Redund_CU5_WB30 to cu_3
> >
> > Looks like rolling back the scores is causing the new decision to
> > relocate the resources.
> > Am I using the scores incorrectly?
>
> No, I think this is expected.
>
> Your anti-colocation constraints place cu_2 and cu_3 relative to cu_4,
> so that means the cluster will place cu_4 first if possible, before
> deciding where the others should go. Similarly, cu_2 has a constraint
> relative to cu_3, so cu_3 gets placed next, and cu_2 is the one left out.
>
> The anti-colocation scores of -INFINITY outweigh the stickiness of 100.
> I'm not sure whether setting stickiness to INFINITY would change
> anything; hopefully, it would stop cu_3 from moving, but cu_2 would
> still be stopped.
>
> I don't see a good way around this. The cluster has to place some
> resource first, in order to know not to place some other resource on the
> same node. I don't think there's a way to make them "equal", because
> then none of them could be placed to begin with -- unless you went with
> utilization attributes, as someone else suggested, with
> placement-strategy=balanced:
>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-
> single/Pacemaker_Explained/index.html#idm140521708557280
>
> >
> > [root@Redund_CU5_WB30 root]# pcs constraint
> > Location Constraints:
> >   Resource: cu_2
> > Enabled on: Redun_CU4_Wb30 (score:0)
> > Enabled on: Redund_CU5_WB30 (score:0)
> > Enabled on: Redund_CU3_WB30 (score:0)
> > Enabled on: Redund_CU1_WB30 (score:0)
> >   Resource: cu_3
> > Enabled on: Redun_CU4_Wb30 (score:0)
> > Enabled on: Redund_CU5_WB30 (score:0)
> > Enabled on: Redund_CU3_WB30 (score:0)
> > Enabled on: Redund_CU1_WB30 (score:0)
> >   Resource: cu_4
> > Enabled on: Redun_CU4_Wb30 (score:0)
> > Enabled on: Redund_CU5_WB30 (score:0)
> > Enabled on: Redund_CU3_WB30 (score:0)
> > Enabled on: Redund_CU1_WB30 (score:0)
> > Ordering Constraints:
> > Colocation Constraints:
> >   cu_2 with cu_4 (score:-INFINITY)
> >   cu_3 with cu_4 (score:-INFINITY)
> >   cu_2 with cu_3 (score:-INFINITY)
> >
> >
> > On Mon, Oct 17, 2016 at 8:16 PM, Nikhil Utane
> > <nikhil.subscri...@gmail.com> wrote:
> >
> > This is driving me insane.
> >
> > This is how the resources were started. Redund_CU1_WB30  was the DC
> > which I rebooted.
> >  cu_4(ocf::redundancy:RedundancyRA):Started Redund_CU1_WB30
> >  cu_2(ocf::redundancy:RedundancyRA):Started Redund_CU5_WB30
> >  cu_3(ocf::redundancy:RedundancyRA):Started Redun_CU4_Wb30
> >
> > Since the standby node was not UP. I was expecting resource cu_4 to
> > be waiting to be scheduled.
> > But then it re-arranged everything as below.
> >  cu_4(ocf::redundancy:RedundancyRA):Started Redun_CU4_Wb30
> >  cu_2(ocf::redundancy:RedundancyRA):Stopped
> >  cu_3(ocf::redundancy:RedundancyRA):Started Redund_CU5_WB30
> >
> > There is not much information available in the logs on new DC. It
> > just shows what it has decided to do but nothing to suggest why it
> > did it that way.
> >
> > notice: Start   cu_4(Redun_CU4_Wb30)
> > notice: Stopcu_2(Redund_CU5_WB30)
> > notice: Movecu_3(Started Redun_CU4_Wb30 -> Redund_CU5_WB30)
> >
> > I have default stickiness set to 100 which is higher than any score
> > that I have configured.
> > I have migration_threshold set to 1. Should I bump that up instead?
> >
> > -Thanks
> > Nikhil
> >
> > On Sat, Oct 15, 2016 at 12:36 AM, Ken Gaillot wrote:
> >
> > On 10/14/2016 06:56 AM, Nikhil Utane wrote:
> > > Hi,
> > >
> > > Thank you for the responses so far.
> > > I added reverse colocation as well. However seeing some other
> issue in
> > > resource movement that I am analyzing.
> > >
> > > Thinking further on this, why doesn't "/a not with b" does not
> > imply "b
> > > not with a"?/
> > > Coz wouldn't putti

Re: [ClusterLabs] Antw: Re: Antw: Unexpected Resource movement after failover

2016-10-17 Thread Ken Gaillot
On 10/17/2016 09:55 AM, Nikhil Utane wrote:
> I see these prints. 
> 
> pengine: info: rsc_merge_weights:cu_4: Rolling back scores from cu_3 
> pengine:debug: native_assign_node:Assigning Redun_CU4_Wb30 to cu_4
> pengine: info: rsc_merge_weights:cu_3: Rolling back scores from cu_2 
> pengine:debug: native_assign_node:Assigning Redund_CU5_WB30 to cu_3   
> 
> Looks like rolling back the scores is causing the new decision to
> relocate the resources.
> Am I using the scores incorrectly?

No, I think this is expected.

Your anti-colocation constraints place cu_2 and cu_3 relative to cu_4,
so that means the cluster will place cu_4 first if possible, before
deciding where the others should go. Similarly, cu_2 has a constraint
relative to cu_3, so cu_3 gets placed next, and cu_2 is the one left out.
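
Spelled out against the constraints quoted further down (sketched here as
CIB XML with made-up IDs; "cu_X with cu_Y" in pcs output corresponds to
rsc="cu_X" with-rsc="cu_Y"), the implied placement order is cu_4, then
cu_3, then cu_2:

<rsc_colocation id="cu2-not-with-cu4" rsc="cu_2" with-rsc="cu_4" score="-INFINITY"/> <!-- cu_4 placed before cu_2 -->
<rsc_colocation id="cu3-not-with-cu4" rsc="cu_3" with-rsc="cu_4" score="-INFINITY"/> <!-- cu_4 placed before cu_3 -->
<rsc_colocation id="cu2-not-with-cu3" rsc="cu_2" with-rsc="cu_3" score="-INFINITY"/> <!-- cu_3 placed before cu_2 -->

So whichever node scores best gets cu_4, the next gets cu_3, and cu_2 takes
whatever is left (or stays stopped).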

The anti-colocation scores of -INFINITY outweigh the stickiness of 100.
I'm not sure whether setting stickiness to INFINITY would change
anything; hopefully, it would stop cu_3 from moving, but cu_2 would
still be stopped.

I don't see a good way around this. The cluster has to place some
resource first, in order to know not to place some other resource on the
same node. I don't think there's a way to make them "equal", because
then none of them could be placed to begin with -- unless you went with
utilization attributes, as someone else suggested, with
placement-strategy=balanced:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm140521708557280

> 
> [root@Redund_CU5_WB30 root]# pcs constraint
> Location Constraints:
>   Resource: cu_2
> Enabled on: Redun_CU4_Wb30 (score:0)
> Enabled on: Redund_CU5_WB30 (score:0)
> Enabled on: Redund_CU3_WB30 (score:0)
> Enabled on: Redund_CU1_WB30 (score:0)
>   Resource: cu_3
> Enabled on: Redun_CU4_Wb30 (score:0)
> Enabled on: Redund_CU5_WB30 (score:0)
> Enabled on: Redund_CU3_WB30 (score:0)
> Enabled on: Redund_CU1_WB30 (score:0)
>   Resource: cu_4
> Enabled on: Redun_CU4_Wb30 (score:0)
> Enabled on: Redund_CU5_WB30 (score:0)
> Enabled on: Redund_CU3_WB30 (score:0)
> Enabled on: Redund_CU1_WB30 (score:0)
> Ordering Constraints:
> Colocation Constraints:
>   cu_2 with cu_4 (score:-INFINITY)
>   cu_3 with cu_4 (score:-INFINITY)
>   cu_2 with cu_3 (score:-INFINITY)
> 
> 
> On Mon, Oct 17, 2016 at 8:16 PM, Nikhil Utane
> <nikhil.subscri...@gmail.com> wrote:
> 
> This is driving me insane. 
> 
> This is how the resources were started. Redund_CU1_WB30  was the DC
> which I rebooted.
>  cu_4(ocf::redundancy:RedundancyRA):Started Redund_CU1_WB30 
>  cu_2(ocf::redundancy:RedundancyRA):Started Redund_CU5_WB30 
>  cu_3(ocf::redundancy:RedundancyRA):Started Redun_CU4_Wb30 
> 
> Since the standby node was not UP. I was expecting resource cu_4 to
> be waiting to be scheduled.
> But then it re-arranged everything as below.
>  cu_4(ocf::redundancy:RedundancyRA):Started Redun_CU4_Wb30 
>  cu_2(ocf::redundancy:RedundancyRA):Stopped 
>  cu_3(ocf::redundancy:RedundancyRA):Started Redund_CU5_WB30 
> 
> There is not much information available in the logs on new DC. It
> just shows what it has decided to do but nothing to suggest why it
> did it that way.
> 
> notice: Start   cu_4(Redun_CU4_Wb30)   
> notice: Stopcu_2(Redund_CU5_WB30)  
> notice: Movecu_3(Started Redun_CU4_Wb30 -> Redund_CU5_WB30)
> 
> I have default stickiness set to 100 which is higher than any score
> that I have configured.
> I have migration_threshold set to 1. Should I bump that up instead?
> 
> -Thanks
> Nikhil
> 
> On Sat, Oct 15, 2016 at 12:36 AM, Ken Gaillot wrote:
> 
> On 10/14/2016 06:56 AM, Nikhil Utane wrote:
> > Hi,
> >
> > Thank you for the responses so far.
> > I added reverse colocation as well. However seeing some other issue 
> in
> > resource movement that I am analyzing.
> >
> > Thinking further on this, why doesn't "/a not with b" does not
> imply "b
> > not with a"?/
> > Coz wouldn't putting "b with a" violate "a not with b"?
> >
> > Can someone confirm that colocation is required to be configured 
> both ways?
> 
> The anti-colocation should only be defined one-way. Otherwise,
> you get a
> dependency loop (as seen in logs you showed elsewhere).
> 
> The one-way constraint is enough to keep the resources apart.
> However,
> the question is whether the cluster might move resources around
> unnecessarily.
> 
> For example, "A not with B" means that the cluster will place B
> first,
> then place A somewhere else. So, if B's node fails, can the cluster
> decide that A's node is now the best place for B, and move A to
>  

Re: [ClusterLabs] Antw: Re: Antw: Unexpected Resource movement after failover

2016-10-17 Thread Nikhil Utane
I see these prints.

pengine: info: rsc_merge_weights: cu_4: Rolling back scores from cu_3
pengine:debug: native_assign_node: Assigning Redun_CU4_Wb30 to cu_4
pengine: info: rsc_merge_weights: cu_3: Rolling back scores from cu_2
pengine:debug: native_assign_node: Assigning Redund_CU5_WB30 to cu_3

Looks like rolling back the scores is causing the new decision to relocate
the resources.
Am I using the scores incorrectly?

[root@Redund_CU5_WB30 root]# pcs constraint
Location Constraints:
  Resource: cu_2
Enabled on: Redun_CU4_Wb30 (score:0)
Enabled on: Redund_CU5_WB30 (score:0)
Enabled on: Redund_CU3_WB30 (score:0)
Enabled on: Redund_CU1_WB30 (score:0)
  Resource: cu_3
Enabled on: Redun_CU4_Wb30 (score:0)
Enabled on: Redund_CU5_WB30 (score:0)
Enabled on: Redund_CU3_WB30 (score:0)
Enabled on: Redund_CU1_WB30 (score:0)
  Resource: cu_4
Enabled on: Redun_CU4_Wb30 (score:0)
Enabled on: Redund_CU5_WB30 (score:0)
Enabled on: Redund_CU3_WB30 (score:0)
Enabled on: Redund_CU1_WB30 (score:0)
Ordering Constraints:
Colocation Constraints:
  cu_2 with cu_4 (score:-INFINITY)
  cu_3 with cu_4 (score:-INFINITY)
  cu_2 with cu_3 (score:-INFINITY)


On Mon, Oct 17, 2016 at 8:16 PM, Nikhil Utane 
wrote:

> This is driving me insane.
>
> This is how the resources were started. Redund_CU1_WB30  was the DC which
> I rebooted.
>  cu_4 (ocf::redundancy:RedundancyRA): Started Redund_CU1_WB30
>  cu_2 (ocf::redundancy:RedundancyRA): Started Redund_CU5_WB30
>  cu_3 (ocf::redundancy:RedundancyRA): Started Redun_CU4_Wb30
>
> Since the standby node was not UP. I was expecting resource cu_4 to be
> waiting to be scheduled.
> But then it re-arranged everything as below.
>  cu_4 (ocf::redundancy:RedundancyRA): Started Redun_CU4_Wb30
>  cu_2 (ocf::redundancy:RedundancyRA): Stopped
>  cu_3 (ocf::redundancy:RedundancyRA): Started Redund_CU5_WB30
>
> There is not much information available in the logs on new DC. It just
> shows what it has decided to do but nothing to suggest why it did it that
> way.
>
> notice: Start   cu_4 (Redun_CU4_Wb30)
> notice: Stopcu_2 (Redund_CU5_WB30)
> notice: Movecu_3 (Started Redun_CU4_Wb30 -> Redund_CU5_WB30)
>
> I have default stickiness set to 100 which is higher than any score that I
> have configured.
> I have migration_threshold set to 1. Should I bump that up instead?
>
> -Thanks
> Nikhil
>
> On Sat, Oct 15, 2016 at 12:36 AM, Ken Gaillot  wrote:
>
>> On 10/14/2016 06:56 AM, Nikhil Utane wrote:
>> > Hi,
>> >
>> > Thank you for the responses so far.
>> > I added reverse colocation as well. However seeing some other issue in
>> > resource movement that I am analyzing.
>> >
>> > Thinking further on this, why doesn't "/a not with b" does not imply "b
>> > not with a"?/
>> > Coz wouldn't putting "b with a" violate "a not with b"?
>> >
>> > Can someone confirm that colocation is required to be configured both
>> ways?
>>
>> The anti-colocation should only be defined one-way. Otherwise, you get a
>> dependency loop (as seen in logs you showed elsewhere).
>>
>> The one-way constraint is enough to keep the resources apart. However,
>> the question is whether the cluster might move resources around
>> unnecessarily.
>>
>> For example, "A not with B" means that the cluster will place B first,
>> then place A somewhere else. So, if B's node fails, can the cluster
>> decide that A's node is now the best place for B, and move A to a free
>> node, rather than simply start B on the free node?
>>
>> The cluster does take dependencies into account when placing a resource,
>> so I would hope that wouldn't happen. But I'm not sure. Having some
>> stickiness might help, so that A has some preference against moving.
>>
>> > -Thanks
>> > Nikhil
>> >
>> > /
>> > /
>> >
>> > On Fri, Oct 14, 2016 at 1:09 PM, Vladislav Bogdanov
>> > <bub...@hoster-ok.com> wrote:
>> >
>> > On October 14, 2016 10:13:17 AM GMT+03:00, Ulrich Windl
>> > wrote:
>> > Nikhil Utane wrote on 13.10.2016 at
>> >16:43 in
>> >message <...@mail.gmail.com>:
>> > >> Ulrich,
>> > >>
>> > >> I have 4 resources only (not 5, nodes are 5). So then I only
>> need 6
>> > >> constraints, right?
>> > >>
>> > >>  [,1]   [,2]   [,3]   [,4]   [,5]  [,6]
>> > >> [1,] "A"  "A"  "A""B"   "B""C"
>> > >> [2,] "B"  "C"  "D"   "C"  "D""D"
>> > >
>> >Sorry for my confusion. As Andrei Borzenkov said in
>> ><...@mail.gmail.com>
>> > >you probably have to add (A, B) _and_ (B, A)! Thinking about it, I
>> > >wonder whether an easier solution would be using "utilization": If
>> > >every node has one token to give, and every resource needs on

Re: [ClusterLabs] Antw: Re: Antw: Unexpected Resource movement after failover

2016-10-17 Thread Nikhil Utane
This is driving me insane.

This is how the resources were started. Redund_CU1_WB30  was the DC which I
rebooted.
 cu_4 (ocf::redundancy:RedundancyRA): Started Redund_CU1_WB30
 cu_2 (ocf::redundancy:RedundancyRA): Started Redund_CU5_WB30
 cu_3 (ocf::redundancy:RedundancyRA): Started Redun_CU4_Wb30

Since the standby node was not UP. I was expecting resource cu_4 to be
waiting to be scheduled.
But then it re-arranged everything as below.
 cu_4 (ocf::redundancy:RedundancyRA): Started Redun_CU4_Wb30
 cu_2 (ocf::redundancy:RedundancyRA): Stopped
 cu_3 (ocf::redundancy:RedundancyRA): Started Redund_CU5_WB30

There is not much information available in the logs on new DC. It just
shows what it has decided to do but nothing to suggest why it did it that
way.

notice: Start   cu_4 (Redun_CU4_Wb30)
notice: Stop    cu_2 (Redund_CU5_WB30)
notice: Move    cu_3 (Started Redun_CU4_Wb30 -> Redund_CU5_WB30)
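
(Side note: when the logs do not say why, the allocation scores behind a
decision can be dumped with crm_simulate. A quick sketch, assuming it is
run on a cluster node; the pe-input file name is just a placeholder.)

# show allocation scores and planned actions from the live CIB
crm_simulate --live-check --show-scores

# or replay one of the saved policy-engine inputs
crm_simulate --show-scores --xml-file /var/lib/pacemaker/pengine/pe-input-123.bz2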

I have default stickiness set to 100 which is higher than any score that I
have configured.
I have migration_threshold set to 1. Should I bump that up instead?

-Thanks
Nikhil

On Sat, Oct 15, 2016 at 12:36 AM, Ken Gaillot  wrote:

> On 10/14/2016 06:56 AM, Nikhil Utane wrote:
> > Hi,
> >
> > Thank you for the responses so far.
> > I added reverse colocation as well. However seeing some other issue in
> > resource movement that I am analyzing.
> >
> > Thinking further on this, why doesn't "/a not with b" does not imply "b
> > not with a"?/
> > Coz wouldn't putting "b with a" violate "a not with b"?
> >
> > Can someone confirm that colocation is required to be configured both
> ways?
>
> The anti-colocation should only be defined one-way. Otherwise, you get a
> dependency loop (as seen in logs you showed elsewhere).
>
> The one-way constraint is enough to keep the resources apart. However,
> the question is whether the cluster might move resources around
> unnecessarily.
>
> For example, "A not with B" means that the cluster will place B first,
> then place A somewhere else. So, if B's node fails, can the cluster
> decide that A's node is now the best place for B, and move A to a free
> node, rather than simply start B on the free node?
>
> The cluster does take dependencies into account when placing a resource,
> so I would hope that wouldn't happen. But I'm not sure. Having some
> stickiness might help, so that A has some preference against moving.
>
> > -Thanks
> > Nikhil
> >
> > /
> > /
> >
> > On Fri, Oct 14, 2016 at 1:09 PM, Vladislav Bogdanov
> > <bub...@hoster-ok.com> wrote:
> >
> > On October 14, 2016 10:13:17 AM GMT+03:00, Ulrich Windl
> > wrote:
> > > Nikhil Utane wrote on 13.10.2016 at
> > >16:43 in
> > >message <...@mail.gmail.com>:
> > >> Ulrich,
> > >>
> > >> I have 4 resources only (not 5, nodes are 5). So then I only need
> 6
> > >> constraints, right?
> > >>
> > >>  [,1]   [,2]   [,3]   [,4]   [,5]  [,6]
> > >> [1,] "A"  "A"  "A""B"   "B""C"
> > >> [2,] "B"  "C"  "D"   "C"  "D""D"
> > >
> > >Sorry for my confusion. As Andrei Borzenkov said in
> > ><...@mail.gmail.com>
> > >you probably have to add (A, B) _and_ (B, A)! Thinking about it, I
> > >wonder whether an easier solution would be using "utilization": If
> > >every node has one token to give, and every resource needs one
> token, no
> > >two resources will run on one node. Sounds like an easier solution
> to
> > >me.
> > >
> > >Regards,
> > >Ulrich
> > >
> > >
> > >>
> > >> I understand that if I configure constraint of R1 with R2 score as
> > >> -infinity, then the same applies for R2 with R1 score as -infinity
> > >(don't
> > >> have to configure it explicitly).
> > >> I am not having a problem of multiple resources getting schedule
> on
> > >the
> > >> same node. Rather, one working resource is unnecessarily getting
> > >relocated.
> > >>
> > >> -Thanks
> > >> Nikhil
> > >>
> > >>
> > >> On Thu, Oct 13, 2016 at 7:45 PM, Ulrich Windl <
> > >> ulrich.wi...@rz.uni-regensburg.de
> > > wrote:
> > >>
> > >>> Hi!
> > >>>
> > >>> Don't you need 10 constraints, excluding every possible pair of
> your
> > >5
> > >>> resources (named A-E here), like in this table (produced with R):
> > >>>
> > >>>  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
> > >>> [1,] "A"  "A"  "A"  "A"  "B"  "B"  "B"  "C"  "C"  "D"
> > >>> [2,] "B"  "C"  "D"  "E"  "C"  "D"  "E"  "D"  "E"  "E"
> > >>>
> > >>> Ulrich
> > >>>
> > >>> >>> Nikhil Utane  > > schrieb am 13.10.2016
> > >um
> > >>> 15:59 i

Re: [ClusterLabs] Antw: Re: Antw: Unexpected Resource movement after failover

2016-10-14 Thread Ken Gaillot
On 10/14/2016 06:56 AM, Nikhil Utane wrote:
> Hi,
> 
> Thank you for the responses so far.
> I added reverse colocation as well. However seeing some other issue in
> resource movement that I am analyzing.
> 
> Thinking further on this, why doesn't "/a not with b" does not imply "b
> not with a"?/
> Coz wouldn't putting "b with a" violate "a not with b"?
> 
> Can someone confirm that colocation is required to be configured both ways?

The anti-colocation should only be defined one-way. Otherwise, you get a
dependency loop (as seen in logs you showed elsewhere).

The one-way constraint is enough to keep the resources apart. However,
the question is whether the cluster might move resources around
unnecessarily.

For example, "A not with B" means that the cluster will place B first,
then place A somewhere else. So, if B's node fails, can the cluster
decide that A's node is now the best place for B, and move A to a free
node, rather than simply start B on the free node?

The cluster does take dependencies into account when placing a resource,
so I would hope that wouldn't happen. But I'm not sure. Having some
stickiness might help, so that A has some preference against moving.
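
(For what it's worth, the simplest way to give every resource that
preference is a cluster-wide default; a sketch in CIB XML, with arbitrary
IDs and a value that is a judgment call:)

<rsc_defaults>
  <meta_attributes id="rsc-defaults-options">
    <!-- a finite value (e.g. 100) is still outweighed by -INFINITY colocation scores -->
    <nvpair id="rsc-defaults-stickiness" name="resource-stickiness" value="INFINITY"/>
  </meta_attributes>
</rsc_defaults>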

> -Thanks
> Nikhil
> 
> /
> /
> 
> On Fri, Oct 14, 2016 at 1:09 PM, Vladislav Bogdanov
> <bub...@hoster-ok.com> wrote:
> 
> On October 14, 2016 10:13:17 AM GMT+03:00, Ulrich Windl
> wrote:
> > Nikhil Utane wrote on 13.10.2016 at
> >16:43 in
> >message:
> >> Ulrich,
> >>
> >> I have 4 resources only (not 5, nodes are 5). So then I only need 6
> >> constraints, right?
> >>
> >>  [,1]   [,2]   [,3]   [,4]   [,5]  [,6]
> >> [1,] "A"  "A"  "A""B"   "B""C"
> >> [2,] "B"  "C"  "D"   "C"  "D""D"
> >
> >Sorry for my confusion. As Andrei Borzenkov said in
> > 
> >
> >you probably have to add (A, B) _and_ (B, A)! Thinking about it, I
> >wonder whether an easier solution would be using "utilization": If
> >every node has one token to give, and every resource needs one token, no
> >two resources will run on one node. Sounds like an easier solution to
> >me.
> >
> >Regards,
> >Ulrich
> >
> >
> >>
> >> I understand that if I configure constraint of R1 with R2 score as
> >> -infinity, then the same applies for R2 with R1 score as -infinity
> >(don't
> >> have to configure it explicitly).
> >> I am not having a problem of multiple resources getting schedule on
> >the
> >> same node. Rather, one working resource is unnecessarily getting
> >relocated.
> >>
> >> -Thanks
> >> Nikhil
> >>
> >>
> >> On Thu, Oct 13, 2016 at 7:45 PM, Ulrich Windl <
> >> ulrich.wi...@rz.uni-regensburg.de
> > wrote:
> >>
> >>> Hi!
> >>>
> >>> Don't you need 10 constraints, excluding every possible pair of your
> >5
> >>> resources (named A-E here), like in this table (produced with R):
> >>>
> >>>  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
> >>> [1,] "A"  "A"  "A"  "A"  "B"  "B"  "B"  "C"  "C"  "D"
> >>> [2,] "B"  "C"  "D"  "E"  "C"  "D"  "E"  "D"  "E"  "E"
> >>>
> >>> Ulrich
> >>>
> >>> Nikhil Utane wrote on 13.10.2016 at
> >>> 15:59 in
> >>> message:
> >>> > Hi,
> >>> >
> >>> > I have 5 nodes and 4 resources configured.
> >>> > I have configured constraint such that no two resources can be
> >>> co-located.
> >>> > I brought down a node (which happened to be DC). I was expecting
> >the
> >>> > resource on the failed node would be migrated to the 5th waiting
> >node
> >>> (that
> >>> > is not running any resource).
> >>> > However what happened was the failed node resource was started on
> >another
> >>> > active node (after stopping it's existing resource) and that
> >node's
> >>> > resource was moved to the waiting node.
> >>> >
> >>> > What could I be doing wrong?
> >>> >
> >>> >  >>> > name="have-watchdog"/>
> >>> >  >value="1.1.14-5a6cdd1"
> >>> > name="dc-version"/>
> >>> >  >>> value="corosync"
> >>> > name="cluster-infrastructure"/>
> >>> >  >>> > name="stonith-enabled"/>
> >>> >  >>> > name="no-quorum-policy"/>
> >>> >  >value="240"
> >>> > name="default-action-timeout"/>
> >>> >  >>> > name="symmetric-cluster"/>
> >>> >
> >>> > # pcs constraint
> >>> > Location Constraints:
> >>>

Re: [ClusterLabs] Antw: Re: Antw: Unexpected Resource movement after failover

2016-10-14 Thread Nikhil Utane
I feel the behavior has become worse after adding reverse co-location
constraint.
I started with this. And it was all I wanted it to be.
cu_5 <-> Redund_CU1_WB30
cu_4 <-> Redund_CU2_WB30
cu_3 <-> Redund_CU3_WB30
cu_2 <-> Redund_CU5_WB30

However for some reason pacemaker decided to move cu_2 from Redund_CU5_WB30
to Redund_CU2_WB30.

Any obvious mis-configuration?

*Logs on DC:*
Oct 14 16:30:52 [7366] Redund_CU1_WB30pengine: info: native_print:
cu_5 (ocf::redundancy:RedundancyRA): Started Redund_CU1_WB30
Oct 14 16:30:52 [7366] Redund_CU1_WB30pengine: info: native_print:
cu_4 (ocf::redundancy:RedundancyRA): Started Redund_CU2_WB30
Oct 14 16:30:52 [7366] Redund_CU1_WB30pengine: info: native_print:
cu_3 (ocf::redundancy:RedundancyRA): Started Redund_CU3_WB30
Oct 14 16:30:52 [7366] Redund_CU1_WB30pengine: info: native_print:
cu_2 (ocf::redundancy:RedundancyRA): Started Redund_CU5_WB30
Oct 14 16:30:52 [7362] Redund_CU1_WB30cib: info:
cib_file_backup: Archived previous version as
/dev/shm/lib/pacemaker/cib/cib-65.raw
Oct 14 16:30:52 [7366] Redund_CU1_WB30pengine: info:
rsc_merge_weights: cu_4: Breaking dependency loop at cu_3
Oct 14 16:30:52 [7366] Redund_CU1_WB30pengine: info:
rsc_merge_weights: cu_4: Breaking dependency loop at cu_3
Oct 14 16:30:52 [7366] Redund_CU1_WB30pengine: info:
rsc_merge_weights: cu_4: Breaking dependency loop at cu_4
Oct 14 16:30:52 [7366] Redund_CU1_WB30pengine: info:
rsc_merge_weights: cu_4: Breaking dependency loop at cu_3
Oct 14 16:30:52 [7366] Redund_CU1_WB30pengine: info:
rsc_merge_weights: cu_4: Breaking dependency loop at cu_3
Oct 14 16:30:52 [7366] Redund_CU1_WB30pengine: info:
rsc_merge_weights: cu_4: Breaking dependency loop at cu_5
Oct 14 16:30:52 [7366] Redund_CU1_WB30pengine: info:
rsc_merge_weights: cu_4: Breaking dependency loop at cu_3
Oct 14 16:30:52 [7366] Redund_CU1_WB30pengine: info:
rsc_merge_weights: cu_4: Breaking dependency loop at cu_5
Oct 14 16:30:52 [7366] Redund_CU1_WB30pengine: info:
rsc_merge_weights: cu_4: Breaking dependency loop at cu_5
Oct 14 16:30:52 [7366] Redund_CU1_WB30pengine: info:
rsc_merge_weights: cu_4: Breaking dependency loop at cu_4
Oct 14 16:30:52 [7366] Redund_CU1_WB30pengine: info:
rsc_merge_weights: cu_4: Breaking dependency loop at cu_5
Oct 14 16:30:52 [7366] Redund_CU1_WB30pengine: info:
rsc_merge_weights: cu_4: Breaking dependency loop at cu_5
Oct 14 16:30:52 [7366] Redund_CU1_WB30pengine: info:
rsc_merge_weights: cu_3: Breaking dependency loop at cu_4
Oct 14 16:30:52 [7366] Redund_CU1_WB30pengine: info:
rsc_merge_weights: cu_3: Breaking dependency loop at cu_3
Oct 14 16:30:52 [7366] Redund_CU1_WB30pengine: info:
rsc_merge_weights: cu_3: Breaking dependency loop at cu_4
Oct 14 16:30:52 [7366] Redund_CU1_WB30pengine: info:
rsc_merge_weights: cu_3: Breaking dependency loop at cu_4
Oct 14 16:30:52 [7366] Redund_CU1_WB30pengine: info:
rsc_merge_weights: cu_3: Breaking dependency loop at cu_5
Oct 14 16:30:52 [7366] Redund_CU1_WB30pengine: info:
rsc_merge_weights: cu_3: Breaking dependency loop at cu_4
Oct 14 16:30:52 [7366] Redund_CU1_WB30pengine: info:
rsc_merge_weights: cu_3: Breaking dependency loop at cu_3
Oct 14 16:30:52 [7366] Redund_CU1_WB30pengine: info:
rsc_merge_weights: cu_3: Breaking dependency loop at cu_5
Oct 14 16:30:52 [7366] Redund_CU1_WB30pengine: info:
rsc_merge_weights: cu_3: Breaking dependency loop at cu_5
Oct 14 16:30:52 [7366] Redund_CU1_WB30pengine: info:
rsc_merge_weights: cu_3: Rolling back scores from cu_3
Oct 14 16:30:52 [7366] Redund_CU1_WB30pengine: info:
rsc_merge_weights: cu_3: Breaking dependency loop at cu_5
Oct 14 16:30:52 [7366] Redund_CU1_WB30pengine: info:
rsc_merge_weights: cu_5: Breaking dependency loop at cu_3
Oct 14 16:30:52 [7366] Redund_CU1_WB30pengine: info:
rsc_merge_weights: cu_5: Rolling back scores from cu_5
Oct 14 16:30:52 [7366] Redund_CU1_WB30pengine: info:
rsc_merge_weights: cu_5: Breaking dependency loop at cu_3
Oct 14 16:30:52 [7366] Redund_CU1_WB30pengine: info:
rsc_merge_weights: cu_5: Breaking dependency loop at cu_3
Oct 14 16:30:52 [7366] Redund_CU1_WB30pengine: info:
rsc_merge_weights: cu_5: Breaking dependency loop at cu_5
Oct 14 16:30:52 [7366] Redund_CU1_WB30pengine: info:
rsc_merge_weights: cu_5: Rolling back scores from cu_3
Oct 14 16:30:52 [7366] Redund_CU1_WB30pengine: info:
rsc_merge_weights: cu_5: Breaking dependency loop at cu_4
Oct 14 16:30:52 [7366] Redund_CU1_WB30pengine: info:
rsc_merge_weights: cu_5: Breaking dependency loop at cu_5
Oct 14 16:30:52 [7366] Redund_CU1_WB30pengine: info:
rsc_merge_weights: cu_5: Breaking dependency loop at cu_4
Oct 14 16:30:52 [7366] Redund_CU1_WB30pengine: info: LogActions: Leave
  cu_5 (Started Redund_CU1_WB30)
Oct 14 16:30:52 [7

Re: [ClusterLabs] Antw: Re: Antw: Unexpected Resource movement after failover

2016-10-14 Thread Nikhil Utane
Hi,

Thank you for the responses so far.
I added reverse colocation as well. However seeing some other issue in
resource movement that I am analyzing.

Thinking further on this, why doesn't "*a not with b*" imply "*b not
with a*"?
Coz wouldn't putting "b with a" violate "a not with b"?

Can someone confirm that colocation is required to be configured both ways?

-Thanks
Nikhil



On Fri, Oct 14, 2016 at 1:09 PM, Vladislav Bogdanov 
wrote:

> On October 14, 2016 10:13:17 AM GMT+03:00, Ulrich Windl <
> ulrich.wi...@rz.uni-regensburg.de> wrote:
> > Nikhil Utane wrote on 13.10.2016 at
> >16:43 in
> >message:
> >> Ulrich,
> >>
> >> I have 4 resources only (not 5, nodes are 5). So then I only need 6
> >> constraints, right?
> >>
> >>  [,1]   [,2]   [,3]   [,4]   [,5]  [,6]
> >> [1,] "A"  "A"  "A""B"   "B""C"
> >> [2,] "B"  "C"  "D"   "C"  "D""D"
> >
> >Sorry for my confusion. As Andrei Borzenkov said in
> >
> >you probably have to add (A, B) _and_ (B, A)! Thinking about it, I
> >wonder whether an easier solution would be using "utilization": If
> >every node has one token to give, and every resource needs one token, no
> >two resources will run on one node. Sounds like an easier solution to
> >me.
> >
> >Regards,
> >Ulrich
> >
> >
> >>
> >> I understand that if I configure constraint of R1 with R2 score as
> >> -infinity, then the same applies for R2 with R1 score as -infinity
> >(don't
> >> have to configure it explicitly).
> >> I am not having a problem of multiple resources getting schedule on
> >the
> >> same node. Rather, one working resource is unnecessarily getting
> >relocated.
> >>
> >> -Thanks
> >> Nikhil
> >>
> >>
> >> On Thu, Oct 13, 2016 at 7:45 PM, Ulrich Windl <
> >> ulrich.wi...@rz.uni-regensburg.de> wrote:
> >>
> >>> Hi!
> >>>
> >>> Don't you need 10 constraints, excluding every possible pair of your
> >5
> >>> resources (named A-E here), like in this table (produced with R):
> >>>
> >>>  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
> >>> [1,] "A"  "A"  "A"  "A"  "B"  "B"  "B"  "C"  "C"  "D"
> >>> [2,] "B"  "C"  "D"  "E"  "C"  "D"  "E"  "D"  "E"  "E"
> >>>
> >>> Ulrich
> >>>
> >>> Nikhil Utane wrote on 13.10.2016 at
> >>> 15:59 in
> >>> message:
> >>> > Hi,
> >>> >
> >>> > I have 5 nodes and 4 resources configured.
> >>> > I have configured constraint such that no two resources can be
> >>> co-located.
> >>> > I brought down a node (which happened to be DC). I was expecting
> >the
> >>> > resource on the failed node would be migrated to the 5th waiting
> >node
> >>> (that
> >>> > is not running any resource).
> >>> > However what happened was the failed node resource was started on
> >another
> >>> > active node (after stopping it's existing resource) and that
> >node's
> >>> > resource was moved to the waiting node.
> >>> >
> >>> > What could I be doing wrong?
> >>> >
> >>> >  >>> > name="have-watchdog"/>
> >>> >  >value="1.1.14-5a6cdd1"
> >>> > name="dc-version"/>
> >>> >  >>> value="corosync"
> >>> > name="cluster-infrastructure"/>
> >>> >  >>> > name="stonith-enabled"/>
> >>> >  >>> > name="no-quorum-policy"/>
> >>> >  >value="240"
> >>> > name="default-action-timeout"/>
> >>> >  >>> > name="symmetric-cluster"/>
> >>> >
> >>> > # pcs constraint
> >>> > Location Constraints:
> >>> >   Resource: cu_2
> >>> > Enabled on: Redun_CU4_Wb30 (score:0)
> >>> > Enabled on: Redund_CU2_WB30 (score:0)
> >>> > Enabled on: Redund_CU3_WB30 (score:0)
> >>> > Enabled on: Redund_CU5_WB30 (score:0)
> >>> > Enabled on: Redund_CU1_WB30 (score:0)
> >>> >   Resource: cu_3
> >>> > Enabled on: Redun_CU4_Wb30 (score:0)
> >>> > Enabled on: Redund_CU2_WB30 (score:0)
> >>> > Enabled on: Redund_CU3_WB30 (score:0)
> >>> > Enabled on: Redund_CU5_WB30 (score:0)
> >>> > Enabled on: Redund_CU1_WB30 (score:0)
> >>> >   Resource: cu_4
> >>> > Enabled on: Redun_CU4_Wb30 (score:0)
> >>> > Enabled on: Redund_CU2_WB30 (score:0)
> >>> > Enabled on: Redund_CU3_WB30 (score:0)
> >>> > Enabled on: Redund_CU5_WB30 (score:0)
> >>> > Enabled on: Redund_CU1_WB30 (score:0)
> >>> >   Resource: cu_5
> >>> > Enabled on: Redun_CU4_Wb30 (score:0)
> >>> > Enabled on: Redund_CU2_WB30 (score:0)
> >>> > Enabled on: Redund_CU3_WB30 (score:0)
> >>> > Enabled on: Redund_CU5_WB30 (score:0)
> >>> > Enabled on: Redund_CU1_WB30 (score:0)
> >>> > Ordering Constraints:
> >>> > Colocation Constraints:
> >>> >   cu_3 with cu_2 (score:-INFINITY)
> >>> >   cu_4 with cu_2 (score:-INFINITY)
> >>> >   cu_4 with cu_3 (score:-INFINITY)
> >>> >   cu_5 with cu_2 (score:-INFINITY)
> >>> >   cu_5 with cu_3 (score:-INFINITY)
> >>> >   cu_5 with cu_4 (score:-INFINITY)
> >>> >
> >>> > -Thanks
> >>> > Nikhil
> >>>
> >>>
> >>>
> >>>
> >>>

Re: [ClusterLabs] Antw: Re: Antw: Unexpected Resource movement after failover

2016-10-14 Thread Vladislav Bogdanov
On October 14, 2016 10:13:17 AM GMT+03:00, Ulrich Windl wrote:
> Nikhil Utane wrote on 13.10.2016 at
>16:43 in
>message:
>> Ulrich,
>> 
>> I have 4 resources only (not 5, nodes are 5). So then I only need 6
>> constraints, right?
>> 
>>  [,1]   [,2]   [,3]   [,4]   [,5]  [,6]
>> [1,] "A"  "A"  "A""B"   "B""C"
>> [2,] "B"  "C"  "D"   "C"  "D""D"
>
>Sorry for my confusion. As Andrei Borzenkov said in
>
>you probably have to add (A, B) _and_ (B, A)! Thinking about it, I
>wonder whether an easier solution would be using "utilization": If
>every node has one token to give, and every resource needs one token, no
>two resources will run on one node. Sounds like an easier solution to
>me.
>
>Regards,
>Ulrich
>
>
>> 
>> I understand that if I configure constraint of R1 with R2 score as
>> -infinity, then the same applies for R2 with R1 score as -infinity
>(don't
>> have to configure it explicitly).
>> I am not having a problem of multiple resources getting schedule on
>the
>> same node. Rather, one working resource is unnecessarily getting
>relocated.
>> 
>> -Thanks
>> Nikhil
>> 
>> 
>> On Thu, Oct 13, 2016 at 7:45 PM, Ulrich Windl <
>> ulrich.wi...@rz.uni-regensburg.de> wrote:
>> 
>>> Hi!
>>>
>>> Don't you need 10 constraints, excluding every possible pair of your
>5
>>> resources (named A-E here), like in this table (produced with R):
>>>
>>>  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
>>> [1,] "A"  "A"  "A"  "A"  "B"  "B"  "B"  "C"  "C"  "D"
>>> [2,] "B"  "C"  "D"  "E"  "C"  "D"  "E"  "D"  "E"  "E"
>>>
>>> Ulrich
>>>
>>> Nikhil Utane wrote on 13.10.2016 at
>>> 15:59 in
>>> message:
>>> > Hi,
>>> >
>>> > I have 5 nodes and 4 resources configured.
>>> > I have configured constraint such that no two resources can be
>>> co-located.
>>> > I brought down a node (which happened to be DC). I was expecting
>the
>>> > resource on the failed node would be migrated to the 5th waiting
>node
>>> (that
>>> > is not running any resource).
>>> > However what happened was the failed node resource was started on
>another
>>> > active node (after stopping it's existing resource) and that
>node's
>>> > resource was moved to the waiting node.
>>> >
>>> > What could I be doing wrong?
>>> >
>>> > >> > name="have-watchdog"/>
>>> > value="1.1.14-5a6cdd1"
>>> > name="dc-version"/>
>>> > >> value="corosync"
>>> > name="cluster-infrastructure"/>
>>> > >> > name="stonith-enabled"/>
>>> > >> > name="no-quorum-policy"/>
>>> > value="240"
>>> > name="default-action-timeout"/>
>>> > >> > name="symmetric-cluster"/>
>>> >
>>> > # pcs constraint
>>> > Location Constraints:
>>> >   Resource: cu_2
>>> > Enabled on: Redun_CU4_Wb30 (score:0)
>>> > Enabled on: Redund_CU2_WB30 (score:0)
>>> > Enabled on: Redund_CU3_WB30 (score:0)
>>> > Enabled on: Redund_CU5_WB30 (score:0)
>>> > Enabled on: Redund_CU1_WB30 (score:0)
>>> >   Resource: cu_3
>>> > Enabled on: Redun_CU4_Wb30 (score:0)
>>> > Enabled on: Redund_CU2_WB30 (score:0)
>>> > Enabled on: Redund_CU3_WB30 (score:0)
>>> > Enabled on: Redund_CU5_WB30 (score:0)
>>> > Enabled on: Redund_CU1_WB30 (score:0)
>>> >   Resource: cu_4
>>> > Enabled on: Redun_CU4_Wb30 (score:0)
>>> > Enabled on: Redund_CU2_WB30 (score:0)
>>> > Enabled on: Redund_CU3_WB30 (score:0)
>>> > Enabled on: Redund_CU5_WB30 (score:0)
>>> > Enabled on: Redund_CU1_WB30 (score:0)
>>> >   Resource: cu_5
>>> > Enabled on: Redun_CU4_Wb30 (score:0)
>>> > Enabled on: Redund_CU2_WB30 (score:0)
>>> > Enabled on: Redund_CU3_WB30 (score:0)
>>> > Enabled on: Redund_CU5_WB30 (score:0)
>>> > Enabled on: Redund_CU1_WB30 (score:0)
>>> > Ordering Constraints:
>>> > Colocation Constraints:
>>> >   cu_3 with cu_2 (score:-INFINITY)
>>> >   cu_4 with cu_2 (score:-INFINITY)
>>> >   cu_4 with cu_3 (score:-INFINITY)
>>> >   cu_5 with cu_2 (score:-INFINITY)
>>> >   cu_5 with cu_3 (score:-INFINITY)
>>> >   cu_5 with cu_4 (score:-INFINITY)
>>> >
>>> > -Thanks
>>> > Nikhil
>>>
>>>
>>>
>>>
>>>

Hi,

use of utilization (balanced strategy) has one caveat: resources are not moved
just because the utilization of one node is lower, when nodes have the same
allocation score for the resource.
So, after the simultaneous outage of two nodes in a 5-node cluster, it may
appear that one node runs two resources and two recovered nodes run