Hello again

After running what feels like a million trial-and-error test cases, I have
finally come to the following conclusion:
I really need to understand the scoring and allocation algorithm much
better!
Unfortunately I could not find any good documentation, just some articles
containing hints, and those proved insufficient.
Any pointers are very welcome.

First, my original request contained a wrong colocation constraint (the master
role was missing), as Ken correctly pointed out.
Using the correct constraint would have fixed that problem.

In the meantime, however, I had moved on to a more complex cluster
configuration, which exhibited another problem, although at first it looked
very similar.

The new cluster configuration was for a system like this:

         node1:               node2:
    ---------------       ---------------
   |  FS_at_NFS    |     | FS_avoid_NFS  |
    ---------------       ---------------
   |  NFS_server   |
    ---------------
   |  NFS_ip       |
    ---------------
   |  DRBD_fs      |
    ---------------       ---------------
   |  DRBD-master  |     |  DRBD-slave   |
    ---------------       ---------------

Again I tried to move the NFS server to the other node, but DRBD-master was not
promoted on the other node.

As I now know, two conditions are necessary to see this problem:
  * the -INFINITY colocation constraint for FS_avoid_NFS
  * a rather high value for resource stickiness (originally set to INFINITY)

If I remove the FS_avoid_NFS resource, the problem can no longer be reproduced.
Likewise, if I remove the resource stickiness completely (or set it to a small
value), the problem is gone.

The maximum value I can use for resource stickiness without triggering the
problem is the score the DRBD slave returns minus 2.
(At least that is what my experiments seem to show.)
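The rule of thumb above can be written down as a tiny Python check. It encodes only the empirical observation from these experiments (slave score minus 2); the function names are hypothetical, and nothing here is taken from Pacemaker's code or documentation. The limit is a guess that fits the runs in this mail, not a documented rule.

```python
# Hypothetical helpers encoding an EMPIRICAL rule of thumb (not a Pacemaker rule):
# a move still promotes DRBD on the other node only while
# resource-stickiness <= (slave score returned by the DRBD RA) - 2.

def max_safe_stickiness(slave_score: int) -> int:
    """Largest stickiness that, per the experiments, still allowed the promote."""
    return slave_score - 2


def move_should_promote(stickiness: int, slave_score: int) -> bool:
    """Prediction of the rule of thumb for a given stickiness value."""
    return stickiness <= max_safe_stickiness(slave_score)


SLAVE_SCORE = 1000  # what the DRBD RA in this setup returns for slave

# 995 and 2002 are the two stickiness values shown in this mail.
for stickiness in (995, 2002):
    print(stickiness, move_should_promote(stickiness, SLAVE_SCORE))
```

With a slave score of 1000 this predicts a limit of 998, consistent with 995 working and 2002 failing.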

So apparently there is/was a configuration problem with the combination of the
INFINITY score for stickiness and the -INFINITY score for the colocation
constraint.

Moreover, I noticed that with my latter configuration there were some pending
transitions. They were automatically resolved after the
cluster-recheck-interval, which led to my first workaround: reducing this
interval significantly so that the problem resolved more quickly.
These pending transitions only appear when my order constraints are present.
If I remove them, there are no pending transitions (although the demote still
does not happen when the stickiness is set too high).
So maybe this is a bug ... but I will not investigate it any further.

------------------------------------------------------------------------------------------------

And for all those curious out there, here are my scores and my guesses at how
they are calculated.

All primitive resources are colocated directly with DRBD-master (not stacked
one on top of the other).
My DRBD RA returns 10000 for master and 1000 for slave.
I have set resource stickiness to 995.

Here is the idle state before the move:
[root@deneb682 ~]# crm_simulate -LVs

Current cluster status:
Online: [ deneb682 deneb683 ]

 Master/Slave Set: DRBD-master [DRBD]
     Masters: [ deneb682 ]
     Slaves: [ deneb683 ]
 DRBD_fs        (ocf::heartbeat:Dummy): Started deneb682
 NFS_ip (ocf::heartbeat:Dummy): Started deneb682
 NFS_server     (ocf::heartbeat:Dummy): Started deneb682
 FS_at_server   (ocf::heartbeat:Dummy): Started deneb682
 FS_avoid_server        (ocf::heartbeat:Dummy): Started deneb683

Allocation scores:
clone_color: DRBD-master allocation score on deneb682: 3980     <= 4 * stickiness (once for each resource on top in the chain)
clone_color: DRBD-master allocation score on deneb683: 0
clone_color: DRBD:0 allocation score on deneb682: 10001         <= master + 1 for 'started on this node'
clone_color: DRBD:0 allocation score on deneb683: 0
clone_color: DRBD:1 allocation score on deneb682: 0
clone_color: DRBD:1 allocation score on deneb683: 1001          <= slave + 1 for 'started on this node'
native_color: DRBD:0 allocation score on deneb682: 10001        <= same as clone_color above
native_color: DRBD:0 allocation score on deneb683: -995         <= same as clone_color above - stickiness
native_color: DRBD:1 allocation score on deneb682: -INFINITY    <= due to DRBD:0 already allocated for this node
native_color: DRBD:1 allocation score on deneb683: 6            <= same as clone_color above - stickiness
DRBD:0 promotion score on deneb682: 17960                       <= master + 2 * 4 * stickiness
DRBD:1 promotion score on deneb683: 5                           <= slave - stickiness
native_color: DRBD_fs allocation score on deneb682: 10996       <= inherit from DRBD:0 (master) + stickiness
native_color: DRBD_fs allocation score on deneb683: -INFINITY   <= due to no DRBD master allocated for this node
native_color: NFS_ip allocation score on deneb682: 10996
native_color: NFS_ip allocation score on deneb683: -INFINITY
native_color: NFS_server allocation score on deneb682: 10996
native_color: NFS_server allocation score on deneb683: -INFINITY
native_color: FS_at_server allocation score on deneb682: 10996
native_color: FS_at_server allocation score on deneb683: -INFINITY
native_color: FS_avoid_server allocation score on deneb682: -INFINITY  <= due to DRBD master allocated for this node
native_color: FS_avoid_server allocation score on deneb683: 995        <= stickiness
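The annotated guesses for the idle state can be checked mechanically. The Python sketch below plugs master 10000, slave 1000, stickiness 995, a +1 for "started/master on this node" and 4 directly colocated resources into those guessed formulas and compares the results with the crm_simulate values. The formulas are reverse-engineered from this single run and are an assumption, not Pacemaker's actual scoring algorithm; the dictionary keys are just labels.

```python
# Check the GUESSED idle-state formulas against the observed crm_simulate scores.
# Assumption: these formulas are reverse-engineered from one run (stickiness 995);
# they are not Pacemaker's documented algorithm.
MASTER = 10000     # score the DRBD RA sets for master
SLAVE = 1000       # score the DRBD RA sets for slave
STICKINESS = 995   # resource-stickiness
CURRENT = 1        # +1 seen for "started/master on this node"
COLOCATED = 4      # DRBD_fs, NFS_ip, NFS_server, FS_at_server

guessed = {
    "clone_color DRBD-master @deneb682": COLOCATED * STICKINESS,
    "clone_color DRBD:0 @deneb682": MASTER + CURRENT,
    "clone_color DRBD:1 @deneb683": SLAVE + CURRENT,
    "native_color DRBD:0 @deneb683": 0 - STICKINESS,
    "native_color DRBD:1 @deneb683": SLAVE + CURRENT - STICKINESS,
    "promotion DRBD:0 @deneb682": MASTER + 2 * COLOCATED * STICKINESS,
    "promotion DRBD:1 @deneb683": SLAVE - STICKINESS,
    "native_color DRBD_fs @deneb682": MASTER + CURRENT + STICKINESS,
    "native_color FS_avoid_server @deneb683": STICKINESS,
}

# Values copied from the idle-state crm_simulate output.
observed = {
    "clone_color DRBD-master @deneb682": 3980,
    "clone_color DRBD:0 @deneb682": 10001,
    "clone_color DRBD:1 @deneb683": 1001,
    "native_color DRBD:0 @deneb683": -995,
    "native_color DRBD:1 @deneb683": 6,
    "promotion DRBD:0 @deneb682": 17960,
    "promotion DRBD:1 @deneb683": 5,
    "native_color DRBD_fs @deneb682": 10996,
    "native_color FS_avoid_server @deneb683": 995,
}

for key, value in observed.items():
    status = "ok" if guessed[key] == value else "MISMATCH"
    print(f"{status:8} {key}: guessed {guessed[key]}, observed {value}")
```

All nine lines print "ok" for this run, which only shows the guesses are consistent with it, not that they hold in general.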


And here is the successful transition following 'pcs resource move
NFS_server':
Allocation scores:
clone_color: DRBD-master allocation score on deneb682: 1        <= stickiness no longer considered; just 1 for 'master on this node'
clone_color: DRBD-master allocation score on deneb683: 0
clone_color: DRBD:0 allocation score on deneb682: 10001         <= no change
clone_color: DRBD:0 allocation score on deneb683: 0
clone_color: DRBD:1 allocation score on deneb682: 0
clone_color: DRBD:1 allocation score on deneb683: 1001          <= no change
native_color: DRBD:0 allocation score on deneb682: 10001        <= no change
native_color: DRBD:0 allocation score on deneb683: -995         <= no change
native_color: DRBD:1 allocation score on deneb682: -INFINITY    <= no change
native_color: DRBD:1 allocation score on deneb683: 6            <= no change
DRBD:1 promotion score on deneb683: 5                           <= no change
DRBD:0 promotion score on deneb682: 1                           <= stickiness no longer considered; just 1 for 'master on this node'
native_color: DRBD_fs allocation score on deneb682: -INFINITY
native_color: DRBD_fs allocation score on deneb683: 6
native_color: NFS_ip allocation score on deneb682: -INFINITY
native_color: NFS_ip allocation score on deneb683: 6
native_color: NFS_server allocation score on deneb682: -INFINITY   <= due to move constraint
native_color: NFS_server allocation score on deneb683: 6           <= inherit from DRBD:1 (slave) - stickiness
native_color: FS_at_server allocation score on deneb682: -INFINITY
native_color: FS_at_server allocation score on deneb683: 6
native_color: FS_avoid_server allocation score on deneb682: 0      <= stickiness no longer considered
native_color: FS_avoid_server allocation score on deneb683: -INFINITY
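The successful move can be summarized the same way. The Python sketch below uses the guessed formulas from the annotations (deneb682 keeps only the +1 for "master on this node", deneb683 gets slave minus stickiness). It assumes, as a simplification, that with a single master allowed the instance with the higher promotion score is the one promoted; the real decision also involves the colocation and move constraints shown above.

```python
# Summarize the successful move (stickiness 995) with the GUESSED formulas.
# Assumption: with one master allowed, the highest promotion score wins.
SLAVE = 1000        # DRBD RA slave score
STICKINESS = 995    # resource-stickiness
CURRENT = 1         # +1 for "master/started on this node"

promotion = {
    "deneb682": CURRENT,              # stickiness no longer added, per the annotation
    "deneb683": SLAVE - STICKINESS,   # slave - stickiness
}

new_master = max(promotion, key=promotion.get)

# NFS_server on deneb683: inherited from DRBD:1 (slave + 1) minus stickiness.
nfs_server_on_683 = SLAVE + CURRENT - STICKINESS

print(promotion, new_master, nfs_server_on_683)
```

This prints `{'deneb682': 1, 'deneb683': 5} deneb683 6`, matching the promotion scores and the NFS_server score in the output above.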


In contrast, here is the same transition with resource stickiness set to 2002,
which does not result in a demote:
Allocation scores:
clone_color: DRBD-master allocation score on deneb682: 1        <= similar to above
clone_color: DRBD-master allocation score on deneb683: 0
clone_color: DRBD:0 allocation score on deneb682: 10001         <= similar to above
clone_color: DRBD:0 allocation score on deneb683: 0
clone_color: DRBD:1 allocation score on deneb682: 0
clone_color: DRBD:1 allocation score on deneb683: 1001          <= similar to above
native_color: DRBD:0 allocation score on deneb682: 10001        <= similar to above
native_color: DRBD:0 allocation score on deneb683: -2002        <= similar to above
native_color: DRBD:1 allocation score on deneb682: -INFINITY    <= similar to above
native_color: DRBD:1 allocation score on deneb683: 1001         <= similar to above but with changed sign!?
DRBD:0 promotion score on deneb682: 1                           <= similar to above
DRBD:1 promotion score on deneb683: 1                           <= I have no clue where this could come from!?
native_color: DRBD_fs allocation score on deneb682: 12003       <= inherit from DRBD:0 (master) + stickiness
native_color: DRBD_fs allocation score on deneb683: -INFINITY
native_color: NFS_ip allocation score on deneb682: 12003
native_color: NFS_ip allocation score on deneb683: -INFINITY
native_color: NFS_server allocation score on deneb682: -INFINITY   <= due to move constraint
native_color: NFS_server allocation score on deneb683: -INFINITY   <= due to no DRBD master allocated for this node
native_color: FS_at_server allocation score on deneb682: 12003
native_color: FS_at_server allocation score on deneb683: -INFINITY
native_color: FS_avoid_server allocation score on deneb682: -INFINITY
native_color: FS_avoid_server allocation score on deneb683: 2002   <= stickiness
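Applying the same guessed formulas with stickiness 2002 isolates exactly which scores they fail to explain. The Python sketch below is only a comparison of guesses against the observed output; it makes no claim about how Pacemaker actually derives the two unexplained values.

```python
# Apply the GUESSED formulas with stickiness 2002 and list what they fail to explain.
# Assumption: formulas reverse-engineered from the 995 run; not Pacemaker's algorithm.
MASTER, SLAVE, CURRENT, STICKINESS = 10000, 1000, 1, 2002

guessed = {
    "native_color DRBD:1 @deneb683": SLAVE + CURRENT - STICKINESS,
    "promotion DRBD:1 @deneb683": SLAVE - STICKINESS,
    "native_color DRBD_fs @deneb682": MASTER + CURRENT + STICKINESS,
    "native_color FS_avoid_server @deneb683": STICKINESS,
}

# Values copied from the stickiness-2002 crm_simulate output.
observed = {
    "native_color DRBD:1 @deneb683": 1001,
    "promotion DRBD:1 @deneb683": 1,
    "native_color DRBD_fs @deneb682": 12003,
    "native_color FS_avoid_server @deneb683": 2002,
}

mismatches = {
    key: (guessed[key], observed[key])
    for key in observed
    if guessed[key] != observed[key]
}

for key, (g, o) in mismatches.items():
    print(f"{key}: guessed {g}, observed {o}")
```

Only the two scores flagged with "!?" in the annotations come out as mismatches (guessed -1001 vs. observed 1001, and guessed -1002 vs. observed 1); the other two still follow the 995-run formulas.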

From these I can guess how (most of) the scores are calculated in this
situation.
Unfortunately, that helps only a little in understanding scoring and
allocation in advance.
(It is always much easier to devise an explanation afterwards, but sometimes you
need to know in advance.)

Kind regards
Andi


_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
