Re: [Pacemaker] Corosync fails to start when NIC is absent

2015-01-20 Thread Jan Friesse
Kostiantyn,


 One more thing to clarify.
 You said rebind can be avoided - what does it mean?

By that I mean that as long as you don't shut down the interface, everything
will work as expected. Shutting an interface down is an administrator's
decision; the system doesn't do it automagically :)

Regards,
  Honza

 
 Thank you,
 Kostya
 
 On Wed, Jan 14, 2015 at 1:31 PM, Kostiantyn Ponomarenko 
 konstantin.ponomare...@gmail.com wrote:
 
 Thank you. Now I am aware of it.

 Thank you,
 Kostya

 On Wed, Jan 14, 2015 at 12:59 PM, Jan Friesse jfrie...@redhat.com wrote:

 Kostiantyn,

 Honza,

 Thank you for helping me.
 So, there is no defined behavior in case one of the interfaces is not in
 the system?

 You are right. There is no defined behavior.

 Regards,
   Honza




 Thank you,
 Kostya

 On Tue, Jan 13, 2015 at 12:01 PM, Jan Friesse jfrie...@redhat.com
 wrote:

 Kostiantyn,


 According to https://access.redhat.com/solutions/638843 , the interface
 that is defined in corosync.conf must be present in the
 system (see the ROOT CAUSE section at the bottom of the article).
 To confirm that, I ran a couple of tests.

 Here is part of the corosync.conf file (in free-text form) (the original
 config file is also attached):
 ===
 rrp_mode: passive
 ring0_addr is defined in corosync.conf
 ring1_addr is defined in corosync.conf
 ===
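
 For illustration, the relevant part of such a config has roughly this shape
 (an untested sketch with placeholder addresses and transport, not the attached file):

 totem {
     version: 2
     rrp_mode: passive
     transport: udpu
 }

 nodelist {
     node {
         ring0_addr: 10.0.0.1        # placeholder for the ring0 address
         ring1_addr: 192.168.100.1   # placeholder for the ring1 address
     }
     node {
         ring0_addr: 10.0.0.2
         ring1_addr: 192.168.100.2
     }
 }
 # (quorum/logging sections omitted)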

 ---

 Two-node cluster

 ---

 Test #1:
 --
 The IP for ring0 is not defined in the system:
 --
 Start Corosync simultaneously on both nodes.
 Corosync fails to start.
 From the logs:
 Jan 08 09:43:56 [2992] A6-402-2 corosync error [MAIN ] parse error in config: No interfaces defined
 Jan 08 09:43:56 [2992] A6-402-2 corosync error [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1343.
 Result: Corosync and Pacemaker are not running.

 Test #2:
 --
 The IP for ring1 is not defined in the system:
 --
 Start Corosync simultaneously on both nodes.
 Corosync starts.
 Start Pacemaker simultaneously on both nodes.
 Pacemaker fails to start.
 From the logs, the last entries from corosync:
 Jan 8 16:31:29 daemon.err27 corosync[3728]: [TOTEM ] Marking ringid 0 interface 169.254.1.3 FAULTY
 Jan 8 16:31:30 daemon.notice29 corosync[3728]: [TOTEM ] Automatically recovered ring 0
 Result: Corosync and Pacemaker are not running.


 Test #3:

 rrp_mode: active leads to the same result, except that the Corosync and
 Pacemaker init scripts report their status as running.
 But /var/log/cluster/corosync.log still shows a lot of errors like:
 Jan 08 16:30:47 [4067] A6-402-1 cib: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)

 Result: Corosync and Pacemaker show their statuses as running, but
 crm_mon cannot connect to the cluster database, and half of the
 Pacemaker services are not running (including the Cluster Information
 Base (CIB)).


 ---

 Single-node mode

 ---

 The IP for ring0 is not defined in the system:

 Corosync fails to start.

 The IP for ring1 is not defined in the system:

 Corosync and Pacemaker are started.

 It is possible that the configuration will be applied successfully (about 50% of the time);

 it is possible that the cluster is not running any resources;

 it is possible that the node cannot be put into standby mode (it shows a
 communication error);

 and it is possible that the cluster is running all resources, but the
 applied configuration is not guaranteed to be fully loaded (some rules
 can be missing).


 ---

 Conclusions:

 ---

 It is possible that in some rare cases (see the comments to the bug) the
 cluster will work, but in that case its working state is unstable and the
 cluster can stop working at any moment.


 So, is this correct? Do my assumptions make any sense? I didn't find any
 other explanation on the net ... .

 Corosync needs all interfaces during start-up and at runtime. This doesn't
 mean they must be connected (that would make corosync unusable on a
 physical NIC/switch or cable failure), but they must be up and have the
 correct IP.

 When this is not the case, corosync rebinds to localhost and weird
 things happen. Removing this rebinding is a long-standing TODO, but there
 are still more important bugs (especially because the rebind can be
 avoided).
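
  As a side note, the address each ring is currently bound to (and whether a
  ring is marked faulty) can be checked on a node with:

    corosync-cfgtool -s   # prints the status and bound address of each ring

  If a ring has rebound, the local 127.0.0.1 address should show up there.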

 Regards,
   Honza




 Thank you,
 Kostya

 On Fri, Jan 9, 2015 at 11:10 AM, Kostiantyn Ponomarenko 
 konstantin.ponomare...@gmail.com wrote:

 Hi guys,

 Corosync fails to start if the corresponding network interface is not
 configured in the system.
 Even with rrp_mode: passive, the problem is the same when at least one
 network interface is not configured in the system.

 Is this the expected behavior?
 I 

Re: [Pacemaker] Corosync fails to start when NIC is absent

2015-01-20 Thread Andrei Borzenkov
On Tue, Jan 20, 2015 at 11:50 AM, Jan Friesse jfrie...@redhat.com wrote:
 Kostiantyn,


 One more thing to clarify.
 You said rebind can be avoided - what does it mean?

 By that I mean that as long as you don't shut down the interface, everything
 will work as expected. Shutting an interface down is an administrator's
 decision; the system doesn't do it automagically :)


What is possible, though, is that e.g. during a reboot an interface (the
hardware) fails and is not detected. This would lead to a complete outage
of the node that could otherwise be avoided.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Corosync fails to start when NIC is absent

2015-01-20 Thread Kostiantyn Ponomarenko
Got it. Thank you =)
I was just thinking about the possibility of a NIC burning out.

Thank you,
Kostya

On Tue, Jan 20, 2015 at 10:50 AM, Jan Friesse jfrie...@redhat.com wrote:

 Kostiantyn,


  One more thing to clarify.
  You said rebind can be avoided - what does it mean?

 By that I mean that as long as you don't shut down the interface, everything
 will work as expected. Shutting an interface down is an administrator's
 decision; the system doesn't do it automagically :)

 Regards,
   Honza

 

[Pacemaker] breaking resource dependencies by replacing resource group with co-location constraints

2015-01-20 Thread Sven Moeller
Hi,

I have a cluster running on SLES 11 SP3 with a resource group defined, so every 
entry in that resource group is a hard dependency for the following resource 
in that group. If only one resource fails on both nodes, the resource 
res_MyApplication won't be started. I want to change that behavior: I want the 
cluster to start the resources on just one host. The local resources should be 
started on that same host, regardless of the status of the CIFS shares. If the 
CIFS shares are available, they have to be mounted on the same node running 
the application. I would try to accomplish this by deleting the resource group 
grp_application and creating colocation constraints for the local fs, service IP and 
application. Additionally, I would create colocation constraints for each CIFS 
share to mount it on the node running the application. Any hints/thoughts on 
that? Is that the right way to achieve what I want?

Here's the currently running config:

node mynodea \
attributes standby=off
node mynodeb \
attributes standby=off
primitive res_Service-IP ocf:heartbeat:IPaddr2 \
params ip=192.168.10.120 cidr_netmask=24 \
op monitor interval=10s timeout=20s depth=0 \
meta target-role=started
primitive res_MyApplication lsb:myapp \
operations $id=res_MyApplication-operations \
op start interval=0 timeout=180 \
op stop interval=0 timeout=180 \
op status interval=0 timeout=600 \
op monitor interval=15 timeout=15 start-delay=120 \
op meta-data interval=0 timeout=600 \
meta target-role=Started
primitive res_mount-CIFSshareData ocf:heartbeat:Filesystem \
params device=//CIFS/Share1 directory=/Data fstype=cifs 
options=uid=30666,gid=30666 \
op monitor interval=20 timeout=40 depth=0 \
meta target-role=started
primitive res_mount-CIFSshareData2 ocf:heartbeat:Filesystem \
params device=//CIFS/Share2 directory=/Data2 fstype=cifs 
options=uid=30666,gid=30666 \
op monitor interval=20 timeout=40 depth=0 \
meta target-role=started
primitive res_mount-CIFSshareData3 ocf:heartbeat:Filesystem \
params device=//CIFS/Share3 directory=/Data3 fstype=cifs 
options=uid=30666,gid=30666 \
op monitor interval=20 timeout=40 depth=0 \
meta target-role=started
primitive res_mount-application ocf:heartbeat:Filesystem \
params 
device=/dev/disk/by-id/scsi-36000d56e324561d9dcae19034e90-part1 
directory=/MyApp fstype=ext3 \
op monitor interval=20 timeout=40 depth=0 \
meta target-role=started
primitive stonith-sbd stonith:external/sbd \
params 
sbd_device=/dev/disk/by-id/scsi-36000d56ea31a36d8dcaebaf5a439 \
meta target-role=Started
group grp_application res_mount-application res_Service-IP 
res_mount-CIFSshareData res_mount-CIFSshareData2  res_mount-CIFSshareData3 
res_MyApplication
location prefer-application grp_application 50: nodea
property $id=cib-bootstrap-options \
stonith-enabled=true \
no-quorum-policy=ignore \
placement-strategy=balanced \
dc-version=1.1.9-2db99f1 \
cluster-infrastructure=classic openais (with plugin) \
expected-quorum-votes=2 \
last-lrm-refresh=1412758622 \
stonith-action=poweroff \
stonith-timeout=216s \
maintenance-mode=true
rsc_defaults $id=rsc-options \
resource-stickiness=100 \
migration-threshold=3
op_defaults $id=op-options \
timeout=600 \
record-pending=true

Thanks and kind regards

Sven

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] breaking resource dependencies by replacing resource group with co-location constraints

2015-01-20 Thread Andrew Beekhof
group grp_application res_mount-application res_Service-IP 
res_mount-CIFSshareData res_mount-CIFSshareData2  res_mount-CIFSshareData3 
res_MyApplication

is just a shortcut for 

colocation res_Service-IP with res_mount-application
colocation res_mount-CIFSshareData with res_Service-IP
...

and

order res_mount-application then res_Service-IP
order res_Service-IP then res_mount-CIFSshareData
...

just reorder these and remove any bits you don't need anymore
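
For this config that could look roughly like the following (untested, constraint
names invented; colocating the CIFS mounts with the application, and not the
other way around, is what keeps the application independent of the shares):

# hard chain: local fs -> service IP -> application
colocation col_ip_with_fs   inf: res_Service-IP res_mount-application
colocation col_app_with_ip  inf: res_MyApplication res_Service-IP
order ord_fs_before_ip  inf: res_mount-application res_Service-IP
order ord_ip_before_app inf: res_Service-IP res_MyApplication

# CIFS mounts follow the application, but the application does not depend on them
colocation col_cifs1_with_app inf: res_mount-CIFSshareData res_MyApplication
colocation col_cifs2_with_app inf: res_mount-CIFSshareData2 res_MyApplication
colocation col_cifs3_with_app inf: res_mount-CIFSshareData3 res_MyApplication
# optional, advisory only (score 0), so a missing share never blocks the app:
# order ord_cifs1_before_app 0: res_mount-CIFSshareData res_MyApplication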





___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] One more globally-unique clone question

2015-01-20 Thread Andrew Beekhof

 On 20 Jan 2015, at 4:13 pm, Vladislav Bogdanov bub...@hoster-ok.com wrote:
 
 20.01.2015 02:47, Andrew Beekhof wrote:
 
 On 17 Jan 2015, at 1:25 am, Vladislav Bogdanov
 bub...@hoster-ok.com wrote:
 
 Hi all,
 
 While trying to reproduce the problem with the early stop of globally-unique
 clone instances during a move to another node, I found one more
 interesting problem.
 
 Due to the different order of resources in the CIB and the extensive
 use of constraints between other resources (an odd number of resources
 cluster-wide), two CLUSTERIP instances are always allocated to the
 same node in the new testing cluster.
 
 Ah, so this is why broker-vips:1 was moving.
 
 Those are two different 2-node clusters with a different order of resources.
 In the first one, broker-vips come after an even number of resources, and one 
 instance wants to return to its mother node after it is brought back online, 
 thus broker-vips:1 is moving.

 In the second one, broker-vips come after an odd number of resources (actually 
 three more resources are allocated to one node due to constraints) and both 
 broker-vips go to the other node.
 
 
 
 What would be the best/preferred way to make them run on different
 nodes by default?
 
 By default they will. I'm assuming it's the constraints that are
 preventing this.
 
 I only see that they are allocated similarly to any other resources.

Are they allocated in stages though?
I.e. was there a point at which the mother node was available but constraints 
prevented broker-vips:1 from running there?

 
 
 Getting them to auto-rebalance is the harder problem
 
 I see. Should it be possible to solve it without using priority or utilization?

'it' meaning auto-rebalancing, or your original issue?

 
 
 
 I see the following options:
 * Raise the priority of the globally-unique clone so its instances are
   always allocated first of all.
 * Use utilization attributes (with high values for nodes and low values
   for cluster resources).
 * Anything else?

 If I configure the virtual IPs one by one (without a clone), I can add a
 colocation constraint with a negative score between them. I do not
 see a way to scale that setup well though (5-10 IPs). So, what
 would be the best option to achieve the same with a globally-unique
 cloned resource? Maybe there should be some internal
 preference/colocation not to place them together (like the default
 stickiness=1 for clones)? Or even allow a special negative colocation
 constraint with the same resource in both 'what' and 'with'
 (colocation col1 -1: clone clone)?
 
 Best, Vladislav
 
 


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] One more globally-unique clone question

2015-01-20 Thread Vladislav Bogdanov

21.01.2015 03:51, Andrew Beekhof wrote:




Are they allocated in stages though?
I.e. was there a point at which the mother node was available but constraints 
prevented broker-vips:1 from running there?


There are three pe-inputs for the node start.
The first one starts the fence device for the other node, plus dlm+clvm+gfs and 
drbd on the node that came back online.
The second one tries to start/promote/move everything else until it is 
interrupted (by the drbd RA?).

The third one finishes that attempt.

And yes, CTDB depends on the GFS2 filesystem, so broker-vips:1 can't be 
allocated immediately due to constraints. It is allocated in the second 
pe-input.


Maybe it is worth sending a crm_report to you, so as not to overload the 
list with long listings, and so that you have complete information?








Getting them to auto-rebalance is the harder problem


I see. Should it be possible to solve it without using priority or utilization?


'it' meaning auto-rebalancing, or your original issue?


I meant auto-rebalancing.
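
For reference, the utilization variant could look roughly like this (the
primitive name, utilization attribute and addresses below are invented; an
untested sketch, not the actual config):

node node-a utilization vips=100
node node-b utilization vips=100
primitive broker-vip ocf:heartbeat:IPaddr2 \
    params ip=192.0.2.10 clusterip_hash=sourceip-sourceport \
    utilization vips=1
clone broker-vips broker-vip \
    meta globally-unique=true clone-max=2 clone-node-max=2
property placement-strategy=balanced

The idea being that each clone instance consumes one unit, so with the
balanced placement strategy the two instances should prefer different nodes,
while both can still land on one node when the other node is down.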














___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org