Re: [ClusterLabs] ClusterIP won't return to recovered node

2017-06-12 Thread Ken Gaillot
On 06/12/2017 09:23 AM, Klaus Wenninger wrote:
> On 06/12/2017 04:02 PM, Ken Gaillot wrote:
>> On 06/10/2017 10:53 AM, Dan Ragle wrote:
>>> So I guess my bottom line question is: How does one tell Pacemaker that
>>> the individual legs of globally unique clones should *always* be spread
>>> across the available nodes whenever possible, regardless of the number
>>> of processes on any one of the nodes? For kicks I did try:
>>>
>>> pcs constraint location ClusterIP:0 prefers node1-pcs=INFINITY
>>>
>>> but it responded with an error about an invalid character (:).
>> There isn't a way currently. It will try to do that when initially
>> placing them, but once they've moved together, there's no simple way to
>> tell them to move. I suppose a workaround might be to create a dummy
>> resource that you constrain to that node so it looks like the other node
>> is less busy.
> 
> Another ugly dummy-resource idea - maybe less fragile,
> though not tried out:
> one could have two dummy resources that prefer to live on
> different nodes (easy to express for primitives) and are
> each colocated with ClusterIP.
> Wouldn't that pull the instances apart again once possible?

Sounds like a good idea.



Re: [ClusterLabs] ClusterIP won't return to recovered node

2017-06-12 Thread Klaus Wenninger
On 06/12/2017 04:02 PM, Ken Gaillot wrote:
> On 06/10/2017 10:53 AM, Dan Ragle wrote:
>> So I guess my bottom line question is: How does one tell Pacemaker that
>> the individual legs of globally unique clones should *always* be spread
>> across the available nodes whenever possible, regardless of the number
>> of processes on any one of the nodes? For kicks I did try:
>>
>> pcs constraint location ClusterIP:0 prefers node1-pcs=INFINITY
>>
>> but it responded with an error about an invalid character (:).
> There isn't a way currently. It will try to do that when initially
> placing them, but once they've moved together, there's no simple way to
> tell them to move. I suppose a workaround might be to create a dummy
> resource that you constrain to that node so it looks like the other node
> is less busy.

Another ugly dummy-resource idea - maybe less fragile,
though not tried out:
one could have two dummy resources that prefer to live on
different nodes (easy to express for primitives) and are
each colocated with ClusterIP.
Wouldn't that pull the instances apart again once possible?
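
For the record, an untested pcs sketch of that idea, using the
PrivateIP-clone from Dan's config and made-up anchor names and scores:

# pcs resource create ip-anchor-1 ocf:pacemaker:Dummy
# pcs resource create ip-anchor-2 ocf:pacemaker:Dummy
# pcs constraint location ip-anchor-1 prefers node1-pcs=100
# pcs constraint location ip-anchor-2 prefers node2-pcs=100
# pcs constraint colocation add ip-anchor-1 with PrivateIP-clone INFINITY
# pcs constraint colocation add ip-anchor-2 with PrivateIP-clone INFINITY

The hope being that the anchors' node preferences feed back into the
clone's placement scores and pull the two instances onto different
nodes again once both nodes are available.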



-- 
Klaus Wenninger

Senior Software Engineer, EMEA ENG Openstack Infrastructure

Red Hat

kwenn...@redhat.com   




Re: [ClusterLabs] ClusterIP won't return to recovered node

2017-06-12 Thread Klaus Wenninger
On 06/12/2017 03:24 PM, Dan Ragle wrote:
>
>
> On 6/12/2017 2:03 AM, Klaus Wenninger wrote:
>> On 06/10/2017 05:53 PM, Dan Ragle wrote:
>>>
>>>
>>> On 5/25/2017 5:33 PM, Ken Gaillot wrote:
 On 05/24/2017 12:27 PM, Dan Ragle wrote:
> I suspect this has been asked before and apologize if so, a google
> search didn't seem to find anything that was helpful to me ...
>
> I'm setting up an active/active two-node cluster and am having an
> issue
> where one of my two defined clusterIPs will not return to the other
> node
> after it (the other node) has been recovered.
>
> I'm running on CentOS 7.3. My resource setups look like this:
>
> # cibadmin -Q|grep dc-version
>   <nvpair name="dc-version" value="1.1.15-11.el7_3.4-e174ec8"/>
>
> # pcs resource show PublicIP-clone
>   Clone: PublicIP-clone
>Meta Attrs: clone-max=2 clone-node-max=2 globally-unique=true
> interleave=true
>Resource: PublicIP (class=ocf provider=heartbeat type=IPaddr2)
> Attributes: ip=75.144.71.38 cidr_netmask=24 nic=bond0
> Meta Attrs: resource-stickiness=0
> Operations: start interval=0s timeout=20s
> (PublicIP-start-interval-0s)
> stop interval=0s timeout=20s
> (PublicIP-stop-interval-0s)
> monitor interval=30s (PublicIP-monitor-interval-30s)
>
> # pcs resource show PrivateIP-clone
>   Clone: PrivateIP-clone
>Meta Attrs: clone-max=2 clone-node-max=2 globally-unique=true
> interleave=true
>Resource: PrivateIP (class=ocf provider=heartbeat type=IPaddr2)
> Attributes: ip=192.168.1.3 nic=bond1 cidr_netmask=24
> Meta Attrs: resource-stickiness=0
> Operations: start interval=0s timeout=20s
> (PrivateIP-start-interval-0s)
> stop interval=0s timeout=20s
> (PrivateIP-stop-interval-0s)
> monitor interval=10s timeout=20s
> (PrivateIP-monitor-interval-10s)
>
> # pcs constraint --full | grep -i publicip
>start WEB-clone then start PublicIP-clone (kind:Mandatory)
> (id:order-WEB-clone-PublicIP-clone-mandatory)
> # pcs constraint --full | grep -i privateip
>start WEB-clone then start PrivateIP-clone (kind:Mandatory)
> (id:order-WEB-clone-PrivateIP-clone-mandatory)

 FYI These constraints cover ordering only. If you also want to be sure
 that the IPs only start on a node where the web service is functional,
 then you also need colocation constraints.

>
> When I first create the resources, they split across the two nodes as
> expected/desired:
>
>   Clone Set: PublicIP-clone [PublicIP] (unique)
>   PublicIP:0(ocf::heartbeat:IPaddr2):   Started
> node1-pcs
>   PublicIP:1(ocf::heartbeat:IPaddr2):   Started
> node2-pcs
>   Clone Set: PrivateIP-clone [PrivateIP] (unique)
>   PrivateIP:0(ocf::heartbeat:IPaddr2):   Started
> node1-pcs
>   PrivateIP:1(ocf::heartbeat:IPaddr2):   Started
> node2-pcs
>   Clone Set: WEB-clone [WEB]
>   Started: [ node1-pcs node2-pcs ]
>
> I then put the second node in standby:
>
> # pcs node standby node2-pcs
>
> And the IPs both jump to node1 as expected:
>
>   Clone Set: PublicIP-clone [PublicIP] (unique)
>   PublicIP:0(ocf::heartbeat:IPaddr2):   Started
> node1-pcs
>   PublicIP:1(ocf::heartbeat:IPaddr2):   Started
> node1-pcs
>   Clone Set: WEB-clone [WEB]
>   Started: [ node1-pcs ]
>   Stopped: [ node2-pcs ]
>   Clone Set: PrivateIP-clone [PrivateIP] (unique)
>   PrivateIP:0(ocf::heartbeat:IPaddr2):   Started
> node1-pcs
>   PrivateIP:1(ocf::heartbeat:IPaddr2):   Started
> node1-pcs
>
> Then unstandby the second node:
>
> # pcs node unstandby node2-pcs
>
> The publicIP goes back, but the private does not:
>
>   Clone Set: PublicIP-clone [PublicIP] (unique)
>   PublicIP:0(ocf::heartbeat:IPaddr2):   Started
> node1-pcs
>   PublicIP:1(ocf::heartbeat:IPaddr2):   Started
> node2-pcs
>   Clone Set: WEB-clone [WEB]
>   Started: [ node1-pcs node2-pcs ]
>   Clone Set: PrivateIP-clone [PrivateIP] (unique)
>   PrivateIP:0(ocf::heartbeat:IPaddr2):   Started
> node1-pcs
>   PrivateIP:1(ocf::heartbeat:IPaddr2):   Started
> node1-pcs
>
> Anybody see what I'm doing wrong? I'm not seeing anything in the
> logs to
> indicate that it tries node2 and then fails; but I'm fairly new to
> the
> software so it's possible I'm not looking in the right place.

 The pcs status would show any failed actions, and anything
 important in
 the logs would start with "error:" or "warning:".

Re: [ClusterLabs] ClusterIP won't return to recovered node

2017-06-12 Thread Ken Gaillot
On 06/10/2017 10:53 AM, Dan Ragle wrote:
> So I guess my bottom line question is: How does one tell Pacemaker that
> the individual legs of globally unique clones should *always* be spread
> across the available nodes whenever possible, regardless of the number
> of processes on any one of the nodes? For kicks I did try:
> 
> pcs constraint location ClusterIP:0 prefers node1-pcs=INFINITY
> 
> but it responded with an error about an invalid character (:).
There isn't a way currently. It will try to do that when initially
placing them, but once they've moved together, there's no simple way to
tell them to move. I suppose a workaround might be to create a dummy
resource that you constrain to that node so it looks like the other node
is less busy.
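
As a rough, untested sketch of that workaround (the resource name and
score are placeholders; node1-pcs is the node currently holding both
PrivateIP instances):

# pcs resource create node1-filler ocf:pacemaker:Dummy
# pcs constraint location node1-filler prefers node1-pcs=INFINITY

The extra idle resource pinned to node1-pcs makes node2-pcs look less
busy, so the balancer has a reason to place one of the clone instances
there on the next transition.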



Re: [ClusterLabs] ClusterIP won't return to recovered node

2017-06-12 Thread Dan Ragle



On 6/12/2017 2:03 AM, Klaus Wenninger wrote:

On 06/10/2017 05:53 PM, Dan Ragle wrote:



On 5/25/2017 5:33 PM, Ken Gaillot wrote:

On 05/24/2017 12:27 PM, Dan Ragle wrote:

I suspect this has been asked before and apologize if so, a google
search didn't seem to find anything that was helpful to me ...

I'm setting up an active/active two-node cluster and am having an issue
where one of my two defined clusterIPs will not return to the other
node
after it (the other node) has been recovered.

I'm running on CentOS 7.3. My resource setups look like this:

# cibadmin -Q|grep dc-version
  <nvpair name="dc-version" value="1.1.15-11.el7_3.4-e174ec8"/>

# pcs resource show PublicIP-clone
  Clone: PublicIP-clone
   Meta Attrs: clone-max=2 clone-node-max=2 globally-unique=true
interleave=true
   Resource: PublicIP (class=ocf provider=heartbeat type=IPaddr2)
Attributes: ip=75.144.71.38 cidr_netmask=24 nic=bond0
Meta Attrs: resource-stickiness=0
Operations: start interval=0s timeout=20s
(PublicIP-start-interval-0s)
stop interval=0s timeout=20s
(PublicIP-stop-interval-0s)
monitor interval=30s (PublicIP-monitor-interval-30s)

# pcs resource show PrivateIP-clone
  Clone: PrivateIP-clone
   Meta Attrs: clone-max=2 clone-node-max=2 globally-unique=true
interleave=true
   Resource: PrivateIP (class=ocf provider=heartbeat type=IPaddr2)
Attributes: ip=192.168.1.3 nic=bond1 cidr_netmask=24
Meta Attrs: resource-stickiness=0
Operations: start interval=0s timeout=20s
(PrivateIP-start-interval-0s)
stop interval=0s timeout=20s
(PrivateIP-stop-interval-0s)
monitor interval=10s timeout=20s
(PrivateIP-monitor-interval-10s)

# pcs constraint --full | grep -i publicip
   start WEB-clone then start PublicIP-clone (kind:Mandatory)
(id:order-WEB-clone-PublicIP-clone-mandatory)
# pcs constraint --full | grep -i privateip
   start WEB-clone then start PrivateIP-clone (kind:Mandatory)
(id:order-WEB-clone-PrivateIP-clone-mandatory)


FYI These constraints cover ordering only. If you also want to be sure
that the IPs only start on a node where the web service is functional,
then you also need colocation constraints.
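
With the resource names from this configuration, those colocation
constraints could look roughly like this (the INFINITY scores are an
assumption; lower scores would make the preference advisory rather
than mandatory):

# pcs constraint colocation add PublicIP-clone with WEB-clone INFINITY
# pcs constraint colocation add PrivateIP-clone with WEB-clone INFINITY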



When I first create the resources, they split across the two nodes as
expected/desired:

  Clone Set: PublicIP-clone [PublicIP] (unique)
  PublicIP:0(ocf::heartbeat:IPaddr2):   Started
node1-pcs
  PublicIP:1(ocf::heartbeat:IPaddr2):   Started
node2-pcs
  Clone Set: PrivateIP-clone [PrivateIP] (unique)
  PrivateIP:0(ocf::heartbeat:IPaddr2):   Started
node1-pcs
  PrivateIP:1(ocf::heartbeat:IPaddr2):   Started
node2-pcs
  Clone Set: WEB-clone [WEB]
  Started: [ node1-pcs node2-pcs ]

I then put the second node in standby:

# pcs node standby node2-pcs

And the IPs both jump to node1 as expected:

  Clone Set: PublicIP-clone [PublicIP] (unique)
  PublicIP:0(ocf::heartbeat:IPaddr2):   Started
node1-pcs
  PublicIP:1(ocf::heartbeat:IPaddr2):   Started
node1-pcs
  Clone Set: WEB-clone [WEB]
  Started: [ node1-pcs ]
  Stopped: [ node2-pcs ]
  Clone Set: PrivateIP-clone [PrivateIP] (unique)
  PrivateIP:0(ocf::heartbeat:IPaddr2):   Started
node1-pcs
  PrivateIP:1(ocf::heartbeat:IPaddr2):   Started
node1-pcs

Then unstandby the second node:

# pcs node unstandby node2-pcs

The publicIP goes back, but the private does not:

  Clone Set: PublicIP-clone [PublicIP] (unique)
  PublicIP:0(ocf::heartbeat:IPaddr2):   Started
node1-pcs
  PublicIP:1(ocf::heartbeat:IPaddr2):   Started
node2-pcs
  Clone Set: WEB-clone [WEB]
  Started: [ node1-pcs node2-pcs ]
  Clone Set: PrivateIP-clone [PrivateIP] (unique)
  PrivateIP:0(ocf::heartbeat:IPaddr2):   Started
node1-pcs
  PrivateIP:1(ocf::heartbeat:IPaddr2):   Started
node1-pcs

Anybody see what I'm doing wrong? I'm not seeing anything in the
logs to
indicate that it tries node2 and then fails; but I'm fairly new to the
software so it's possible I'm not looking in the right place.


The pcs status would show any failed actions, and anything important in
the logs would start with "error:" or "warning:".

At any given time, one of the nodes is the DC, meaning it schedules
actions for the whole cluster. That node will have more "pengine:"
messages in its logs at the time. You can check those logs to see what
decisions were made, as well as a "saving inputs" message to get the
cluster state that was used to make those decisions. There is a
crm_simulate tool that you can run on that file to get more information.
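
For example, something along these lines (the pe-input file name here is
made up; use whichever file the "saving inputs" message names):

# crm_simulate --simulate --show-scores --xml-file /var/lib/pacemaker/pengine/pe-input-123.bz2

That replays the transition from the saved cluster state and prints the
allocation scores the policy engine used for each resource/node
combination.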

By default, pacemaker will try to balance the number of resources
running on each node, so I'm not sure why in this case node1 has four
resources and node2 has two. crm_simulate might help explain it.

However, there's nothing here telling pacemaker that the instances of
PrivateIP should run on different nodes when possible. With your
existing constraints, pacema