Re: [ClusterLabs] ClusterIP won't return to recovered node
On 06/12/2017 09:23 AM, Klaus Wenninger wrote:
> On 06/12/2017 04:02 PM, Ken Gaillot wrote:
>> On 06/10/2017 10:53 AM, Dan Ragle wrote:
>>> So I guess my bottom line question is: How does one tell Pacemaker that
>>> the individual legs of globally unique clones should *always* be spread
>>> across the available nodes whenever possible, regardless of the number
>>> of processes on any one of the nodes? For kicks I did try:
>>>
>>> pcs constraint location ClusterIP:0 prefers node1-pcs=INFINITY
>>>
>>> but it responded with an error about an invalid character (:).
>>
>> There isn't a way currently. It will try to do that when initially
>> placing them, but once they've moved together, there's no simple way to
>> tell them to move. I suppose a workaround might be to create a dummy
>> resource that you constrain to that node so it looks like the other node
>> is less busy.
>
> Another ugly dummy resource idea - maybe less fragile - and not tried out:
> One could have 2 dummy resources that would rather like to live on
> different nodes - no issue with primitives - and do depend collocated on
> ClusterIP. Wouldn't that pull them apart once possible?

Sounds like a good idea.

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
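[Editor's note] Klaus's two-dummy idea might be sketched with pcs roughly as
below. This is untested; the resource names ip-anchor-1/ip-anchor-2, the
-1000 score, and the clone name ClusterIP-clone are invented for illustration.

```shell
# Two lightweight do-nothing resources (names are made up)
pcs resource create ip-anchor-1 ocf:pacemaker:Dummy
pcs resource create ip-anchor-2 ocf:pacemaker:Dummy

# A strong negative colocation score keeps the two dummies apart
pcs constraint colocation add ip-anchor-2 with ip-anchor-1 -1000

# Each dummy must sit on a node hosting a ClusterIP instance; once the
# second node recovers, the anti-colocation between the dummies should
# pull them - and with them the IP instances - onto different nodes
pcs constraint colocation add ip-anchor-1 with ClusterIP-clone INFINITY
pcs constraint colocation add ip-anchor-2 with ClusterIP-clone INFINITY
```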
Re: [ClusterLabs] ClusterIP won't return to recovered node
On 06/12/2017 04:02 PM, Ken Gaillot wrote:
> On 06/10/2017 10:53 AM, Dan Ragle wrote:
>> So I guess my bottom line question is: How does one tell Pacemaker that
>> the individual legs of globally unique clones should *always* be spread
>> across the available nodes whenever possible, regardless of the number
>> of processes on any one of the nodes? For kicks I did try:
>>
>> pcs constraint location ClusterIP:0 prefers node1-pcs=INFINITY
>>
>> but it responded with an error about an invalid character (:).
>
> There isn't a way currently. It will try to do that when initially
> placing them, but once they've moved together, there's no simple way to
> tell them to move. I suppose a workaround might be to create a dummy
> resource that you constrain to that node so it looks like the other node
> is less busy.

Another ugly dummy resource idea - maybe less fragile - and not tried out:
One could have 2 dummy resources that would rather like to live on
different nodes - no issue with primitives - and do depend collocated on
ClusterIP. Wouldn't that pull them apart once possible?

--
Klaus Wenninger
Senior Software Engineer, EMEA ENG Openstack Infrastructure
Red Hat
kwenn...@redhat.com
Re: [ClusterLabs] ClusterIP won't return to recovered node
On 06/12/2017 03:24 PM, Dan Ragle wrote:
> On 6/12/2017 2:03 AM, Klaus Wenninger wrote:
>> On 06/10/2017 05:53 PM, Dan Ragle wrote:
>>> On 5/25/2017 5:33 PM, Ken Gaillot wrote:
>>>> On 05/24/2017 12:27 PM, Dan Ragle wrote:
>>>>> I suspect this has been asked before and apologize if so, a google
>>>>> search didn't seem to find anything that was helpful to me ...
>>>>>
>>>>> I'm setting up an active/active two-node cluster and am having an
>>>>> issue where one of my two defined clusterIPs will not return to the
>>>>> other node after it (the other node) has been recovered.
>>>>>
>>>>> I'm running on CentOS 7.3. My resource setups look like this:
>>>>>
>>>>> # cibadmin -Q|grep dc-version
>>>>>         name="dc-version" value="1.1.15-11.el7_3.4-e174ec8"/>
>>>>>
>>>>> # pcs resource show PublicIP-clone
>>>>>  Clone: PublicIP-clone
>>>>>   Meta Attrs: clone-max=2 clone-node-max=2 globally-unique=true interleave=true
>>>>>   Resource: PublicIP (class=ocf provider=heartbeat type=IPaddr2)
>>>>>    Attributes: ip=75.144.71.38 cidr_netmask=24 nic=bond0
>>>>>    Meta Attrs: resource-stickiness=0
>>>>>    Operations: start interval=0s timeout=20s (PublicIP-start-interval-0s)
>>>>>                stop interval=0s timeout=20s (PublicIP-stop-interval-0s)
>>>>>                monitor interval=30s (PublicIP-monitor-interval-30s)
>>>>>
>>>>> # pcs resource show PrivateIP-clone
>>>>>  Clone: PrivateIP-clone
>>>>>   Meta Attrs: clone-max=2 clone-node-max=2 globally-unique=true interleave=true
>>>>>   Resource: PrivateIP (class=ocf provider=heartbeat type=IPaddr2)
>>>>>    Attributes: ip=192.168.1.3 nic=bond1 cidr_netmask=24
>>>>>    Meta Attrs: resource-stickiness=0
>>>>>    Operations: start interval=0s timeout=20s (PrivateIP-start-interval-0s)
>>>>>                stop interval=0s timeout=20s (PrivateIP-stop-interval-0s)
>>>>>                monitor interval=10s timeout=20s (PrivateIP-monitor-interval-10s)
>>>>>
>>>>> # pcs constraint --full | grep -i publicip
>>>>>   start WEB-clone then start PublicIP-clone (kind:Mandatory)
>>>>>   (id:order-WEB-clone-PublicIP-clone-mandatory)
>>>>> # pcs constraint --full | grep -i privateip
>>>>>   start WEB-clone then start PrivateIP-clone (kind:Mandatory)
>>>>>   (id:order-WEB-clone-PrivateIP-clone-mandatory)

FYI: These constraints cover ordering only. If you also want to be sure
that the IPs only start on a node where the web service is functional,
then you also need colocation constraints.

>>>>> When I first create the resources, they split across the two nodes
>>>>> as expected/desired:
>>>>>
>>>>>  Clone Set: PublicIP-clone [PublicIP] (unique)
>>>>>      PublicIP:0    (ocf::heartbeat:IPaddr2):    Started node1-pcs
>>>>>      PublicIP:1    (ocf::heartbeat:IPaddr2):    Started node2-pcs
>>>>>  Clone Set: PrivateIP-clone [PrivateIP] (unique)
>>>>>      PrivateIP:0    (ocf::heartbeat:IPaddr2):    Started node1-pcs
>>>>>      PrivateIP:1    (ocf::heartbeat:IPaddr2):    Started node2-pcs
>>>>>  Clone Set: WEB-clone [WEB]
>>>>>      Started: [ node1-pcs node2-pcs ]
>>>>>
>>>>> I then put the second node in standby:
>>>>>
>>>>> # pcs node standby node2-pcs
>>>>>
>>>>> And the IPs both jump to node1 as expected:
>>>>>
>>>>>  Clone Set: PublicIP-clone [PublicIP] (unique)
>>>>>      PublicIP:0    (ocf::heartbeat:IPaddr2):    Started node1-pcs
>>>>>      PublicIP:1    (ocf::heartbeat:IPaddr2):    Started node1-pcs
>>>>>  Clone Set: WEB-clone [WEB]
>>>>>      Started: [ node1-pcs ]
>>>>>      Stopped: [ node2-pcs ]
>>>>>  Clone Set: PrivateIP-clone [PrivateIP] (unique)
>>>>>      PrivateIP:0    (ocf::heartbeat:IPaddr2):    Started node1-pcs
>>>>>      PrivateIP:1    (ocf::heartbeat:IPaddr2):    Started node1-pcs
>>>>>
>>>>> Then unstandby the second node:
>>>>>
>>>>> # pcs node unstandby node2-pcs
>>>>>
>>>>> The publicIP goes back, but the private does not:
>>>>>
>>>>>  Clone Set: PublicIP-clone [PublicIP] (unique)
>>>>>      PublicIP:0    (ocf::heartbeat:IPaddr2):    Started node1-pcs
>>>>>      PublicIP:1    (ocf::heartbeat:IPaddr2):    Started node2-pcs
>>>>>  Clone Set: WEB-clone [WEB]
>>>>>      Started: [ node1-pcs node2-pcs ]
>>>>>  Clone Set: PrivateIP-clone [PrivateIP] (unique)
>>>>>      PrivateIP:0    (ocf::heartbeat:IPaddr2):    Started node1-pcs
>>>>>      PrivateIP:1    (ocf::heartbeat:IPaddr2):    Started node1-pcs
>>>>>
>>>>> Anybody see what I'm doing wrong? I'm not seeing anything in the
>>>>> logs to indicate that it tries node2 and then fails; but I'm fairly
>>>>> new to the software so it's possible I'm not looking in the right
>>>>> place.

The pcs status would show any failed actions, and anything important in
the logs would start with "error:" or
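[Editor's note] The colocation constraints mentioned in the FYI above could
look something like the following sketch, using the resource names already in
the thread (untested; INFINITY makes the IPs hard-depend on the web service,
a finite score would make it a preference instead):

```shell
# Allow an IP instance to run only where a WEB instance is running
pcs constraint colocation add PublicIP-clone with WEB-clone INFINITY
pcs constraint colocation add PrivateIP-clone with WEB-clone INFINITY
```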
Re: [ClusterLabs] ClusterIP won't return to recovered node
On 06/10/2017 10:53 AM, Dan Ragle wrote:
> So I guess my bottom line question is: How does one tell Pacemaker that
> the individual legs of globally unique clones should *always* be spread
> across the available nodes whenever possible, regardless of the number
> of processes on any one of the nodes? For kicks I did try:
>
> pcs constraint location ClusterIP:0 prefers node1-pcs=INFINITY
>
> but it responded with an error about an invalid character (:).

There isn't a way currently. It will try to do that when initially
placing them, but once they've moved together, there's no simple way to
tell them to move. I suppose a workaround might be to create a dummy
resource that you constrain to that node so it looks like the other node
is less busy.
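[Editor's note] Ken's dummy-resource workaround might look like the sketch
below (untested; the name load-dummy is invented, and node1-pcs is the node
that currently hosts both IP instances):

```shell
# A do-nothing resource pinned to the overloaded node, so that node
# appears "busier" and the allocator moves an IP instance elsewhere
pcs resource create load-dummy ocf:pacemaker:Dummy
pcs constraint location load-dummy prefers node1-pcs=INFINITY
```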
Re: [ClusterLabs] ClusterIP won't return to recovered node
On 6/12/2017 2:03 AM, Klaus Wenninger wrote:
> On 06/10/2017 05:53 PM, Dan Ragle wrote:
>> On 5/25/2017 5:33 PM, Ken Gaillot wrote:
>>> On 05/24/2017 12:27 PM, Dan Ragle wrote:
>>>> I suspect this has been asked before and apologize if so, a google
>>>> search didn't seem to find anything that was helpful to me ...
>>>>
>>>> I'm setting up an active/active two-node cluster and am having an
>>>> issue where one of my two defined clusterIPs will not return to the
>>>> other node after it (the other node) has been recovered.
>>>>
>>>> I'm running on CentOS 7.3. My resource setups look like this:
>>>>
>>>> # cibadmin -Q|grep dc-version
>>>>         name="dc-version" value="1.1.15-11.el7_3.4-e174ec8"/>
>>>>
>>>> # pcs resource show PublicIP-clone
>>>>  Clone: PublicIP-clone
>>>>   Meta Attrs: clone-max=2 clone-node-max=2 globally-unique=true interleave=true
>>>>   Resource: PublicIP (class=ocf provider=heartbeat type=IPaddr2)
>>>>    Attributes: ip=75.144.71.38 cidr_netmask=24 nic=bond0
>>>>    Meta Attrs: resource-stickiness=0
>>>>    Operations: start interval=0s timeout=20s (PublicIP-start-interval-0s)
>>>>                stop interval=0s timeout=20s (PublicIP-stop-interval-0s)
>>>>                monitor interval=30s (PublicIP-monitor-interval-30s)
>>>>
>>>> # pcs resource show PrivateIP-clone
>>>>  Clone: PrivateIP-clone
>>>>   Meta Attrs: clone-max=2 clone-node-max=2 globally-unique=true interleave=true
>>>>   Resource: PrivateIP (class=ocf provider=heartbeat type=IPaddr2)
>>>>    Attributes: ip=192.168.1.3 nic=bond1 cidr_netmask=24
>>>>    Meta Attrs: resource-stickiness=0
>>>>    Operations: start interval=0s timeout=20s (PrivateIP-start-interval-0s)
>>>>                stop interval=0s timeout=20s (PrivateIP-stop-interval-0s)
>>>>                monitor interval=10s timeout=20s (PrivateIP-monitor-interval-10s)
>>>>
>>>> # pcs constraint --full | grep -i publicip
>>>>   start WEB-clone then start PublicIP-clone (kind:Mandatory)
>>>>   (id:order-WEB-clone-PublicIP-clone-mandatory)
>>>> # pcs constraint --full | grep -i privateip
>>>>   start WEB-clone then start PrivateIP-clone (kind:Mandatory)
>>>>   (id:order-WEB-clone-PrivateIP-clone-mandatory)

FYI: These constraints cover ordering only. If you also want to be sure
that the IPs only start on a node where the web service is functional,
then you also need colocation constraints.

>>>> When I first create the resources, they split across the two nodes
>>>> as expected/desired:
>>>>
>>>>  Clone Set: PublicIP-clone [PublicIP] (unique)
>>>>      PublicIP:0    (ocf::heartbeat:IPaddr2):    Started node1-pcs
>>>>      PublicIP:1    (ocf::heartbeat:IPaddr2):    Started node2-pcs
>>>>  Clone Set: PrivateIP-clone [PrivateIP] (unique)
>>>>      PrivateIP:0    (ocf::heartbeat:IPaddr2):    Started node1-pcs
>>>>      PrivateIP:1    (ocf::heartbeat:IPaddr2):    Started node2-pcs
>>>>  Clone Set: WEB-clone [WEB]
>>>>      Started: [ node1-pcs node2-pcs ]
>>>>
>>>> I then put the second node in standby:
>>>>
>>>> # pcs node standby node2-pcs
>>>>
>>>> And the IPs both jump to node1 as expected:
>>>>
>>>>  Clone Set: PublicIP-clone [PublicIP] (unique)
>>>>      PublicIP:0    (ocf::heartbeat:IPaddr2):    Started node1-pcs
>>>>      PublicIP:1    (ocf::heartbeat:IPaddr2):    Started node1-pcs
>>>>  Clone Set: WEB-clone [WEB]
>>>>      Started: [ node1-pcs ]
>>>>      Stopped: [ node2-pcs ]
>>>>  Clone Set: PrivateIP-clone [PrivateIP] (unique)
>>>>      PrivateIP:0    (ocf::heartbeat:IPaddr2):    Started node1-pcs
>>>>      PrivateIP:1    (ocf::heartbeat:IPaddr2):    Started node1-pcs
>>>>
>>>> Then unstandby the second node:
>>>>
>>>> # pcs node unstandby node2-pcs
>>>>
>>>> The publicIP goes back, but the private does not:
>>>>
>>>>  Clone Set: PublicIP-clone [PublicIP] (unique)
>>>>      PublicIP:0    (ocf::heartbeat:IPaddr2):    Started node1-pcs
>>>>      PublicIP:1    (ocf::heartbeat:IPaddr2):    Started node2-pcs
>>>>  Clone Set: WEB-clone [WEB]
>>>>      Started: [ node1-pcs node2-pcs ]
>>>>  Clone Set: PrivateIP-clone [PrivateIP] (unique)
>>>>      PrivateIP:0    (ocf::heartbeat:IPaddr2):    Started node1-pcs
>>>>      PrivateIP:1    (ocf::heartbeat:IPaddr2):    Started node1-pcs
>>>>
>>>> Anybody see what I'm doing wrong? I'm not seeing anything in the
>>>> logs to indicate that it tries node2 and then fails; but I'm fairly
>>>> new to the software so it's possible I'm not looking in the right
>>>> place.

The pcs status would show any failed actions, and anything important in
the logs would start with "error:" or "warning:".

At any given time, one of the nodes is the DC, meaning it schedules
actions for the whole cluster. That node will have more "pengine:"
messages in its logs at the time. You can check those logs to see what
decisions were made, as well as a "saving inputs" message to get the
cluster state that was used to make those decisions. There is a
crm_simulate tool that you can run on that file to get more information.

By default, pacemaker will try to balance the number of resources
running on each node, so I'm not sure why in this case node1 has four
resources and node2 has two. crm_simulate might help explain it.

However, there's nothing here telling pacemaker that the instances of
PrivateIP should run on different nodes when possible. With your
existing constraints, pacema
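[Editor's note] The "saving inputs" files and crm_simulate mentioned above can
be used roughly as follows on the DC. This is a sketch: the path is the usual
CentOS 7 default and may differ, and the pe-input-123 filename is a
placeholder for an actual file from the log message.

```shell
# List recent scheduler input files (newest first); pick the one named
# in the "saving inputs" log message around the time of the decision
ls -t /var/lib/pacemaker/pengine/pe-input-*.bz2 | head -5

# Replay it: -s shows per-node allocation scores, -S runs the simulated
# transition, -x names the saved input file
crm_simulate -s -S -x /var/lib/pacemaker/pengine/pe-input-123.bz2
```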