Re: [ClusterLabs] IPaddr2 works for 12 seconds then stops

2018-11-13 Thread Ken Gaillot
On Tue, 2018-11-13 at 18:41 +0100, Valentin Vidic wrote:
> On Tue, Nov 13, 2018 at 11:01:46AM -0600, Ken Gaillot wrote:
> > Clone instances have a default stickiness of 1 (instead of the usual 0)
> > so that they aren't needlessly shuffled around nodes every transition.
> > You can temporarily set an explicit stickiness of 0 to let them
> > rebalance, then unset it to go back to the default.
> 
> Thanks, this works as expected now:
> 
>   clone cip-clone cip \
>     meta clone-max=2 clone-node-max=2 globally-unique=true interleave=true \
>     resource-stickiness=0 target-role=Started
> 
> Clone instance moves when a node is down but also returns when the node
> is back online.
> 
> Do you perhaps know if CLUSTERIP has any special network requirements to
> work properly?

Yes, the switch must support multicast MAC (which is different from
multicast IP). Sometimes this is an option that must be turned on.
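
To illustrate the distinction: a multicast MAC is any Ethernet address
whose first octet has its least significant bit set, independent of any
multicast IP group. A minimal sketch of that check in POSIX shell follows;
the function name and the two sample addresses are made up for the example
and are not taken from this cluster.

```sh
#!/bin/sh
# Succeeds (exit status 0) if the colon-separated MAC address in $1 is a
# link-layer multicast address, i.e. the least significant bit of its
# first octet is 1.
is_multicast_mac() {
    first_octet=${1%%:*}                  # text before the first ':'
    [ $(( 0x$first_octet & 1 )) -eq 1 ]
}

# Sample addresses, chosen only for illustration.
for mac in 01:00:5e:00:00:20 52:54:00:12:34:56; do
    if is_multicast_mac "$mac"; then
        echo "$mac is a multicast MAC"
    else
        echo "$mac is a unicast MAC"
    fi
done
```

The MAC used for CLUSTERIP has to pass this check, and the switch has to
be willing to deliver frames sent to it to the ports of both nodes.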
-- 
Ken Gaillot 
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] IPaddr2 works for 12 seconds then stops

2018-11-13 Thread Valentin Vidic
On Tue, Nov 13, 2018 at 11:01:46AM -0600, Ken Gaillot wrote:
> Clone instances have a default stickiness of 1 (instead of the usual 0)
> so that they aren't needlessly shuffled around nodes every transition.
> You can temporarily set an explicit stickiness of 0 to let them
> rebalance, then unset it to go back to the default.

Thanks, this works as expected now:

  clone cip-clone cip \
    meta clone-max=2 clone-node-max=2 globally-unique=true interleave=true \
    resource-stickiness=0 target-role=Started

Clone instance moves when a node is down but also returns when the node
is back online.

Do you perhaps know if CLUSTERIP has any special network requirements to
work properly?

-- 
Valentin


Re: [ClusterLabs] IPaddr2 works for 12 seconds then stops

2018-11-13 Thread Ken Gaillot
On Tue, 2018-11-13 at 17:27 +0100, Valentin Vidic wrote:
> On Tue, Nov 13, 2018 at 05:04:19PM +0100, Valentin Vidic wrote:
> > Also it seems to require multicast, so better check for that too :)
> 
> And while the CLUSTERIP resource seems to work for me in a test
> cluster, the following clone definition:
> 
>   clone cip-clone cip \
>     meta clone-max=2 clone-node-max=2 globally-unique=true interleave=true \
>     target-role=Started
> 
> allows for both clone instances to end up on the same node:
> 
>  Clone Set: cip-clone [cip] (unique)
>  cip:0  (ocf::heartbeat:IPaddr2):   Started sid2
>  cip:1  (ocf::heartbeat:IPaddr2):   Started sid2
> 
> Is there a way to spread the resources other than setting
> clone-node-max=1 for a while?

Clone instances have a default stickiness of 1 (instead of the usual 0)
so that they aren't needlessly shuffled around nodes every transition.
You can temporarily set an explicit stickiness of 0 to let them
rebalance, then unset it to go back to the default.
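
With crmsh that could look roughly like the sketch below. It assumes the
clone is called cip-clone, as in the configuration quoted above, and that
the crm and crm_resource tools are installed. DRY_RUN and the two helper
functions are invented for this example; by default the script only prints
the commands instead of running them.

```sh
#!/bin/sh
# Sketch: temporarily set resource-stickiness=0 on a clone so that its
# instances can rebalance, then remove the explicit value again so the
# clone default applies.
# DRY_RUN is a helper variable made up for this example: with DRY_RUN=1
# (the default here) commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

rebalance_clone() {
    clone=$1
    # An explicit stickiness of 0 lets the cluster move instances freely.
    run crm resource meta "$clone" set resource-stickiness 0
    # Wait for the cluster to settle before changing anything else.
    run crm_resource --wait
    # Drop the explicit value to go back to the default.
    run crm resource meta "$clone" delete resource-stickiness
}

rebalance_clone cip-clone
```

Review the printed commands first, then rerun with DRY_RUN=0 on a cluster
node to apply them.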
-- 
Ken Gaillot 


Re: [ClusterLabs] IPaddr2 works for 12 seconds then stops

2018-11-13 Thread Valentin Vidic
On Tue, Nov 13, 2018 at 05:04:19PM +0100, Valentin Vidic wrote:
> Also it seems to require multicast, so better check for that too :)

And while the CLUSTERIP resource seems to work for me in a test
cluster, the following clone definition:

  clone cip-clone cip \
    meta clone-max=2 clone-node-max=2 globally-unique=true interleave=true \
    target-role=Started

allows for both clone instances to end up on the same node:

 Clone Set: cip-clone [cip] (unique)
 cip:0  (ocf::heartbeat:IPaddr2):   Started sid2
 cip:1  (ocf::heartbeat:IPaddr2):   Started sid2

Is there a way to spread the resources other than setting
clone-node-max=1 for a while?

-- 
Valentin


Re: [ClusterLabs] IPaddr2 works for 12 seconds then stops

2018-11-13 Thread Valentin Vidic
On Tue, Nov 13, 2018 at 04:06:34PM +0100, Valentin Vidic wrote:
> Could be some kind of ARP inspection going on in the networking equipment,
> so check switch logs if you have access to that.

Also it seems to require multicast, so better check for that too :)

-- 
Valentin


Re: [ClusterLabs] IPaddr2 works for 12 seconds then stops

2018-11-13 Thread Valentin Vidic
On Tue, Nov 13, 2018 at 09:06:56AM -0500, Daniel Ragle wrote:
> Thanks, finally getting back to this. Putting a tshark on both nodes and
> then restarting the VIP-clone resource shows the pings coming through for 12
> seconds, always on node2, then stop. I.E., before/after those 12 seconds
> nothing on either node from the server initiating the pings.

Could be some kind of ARP inspection going on in the networking equipment,
so check switch logs if you have access to that.

-- 
Valentin


Re: [ClusterLabs] IPaddr2 works for 12 seconds then stops

2018-11-13 Thread Daniel Ragle



On 10/11/2018 5:00 PM, Valentin Vidic wrote:
> On Thu, Oct 11, 2018 at 01:25:52PM -0400, Daniel Ragle wrote:
> > For the 12 second window it *does* work in, it appears as though it works
> > only on one of the two servers (and always the same one). My twelve seconds
> > of pings runs continuously then stops; while attempts to hit the Web server
> > works hit or miss depending on my source port (I'm using
> > sourceip-sourceport). I.E., as if anything that would be handled by the
> > other server isn't making it through. But after the 12 seconds neither
> > server responds to the requests against the VIP (but they do respond fine to
> > their own static IPs at all times).
> 
> Could be that the switch in front of the servers does not like to see
> the same MAC on two ports or something like that.
> 
> > During the 12 seconds that it works I get these in the logs of the server
> > that *is* responding:
> > 
> > Oct 11 12:17:43 node2 kernel: ipt_CLUSTERIP: unknown protocol 1
> > Oct 11 12:17:44 node2 kernel: ipt_CLUSTERIP: unknown protocol 1
> > Oct 11 12:17:45 node2 kernel: ipt_CLUSTERIP: unknown protocol 1
> 
> Protocol 1 once per second should be ICMP PING so this is just CLUSTERIP
> complaining that it can't calculate sourceip-sourceport for those packets
> (ICMP has no source port).
> 
> So maybe try recording the traffic using tcpdump on both servers and
> see if any requests are coming in at all from the network equipment.

Thanks, finally getting back to this. Running tshark on both nodes and
then restarting the VIP-clone resource shows the pings coming through
for 12 seconds, always on node2, and then stopping. That is, before and
after those 12 seconds, neither node sees anything from the server
initiating the pings.


Dan


Re: [ClusterLabs] IPaddr2 works for 12 seconds then stops

2018-10-11 Thread Valentin Vidic
On Thu, Oct 11, 2018 at 01:25:52PM -0400, Daniel Ragle wrote:
> For the 12 second window it *does* work in, it appears as though it works
> only on one of the two servers (and always the same one). My twelve seconds
> of pings runs continuously then stops; while attempts to hit the Web server
> works hit or miss depending on my source port (I'm using
> sourceip-sourceport). I.E., as if anything that would be handled by the
> other server isn't making it through. But after the 12 seconds neither
> server responds to the requests against the VIP (but they do respond fine to
> their own static IPs at all times).

Could be that the switch in front of the servers does not like to see
the same MAC on two ports or something like that.

> During the 12 seconds that it works I get these in the logs of the server
> that *is* responding:
> 
> Oct 11 12:17:43 node2 kernel: ipt_CLUSTERIP: unknown protocol 1
> Oct 11 12:17:44 node2 kernel: ipt_CLUSTERIP: unknown protocol 1
> Oct 11 12:17:45 node2 kernel: ipt_CLUSTERIP: unknown protocol 1

Protocol 1 once per second should be ICMP PING so this is just CLUSTERIP
complaining that it can't calculate sourceip-sourceport for those packets
(ICMP has no source port).
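
This also fits the "hit or miss depending on source port" observation.
With sourceip-sourceport, every node sees each packet, hashes the source
address and port, and only the node whose number matches the result
answers. The sketch below mimics that decision with a deliberately trivial
hash (sum of the address octets plus the port), which is NOT the hash the
kernel module uses; the function name and the addresses are invented for
the example.

```sh
#!/bin/sh
# Toy model of hash-based packet ownership: print which node
# (1..total_nodes) would accept a packet with the given source IPv4
# address and source port. The hash is a plain sum, for illustration only.
toy_owner_node() {
    src_ip=$1
    src_port=$2
    total_nodes=$3
    old_ifs=$IFS
    IFS=.
    set -- $src_ip          # split the address into its four octets
    IFS=$old_ifs
    sum=$(( $1 + $2 + $3 + $4 + $src_port ))
    echo $(( $sum % $total_nodes + 1 ))
}

# Same client address, consecutive source ports, two nodes.
for port in 40000 40001 40002 40003; do
    echo "192.0.2.10:$port -> node $(toy_owner_node 192.0.2.10 "$port" 2)"
done
```

With this toy hash consecutive ports alternate between the two nodes. If
the frames never reach one of the nodes, connections that hash to it fail
while the others work.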

So maybe try recording the traffic using tcpdump on both servers and
see if any requests are coming in at all from the network equipment.
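
Something along these lines should do. The helper only prints a tcpdump
command line so it can be reviewed before running it as root; the function
name is made up, and the interface eth0 and the address 192.0.2.100 are
placeholders, not values from this thread.

```sh
#!/bin/sh
# Print a tcpdump invocation that shows ARP plus all traffic to or from
# the virtual IP on one interface.
#   -n  do not resolve addresses to names
#   -e  print the link-level header, so the destination MAC is visible
#   -i  interface to capture on
vip_capture_cmd() {
    iface=$1
    vip=$2
    echo "tcpdump -n -e -i $iface arp or host $vip"
}

# Placeholders: replace eth0 and 192.0.2.100 with the real interface and VIP.
vip_capture_cmd eth0 192.0.2.100
```

Run the printed command on both nodes at the same time and compare which
requests each of them sees.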

-- 
Valentin