Re: [ClusterLabs] Random failure with clone of IPaddr2

2016-12-19 Thread alian
I changed my setting from clusterip_hash="sourceip-sourceport" to
clusterip_hash="sourceip"
and tried to ping.
From one host (not a node) on the network, I get no answer.
From another host (not a node) on the network, I get:
PING 10.0.0.97 (10.0.0.97) 56(84) bytes of data.
64 bytes from 10.0.0.97: icmp_seq=1 ttl=64 time=0.310 ms
64 bytes from 10.0.0.97: icmp_seq=1 ttl=64 time=0.320 ms (DUP!)
64 bytes from 10.0.0.97: icmp_seq=2 ttl=64 time=0.184 ms
64 bytes from 10.0.0.97: icmp_seq=2 ttl=64 time=0.544 ms (DUP!)
64 bytes from 10.0.0.97: icmp_seq=3 ttl=64 time=0.144 ms
64 bytes from 10.0.0.97: icmp_seq=3 ttl=64 time=0.173 ms (DUP!)
64 bytes from 10.0.0.97: icmp_seq=4 ttl=64 time=0.169 ms
64 bytes from 10.0.0.97: icmp_seq=4 ttl=64 time=0.627 ms (DUP!)

So to me it looks like one node answers when it is not its turn and
doesn't answer when it is its turn. No?
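
To make that hypothesis concrete, here is a minimal sketch that counts how
many nodes answer each hash bucket. It is only an illustration: the node
names, the bucket assignments and the helper replies_per_bucket are
hypothetical and are not taken from the real cluster. With
clusterip_hash="sourceip", every packet from one client falls into the same
bucket, so a bucket claimed twice would show up as (DUP!) replies and a
bucket claimed by nobody as no answer at all.

```python
from typing import Dict, Set


def replies_per_bucket(claimed: Dict[str, Set[int]], total_nodes: int) -> Dict[int, int]:
    """Count how many nodes answer for each bucket (buckets are 1..total_nodes)."""
    counts = {bucket: 0 for bucket in range(1, total_nodes + 1)}
    for buckets in claimed.values():
        for bucket in buckets:
            if bucket in counts:
                counts[bucket] += 1
    return counts


# Hypothetical assignments: each node name maps to the buckets it answers for.
healthy = {"A": {1}, "B": {2}, "C": {3}}
# Assumed fault: node C answers for bucket 1 instead of bucket 3.
broken = {"A": {1}, "B": {2}, "C": {1}}

if __name__ == "__main__":
    for label, claimed in (("healthy", healthy), ("broken", broken)):
        print(label, replies_per_bucket(claimed, 3))
```

A count of 2 corresponds to the duplicated ping replies, and a count of 0
to the host that gets no answer.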


>> 3 Nodes A B C.
>> If resource on:
>> A + B => ok
>> Only A => ok
>> Only B => ok
>> Only C => ok
>> A + C => random fail
>> B + C => random fail
>> A + B + C => random fail
>
> I use the same corosync / pacemaker versions on all three hosts, but
> hosts A & B have the same kernel and C has a different one:
> A & B: 3.7.10-1.45-desktop
> C: 4.1.15-8-default
> The versions are the same, but not the binaries:
> pacemaker-1.1.13-12.2.x86_64
> corosync-2.3.5-4.2.x86_64
> On host C it's the native rpm. I used the src-rpm to rebuild it for hosts A & B.
> I checked all the sysctl settings, but see no difference ...
>
>



___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Random failure with clone of IPaddr2

2016-12-19 Thread Israel Brewster

> On Dec 19, 2016, at 11:36 AM, al...@amisw.com wrote:
> 
>> Maybe I'm missing something here, and if so, my apologies, but to me it
>> looks like you are trying to put the same IP address on three different
>> machines SIMULTANEOUSLY.
> 
> Yes, that is what I do. But it seems normal to me; I just followed a guide like
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Clusters_from_Scratch/_clone_the_ip_address.html
>  
> 

Ah, I see. I was missing something. Specifically this:

"The IPaddr2 resource agent has built-in intelligence for when it is configured 
as a clone..."

I was unaware of that. So yes, it does seem that it should be working, given 
the way the resource agent works. Sorry about that. Disregard my earlier 
comments.

> 
> and it works fine in a 2-node configuration. As I understand it, this works
> with multicast ARP, which gives the same "virtual" ARP entry to different
> hosts, together with the iptables CLUSTERIP special rule (very roughly).
> Maybe I totally misunderstand this, but it has worked fine for me for the
> last 4 years, so ... ?
> 



Re: [ClusterLabs] Random failure with clone of IPaddr2

2016-12-19 Thread alian
> ... As I understand it, this works with multicast ARP, which gives the
> same "virtual" ARP entry to different hosts

Every host in the cluster gets the request, and a modulo chooses which one
answers. That's just how I understand this shared IP.
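
The modulo selection described above can be sketched as follows. This is a
simplified illustration under stated assumptions, not the kernel's actual
CLUSTERIP code: zlib.crc32 stands in for the real hash function, and the
helper names responsible_node and should_answer are made up for the example.
Only total_nodes and local_node correspond to the values shown in the
iptables rule.

```python
import zlib


def responsible_node(source_ip: str, total_nodes: int) -> int:
    """Return the 1-based number of the node that should answer this client.

    crc32 is only a stand-in hash for illustration purposes.
    """
    bucket = zlib.crc32(source_ip.encode("ascii")) % total_nodes
    return bucket + 1


def should_answer(local_node: int, source_ip: str, total_nodes: int) -> bool:
    """Every node sees the packet; only the node owning the bucket replies."""
    return responsible_node(source_ip, total_nodes) == local_node


if __name__ == "__main__":
    total_nodes = 3
    for ip in ("10.0.0.11", "10.0.0.12", "10.0.0.13"):
        answering = [n for n in range(1, total_nodes + 1)
                     if should_answer(n, ip, total_nodes)]
        print(ip, "is answered by node(s)", answering)
```

As long as every node uses the same total_nodes and a distinct local_node,
exactly one node answers each client.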





Re: [ClusterLabs] Random failure with clone of IPaddr2

2016-12-19 Thread alian
> Maybe I'm missing something here, and if so, my apologies, but to me it
> looks like you are trying to put the same IP address on three different
> machines SIMULTANEOUSLY.

Yes, that is what I do. But it seems normal to me; I just followed a guide like
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Clusters_from_Scratch/_clone_the_ip_address.html

and it works fine in a 2-node configuration. As I understand it, this works
with multicast ARP, which gives the same "virtual" ARP entry to different
hosts, together with the iptables CLUSTERIP special rule (very roughly).
Maybe I totally misunderstand this, but it has worked fine for me for the
last 4 years, so ... ?




Re: [ClusterLabs] Random failure with clone of IPaddr2

2016-12-19 Thread Israel Brewster
Maybe I'm missing something here, and if so, my apologies, but to me it
looks like you are trying to put the same IP address on three different
machines SIMULTANEOUSLY. This will never work from a networking standpoint -
it has nothing to do with pacemaker, etc, other than that it is responsible
for creating the situation (since you told it to). The three machines will
constantly be arguing over who is really responding to that IP address.
Depending on your network hardware and IP stack, the result may vary, but
"random failures" is a good description of the behavior I would expect.

A given IP address should be assigned to one, and only one, machine at any
given time. Feel free to move it around to other machines at will, but it
should never be active on more than one machine (on a given network segment)
at any given time, or you *will* have issues. As such, a clone set is not
good for use with an IPAddr resource.

---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---


On Dec 19, 2016, at 5:41 AM, al...@amisw.com wrote:

> Hi,
>
> My problem is still here. I have searched but found nothing. I tried
> changing the network cabling to put the 3 hosts together on the same
> switch, but the problem is the same.
>
> So with this:
>
> primitive ip_apache_localnet ocf:heartbeat:IPaddr2 \
>   params ip="10.0.0.99" \
>   cidr_netmask="32" op monitor interval="30s"
> clone cl_ip_apache_localnet ip_apache_localnet \
>   meta globally-unique="true" clone-max="3" clone-node-max="3"
>
> 3 Nodes A B C.
> If the resource is on:
> A + B => ok
> Only A => ok
> Only B => ok
> Only C => ok
> A + C => random fail
> B + C => random fail
> A + B + C => random fail
>
> By "random fail" I mean this: I run curl http://10.0.0.99 and can see the
> request with tcpdump. I can reach all three hosts, but about 1 time in 6
> or 7 the curl request hangs. With tcpdump I see the request come in, but
> no host answers. I suspect host C but can't find why it doesn't do its
> job. If I press ctrl-c and redo the request, I get an answer.
>
> I checked all firewalls and logs and don't see any error message. If
> someone has a clue, it would be very welcome!


Re: [ClusterLabs] Random failure with clone of IPaddr2

2016-12-19 Thread alian
> 3 Nodes A B C.
> If resource on:
> A + B => ok
> Only A => ok
> Only B => ok
> Only C => ok
> A + C => random fail
> B + C => random fail
> A + B + C => random fail

I use the same corosync / pacemaker versions on all three hosts, but
hosts A & B have the same kernel and C has a different one:
A & B: 3.7.10-1.45-desktop
C: 4.1.15-8-default
The versions are the same, but not the binaries:
pacemaker-1.1.13-12.2.x86_64
corosync-2.3.5-4.2.x86_64
On host C it's the native rpm. I used the src-rpm to rebuild it for hosts A & B.
I checked all the sysctl settings, but see no difference ...





Re: [ClusterLabs] Random failure with clone of IPaddr2

2016-12-19 Thread alian
Hi,

My problem is still here. I have searched but found nothing. I tried
changing the network cabling to put the 3 hosts together on the same
switch, but the problem is the same.

So with this:

primitive ip_apache_localnet ocf:heartbeat:IPaddr2 \
  params ip="10.0.0.99" \
  cidr_netmask="32" op monitor interval="30s"
clone cl_ip_apache_localnet ip_apache_localnet \
  meta globally-unique="true" clone-max="3" clone-node-max="3"

3 Nodes A B C.
If the resource is on:
A + B => ok
Only A => ok
Only B => ok
Only C => ok
A + C => random fail
B + C => random fail
A + B + C => random fail

By "random fail" I mean this: I run curl http://10.0.0.99 and can see the
request with tcpdump. I can reach all three hosts, but about 1 time in 6
or 7 the curl request hangs. With tcpdump I see the request come in, but
no host answers. I suspect host C but can't find why it doesn't do its
job. If I press ctrl-c and redo the request, I get an answer.

I checked all firewalls and logs and don't see any error message. If
someone has a clue, it would be very welcome!




Re: [ClusterLabs] Random failure with clone of IPaddr2

2016-12-15 Thread alian
> On 12/15/2016 02:02 PM, al...@amisw.com wrote:

>> primitive ip_apache_localnet ocf:heartbeat:IPaddr2 params ip="10.0.0.99"
>> cidr_netmask="32" op monitor interval="30s"
>> clone cl_ip_apache_localnet ip_apache_localnet \
>> meta globally-unique="true" clone-max="3" clone-node-max="1"
>
>
> ^^^ Here you have clone-node-max="1", which will prevent surviving nodes
> from picking up any failed node's share of requests. clone-max and
> clone-node-max should both stay at 3, regardless of whether you are
> intentionally taking down any node.

Thank you for the tip. It doesn't explain my problem, but it helps me: I
can reach all 3 nodes with my curl request, but sometimes one does not
respond. And since it doesn't answer, I don't know which one it is :-) But
with clone-node-max at 3, I can move my resource around and see that the
problem happens when the resource is on one specific node. Not the one I
want to remove.

So 2 nodes work well, and one node sometimes doesn't answer.
I know that this node isn't on the same switch as the first two; there
is another switch in between (3 interconnected switches). Can the multicast
ARP get lost on the way?

If not, this is a firewall / sysctl difference between hosts ...





Re: [ClusterLabs] Random failure with clone of IPaddr2

2016-12-15 Thread Ken Gaillot
On 12/15/2016 02:02 PM, al...@amisw.com wrote:
>>
>> Seeing your configuration might help. Did you set globally-unique=true
>> and clone-node-max=3 on the clone? If not, the other nodes can't pick up
>> the lost node's share of requests.
> 
> Yes for both: I have globally-unique=true, and I changed clone-node-max=3
> to clone-node-max=2, and now that I have gone back to the old
> configuration, I am back at clone-node-max=3.
>
> So now I have three nodes in the cluster.
> Here is my config:
> 
> primitive ip_apache_localnet ocf:heartbeat:IPaddr2 params ip="10.0.0.99" 
> cidr_netmask="32" op monitor interval="30s"
> clone cl_ip_apache_localnet ip_apache_localnet \
> meta globally-unique="true" clone-max="3" clone-node-max="1"


^^^ Here you have clone-node-max="1", which will prevent surviving nodes
from picking up any failed node's share of requests. clone-max and
clone-node-max should both stay at 3, regardless of whether you are
intentionally taking down any node.

> target-role="Started" is-managed="true"
> 
>  sudo  /usr/sbin/iptables -L
> CLUSTERIP  all  --  anywhere 10.0.0.99  CLUSTERIP
> hashmode=sourceip-sourceport clustermac=A1:99:D6:EA:43:77 total_nodes=3
> local_node=2 hash_init=0
> 
> and I checked that I have a different local_node on each node.
> And just a question: is the MAC address "normal"? Doesn't it need to
> begin with 01-00-5E?



Re: [ClusterLabs] Random failure with clone of IPaddr2

2016-12-15 Thread alian
>
> Seeing your configuration might help. Did you set globally-unique=true
> and clone-node-max=3 on the clone? If not, the other nodes can't pick up
> the lost node's share of requests.

Yes for both: I have globally-unique=true, and I changed clone-node-max=3
to clone-node-max=2, and now that I have gone back to the old
configuration, I am back at clone-node-max=3.

So now I have three nodes in the cluster.
Here is my config:

primitive ip_apache_localnet ocf:heartbeat:IPaddr2 params ip="10.0.0.99" 
cidr_netmask="32" op monitor interval="30s"
clone cl_ip_apache_localnet ip_apache_localnet \
meta globally-unique="true" clone-max="3" clone-node-max="1"
target-role="Started" is-managed="true"

 sudo  /usr/sbin/iptables -L
CLUSTERIP  all  --  anywhere 10.0.0.99  CLUSTERIP
hashmode=sourceip-sourceport clustermac=A1:99:D6:EA:43:77 total_nodes=3
local_node=2 hash_init=0

and I checked that I have a different local_node on each node.
And just a question: is the MAC address "normal"? Doesn't it need to
begin with 01-00-5E?
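
On the MAC question, a short check may help. In an Ethernet address, the
least-significant bit of the first octet marks a group (multicast) address,
while 01-00-5E is only the prefix reserved for mapping IPv4 multicast groups.
The sketch below tests both properties for the clustermac shown above. The
function names are hypothetical, and whether a group address outside
01-00-5E is sufficient for a given switch is an assumption this check cannot
confirm.

```python
def is_group_mac(mac: str) -> bool:
    """True if the group (multicast) bit of the first octet is set."""
    first_octet = int(mac.replace("-", ":").split(":")[0], 16)
    return bool(first_octet & 0x01)


def is_ipv4_multicast_mac(mac: str) -> bool:
    """True if the address uses the 01:00:5E prefix (IPv4 multicast mapping)."""
    return mac.replace("-", ":").upper().startswith("01:00:5E")


if __name__ == "__main__":
    cluster_mac = "A1:99:D6:EA:43:77"  # clustermac from the iptables output
    print("group bit set:", is_group_mac(cluster_mac))
    print("01:00:5E prefix:", is_ipv4_multicast_mac(cluster_mac))
```

The first octet 0xA1 is odd, so the address is a link-layer multicast
address even though it does not start with 01-00-5E.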




Re: [ClusterLabs] Random failure with clone of IPaddr2

2016-12-15 Thread Ken Gaillot
On 12/15/2016 12:37 PM, al...@amisw.com wrote:
> Hi,
> 
> I have had some trouble for a week and can't find a solution by myself.
> Any help will be really appreciated!
> I have used corosync / pacemaker for 3 or 4 years and everything works
> well, for failover or load-balancing.
>
> I have a shared IP between 3 servers, and need to remove one for an
> upgrade. But after I removed the server from the cluster, I got random
> failures accessing my shared IP. At first I thought that some packets
> were still going to the old server. So I put it back in the cluster and
> can reach it, but the random failure is still here :-/
>
> My test is just a curl http://my_ip (or ssh, same thing: it randomly
> fails to connect).
> A ping didn't lose any packets.
> I can reach each of the three servers, but sometimes the request hangs
> and I get a timeout.
> With tcpdump I see the packet coming in and being resent, but no one
> responds. How can I diagnose this?
> I think one request in five fails. But I didn't see any messages in the
> firewall or /var/log/message, nothing, just as if the switch chose to
> drop random packets. I didn't see any error counters on the network
> interfaces; I checked the iptables settings, rechecked the logs,
> rechecked all firewalls ... Where do these packets go??
>
> I tried with another new IP, and the same problem happened. I tried IPs
> on two different subnets (10.xxx and an external IP) with the same
> result.
>
> I have no problem with a virtual IP in failover mode.
>
> If someone has any clue ...

Seeing your configuration might help. Did you set globally-unique=true
and clone-node-max=3 on the clone? If not, the other nodes can't pick up
the lost node's share of requests.
