Hi David,

You said you cannot reproduce the issue in the IPv4 lab. Can I take that to mean the patch works for IPv4?
For IPv6, I have made a V2 patch (attached), which also enables DTM tracing from the beginning.
Could you try the V2 patch?

Best Regards,
ThuanTr

From: Hoyt, David <dh...@rbbn.com>
Sent: Thursday, May 7, 2020 7:09 PM
To: Thuan Tran <thuan.t...@dektech.com.au>; Gary Lee <gary....@dektech.com.au>
Cc: Minh Hon Chau <minh.c...@dektech.com.au>; Thang Duc Nguyen 
<thang.d.ngu...@dektech.com.au>
Subject: RE: opensaf questions

Hi Thuan,

So, I’m not having any luck in trying to reproduce this issue in an IPv4 lab.
But in my IPv6 lab, I tried it 5 times and it happened in all 5.

I’ve attached snippets of the osafdtmd log files from each node at the time 
SC-2 joined the cluster.

Time reference:
May  6 15:59:25 sc-1 osafdtmd[1473]: NO Established contact with 'SC-2'
May  6 15:59:27 sc-1 osafamfd[1651]: NO Node 'SC-2' joined the cluster

Just curious, if we have to set the local node IP address in the dtmd.conf 
file, why doesn’t opensaf use this IP?
DTM_NODE_IP=fdcc:aacc:cef8:1000::55
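On the question above: reading DTM_NODE_IP is one step, and deciding its address family is the part that has to cover both IPv4 and IPv6. A minimal sketch, assuming dtmd.conf holds plain KEY=VALUE lines as shown; `parse_node_ip` is a hypothetical helper, not the osafdtmd parser:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: if `line` is a "DTM_NODE_IP=<addr>" entry, copy the
 * address into `out` and return its family (AF_INET or AF_INET6).
 * Returns -1 for other keys, comments, or an unparsable address.
 * The real osafdtmd parser may differ (quoting, whitespace rules, etc.). */
static int parse_node_ip(const char *line, char *out, size_t outlen)
{
    static const char key[] = "DTM_NODE_IP=";
    unsigned char buf[sizeof(struct in6_addr)];
    size_t n;

    if (strncmp(line, key, sizeof(key) - 1) != 0)
        return -1;                       /* other key, or a commented-out line */
    line += sizeof(key) - 1;

    n = strcspn(line, " \t\r\n");        /* value ends at whitespace/newline */
    if (n == 0 || n >= outlen)
        return -1;
    memcpy(out, line, n);
    out[n] = '\0';

    /* inet_pton tells us which family the configured address belongs to. */
    if (inet_pton(AF_INET, out, buf) == 1)
        return AF_INET;
    if (inet_pton(AF_INET6, out, buf) == 1)
        return AF_INET6;
    return -1;
}
```

Reading the value does not by itself decide which source address an outgoing TCP connection uses: unless the socket is bound to that address before connect(), the kernel chooses one.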


Here’s our setup:

2 nodes: SC-1,SC-2

Running opensaf-5.19.10



Virtualization: kvm

Operating System: Red Hat Enterprise Linux Server 7.8 (Maipo)

Kernel: Linux 3.10.0-1127.el7.x86_64

Architecture: x86-64

IPv6 IPs:
SC1: fdcc:aacc:cef8:1000::55
SC2: fdcc:aacc:cef8:1000::56
Alias IP: fdcc:aacc:cef8:1000::2

Test case:

  1.  start opensaf on SC-1
  2.  add alias IP to eth0
  3.  start opensaf on SC-2



Afterwards, the following shows the TCP connection for port 6700 is with the 
alias IP:
[sc1 ~]# ss -at | grep 6700
LISTEN  0  20  [fdcc:aacc:cef8:1000::55]:6700    [::]:*
ESTAB   0  0   [fdcc:aacc:cef8:1000::2]:15462    [fdcc:aacc:cef8:1000::56]:6700
[sc1 ~]#
[sc1 ~]# ip addr show eth0 | grep global
inet6 fdcc:aacc:cef8:1000::2/64 scope global
inet6 fdcc:aacc:cef8:1000::55/64 scope global


From SC-2’s viewpoint:
[sc2 ~]# ss -at | grep 6700
LISTEN  0  20  [fdcc:aacc:cef8:1000::56]:6700    [::]:*
ESTAB   0  0   [fdcc:aacc:cef8:1000::56]:6700    [fdcc:aacc:cef8:1000::2]:15462
[sc2 ~]#
[sc2 ~]# ip addr show eth0 | grep global
inet6 fdcc:aacc:cef8:1000::56/64 scope global
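The ESTAB lines show which local address the kernel chose for the outgoing connection on port 6700. A process can check the same thing from the inside with getsockname(); a minimal, hypothetical diagnostic helper (not part of OpenSAF) that works for both address families:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical diagnostic: write the local address of socket `fd` into
 * `buf` in text form. Returns 0 on success, -1 on error.
 * Handles both IPv4 and IPv6 sockets. */
static int local_ip_of(int fd, char *buf, size_t len)
{
    struct sockaddr_storage ss;
    socklen_t sl = sizeof(ss);
    const void *addr;

    if (getsockname(fd, (struct sockaddr *)&ss, &sl) != 0)
        return -1;
    if (ss.ss_family == AF_INET)
        addr = &((struct sockaddr_in *)&ss)->sin_addr;
    else if (ss.ss_family == AF_INET6)
        addr = &((struct sockaddr_in6 *)&ss)->sin6_addr;
    else
        return -1;
    return inet_ntop(ss.ss_family, addr, buf, (socklen_t)len) ? 0 : -1;
}
```

Comparing the result against the configured node IP right after connect() would flag the alias case at the moment it happens, rather than later when the alias is deleted.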


-David


From: Thuan Tran <thuan.t...@dektech.com.au>
Sent: Tuesday, May 5, 2020 8:54 AM
To: Hoyt, David <dh...@rbbn.com>; Gary Lee <gary....@dektech.com.au>
Cc: Minh Hon Chau <minh.c...@dektech.com.au>; Thang Duc Nguyen <thang.d.ngu...@dektech.com.au>
Subject: Re: opensaf questions

________________________________
NOTICE: This email was received from an EXTERNAL sender
________________________________

Hi David,

OK. Please give it a try first.
Then we can continue based on your results.

Best Regards,
Thuan
________________________________
From: Hoyt, David <dh...@rbbn.com>
Sent: Tuesday, May 5, 2020 7:50 PM
To: Thuan Tran <thuan.t...@dektech.com.au>; Gary Lee <gary....@dektech.com.au>
Cc: Minh Hon Chau <minh.c...@dektech.com.au>; Thang Duc Nguyen <thang.d.ngu...@dektech.com.au>
Subject: RE: opensaf questions


Thanks Thuan,



I’ll give this a try once my lab comes up.

Looking at the diff, this looks like it handles the IPv4 case. What are the 
code changes for IPv6?

We need to support both, and in fact, the issue appears to be worse in our IPv6 lab.



Regards,

David



From: Thuan Tran <thuan.t...@dektech.com.au>
Sent: Tuesday, May 5, 2020 2:57 AM
To: Hoyt, David <dh...@rbbn.com>; Gary Lee <gary....@dektech.com.au>
Cc: Minh Hon Chau <minh.c...@dektech.com.au>; Thang Duc Nguyen <thang.d.ngu...@dektech.com.au>
Subject: Re: opensaf questions






Hi David,



Could you please try the attached patch?

From the root of the opensaf repository, apply it with: patch -p1 < [patch full path]



Best Regards,

Thuan

________________________________

From: Hoyt, David <dh...@rbbn.com>
Sent: Monday, May 4, 2020 9:19 PM
To: Thuan Tran <thuan.t...@dektech.com.au>; Gary Lee <gary....@dektech.com.au>
Cc: Minh Hon Chau <minh.c...@dektech.com.au>; Thang Duc Nguyen <thang.d.ngu...@dektech.com.au>
Subject: RE: opensaf questions



OK, so getting back to my original question.



If the Ethernet device has both a node IP and an alias IP, how does the active opensaf controller’s DTM know which IP address to send back in the UDP message?



For example, using the following IPs that are assigned to the node’s eth0 
device:

SC-1 IP = 1.2.3.444

SC-2 IP = 1.2.3.777



The node with SC-1 is up and running with the active opensaf controller.

The aliasIP is added to SC-1’s eth0.

aliasIP = 1.2.3.99



Opensaf is then started on the SC-2 node.





Sometimes, I see the following TCP connection, and when I do, everything works 
fine.

This shows that SC-1’s node IP (1.2.3.444) is connected to SC-2’s port 6700 
(1.2.3.777:6700).

[root@sc-1: ~]# ss -at | grep 6700
State      Recv-Q Send-Q Local Address:Port             Peer Address:Port
LISTEN     0      20     1.2.3.444:6700                 *:*
ESTAB      0      0      1.2.3.444:22606                1.2.3.777:6700





Other times, I see the connection as below, which shows the alias IP on SC-1 (1.2.3.99) connected to SC-2’s port 6700.

[root@sc-1: ~]# ss -at | grep 6700
State      Recv-Q Send-Q Local Address:Port             Peer Address:Port
LISTEN     0      20     1.2.3.444:6700                 *:*
ESTAB      0      0      1.2.3.99:15128                 1.2.3.777:6700



So, is there a way to force SC-1’s opensaf to respond with the node IP when establishing the TCP connection with SC-2?
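The patch attached at the end of this thread is named bind_before_connect, which points to the standard technique: bind the outgoing socket to the node IP, with port 0, before calling connect(), so the source address is fixed to the node IP instead of being left to the kernel's source-address selection (which may pick the alias). A minimal sketch of that technique, assuming both addresses are numeric literals; `make_addr` and `connect_from` are hypothetical names, not code from the patch:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fill `ss` with ip:port. Returns the sockaddr length, or 0 if `ip` is not
 * a valid IPv4/IPv6 literal. */
static socklen_t make_addr(const char *ip, uint16_t port, struct sockaddr_storage *ss)
{
    struct sockaddr_in *v4 = (struct sockaddr_in *)ss;
    struct sockaddr_in6 *v6 = (struct sockaddr_in6 *)ss;

    memset(ss, 0, sizeof(*ss));
    if (inet_pton(AF_INET, ip, &v4->sin_addr) == 1) {
        v4->sin_family = AF_INET;
        v4->sin_port = htons(port);
        return sizeof(*v4);
    }
    memset(ss, 0, sizeof(*ss));
    if (inet_pton(AF_INET6, ip, &v6->sin6_addr) == 1) {
        v6->sin6_family = AF_INET6;
        v6->sin6_port = htons(port);
        return sizeof(*v6);
    }
    return 0;
}

/* Open a TCP connection to peer_ip:port whose source address is local_ip.
 * Port 0 lets the kernel pick an ephemeral port but pins the source IP.
 * bind() fails if local_ip is not configured on a local interface.
 * Returns the connected fd, or -1 on error. */
static int connect_from(const char *local_ip, const char *peer_ip, uint16_t port)
{
    struct sockaddr_storage local, peer;
    socklen_t llen = make_addr(local_ip, 0, &local);
    socklen_t plen = make_addr(peer_ip, port, &peer);

    if (llen == 0 || plen == 0 || local.ss_family != peer.ss_family)
        return -1;

    int fd = socket(peer.ss_family, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (bind(fd, (struct sockaddr *)&local, llen) != 0 ||
        connect(fd, (struct sockaddr *)&peer, plen) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

On SC-1 in the IPv6 lab this would be called as `connect_from("fdcc:aacc:cef8:1000::55", "fdcc:aacc:cef8:1000::56", 6700)`; the same call path covers IPv4 because `make_addr` detects the family from the literal.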



Regards,

David



From: Thuan Tran <thuan.t...@dektech.com.au>
Sent: Monday, May 4, 2020 9:31 AM
To: Hoyt, David <dh...@rbbn.com>; Gary Lee <gary....@dektech.com.au>
Cc: Minh Hon Chau <minh.c...@dektech.com.au>; Thang Duc Nguyen <thang.d.ngu...@dektech.com.au>
Subject: Re: opensaf questions






Hi David,



See my answers inline.



Best Regards,

Thuan



________________________________

From: Hoyt, David <dh...@rbbn.com>
Sent: Monday, May 4, 2020 8:08 PM
To: Thuan Tran <thuan.t...@dektech.com.au>; Gary Lee <gary....@dektech.com.au>
Cc: Minh Hon Chau <minh.c...@dektech.com.au>; Thang Duc Nguyen <thang.d.ngu...@dektech.com.au>
Subject: RE: opensaf questions



Ok, so the opensaf processes on each node initially communicate via UDP.



Sorry, I’m not a network person. When, or how, does the TCP connection get established?

[Thuan] After the UDP message is received, a TCP connection is established.

That is, how does the newly started opensaf process know the IP address of the 
active opensaf controller?

[Thuan] The UDP message contains the sender's node IP.

Is this something that’s done at the network layer?

[Thuan] No, it is done by the DTM service.

Say, when the newly started opensaf process receives the UDP response from the 
active controller, is the IP address of the active controller now known?

[Thuan] No. When a node receives the UDP message, it makes a TCP connection based on the information in the message.
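Thuan's answers describe a two-step flow: the UDP message carries the sender's node IP, and the receiver connects over TCP to the address taken from the message. A minimal sketch of the second step, assuming (hypothetically) that the payload is just the IP as text and that the TCP port is 6700 as in the ss output above; the real DTM message has its own layout, and `peer_from_discovery` is an illustrative name:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Build the TCP peer address from the IP carried *inside* the discovery
 * payload (hypothetical format: the IP as text, not necessarily
 * NUL-terminated), not from the UDP datagram's source address.
 * Returns 0 on success, -1 otherwise. */
static int peer_from_discovery(const char *payload, size_t len, uint16_t tcp_port,
                               struct sockaddr_storage *out)
{
    char ip[INET6_ADDRSTRLEN];
    struct sockaddr_in *v4 = (struct sockaddr_in *)out;
    struct sockaddr_in6 *v6 = (struct sockaddr_in6 *)out;

    if (len == 0 || len >= sizeof(ip))
        return -1;
    memcpy(ip, payload, len);
    ip[len] = '\0';

    memset(out, 0, sizeof(*out));
    if (inet_pton(AF_INET, ip, &v4->sin_addr) == 1) {
        v4->sin_family = AF_INET;
        v4->sin_port = htons(tcp_port);
        return 0;
    }
    memset(out, 0, sizeof(*out));
    if (inet_pton(AF_INET6, ip, &v6->sin6_addr) == 1) {
        v6->sin6_family = AF_INET6;
        v6->sin6_port = htons(tcp_port);
        return 0;
    }
    return -1;
}
```

Note that this fixes only the *destination* of the TCP connection; the *source* address on the connecting side is still chosen by the kernel unless the socket is bound first.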



Regards,

David



From: Thuan Tran <thuan.t...@dektech.com.au>
Sent: Monday, May 4, 2020 8:51 AM
To: Hoyt, David <dh...@rbbn.com>; Gary Lee <gary....@dektech.com.au>
Cc: Minh Hon Chau <minh.c...@dektech.com.au>; Thang Duc Nguyen <thang.d.ngu...@dektech.com.au>
Subject: Re: opensaf questions






Hi David,



I just looked through the source code.

When a node starts, it broadcasts a UDP message once, or repeatedly if DTM_CONTINUOUS_BCAST_INT is configured.

This message is received by the DTM discovery thread on the other nodes.

A connection is then set up based on that message.
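The discovery step Thuan describes (a UDP message sent once, or repeatedly when DTM_CONTINUOUS_BCAST_INT is configured) can be sketched as below. This is a simplified stand-in under stated assumptions: the payload is only the node IP as text, and `count`/`interval_ms` are hypothetical parameters, not how osafdtmd actually interprets DTM_CONTINUOUS_BCAST_INT:

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical discovery send: one UDP datagram whose payload is the
 * sender's node IP as text (the real DTM message layout differs). */
static int send_discovery(int fd, const struct sockaddr *dst, socklen_t dstlen,
                          const char *node_ip)
{
    size_t n = strlen(node_ip);
    ssize_t sent = sendto(fd, node_ip, n, 0, dst, dstlen);
    return sent == (ssize_t)n ? 0 : -1;
}

/* Send once (count == 1), or `count` times spaced `interval_ms` apart, as a
 * stand-in for the repeated broadcast described above. A real IPv4 sender
 * would also set SO_BROADCAST and target a broadcast address; IPv6 has no
 * broadcast, so an IPv6 sender would have to use multicast instead. */
static int discovery_loop(int fd, const struct sockaddr *dst, socklen_t dstlen,
                          const char *node_ip, int count, unsigned interval_ms)
{
    for (int i = 0; i < count; i++) {
        if (send_discovery(fd, dst, dstlen, node_ip) != 0)
            return -1;
        if (i + 1 < count && interval_ms > 0) {
            struct timespec ts = { interval_ms / 1000,
                                   (long)(interval_ms % 1000) * 1000000L };
            nanosleep(&ts, NULL);
        }
    }
    return 0;
}
```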



Best Regards,

Thuan

________________________________

From: Hoyt, David <dh...@rbbn.com>
Sent: Monday, May 4, 2020 6:42 PM
To: Gary Lee <gary....@dektech.com.au>; Thuan Tran <thuan.t...@dektech.com.au>
Cc: Minh Hon Chau <minh.c...@dektech.com.au>; Thang Duc Nguyen <thang.d.ngu...@dektech.com.au>
Subject: Re: opensaf questions



Hi Thuan,

Thank you for the response.



I'm trying to understand how the TCP connection between the 2 opensaf 
controllers is established.

Do they initially communicate via UDP?



That is, when opensaf starts, it broadcasts a message. Is this broadcast a UDP 
message?



Thanks,

David






________________________________

From: Thuan Tran <thuan.t...@dektech.com.au>
Sent: Monday, May 4, 2020 5:30:17 AM
To: Gary Lee <gary....@dektech.com.au>; Hoyt, David <dh...@rbbn.com>
Cc: Minh Hon Chau <minh.c...@dektech.com.au>; Thang Duc Nguyen <thang.d.ngu...@dektech.com.au>
Subject: Re: opensaf questions






Hi David,



I am Thuan, an OpenSAF maintainer.



In my understanding, you are facing split-brain because the connection between the two controllers is lost.

You said you configure an alias IP on eth0, then during si-swap delete it on the old active and add it on the new active.

Does this delete bring eth0 down for a while? If so, I guess that is why the connection between the two controllers is lost.

Can you add another Ethernet interface and use it for your application?



Best Regards,

Thuan



________________________________

From: Gary Lee <gary....@dektech.com.au>
Sent: Monday, May 4, 2020 3:35 PM
To: Hoyt, David <dh...@rbbn.com>; Thuan Tran <thuan.t...@dektech.com.au>
Subject: Re: opensaf questions



Hi David



It looks like the users mailing list is broken. I just tried a test message and it disappeared. I will try to fix it.



I'll get one of my colleagues who maintains DTM to reply to you.



/Gary

________________________________

From: Hoyt, David <dh...@rbbn.com>
Sent: 01 May 2020 22:53
To: Gary Lee <gary....@dektech.com.au>
Subject: opensaf questions



Hi Gary,



Sorry to contact you directly. I sent an email to Opensaf-users@lists.sourceforge.net last week but never received a response or saw it posted.

Are messages to this email address still being monitored?



Another question:

With the tipc option disabled, as an osaf controller starts, it broadcasts a 
message, looking to see if it can become active.

If there's an active controller already running in the same cluster, it will 
respond.

Are the initial messages between the 2 controller processes done via UDP 
(outgoing port 6800, incoming port 6900)?

When or how is the TCP connection between the two established?



Thanks,

David





From: Hoyt, David
Sent: Friday, April 24, 2020 3:10 PM
To: Opensaf-users@lists.sourceforge.net
Subject: alias IP causing issues



Hi all,



I have a 2-node system (SC-1, SC-2) that requires an alias IP (it points to the active application, which also runs on these nodes).

When SC-1 and the application are up and running, the alias IP is added to eth0.



When an application si-swap is performed, the application deletes the alias IP 
on SC-1.

Likewise, when it goes active on SC-2, it will add the alias IP to SC-2’s 
device (eth0).



The issue I have is that when SC-2 comes up, I find that SC-2’s TCP port 6700 is connected to SC-1’s alias IP.

So, when the si-swap is performed, the alias IP gets deleted as part of the 
application being stopped on SC-1.

As a result, the connection between the 2 nodes is gone: osafdtmd generates a 
log stating a loss of connection. End result is split-brain.



Now, if SC-2 comes up before the application is enabled on SC-1, the alias IP has not yet been added, so the port 6700 connection between the nodes actually uses the node’s IP address. I can enforce this during the initial setup, but if one of the opensaf processes on a node restarts, or if a node reboots, the alias IP already exists, so upon coming up, the newly started opensaf will have a connection to the alias IP.



Long story short, is there anything within the DTM area that could help me prevent opensaf from sending outbound messages over the alias IP instead of the node IP?





Setup:

2 nodes: SC-1,SC-2

Running opensaf-5.19.10



Virtualization: kvm

Operating System: Red Hat Enterprise Linux Server 7.8 (Maipo)

Kernel: Linux 3.10.0-1127.el7.x86_64

Architecture: x86-64



Regards,

David





________________________________

Notice: This e-mail together with any attachments may contain information of 
Ribbon Communications Inc. that is confidential and/or proprietary for the sole 
use of the intended recipient. Any review, disclosure, reliance or distribution 
by others or forwarding without express permission is strictly prohibited. If 
you are not the intended recipient, please notify the sender immediately and 
then delete all copies, including any attachments.

________________________________

Attachment: bind_before_connect_v2.diff
Description: bind_before_connect_v2.diff

_______________________________________________
Opensaf-users mailing list
Opensaf-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-users
