Hi David,

You said you cannot reproduce the issue in the IPv4 lab. Can I take that to mean the patch works for IPv4?

For IPv6, I have made a V2 patch (attached), which also enables DTM trace from the beginning. Could you try the V2 patch?
Best Regards,
ThuanTr

From: Hoyt, David <dh...@rbbn.com>
Sent: Thursday, May 7, 2020 7:09 PM
To: Thuan Tran <thuan.t...@dektech.com.au>; Gary Lee <gary....@dektech.com.au>
Cc: Minh Hon Chau <minh.c...@dektech.com.au>; Thang Duc Nguyen <thang.d.ngu...@dektech.com.au>
Subject: RE: opensaf questions

Hi Thuan,

So, I’m not having any luck in trying to reproduce this issue in an IPv4 lab. But in my IPv6 lab, I tried it 5 times and it happened in all 5. I’ve attached snippets of the osafdtmd log files from each node at the time SC-2 joined the cluster.

Time reference:

    May 6 15:59:25 sc-1 osafdtmd[1473]: NO Established contact with 'SC-2'
    May 6 15:59:27 sc-1 osafamfd[1651]: NO Node 'SC-2' joined the cluster

Just curious, if we have to set the local node IP address in the dtmd.conf file, why doesn’t opensaf use this IP?

    DTM_NODE_IP=fdcc:aacc:cef8:1000::55

Here’s our setup:
2 nodes: SC-1, SC-2
Running opensaf-5.19.10
Virtualization: kvm
Operating System: Red Hat Enterprise Linux Server 7.8 (Maipo)
Kernel: Linux 3.10.0-1127.el7.x86_64
Architecture: x86-64

IPv6 IPs:
SC1: fdcc:aacc:cef8:1000::55
SC2: fdcc:aacc:cef8:1000::56
Alias IP: fdcc:aacc:cef8:1000::2

Test case:
1. start opensaf on SC-1
2. add alias IP to eth0
3. start opensaf on SC-2

Afterwards, the following shows the TCP connection for port 6700 is with the alias IP:

    [sc1 ~]# ss -at | grep 6700
    LISTEN 0 20 [fdcc:aacc:cef8:1000::55]:6700 [::]:*
    ESTAB  0 0  [fdcc:aacc:cef8:1000::2]:15462 [fdcc:aacc:cef8:1000::56]:6700
    [sc1 ~]#
    [sc1 ~]# ip addr show eth0 | grep global
    inet6 fdcc:aacc:cef8:1000::2/64 scope global
    inet6 fdcc:aacc:cef8:1000::55/64 scope global

From SC-2’s viewpoint:

    [sc2 ~]# ss -at | grep 6700
    LISTEN 0 20 [fdcc:aacc:cef8:1000::56]:6700 [::]:*
    ESTAB  0 0  [fdcc:aacc:cef8:1000::56]:6700 [fdcc:aacc:cef8:1000::2]:15462
    [sc2 ~]#
    [sc2 ~]# ip addr show eth0 | grep global
    inet6 fdcc:aacc:cef8:1000::56/64 scope global

-David

From: Thuan Tran <thuan.t...@dektech.com.au>
Sent: Tuesday, May 5, 2020 8:54 AM
To: Hoyt, David <dh...@rbbn.com>; Gary Lee <gary....@dektech.com.au>
Cc: Minh Hon Chau <minh.c...@dektech.com.au>; Thang Duc Nguyen <thang.d.ngu...@dektech.com.au>
Subject: Re: opensaf questions
________________________________
NOTICE: This email was received from an EXTERNAL sender
________________________________

Hi David,

OK. Please give it a try first. Then we can continue based on your result.

Best Regards,
Thuan
________________________________
From: Hoyt, David <dh...@rbbn.com>
Sent: Tuesday, May 5, 2020 7:50 PM
To: Thuan Tran <thuan.t...@dektech.com.au>; Gary Lee <gary....@dektech.com.au>
Cc: Minh Hon Chau <minh.c...@dektech.com.au>; Thang Duc Nguyen <thang.d.ngu...@dektech.com.au>
Subject: RE: opensaf questions

Thanks Thuan,

I’ll give this a try once my lab comes up. Looking at the diff, this looks like it handles the IPv4 case. What are the code changes for IPv6?
We need to support both and, in fact, the issue appears to be worse in our IPv6 lab.

Regards,
David

From: Thuan Tran <thuan.t...@dektech.com.au>
Sent: Tuesday, May 5, 2020 2:57 AM
To: Hoyt, David <dh...@rbbn.com>; Gary Lee <gary....@dektech.com.au>
Cc: Minh Hon Chau <minh.c...@dektech.com.au>; Thang Duc Nguyen <thang.d.ngu...@dektech.com.au>
Subject: Re: opensaf questions

Hi David,

Could you please give the attached patch a try? From the top of the opensaf repo, apply the patch:

    patch -p1 < [patch full path]

Best Regards,
Thuan
________________________________
From: Hoyt, David <dh...@rbbn.com>
Sent: Monday, May 4, 2020 9:19 PM
To: Thuan Tran <thuan.t...@dektech.com.au>; Gary Lee <gary....@dektech.com.au>
Cc: Minh Hon Chau <minh.c...@dektech.com.au>; Thang Duc Nguyen <thang.d.ngu...@dektech.com.au>
Subject: RE: opensaf questions

OK, so getting back to my original question. If the Ethernet device has both a node IP and an alias IP, how does the active opensaf controller’s DTM know which IP address to send back in the UDP message?

For example, using the following IPs that are assigned to the node’s eth0 device:
SC-1 IP = 1.2.3.444
SC-2 IP = 1.2.3.777

The node with SC-1 is up and running with the active opensaf controller. The aliasIP is added to SC-1’s eth0.
aliasIP = 1.2.3.99

Opensaf is then started on the SC-2 node. Sometimes, I see the following TCP connection, and when I do, everything works fine. This shows that SC-1’s node IP (1.2.3.444) is connected to SC-2’s port 6700 (1.2.3.777:6700).
    [root@sc-1: ~]# ss -at | grep 6700
    State  Recv-Q Send-Q Local Address:Port Peer Address:Port
    LISTEN 0      20     1.2.3.444:6700     *:*
    ESTAB  0      0      1.2.3.444:22606    1.2.3.777:6700

Other times, I see the connection as below, which shows the alias IP on SC-1 (1.2.3.99) connected to SC-2’s port 6700.

    [root@sc-1: ~]# ss -at | grep 6700
    State  Recv-Q Send-Q Local Address:Port Peer Address:Port
    LISTEN 0      20     1.2.3.444:6700     *:*
    ESTAB  0      0      1.2.3.99:15128     1.2.3.777:6700

So, is there a way to force SC-1’s opensaf to respond with the node IP when establishing the TCP connection with SC-2?

Regards,
David

From: Thuan Tran <thuan.t...@dektech.com.au>
Sent: Monday, May 4, 2020 9:31 AM
To: Hoyt, David <dh...@rbbn.com>; Gary Lee <gary....@dektech.com.au>
Cc: Minh Hon Chau <minh.c...@dektech.com.au>; Thang Duc Nguyen <thang.d.ngu...@dektech.com.au>
Subject: Re: opensaf questions

Hi David,

See my answers inline.

Best Regards,
Thuan
________________________________
From: Hoyt, David <dh...@rbbn.com>
Sent: Monday, May 4, 2020 8:08 PM
To: Thuan Tran <thuan.t...@dektech.com.au>; Gary Lee <gary....@dektech.com.au>
Cc: Minh Hon Chau <minh.c...@dektech.com.au>; Thang Duc Nguyen <thang.d.ngu...@dektech.com.au>
Subject: RE: opensaf questions

Ok, so the opensaf processes on each node initially communicate via UDP.

Sorry, I’m not a network person…when or how does the TCP connection get established?
[Thuan] After receiving UDP, a TCP connection is established.

That is, how does the newly started opensaf process know the IP address of the active opensaf controller?
[Thuan] The UDP message contains the node IP of the sender.

Is this something that’s done at the network layer?
[Thuan] No, it is done in the DTM service.

Say, when the newly started opensaf process receives the UDP response from the active controller, is the IP address of the active controller now known?
[Thuan] No. When a node receives the UDP message, it makes a TCP connection based on the info in the message.

Regards,
David

From: Thuan Tran <thuan.t...@dektech.com.au>
Sent: Monday, May 4, 2020 8:51 AM
To: Hoyt, David <dh...@rbbn.com>; Gary Lee <gary....@dektech.com.au>
Cc: Minh Hon Chau <minh.c...@dektech.com.au>; Thang Duc Nguyen <thang.d.ngu...@dektech.com.au>
Subject: Re: opensaf questions

Hi David,

I just looked through the source code. The node initially broadcasts a UDP message, or does so in a loop if you configure DTM_CONTINUOUS_BCAST_INT. This message is received by the DTM discovery thread on the other nodes. A connection (communication) is then set up later based on that.

Best Regards,
Thuan
________________________________
From: Hoyt, David <dh...@rbbn.com>
Sent: Monday, May 4, 2020 6:42 PM
To: Gary Lee <gary....@dektech.com.au>; Thuan Tran <thuan.t...@dektech.com.au>
Cc: Minh Hon Chau <minh.c...@dektech.com.au>; Thang Duc Nguyen <thang.d.ngu...@dektech.com.au>
Subject: Re: opensaf questions

Hi Thuan,

Thank you for the response. I'm trying to understand how the TCP connection between the 2 opensaf controllers is established. Do they initially communicate via UDP? That is, when opensaf starts, it broadcasts a message. Is this broadcast a UDP message?
Thanks,
David
________________________________
From: Thuan Tran <thuan.t...@dektech.com.au>
Sent: Monday, May 4, 2020 5:30:17 AM
To: Gary Lee <gary....@dektech.com.au>; Hoyt, David <dh...@rbbn.com>
Cc: Minh Hon Chau <minh.c...@dektech.com.au>; Thang Duc Nguyen <thang.d.ngu...@dektech.com.au>
Subject: Re: opensaf questions

Hi David,

I am Thuan, working as an Opensaf maintainer. As I understand it, you are facing split-brain due to a lost connection between the 2 controllers. You said you configure an alias IP on eth0, then during si-swap delete it on the old active and add it on the new active. Does this "delete" bring eth0 down for a while? If so, I guess that's why the connection between the 2 controllers is lost. Can you add another Ethernet interface and use it for your application?

Best Regards,
Thuan
________________________________
From: Gary Lee <gary....@dektech.com.au>
Sent: Monday, May 4, 2020 3:35 PM
To: Hoyt, David <dh...@rbbn.com>; Thuan Tran <thuan.t...@dektech.com.au>
Subject: Re: opensaf questions

Hi David

It looks like the users mailing list is broken. I just tried a test message and it disappeared .... will try to fix.

I'll get one of my colleagues who's maintaining dtm to reply to you.

/Gary
________________________________
From: Hoyt, David <dh...@rbbn.com>
Sent: 01 May 2020 22:53
To: Gary Lee <gary....@dektech.com.au>
Subject: opensaf questions

Hi Gary,

Sorry to contact you directly.
I sent an email to Opensaf-users@lists.sourceforge.net last week but never received a response or saw it posted. Are messages to this email address still being monitored?

Another question:
With the tipc option disabled, as an osaf controller starts, it broadcasts a message, looking to see if it can become active. If there's an active controller already running in the same cluster, it will respond. Are the initial messages between the 2 controller processes done via UDP (outgoing port 6800, incoming port 6900)? When or how is the TCP connection between the two established?

Thanks,
David

From: Hoyt, David
Sent: Friday, April 24, 2020 3:10 PM
To: Opensaf-users@lists.sourceforge.net
Subject: alias IP causing issues

Hi all,

I have a 2-node system (SC-1, SC-2) that requires an alias IP (it points to the active application that also runs on these nodes). If SC-1 is up and running, as well as the application, the alias IP has been added to eth0. When an application si-swap is performed, the application deletes the alias IP on SC-1. Likewise, when it goes active on SC-2, it will add the alias IP to SC-2’s device (eth0).

The issue I have is that when SC-2 comes up, I found that the connection between SC-2’s TCP port 6700 and SC-1 is with SC-1’s alias IP. So, when the si-swap is performed, the alias IP gets deleted as part of the application being stopped on SC-1. As a result, the connection between the 2 nodes is gone: osafdtmd generates a log stating a loss of connection. End result is split-brain.

Now, if SC-2 comes up without the application enabled on SC-1, the alias IP has yet to be added, so the connection between the nodes’ port 6700 is actually with the node’s IP address.
I can enforce this on the initial setup, but if one of the opensaf processes on a node goes for a restart or if a node reboots, since the alias IP exists, upon coming up, the newly started opensaf will have a connection to the alias IP.

Long story short, is there anything within the DTM area that could help me in preventing opensaf sending outbound messages over the alias IP instead of the node IP?

Setup:
2 nodes: SC-1, SC-2
Running opensaf-5.19.10
Virtualization: kvm
Operating System: Red Hat Enterprise Linux Server 7.8 (Maipo)
Kernel: Linux 3.10.0-1127.el7.x86_64
Architecture: x86-64

Regards,
David
________________________________
Notice: This e-mail together with any attachments may contain information of Ribbon Communications Inc. that is confidential and/or proprietary for the sole use of the intended recipient. Any review, disclosure, reliance or distribution by others or forwarding without express permission is strictly prohibited. If you are not the intended recipient, please notify the sender immediately and then delete all copies, including any attachments.
________________________________
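For readers following the discovery flow described in the thread (a UDP message carrying the sender's node IP, after which the receiver opens a TCP connection based on the info in that message), here is a rough sketch in Python. It is not DTM code: the JSON wire format, the function names announce and connect_to_announced, and the use of a unicast datagram on loopback instead of a real broadcast are all assumptions made for illustration.

```python
import json
import socket


def announce(udp_dest, node_ip, tcp_port):
    """Send one discovery datagram advertising where this node listens.

    The JSON payload is a hypothetical format, not DTM's real one.
    """
    payload = json.dumps({"node_ip": node_ip, "tcp_port": tcp_port}).encode()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
        udp.sendto(payload, udp_dest)


def connect_to_announced(udp_sock, timeout=5.0):
    """Wait for one announcement, then open TCP to the advertised address."""
    udp_sock.settimeout(timeout)
    data, _sender = udp_sock.recvfrom(1024)
    info = json.loads(data.decode())
    # The destination comes from the message body. The local (source)
    # address of the TCP socket is not set here, so the kernel chooses it.
    return socket.create_connection(
        (info["node_ip"], info["tcp_port"]), timeout=timeout
    )


if __name__ == "__main__":
    # Joining node: a TCP listener standing in for port 6700.
    tcp_listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    tcp_listener.bind(("127.0.0.1", 0))
    tcp_listener.listen(1)

    # Running node: a UDP socket standing in for the discovery thread.
    discovery = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    discovery.bind(("127.0.0.1", 0))

    announce(discovery.getsockname(), "127.0.0.1", tcp_listener.getsockname()[1])
    conn = connect_to_announced(discovery)
    accepted, peer = tcp_listener.accept()
    print("connected to advertised address:", conn.getpeername())
    print("source address seen by listener:", peer)

    for s in (conn, accepted, discovery, tcp_listener):
        s.close()
```

Note that only the destination is taken from the message. The connecting side never states which local address to use, which is consistent with the alias IP showing up as the source of the port 6700 connection in the ss output.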
bind_before_connect_v2.diff
Description: bind_before_connect_v2.diff
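The attachment itself is not reproduced in this archive. Going only by its name, the general "bind before connect" technique can be sketched as follows in Python: the client socket is bound to a chosen source IP (with port 0, so the kernel still picks the ephemeral port) before connect() is called. This is an illustration of the socket-level idea, not the contents of the patch; the helper name connect_from and the loopback addresses are assumptions.

```python
import socket


def connect_from(source_ip, dest_ip, dest_port, timeout=5.0):
    """Open a TCP connection whose local (source) address is pinned.

    Calling bind() before connect() fixes the source IP. Without the bind,
    the kernel selects the source address itself, and on an interface with
    several addresses that may be the alias rather than the node IP.
    """
    # Pick the address family from the textual form of the source address.
    family = socket.AF_INET6 if ":" in source_ip else socket.AF_INET
    sock = socket.socket(family, socket.SOCK_STREAM)
    try:
        sock.settimeout(timeout)
        sock.bind((source_ip, 0))           # pin the source IP, any port
        sock.connect((dest_ip, dest_port))  # then connect as usual
    except OSError:
        sock.close()
        raise
    return sock


if __name__ == "__main__":
    # Loopback stand-in for the peer's listening port (6700 in the thread).
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]

    client = connect_from("127.0.0.1", "127.0.0.1", port)
    peer, peer_addr = listener.accept()
    print("local address used:", client.getsockname()[0])
    print("address seen by peer:", peer_addr[0])

    for s in (client, peer, listener):
        s.close()
```

In the setup from the thread, the source would be the node IP (the DTM_NODE_IP value) and the destination the peer's node IP on port 6700, so removing the alias IP later would not affect the established connection.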
_______________________________________________
Opensaf-users mailing list
Opensaf-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-users