Just an FYI: we found the issue occurs more frequently with IPv6.

Regards,
David

From: Mohan Kanakam <[email protected]>
Sent: Tuesday, May 5, 2020 1:18 PM
To: Hoyt, David <[email protected]>
Cc: [email protected]; [email protected]
Subject: RE: alias IP causing issues

Hi David,

Thanks for the example. Let me try to reproduce it in our lab.

Just a word of caution: in such scenarios, when application traffic on the shared NIC increases, OpenSAF processes can miss their messages or timeouts can occur. These could become Day 2 problems.

Thanks & Regards
Mohan Kanakam | +91-8333082448
Software Engineer
High Availability Solutions
www.GetHighAvailability.com
Get High Availability Today!
NJ, USA: +1 508-422-7725 | Hyderabad, India: +91 798-992-5293

From: Hoyt, David [mailto:[email protected]]
Sent: Monday, May 04, 2020 11:43 PM
To: [email protected]
Cc: [email protected]; [email protected]; [email protected]
Subject: alias IP causing issues

Hi Mohan,

Thanks for your response. Maybe it will help if I use an example.

The following IPs are the fixed eth0 IP for each node:

SC-1 IP = 1.2.3.444
SC-2 IP = 1.2.3.777

The node with SC-1 is up and running with the active OpenSAF controller. The alias IP is then added to SC-1’s eth0.

aliasIP = 1.2.3.99

OpenSAF is started on the SC-2 node. After OpenSAF on SC-2 is up and running, I sometimes see the following TCP connection. It shows that SC-1’s node IP (1.2.3.444:22606) is connected to SC-2’s port 6700 (1.2.3.777:6700). OpenSAF communication between the nodes is good.

[root@sc-1: ~]# ss -at | grep 6700
State    Recv-Q  Send-Q  Local Address:Port    Peer Address:Port
LISTEN   0       20      1.2.3.444:6700        *:*
ESTAB    0       0       1.2.3.444:22606       1.2.3.777:6700

Other times, I see the connection as below, which shows the alias IP on SC-1 (1.2.3.99:15128) connected to SC-2’s port 6700. In this scenario, an si-swap of the application running on the two nodes will cause OpenSAF to go split-brain. That’s because when the application running on SC-1 goes quiesced, it removes the alias IP. Once that occurs, the TCP connection is lost.

[root@sc-1: ~]# ss -at | grep 6700
State    Recv-Q  Send-Q  Local Address:Port    Peer Address:Port
LISTEN   0       20      1.2.3.444:6700        *:*
ESTAB    0       0       1.2.3.99:15128        1.2.3.777:6700

As I understand it, during the initial UDP message exchange between the two OpenSAF processes, the active OpenSAF controller provides its IP address. This IP is then used to establish a TCP connection between the two. The problem is that sometimes it is the alias IP address that is provided.

So, is there a way to force OpenSAF to respond with the fixed IP address during that initial UDP message exchange?

Regards,
David

_______________________________________________
Opensaf-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-users
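
For anyone who wants to watch for the bad case automatically, here is a minimal sketch of the check done by eye in the `ss` captures in this thread. It only parses text in the column layout shown there and flags established connections to port 6700 whose local address is not the node's fixed IP. The function names, the `SAMPLE` text, and the default port argument are illustrative assumptions; this is not part of OpenSAF and it does not change which address OpenSAF advertises.

```python
from typing import List, Tuple


def split_host_port(endpoint: str) -> Tuple[str, str]:
    """Split 'addr:port' on the last colon, so an address that itself
    contains colons is kept intact."""
    host, _, port = endpoint.rpartition(":")
    return host, port


def alias_sourced_connections(ss_output: str, fixed_ip: str,
                              peer_port: str = "6700") -> List[str]:
    """Return the local endpoints of ESTAB connections to peer_port
    whose source address differs from fixed_ip (hypothetical helper)."""
    suspects = []
    for line in ss_output.splitlines():
        fields = line.split()
        # Assumed columns: State Recv-Q Send-Q Local:Port Peer:Port
        if len(fields) < 5 or fields[0] != "ESTAB":
            continue
        local_host, _ = split_host_port(fields[3])
        _, remote_port = split_host_port(fields[4])
        if remote_port == peer_port and local_host != fixed_ip:
            suspects.append(fields[3])
    return suspects


# Sample text copied from the second capture in the thread.
SAMPLE = """\
State  Recv-Q Send-Q Local Address:Port  Peer Address:Port
LISTEN 0      20     1.2.3.444:6700      *:*
ESTAB  0      0      1.2.3.99:15128      1.2.3.777:6700
"""

if __name__ == "__main__":
    # Lists connections that would drop when the alias IP is removed.
    print(alias_sourced_connections(SAMPLE, fixed_ip="1.2.3.444"))
```

An empty list means every connection to port 6700 is sourced from the fixed IP, which is the healthy case from the first capture. A non-empty list matches the situation where an si-swap is expected to break the TCP connection.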
