Hello again,

Sorry for the delay. Yes, there does not appear to be any communication between at least SC-1 and SC-2. On SC-1, I ran "tcpdump host <SC-2’s IP address>", and nothing appeared.
I checked whether the firewall was preventing this, and have since opened ports 20 to 23 on each node and restarted opensafd.service on each node. I have set “export MDS_TRANSPORT=TCP” in nid.conf on each node. However, I still have the same result: the opensaf processes started on SC-1 and SC-2, but failed on PL-3. Should there be at least an OpenSAF heartbeat between SC-1 and SC-2?

Just to list the steps in which I set up this cluster, this is what I did:

1. On SC-1, installed as a controller:
   a. cd /usr/share/opensaf/immxml
   b. ./immxml-clustersize -s 2 -p 1
   c. I edited the third column in nodes.cfg to the actual hostnames of the nodes (VMs):
        SC SC-1 linux-h8o1.site
        SC SC-2 linux-vzbw.site
        PL PL-3 linux-9qkx.site
   d. ./immxml-configure   // this created imm.xml.20161006_0900
   e. cp imm.xml.20161006_0900 /etc/opensaf/imm.xml
   f. In /etc/opensaf/dtmd.conf, changed DTM_NODE_IP to SC-1’s IP address.
   g. Added the nodes’ hostnames mapped to their IP addresses in /etc/hosts.

2. On SC-2, installed as a controller:
   a. Transferred imm.xml from SC-1 to /etc/opensaf on SC-2.
   b. Changed DTM_NODE_IP to SC-2’s IP address.
   c. Changed slot_id to 2.
   d. Added the nodes’ hostnames mapped to their IP addresses in /etc/hosts.

3. On PL-3, installed as a payload:
   a. Transferred imm.xml from SC-1 to /etc/opensaf on PL-3. I don’t think that I needed to do this and have since removed imm.xml from PL-3.
   b. Changed DTM_NODE_IP to PL-3’s IP address.
   c. Changed slot_id to 3.
   d. Added the nodes’ hostnames mapped to their IP addresses in /etc/hosts.

4. Beginning with SC-1, then SC-2, and lastly PL-3, I entered “systemctl start opensafd.service”. Again, it started on the controllers but not the payload.

Have I missed anything in this setup?
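A transport check that does not depend on OpenSAF itself could look something like the sketch below. It takes the node list exactly as written in nodes.cfg above and tries a plain TCP connection to each hostname. The port is an assumption: 6700 is used only as a placeholder and should be replaced by whatever TCP listening port is configured in /etc/opensaf/dtmd.conf on the nodes (believed to be the DTM_TCP_LISTENING_PORT setting). The function names are made up for this illustration and are not part of OpenSAF.

```python
import socket

# Node list as written in nodes.cfg in the steps above:
# node type, node name, hostname.
NODES_CFG = """\
SC SC-1 linux-h8o1.site
SC SC-2 linux-vzbw.site
PL PL-3 linux-9qkx.site
"""

# ASSUMPTION: placeholder value only. Use the TCP listening port that is
# actually configured in /etc/opensaf/dtmd.conf on the nodes.
DTM_PORT = 6700


def parse_nodes(text):
    """Return (node_type, node_name, hostname) tuples from nodes.cfg-style text."""
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:  # skip blank or malformed lines
            nodes.append(tuple(fields))
    return nodes


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Name resolution failures, refused connections, and timeouts are all
    subclasses of OSError, so they are all reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_cluster(nodes, port):
    """Return {node_name: bool} telling which nodes accept TCP connections on port."""
    return {name: can_connect(host, port) for _type, name, host in nodes}
```

Running `check_cluster(parse_nodes(NODES_CFG), DTM_PORT)` on each node in turn, while opensafd.service is started on the others, would show whether the hostnames resolve through /etc/hosts and whether the firewall lets the connection through. A False result for a node that is known to be running points at name resolution, the firewall, or the configured port rather than at IMM.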
Thanks,
Jeremy

From: Neelakanta Reddy [mailto:[email protected]]
Sent: Friday, October 7, 2016 8:07 AM
To: Jeremy Matthews <[email protected]>; [email protected]
Subject: Re: [users] opensaf 4.5.0 osafimmnd causes opensafd.service start failure on payload node

________________________________
NOTICE: This email was received from an EXTERNAL sender
________________________________

Hi,

There seems to be a TRANSPORT problem between SC-1 and the other nodes (SC-2, PL-3). Neither SC-2 nor PL-3 joined SC-1. Please check that the TRANSPORT (TCP/TIPC) is working correctly between the nodes.

Thanks,
Neel.

On 2016/10/07 11:36 AM, Jeremy Matthews wrote:

Attached. For SC-1 and PL-3, they include /var/log/messages and the /var/log/opensaf contents. For SC-2, I accidentally wrote over /var/log/messages, so it has only the /var/log/opensaf contents.

Thank you,
Jeremy

From: Neelakanta Reddy [mailto:[email protected]]
Sent: Thursday, October 6, 2016 9:51 PM
To: Jeremy Matthews <[email protected]>; [email protected]
Subject: Re: [users] opensaf 4.5.0 osafimmnd causes opensafd.service start failure on payload node

Hi,

Please share the syslog of all the nodes (SC-1, SC-2, PL-3).

/Neel.

On 2016/10/06 09:04 PM, Jeremy Matthews wrote:

Hi,

I've seen this issue for a payload node in another post, where it was attributed to a configuration error and resolved by a reboot (?). I have rebooted my payload node, just in case, but to no effect.

The logs in /var/log/messages when issuing the "systemctl start opensafd.service" command:

Oct 6 09:38:35 linux-9qkx opensafd: Starting OpenSAF Services
Oct 6 09:38:35 linux-9qkx osafdtmd[2987]: Started
Oct 6 09:38:35 linux-9qkx osafimmnd[2999]: Started
Oct 6 09:40:05 linux-9qkx systemd[1]: opensafd.service operation timed out. Terminating.
Oct 6 09:40:05 linux-9qkx osafimmnd[2999]: MDTM:socket_recv() = 0, conn lost with dh server, exiting library err :Success
Oct 6 09:40:05 linux-9qkx systemd[1]: Unit opensafd.service entered failed state.

I had enabled tracing in immnd.conf, which produced these entries in /var/log/opensaf/osafimmnd:

Oct 6 9:38:35.142143 osafimmnd [2999:immnd_main.c:0113] >> immnd_initialize
Oct 6 9:38:35.142188 osafimmnd [2999:osaf_secutil.c:0193] >> osaf_auth_server_create
Oct 6 9:38:35.142260 osafimmnd [2999:osaf_secutil.c:0215] << osaf_auth_server_create
Oct 6 9:38:35.142270 osafimmnd [2999:ncs_main_pub.c:0223] TR NCS:PROCESS_ID=2999
Oct 6 9:38:35.142273 osafimmnd [2999:sysf_def.c:0090] TR INITIALIZING LEAP ENVIRONMENT
Oct 6 9:38:35.142962 osafimmnd [2999:sysf_def.c:0123] TR DONE INITIALIZING LEAP ENVIRONMENT
Oct 6 9:38:35.143088 osafimmnd [2999:ncs_main_pub.c:0755] TR NCS:NODE_ID=0x0002030F
Oct 6 9:38:35.143309 osafimmnd [2999:mbcsv_dl_api.c:0059] >> mbcsv_lib_req
Oct 6 9:38:35.143318 osafimmnd [2999:mbcsv_dl_api.c:0096] >> mbcsv_lib_init
Oct 6 9:38:35.143322 osafimmnd [2999:mbcsv_mbx.c:0166] >> mbcsv_initialize_mbx_list
Oct 6 9:38:35.143324 osafimmnd [2999:mbcsv_mbx.c:0180] << mbcsv_initialize_mbx_list
Oct 6 9:38:35.143328 osafimmnd [2999:mbcsv_pwe_anc.c:0162] >> mbcsv_initialize_peer_list
Oct 6 9:38:35.143331 osafimmnd [2999:mbcsv_pwe_anc.c:0176] << mbcsv_initialize_peer_list
Oct 6 9:38:35.143332 osafimmnd [2999:mbcsv_dl_api.c:0075] << mbcsv_lib_req
Oct 6 9:38:35.143334 osafimmnd [2999:ncs_main_pub.c:0393] TR MBCSV:MBCA:ON
Oct 6 9:38:35.143342 osafimmnd [2999:immnd_main.c:0187] T2 Dir:/etc/opensaf File:imm.xml ExpectedNodes:3 WaitSecs:3
Oct 6 9:38:35.143352 osafimmnd [2999:immnd_mds.c:0127] >> immnd_mds_register
Oct 6 9:38:35.143457 osafimmnd [2999:immnd_mds.c:0192] T2 cb->node_id:2030f
Oct 6 9:38:35.143461 osafimmnd [2999:immnd_mds.c:0194] << immnd_mds_register
Oct 6 9:38:35.143469 osafimmnd [2999:immnd_main.c:0238] << immnd_initialize
Oct 6 9:38:35.143478 osafimmnd [2999:osaf_secutil.c:0166] >> auth_server_main
Oct 6 9:38:35.244792 osafimmnd [2999:ImmModel.cc:3381] << protocol43Allowed
Oct 6 9:38:35.244836 osafimmnd [2999:immnd_proc.c:1626] T5 tmout:100 ste:1 ME:0 RE:0 crd:0 rim:FROM_FILE 4.3A:0 2Pbe:0 VetA/B: 0/0 othsc:0/0
Oct 6 9:38:35.244847 osafimmnd [2999:immnd_proc.c:0413] TR Possibly extended intro from this IMMND pbeEnabled: 2 dirsize:0
Oct 6 9:38:35.344974 osafimmnd [2999:immnd_proc.c:0413] TR Possibly extended intro from this IMMND pbeEnabled: 2 dirsize:0
Oct 6 9:38:35.445934 osafimmnd [2999:immnd_proc.c:0413] TR Possibly extended intro from this IMMND pbeEnabled: 2 dirsize:0
Oct 6 9:38:35.546974 osafimmnd [2999:immnd_proc.c:0413] TR Possibly extended intro from this IMMND pbeEnabled: 2 dirsize:0
. . .
Oct 6 9:40:04.794307 osafimmnd [2999:immnd_proc.c:0413] TR Possibly extended intro from this IMMND pbeEnabled: 2 dirsize:0
Oct 6 9:40:04.895424 osafimmnd [2999:immnd_proc.c:0413] TR Possibly extended intro from this IMMND pbeEnabled: 2 dirsize:0
Oct 6 9:40:04.996499 osafimmnd [2999:immnd_proc.c:0413] TR Possibly extended intro from this IMMND pbeEnabled: 2 dirsize:0
Oct 6 9:40:05.081315 osafimmnd [2999:mds_dt_trans.c:0671] >> mdtm_process_poll_recv_data_tcp

The start of opensafd.service eventually timed out and failed. It appears that the function immnd_introduceMe in immnd_proc.c continually fails. If the problem is due to pbe, I don't understand why that would happen on a payload node; I thought pbe was only on system controller nodes.

This is a 3-node cluster with SC-1, SC-2, and PL-3. The controller nodes (SC-1, SC-2) start up okay, but the payload node (PL-3) does not. These nodes are running on openSUSE 12.1 VirtualBox VMs. I have no application interacting with OpenSAF, just OpenSAF itself installed.

Any assistance on this would be appreciated.

Thanks in advance!
Jeremy Matthews

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
Opensaf-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-users
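
For anyone reading a long osafimmnd trace like the one in the original report, a small script can collapse the repeated entries so that the retry loop stands out. The sketch below is only an illustration: the regular expression is inferred from the trace lines quoted in this thread (timestamp, process name, [pid:source file:line], message) and is an assumption about the format, not an official description of it. The names TRACE_RE and summarize are hypothetical.

```python
import re
from collections import Counter

# Inferred from lines such as:
#   Oct 6 9:38:35.244847 osafimmnd [2999:immnd_proc.c:0413] TR Possibly extended intro ...
# ASSUMPTION: this pattern is derived from the quoted excerpt only.
TRACE_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+"
    r"(?P<time>\d{1,2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<proc>\S+)\s+"
    r"\[(?P<pid>\d+):(?P<src>[^:\]]+):(?P<line>\d+)\]\s+"
    r"(?P<msg>.*)$"
)


def summarize(lines):
    """Group trace lines by (source file, line number, message).

    Returns a list of (count, first_time, last_time, key) tuples, most
    frequent first. Lines that do not match the pattern are ignored.
    """
    counts = Counter()
    first_seen = {}
    last_seen = {}
    for raw in lines:
        match = TRACE_RE.match(raw.strip())
        if match is None:
            continue
        key = (match.group("src"), match.group("line"), match.group("msg"))
        counts[key] += 1
        first_seen.setdefault(key, match.group("time"))
        last_seen[key] = match.group("time")
    return [(n, first_seen[k], last_seen[k], k) for k, n in counts.most_common()]
```

Feeding it the lines of /var/log/opensaf/osafimmnd (for example `summarize(open("/var/log/opensaf/osafimmnd"))`) would put the immnd_proc.c:0413 "Possibly extended intro" entry at the top, together with the first and last time it was logged, which makes it easy to see that the node kept announcing itself for the whole start-up window without getting an answer.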
