I have since opened ports 6700, 6800, and 6900 in the firewall of each node/VM, 
but still no apparent communications between SC-1 and SC-2.
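
One way to check whether the opened ports are actually reachable over TCP from another node is a small probe like the sketch below. The function names are made up for illustration, and which ports OpenSAF really listens on depends on the local dtmd.conf, so the port numbers passed in (6700, 6800, 6900, taken from this thread) are an assumption.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def probe(hosts, ports, timeout=2.0):
    """Map each (host, port) pair to True/False TCP reachability."""
    return {(h, p): tcp_port_open(h, p, timeout) for h in hosts for p in ports}
```

Run on SC-1, a call such as probe(["linux-vzbw.site"], [6700, 6800, 6900]) would report, per port, whether SC-2 accepted a connection. A False result only says the TCP connect failed; it does not distinguish a firewall drop from nothing listening on that port.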

From: Jeremy Matthews
Sent: Monday, October 10, 2016 12:38 PM
To: 'Neelakanta Reddy' <[email protected]>; 
[email protected]
Subject: RE: [users] opensaf 4.5.0 osafimmnd causes opensafd.service start 
failure on payload node

Hello again,

Sorry for the delay. Yes, there does not appear to be any communication between 
at least SC-1 and SC-2. On SC-1, I ran “tcpdump host <SC-2’s IP address>”, and 
nothing appeared.

I checked whether the firewall was blocking the traffic: I have since opened 
ports 20 to 23 on each node and restarted opensafd.service on each node. I 
have set “export MDS_TRANSPORT=TCP” in nid.conf on each node. However, I still 
get the same result: the OpenSAF processes started on SC-1 and SC-2, but failed 
on PL-3.
Should there be at least an OpenSAF heartbeat between SC-1 and SC-2?

Just to list the steps in which I set up this cluster, this is what I did:

1. On SC-1, installed as a controller:
   a. cd /usr/share/opensaf/immxml
   b. ./immxml-clustersize -s 2 -p 1
   c. Edited the third column in nodes.cfg to the actual hostnames of the 
      nodes (VMs):
        SC SC-1 linux-h8o1.site
        SC SC-2 linux-vzbw.site
        PL PL-3 linux-9qkx.site
   d. ./immxml-configure    // this created imm.xml.20161006_0900
   e. cp imm.xml.20161006_0900 /etc/opensaf/imm.xml
   f. In /etc/opensaf/dtmd.conf, changed DTM_NODE_IP to SC-1’s IP address.
   g. Added the nodes’ hostnames mapped to their IP addresses in /etc/hosts.

2. On SC-2, installed as a controller:
   a. Transferred imm.xml from SC-1 to /etc/opensaf on SC-2.
   b. Changed DTM_NODE_IP to SC-2’s IP address.
   c. Changed slot_id to 2.
   d. Added the nodes’ hostnames mapped to their IP addresses in /etc/hosts.

3. On PL-3, installed as a payload:
   a. Transferred imm.xml from SC-1 to /etc/opensaf on PL-3. I don’t think 
      that I needed to do this and have since removed imm.xml from PL-3.
   b. Changed DTM_NODE_IP to PL-3’s IP address.
   c. Changed slot_id to 3.
   d. Added the nodes’ hostnames mapped to their IP addresses in /etc/hosts.

4. Beginning with SC-1, then SC-2, and lastly PL-3, I entered “systemctl 
   start opensafd.service”. Again, it started on the controllers but not on 
   the payload.
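
Since the steps above depend on every hostname in nodes.cfg also being present in /etc/hosts on each node, a quick consistency check can be sketched as follows. The helper names are hypothetical, the nodes.cfg lines are the ones listed above, and the IP addresses in the sample hosts text are placeholders from the documentation range, not the real addresses of these VMs.

```python
def parse_nodes_cfg(text):
    """Return (node_type, node_name, hostname) tuples from nodes.cfg text."""
    nodes = []
    for line in text.splitlines():
        parts = line.split()
        # Expect exactly three columns; skip comments and malformed lines.
        if len(parts) == 3 and not line.lstrip().startswith("#"):
            nodes.append(tuple(parts))
    return nodes


def parse_hosts(text):
    """Return {hostname: ip} from /etc/hosts-style text, ignoring comments."""
    mapping = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2:
            ip, names = fields[0], fields[1:]
            for name in names:
                mapping.setdefault(name, ip)
    return mapping


def missing_hosts(nodes_cfg_text, hosts_text):
    """List hostnames that appear in nodes.cfg but have no /etc/hosts entry."""
    hosts = parse_hosts(hosts_text)
    return [h for _, _, h in parse_nodes_cfg(nodes_cfg_text) if h not in hosts]


NODES_CFG = """SC SC-1 linux-h8o1.site
SC SC-2 linux-vzbw.site
PL PL-3 linux-9qkx.site
"""

# Hypothetical hosts file that is deliberately missing the PL-3 entry.
HOSTS = """127.0.0.1 localhost
192.0.2.11 linux-h8o1.site linux-h8o1   # placeholder address
192.0.2.12 linux-vzbw.site linux-vzbw   # placeholder address
"""

if __name__ == "__main__":
    print(missing_hosts(NODES_CFG, HOSTS))  # prints ['linux-9qkx.site']
```

On a real node the two strings would be read from the nodes.cfg used for immxml-configure and from /etc/hosts; an empty list means every configured hostname has a mapping.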

Have I missed anything in this setup?

Thanks,

Jeremy

From: Neelakanta Reddy [mailto:[email protected]]
Sent: Friday, October 7, 2016 8:07 AM
To: Jeremy Matthews <[email protected]>; [email protected]
Subject: Re: [users] opensaf 4.5.0 osafimmnd causes opensafd.service start 
failure on payload node

________________________________
NOTICE: This email was received from an EXTERNAL sender
________________________________

Hi,

There seems to be a TRANSPORT problem between SC-1 and the other nodes (SC-2, 
PL-3). Neither SC-2 nor PL-3 joined SC-1.

Please check that the transport (TCP/TIPC) is working correctly between the 
nodes.

Thanks,
Neel.
On 2016/10/07 11:36 AM, Jeremy Matthews wrote:

Attached. For SC-1 and PL-3, the attachments include /var/log/messages and the 
/var/log/opensaf contents.

For SC-2, I accidentally overwrote /var/log/messages, so it has only the 
/var/log/opensaf contents.

Thank you,

Jeremy



From: Neelakanta Reddy [mailto:[email protected]]
Sent: Thursday, October 6, 2016 9:51 PM
To: Jeremy Matthews <[email protected]>; [email protected]
Subject: Re: [users] opensaf 4.5.0 osafimmnd causes opensafd.service start 
failure on payload node






Hi,

Please share the syslog from all the nodes (SC-1, SC-2, PL-3).

/Neel.



On 2016/10/06 09:04 PM, Jeremy Matthews wrote:

Hi,



I've seen this issue for a payload node in another post which was attributed to 
a configuration error which was resolved by a reboot (?).

I have rebooted my payload node, just in case, but to no effect.



The logs in /var/log/messages when issuing the "systemctl start 
opensafd.service" command:

Oct 6 09:38:35 linux-9qkx opensafd: Starting OpenSAF Services
Oct 6 09:38:35 linux-9qkx osafdtmd[2987]: Started
Oct 6 09:38:35 linux-9qkx osafimmnd[2999]: Started
Oct 6 09:40:05 linux-9qkx systemd[1]: opensafd.service operation timed out. Terminating.
Oct 6 09:40:05 linux-9qkx osafimmnd[2999]: MDTM:socket_recv() = 0, conn lost with dh server, exiting library err :Success
Oct 6 09:40:05 linux-9qkx systemd[1]: Unit opensafd.service entered failed state.
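
To make the timing in the log excerpt above explicit, the gap between "Starting OpenSAF Services" and the systemd timeout can be computed with a short sketch like this. The helper names are hypothetical, and the year 2016 is supplied only because syslog timestamps omit it.

```python
from datetime import datetime


def syslog_time(line, year=2016):
    """Parse the leading 'Oct 6 09:38:35' timestamp of a syslog line."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def seconds_between(lines, start_marker, end_marker):
    """Seconds from the first line containing start_marker to the first with end_marker."""
    start = next(syslog_time(l) for l in lines if start_marker in l)
    end = next(syslog_time(l) for l in lines if end_marker in l)
    return (end - start).total_seconds()


SYSLOG = [
    "Oct 6 09:38:35 linux-9qkx opensafd: Starting OpenSAF Services",
    "Oct 6 09:38:35 linux-9qkx osafimmnd[2999]: Started",
    "Oct 6 09:40:05 linux-9qkx systemd[1]: opensafd.service operation timed out. Terminating.",
]

if __name__ == "__main__":
    print(seconds_between(SYSLOG, "Starting OpenSAF Services", "operation timed out"))  # prints 90.0
```

The 90-second gap is consistent with systemd's usual default start timeout, although the opensafd.service unit in use may set its own value. That would suggest systemd stopped the service while osafimmnd was still waiting, rather than osafimmnd exiting on its own.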



I had enabled tracing in immnd.conf, which produced these entries in 
/var/log/opensaf/osafimmnd:



Oct 6 9:38:35.142143 osafimmnd [2999:immnd_main.c:0113] >> immnd_initialize
Oct 6 9:38:35.142188 osafimmnd [2999:osaf_secutil.c:0193] >> osaf_auth_server_create
Oct 6 9:38:35.142260 osafimmnd [2999:osaf_secutil.c:0215] << osaf_auth_server_create
Oct 6 9:38:35.142270 osafimmnd [2999:ncs_main_pub.c:0223] TR NCS:PROCESS_ID=2999
Oct 6 9:38:35.142273 osafimmnd [2999:sysf_def.c:0090] TR INITIALIZING LEAP ENVIRONMENT
Oct 6 9:38:35.142962 osafimmnd [2999:sysf_def.c:0123] TR DONE INITIALIZING LEAP ENVIRONMENT
Oct 6 9:38:35.143088 osafimmnd [2999:ncs_main_pub.c:0755] TR NCS:NODE_ID=0x0002030F
Oct 6 9:38:35.143309 osafimmnd [2999:mbcsv_dl_api.c:0059] >> mbcsv_lib_req
Oct 6 9:38:35.143318 osafimmnd [2999:mbcsv_dl_api.c:0096] >> mbcsv_lib_init
Oct 6 9:38:35.143322 osafimmnd [2999:mbcsv_mbx.c:0166] >> mbcsv_initialize_mbx_list
Oct 6 9:38:35.143324 osafimmnd [2999:mbcsv_mbx.c:0180] << mbcsv_initialize_mbx_list
Oct 6 9:38:35.143328 osafimmnd [2999:mbcsv_pwe_anc.c:0162] >> mbcsv_initialize_peer_list
Oct 6 9:38:35.143331 osafimmnd [2999:mbcsv_pwe_anc.c:0176] << mbcsv_initialize_peer_list
Oct 6 9:38:35.143332 osafimmnd [2999:mbcsv_dl_api.c:0075] << mbcsv_lib_req
Oct 6 9:38:35.143334 osafimmnd [2999:ncs_main_pub.c:0393] TR MBCSV:MBCA:ON
Oct 6 9:38:35.143342 osafimmnd [2999:immnd_main.c:0187] T2 Dir:/etc/opensaf File:imm.xml ExpectedNodes:3 WaitSecs:3
Oct 6 9:38:35.143352 osafimmnd [2999:immnd_mds.c:0127] >> immnd_mds_register
Oct 6 9:38:35.143457 osafimmnd [2999:immnd_mds.c:0192] T2 cb->node_id:2030f
Oct 6 9:38:35.143461 osafimmnd [2999:immnd_mds.c:0194] << immnd_mds_register
Oct 6 9:38:35.143469 osafimmnd [2999:immnd_main.c:0238] << immnd_initialize
Oct 6 9:38:35.143478 osafimmnd [2999:osaf_secutil.c:0166] >> auth_server_main
Oct 6 9:38:35.244792 osafimmnd [2999:ImmModel.cc:3381] << protocol43Allowed
Oct 6 9:38:35.244836 osafimmnd [2999:immnd_proc.c:1626] T5 tmout:100 ste:1 ME:0 RE:0 crd:0 rim:FROM_FILE 4.3A:0 2Pbe:0 VetA/B: 0/0 othsc:0/0
Oct 6 9:38:35.244847 osafimmnd [2999:immnd_proc.c:0413] TR Possibly extended intro from this IMMND pbeEnabled: 2 dirsize:0
Oct 6 9:38:35.344974 osafimmnd [2999:immnd_proc.c:0413] TR Possibly extended intro from this IMMND pbeEnabled: 2 dirsize:0
Oct 6 9:38:35.445934 osafimmnd [2999:immnd_proc.c:0413] TR Possibly extended intro from this IMMND pbeEnabled: 2 dirsize:0
Oct 6 9:38:35.546974 osafimmnd [2999:immnd_proc.c:0413] TR Possibly extended intro from this IMMND pbeEnabled: 2 dirsize:0
...
Oct 6 9:40:04.794307 osafimmnd [2999:immnd_proc.c:0413] TR Possibly extended intro from this IMMND pbeEnabled: 2 dirsize:0
Oct 6 9:40:04.895424 osafimmnd [2999:immnd_proc.c:0413] TR Possibly extended intro from this IMMND pbeEnabled: 2 dirsize:0
Oct 6 9:40:04.996499 osafimmnd [2999:immnd_proc.c:0413] TR Possibly extended intro from this IMMND pbeEnabled: 2 dirsize:0
Oct 6 9:40:05.081315 osafimmnd [2999:mds_dt_trans.c:0671] >> mdtm_process_poll_recv_data_tcp



The start of opensafd.service eventually timed out and failed. It appears that 
the function immnd_introduceMe in immnd_proc.c fails continually. If the 
problem is due to pbe, I don't understand why that would happen on a payload 
node; I thought pbe ran only on system controller nodes.
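
The repetition in the osafimmnd trace can be quantified with a sketch like the one below, which measures the gap between consecutive "Possibly extended intro" entries. The helper names are hypothetical, the sample lines are copied from the trace above, and reading the trace's tmout:100 value as milliseconds is an assumption on my part.

```python
from datetime import datetime


def trace_time(line, year=2016):
    """Parse the leading 'Oct 6 9:38:35.244847' timestamp of a trace line."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S.%f")


def retry_intervals(lines, marker="Possibly extended intro"):
    """Return the gaps in seconds between consecutive lines containing marker."""
    times = [trace_time(l) for l in lines if marker in l]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


PREFIX = "osafimmnd [2999:immnd_proc.c:0413] TR Possibly extended intro from this IMMND"
TRACE = [
    f"Oct 6 9:38:35.244847 {PREFIX} pbeEnabled: 2 dirsize:0",
    f"Oct 6 9:38:35.344974 {PREFIX} pbeEnabled: 2 dirsize:0",
    f"Oct 6 9:38:35.445934 {PREFIX} pbeEnabled: 2 dirsize:0",
    f"Oct 6 9:38:35.546974 {PREFIX} pbeEnabled: 2 dirsize:0",
]

if __name__ == "__main__":
    for gap in retry_intervals(TRACE):
        print(f"{gap:.3f} s")
```

The gaps come out at roughly 0.1 s each, which would fit a retry on every 100 ms timer tick until systemd gives up, rather than a single hard failure.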



This is a 3-node cluster with SC-1, SC-2, and PL-3. The controller nodes (SC-1, 
SC-2) start up okay, but the payload node (PL-3) does not.

These nodes are running on openSUSE 12.1 VirtualBox VMs.

I have no application interacting with OpenSAF, just OpenSAF itself installed.



Any assistance on this would be appreciated. Thanks in advance!





Jeremy Matthews





------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
Opensaf-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-users


