Thanks. For the immd and immnd logs, please see: https://www.dropbox.com/s/n0hhzsmfi7p47fc/imm-log.zip?dl=0
Sent from Windows Mail

From: 'Neelakanta Reddy' <[email protected]>
Sent: Monday, May 4, 2015 10:41 PM
To: Yao Cheng LIANG <[email protected]>, [email protected]

Hi,

I suspect that link toggling is happening: the active IMMD detected the MDS down and broadcast the PL-4 node-down event.

To confirm the link loss, more logs are required from the active controller:
1. IMMD traces
2. IMMND traces

/Neel.

On Monday 04 May 2015 07:44 PM, Yao Cheng LIANG wrote:
It is here: https://www.dropbox.com/s/5ogpxqyv71ufhbw/messages.zip?dl=0

Sent from Windows Mail

From: 'Neelakanta Reddy' <[email protected]>
Sent: Monday, May 4, 2015 10:02 PM
To: Yao Cheng LIANG <[email protected]>, [email protected]

Hi,

Please share the syslog of the active controller.

/Neel.

On Monday 04 May 2015 07:23 PM, Yao Cheng LIANG wrote:
This is strange. We tried a few times, and it is impossible that a link loss or link toggling happened exactly when we started opensafd on the payload. This payload is running MIPS with Busybox. If I run the payload on another x86 machine there is no such issue, and that payload connects to the controller through the same switch as the MIPS one.

Thanks.
Ted

Sent from Windows Mail

From: 'Neelakanta Reddy' <[email protected]>
Sent: Monday, May 4, 2015 9:31 PM
To: Yao Cheng LIANG <[email protected]>, [email protected]

Hi,

Comments in-line.

/Neel.

On Monday 04 May 2015 05:33 PM, Yao Cheng LIANG wrote:
The error was caused by the line below, which is logged when the message queue service registers with IMMND. Please see this line in the immnd log:

Jan 1 8:06:49.884694 osafimmnd [936:immnd_evt.c:0729] WA immnd_evt_proc_imm_init: PID 0 (1012) for 2040f000003f4, MDS problem?

[Neel] This happens when the node receives a node-down event for itself. When a link loss or link toggling happens, SC-1 (the active controller) detects the link loss and sends an IMMND-down message; by the time the message is delivered, the link has come up again and PL-3 receives the message. Verify the link-loss messages in the active controller's syslog.

From: Neelakanta Reddy [mailto:[email protected]]
Sent: Monday, May 04, 2015 6:07 PM
To: Yao Cheng LIANG; [email protected]
Subject: Re: [users] Fw: log

Hi,

Please share the syslog and the osafimmnd traces available under /var/log/opensaf on all the nodes.

To enable immnd traces, uncomment the line below in /etc/opensaf/immnd.conf on every node of the cluster:

# Uncomment the next line to enable trace
args="--tracemask=0xffffffff"

/Neel.
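A minimal shell sketch of applying the above on one node, assuming the stock file locations mentioned in the thread (/etc/opensaf/immnd.conf and /var/log/opensaf); the sed pattern, the assumption that the line ships commented out as "#args=...", and the opensafd init-script path are assumptions and may need adjusting on a Busybox/MIPS payload:

# Uncomment the tracemask line in the IMMND configuration (repeat on every node).
# Assumes the stock file carries the line commented out as "#args=...".
sed -i 's|^#args="--tracemask=0xffffffff"|args="--tracemask=0xffffffff"|' /etc/opensaf/immnd.conf

# Restart OpenSAF so osafimmnd picks up the new argument (assumed init-script path).
/etc/init.d/opensafd restart

# The osafimmnd traces should then accumulate under /var/log/opensaf on each node.
ls -l /var/log/opensaf/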
On Monday 04 May 2015 03:20 PM, Yao Cheng LIANG wrote:
Nope. All nodes upgraded to 4.6.
/Ted

Sent from Samsung Mobile

-------- Original message --------
From: Neelakanta Reddy
Date: 2015/05/04 5:03 PM (GMT+08:00)
To: [email protected]
Subject: Re: [users] Fw: log

Hi,

Two controllers are upgraded to 4.6 while the payloads still run 4.2. In general, the upgrade of the nodes must be a rolling upgrade. If some nodes are upgraded manually and some nodes are still on older releases, then the IMM flags need to be toggled accordingly.

Comments below.

/Neel.

On Monday 04 May 2015 04:55 AM, Yao Cheng LIANG wrote:
> Dear all,
>
> I recently upgraded my OpenSAF from 4.2.2 to 4.6.0 for the checkpoint service performance improvement. I have successfully started it on both controllers, but I cannot do the same thing on the payload. From the log below, taken on the payload node, it seems OpenSAF started successfully but later shut itself down because of some errors. I am using the imm.xml from my 4.2.2 installation. Can anyone help?
>
> Thanks.
>
> Ted
>
> Jan 1 08:08:11 (none) user.notice opensafd: Starting OpenSAF Services (Using TCP)
> Jan 1 08:08:11 (none) local0.notice osafdtmd[914]: Started
> Jan 1 08:08:11 (none) local0.notice osafimmnd[931]: Started
> Jan 1 08:08:11 (none) local0.notice osafdtmd[914]: NO Established contact with 'WR20-64_32'
> Jan 1 08:08:11 (none) local0.notice osafdtmd[914]: NO Established contact with 'WR20-64_25'
> Jan 1 08:08:11 (none) local0.notice osafimmnd[931]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
> Jan 1 08:08:11 (none) local0.notice osafimmnd[931]: NO SERVER STATE: IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
> Jan 1 08:08:11 (none) local0.notice osafimmnd[931]: NO SERVER STATE: IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
> Jan 1 08:08:11 (none) local0.notice osafimmnd[931]: NO NODE STATE-> IMM_NODE_ISOLATED
> Jan 1 08:08:12 (none) local0.notice osafimmnd[931]: NO NODE STATE-> IMM_NODE_W_AVAILABLE
> Jan 1 08:08:12 (none) local0.notice osafimmnd[931]: NO SERVER STATE: IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT
> Jan 1 08:08:12 (none) local0.notice osafimmnd[931]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 2578
> Jan 1 08:08:12 (none) local0.notice osafimmnd[931]: NO RepositoryInitModeT is SA_IMM_INIT_FROM_FILE
> Jan 1 08:08:12 (none) local0.warn osafimmnd[931]: WA IMM Access Control mode is DISABLED!
> Jan 1 08:08:12 (none) local0.notice osafimmnd[931]: NO Epoch set to 18 in ImmModel
> Jan 1 08:08:12 (none) local0.notice osafimmnd[931]: NO SERVER STATE: IMM_SERVER_SYNC_CLIENT --> IMM SERVER READY
> Jan 1 08:08:12 (none) local0.notice osafclmna[943]: Started
> Jan 1 08:08:12 (none) local0.notice osafclmna[943]: NO safNode=PL-4,safCluster=myClmCluster Joined cluster, nodeid=a040f
> Jan 1 08:08:13 (none) local0.notice osafamfnd[953]: Started
> Jan 1 08:08:13 (none) local0.notice osafamfnd[953]: NO 'safSu=PL-4,safSg=NoRed,safApp=OpenSAF' Presence State UNINSTANTIATED => INSTANTIATING
> Jan 1 08:08:13 (none) local0.notice osafsmfnd[964]: Started
> Jan 1 08:08:13 (none) local0.notice osafmsgnd[974]: Started
> Jan 1 08:08:13 (none) local0.notice osafimmnd[931]: NO Implementer connected: 38 (MsgQueueService656399) <51, a040f>

[Neel] IMMND asserted and restarted again. This is because of some information added in 4.6 which may not be compatible with older releases, since the cluster is running mixed versions. Please go through osaf/services/saf/immsv/README (particularly "Notes on upgrading from OpenSAF 4.[1,2,3,4,5] to OpenSAF 4.6"). Once all nodes of the cluster are upgraded, the flags mentioned there need to be toggled on.
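As a point of reference only: such IMM feature flags are normally toggled with OpenSAF's immcfg/immlist admin tools against the IMM service's configuration object once every node runs 4.6. A minimal sketch follows; the attribute name and value shown are assumptions for illustration, and the exact flag must be taken from the README section named above:

# Inspect the IMM service configuration object first.
immlist opensafImm=opensafImm,safApp=safImmService

# Toggle the upgrade-related flag described in the README.
# NOTE: "opensafImmNostdFlags=1" is an assumed attribute/value used only as an
# illustration; use the exact name and value documented in the README for 4.6.
immcfg -a opensafImmNostdFlags=1 opensafImm=opensafImm,safApp=safImmService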
> Jan 1 08:08:13 (none) local0.notice osafimmnd[986]: Started
> Jan 1 08:08:13 (none) local0.notice osafimmnd[986]: NO Fevs count adjusted to 5871 preLoadPid: 0
> Jan 1 08:08:13 (none) local0.notice osaflcknd[997]: Started
> Jan 1 08:08:13 (none) local0.notice osafckptnd[1007]: Started
> Jan 1 08:08:13 (none) local0.notice osafimmnd[986]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
> Jan 1 08:08:13 (none) local0.notice osafamfwd[1018]: Started
> Jan 1 08:08:13 (none) local0.notice osafamfnd[953]: NO 'safSu=PL-4,safSg=NoRed,safApp=OpenSAF' Presence State INSTANTIATING => INSTANTIATED
> Jan 1 08:08:13 (none) local0.notice osafamfnd[953]: NO Assigning 'safSi=NoRed10,safApp=OpenSAF' ACTIVE to 'safSu=PL-4,safSg=NoRed,safApp=OpenSAF'
> Jan 1 08:08:13 (none) local0.notice osafamfnd[953]: NO Assigned 'safSi=NoRed10,safApp=OpenSAF' ACTIVE to 'safSu=PL-4,safSg=NoRed,safApp=OpenSAF'
> Jan 1 08:08:13 (none) local0.notice osafimmnd[986]: NO mds_register_callback: dest a040f000003b9 already exist
> Jan 1 08:08:13 (none) user.notice opensafd: OpenSAF(4.6.0 - 6467:3561f9d06464) services successfully started
> Jan 1 08:08:13 (none) local0.notice osafimmnd[986]: NO mds_register_callback: dest a040f000003b9 already exist
> Jan 1 08:08:14 (none) local0.notice osafimmnd[986]: NO mds_register_callback: dest a040f000003b9 already exist
> Jan 1 08:08:14 (none) local0.notice osafimmnd[986]: NO mds_register_callback: dest a040f000003b9 already exist
> Jan 1 08:08:15 (none) local0.notice osafimmnd[986]: NO mds_register_callback: dest a040f000003b9 already exist
> Jan 1 08:08:15 (none) local0.notice osafimmnd[986]: NO mds_register_callback: dest a040f000003b9 already exist
> Jan 1 08:08:15 (none) local0.notice osafimmnd[986]: NO mds_register_callback: dest a040f000003b9 already exist
> Jan 1 08:08:16 (none) local0.notice osafimmnd[986]: NO mds_register_callback: dest a040f000003b9 already exist
> Jan 1 08:08:16 (none) local0.notice osafimmnd[986]: NO mds_register_callback: dest a040f000003b9 already exist
> Jan 1 08:08:17 (none) local0.notice osafimmnd[986]: NO mds_register_callback: dest a040f000003b9 already exist
> Jan 1 08:08:17 (none) local0.notice osafimmnd[986]: NO mds_register_callback: dest a040f000003b9 already exist
> Jan 1 08:08:17 (none) local0.notice osafimmnd[986]: NO mds_register_callback: dest a040f000003b9 already exist
> Jan 1 08:08:18 (none) local0.warn osafimmnd[986]: WA Resending introduce-me - problems with MDS ? 5
> Jan 1 08:08:18 (none) local0.warn osafimmnd[986]: WA Resending introduce-me - problems with MDS ? 5
> Jan 1 08:08:18 (none) local0.warn osafimmnd[986]: WA Resending introduce-me - problems with MDS ? 5
> Jan 1 08:08:18 (none) local0.notice osafimmnd[986]: NO mds_register_callback: dest a040f000003b9 already exist
> Jan 1 08:08:18 (none) local0.warn osafimmnd[986]: WA Resending introduce-me - problems with MDS ? 5
> Jan 1 08:08:18 (none) local0.warn osafimmnd[986]: WA Resending introduce-me - problems with MDS ? 5
> Jan 1 08:08:18 (none) local0.warn osafimmnd[986]: WA Resending introduce-me - problems with MDS ? 5
> Jan 1 08:08:18 (none) local0.warn osafimmnd[986]: WA Resending introduce-me - problems with MDS ? 5
> Jan 1 08:08:18 (none) local0.warn osafimmnd[986]: WA Resending introduce-me - problems with MDS ? 5
> Jan 1 08:08:18 (none) local0.notice osafimmnd[986]: NO mds_register_callback: dest a040f000003b9 already exist
> Jan 1 08:08:18 (none) local0.warn osafimmnd[986]: WA Resending introduce-me - problems with MDS ? 5
> Jan 1 08:08:18 (none) local0.warn osafimmnd[986]: WA Resending introduce-me - problems with MDS ? 5
> Jan 1 08:08:19 (none) local0.notice osafimmnd[986]: NO mds_register_callback: dest a040f000003b9 already exist
> Jan 1 08:08:19 (none) local0.notice osafimmnd[986]: NO mds_register_callback: dest a040f000003b9 already exist
> Jan 1 08:08:20 (none) local0.notice osafimmnd[986]: NO mds_register_callback: dest a040f000003b9 already exist
> Jan 1 08:08:20 (none) local0.notice osafimmnd[986]: NO mds_register_callback: dest a040f000003b9 already exist
> Jan 1 08:08:20 (none) local0.notice osafimmnd[986]: NO mds_register_callback: dest a040f000003b9 already exist
> Jan 1 08:08:21 (none) local0.notice osafimmnd[986]: NO mds_register_callback: dest a040f000003b9 already exist
> Jan 1 08:08:21 (none) local0.notice osafimmnd[986]: NO mds_register_callback: dest a040f000003b9 already exist
> Jan 1 08:08:22 (none) local0.notice osafimmnd[986]: NO mds_register_callback: dest a040f000003b9 already exist
> Jan 1 08:08:22 (none) local0.notice osafimmnd[986]: NO mds_register_callback: dest a040f000003b9 already exist
> Jan 1 08:08:22 (none) local0.notice osafimmnd[986]: NO mds_register_callback: dest a040f000003b9 already exist
> Jan 1 08:08:23 (none) local0.warn osafimmnd[986]: WA Resending introduce-me - problems with MDS ? 10
> Jan 1 08:08:23 (none) local0.warn osafimmnd[986]: WA Resending introduce-me - problems with MDS ? 10
> Jan 1 08:08:23 (none) local0.notice osafimmnd[986]: NO mds_register_callback: dest a040f000003b9 already exist
> Jan 1 08:08:23 (none) local0.err osafamfnd[953]: saImmOmInitialize FAILED, rc = 6
> Jan 1 08:08:23 (none) local0.alert osafimmnd[986]: AL AMF Node Director is down, terminate this process
> Jan 1 08:08:23 (none) local0.alert osaflcknd[997]: AL AMF Node Director is down, terminate this process
> Jan 1 08:08:23 (none) local0.crit osafamfwd[1018]: Rebooting OpenSAF NodeId = 0 EE Name = No EE Mapped, Reason: AMF unexpectedly crashed, OwnNodeId = 656399, SupervisionTime = 60
> Jan 1 08:08:23 (none) local0.notice osaflcknd[997]: exiting for shutdown
> Jan 1 08:08:23 (none) local0.alert osafsmfnd[964]: AL AMF Node Director is down, terminate this process
> Jan 1 08:08:23 (none) local0.alert osafckptnd[1007]: AL AMF Node Director is down, terminate this process
> Jan 1 08:08:23 (none) local0.notice osafsmfnd[964]: exiting for shutdown
> Jan 1 08:08:23 (none) local0.alert osafmsgnd[974]: AL AMF Node Director is down, terminate this process
> Jan 1 08:08:23 (none) local0.notice osafckptnd[1007]: exiting for shutdown
> Jan 1 08:08:23 (none) local0.notice osafmsgnd[974]: exiting for shutdown
> Jan 1 08:08:23 (none) local0.notice osafimmnd[986]: exiting for shutdown
> Jan 1 08:08:23 (none) local0.notice osafimmnd[931]: NO Implementer locally disconnected. Marking it as doomed 38 <51, a040f> (MsgQueueService656399)
> Jan 1 08:08:23 (none) local0.err osafimmnd[931]: ER immnd_evt_proc_discard_node for *this* node 656399 => Cluster partitioned ("split brain") - exiting
> Jan 1 08:08:23 (none) user.notice opensaf_reboot: Rebooting local node; timeout=60

_______________________________________________
Opensaf-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-users
