Hi,

I suspect that link toggling is happening: the active IMMD detected the 
MDS down and broadcast a PL-4 node-down event.

To confirm the link loss, more logs are required from the active 
controller (see the sketch below for enabling them):
1. IMMD traces
2. IMMND traces
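
A minimal sketch for enabling both, assuming osafimmd honours the same 
args="--tracemask=..." convention as osafimmnd (the immd.conf line is an 
assumption; check the comments in your own conf files):

# In /etc/opensaf/immd.conf and /etc/opensaf/immnd.conf on the active
# controller, uncomment (or add) the trace line:
args="--tracemask=0xffffffff"

# Then restart OpenSAF so the directors pick up the new mask:
/etc/init.d/opensafd restart

The resulting traces are written under /var/log/opensaf.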


/Neel.

On Monday 04 May 2015 07:44 PM, Yao Cheng LIANG wrote:
> It is here:
> https://www.dropbox.com/s/5ogpxqyv71ufhbw/messages.zip?dl=0
>
> Sent from Windows Mail
>
> *From:* 'Neelakanta Reddy' <[email protected]>
> *Sent:* Monday, May 4, 2015 10:02 PM
> *To:* Yao Cheng LIANG <[email protected]>, 
> [email protected]
>
> Hi,
>
> Please share the syslog of the active controller.
>
> /Neel.
>
> On Monday 04 May 2015 07:23 PM, Yao Cheng LIANG wrote:
>
>     This is strange. We tried a few times, and it is impossible that
>     link loss or link toggling happened at the same moment we started
>     opensafd on the payload. This payload is running MIPS with
>     BusyBox. If I run the payload on another x86 node there is no such
>     issue, and that payload connects to the controller through the
>     same switch as the MIPS one.
>
>     Thanks.
>
>     Ted
>
>
>
>     Sent from Windows Mail
>
>     *From:* 'Neelakanta Reddy' <[email protected]>
>     *Sent:* Monday, May 4, 2015 9:31 PM
>     *To:* Yao Cheng LIANG <[email protected]>,
>     [email protected]
>
>     Hi,
>
>     comments in-line.
>
>     /Neel.
>     On Monday 04 May 2015 05:33 PM, Yao Cheng LIANG wrote:
>
>         The error was caused by the line below, when the message
>         queue service registered with the IMMND. Please see this line
>         in the IMMND log:
>
>         Jan  1  8:06:49.884694 osafimmnd [936:immnd_evt.c:0729] WA
>         immnd_evt_proc_imm_init: PID 0 (1012) for 2040f000003f4, MDS
>         problem?
>
>
>     This happens when the node receives a node-down event for itself.
>     When link loss or link toggling occurs, SC-1 (the active
>     controller) detects the link loss and sends an IMMND-down message;
>     by the time the message is sent, the link has been re-established,
>     so the payload receives the message.
>
>     Verify the link-loss messages in the active controller's syslog.
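>
>     A quick way to spot them, assuming osafdtmd reports link changes
>     to syslog the same way it reports "Established contact" at
>     startup (the exact lost-contact wording is an assumption; widen
>     the pattern if nothing matches):
>
>     # on the active controller
>     grep -Ei 'osafdtmd|contact' /var/log/messages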
>
>         *From:*Neelakanta Reddy [mailto:[email protected]]
>         *Sent:* Monday, May 04, 2015 6:07 PM
>         *To:* Yao Cheng LIANG; [email protected]
>         *Subject:* Re: [users] Fw: log
>
>         Hi,
>
>         Please share the syslog and the osafimmnd traces available
>         under /var/log/opensaf on all the nodes.
>
>         To enable IMMND traces, uncomment the line below in
>         /etc/opensaf/immnd.conf on all the nodes of the cluster:
>
>         # Uncomment the next line to enable trace
>         args="--tracemask=0xffffffff"
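>
>         After saving the change, restart OpenSAF on each node so the
>         new trace mask takes effect (a sketch; the stock init-script
>         name is assumed):
>
>         /etc/init.d/opensafd restart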
>
>         /Neel.
>
>         On Monday 04 May 2015 03:20 PM, Yao Cheng LIANG wrote:
>
>             Nope. All nodes upgraded to 4.6. /Ted
>
>             Sent from Samsung Mobile
>
>
>
>             -------- Original message --------
>             From: Neelakanta Reddy
>             Date: 2015/05/04 5:03 PM (GMT+08:00)
>             To: [email protected]
>             Subject: Re: [users] Fw: log
>
>             Hi,
>
>             The two controllers have been upgraded to 4.6 while the
>             payloads are still on 4.2. In general, upgrading the
>             nodes must be done as a rolling upgrade. If some nodes
>             are upgraded manually while others are still on older
>             releases, then the IMM flags need to be toggled
>             accordingly.
>
>             comments below.
>
>             /Neel.
>
>             On Monday 04 May 2015 04:55 AM, Yao Cheng LIANG wrote:
>             > Dear all,
>             >
>             > I recently upgraded my OpenSAF from 4.2.2 to 4.6.0 for
>             the checkpoint service performance improvement. It
>             started successfully on both controllers, but I cannot do
>             the same on the payload. From the log below from the
>             payload node, it seems OpenSAF started successfully but
>             later shut itself down due to some errors. I am using the
>             imm.xml from my 4.2.2 version. Can anyone help?
>             >
>             > Thanks.
>             >
>             > Ted
>             >
>             > Jan  1 08:08:11 (none) user.notice opensafd: Starting
>             OpenSAF Services (Using TCP)
>             > Jan  1 08:08:11 (none) local0.notice osafdtmd[914]: Started
>             > Jan  1 08:08:11 (none) local0.notice osafimmnd[931]: Started
>             > Jan  1 08:08:11 (none) local0.notice osafdtmd[914]: NO
>             Established contact with 'WR20-64_32'
>             > Jan  1 08:08:11 (none) local0.notice osafdtmd[914]: NO
>             Established contact with 'WR20-64_25'
>             > Jan  1 08:08:11 (none) local0.notice osafimmnd[931]: NO
>             SERVER STATE: IMM_SERVER_ANONYMOUS -->
>             IMM_SERVER_CLUSTER_WAITING
>             > Jan  1 08:08:11 (none) local0.notice osafimmnd[931]: NO
>             SERVER STATE: IMM_SERVER_CLUSTER_WAITING -->
>             IMM_SERVER_LOADING_PENDING
>             > Jan  1 08:08:11 (none) local0.notice osafimmnd[931]: NO
>             SERVER STATE: IMM_SERVER_LOADING_PENDING -->
>             IMM_SERVER_SYNC_PENDING
>             > Jan  1 08:08:11 (none) local0.notice osafimmnd[931]: NO
>             NODE STATE-> IMM_NODE_ISOLATED
>             > Jan  1 08:08:12 (none) local0.notice osafimmnd[931]: NO
>             NODE STATE-> IMM_NODE_W_AVAILABLE
>             > Jan  1 08:08:12 (none) local0.notice osafimmnd[931]: NO
>             SERVER STATE: IMM_SERVER_SYNC_PENDING -->
>             IMM_SERVER_SYNC_CLIENT
>             > Jan  1 08:08:12 (none) local0.notice osafimmnd[931]: NO
>             NODE STATE-> IMM_NODE_FULLY_AVAILABLE 2578
>             > Jan  1 08:08:12 (none) local0.notice osafimmnd[931]: NO
>             RepositoryInitModeT is SA_IMM_INIT_FROM_FILE
>             > Jan  1 08:08:12 (none) local0.warn osafimmnd[931]: WA
>             IMM Access Control mode is DISABLED!
>             > Jan  1 08:08:12 (none) local0.notice osafimmnd[931]: NO
>             Epoch set to 18 in ImmModel
>             > Jan  1 08:08:12 (none) local0.notice osafimmnd[931]: NO
>             SERVER STATE: IMM_SERVER_SYNC_CLIENT --> IMM SERVER READY
>             > Jan  1 08:08:12 (none) local0.notice osafclmna[943]: Started
>             > Jan  1 08:08:12 (none) local0.notice osafclmna[943]: NO
>             safNode=PL-4,safCluster=myClmCluster Joined cluster,
>             nodeid=a040f
>             > Jan  1 08:08:13 (none) local0.notice osafamfnd[953]: Started
>             > Jan  1 08:08:13 (none) local0.notice osafamfnd[953]: NO
>             'safSu=PL-4,safSg=NoRed,safApp=OpenSAF' Presence State
>             UNINSTANTIATED => INSTANTIATING
>             > Jan  1 08:08:13 (none) local0.notice osafsmfnd[964]: Started
>             > Jan  1 08:08:13 (none) local0.notice osafmsgnd[974]: Started
>             > Jan  1 08:08:13 (none) local0.notice osafimmnd[931]: NO
>             Implementer connected: 38 (MsgQueueService656399) <51, a040f>
>             IMMND asserted and restarted. This is because some
>             information added in 4.6 may not be compatible with older
>             releases, since the cluster is running mixed versions.
>
>             Please go through osaf/services/saf/immsv/README
>             (particularly the notes on upgrading from OpenSAF
>             4.[1,2,3,4,5] to OpenSAF 4.6).
>
>             Once all nodes in the cluster are upgraded, the flags
>             mentioned there need to be toggled on.
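>
>             For reference, a hedged sketch of the toggle via the
>             nostd-flags admin operation described in that README (the
>             operation id and the flag value are assumptions from
>             memory/placeholders; verify both against the README
>             first):
>
>             # toggle ON the non-standard flags once every node runs 4.6
>             immadm -o 1 -p opensafImmNostdFlags:SA_UINT32_T:<flag-from-README> \
>                 opensafImm=opensafImm,safApp=safImmService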
>
>
>             > Jan  1 08:08:13 (none) local0.notice osafimmnd[986]: Started
>             > Jan  1 08:08:13 (none) local0.notice osafimmnd[986]: NO
>             Fevs count adjusted to 5871 preLoadPid: 0
>             > Jan  1 08:08:13 (none) local0.notice osaflcknd[997]: Started
>             > Jan  1 08:08:13 (none) local0.notice osafckptnd[1007]:
>             Started
>             > Jan  1 08:08:13 (none) local0.notice osafimmnd[986]: NO
>             SERVER STATE: IMM_SERVER_ANONYMOUS -->
>             IMM_SERVER_CLUSTER_WAITING
>             > Jan  1 08:08:13 (none) local0.notice osafamfwd[1018]:
>             Started
>             > Jan  1 08:08:13 (none) local0.notice osafamfnd[953]: NO
>             'safSu=PL-4,safSg=NoRed,safApp=OpenSAF' Presence State
>             INSTANTIATING => INSTANTIATED
>             > Jan  1 08:08:13 (none) local0.notice osafamfnd[953]: NO
>             Assigning 'safSi=NoRed10,safApp=OpenSAF' ACTIVE to
>             'safSu=PL-4,safSg=NoRed,safApp=OpenSAF'
>             > Jan  1 08:08:13 (none) local0.notice osafamfnd[953]: NO
>             Assigned 'safSi=NoRed10,safApp=OpenSAF' ACTIVE to
>             'safSu=PL-4,safSg=NoRed,safApp=OpenSAF'
>             > Jan  1 08:08:13 (none) local0.notice osafimmnd[986]: NO
>             mds_register_callback: dest a040f000003b9 already exist
>             > Jan  1 08:08:13 (none) user.notice opensafd:
>             OpenSAF(4.6.0 - 6467:3561f9d06464) services successfully
>             started
>             > Jan  1 08:08:13 (none) local0.notice osafimmnd[986]: NO
>             mds_register_callback: dest a040f000003b9 already exist
>             > Jan  1 08:08:14 (none) local0.notice osafimmnd[986]: NO
>             mds_register_callback: dest a040f000003b9 already exist
>             > Jan  1 08:08:14 (none) local0.notice osafimmnd[986]: NO
>             mds_register_callback: dest a040f000003b9 already exist
>             > Jan  1 08:08:15 (none) local0.notice osafimmnd[986]: NO
>             mds_register_callback: dest a040f000003b9 already exist
>             > Jan  1 08:08:15 (none) local0.notice osafimmnd[986]: NO
>             mds_register_callback: dest a040f000003b9 already exist
>             > Jan  1 08:08:15 (none) local0.notice osafimmnd[986]: NO
>             mds_register_callback: dest a040f000003b9 already exist
>             > Jan  1 08:08:16 (none) local0.notice osafimmnd[986]: NO
>             mds_register_callback: dest a040f000003b9 already exist
>             > Jan  1 08:08:16 (none) local0.notice osafimmnd[986]: NO
>             mds_register_callback: dest a040f000003b9 already exist
>             > Jan  1 08:08:17 (none) local0.notice osafimmnd[986]: NO
>             mds_register_callback: dest a040f000003b9 already exist
>             > Jan  1 08:08:17 (none) local0.notice osafimmnd[986]: NO
>             mds_register_callback: dest a040f000003b9 already exist
>             > Jan  1 08:08:17 (none) local0.notice osafimmnd[986]: NO
>             mds_register_callback: dest a040f000003b9 already exist
>             > Jan  1 08:08:18 (none) local0.warn osafimmnd[986]: WA
>             Resending introduce-me - problems with MDS ? 5
>             > Jan  1 08:08:18 (none) local0.warn osafimmnd[986]: WA
>             Resending introduce-me - problems with MDS ? 5
>             > Jan  1 08:08:18 (none) local0.warn osafimmnd[986]: WA
>             Resending introduce-me - problems with MDS ? 5
>             > Jan  1 08:08:18 (none) local0.notice osafimmnd[986]: NO
>             mds_register_callback: dest a040f000003b9 already exist
>             > Jan  1 08:08:18 (none) local0.warn osafimmnd[986]: WA
>             Resending introduce-me - problems with MDS ? 5
>             > Jan  1 08:08:18 (none) local0.warn osafimmnd[986]: WA
>             Resending introduce-me - problems with MDS ? 5
>             > Jan  1 08:08:18 (none) local0.warn osafimmnd[986]: WA
>             Resending introduce-me - problems with MDS ? 5
>             > Jan  1 08:08:18 (none) local0.warn osafimmnd[986]: WA
>             Resending introduce-me - problems with MDS ? 5
>             > Jan  1 08:08:18 (none) local0.warn osafimmnd[986]: WA
>             Resending introduce-me - problems with MDS ? 5
>             > Jan  1 08:08:18 (none) local0.notice osafimmnd[986]: NO
>             mds_register_callback: dest a040f000003b9 already exist
>             > Jan  1 08:08:18 (none) local0.warn osafimmnd[986]: WA
>             Resending introduce-me - problems with MDS ? 5
>             > Jan  1 08:08:18 (none) local0.warn osafimmnd[986]: WA
>             Resending introduce-me - problems with MDS ? 5
>             > Jan  1 08:08:19 (none) local0.notice osafimmnd[986]: NO
>             mds_register_callback: dest a040f000003b9 already exist
>             > Jan  1 08:08:19 (none) local0.notice osafimmnd[986]: NO
>             mds_register_callback: dest a040f000003b9 already exist
>             > Jan  1 08:08:20 (none) local0.notice osafimmnd[986]: NO
>             mds_register_callback: dest a040f000003b9 already exist
>             > Jan  1 08:08:20 (none) local0.notice osafimmnd[986]: NO
>             mds_register_callback: dest a040f000003b9 already exist
>             > Jan  1 08:08:20 (none) local0.notice osafimmnd[986]: NO
>             mds_register_callback: dest a040f000003b9 already exist
>             > Jan  1 08:08:21 (none) local0.notice osafimmnd[986]: NO
>             mds_register_callback: dest a040f000003b9 already exist
>             > Jan  1 08:08:21 (none) local0.notice osafimmnd[986]: NO
>             mds_register_callback: dest a040f000003b9 already exist
>             > Jan  1 08:08:22 (none) local0.notice osafimmnd[986]: NO
>             mds_register_callback: dest a040f000003b9 already exist
>             > Jan  1 08:08:22 (none) local0.notice osafimmnd[986]: NO
>             mds_register_callback: dest a040f000003b9 already exist
>             > Jan  1 08:08:22 (none) local0.notice osafimmnd[986]: NO
>             mds_register_callback: dest a040f000003b9 already exist
>             > Jan  1 08:08:23 (none) local0.warn osafimmnd[986]: WA
>             Resending introduce-me - problems with MDS ? 10
>             > Jan  1 08:08:23 (none) local0.warn osafimmnd[986]: WA
>             Resending introduce-me - problems with MDS ? 10
>             > Jan  1 08:08:23 (none) local0.notice osafimmnd[986]: NO
>             mds_register_callback: dest a040f000003b9 already exist
>             > Jan  1 08:08:23 (none) local0.err osafamfnd[953]:
>             saImmOmInitialize FAILED, rc = 6
>             > Jan  1 08:08:23 (none) local0.alert osafimmnd[986]: AL
>             AMF Node Director is down, terminate this process
>             > Jan  1 08:08:23 (none) local0.alert osaflcknd[997]: AL
>             AMF Node Director is down, terminate this process
>             > Jan  1 08:08:23 (none) local0.crit osafamfwd[1018]:
>             Rebooting OpenSAF NodeId = 0 EE Name = No EE Mapped,
>             Reason: AMF unexpectedly crashed, OwnNodeId = 656399,
>             SupervisionTime = 60
>             > Jan  1 08:08:23 (none) local0.notice osaflcknd[997]:
>             exiting for shutdown
>             > Jan  1 08:08:23 (none) local0.alert osafsmfnd[964]: AL
>             AMF Node Director is down, terminate this process
>             > Jan  1 08:08:23 (none) local0.alert osafckptnd[1007]: AL
>             AMF Node Director is down, terminate this process
>             > Jan  1 08:08:23 (none) local0.notice osafsmfnd[964]:
>             exiting for shutdown
>             > Jan  1 08:08:23 (none) local0.alert osafmsgnd[974]: AL
>             AMF Node Director is down, terminate this process
>             > Jan  1 08:08:23 (none) local0.notice osafckptnd[1007]:
>             exiting for shutdown
>             > Jan  1 08:08:23 (none) local0.notice osafmsgnd[974]:
>             exiting for shutdown
>             > Jan  1 08:08:23 (none) local0.notice osafimmnd[986]:
>             exiting for shutdown
>             > Jan  1 08:08:23 (none) local0.notice osafimmnd[931]: NO
>             Implementer locally disconnected. Marking it as doomed 38
>             <51, a040f> (MsgQueueService656399)
>             > Jan  1 08:08:23 (none) local0.err osafimmnd[931]: ER
>             immnd_evt_proc_discard_node for *this* node 656399 =>
>             Cluster partitioned ("split brain") - exiting
>             > Jan  1 08:08:23 (none) user.notice opensaf_reboot:
>             Rebooting local node; timeout=60
>             >
>
>
>
