Resending because of the email size limit.

From: Jianfeng Dong
Sent: Wednesday, May 24, 2017 6:08 PM
To: 'Zoran Milinkovic' <[email protected]>; '[email protected]' <[email protected]>
Subject: RE: osafimmnd coredump issue

Hi Zoran,

The issue seems hard to reproduce. I checked the syslog and found that the SC board "scm2" was doing nothing special at that time. The other SC board, "scm1", was stuck in a reboot loop because of a firmware fault; that fault has nothing to do with OpenSAF, and OpenSAF was not started there at all.

Here is the syslog from when the issue occurred:

2017-04-25T05:30:00.482306-04:00 user.info scm2 osafimmloadd: IN Synced 7032 objects in total
2017-04-25T05:30:00.482749-04:00 local0.notice scm2 osafimmnd[2793]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 18455
2017-04-25T05:30:00.489351-04:00 user.notice scm2 osafimmloadd: NO Sync ending normally
2017-04-25T05:30:01.395342-04:00 local0.notice pld0206 osafimmnd[3154]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 19026
2017-04-25T05:30:01.394642-04:00 local0.notice pld0106 osafimmnd[2996]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 19026
2017-04-25T05:30:01.395272-04:00 local0.notice cmm02b osafimmnd[5129]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 19026
2017-04-25T05:30:01.395862-04:00 local0.notice cmm02a osafimmnd[5102]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 19026
2017-04-25T05:30:01.399220-04:00 local0.notice scm2 osafimmnd[2793]: NO Epoch set to 147 in ImmModel
2017-04-25T05:30:01.396470-04:00 local0.notice cmm02a osafimmnd[5102]: NO Epoch set to 147 in ImmModel
2017-04-25T05:30:01.395932-04:00 local0.notice pld0206 osafimmnd[3154]: NO Epoch set to 147 in ImmModel
2017-04-25T05:30:01.396513-04:00 local0.notice pld0210 osafimmnd[4345]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 19026
2017-04-25T05:30:01.397065-04:00 local0.notice pld0210 osafimmnd[4345]: NO Epoch set to 147 in ImmModel
2017-04-25T05:30:01.395883-04:00 local0.notice cmm02b osafimmnd[5129]: NO Epoch set to 147 in ImmModel
2017-04-25T05:30:01.395214-04:00 local0.notice pld0106 osafimmnd[2996]: NO Epoch set to 147 in ImmModel
2017-04-25T05:30:01.396031-04:00 local0.notice pld0205 osafimmnd[3150]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 19026
2017-04-25T05:30:01.396647-04:00 local0.notice pld0205 osafimmnd[3150]: NO Epoch set to 147 in ImmModel
2017-04-25T05:30:01.400229-04:00 local0.notice scm2 osafimmd[2782]: NO ACT: New Epoch for IMMND process at node 1100f old epoch: 146 new epoch:147
2017-04-25T05:30:01.400321-04:00 local0.notice scm2 osafimmd[2782]: NO ACT: New Epoch for IMMND process at node 11a0f old epoch: 146 new epoch:147
2017-04-25T05:30:01.400380-04:00 local0.notice scm2 osafimmd[2782]: NO ACT: New Epoch for IMMND process at node 1200f old epoch: 146 new epoch:147
2017-04-25T05:30:01.400435-04:00 local0.notice scm2 osafimmd[2782]: NO ACT: New Epoch for IMMND process at node 11f0f old epoch: 146 new epoch:147
2017-04-25T05:30:01.400540-04:00 local0.notice scm2 osafimmd[2782]: NO ACT: New Epoch for IMMND process at node 1160f old epoch: 146 new epoch:147
2017-04-25T05:30:01.400619-04:00 local0.notice scm2 osafimmd[2782]: NO ACT: New Epoch for IMMND process at node 1150f old epoch: 146 new epoch:147
2017-04-25T05:30:01.428760-04:00 local0.notice pld0104 osafimmnd[7230]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 2901
2017-04-25T05:30:01.428818-04:00 local0.notice pld0104 osafimmnd[7230]: NO RepositoryInitModeT is SA_IMM_INIT_FROM_FILE
2017-04-25T05:30:01.428854-04:00 local0.warning pld0104 osafimmnd[7230]: WA IMM Access Control mode is DISABLED!
2017-04-25T05:30:01.448820-04:00 local0.notice scm2 osafimmd[2782]: NO ACT: New Epoch for IMMND process at node 10a0f old epoch: 0 new epoch:147
2017-04-25T05:30:01.492622-04:00 local0.notice scm2 osafimmnd[2793]: NO SERVER STATE: IMM_SERVER_SYNC_SERVER --> IMM_SERVER_READY
2017-04-25T05:30:01.429155-04:00 local0.notice pld0104 osafimmnd[7230]: NO Epoch set to 147 in ImmModel
2017-04-25T05:30:01.497081-04:00 local0.notice pld0104 osafimmnd[7230]: NO SERVER STATE: IMM_SERVER_SYNC_CLIENT --> IMM_SERVER_READY
2017-04-25T05:30:01.497438-04:00 local0.notice pld0104 osafimmnd[7230]: NO ImmModel received scAbsenceAllowed 315360000
2017-04-25T05:30:01.527159-04:00 local0.notice pld0104 osafclmna[11250]: Started
2017-04-25T05:30:01.562054-04:00 local0.notice pld0104 osafamfnd[11259]: Started
2017-04-25T05:30:01.446033-04:00 local0.notice pld0110 osafimmnd[22477]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 2901
2017-04-25T05:30:01.446346-04:00 local0.notice pld0110 osafimmnd[22477]: NO RepositoryInitModeT is SA_IMM_INIT_FROM_FILE
2017-04-25T05:30:01.446575-04:00 local0.warning pld0110 osafimmnd[22477]: WA IMM Access Control mode is DISABLED!
2017-04-25T05:30:01.446797-04:00 local0.notice pld0110 osafimmnd[22477]: NO Epoch set to 147 in ImmModel
2017-04-25T05:30:01.447185-04:00 local0.notice pld0110 osafimmnd[22477]: NO SERVER STATE: IMM_SERVER_SYNC_CLIENT --> IMM_SERVER_READY
2017-04-25T05:30:01.447335-04:00 local0.notice pld0110 osafimmnd[22477]: NO ImmModel received scAbsenceAllowed 315360000
2017-04-25T05:30:02.281956-04:00 local0.notice scm2 osafimmd[2782]: NO ACT: New Epoch for IMMND process at node 1040f old epoch: 0 new epoch:147
2017-04-25T05:30:02.281915-04:00 local0.notice pld0104 osafclmna[11250]: NO safNode=pld0104,safCluster=myClmCluster Joined cluster, nodeid=1040f
2017-04-25T05:30:02.281962-04:00 local0.notice pld0104 osafclmna[11250]: NO Starting to promote this node to a system controller
2017-04-25T05:30:01.395799-04:00 local0.notice pld0114 osafimmnd[2603]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 19026
2017-04-25T05:30:01.396359-04:00 local0.notice pld0114 osafimmnd[2603]: NO Epoch set to 147 in ImmModel
2017-04-25T05:30:02.785426-04:00 local0.notice pld0104 osafamfnd[11259]: NO Sending node up due to NCSMDS_UP
2017-04-25T05:30:04.605695-04:00 local0.notice scm2 osafimmnd[2793]: NO Announce sync, epoch:148
2017-04-25T05:30:04.605800-04:00 local0.notice scm2 osafimmnd[2793]: NO SERVER STATE: IMM_SERVER_READY --> IMM_SERVER_SYNC_SERVER
2017-04-25T05:30:04.605070-04:00 local0.notice pld0110 osafimmnd[22477]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
2017-04-25T05:30:04.606138-04:00 local0.notice scm2 osafimmd[2782]: NO Successfully announced sync. New ruling epoch:148
2017-04-25T05:30:04.606534-04:00 local0.notice scm2 osafimmnd[2793]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
2017-04-25T05:30:04.602397-04:00 local0.notice cmm02b osafimmnd[5129]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
2017-04-25T05:30:04.603079-04:00 local0.notice pld0205 osafimmnd[3150]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
2017-04-25T05:30:04.602959-04:00 local0.notice pld0114 osafimmnd[2603]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
2017-04-25T05:30:04.603701-04:00 local0.notice pld0210 osafimmnd[4345]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
2017-04-25T05:30:04.603020-04:00 local0.notice cmm02a osafimmnd[5102]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
2017-04-25T05:30:04.602634-04:00 local0.notice pld0206 osafimmnd[3154]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
2017-04-25T05:30:04.713226-04:00 user.notice scm2 osafimmloadd: NO Sync starting
2017-04-25T05:30:04.605741-04:00 local0.notice pld0104 osafimmnd[7230]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
2017-04-25T05:30:05.294097-04:00 local0.notice scm2 osafimmd[2782]: NO ACT: New Epoch for IMMND process at node 1060f old epoch: 146 new epoch:147
2017-04-25T05:30:04.601756-04:00 local0.notice pld0106 osafimmnd[2996]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
2017-04-25T05:30:11.332048-04:00 local0.notice scm2 osafamfd[3356]: NO Received node_up from 1040f: msg_id 1
2017-04-25T05:30:11.332162-04:00 local0.notice scm2 osafamfd[3356]: NO Node 'PLD0104' joined the cluster
2017-04-25T05:30:11.929680-04:00 local0.notice pld0104 osafamfnd[11259]: NO 'safSu=PLD0104,safSg=NoRed,safApp=OpenSAF' Presence State UNINSTANTIATED => INSTANTIATING
2017-04-25T05:30:11.964354-04:00 local0.notice pld0104 osafmsgnd[11467]: Started
2017-04-25T05:30:12.022763-04:00 local0.notice pld0104 osafsmfnd[11487]: Started
2017-04-25T05:30:12.087846-04:00 local0.notice pld0104 osaflcknd[11497]: Started
2017-04-25T05:30:12.153829-04:00 local0.notice pld0104 osafamfwd[11514]: Started
2017-04-25T05:30:12.206118-04:00 local0.notice pld0104 osafckptnd[11524]: Started
2017-04-25T05:30:11.458565-04:00 local0.err pld0110 osafmsgnd[2839]: ER saImmOiDispatch returned BAD_HANDLE with error 9
2017-04-25T05:30:11.458704-04:00 local0.err pld0110 osafmsgnd[2839]: ER saImmOiDispatch returned BAD_HANDLE with error 9
2017-04-25T05:30:21.469886-04:00 local0.err pld0110 osafmsgnd[2839]: ER saImmOiImplementerSet FAILED:5
2017-04-25T05:30:21.472553-04:00 local0.notice pld0110 osafamfnd[2789]: NO 'safSu=PLD0110,safSg=NoRed,safApp=OpenSAF' component restart probation timer started (timeout: 60000000000 ns)
2017-04-25T05:30:21.472646-04:00 local0.notice pld0110 osafamfnd[2789]: NO Restarting a component of 'safSu=PLD0110,safSg=NoRed,safApp=OpenSAF' (comp restart count: 1)
2017-04-25T05:30:21.472686-04:00 local0.notice pld0110 osafamfnd[2789]: NO 'safComp=MQND,safSu=PLD0110,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'componentRestart'
2017-04-25T05:30:21.571145-04:00 local0.notice pld0110 osafmsgnd[25083]: Started
2017-04-25T05:30:22.223219-04:00 local0.warning pld0104 osafckptnd[11524]: WA cpnd_lib_init: saClmInitialize returned 5
2017-04-25T05:30:22.322777-04:00 local0.notice pld0104 osafamfnd[11259]: NO Instantiation of 'safComp=CPND,safSu=PLD0104,safSg=NoRed,safApp=OpenSAF' failed
2017-04-25T05:30:22.322819-04:00 local0.notice pld0104 osafamfnd[11259]: NO Reason: component registration timer expired
2017-04-25T05:30:22.500453-04:00 local0.notice pld0104 osafckptnd[11709]: Started
2017-04-25T05:30:24.335223-04:00 local0.warning scm2 osafimmnd[2793]: WA ERR_BAD_HANDLE: Handle use is blocked by pending reply on syncronous call
2017-04-25T05:30:24.335334-04:00 local0.notice scm2 osafimmnd[2793]: NO Implementer locally disconnected. Marking it as doomed 192 <23, 1100f> (safAmfService)
2017-04-25T05:30:24.335776-04:00 local0.warning scm2 osafamfd[3356]: WA saImmOiRtObjectUpdate of 'safSu=SCM2,safSg=2N,safApp=OpenSAF' saAmfSURestartCount failed with 9
2017-04-25T05:30:24.336372-04:00 local0.warning scm2 osafimmnd[2793]: WA ERR_BAD_HANDLE: Client 98784317455 not found in server
2017-04-25T05:30:24.338606-04:00 local0.warning scm2 osafamfd[3356]: WA saImmOiRtObjectUpdate of 'safSu=SCM2,safSg=EVIPAppGroup,safApp=EVIPAppGroup' saAmfSURestartCount failed with 9
2017-04-25T05:30:24.341207-04:00 local0.notice scm2 osafamfd[3356]: NO Re-initializing with IMM
2017-04-25T05:30:24.339965-04:00 local0.warning scm2 osafimmnd[2793]: WA ERR_BAD_HANDLE: Client 98784317455 not found in server
2017-04-25T05:30:24.341616-04:00 local0.warning scm2 osafimmnd[2793]: WA IMMND - Client Node Get Failed for cli_hdl 98784317455
2017-04-25T05:30:24.483922-04:00 local0.warning scm2 osafimmnd[2793]: WA ERR_NO_RESOURCES: SearchNext: Implementer died during fetch of pure RTA
2017-04-25T05:30:24.670234-04:00 local0.notice scm2 osafamfnd[4402]: NO 'safSu=SCM2,safSg=NoRed,safApp=OpenSAF' component restart probation timer started (timeout: 60000000000 ns)
2017-04-25T05:30:24.670313-04:00 local0.notice scm2 osafamfnd[4402]: NO Restarting a component of 'safSu=SCM2,safSg=NoRed,safApp=OpenSAF' (comp restart count: 1)
2017-04-25T05:30:24.670514-04:00 local0.notice scm2 osafamfnd[4402]: NO 'safComp=IMMND,safSu=SCM2,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'componentRestart'
2017-04-25T05:30:24.881102-04:00 local0.notice scm2 osafimmnd[16386]: Started
2017-04-25T05:30:24.898286-04:00 local0.notice scm2 osafimmnd[16386]: NO IMMD service is UP ... ScAbsenseAllowed?:0 introduced?:0
2017-04-25T05:30:24.902976-04:00 local0.notice scm2 osafimmnd[16386]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
2017-04-25T05:30:24.674108-04:00 local0.notice scm2 osafamfd[3356]: NO Re-initializing with IMM
2017-04-25T05:30:25.355375-04:00 local0.err scm2 osafamfd[3356]: ER saImmOiImplementerSet failed 9
2017-04-25T05:30:25.757042-04:00 local0.notice scm2 osafimmd[2782]: NO MDS event from svc_id 25 (change:4, dest:299134048919568)
2017-04-25T05:30:25.757177-04:00 local0.notice scm2 osafimmnd[16386]: NO Fevs count adjusted to 203211 preLoadPid: 0
2017-04-25T05:30:25.757299-04:00 local0.notice scm2 osafimmd[2782]: NO MDS event from svc_id 25 (change:3, dest:299135072567446)
2017-04-25T05:30:25.757673-04:00 local0.warning scm2 osafimmd[2782]: WA IMMND coordinator at 1100f apparently crashed => electing new coord
2017-04-25T05:30:25.757750-04:00 local0.notice scm2 osafimmd[2782]: NO Coord elected at payload:1040f
2017-04-25T05:30:25.757775-04:00 local0.notice scm2 osafimmd[2782]: NO New coord elected, resides at 1040f
2017-04-25T05:30:25.758552-04:00 local0.notice pld0104 osafimmnd[7230]: NO This IMMND is now the NEW Coord
2017-04-25T05:30:25.758662-04:00 local0.warning pld0104 osafimmnd[7230]: WA ABORTING UNCOMPLETED SYNC - COORDINATOR MUST HAVE CRASHED
2017-04-25T05:30:25.757871-04:00 local0.notice pld0110 osafimmnd[22477]: NO Global discard node received for nodeId:1100f pid:2793
2017-04-25T05:30:25.757948-04:00 local0.notice pld0110 osafimmnd[22477]: NO Implementer disconnected 189 <0, 1100f(down)> (safLogService)
2017-04-25T05:30:25.757987-04:00 local0.notice pld0110 osafimmnd[22477]: NO Implementer disconnected 190 <0, 1100f(down)> (@safLogService_appl)
2017-04-25T05:30:25.758017-04:00 local0.notice pld0110 osafimmnd[22477]: NO Implementer disconnected 191 <0, 1100f(down)> (safClmService)
2017-04-25T05:30:25.758047-04:00 local0.notice pld0110 osafimmnd[22477]: NO Implementer disconnected 192 <0, 1100f(down)> (safAmfService)
2017-04-25T05:30:25.758077-04:00 local0.notice pld0110 osafimmnd[22477]: NO Implementer disconnected 193 <0, 1100f(down)> (MsgQueueService69647)
2017-04-25T05:30:25.756191-04:00 local0.notice pld0205 osafimmnd[3150]: NO Global discard node received for nodeId:1100f pid:2793
2017-04-25T05:30:25.756338-04:00 local0.notice pld0205 osafimmnd[3150]: NO Implementer disconnected 195 <0, 1100f(down)> (safCheckPointService)
2017-04-25T05:30:25.756418-04:00 local0.notice pld0205 osafimmnd[3150]: NO Implementer disconnected 199 <0, 1100f(down)> (safSmfService)
2017-04-25T05:30:25.756501-04:00 local0.notice pld0205 osafimmnd[3150]: NO Implementer disconnected 198 <0, 1100f(down)> (safLckService)
2017-04-25T05:30:25.758209-04:00 local0.warning scm2 osafimmd[2782]: WA Error returned from processing message err:0 msg-type:23
2017-04-25T05:30:25.758249-04:00 local0.warning scm2 osafimmd[2782]: WA Error returned from processing message err:0 msg-type:24
2017-04-25T05:30:25.758285-04:00 local0.warning scm2 osafimmd[2782]: WA Error returned from processing message err:0 msg-type:24
2017-04-25T05:30:25.758328-04:00 local0.warning scm2 osafimmd[2782]: WA Error returned from processing message err:0 msg-type:23
2017-04-25T05:30:25.758362-04:00 local0.warning scm2 osafimmd[2782]: WA Error returned from processing message err:0 msg-type:24
2017-04-25T05:30:25.758437-04:00 local0.warning scm2 osafimmd[2782]: message repeated 2 times: [ WA Error returned from processing message err:0 msg-type:24]
2017-04-25T05:30:25.758962-04:00 local0.warning scm2 osafimmd[2782]: WA Error returned from processing message err:0 msg-type:14
2017-04-25T05:30:25.759139-04:00 local0.notice scm2 osafimmd[2782]: NO ACT: New Epoch for IMMND process at node 10e0f old epoch: 146 new epoch:147
2017-04-25T05:30:25.759267-04:00 local0.warning scm2 osafimmd[2782]: WA Error returned from processing message err:0 msg-type:24
2017-04-25T05:30:25.759687-04:00 local0.warning scm2 osafimmd[2782]: message repeated 3 times: [ WA Error returned from processing message err:0 msg-type:24]
2017-04-25T05:30:25.759810-04:00 local0.warning scm2 osafimmd[2782]: WA Error returned from processing message err:0 msg-type:23
2017-04-25T05:30:25.759958-04:00 local0.warning scm2 osafimmd[2782]: WA Error returned from processing message err:0 msg-type:15

Regards,
Jianfeng

_______________________________________________
Opensaf-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-users
