Hi,

Please see my replies inline, marked with [Praveen].

Thanks,
Praveen

On 30-Nov-16 1:39 AM, Kang-Sen Lu wrote:
> We are running opensaf 4.4.0.
>
> In our chassis C7000, we have slot-5 as active controller, slot-10 as standby 
> controller, and slot-1 as payload controller.
>
> Somehow, slot-5 rebooted. Applications on slot-1 were terminated, but not 
> restarted automatically as expected.
>
> Here is a piece of syslog from slot-1. I hope someone can point out what 
> happened to opensaf on slot-1, and can explain why the applications on 
> slot-1 were not restarted as expected.
>
> ===============
> Nov 28 05:33:01 BHA-IND-WHF-KK-CAE-1 CRON[3462]: (root) CMD 
> (/usr/share/platform-config/c7000/update-ssh-keys)
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 kernel: [6402453.489596] tipc: Resetting 
> link <1.1.17:fabric1.96-1.1.81:fabric1.96>, peer not responding
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 kernel: [6402453.489602] tipc: Lost link 
> <1.1.17:fabric1.96-1.1.81:fabric1.96> on network plane B
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 osafimmnd[38590]: WA DISCARD DUPLICATE 
> FEVS message:69286
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 osafimmnd[38590]: WA Error code 2 
> returned for message type 57 - ignoring
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 osafimmnd[38590]: WA DISCARD DUPLICATE 
> FEVS message:69287
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 osafimmnd[38590]: WA Error code 2 
> returned for message type 57 - ignoring
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 osafimmnd[38590]: NO Global discard node 
> received for nodeId:10501 pid:35221
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 osafimmnd[38590]: NO Implementer 
> disconnected 126 <0, 10501(down)> (safLogService)
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 osafimmnd[38590]: NO Implementer 
> disconnected 127 <0, 10501(down)> (safClmService)
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 osafimmnd[38590]: NO Implementer 
> disconnected 128 <0, 10501(down)> (safAmfService)
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 osafimmnd[38590]: NO Implementer 
> disconnected 125 <0, 10501(down)> (MsgQueueService66817)
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 osafimmnd[38590]: NO Implementer 
> disconnected 129 <0, 10501(down)> (safMsgGrpService)
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 osafimmnd[38590]: NO Implementer 
> disconnected 130 <0, 10501(down)> (safCheckPointService)
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 osafimmnd[38590]: NO Implementer 
> disconnected 132 <0, 10501(down)> (safLckService)
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 osafimmnd[38590]: NO Implementer 
> disconnected 131 <0, 10501(down)> (safEvtService)
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 osafimmnd[38590]: NO Implementer 
> disconnected 134 <0, 10501(down)> (safSmfService)
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 osafimmnd[38590]: NO Implementer 
> connected: 145 (safClmService) <0, 10a01>
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 osafimmnd[38590]: NO Implementer 
> connected: 146 (safLogService) <0, 10a01>
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 osafimmnd[38590]: NO Implementer 
> disconnected 135 <0, 10a01> (@safAmfService10a01)
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 osafamfnd[38632]: NO This node has 
> exited the cluster
[Praveen] This log means that the CLM node to which this payload is 
mapped has lost its CLM cluster membership. When a node loses its CLM 
membership, AMF cannot provide service to the applications hosted on 
that node, and all non-OpenSAF components are terminated.
The components in the log below were terminated as a result of that.
These components will be re-instantiated when this node becomes a 
cluster member again.

Please check the syslog on the new active controller.
Are you performing any CLM lock/shutdown operation on the slot-1 node?
I see ticket #1120, where CLM sends a stale track callback. This was 
fixed after the 4.4 GA release.
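
To make the sequence above easier to spot in a long syslog, here is a 
minimal Python sketch. It is not an OpenSAF tool, only an illustration. 
The two message strings are copied from the log quoted in this mail; the 
"Cleanup for CompName:" text comes from the application's own clean 
scripts, so it is specific to this setup. The function name `summarize` 
is hypothetical.

```python
import re

# Text logged by osafamfnd when the node loses CLM membership
# (copied from the syslog quoted in this thread).
EXIT_TEXT = "NO This node has exited the cluster"

# Text logged by the application's clean scripts in this thread;
# other deployments will log something different (assumption).
CLEANUP = re.compile(r"Cleanup for CompName:\s*(safComp=[^,\s]+)")


def summarize(lines):
    """Return (exited, cleaned) for an iterable of unwrapped syslog lines.

    exited  -- True if osafamfnd reported that the node left the cluster
    cleaned -- component RDNs whose cleanup was logged after that point
    """
    exited = False
    cleaned = []
    for line in lines:
        if "osafamfnd" in line and EXIT_TEXT in line:
            exited = True
        match = CLEANUP.search(line)
        if match and exited:
            cleaned.append(match.group(1))
    return exited, cleaned
```

It can be called with an open file object for the payload's syslog, 
for example `summarize(open(path))`, where `path` is wherever syslog is 
written on that node. Lines must be unwrapped, as they are in the real 
log file and not as wrapped by the mail client.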


> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 kernel: [6402453.573512] tipc: Resetting 
> link <1.1.17:fabric0.96-1.1.81:fabric0.96>, peer not responding
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 kernel: [6402453.573518] tipc: Lost link 
> <1.1.17:fabric0.96-1.1.81:fabric0.96> on network plane A
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 netmonT2FW_clean: Cleanup for CompName: 
> safComp=NetMonT2FW_PL-1,safSu=NetMonT2FWSU_PL-1,safSg=NetMonT2FWSG,safApp=NetMonT2FWApp
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 netmonT1Sweeper_clean: Cleanup for 
> CompName: 
> safComp=NetMonT1Sweeper_PL-1,safSu=NetMonT1SweeperSU_PL-1,safSg=NetMonT1SweeperSG,safApp=NetMonT1SweeperApp
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 netmonT1FL_clean: Cleanup for CompName: 
> safComp=NetMonT1FL_PL-1,safSu=NetMonT1FLSU_PL-1,safSg=NetMonT1FLSG,safApp=NetMonT1FLApp
> ==============
>
> Opensaf executed "netmonT1FL_clean" to terminate the netmonT1FL 
> application; it should then execute "netmonT1FL_inst" to restart that 
> application.
>
> Thanks.
>
> Kang-sen
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Opensaf-users mailing list
> Opensaf-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/opensaf-users
>
