Hi, Praveen:

Thanks for your reply. Now I know why slot-1 shut down.

I am not clear on what you mean by:

Please check the syslog on the new active controller.
Are you performing any CLM lock/shutdown operation on the slot-1 node?

I do have the syslog from the new active controller saved somewhere. Can you 
tell me exactly what log text to look for?

My current understanding is that after 5 minutes, slot-5 finished rebooting and 
became the standby controller properly. Slot-10 took no time to transition from 
standby to active right after slot-5 rebooted.

But slot-1 never rejoined the cluster until someone explicitly restarted 
opensaf on slot-1.

Kang-sen

-----Original Message-----
From: praveen malviya [mailto:praveen.malv...@oracle.com] 
Sent: Wednesday, November 30, 2016 3:44 AM
To: opensaf-users@lists.sourceforge.net
Subject: Re: [users] question about payload blade recovery

Hi,

Please see inline with [Praveen]

Thanks,
Praveen

On 30-Nov-16 1:39 AM, Kang-Sen Lu wrote:
> We are running opensaf 4.4.0.
>
> In our chassis C7000, we have slot-5 as active controller, slot-10 as standby 
> controller, and slot-1 as payload controller.
>
> Somehow, slot-5 rebooted. Applications on slot-1 were terminated, but not 
> restarted automatically as expected.
>
> Here is a piece of syslog from slot-1. I hope someone can point out what 
> happened to opensaf on slot-1 and explain why the applications on slot-1 
> were not restarted as expected.
>
> ===============
> Nov 28 05:33:01 BHA-IND-WHF-KK-CAE-1 CRON[3462]: (root) CMD (/usr/share/platform-config/c7000/update-ssh-keys)
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 kernel: [6402453.489596] tipc: Resetting link <1.1.17:fabric1.96-1.1.81:fabric1.96>, peer not responding
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 kernel: [6402453.489602] tipc: Lost link <1.1.17:fabric1.96-1.1.81:fabric1.96> on network plane B
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 osafimmnd[38590]: WA DISCARD DUPLICATE FEVS message:69286
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 osafimmnd[38590]: WA Error code 2 returned for message type 57 - ignoring
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 osafimmnd[38590]: WA DISCARD DUPLICATE FEVS message:69287
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 osafimmnd[38590]: WA Error code 2 returned for message type 57 - ignoring
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 osafimmnd[38590]: NO Global discard node received for nodeId:10501 pid:35221
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 osafimmnd[38590]: NO Implementer disconnected 126 <0, 10501(down)> (safLogService)
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 osafimmnd[38590]: NO Implementer disconnected 127 <0, 10501(down)> (safClmService)
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 osafimmnd[38590]: NO Implementer disconnected 128 <0, 10501(down)> (safAmfService)
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 osafimmnd[38590]: NO Implementer disconnected 125 <0, 10501(down)> (MsgQueueService66817)
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 osafimmnd[38590]: NO Implementer disconnected 129 <0, 10501(down)> (safMsgGrpService)
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 osafimmnd[38590]: NO Implementer disconnected 130 <0, 10501(down)> (safCheckPointService)
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 osafimmnd[38590]: NO Implementer disconnected 132 <0, 10501(down)> (safLckService)
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 osafimmnd[38590]: NO Implementer disconnected 131 <0, 10501(down)> (safEvtService)
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 osafimmnd[38590]: NO Implementer disconnected 134 <0, 10501(down)> (safSmfService)
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 osafimmnd[38590]: NO Implementer connected: 145 (safClmService) <0, 10a01>
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 osafimmnd[38590]: NO Implementer connected: 146 (safLogService) <0, 10a01>
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 osafimmnd[38590]: NO Implementer disconnected 135 <0, 10a01> (@safAmfService10a01)
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 osafamfnd[38632]: NO This node has exited the cluster
[Praveen] This log means that the CLM node to which this payload is mapped has 
lost its CLM cluster membership. When a node loses its CLM membership, AMF 
cannot provide service to the applications hosted on that node, and all 
non-OpenSAF components are terminated.
The components below were terminated as a result.
These components will be re-instantiated when this node becomes a member node 
again.

Please check the syslog on the new active controller.
Are you performing any CLM lock/shutdown operation on the slot-1 node?
I see a ticket, #1120, where CLM sends a stale track callback. This was fixed 
after the 4.4 GA release.
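
To make the sequence in a long syslog easier to follow, here is a minimal 
sketch in Python that keeps only lines containing message text seen in the 
slot-1 excerpt in this thread. The names `filter_events` and `PATTERNS` are 
hypothetical, and the substrings are copied from the payload log above; this 
does not claim to know which messages the active controller logs.

```python
import re
import sys

# Substrings copied from the slot-1 syslog excerpt in this thread.
# Illustrative only; not an exhaustive list of OpenSAF messages.
PATTERNS = [
    "tipc: Lost link",
    "Global discard node received",
    "This node has exited the cluster",
    "Cleanup for CompName",
]

# Classic syslog prefix: "Nov 28 05:33:31 host process[pid]: message"
LINE_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(\S+)\s+([^:\[\s]+)(?:\[\d+\])?:\s+(.*)$"
)


def filter_events(lines, patterns=PATTERNS):
    """Return (timestamp, process, message) for lines matching any pattern."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None:
            continue  # skip lines that do not look like syslog entries
        timestamp, _host, process, message = match.groups()
        if any(pattern in message for pattern in patterns):
            events.append((timestamp, process, message))
    return events


if __name__ == "__main__" and len(sys.argv) > 1:
    # The syslog path is supplied by the user, e.g. a saved copy of the log.
    with open(sys.argv[1], errors="replace") as handle:
        for timestamp, process, message in filter_events(handle):
            print(timestamp, process, message)
```

Run on the slot-1 log, this should show the TIPC link loss, the node leaving 
the cluster, and the component cleanup scripts in time order, which is the 
chain of events described above.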


> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 kernel: [6402453.573512] tipc: Resetting link <1.1.17:fabric0.96-1.1.81:fabric0.96>, peer not responding
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 kernel: [6402453.573518] tipc: Lost link <1.1.17:fabric0.96-1.1.81:fabric0.96> on network plane A
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 netmonT2FW_clean: Cleanup for CompName: safComp=NetMonT2FW_PL-1,safSu=NetMonT2FWSU_PL-1,safSg=NetMonT2FWSG,safApp=NetMonT2FWApp
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 netmonT1Sweeper_clean: Cleanup for CompName: safComp=NetMonT1Sweeper_PL-1,safSu=NetMonT1SweeperSU_PL-1,safSg=NetMonT1SweeperSG,safApp=NetMonT1SweeperApp
> Nov 28 05:33:31 BHA-IND-WHF-KK-CAE-1 netmonT1FL_clean: Cleanup for CompName: safComp=NetMonT1FL_PL-1,safSu=NetMonT1FLSU_PL-1,safSg=NetMonT1FLSG,safApp=NetMonT1FLApp
> ==============
>
> OpenSAF executed "netmonT1FL_clean" to terminate the netmonT1FL 
> application; it should then execute "netmonT1FL_inst" to restart that 
> application.
>
> Thanks.
>
> Kang-sen
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Opensaf-users mailing list
> Opensaf-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/opensaf-users
>
