I am able to reproduce the issue. Syslog below:
Sep 14 18:53:31 krishna-VirtualBox osafamfd[6029]: NO safSi=SC-2N,safApp=OpenSAF Swap initiated
Sep 14 18:53:31 krishna-VirtualBox osafamfnd[6045]: NO Assigning 'safSi=SC-2N,safApp=OpenSAF' QUIESCED to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Sep 14 18:53:31 krishna-VirtualBox osafimmnd[5978]: NO Implementer locally disconnected. Marking it as doomed 34 <485, 2010f> (safSmfService)
Sep 14 18:53:31 krishna-VirtualBox osafimmnd[5978]: NO Implementer disconnected 27 <336, 2010f> (safMsgGrpService)
Sep 14 18:53:31 krishna-VirtualBox osafimmnd[5978]: NO Implementer disconnected 34 <485, 2010f> (safSmfService)
Sep 14 18:53:31 krishna-VirtualBox osafimmnd[5978]: NO Implementer disconnected 35 <323, 2010f> (@safSmf_applier1)
Sep 14 18:53:31 krishna-VirtualBox osafsmfd[6065]: NO MDS amf_quiesced_state_handler: smfd_mds_change_role()
Sep 14 18:53:31 krishna-VirtualBox osafimmnd[5978]: NO Implementer locally disconnected. Marking it as doomed 33 <489, 2010f> (safLogService)
Sep 14 18:53:31 krishna-VirtualBox osafimmnd[5978]: NO Implementer locally disconnected. Marking it as doomed 32 <488, 2010f> (@safLogService_appl)
Sep 14 18:53:31 krishna-VirtualBox osafimmnd[5978]: NO Implementer disconnected 33 <489, 2010f> (safLogService)
Sep 14 18:53:31 krishna-VirtualBox osafimmnd[5978]: NO Implementer disconnected 32 <488, 2010f> (@safLogService_appl)
Sep 14 18:53:31 krishna-VirtualBox osafimmnd[5978]: NO Implementer disconnected 30 <328, 2010f> (safEvtService)
Sep 14 18:53:31 krishna-VirtualBox osafsmfd[6065]: NO MDS smfd_mds_change_role: Setting; arg.info.vdest_chg_role.i_vdest = 0xf, ncsvda_api() rc = 1
Sep 14 18:53:32 krishna-VirtualBox osafimmnd[5978]: NO Implementer disconnected 29 <14, 2010f> (safClmService)
Sep 14 18:53:32 krishna-VirtualBox osafimmnd[5978]: NO Implementer disconnected 28 <343, 2010f> (safCheckPointService)
Sep 14 18:53:32 krishna-VirtualBox osafimmnd[5978]: NO Implementer disconnected 31 <334, 2010f> (safLckService)
Sep 14 18:53:53 krishna-VirtualBox osafamfd[6029]: ER Alarm lost for safComp=CPND,safSu=PL-3,safSg=NoRed,safApp=OpenSAF
Sep 14 18:53:53 krishna-VirtualBox osafamfd[6029]: safSu=PL-3,safSg=NoRed,safApp=OpenSAF OperState ENABLED => DISABLED
Sep 14 18:53:53 krishna-VirtualBox osafamfd[6029]: WA State change notification lost for 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF'
Sep 14 18:53:53 krishna-VirtualBox osafimmd[5966]: NO MDS event from svc_id 25 (change:4, dest:566314102483891)
Sep 14 18:53:53 krishna-VirtualBox osafimmnd[5978]: NO Global discard node received for nodeId:2030f pid:3936
Sep 14 18:53:53 krishna-VirtualBox osafimmnd[5978]: NO Implementer disconnected 15 <0, 2030f(down)> (MsgQueueService131855)
Sep 14 18:53:53 krishna-VirtualBox osafamfd[6029]: NO Node 'PL-3' left the cluster
Sep 14 18:53:53 krishna-VirtualBox osafclmd[6018]: NO Node 131855 went down. Not sending track callback for agents on that node
Sep 14 18:53:53 krishna-VirtualBox osafclmd[6018]: NO Node 131855 went down. Not sending track callback for agents on that node
Sep 14 18:53:53 krishna-VirtualBox osafclmd[6018]: safNode=PL-3,safCluster=myClmCluster LEFT, init view=3, cluster view=4
Sep 14 18:53:53 krishna-VirtualBox osafamfd[6029]: ER Alarm lost for safComp=CPND,safSu=PL-3,safSg=NoRed,safApp=OpenSAF
Sep 14 18:53:53 krishna-VirtualBox osafamfd[6029]: safSu=PL-3,safSg=NoRed,safApp=OpenSAF PresenceState INSTANTIATED => UNINSTANTIATED
Sep 14 18:53:53 krishna-VirtualBox osafamfd[6029]: WA State change notification lost for 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF'
Sep 14 18:53:53 krishna-VirtualBox osafamfd[6029]: safSu=PL-3,safSg=NoRed,safApp=OpenSAF ReadinessState IN_SERVICE => OUT_OF_SERVICE
Sep 14 18:53:53 krishna-VirtualBox osafamfd[6029]: ER Alarm lost for safSi=NoRed3,safApp=OpenSAF
Sep 14 18:54:01 krishna-VirtualBox kernel: [22703.023603] Disabling bearer <eth:eth0>
Sep 14 18:54:01 krishna-VirtualBox osafamfnd[6045]: ER safComp=SMF,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:csiSetcallbackTimeout Recovery is:nodeFailfast
Sep 14 18:54:01 krishna-VirtualBox osafamfnd[6045]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131343, SupervisionTime = 60
Sep 14 18:54:01 krishna-VirtualBox opensaf_reboot: Rebooting local node; timeout=60
---
**[tickets:#2061] smfd faulted on Active controller due to csiSetcallbackTimeout during si-swap operation**
**Status:** assigned
**Milestone:** 5.18.09
**Created:** Thu Sep 22, 2016 09:26 AM UTC by Ritu Raj
**Last Updated:** Fri Sep 14, 2018 11:15 AM UTC
**Owner:** Krishna Pawar
**Attachments:**
- [SC-1.tar.bz2](https://sourceforge.net/p/opensaf/tickets/2061/attachment/SC-1.tar.bz2) (178.0 kB; application/x-bzip)
- [SC-2.tar.bz2](https://sourceforge.net/p/opensaf/tickets/2061/attachment/SC-2.tar.bz2) (206.3 kB; application/x-bzip)
# Environment details
OS: SUSE 64-bit
Changeset: 7997 (5.1.FC)
Setup: 3 nodes (2 controllers and 1 payload), headless feature disabled, 1PBE with 100K objects
# Summary
smfd faulted on the active (previously active) controller due to csiSetcallbackTimeout during an si-swap operation.
# Steps followed & Observed behaviour
1. Initiated an si-swap operation from the active controller and, at the same time, killed osafsmfnd on the standby controller and osafckptnd on the payload (PL-3).
2. Observed that smfd faulted on the active controller during the role change.
From the traces, it is observed that:
- In the file "osaf/services/saf/smfsv/smfd/smfd_smfnd.c", there is no TRY_AGAIN retry mechanism for the APIs below:
~~~
/* Find Clm info about the node */
rc = saClmInitialize(&clmHandle, NULL, &clmVersion);
if (rc != SA_AIS_OK) {
    /* SA_AIS_ERR_TRY_AGAIN also lands here: no retry is attempted */
    LOG_ER("saClmInitialize failed, rc=%s", saf_error(rc));
    if (newNode) free(smfnd);
    pthread_mutex_unlock(&smfnd_list_lock);
    return NCSCC_RC_FAILURE;
}
/* Get Clm info about the node */
SaClmClusterNodeT clmInfo;
/* Timeout 10000000000 ns = 10 s (SaTimeT is in nanoseconds) */
rc = saClmClusterNodeGet(clmHandle, i_node_id,
                         10000000000LL, &clmInfo);
if (rc != SA_AIS_OK) {
    /* SA_AIS_ERR_TRY_AGAIN also lands here: no retry is attempted */
    LOG_ER("saClmClusterNodeGet failed, rc=%s", saf_error(rc));
    if (newNode) free(smfnd);
    rc = saClmFinalize(clmHandle);
    if (rc != SA_AIS_OK) {
        LOG_ER("saClmFinalize failed, rc=%s", saf_error(rc));
    }
    pthread_mutex_unlock(&smfnd_list_lock);
    return NCSCC_RC_FAILURE;
}
~~~
**Syslog:**
Sep 22 14:15:05 fos1 osafimmnd[6164]: NO Implementer disconnected 17 <0, 2030f(down)> (MsgQueueService131855)
Sep 22 14:15:08 fos1 osafamfnd[6253]: NO 'safComp=SMF,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'csiSetcallbackTimeout' : Recovery is 'nodeFailfast'
Sep 22 14:15:08 fos1 osafamfnd[6253]: ER safComp=SMF,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:csiSetcallbackTimeout Recovery is:nodeFailfast
Sep 22 14:15:08 fos1 osafamfnd[6253]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131343, SupervisionTime = 60
Sep 22 14:15:08 fos1 opensaf_reboot: Rebooting local node; timeout=60
Sep 22 14:15:09 fos1 osafsmfd[6272]: ER saClmInitialize failed, rc=SA_AIS_ERR_TRY_AGAIN (6)
Sep 22 14:15:09 fos1 osafsmfd[6272]: WA proc_mds_info: SMFND UP failed
**Notes:**
1. Syslogs of both controllers are attached.
2. smfd traces are attached.
---
Opensaf-tickets mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets