---

**[tickets:#1486] SMFD faulted in active callback during switchovers**

**Status:** unassigned
**Milestone:** 4.5.2
**Created:** Wed Sep 16, 2015 10:04 AM UTC by Ritu Raj
**Last Updated:** Wed Sep 16, 2015 10:04 AM UTC
**Owner:** nobody


Setup:
4.6GA with changeset 6490
4 nodes (OEL6.4 with TIPC version 1.7.7), with no PBE configured

Issues Observed:
    > Cluster rebooted during a switchover because SMFD faulted with 
'csiSetcallbackFailed'

Steps Performed:

 * Switchovers were invoked continuously on the setup.
 * After more than 1000 switchovers, the standby controller (SC-2) rebooted 
while it was being promoted to the ACTIVE state, because SMFD failed in the 
active callback.

Sep 16 06:25:00 SLOT-2 osafsmfd[1926]: ER amf_active_state_handler oi activate 
FAIL
Sep 16 06:25:00 SLOT-2 osafamfnd[1802]: NO 
'safComp=SMF,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 
'csiSetcallbackFailed' : Recovery is 'nodeFailfast'
Sep 16 06:25:00 SLOT-2 osafamfnd[1802]: ER 
safComp=SMF,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due 
to:csiSetcallbackFailed Recovery is:nodeFailfast
Sep 16 06:25:00 SLOT-2 osafamfnd[1802]: Rebooting OpenSAF NodeId = 131599 EE 
Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 
131599, SupervisionTime = 60


 * After SC-2 went down for reboot, SC-1 tried to become active again, and smfd 
also faulted on this newly re-promoted active controller.

Sep 16 06:25:00 SLOT-1 root: Invoking switchover from invoke_switchover.sh
Sep 16 06:25:00 SLOT-1 osafamfd[3830]: NO safSi=SC-2N,safApp=OpenSAF Swap 
initiated
Sep 16 06:25:00 SLOT-1 osafamfnd[3845]: NO Assigning 
'safSi=SC-2N,safApp=OpenSAF' QUIESCED to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Sep 16 06:25:00 SLOT-1 osafsmfd[3871]: ncs_sel_obj_create: socketpair failed - 
Too many open files
....
Sep 16 06:25:05 SLOT-1 kernel: TIPC: Resetting link <1.1.1:eth0-1.1.2:eth1>, 
peer not responding
Sep 16 06:25:05 SLOT-1 kernel: TIPC: Lost link <1.1.1:eth0-1.1.2:eth1> on 
network plane A
Sep 16 06:25:05 SLOT-1 kernel: TIPC: Lost contact with <1.1.2>
Sep 16 06:25:05 SLOT-1 osaffmd[3716]: NO Node Down event for node id 2020f:
....
Sep 16 06:25:06 SLOT-1 osafimmnd[3746]: NO This IMMND re-elected coord 
redundantly, failover ?
Sep 16 06:25:06 SLOT-1 osafsmfd[3871]: ncs_sel_obj_create: socketpair failed - 
Too many open files
Sep 16 06:25:06 SLOT-1 osafsmfd[3871]: ER immutil_saImmOiInitialize_2 fail, rc 
= 2
...
Sep 16 06:25:06 SLOT-1 osafamfnd[3845]: ER 
safComp=SMF,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due 
to:csiSetcallbackFailed Recovery is:nodeFailfast
Sep 16 06:25:06 SLOT-1 osafamfnd[3845]: Rebooting OpenSAF NodeId = 131343 EE 
Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 
131343, SupervisionTime = 60
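
The "socketpair failed - Too many open files" lines above suggest that osafsmfd 
had reached its per-process file descriptor limit on SC-1. The ticket does not 
confirm whether descriptors leak on every switchover, so the following is only a 
minimal sketch for checking that, assuming a Linux /proc filesystem and enough 
privileges to read the target process's entries. The helper names 
(count_open_fds, soft_fd_limit) are hypothetical and not part of OpenSAF.

```python
import os


def count_open_fds(pid="self"):
    """Count open file descriptors of a process via /proc/<pid>/fd (Linux only).

    For pid="self" the count includes the descriptor that os.listdir
    opens temporarily to read the directory.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def soft_fd_limit(pid="self"):
    """Return the soft 'Max open files' limit of a process, or None if unlimited.

    Parsed from /proc/<pid>/limits, whose relevant line looks like:
    Max open files            1024                 4096                 files
    """
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                soft = line.split()[3]
                return None if soft == "unlimited" else int(soft)
    return None


if __name__ == "__main__":
    # Replace "self" with the osafsmfd PID (for example 3871 in the log above)
    # and sample after each switchover to see whether the count keeps growing.
    print("open fds:", count_open_fds("self"))
    print("soft limit:", soft_fd_limit("self"))
```

If the count for osafsmfd rises steadily with each switchover and approaches the 
soft limit, that would be consistent with a descriptor leak in the switchover 
path rather than a one-off resource shortage.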


