- **Milestone**: 4.3.3 --> 4.4.2


---

**[tickets:#825] Quiesced controller goes for reboot and fails to join the cluster**

**Status:** unassigned
**Milestone:** 4.4.2
**Created:** Thu Mar 27, 2014 10:34 AM UTC by Sirisha Alla
**Last Updated:** Sat Sep 13, 2014 09:38 AM UTC
**Owner:** nobody

The issue is seen on the 4.4RC2 tag code base, on a cluster of four SLES VMs. 
Single PBE is enabled. IMM tests were running during continuous switchovers.

Syslog of SLOT2(SC-2):

Mar 27 12:56:27 SLES-64BIT-SLOT2 osafamfd[5561]: NO safSi=SC-2N,safApp=OpenSAF 
Swap done
Mar 27 12:56:27 SLES-64BIT-SLOT2 osafamfd[5561]: NO Controller switch over 
initiated
Mar 27 12:56:27 SLES-64BIT-SLOT2 osafamfd[5561]: NO ROLE SWITCH Active --> 
Quiesced
Mar 27 12:56:28 SLES-64BIT-SLOT2 osafrded[5477]: NO RDE role set to QUIESCED
Mar 27 12:56:29 SLES-64BIT-SLOT2 osafimmnd[5506]: NO Implementer disconnected 
283 <10, 2020f> (safAmfService)
Mar 27 12:57:24 SLES-64BIT-SLOT2 osafamfnd[5571]: ER AMF director heart beat 
timeout, generating core for amfd
Mar 27 12:57:25 SLES-64BIT-SLOT2 osafamfnd[5571]: Rebooting OpenSAF NodeId = 
131599 EE Name = , Reason: AMF director heart beat timeout, OwnNodeId = 131599, 
SupervisionTime = 60
Mar 27 12:57:25 SLES-64BIT-SLOT2 opensaf_reboot: Rebooting local node; 
timeout=60
Mar 27 12:57:29 SLES-64BIT-SLOT2 kernel: [ 6762.000085] md: stopping all md 
devices.
Mar 27 12:57:29 SLES-64BIT-SLOT2 kernel: [ 6763.000007] sd 0:0:0:0: [sda] 
Synchronizing SCSI cache
Mar 27 12:57:30 SLES-64BIT-SLOT2 kernel: [ 6764.001595] ohci_hcd 0000:00:06.0: 
PCI INT A disabled
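
The syslog excerpts in this report are hard-wrapped, so one entry can span two or three lines. As a rough aid for reading them, the sketch below rejoins the continuation lines and keeps only the ER and WA entries. The helper names `rejoin` and `with_severity` are made up for this illustration, and the timestamp pattern only assumes the "Mar 27 12:56:27" style seen here.

```python
import re

# A syslog entry in these excerpts starts with a stamp such as "Mar 27 12:56:27 ".
STAMP = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} ")


def rejoin(lines):
    """Merge wrapped continuation lines into the entry they belong to."""
    entries = []
    for line in lines:
        if STAMP.match(line) or not entries:
            entries.append(line.rstrip())
        else:
            # Continuation line: append it to the previous entry.
            entries[-1] = entries[-1] + " " + line.strip()
    return entries


def with_severity(entries, levels=("ER", "WA")):
    """Keep entries whose severity tag (the token after 'name[pid]: ') matches."""
    pattern = re.compile(r"\]: (%s)\b" % "|".join(levels))
    return [entry for entry in entries if pattern.search(entry)]


sample = [
    "Mar 27 12:56:28 SLES-64BIT-SLOT2 osafrded[5477]: NO RDE role set to QUIESCED",
    "Mar 27 12:57:24 SLES-64BIT-SLOT2 osafamfnd[5571]: ER AMF director heart beat ",
    "timeout, generating core for amfd",
]

for entry in with_severity(rejoin(sample)):
    print(entry)
```

On the three sample lines this prints the single rejoined ER entry about the AMF director heart beat timeout. Entries without a severity tag, such as the "Rebooting OpenSAF NodeId" lines, are not matched by this filter.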

Syslog of SLOT1(SC-1):

Mar 27 12:56:27 SLES-64BIT-SLOT1 osafamfnd[7123]: NO Assigned 
'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Mar 27 12:56:27 SLES-64BIT-SLOT1 osafimmnd[7051]: NO Implementer disconnected 
281 <0, 2020f> (@OpenSafImmReplicatorB)
Mar 27 12:56:29 SLES-64BIT-SLOT1 osafimmnd[7051]: NO Implementer disconnected 
283 <0, 2020f> (safAmfService)
Mar 27 12:57:31 SLES-64BIT-SLOT1 osaffmd[7031]: NO Node Down event for node id 
2020f:
Mar 27 12:57:31 SLES-64BIT-SLOT1 osafclmd[7090]: NO proc_initialize_msg: send 
failed. dest:2020f57f0c01c
Mar 27 12:57:31 SLES-64BIT-SLOT1 osaffmd[7031]: NO Current role: STANDBY
Mar 27 12:57:31 SLES-64BIT-SLOT1 osaffmd[7031]: Rebooting OpenSAF NodeId = 
131599 EE Name = , Reason: Received Node Down for peer controller, OwnNodeId = 
131343, SupervisionTime = 60
Mar 27 12:57:31 SLES-64BIT-SLOT1 kernel: [ 6814.800090] TIPC: Resetting link 
<1.1.1:eth0-1.1.2:eth0>, peer not responding
Mar 27 12:57:31 SLES-64BIT-SLOT1 kernel: [ 6814.800100] TIPC: Lost link 
<1.1.1:eth0-1.1.2:eth0> on network plane A
Mar 27 12:57:31 SLES-64BIT-SLOT1 kernel: [ 6814.800105] TIPC: Lost contact with 
<1.1.2>
Mar 27 12:57:31 SLES-64BIT-SLOT1 osafimmnd[7051]: NO Global discard node 
received for nodeId:2020f pid:5506
Mar 27 12:57:31 SLES-64BIT-SLOT1 osafimmnd[7051]: NO Implementer disconnected 
14 <0, 2020f(down)> (MsgQueueService131599)
Mar 27 12:57:31 SLES-64BIT-SLOT1 osafimmnd[7051]: NO Implementer disconnected 
285 <0, 2020f(down)> (@applier2testMA_verifyObjAbortCallbackNode_69_101)
Mar 27 12:57:31 SLES-64BIT-SLOT1 osafimmnd[7051]: WA Detected crash at node 
2020f, abort ccbId  38
Mar 27 12:57:31 SLES-64BIT-SLOT1 osafimmnd[7051]: NO Ccb 38 ABORTED (<released>)
Mar 27 12:57:31 SLES-64BIT-SLOT1 osafimmd[7041]: WA IMMD lost contact with peer 
IMMD (NCSMDS_RED_DOWN)
Mar 27 12:57:31 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting remote node in the 
absence of PLM is outside the scope of OpenSAF
Mar 27 12:57:31 SLES-64BIT-SLOT1 osaffmd[7031]: NO Controller Failover: Setting 
role to ACTIVE
Mar 27 12:57:31 SLES-64BIT-SLOT1 osafrded[7022]: NO RDE role set to ACTIVE
Mar 27 12:57:31 SLES-64BIT-SLOT1 osafclmd[7090]: NO ACTIVE request
Mar 27 12:57:31 SLES-64BIT-SLOT1 osafamfd[7113]: NO FAILOVER StandBy --> Active
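
The logs above name the same node in two notations: the reboot messages print a decimal NodeId (131599), while the FM and IMM messages print hex (2020f). A small sketch for cross-checking the two follows. The function names are hypothetical and are not part of OpenSAF.

```python
def node_id_to_hex(node_id: int) -> str:
    """Format a decimal NodeId as lowercase hex without a prefix, as in '2020f'."""
    return format(node_id, "x")


def hex_to_node_id(text: str) -> int:
    """Parse a hex node id such as '2020f' back into its decimal form."""
    return int(text, 16)


# NodeId and OwnNodeId values taken from the reboot messages in this report.
for decimal_id in (131599, 131343):
    print(decimal_id, "->", node_id_to_hex(decimal_id))
```

This prints `131599 -> 2020f` and `131343 -> 2010f`, so the "Node Down event for node id 2020f" on SLOT1 refers to the node that rebooted with NodeId = 131599.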

After this, SLOT2 (SC-2) does not join the cluster.

Mar 27 14:56:23 SLES-64BIT-SLOT2 osafclmna[4133]: Started
Mar 27 14:56:23 SLES-64BIT-SLOT2 osafclmna[4133]: NO 
safNode=SC-2,safCluster=myClmCluster Joined cluster, nodeid=2020f
Mar 27 14:56:23 SLES-64BIT-SLOT2 osafamfd[4142]: Started
Mar 27 14:56:23 SLES-64BIT-SLOT2 osafamfd[4142]: WA configuration validation 
error: Required attribute saAmfCtDefQuiescingCompleteTimeout not configured for 
'safVersion=4.0.0,safCompType=Comp_2nApp_2n_1_1'
Mar 27 14:56:23 SLES-64BIT-SLOT2 osafimmnd[4087]: NO Implementer (applier) 
connected: 470 (@safAmfService2020f) <10, 2020f>
Mar 27 14:56:23 SLES-64BIT-SLOT2 osafamfnd[4152]: Started
Mar 27 14:56:23 SLES-64BIT-SLOT2 osafimmnd[4087]: NO Implementer (applier) 
connected: 471 (@OpenSafImmReplicatorB) <4, 2020f>
Mar 27 14:56:23 SLES-64BIT-SLOT2 osafntfimcnd[4116]: NO Started
Mar 27 14:56:26 SLES-64BIT-SLOT2 osafamfd[4142]: NO Cold sync complete!
Mar 27 14:56:28 SLES-64BIT-SLOT2 osafimmnd[4087]: NO PBE-OI established on 
other SC. Dumping incrementally to file imm.db
......
......
Mar 27 15:28:23 SLES-64BIT-SLOT2 opensafd[4026]: ER Timed-out for response from 
AMFND
Mar 27 15:28:23 SLES-64BIT-SLOT2 opensafd[4026]: ER
Mar 27 15:28:23 SLES-64BIT-SLOT2 opensafd[4026]: ER Going for recovery
Mar 27 15:28:23 SLES-64BIT-SLOT2 osafamfd[4142]: exiting for shutdown
Mar 27 15:28:23 SLES-64BIT-SLOT2 osafamfnd[4152]: ER AMF director unexpectedly 
crashed
Mar 27 15:28:23 SLES-64BIT-SLOT2 osafamfnd[4152]: Rebooting OpenSAF NodeId = 
131599 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) 
received, OwnNodeId = 131599, SupervisionTime = 60
Mar 27 15:28:23 SLES-64BIT-SLOT2 opensaf_reboot: Rebooting local node; 
timeout=60
Mar 27 15:28:23 SLES-64BIT-SLOT2 osafimmnd[4087]: NO Implementer locally 
disconnected. Marking it as doomed 470 <10, 2020f> (@safAmfService2020f)
Mar 27 15:28:23 SLES-64BIT-SLOT2 osafimmnd[4087]: NO Implementer disconnected 
470 <10, 2020f> (@safAmfService2020f)
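
To put numbers on the timeline, the intervals between key SLOT2 entries can be computed from the syslog stamps. This is a minimal sketch: the year 2014 is an assumption taken from the ticket's creation date, since syslog stamps omit it, and the helper names are hypothetical.

```python
from datetime import datetime


def parse_stamp(stamp: str, year: int = 2014) -> datetime:
    """Parse a syslog stamp such as 'Mar 27 12:56:27' (year is assumed)."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two syslog stamps from the same log."""
    return (parse_stamp(end) - parse_stamp(start)).total_seconds()


# "ROLE SWITCH Active --> Quiesced" to "AMF director heart beat timeout".
print(elapsed_seconds("Mar 27 12:56:27", "Mar 27 12:57:24"))

# "osafamfd[4142]: Started" to "Timed-out for response from AMFND".
print(elapsed_seconds("Mar 27 14:56:23", "Mar 27 15:28:23"))
```

The first interval is 57 seconds and the second is 1920 seconds (32 minutes). The sketch only measures the gaps. It does not say which timers are responsible for them.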

The node joins the cluster only after the cluster is reset.

Traces of amfd and amfnd are attached.

