- **Milestone**: 4.3.3 --> 4.4.2
---
**[tickets:#825] Quiesced controller goes for reboot and fails to join the cluster**
**Status:** unassigned
**Milestone:** 4.4.2
**Created:** Thu Mar 27, 2014 10:34 AM UTC by Sirisha Alla
**Last Updated:** Sat Sep 13, 2014 09:38 AM UTC
**Owner:** nobody
The issue is seen on the 4.4 RC2 tag code base on a cluster of four SLES VMs.
Single PBE is enabled. IMM tests were running during continuous switchovers.
Syslog of SLOT2(SC-2):
Mar 27 12:56:27 SLES-64BIT-SLOT2 osafamfd[5561]: NO safSi=SC-2N,safApp=OpenSAF
Swap done
Mar 27 12:56:27 SLES-64BIT-SLOT2 osafamfd[5561]: NO Controller switch over
initiated
Mar 27 12:56:27 SLES-64BIT-SLOT2 osafamfd[5561]: NO ROLE SWITCH Active -->
Quiesced
Mar 27 12:56:28 SLES-64BIT-SLOT2 osafrded[5477]: NO RDE role set to QUIESCED
Mar 27 12:56:29 SLES-64BIT-SLOT2 osafimmnd[5506]: NO Implementer disconnected
283 <10, 2020f> (safAmfService)
Mar 27 12:57:24 SLES-64BIT-SLOT2 osafamfnd[5571]: ER AMF director heart beat
timeout, generating core for amfd
Mar 27 12:57:25 SLES-64BIT-SLOT2 osafamfnd[5571]: Rebooting OpenSAF NodeId =
131599 EE Name = , Reason: AMF director heart beat timeout, OwnNodeId = 131599,
SupervisionTime = 60
Mar 27 12:57:25 SLES-64BIT-SLOT2 opensaf_reboot: Rebooting local node;
timeout=60
Mar 27 12:57:29 SLES-64BIT-SLOT2 kernel: [ 6762.000085] md: stopping all md
devices.
Mar 27 12:57:29 SLES-64BIT-SLOT2 kernel: [ 6763.000007] sd 0:0:0:0: [sda]
Synchronizing SCSI cache
Mar 27 12:57:30 SLES-64BIT-SLOT2 kernel: [ 6764.001595] ohci_hcd 0000:00:06.0:
PCI INT A disabled
Syslog of SLOT1(SC-1):
Mar 27 12:56:27 SLES-64BIT-SLOT1 osafamfnd[7123]: NO Assigned
'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Mar 27 12:56:27 SLES-64BIT-SLOT1 osafimmnd[7051]: NO Implementer disconnected
281 <0, 2020f> (@OpenSafImmReplicatorB)
Mar 27 12:56:29 SLES-64BIT-SLOT1 osafimmnd[7051]: NO Implementer disconnected
283 <0, 2020f> (safAmfService)
Mar 27 12:57:31 SLES-64BIT-SLOT1 osaffmd[7031]: NO Node Down event for node id
2020f:
Mar 27 12:57:31 SLES-64BIT-SLOT1 osafclmd[7090]: NO proc_initialize_msg: send
failed. dest:2020f57f0c01c
Mar 27 12:57:31 SLES-64BIT-SLOT1 osaffmd[7031]: NO Current role: STANDBY
Mar 27 12:57:31 SLES-64BIT-SLOT1 osaffmd[7031]: Rebooting OpenSAF NodeId =
131599 EE Name = , Reason: Received Node Down for peer controller, OwnNodeId =
131343, SupervisionTime = 60
Mar 27 12:57:31 SLES-64BIT-SLOT1 kernel: [ 6814.800090] TIPC: Resetting link
<1.1.1:eth0-1.1.2:eth0>, peer not responding
Mar 27 12:57:31 SLES-64BIT-SLOT1 kernel: [ 6814.800100] TIPC: Lost link
<1.1.1:eth0-1.1.2:eth0> on network plane A
Mar 27 12:57:31 SLES-64BIT-SLOT1 kernel: [ 6814.800105] TIPC: Lost contact with
<1.1.2>
Mar 27 12:57:31 SLES-64BIT-SLOT1 osafimmnd[7051]: NO Global discard node
received for nodeId:2020f pid:5506
Mar 27 12:57:31 SLES-64BIT-SLOT1 osafimmnd[7051]: NO Implementer disconnected
14 <0, 2020f(down)> (MsgQueueService131599)
Mar 27 12:57:31 SLES-64BIT-SLOT1 osafimmnd[7051]: NO Implementer disconnected
285 <0, 2020f(down)> (@applier2testMA_verifyObjAbortCallbackNode_69_101)
Mar 27 12:57:31 SLES-64BIT-SLOT1 osafimmnd[7051]: WA Detected crash at node
2020f, abort ccbId 38
Mar 27 12:57:31 SLES-64BIT-SLOT1 osafimmnd[7051]: NO Ccb 38 ABORTED (<released>)
Mar 27 12:57:31 SLES-64BIT-SLOT1 osafimmd[7041]: WA IMMD lost contact with peer
IMMD (NCSMDS_RED_DOWN)
Mar 27 12:57:31 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting remote node in the
absence of PLM is outside the scope of OpenSAF
Mar 27 12:57:31 SLES-64BIT-SLOT1 osaffmd[7031]: NO Controller Failover: Setting
role to ACTIVE
Mar 27 12:57:31 SLES-64BIT-SLOT1 osafrded[7022]: NO RDE role set to ACTIVE
Mar 27 12:57:31 SLES-64BIT-SLOT1 osafclmd[7090]: NO ACTIVE request
Mar 27 12:57:31 SLES-64BIT-SLOT1 osafamfd[7113]: NO FAILOVER StandBy --> Active
After this, SLOT2 (SC-2) does not join the cluster.
Mar 27 14:56:23 SLES-64BIT-SLOT2 osafclmna[4133]: Started
Mar 27 14:56:23 SLES-64BIT-SLOT2 osafclmna[4133]: NO
safNode=SC-2,safCluster=myClmCluster Joined cluster, nodeid=2020f
Mar 27 14:56:23 SLES-64BIT-SLOT2 osafamfd[4142]: Started
Mar 27 14:56:23 SLES-64BIT-SLOT2 osafamfd[4142]: WA configuration validation
error: Required attribute saAmfCtDefQuiescingCompleteTimeout not configured for
'safVersion=4.0.0,safCompType=Comp_2nApp_2n_1_1'
Mar 27 14:56:23 SLES-64BIT-SLOT2 osafimmnd[4087]: NO Implementer (applier)
connected: 470 (@safAmfService2020f) <10, 2020f>
Mar 27 14:56:23 SLES-64BIT-SLOT2 osafamfnd[4152]: Started
Mar 27 14:56:23 SLES-64BIT-SLOT2 osafimmnd[4087]: NO Implementer (applier)
connected: 471 (@OpenSafImmReplicatorB) <4, 2020f>
Mar 27 14:56:23 SLES-64BIT-SLOT2 osafntfimcnd[4116]: NO Started
Mar 27 14:56:26 SLES-64BIT-SLOT2 osafamfd[4142]: NO Cold sync complete!
Mar 27 14:56:28 SLES-64BIT-SLOT2 osafimmnd[4087]: NO PBE-OI established on
other SC. Dumping incrementally to file imm.db
......
......
Mar 27 15:28:23 SLES-64BIT-SLOT2 opensafd[4026]: ER Timed-out for response from
AMFND
Mar 27 15:28:23 SLES-64BIT-SLOT2 opensafd[4026]: ER
Mar 27 15:28:23 SLES-64BIT-SLOT2 opensafd[4026]: ER Going for recovery
Mar 27 15:28:23 SLES-64BIT-SLOT2 osafamfd[4142]: exiting for shutdown
Mar 27 15:28:23 SLES-64BIT-SLOT2 osafamfnd[4152]: ER AMF director unexpectedly
crashed
Mar 27 15:28:23 SLES-64BIT-SLOT2 osafamfnd[4152]: Rebooting OpenSAF NodeId =
131599 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest)
received, OwnNodeId = 131599, SupervisionTime = 60
Mar 27 15:28:23 SLES-64BIT-SLOT2 opensaf_reboot: Rebooting local node;
timeout=60
Mar 27 15:28:23 SLES-64BIT-SLOT2 osafimmnd[4087]: NO Implementer locally
disconnected. Marking it as doomed 470 <10, 2020f> (@safAmfService2020f)
Mar 27 15:28:23 SLES-64BIT-SLOT2 osafimmnd[4087]: NO Implementer disconnected
470 <10, 2020f> (@safAmfService2020f)
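To make the timing in the SLOT2 excerpts easier to read, here is a minimal sketch that computes the gap between two syslog timestamps. The helper name `gap_seconds` is hypothetical, the year 2014 is taken from the ticket's creation date (syslog lines carry no year), and an English locale is assumed for the month abbreviation.

```python
from datetime import datetime

# Assumption: year taken from the ticket's "Created" date; syslog omits it.
LOG_YEAR = 2014
STAMP_FORMAT = "%Y %b %d %H:%M:%S"


def gap_seconds(start: str, end: str) -> int:
    """Return the number of seconds between two syslog timestamps."""
    t0 = datetime.strptime(f"{LOG_YEAR} {start}", STAMP_FORMAT)
    t1 = datetime.strptime(f"{LOG_YEAR} {end}", STAMP_FORMAT)
    return int((t1 - t0).total_seconds())


# Role switch to Quiesced -> AMF director heart beat timeout on SC-2.
print(gap_seconds("Mar 27 12:56:27", "Mar 27 12:57:24"))

# osafamfd started on SC-2 -> opensafd timed out waiting for AMFND.
print(gap_seconds("Mar 27 14:56:23", "Mar 27 15:28:23"))
```

With the timestamps quoted above, the first gap is 57 seconds and the second is 1920 seconds (32 minutes).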
The node joins the cluster only after the cluster is reset.
Traces of amfd and amfnd are attached.
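Note for reading the logs: the reboot messages print node IDs in decimal (NodeId = 131599, OwnNodeId = 131343), while the IMM and FM messages print them in hex (2020f). A small sketch of the conversion, with a hypothetical helper name `node_id_hex`:

```python
def node_id_hex(node_id: int) -> str:
    """Format a decimal node ID the way the IMM/FM log lines print it (lowercase hex)."""
    return format(node_id, "x")


# Decimal IDs as they appear in the "Rebooting OpenSAF NodeId = ..." lines.
for decimal_id in (131599, 131343):
    print(decimal_id, "->", node_id_hex(decimal_id))
```

This gives 2020f for 131599 (SC-2, the node that rebooted) and 2010f for 131343 (the OwnNodeId reported on SC-1).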