- **Milestone**: 4.4.2 --> 4.7-Tentative


---

**[tickets:#825] Quiesced controller goes for reboot and fails to join the cluster**

**Status:** unassigned
**Milestone:** 4.7-Tentative
**Created:** Thu Mar 27, 2014 10:34 AM UTC by Sirisha Alla
**Last Updated:** Thu Jan 08, 2015 02:12 AM UTC
**Owner:** nobody

The issue is seen on the 4.4RC2 tag code base on a 4-node cluster of SLES VMs. 
A single PBE is enabled. IMM tests were running during continuous switchovers.

Syslog of SLOT2(SC-2):

Mar 27 12:56:27 SLES-64BIT-SLOT2 osafamfd[5561]: NO safSi=SC-2N,safApp=OpenSAF Swap done
Mar 27 12:56:27 SLES-64BIT-SLOT2 osafamfd[5561]: NO Controller switch over initiated
Mar 27 12:56:27 SLES-64BIT-SLOT2 osafamfd[5561]: NO ROLE SWITCH Active --> Quiesced
Mar 27 12:56:28 SLES-64BIT-SLOT2 osafrded[5477]: NO RDE role set to QUIESCED
Mar 27 12:56:29 SLES-64BIT-SLOT2 osafimmnd[5506]: NO Implementer disconnected 283 <10, 2020f> (safAmfService)
Mar 27 12:57:24 SLES-64BIT-SLOT2 osafamfnd[5571]: ER AMF director heart beat timeout, generating core for amfd
Mar 27 12:57:25 SLES-64BIT-SLOT2 osafamfnd[5571]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: AMF director heart beat timeout, OwnNodeId = 131599, SupervisionTime = 60
Mar 27 12:57:25 SLES-64BIT-SLOT2 opensaf_reboot: Rebooting local node; timeout=60
Mar 27 12:57:29 SLES-64BIT-SLOT2 kernel: [ 6762.000085] md: stopping all md devices.
Mar 27 12:57:29 SLES-64BIT-SLOT2 kernel: [ 6763.000007] sd 0:0:0:0: [sda] Synchronizing SCSI cache
Mar 27 12:57:30 SLES-64BIT-SLOT2 kernel: [ 6764.001595] ohci_hcd 0000:00:06.0: PCI INT A disabled

Syslog of SLOT1(SC-1):

Mar 27 12:56:27 SLES-64BIT-SLOT1 osafamfnd[7123]: NO Assigned 'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Mar 27 12:56:27 SLES-64BIT-SLOT1 osafimmnd[7051]: NO Implementer disconnected 281 <0, 2020f> (@OpenSafImmReplicatorB)
Mar 27 12:56:29 SLES-64BIT-SLOT1 osafimmnd[7051]: NO Implementer disconnected 283 <0, 2020f> (safAmfService)
Mar 27 12:57:31 SLES-64BIT-SLOT1 osaffmd[7031]: NO Node Down event for node id 2020f:
Mar 27 12:57:31 SLES-64BIT-SLOT1 osafclmd[7090]: NO proc_initialize_msg: send failed. dest:2020f57f0c01c
Mar 27 12:57:31 SLES-64BIT-SLOT1 osaffmd[7031]: NO Current role: STANDBY
Mar 27 12:57:31 SLES-64BIT-SLOT1 osaffmd[7031]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Received Node Down for peer controller, OwnNodeId = 131343, SupervisionTime = 60
Mar 27 12:57:31 SLES-64BIT-SLOT1 kernel: [ 6814.800090] TIPC: Resetting link <1.1.1:eth0-1.1.2:eth0>, peer not responding
Mar 27 12:57:31 SLES-64BIT-SLOT1 kernel: [ 6814.800100] TIPC: Lost link <1.1.1:eth0-1.1.2:eth0> on network plane A
Mar 27 12:57:31 SLES-64BIT-SLOT1 kernel: [ 6814.800105] TIPC: Lost contact with <1.1.2>
Mar 27 12:57:31 SLES-64BIT-SLOT1 osafimmnd[7051]: NO Global discard node received for nodeId:2020f pid:5506
Mar 27 12:57:31 SLES-64BIT-SLOT1 osafimmnd[7051]: NO Implementer disconnected 14 <0, 2020f(down)> (MsgQueueService131599)
Mar 27 12:57:31 SLES-64BIT-SLOT1 osafimmnd[7051]: NO Implementer disconnected 285 <0, 2020f(down)> (@applier2testMA_verifyObjAbortCallbackNode_69_101)
Mar 27 12:57:31 SLES-64BIT-SLOT1 osafimmnd[7051]: WA Detected crash at node 2020f, abort ccbId  38
Mar 27 12:57:31 SLES-64BIT-SLOT1 osafimmnd[7051]: NO Ccb 38 ABORTED (<released>)
Mar 27 12:57:31 SLES-64BIT-SLOT1 osafimmd[7041]: WA IMMD lost contact with peer IMMD (NCSMDS_RED_DOWN)
Mar 27 12:57:31 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting remote node in the absence of PLM is outside the scope of OpenSAF
Mar 27 12:57:31 SLES-64BIT-SLOT1 osaffmd[7031]: NO Controller Failover: Setting role to ACTIVE
Mar 27 12:57:31 SLES-64BIT-SLOT1 osafrded[7022]: NO RDE role set to ACTIVE
Mar 27 12:57:31 SLES-64BIT-SLOT1 osafclmd[7090]: NO ACTIVE request
Mar 27 12:57:31 SLES-64BIT-SLOT1 osafamfd[7113]: NO FAILOVER StandBy --> Active
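
The excerpts above identify the same node in two notations: the reboot messages print `NodeId = 131599` in decimal, while the IMM, FM and CLM lines print `2020f` in hexadecimal. A minimal Python sketch of the conversion, useful when correlating lines across daemons (the helper names `node_id_hex` and `node_id_dec` are illustrative and not part of OpenSAF):

```python
# Illustrative helpers (not OpenSAF APIs) for matching the decimal node IDs
# in the "Rebooting OpenSAF NodeId = ..." lines with the hexadecimal node
# IDs that other daemons print in the same syslog excerpts.

def node_id_hex(node_id: int) -> str:
    """Return the lowercase hex form of a decimal node ID."""
    return format(node_id, "x")


def node_id_dec(hex_id: str) -> int:
    """Return the decimal form of a hex node ID such as '2020f'."""
    return int(hex_id, 16)


# Both decimal values are copied from the log lines quoted above.
for dec_id in (131599, 131343):
    print(dec_id, "->", node_id_hex(dec_id))
```

This prints `131599 -> 2020f` and `131343 -> 2010f`, so the rebooted node 131599 is the `2020f` reported down on SLOT1, and SLOT1's own `OwnNodeId = 131343` corresponds to `2010f`.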

After the reboot, SLOT2 (SC-2) fails to join the cluster:

Mar 27 14:56:23 SLES-64BIT-SLOT2 osafclmna[4133]: Started
Mar 27 14:56:23 SLES-64BIT-SLOT2 osafclmna[4133]: NO safNode=SC-2,safCluster=myClmCluster Joined cluster, nodeid=2020f
Mar 27 14:56:23 SLES-64BIT-SLOT2 osafamfd[4142]: Started
Mar 27 14:56:23 SLES-64BIT-SLOT2 osafamfd[4142]: WA configuration validation error: Required attribute saAmfCtDefQuiescingCompleteTimeout not configured for 'safVersion=4.0.0,safCompType=Comp_2nApp_2n_1_1'
Mar 27 14:56:23 SLES-64BIT-SLOT2 osafimmnd[4087]: NO Implementer (applier) connected: 470 (@safAmfService2020f) <10, 2020f>
Mar 27 14:56:23 SLES-64BIT-SLOT2 osafamfnd[4152]: Started
Mar 27 14:56:23 SLES-64BIT-SLOT2 osafimmnd[4087]: NO Implementer (applier) connected: 471 (@OpenSafImmReplicatorB) <4, 2020f>
Mar 27 14:56:23 SLES-64BIT-SLOT2 osafntfimcnd[4116]: NO Started
Mar 27 14:56:26 SLES-64BIT-SLOT2 osafamfd[4142]: NO Cold sync complete!
Mar 27 14:56:28 SLES-64BIT-SLOT2 osafimmnd[4087]: NO PBE-OI established on other SC. Dumping incrementally to file imm.db
......
......
Mar 27 15:28:23 SLES-64BIT-SLOT2 opensafd[4026]: ER Timed-out for response from AMFND
Mar 27 15:28:23 SLES-64BIT-SLOT2 opensafd[4026]: ER
Mar 27 15:28:23 SLES-64BIT-SLOT2 opensafd[4026]: ER Going for recovery
Mar 27 15:28:23 SLES-64BIT-SLOT2 osafamfd[4142]: exiting for shutdown
Mar 27 15:28:23 SLES-64BIT-SLOT2 osafamfnd[4152]: ER AMF director unexpectedly crashed
Mar 27 15:28:23 SLES-64BIT-SLOT2 osafamfnd[4152]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, OwnNodeId = 131599, SupervisionTime = 60
Mar 27 15:28:23 SLES-64BIT-SLOT2 opensaf_reboot: Rebooting local node; timeout=60
Mar 27 15:28:23 SLES-64BIT-SLOT2 osafimmnd[4087]: NO Implementer locally disconnected. Marking it as doomed 470 <10, 2020f> (@safAmfService2020f)
Mar 27 15:28:23 SLES-64BIT-SLOT2 osafimmnd[4087]: NO Implementer disconnected 470 <10, 2020f> (@safAmfService2020f)

The node joins the cluster only after the cluster is reset.

Traces of amfd and amfnd are attached.
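
For triage, the gaps between key SLOT2 log lines can be worked out from the syslog timestamps quoted in this report. The sketch below is only an illustration: `elapsed_seconds` is a hypothetical helper, and the year 2014 is an assumption taken from the ticket's creation date, because syslog timestamps carry no year.

```python
from datetime import datetime

# Hypothetical helper: seconds between two syslog-style timestamps such as
# "Mar 27 12:56:27". The year is an assumption (syslog omits it); 2014 is
# taken from the ticket's "Created" date.
def elapsed_seconds(start: str, end: str, year: int = 2014) -> int:
    fmt = "%Y %b %d %H:%M:%S"
    t0 = datetime.strptime(f"{year} {start}", fmt)
    t1 = datetime.strptime(f"{year} {end}", fmt)
    return int((t1 - t0).total_seconds())


# "ROLE SWITCH Active --> Quiesced" to "AMF director heart beat timeout"
print(elapsed_seconds("Mar 27 12:56:27", "Mar 27 12:57:24"))

# "osafamfnd ... Started" to "Timed-out for response from AMFND"
print(elapsed_seconds("Mar 27 14:56:23", "Mar 27 15:28:23"))
```

The two calls print `57` and `1920`: the heartbeat timeout fired 57 seconds after the role switch to Quiesced, and opensafd gave up waiting for AMFND 32 minutes after it started on the rebooted node.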

