- **assigned_to**: A V Mahesh (AVM) -->  nobody 
- **Blocker**:  --> False



---

**[tickets:#457] Dtm: standby joins as active after restart in a 70 node setup**

**Status:** unassigned
**Milestone:** future
**Created:** Fri Jun 14, 2013 06:48 AM UTC by Neelakanta Reddy
**Last Updated:** Wed Jul 15, 2015 02:21 PM UTC
**Owner:** nobody
**Attachments:**

- [messages_SC1](https://sourceforge.net/p/opensaf/tickets/457/attachment/messages_SC1) (65.5 kB; application/octet-stream)
- [messages_SC2](https://sourceforge.net/p/opensaf/tickets/457/attachment/messages_SC2) (208.0 kB; application/octet-stream)


Analysis of the logs shows the following sequence.

Slot1 is active and slot2 is standby.

1. IMMND is killed on slot2

Jun 11 21:29:46 SLES-64BIT-SLOT2 osafamfnd[3750]: NO 
'safComp=IMMND,safSu=SC-2,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' 
: Recovery is 'componentRestart'


2. The active IMMD detected that the slot2 IMMND was discarded

Jun 11 15:54:02 SLES-64BIT-SLOT1 osafimmnd[3746]: NO Global discard node 
received for nodeId:2020f pid:3668


3. The new IMMND on slot2 requests a sync

Jun 11 21:29:46 SLES-64BIT-SLOT2 osafimmnd[7315]: Started

Jun 11 15:54:03 SLES-64BIT-SLOT1 osafimmd[3736]: NO Node 2020f request sync 
sync-pid:7315 epoch:0

4. IMMD is killed on slot2 and the node reboots

Jun 11 21:29:49 SLES-64BIT-SLOT2 osafamfnd[3750]: ER 
safComp=IMMD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast
Jun 11 21:29:49 SLES-64BIT-SLOT2 osafamfnd[3750]: Rebooting OpenSAF NodeId = 
131599 EE Name = , Reason: Component faulted: recovery is node failfast
Jun 11 21:29:49 SLES-64BIT-SLOT2 opensaf_reboot: Rebooting local node
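
The excerpts identify the rebooting node in two notations: decimal in the osafamfnd message in step 4 (`NodeId = 131599`) and hexadecimal in the IMM messages in steps 2 and 3 (`nodeId:2020f`, `Node 2020f`). A small sketch to confirm that both refer to the same node; the helper name `same_node` is only illustrative and is not part of OpenSAF:

```python
# Node identifiers copied from the log excerpts in this ticket.
decimal_node_id = 131599   # osafamfnd on slot2: "Rebooting OpenSAF NodeId = 131599"
hex_node_id = "2020f"      # osafimmnd/osafimmd on slot1: "nodeId:2020f"


def same_node(dec_id: int, hex_id: str) -> bool:
    """Return True when a decimal and a hexadecimal node id denote the same node."""
    return dec_id == int(hex_id, 16)


print(format(decimal_node_id, "x"))             # hexadecimal form of the decimal id
print(same_node(decimal_node_id, hex_node_id))  # True if both logs name the same node
```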

5. After coming back up, slot2 took the active role (slot1 was still active)

Jun 11 21:30:22 SLES-64BIT-SLOT2 osafrded[2095]: NO Peer not available => 
Active role
Jun 11 21:30:23 SLES-64BIT-SLOT2 osaffmd[2108]: Started
Jun 11 21:30:23 SLES-64BIT-SLOT2 osafimmd[2117]: Started
Jun 11 21:30:23 SLES-64BIT-SLOT2 osafimmnd[2127]: Started


6. After taking the active role, slot2 started loading

Jun 11 21:30:23 SLES-64BIT-SLOT2 osafimmnd[2127]: NO This IMMND is now the NEW 
Coord

7. Some time later, a connection to the active node (slot1) was established

Jun 11 21:30:23 SLES-64BIT-SLOT2 osafdtmd[2077]: NO Established contact with 
'SC-1'
Jun 11 15:54:39 SLES-64BIT-SLOT1 osafdtmd[3696]: NO Established contact with 
'SC-2'


8. Once connected, the loading event from slot2 reaches the active IMMD on 
slot1. The IMMND up event was never received there, because the connection 
between the two nodes was not yet established when the slot2 IMMND came up.

Jun 11 15:54:42 SLES-64BIT-SLOT1 osafimmd[3736]: WA Wrong PID 0 != 2127

9. AMFD tries to re-connect to IMM because IMMND returned bad_handle: AMFD 
issued another request on the same handle while its previous synchronous call 
had not yet completed.

Jun 11 15:54:49 SLES-64BIT-SLOT1 osafamfd[3815]: NO Re-initializing with IMM
Jun 11 15:54:49 SLES-64BIT-SLOT1 osafimmnd[3746]: WA IMMND - Client Node Get 
Failed for cli_hdl 85899477263
Jun 11 15:54:49 SLES-64BIT-SLOT1 osafamfd[3815]: ER saImmOiImplementerSet 
failed 14
Jun 11 15:54:49 SLES-64BIT-SLOT1 osafamfd[3815]: ER exiting since 
avd_imm_impl_set failed
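
The slot1 and slot2 excerpts carry different wall-clock times (for example 15:54:39 on slot1 and 21:30:23 on slot2 for the "Established contact" messages in step 7), which makes the sequence harder to follow. A minimal sketch for putting both on one timeline, assuming those two messages record the same instant and that the offset stays constant; the year 2013 is taken from the ticket creation date, and the helper names are illustrative:

```python
from datetime import datetime, timedelta

FMT = "%Y %b %d %H:%M:%S"


def parse(ts: str, year: int = 2013) -> datetime:
    """Parse a syslog timestamp such as 'Jun 11 21:30:23' (syslog omits the year)."""
    return datetime.strptime(f"{year} {ts}", FMT)


# Assumption: both "Established contact" messages (step 7) describe the same event.
OFFSET: timedelta = parse("Jun 11 21:30:23") - parse("Jun 11 15:54:39")


def slot1_to_slot2_time(ts: str) -> str:
    """Express a slot1 timestamp on the slot2 clock."""
    return (parse(ts) + OFFSET).strftime("%b %d %H:%M:%S")


print(OFFSET)
# slot1 timestamps from steps 2, 8 and 9, shifted to the slot2 clock:
for ts in ("Jun 11 15:54:02", "Jun 11 15:54:42", "Jun 11 15:54:49"):
    print(ts, "->", slot1_to_slot2_time(ts))
```

Under that assumption, the "Global discard node" message in step 2 maps to 21:29:46 on the slot2 clock, the same second in which the IMMND fault in step 1 is logged, which is consistent with a fixed offset between the two nodes.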


Conclusion:

MDS on slot2 connected to slot1 only after loading had already been initiated 
in IMMND. Because of this late connection, slot2 took the active role while 
slot1 was still active.
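
The ordering problem can be illustrated with a toy model. This is only a sketch of the race described in this ticket, not OpenSAF code: `decide_role` and `restart_slot2` are hypothetical names, and the rule is reduced to the behaviour visible in the osafrded message in step 5 ("Peer not available => Active role").

```python
from typing import Dict, Optional


def decide_role(peer_role: Optional[str]) -> str:
    """Toy role decision: with no visible peer, take the active role."""
    if peer_role is None:  # no contact with the peer at decision time
        return "ACTIVE"
    return "STANDBY" if peer_role == "ACTIVE" else "ACTIVE"


def restart_slot2(contact_before_decision: bool) -> Dict[str, str]:
    """Model slot2 restarting while slot1 remains active throughout."""
    roles = {"slot1": "ACTIVE"}
    # If contact is established only after the decision, slot2 sees no peer.
    visible_peer = roles["slot1"] if contact_before_decision else None
    roles["slot2"] = decide_role(visible_peer)
    return roles


print(restart_slot2(contact_before_decision=False))  # ordering seen in this ticket
print(restart_slot2(contact_before_decision=True))   # ordering with early contact
```

In this model the first call ends with both nodes active, matching steps 5 to 7, while the second call leaves slot2 as standby.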

