---

** [tickets:#2337] cpd crashed on new Active (SC-2) during checkpoint open with Active replica and write flag after si-swap operation**

**Status:** unassigned
**Milestone:** 5.2.RC1
**Created:** Fri Mar 03, 2017 05:10 AM UTC by Ritu Raj
**Last Updated:** Fri Mar 03, 2017 05:10 AM UTC
**Owner:** nobody


#Environment details
OS : SUSE 64-bit
Changeset : 8634 (5.2.FC)
Setup : 4 nodes (2 controllers and 2 payloads with 1PBE enabled)


#Summary
cpd crashed on the new Active (SC-2) during checkpoint open with Active replica and write flag after an si-swap operation.

#Steps followed & Observed behaviour
    Invoke switchovers.
        After a few successful switchovers, cpd crashed while opening a checkpoint with Active replica.
        Below is the API flow:
        1. Initialize ckpt with callbacks.
        2. Create checkpoint with Active Replica and write flag.
        3. Invoke switchover.
        4. Close checkpoint.
        5. Open same checkpoint with Active Replica and write flag (cpd crashed at this step).

Following is the syslog: 
Mar  2 13:25:38 TestBed-R2 osafimmnd[2118]: NO Implementer (applier) connected: 14028 (@safLogService_appl) <2163, 2020f>
Mar  2 13:25:38 TestBed-R2 osafamfnd[2168]: NO Assigned 'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Mar  2 13:25:38 TestBed-R2 osafimmnd[2118]: NO Implementer (applier) connected: 14029 (@safSmf_applier1) <131, 2020f>
Mar  2 13:25:38 TestBed-R2 osafrded[2088]: NO Peer up on node 0x2010f
Mar  2 13:25:38 TestBed-R2 osafrded[2088]: NO Got peer info request from node 0x2010f with role STANDBY
Mar  2 13:25:38 TestBed-R2 osafrded[2088]: NO Got peer info response from node 0x2010f with role STANDBY
Mar  2 13:25:38 TestBed-R2 osafimmnd[2118]: NO Implementer disconnected 14017 <0, 2010f> (@OpenSafImmReplicatorB)
Mar  2 13:25:38 TestBed-R2 osafamfnd[2168]: NO 'safComp=CPD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Mar  2 13:25:38 TestBed-R2 osafamfnd[2168]: ER safComp=CPD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Mar  2 13:25:38 TestBed-R2 osafamfnd[2168]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131599, SupervisionTime = 60
Mar  2 13:25:38 TestBed-R2 opensaf_reboot: Rebooting local node; timeout=60
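
The syslog prints node IDs in two notations: decimal in the reboot message (NodeId = 131599) and hex in the other entries (2020f, 0x2010f). The following is a minimal Python sketch for lining them up when reading the log; the variable names are illustrative and not taken from OpenSAF.

```python
# Node IDs copied from the syslog above.
rebooting_node_id = 131599   # decimal, from "Rebooting OpenSAF NodeId = 131599"
peer_node_id = 0x2010f       # hex, from "Peer up on node 0x2010f"

# Convert both to the same notations so they can be compared directly.
rebooting_hex = hex(rebooting_node_id)
peer_decimal = int(peer_node_id)

print(f"rebooting node: {rebooting_node_id} = {rebooting_hex}")
print(f"peer node:      {peer_decimal} = {hex(peer_node_id)}")
```

In hex the rebooting node is 0x2020f, the same ID that appears in the osafimmnd entries on TestBed-R2, so the node failfast hit the local node and not the peer 0x2010f.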


BT 
(gdb) bt
0  0x00007fa91d057c95 in cpd_proc_increase_node_user_info (ckpt_node=0x7fa91d2bcff0, cpnd_dest=566317152296976, open_flags=2) at src/ckpt/ckptd/cpd_proc.c:1650
1  0x00007fa91d046bd0 in cpd_evt_proc_ckpt_usr_info (cb=0x7fa91d299980, evt=0x7fa91d2bd120, sinfo=0x7fa91d2bd778) at src/ckpt/ckptd/cpd_evt.c:455
2  0x00007fa91d045799 in cpd_process_evt (evt=0x7fa91d2bd110) at src/ckpt/ckptd/cpd_evt.c:116
3  0x00007fa91d04df15 in cpd_main_process (cb=0x7fa91d299980) at src/ckpt/ckptd/cpd_init.c:661
4  0x00007fa91d04e241 in main (argc=1, argv=0x7fffd6210e78) at src/ckpt/ckptd/cpd_main.c:74
(gdb)


Notes:
1. Syslog of both controllers attached
2. BT attached
3. The two nodes are not in time sync; relative to SC-2, SC-1 is about 50 min ahead.
Time Diff
==========
TestBed-R1:~ # date
Thu Mar  2 16:34:45 IST 2017
TestBed-R2:~ # date
Thu Mar  2 15:44:30 IST 2017
=========
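
When correlating the syslogs of the two controllers, the offset can be computed from the two `date` outputs above. This is a minimal Python sketch; it assumes TestBed-R1 is SC-1 and TestBed-R2 is SC-2, that both nodes report the same time zone (IST), and that the two commands were run at nearly the same moment, so the result is approximate. The helper name `parse_date_output` is hypothetical.

```python
from datetime import datetime


def parse_date_output(line: str) -> datetime:
    """Parse `date` output such as 'Thu Mar  2 16:34:45 IST 2017'.

    The weekday and time zone tokens are dropped; both nodes are assumed
    to be in the same zone, so only the wall-clock difference matters.
    """
    _weekday, month, day, clock, _zone, year = line.split()
    return datetime.strptime(f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y")


sc1_time = parse_date_output("Thu Mar  2 16:34:45 IST 2017")  # TestBed-R1
sc2_time = parse_date_output("Thu Mar  2 15:44:30 IST 2017")  # TestBed-R2

skew = sc1_time - sc2_time
skew_seconds = int(skew.total_seconds())
minutes, seconds = divmod(skew_seconds, 60)

print(f"SC-1 is ahead of SC-2 by {minutes} min {seconds} s")
```

Subtracting this offset from SC-1 timestamps (or adding it to SC-2 timestamps) puts entries from both attached syslogs on a common timeline.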


---
