I tried to reproduce this ticket on OpenSAF 5.18.04.
Setup: 4 nodes (2 controllers and 2 payloads with headless feature disabled &
1PBE enabled with 30K objects)
1) In this setup, SC-1 is active and SC-2 is standby:
root@mohan-VirtualBox:~# amf-state siass
safSISU=safSu=SC-2\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
        saAmfSISUHAState=STANDBY(2)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=PL-4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed4,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SC-2\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=PL-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)

2) On PL-3, I ran the demo application, which creates checkpoints continuously:
root@mohan-VirtualBox:/home/mohan/ticket2011# ./tkt
*******************************************************************
Demonstrating Checkpoint Service
*******************************************************************
Initialising With Checkpoint Service....
 saCkptInitialize passed
Opening  Checkpoint = safCkpt=DemoCkpt000,safApp=safCkptService with create 
flags....
 saCkptCheckpointOpen passed
Opening  Checkpoint = safCkpt=DemoCkpt001,safApp=safCkptService with create 
flags....
 saCkptCheckpointOpen passed
Opening  Checkpoint = safCkpt=DemoCkpt002,safApp=safCkptService with create 
flags....
 saCkptCheckpointOpen passed
Opening  Checkpoint = safCkpt=DemoCkpt003,safApp=safCkptService with create 
flags....
 saCkptCheckpointOpen passed
Opening  Checkpoint = safCkpt=DemoCkpt004,safApp=safCkptService with create 
flags....
 saCkptCheckpointOpen passed
Opening  Checkpoint = safCkpt=DemoCkpt005,safApp=safCkptService with create 
flags....
saCkptCheckpointOpen failed 6
Opening  Checkpoint = safCkpt=DemoCkpt006,safApp=safCkptService with create 
flags....
saCkptCheckpointOpen failed 6
Opening  Checkpoint = safCkpt=DemoCkpt007,safApp=safCkptService with create 
flags....
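
The `failed 6` results above match SA_AIS_ERR_TRY_AGAIN in the SAF AIS error numbering, which a client is normally expected to retry. A minimal sketch of such a retry loop follows. It assumes a hypothetical `open_fn` callable that stands in for saCkptCheckpointOpen and returns the numeric SaAisErrorT code; the demo application's actual source is not shown in this mail, so this is not a description of what it does.

```python
import time

# Numeric values from the SAF AIS SaAisErrorT enumeration.
SA_AIS_OK = 1
SA_AIS_ERR_TRY_AGAIN = 6


def open_with_retry(open_fn, name, attempts=10, delay=0.1, sleep=time.sleep):
    """Call open_fn(name) until it stops returning TRY_AGAIN.

    open_fn is a hypothetical stand-in for saCkptCheckpointOpen.
    Returns (last_return_code, number_of_calls_made).
    """
    rc = open_fn(name)
    tries = 1
    while rc == SA_AIS_ERR_TRY_AGAIN and tries < attempts:
        sleep(delay)          # back off before the next attempt
        rc = open_fn(name)
        tries += 1
    return rc, tries


if __name__ == "__main__":
    # Simulated service: busy for the first two calls, then succeeds.
    state = {"calls": 0}

    def fake_open(name):
        state["calls"] += 1
        return SA_AIS_ERR_TRY_AGAIN if state["calls"] < 3 else SA_AIS_OK

    print(open_with_retry(fake_open, "safCkpt=DemoCkpt005", delay=0.0))
```

Bounding the number of attempts keeps the caller from spinning forever if the service never leaves the TRY_AGAIN state, for example while a failover is still in progress.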

3) While the demo application was running on PL-3, I performed a failover:
root@mohan-VirtualBox:~# /etc/init.d/opensafd stop
[ ok ] Stopping opensafd (via systemctl): opensafd.service.
root@mohan-VirtualBox:~# /etc/init.d/opensafd start
[ ok ] Starting opensafd (via systemctl): opensafd.service.
root@mohan-VirtualBox:~# amf-state siass
safSISU=safSu=SC-2\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SC-2\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=PL-4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed4,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=PL-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
        saAmfSISUHAState=STANDBY(2)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
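
The role swap can also be checked mechanically instead of by eye. The following is a rough sketch that parses text in the format printed by `amf-state siass` and reports which SU holds the ACTIVE assignment for the SC-2N SI. The function names and the two sample strings (trimmed from the outputs in this mail) are illustrative only.

```python
import re

# Trimmed samples of the `amf-state siass` output before and after the failover.
BEFORE = r"""
safSISU=safSu=SC-2\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
        saAmfSISUHAState=STANDBY(2)
safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
"""

AFTER = r"""
safSISU=safSu=SC-2\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
        saAmfSISUHAState=STANDBY(2)
"""


def parse_siass(text):
    """Return {(si, su): ha_state} for each SI assignment in the text."""
    result = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("safSISU="):
            su = re.search(r"safSu=([^\\,]+)", line)
            si = re.search(r"safSi=([^,]+)", line)
            current = (si.group(1), su.group(1)) if su and si else None
        elif line.startswith("saAmfSISUHAState=") and current:
            # "ACTIVE(1)" -> "ACTIVE"
            result[current] = line.split("=", 1)[1].split("(")[0]
    return result


def active_su(text, si="SC-2N"):
    """Return the SU that is ACTIVE for the given SI, or None."""
    for (name, su), state in parse_siass(text).items():
        if name == si and state == "ACTIVE":
            return su
    return None


if __name__ == "__main__":
    print(active_su(BEFORE), "->", active_su(AFTER))
```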


4) I observed that ckptd did not crash on the active controller.


Sep  3 12:11:23 mohan-VirtualBox opensafd: OpenSAF services successfully stopped
Sep  3 12:11:23 mohan-VirtualBox opensafd[3193]: Stopping OpenSAF Services:  *
Sep  3 12:11:23 mohan-VirtualBox systemd[1]: Stopped OpenSAF daemon.
Sep  3 12:11:28 mohan-VirtualBox dhclient[3176]: DHCPDISCOVER on enp0s8 to 
255.255.255.255 port 67 interval 14 (xid=0xdde9787c)
Sep  3 12:11:29 mohan-VirtualBox systemd[1]: Starting OpenSAF daemon...
Sep  3 12:11:29 mohan-VirtualBox opensafd: Starting OpenSAF Services(5.18.04 - 
c5117a898d331edb395434df56d630449a9ad7d2) (Using TIPC)
Sep  3 12:11:29 mohan-VirtualBox opensafd: Reboot file 
/var/log/opensaf/clm_cluster_reboot_in_progress not found, startup continue...
Sep  3 12:11:29 mohan-VirtualBox opensafd[3820]: logtrace: trace enabled to 
file 'opensafd.log', mask=0x0
Sep  3 12:11:30 mohan-VirtualBox kernel: [ 2141.109226] tipc: Activated 
(version 2.0.0)
Sep  3 12:11:30 mohan-VirtualBox kernel: [ 2141.109295] NET: Registered 
protocol family 30
Sep  3 12:11:30 mohan-VirtualBox kernel: [ 2141.109408] tipc: Started in single 
node mode
Sep  3 12:11:30 mohan-VirtualBox kernel: [ 2141.181816] Started in network mode
Sep  3 12:11:30 mohan-VirtualBox kernel: [ 2141.181820] Own node address 
<1.1.1>, network identity 4000
Sep  3 12:11:30 mohan-VirtualBox kernel: [ 2141.191090] Enabled bearer 
<eth:enp0s3>, discovery domain <1.1.0>, priority 10
Sep  3 12:11:30 mohan-VirtualBox osaftransportd[3855]: Started
Sep  3 12:11:30 mohan-VirtualBox opensafd[3820]: NO Monitoring of TRANSPORT 
started
Sep  3 12:11:30 mohan-VirtualBox osafclmna[3860]: Started
Sep  3 12:11:30 mohan-VirtualBox opensafd[3820]: NO Monitoring of CLMNA started
Sep  3 12:11:31 mohan-VirtualBox osafrded[3870]: Started
Sep  3 12:11:31 mohan-VirtualBox opensafd[3820]: NO Monitoring of RDE started
Sep  3 12:11:31 mohan-VirtualBox osafclmna[3860]: NO Starting to promote this 
node to a system controller
Sep  3 12:11:31 mohan-VirtualBox osafrded[3870]: NO Requesting ACTIVE role
Sep  3 12:11:31 mohan-VirtualBox osafrded[3870]: NO RDE role set to Undefined
Sep  3 12:11:31 mohan-VirtualBox osaffmd[3880]: Started
Sep  3 12:11:31 mohan-VirtualBox osafrded[3870]: NO Peer up on node 0x2020f
Sep  3 12:11:31 mohan-VirtualBox osafclmna[3860]: NO 
safNode=SC-1,safCluster=myClmCluster Joined cluster, nodeid=2010f
Sep  3 12:11:31 mohan-VirtualBox osafrded[3870]: NO Got peer info response from 
node 0x2020f with role ACTIVE
Sep  3 12:11:31 mohan-VirtualBox osafrded[3870]: NO RDE role set to QUIESCED
Sep  3 12:11:31 mohan-VirtualBox osafrded[3870]: NO Giving up election against 
0x2020f with role ACTIVE. My role is now QUIESCED
Sep  3 12:11:31 mohan-VirtualBox opensafd[3820]: NO Monitoring of HLFM started
Sep  3 12:11:31 mohan-VirtualBox osafimmd[3891]: Started
Sep  3 12:11:31 mohan-VirtualBox opensafd[3820]: NO Monitoring of IMMD started
Sep  3 12:11:32 mohan-VirtualBox osafimmnd[3903]: Started
Sep  3 12:11:32 mohan-VirtualBox osafimmnd[3903]: NO Use default reserved class 
names.
Sep  3 12:11:32 mohan-VirtualBox osafimmnd[3903]: NO Persistent Back-End 
capability configured, Pbe file:imm.db (suffix may get added)
Sep  3 12:11:32 mohan-VirtualBox osafimmnd[3903]: NO IMMD service is UP ... 
ScAbsenseAllowed?:0 introduced?:0
Sep  3 12:11:32 mohan-VirtualBox osafimmnd[3903]: NO SERVER STATE: 
IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Sep  3 12:11:32 mohan-VirtualBox osafimmnd[3903]: NO Fevs count adjusted to 
32402 preLoadPid: 0
Sep  3 12:11:32 mohan-VirtualBox osafimmnd[3903]: NO Sync client discarded 
classimplementer set. Impl-id:27 Class:SaSmfCampaign
Sep  3 12:11:32 mohan-VirtualBox osafimmnd[3903]: NO Sync client discarded 
classimplementer set. Impl-id:27 Class:OpenSafSmfConfig
Sep  3 12:11:32 mohan-VirtualBox osafimmnd[3903]: NO Sync client discarded 
classimplementer set. Impl-id:27 Class:SaSmfSwBundle
Sep  3 12:11:32 mohan-VirtualBox osafimmnd[3903]: NO Sync client discarded 
classimplementer set. Impl-id:27 Class:OpenSafSmfExecControl
Sep  3 12:11:32 mohan-VirtualBox osafimmnd[3903]: NO SERVER STATE: 
IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
Sep  3 12:11:32 mohan-VirtualBox osafimmnd[3903]: NO SERVER STATE: 
IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
Sep  3 12:11:32 mohan-VirtualBox osafimmnd[3903]: NO NODE STATE-> 
IMM_NODE_ISOLATED
Sep  3 12:11:32 mohan-VirtualBox NetworkManager[706]: <warn>  [1535956892.3765] 
dhcp4 (enp0s8): request timed out
Sep  3 12:11:32 mohan-VirtualBox NetworkManager[706]: <warn>  [1535956892.3765] 
dhcp4 (enp0s8): request timed out
Sep  3 12:11:32 mohan-VirtualBox NetworkManager[706]: <info>  [1535956892.3766] 
dhcp4 (enp0s8): state changed unknown -> timeout
Sep  3 12:11:32 mohan-VirtualBox NetworkManager[706]: <info>  [1535956892.3780] 
dhcp4 (enp0s8): canceled DHCP transaction, DHCP client pid 3176
Sep  3 12:11:32 mohan-VirtualBox NetworkManager[706]: <info>  [1535956892.3780] 
dhcp4 (enp0s8): state changed timeout -> done
Sep  3 12:11:32 mohan-VirtualBox NetworkManager[706]: <info>  [1535956892.3799] 
device (enp0s8): state change: ip-config -> failed (reason 
'ip-config-unavailable') [70 120 5]
Sep  3 12:11:32 mohan-VirtualBox NetworkManager[706]: <info>  [1535956892.3818] 
policy: disabling autoconnect for connection 'Wired connection 2'.
Sep  3 12:11:32 mohan-VirtualBox NetworkManager[706]: <warn>  [1535956892.3820] 
device (enp0s8): Activation: failed for connection 'Wired connection 2'
Sep  3 12:11:32 mohan-VirtualBox NetworkManager[706]: <info>  [1535956892.3961] 
device (enp0s8): state change: failed -> disconnected (reason 'none') [120 30 0]
Sep  3 12:11:32 mohan-VirtualBox avahi-daemon[635]: Withdrawing address record 
for fe80::1568:1729:bf0a:6028 on enp0s8.
Sep  3 12:11:32 mohan-VirtualBox avahi-daemon[635]: Leaving mDNS multicast 
group on interface enp0s8.IPv6 with address fe80::1568:1729:bf0a:6028.
Sep  3 12:11:32 mohan-VirtualBox avahi-daemon[635]: Interface enp0s8.IPv6 no 
longer relevant for mDNS.
Sep  3 12:11:32 mohan-VirtualBox osafimmnd[3903]: NO NODE STATE-> 
IMM_NODE_W_AVAILABLE
Sep  3 12:11:32 mohan-VirtualBox osafimmnd[3903]: NO SERVER STATE: 
IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT
Sep  3 12:11:43 mohan-VirtualBox osafimmnd[3903]: NO NODE STATE-> 
IMM_NODE_FULLY_AVAILABLE 2799
Sep  3 12:11:43 mohan-VirtualBox osafimmnd[3903]: NO RepositoryInitModeT is 
SA_IMM_KEEP_REPOSITORY
Sep  3 12:11:43 mohan-VirtualBox osafimmnd[3903]: WA IMM Access Control mode is 
DISABLED!
Sep  3 12:11:43 mohan-VirtualBox osafimmnd[3903]: NO Epoch set to 154 in 
ImmModel
Sep  3 12:11:43 mohan-VirtualBox osafimmnd[3903]: NO SERVER STATE: 
IMM_SERVER_SYNC_CLIENT --> IMM_SERVER_READY
Sep  3 12:11:43 mohan-VirtualBox osafimmnd[3903]: NO ImmModel received 
scAbsenceAllowed 0
Sep  3 12:11:43 mohan-VirtualBox opensafd[3820]: NO Monitoring of IMMND started
Sep  3 12:11:43 mohan-VirtualBox osaflogd[3918]: Started
Sep  3 12:11:43 mohan-VirtualBox opensafd[3820]: NO Monitoring of LOGD started
Sep  3 12:11:43 mohan-VirtualBox osafntfd[3929]: Started
Sep  3 12:11:43 mohan-VirtualBox opensafd[3820]: NO Monitoring of NTFD started
Sep  3 12:11:43 mohan-VirtualBox osafclmd[3940]: Started
Sep  3 12:11:43 mohan-VirtualBox opensafd[3820]: NO Monitoring of CLMD started
Sep  3 12:11:43 mohan-VirtualBox osafamfd[3951]: Started
Sep  3 12:11:43 mohan-VirtualBox opensafd[3820]: NO Monitoring of AMFD started
Sep  3 12:11:43 mohan-VirtualBox osafamfnd[3962]: Started
Sep  3 12:11:43 mohan-VirtualBox osafamfnd[3962]: NO Start monitoring AMFD 
using /var/lib/opensaf/osafamfd.fifo
Sep  3 12:11:43 mohan-VirtualBox osafamfnd[3962]: NO Sending node up due to 
NCSMDS_UP
Sep  3 12:11:44 mohan-VirtualBox osafamfnd[3962]: NO 
'safSu=SC-1,safSg=NoRed,safApp=OpenSAF' Presence State UNINSTANTIATED => 
INSTANTIATING
Sep  3 12:11:44 mohan-VirtualBox osafamfnd[3962]: NO 
'safSu=SC-1,safSg=2N,safApp=OpenSAF' Presence State UNINSTANTIATED => 
INSTANTIATING
Sep  3 12:11:44 mohan-VirtualBox osafsmfd[3980]: Started
Sep  3 12:11:44 mohan-VirtualBox osafsmfnd[3982]: Started
Sep  3 12:11:44 mohan-VirtualBox osafsmfnd[3982]: NO MDS initialize_smfnd: 
smfnd_mds_init()
Sep  3 12:11:44 mohan-VirtualBox osafsmfnd[3982]: NO MDS smfnd_mds_init: 
mds_get_handle()
Sep  3 12:11:44 mohan-VirtualBox osafsmfnd[3982]: NO MDS mds_get_handle: Done
Sep  3 12:11:44 mohan-VirtualBox osafsmfnd[3982]: NO MDS smfnd_mds_init: 
mds_register()
Sep  3 12:11:44 mohan-VirtualBox osafsmfnd[3982]: NO MDS mds_svc_event: 
NCSMDS_SVC_ID_SMFD dest = 0xf
Sep  3 12:11:44 mohan-VirtualBox osafsmfnd[3982]: NO MDS smfnd_mds_init: Done
Sep  3 12:11:44 mohan-VirtualBox osaflcknd[4030]: Started
Sep  3 12:11:44 mohan-VirtualBox osafmsgd[4031]: Started
Sep  3 12:11:44 mohan-VirtualBox osafckptnd[4057]: Started
Sep  3 12:11:44 mohan-VirtualBox osaflckd[4083]: Started
Sep  3 12:11:45 mohan-VirtualBox osafevtd[4106]: Started
Sep  3 12:11:45 mohan-VirtualBox osafamfnd[3962]: NO 
'safSu=SC-1,safSg=NoRed,safApp=OpenSAF' Presence State INSTANTIATING => 
INSTANTIATED
Sep  3 12:11:45 mohan-VirtualBox osafimmnd[3903]: NO Implementer connected: 29 
(MsgQueueService131343) <134, 2010f>
Sep  3 12:11:45 mohan-VirtualBox osafamfnd[3962]: NO Assigning 
'safSi=SC-2N,safApp=OpenSAF' STANDBY to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Sep  3 12:11:45 mohan-VirtualBox osafrded[3870]: NO RDE role set to STANDBY
Sep  3 12:11:45 mohan-VirtualBox osafimmd[3891]: NO MDS event from svc_id 24 
(change:3, dest:13)
Sep  3 12:11:45 mohan-VirtualBox osafimmd[3891]: NO MDS event from svc_id 24 
(change:5, dest:13)
Sep  3 12:11:45 mohan-VirtualBox osafimmd[3891]: NO MDS event from svc_id 24 
(change:5, dest:13)
Sep  3 12:11:45 mohan-VirtualBox osafrded[3870]: NO Peer up on node 0x2020f
Sep  3 12:11:45 mohan-VirtualBox osafrded[3870]: NO Got peer info response from 
node 0x2020f with role ACTIVE
Sep  3 12:11:45 mohan-VirtualBox osafimmd[3891]: NO MDS event from svc_id 25 
(change:3, dest:564116203505534)
Sep  3 12:11:45 mohan-VirtualBox osafimmd[3891]: NO MDS event from svc_id 25 
(change:3, dest:565215709162246)
Sep  3 12:11:45 mohan-VirtualBox osafimmd[3891]: NO MDS event from svc_id 25 
(change:3, dest:567412432561139)
Sep  3 12:11:45 mohan-VirtualBox osafimmd[3891]: NO MDS event from svc_id 25 
(change:3, dest:566315084623799)
Sep  3 12:11:45 mohan-VirtualBox osaflogd[3918]: NO LOGSV_DATA_GROUPNAME not 
found
Sep  3 12:11:45 mohan-VirtualBox osaflogd[3918]: NO LOG root directory is: 
"/var/log/opensaf/saflog"
Sep  3 12:11:45 mohan-VirtualBox osaflogd[3918]: NO LOG data group is: ""
Sep  3 12:11:45 mohan-VirtualBox osaflogd[3918]: NO LGS_MBCSV_VERSION = 7
Sep  3 12:11:45 mohan-VirtualBox osafimmnd[3903]: NO Implementer (applier) 
connected: 30 (@safAmfService2010f) <142, 2010f>
Sep  3 12:11:45 mohan-VirtualBox osafamfnd[3962]: NO Assigning 
'safSi=NoRed2,safApp=OpenSAF' ACTIVE to 'safSu=SC-1,safSg=NoRed,safApp=OpenSAF'
Sep  3 12:11:45 mohan-VirtualBox osafamfnd[3962]: NO Assigned 
'safSi=NoRed2,safApp=OpenSAF' ACTIVE to 'safSu=SC-1,safSg=NoRed,safApp=OpenSAF'
Sep  3 12:11:46 mohan-VirtualBox osafsmfd[3980]: NO MDS 
initialize_for_assignment: smfd_mds_init()
Sep  3 12:11:46 mohan-VirtualBox osafsmfd[3980]: NO MDS smfd_mds_init: 
mds_vdest_create()
Sep  3 12:11:46 mohan-VirtualBox osafsmfd[3980]: NO MDS mds_vdest_create: VDEST 
created
Sep  3 12:11:46 mohan-VirtualBox osafsmfd[3980]: NO MDS smfd_mds_init: 
mds_register()
Sep  3 12:11:46 mohan-VirtualBox osafsmfd[3980]: NO MDS mds_register: mds 
registration is done
Sep  3 12:11:46 mohan-VirtualBox osafsmfd[3980]: NO MDS smfd_mds_init: 
smfd_mds_change_role()
Sep  3 12:11:46 mohan-VirtualBox opensafd[3798]: Starting OpenSAF Services 
(Using TIPC): *
Sep  3 12:11:46 mohan-VirtualBox osafsmfd[3980]: NO MDS smfd_mds_change_role: 
Setting; arg.info.vdest_chg_role.i_vdest = 0xf, ncsvda_api() rc = 1
Sep  3 12:11:46 mohan-VirtualBox osafsmfd[3980]: NO initialize_for_assignment: 
smfd_mds_init() Done
Sep  3 12:11:46 mohan-VirtualBox osafsmfd[3980]: NO MDS amf_csi_set_callback: 
smfd_mds_change_role()
Sep  3 12:11:46 mohan-VirtualBox systemd[1]: Started OpenSAF daemon.
Sep  3 12:11:46 mohan-VirtualBox osafsmfd[3980]: NO MDS smfd_mds_change_role: 
Setting; arg.info.vdest_chg_role.i_vdest = 0xf, ncsvda_api() rc = 1
Sep  3 12:11:46 mohan-VirtualBox osafamfnd[3962]: NO Assigned 
'safSi=SC-2N,safApp=OpenSAF' STANDBY to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Sep  3 12:11:46 mohan-VirtualBox opensafd: OpenSAF(5.18.04 - 
c5117a898d331edb395434df56d630449a9ad7d2) services successfully started
Sep  3 12:11:48 mohan-VirtualBox osafamfd[3951]: NO Cold sync complete!
Sep  3 12:14:31 mohan-VirtualBox dhc

5) I did multiple failovers but did not observe the crash.

The reproduction steps in the ticket are not clear and ckptd trace was not
enabled, so I cannot reproduce the crash or debug it further.
I followed the steps as closely as possible, but the problem did not occur.
I am closing the ticket for now. Please reopen it with reproducible steps and
ckptd traces if possible.


---

**[tickets:#2011] ckptd seg faulted on active controller when trying to create checkpoint**

**Status:** accepted
**Milestone:** 5.18.09
**Created:** Thu Sep 08, 2016 07:28 AM UTC by Ritu Raj
**Last Updated:** Fri Aug 31, 2018 02:46 PM UTC
**Owner:** Mohan  Kanakam
**Attachments:**

- [ckptd_bt](https://sourceforge.net/p/opensaf/tickets/2011/attachment/ckptd_bt) (2.6 kB; application/octet-stream)
- [messages-20160907.bz2](https://sourceforge.net/p/opensaf/tickets/2011/attachment/messages-20160907.bz2) (380.1 kB; application/x-bzip)
- [syslog2](https://sourceforge.net/p/opensaf/tickets/2011/attachment/syslog2) (1.4 MB; application/octet-stream)


Environment details

OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 4 nodes ( 2 controllers and 2 payloads with headless feature disabled & 
1PBE enabled with 30K objects )

Summary :

ckptd crashed on active controller when trying to create checkpoint during 
failover

Steps followed & Observed behaviour

1. Initially ran some CKPT test scenarios, along with failovers. After the
test scenarios ended, the following IMM objects and replicas had not been deleted:
sofo-s3:/dev/shm # immfind | grep 101
safCkpt=all_replicas_ckpt_name_101
safCkpt=collocated_ckpt_name_101
safReplica=safNode=PL-3\,safCluster=myClmCluster,safCkpt=all_replicas_ckpt_name_101
safReplica=safNode=PL-3\,safCluster=myClmCluster,safCkpt=collocated_ckpt_name_101
safReplica=safNode=SC-1\,safCluster=myClmCluster,safCkpt=all_replicas_ckpt_name_101
safReplica=safNode=SC-2\,safCluster=myClmCluster,safCkpt=all_replicas_ckpt_name_101

2. When a checkpoint was created with the earlier name (all_replicas_ckpt_name_101),
the following error was observed in syslog. CkptOpen also failed with ERR_LIBRARY.

>>   saImmOiRtObjectCreate_2 failed with error = 14
>>
Sep  7 17:21:11 sofo-s2 osafimmnd[2137]: NO PBE-OI established on this SC. 
Dumping incrementally to file imm.db
Sep  7 17:21:12 sofo-s2 osafckptd[2284]: ER create_runtime_ckpt_object - 
saImmOiRtObjectCreate_2 failed with error = 14
Sep  7 17:21:12 sofo-s2 osafckptd[2284]: ER create runtime ckpt object failed 
with error: 14
Sep  7 17:21:12 sofo-s2 osafckptd[2284]: ER cpd db add ckpt_node failed for 
ckpt_id:2
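
For reading the numeric codes in these logs, a small lookup sketch is given below. The values are taken from the SAF AIS SaAisErrorT enumeration; only a subset is listed, and the helper name is illustrative. Error 14 is SA_AIS_ERR_EXIST, which would be consistent with the leftover IMM objects from step 1 still carrying the same checkpoint name, although the ticket does not confirm that as the cause.

```python
# Subset of SaAisErrorT values (SAF AIS); not the complete enumeration.
SA_AIS_ERROR_NAMES = {
    1: "SA_AIS_OK",
    2: "SA_AIS_ERR_LIBRARY",
    5: "SA_AIS_ERR_TIMEOUT",
    6: "SA_AIS_ERR_TRY_AGAIN",
    12: "SA_AIS_ERR_NOT_EXIST",
    14: "SA_AIS_ERR_EXIST",
}


def sa_error_name(code):
    """Map a numeric SaAisErrorT value to its symbolic name."""
    return SA_AIS_ERROR_NAMES.get(code, "UNKNOWN(%d)" % code)


if __name__ == "__main__":
    # Codes that appear in this ticket and in the reproduction attempt.
    for code in (2, 6, 14):
        print(code, sa_error_name(code))
```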


4. After some time ckptd seg faulted on the active controller
>>
Sep  7 17:21:43 sofo-s2 osafamfnd[2187]: NO 
'safComp=CPD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'
Sep  7 17:21:43 sofo-s2 osafamfnd[2187]: ER 
safComp=CPD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast
Sep  7 17:21:43 sofo-s2 osafamfnd[2187]: Rebooting OpenSAF NodeId = 131599 EE 
Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 
131599, SupervisionTime = 60
Sep  7 17:21:43 sofo-s2 opensaf_reboot: Rebooting local node; timeout=60
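
amfnd prints the node ID in decimal in the reboot message, while other services such as osafrded print node IDs in hexadecimal (for example `0x2020f`). A short conversion sketch, with an illustrative helper name, makes it easier to correlate the two forms:

```python
def node_id_to_hex(node_id):
    """Format a decimal node ID the way rded-style log lines show it."""
    return "0x%x" % node_id


if __name__ == "__main__":
    # NodeId from the amfnd reboot message above.
    print(node_id_to_hex(131599))
```

Decimal 131599 is 0x2020f, so the rebooted node is the one that hexadecimal log lines refer to as `0x2020f`.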

5. Below is the bt

0-  0x00007fbbd5ffcb20 in memcmp () from /lib64/libc.so.6
1-  0x00007fbbd7a10929 in ncs_patricia_tree_get (pTree=0x67b4c8, pKey=0x7ffffd22531c "\017\001\002") at patricia.c:435
2-  0x000000000040800d in cpd_cpnd_info_node_get (cpnd_tree=0x67b4c8, dest=0x67ec60, cpnd_info_node=0x7ffffd225350) at cpd_db.c:706
3-  0x000000000040cd56 in cpd_evt_proc_mds_evt (cb=0x67b340, evt=0x67ec50) at cpd_evt.c:1378
4-  0x00000000004091cb in cpd_process_evt (evt=0x67ec40) at cpd_evt.c:107
5-  0x000000000041185f in cpd_main_process (cb=0x67b340) at cpd_init.c:661
6-  0x0000000000411b89 in main (argc=1, argv=0x7ffffd225578) at cpd_main.c:74


Notes:
1. Syslog attached
2. bt attached 
3. ckptd traces not enabled

