[tickets] [opensaf:tickets] #2011 ckptd seg faulted on active controller when trying to create checkpoint

2018-09-03 Thread Mohan Kanakam via Opensaf-tickets
- **status**: accepted --> not-reproducible
- **Milestone**: 5.18.09 --> never



---

** [tickets:#2011] ckptd seg faulted on active controller when trying to create 
checkpoint**

**Status:** not-reproducible
**Milestone:** never
**Created:** Thu Sep 08, 2016 07:28 AM UTC by Ritu Raj
**Last Updated:** Mon Sep 03, 2018 07:18 AM UTC
**Owner:** Mohan  Kanakam
**Attachments:**

- 
[ckptd_bt](https://sourceforge.net/p/opensaf/tickets/2011/attachment/ckptd_bt) 
(2.6 kB; application/octet-stream)
- 
[messages-20160907.bz2](https://sourceforge.net/p/opensaf/tickets/2011/attachment/messages-20160907.bz2)
 (380.1 kB; application/x-bzip)
- [syslog2](https://sourceforge.net/p/opensaf/tickets/2011/attachment/syslog2) 
(1.4 MB; application/octet-stream)


Environment details

OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 4 nodes ( 2 controllers and 2 payloads with headless feature disabled & 
1PBE enabled with 30K objects )

Summary :

ckptd crashed on active controller when trying to create checkpoint during 
failover

Steps followed & Observed behaviour

1. Initially ran some CKPT test scenarios, along with failovers. After the end 
of the test scenarios, The following IMM objects &  replicas are not deleted 
sofo-s3:/dev/shm # immfind | grep 101
safCkpt=all_replicas_ckpt_name_101
safCkpt=collocated_ckpt_name_101
safReplica=safNode=PL-3\,safCluster=myClmCluster,safCkpt=all_replicas_ckpt_name_101
safReplica=safNode=PL-3\,safCluster=myClmCluster,safCkpt=collocated_ckpt_name_101
safReplica=safNode=SC-1\,safCluster=myClmCluster,safCkpt=all_replicas_ckpt_name_101
safReplica=safNode=SC-2\,safCluster=myClmCluster,safCkpt=all_replicas_ckpt_name_101

2.  When ckpt is created with the earlier name (all_replicas_ckpt_name_101)  
observed the following error in syslog. Also CkptOpen failed with ERR_LIBRARY.

>>   saImmOiRtObjectCreate_2 failed with error = 14
>>
Sep  7 17:21:11 sofo-s2 osafimmnd[2137]: NO PBE-OI established on this SC. 
Dumping incrementally to file imm.db
Sep  7 17:21:12 sofo-s2 osafckptd[2284]: ER create_runtime_ckpt_object - 
saImmOiRtObjectCreate_2 failed with error = 14
Sep  7 17:21:12 sofo-s2 osafckptd[2284]: ER create runtime ckpt object failed 
with error: 14
Sep  7 17:21:12 sofo-s2 osafckptd[2284]: ER cpd db add ckpt_node failed for 
ckpt_id:2


4. After some time cpktd seg faulted on active controller
>>
Sep  7 17:21:43 sofo-s2 osafamfnd[2187]: NO 
'safComp=CPD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'
Sep  7 17:21:43 sofo-s2 osafamfnd[2187]: ER 
safComp=CPD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast
Sep  7 17:21:43 sofo-s2 osafamfnd[2187]: Rebooting OpenSAF NodeId = 131599 EE 
Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 
131599, SupervisionTime = 60
Sep  7 17:21:43 sofo-s2 opensaf_reboot: Rebooting local node; timeout=60

5. Below is the bt

0-  0x7fbbd5ffcb20 in memcmp () from /lib64/libc.so.6
1-  0x7fbbd7a10929 in ncs_patricia_tree_get (pTree=0x67b4c8, 
pKey=0x7d22531c "\017\001\002") at patricia.c:435

2-  0x0040800d in cpd_cpnd_info_node_get (cpnd_tree=0x67b4c8, 
dest=0x67ec60, cpnd_info_node=0x7d225350) at cpd_db.c:706

3-  0x0040cd56 in cpd_evt_proc_mds_evt (cb=0x67b340, evt=0x67ec50) at 
cpd_evt.c:1378

4-  0x004091cb in cpd_process_evt (evt=0x67ec40) at cpd_evt.c:107
5-  0x0041185f in cpd_main_process (cb=0x67b340) at cpd_init.c:661
6 - 0x00411b89 in main (argc=1, argv=0x7d225578) at cpd_main.c:74


Notes:
1. Syslog attached
2. bt attached 
3. ckptd traces not enabled


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #2011 ckptd seg faulted on active controller when trying to create checkpoint

2018-09-03 Thread Mohan Kanakam via Opensaf-tickets
I tried to reproduce this ticket on opensaf 5.18.04.
Setup : 4 nodes ( 2 controllers and 2 payloads with headless feature disabled & 
1PBE enabled with 30K objects )
1)In this setup, SC-1 is Active and SC-2 is standby
root@mohan-VirtualBox:~# amf-state siass
safSISU=safSu=SC-2\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
saAmfSISUHAState=STANDBY(2)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=PL-4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed4,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SC-2\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=PL-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,


safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)

2)I ran the demo application, which creates checkpoints continuously on PL-3
root@mohan-VirtualBox:/home/mohan/ticket2011# ./tkt
***
Demonstrating Checkpoint Service
***
Initialising With Checkpoint Service
 saCkptInitialize passed
Opening  Checkpoint = safCkpt=DemoCkpt000,safApp=safCkptService with create 
flags
 saCkptCheckpointOpen passed
Opening  Checkpoint = safCkpt=DemoCkpt001,safApp=safCkptService with create 
flags
 saCkptCheckpointOpen passed
Opening  Checkpoint = safCkpt=DemoCkpt002,safApp=safCkptService with create 
flags
 saCkptCheckpointOpen passed
Opening  Checkpoint = safCkpt=DemoCkpt003,safApp=safCkptService with create 
flags
 saCkptCheckpointOpen passed
Opening  Checkpoint = safCkpt=DemoCkpt004,safApp=safCkptService with create 
flags
 saCkptCheckpointOpen passed
Opening  Checkpoint = safCkpt=DemoCkpt005,safApp=safCkptService with create 
flags
saCkptCheckpointOpen failed 6
Opening  Checkpoint = safCkpt=DemoCkpt006,safApp=safCkptService with create 
flags
saCkptCheckpointOpen failed 6
Opening  Checkpoint = safCkpt=DemoCkpt007,safApp=safCkptService with create 
flags

3)while running the demo application on Pl-3, I done failover.
root@mohan-VirtualBox:~# /etc/init.d/opensafd stop
[ ok ] Stopping opensafd (via systemctl): opensafd.service.
root@mohan-VirtualBox:~# /etc/init.d/opensafd start
[ ok ] Starting opensafd (via systemctl): opensafd.service.
root@mohan-VirtualBox:~# amf-state siass
safSISU=safSu=SC-2\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SC-2\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed3, safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=PL-4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed4, safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=PL-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1, safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
saAmfSISUHAState=STANDBY(2)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)


4)I observed that ckptd is not crashed on active controller.


Sep  3 12:11:23 mohan-VirtualBox opensafd: OpenSAF services successfully stopped
Sep  3 12:11:23 mohan-VirtualBox opensafd[3193]: Stopping OpenSAF Services:  *
Sep  3 12:11:23 mohan-VirtualBox systemd[1]: Stopped OpenSAF daemon.
Sep  3 12:11:28 mohan-VirtualBox dhclient[3176]: DHCPDISCOVER on enp0s8 to 
255.255.255.255 port 67 interval 14 (xid=0xdde9787c)
Sep  3 12:11:29 mohan-VirtualBox systemd[1]: Starting OpenSAF daemon...
Sep  3 12:11:29 mohan-VirtualBox opensafd: Starting OpenSAF Services(5.18.04 - 
c5117a898d331edb395434df56d630449a9ad7d2) (Using TIPC)
Sep  3 12:11:29 mohan-VirtualBox opensafd: Reboot file 
/var/log/opensaf/clm_cluster_reboot_in_progress not found, startup continue...
Sep  3 12:11:29 mohan-VirtualBox opensafd[3820]: logtrace: trace enabled to 
file 'opensafd.log', mask=0x0
Sep  3 12:11:30 mohan-VirtualBox kernel: [ 2141.109226] tipc: Activated 
(version 2.0.0)
Sep  3 12:11:30 mohan-VirtualBox kernel: [ 2141.109295] NET: Registered 
protocol family 30
Sep  3 12:11:30 mohan-VirtualBox kernel: [ 2141.109408] tipc: Started in 

[tickets] [opensaf:tickets] #2011 ckptd seg faulted on active controller when trying to create checkpoint

2018-08-31 Thread Mohan Kanakam via Opensaf-tickets
- **assigned_to**: Mohan  Kanakam
- **Milestone**: future --> 5.18.09



---

** [tickets:#2011] ckptd seg faulted on active controller when trying to create 
checkpoint**

**Status:** accepted
**Milestone:** 5.18.09
**Created:** Thu Sep 08, 2016 07:28 AM UTC by Ritu Raj
**Last Updated:** Mon Aug 28, 2017 08:45 AM UTC
**Owner:** Mohan  Kanakam
**Attachments:**

- 
[ckptd_bt](https://sourceforge.net/p/opensaf/tickets/2011/attachment/ckptd_bt) 
(2.6 kB; application/octet-stream)
- 
[messages-20160907.bz2](https://sourceforge.net/p/opensaf/tickets/2011/attachment/messages-20160907.bz2)
 (380.1 kB; application/x-bzip)
- [syslog2](https://sourceforge.net/p/opensaf/tickets/2011/attachment/syslog2) 
(1.4 MB; application/octet-stream)


Environment details

OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 4 nodes ( 2 controllers and 2 payloads with headless feature disabled & 
1PBE enabled with 30K objects )

Summary :

ckptd crashed on active controller when trying to create checkpoint during 
failover

Steps followed & Observed behaviour

1. Initially ran some CKPT test scenarios, along with failovers. After the end 
of the test scenarios, The following IMM objects &  replicas are not deleted 
sofo-s3:/dev/shm # immfind | grep 101
safCkpt=all_replicas_ckpt_name_101
safCkpt=collocated_ckpt_name_101
safReplica=safNode=PL-3\,safCluster=myClmCluster,safCkpt=all_replicas_ckpt_name_101
safReplica=safNode=PL-3\,safCluster=myClmCluster,safCkpt=collocated_ckpt_name_101
safReplica=safNode=SC-1\,safCluster=myClmCluster,safCkpt=all_replicas_ckpt_name_101
safReplica=safNode=SC-2\,safCluster=myClmCluster,safCkpt=all_replicas_ckpt_name_101

2.  When ckpt is created with the earlier name (all_replicas_ckpt_name_101)  
observed the following error in syslog. Also CkptOpen failed with ERR_LIBRARY.

>>   saImmOiRtObjectCreate_2 failed with error = 14
>>
Sep  7 17:21:11 sofo-s2 osafimmnd[2137]: NO PBE-OI established on this SC. 
Dumping incrementally to file imm.db
Sep  7 17:21:12 sofo-s2 osafckptd[2284]: ER create_runtime_ckpt_object - 
saImmOiRtObjectCreate_2 failed with error = 14
Sep  7 17:21:12 sofo-s2 osafckptd[2284]: ER create runtime ckpt object failed 
with error: 14
Sep  7 17:21:12 sofo-s2 osafckptd[2284]: ER cpd db add ckpt_node failed for 
ckpt_id:2


4. After some time cpktd seg faulted on active controller
>>
Sep  7 17:21:43 sofo-s2 osafamfnd[2187]: NO 
'safComp=CPD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'
Sep  7 17:21:43 sofo-s2 osafamfnd[2187]: ER 
safComp=CPD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast
Sep  7 17:21:43 sofo-s2 osafamfnd[2187]: Rebooting OpenSAF NodeId = 131599 EE 
Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 
131599, SupervisionTime = 60
Sep  7 17:21:43 sofo-s2 opensaf_reboot: Rebooting local node; timeout=60

5. Below is the bt

0-  0x7fbbd5ffcb20 in memcmp () from /lib64/libc.so.6
1-  0x7fbbd7a10929 in ncs_patricia_tree_get (pTree=0x67b4c8, 
pKey=0x7d22531c "\017\001\002") at patricia.c:435

2-  0x0040800d in cpd_cpnd_info_node_get (cpnd_tree=0x67b4c8, 
dest=0x67ec60, cpnd_info_node=0x7d225350) at cpd_db.c:706

3-  0x0040cd56 in cpd_evt_proc_mds_evt (cb=0x67b340, evt=0x67ec50) at 
cpd_evt.c:1378

4-  0x004091cb in cpd_process_evt (evt=0x67ec40) at cpd_evt.c:107
5-  0x0041185f in cpd_main_process (cb=0x67b340) at cpd_init.c:661
6 - 0x00411b89 in main (argc=1, argv=0x7d225578) at cpd_main.c:74


Notes:
1. Syslog attached
2. bt attached 
3. ckptd traces not enabled


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #2011 ckptd seg faulted on active controller when trying to create checkpoint

2017-08-28 Thread A V Mahesh (AVM) via Opensaf-tickets
- **assigned_to**: A V Mahesh (AVM) -->  nobody 
- **Blocker**:  --> False
- **Milestone**: 5.17.08 --> future



---

** [tickets:#2011] ckptd seg faulted on active controller when trying to create 
checkpoint**

**Status:** accepted
**Milestone:** future
**Created:** Thu Sep 08, 2016 07:28 AM UTC by Ritu Raj
**Last Updated:** Mon Apr 10, 2017 01:40 PM UTC
**Owner:** nobody
**Attachments:**

- 
[ckptd_bt](https://sourceforge.net/p/opensaf/tickets/2011/attachment/ckptd_bt) 
(2.6 kB; application/octet-stream)
- 
[messages-20160907.bz2](https://sourceforge.net/p/opensaf/tickets/2011/attachment/messages-20160907.bz2)
 (380.1 kB; application/x-bzip)
- [syslog2](https://sourceforge.net/p/opensaf/tickets/2011/attachment/syslog2) 
(1.4 MB; application/octet-stream)


Environment details

OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 4 nodes ( 2 controllers and 2 payloads with headless feature disabled & 
1PBE enabled with 30K objects )

Summary :

ckptd crashed on active controller when trying to create checkpoint during 
failover

Steps followed & Observed behaviour

1. Initially ran some CKPT test scenarios, along with failovers. After the end 
of the test scenarios, The following IMM objects &  replicas are not deleted 
sofo-s3:/dev/shm # immfind | grep 101
safCkpt=all_replicas_ckpt_name_101
safCkpt=collocated_ckpt_name_101
safReplica=safNode=PL-3\,safCluster=myClmCluster,safCkpt=all_replicas_ckpt_name_101
safReplica=safNode=PL-3\,safCluster=myClmCluster,safCkpt=collocated_ckpt_name_101
safReplica=safNode=SC-1\,safCluster=myClmCluster,safCkpt=all_replicas_ckpt_name_101
safReplica=safNode=SC-2\,safCluster=myClmCluster,safCkpt=all_replicas_ckpt_name_101

2.  When ckpt is created with the earlier name (all_replicas_ckpt_name_101)  
observed the following error in syslog. Also CkptOpen failed with ERR_LIBRARY.

>>   saImmOiRtObjectCreate_2 failed with error = 14
>>
Sep  7 17:21:11 sofo-s2 osafimmnd[2137]: NO PBE-OI established on this SC. 
Dumping incrementally to file imm.db
Sep  7 17:21:12 sofo-s2 osafckptd[2284]: ER create_runtime_ckpt_object - 
saImmOiRtObjectCreate_2 failed with error = 14
Sep  7 17:21:12 sofo-s2 osafckptd[2284]: ER create runtime ckpt object failed 
with error: 14
Sep  7 17:21:12 sofo-s2 osafckptd[2284]: ER cpd db add ckpt_node failed for 
ckpt_id:2


4. After some time cpktd seg faulted on active controller
>>
Sep  7 17:21:43 sofo-s2 osafamfnd[2187]: NO 
'safComp=CPD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'
Sep  7 17:21:43 sofo-s2 osafamfnd[2187]: ER 
safComp=CPD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast
Sep  7 17:21:43 sofo-s2 osafamfnd[2187]: Rebooting OpenSAF NodeId = 131599 EE 
Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 
131599, SupervisionTime = 60
Sep  7 17:21:43 sofo-s2 opensaf_reboot: Rebooting local node; timeout=60

5. Below is the bt

0-  0x7fbbd5ffcb20 in memcmp () from /lib64/libc.so.6
1-  0x7fbbd7a10929 in ncs_patricia_tree_get (pTree=0x67b4c8, 
pKey=0x7d22531c "\017\001\002") at patricia.c:435

2-  0x0040800d in cpd_cpnd_info_node_get (cpnd_tree=0x67b4c8, 
dest=0x67ec60, cpnd_info_node=0x7d225350) at cpd_db.c:706

3-  0x0040cd56 in cpd_evt_proc_mds_evt (cb=0x67b340, evt=0x67ec50) at 
cpd_evt.c:1378

4-  0x004091cb in cpd_process_evt (evt=0x67ec40) at cpd_evt.c:107
5-  0x0041185f in cpd_main_process (cb=0x67b340) at cpd_init.c:661
6 - 0x00411b89 in main (argc=1, argv=0x7d225578) at cpd_main.c:74


Notes:
1. Syslog attached
2. bt attached 
3. ckptd traces not enabled


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #2011 ckptd seg faulted on active controller when trying to create checkpoint

2016-09-20 Thread Anders Widell
- **Milestone**: 4.7.2 --> 5.0.2



---

** [tickets:#2011] ckptd seg faulted on active controller when trying to create 
checkpoint**

**Status:** accepted
**Milestone:** 5.0.2
**Created:** Thu Sep 08, 2016 07:28 AM UTC by Ritu Raj
**Last Updated:** Mon Sep 12, 2016 10:06 AM UTC
**Owner:** A V Mahesh (AVM)
**Attachments:**

- 
[ckptd_bt](https://sourceforge.net/p/opensaf/tickets/2011/attachment/ckptd_bt) 
(2.6 kB; application/octet-stream)
- 
[messages-20160907.bz2](https://sourceforge.net/p/opensaf/tickets/2011/attachment/messages-20160907.bz2)
 (380.1 kB; application/x-bzip)
- [syslog2](https://sourceforge.net/p/opensaf/tickets/2011/attachment/syslog2) 
(1.4 MB; application/octet-stream)


Environment details

OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 4 nodes ( 2 controllers and 2 payloads with headless feature disabled & 
1PBE enabled with 30K objects )

Summary :

ckptd crashed on active controller when trying to create checkpoint during 
failover

Steps followed & Observed behaviour

1. Initially ran some CKPT test scenarios, along with failovers. After the end 
of the test scenarios, The following IMM objects &  replicas are not deleted 
sofo-s3:/dev/shm # immfind | grep 101
safCkpt=all_replicas_ckpt_name_101
safCkpt=collocated_ckpt_name_101
safReplica=safNode=PL-3\,safCluster=myClmCluster,safCkpt=all_replicas_ckpt_name_101
safReplica=safNode=PL-3\,safCluster=myClmCluster,safCkpt=collocated_ckpt_name_101
safReplica=safNode=SC-1\,safCluster=myClmCluster,safCkpt=all_replicas_ckpt_name_101
safReplica=safNode=SC-2\,safCluster=myClmCluster,safCkpt=all_replicas_ckpt_name_101

2.  When ckpt is created with the earlier name (all_replicas_ckpt_name_101)  
observed the following error in syslog. Also CkptOpen failed with ERR_LIBRARY.

>>   saImmOiRtObjectCreate_2 failed with error = 14
>>
Sep  7 17:21:11 sofo-s2 osafimmnd[2137]: NO PBE-OI established on this SC. 
Dumping incrementally to file imm.db
Sep  7 17:21:12 sofo-s2 osafckptd[2284]: ER create_runtime_ckpt_object - 
saImmOiRtObjectCreate_2 failed with error = 14
Sep  7 17:21:12 sofo-s2 osafckptd[2284]: ER create runtime ckpt object failed 
with error: 14
Sep  7 17:21:12 sofo-s2 osafckptd[2284]: ER cpd db add ckpt_node failed for 
ckpt_id:2


4. After some time cpktd seg faulted on active controller
>>
Sep  7 17:21:43 sofo-s2 osafamfnd[2187]: NO 
'safComp=CPD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'
Sep  7 17:21:43 sofo-s2 osafamfnd[2187]: ER 
safComp=CPD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast
Sep  7 17:21:43 sofo-s2 osafamfnd[2187]: Rebooting OpenSAF NodeId = 131599 EE 
Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 
131599, SupervisionTime = 60
Sep  7 17:21:43 sofo-s2 opensaf_reboot: Rebooting local node; timeout=60

5. Below is the bt

0-  0x7fbbd5ffcb20 in memcmp () from /lib64/libc.so.6
1-  0x7fbbd7a10929 in ncs_patricia_tree_get (pTree=0x67b4c8, 
pKey=0x7d22531c "\017\001\002") at patricia.c:435

2-  0x0040800d in cpd_cpnd_info_node_get (cpnd_tree=0x67b4c8, 
dest=0x67ec60, cpnd_info_node=0x7d225350) at cpd_db.c:706

3-  0x0040cd56 in cpd_evt_proc_mds_evt (cb=0x67b340, evt=0x67ec50) at 
cpd_evt.c:1378

4-  0x004091cb in cpd_process_evt (evt=0x67ec40) at cpd_evt.c:107
5-  0x0041185f in cpd_main_process (cb=0x67b340) at cpd_init.c:661
6 - 0x00411b89 in main (argc=1, argv=0x7d225578) at cpd_main.c:74


Notes:
1. Syslog attached
2. bt attached 
3. ckptd traces not enabled


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.--
___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #2011 ckptd seg faulted on active controller when trying to create checkpoint

2016-09-12 Thread A V Mahesh (AVM)
- **status**: unassigned --> accepted
- **assigned_to**: A V Mahesh (AVM)



---

** [tickets:#2011] ckptd seg faulted on active controller when trying to create 
checkpoint**

**Status:** accepted
**Milestone:** 4.7.2
**Created:** Thu Sep 08, 2016 07:28 AM UTC by Ritu Raj
**Last Updated:** Thu Sep 08, 2016 07:28 AM UTC
**Owner:** A V Mahesh (AVM)
**Attachments:**

- 
[ckptd_bt](https://sourceforge.net/p/opensaf/tickets/2011/attachment/ckptd_bt) 
(2.6 kB; application/octet-stream)
- 
[messages-20160907.bz2](https://sourceforge.net/p/opensaf/tickets/2011/attachment/messages-20160907.bz2)
 (380.1 kB; application/x-bzip)
- [syslog2](https://sourceforge.net/p/opensaf/tickets/2011/attachment/syslog2) 
(1.4 MB; application/octet-stream)


Environment details

OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 4 nodes ( 2 controllers and 2 payloads with headless feature disabled & 
1PBE enabled with 30K objects )

Summary :

ckptd crashed on active controller when trying to create checkpoint during 
failover

Steps followed & Observed behaviour

1. Initially ran some CKPT test scenarios, along with failovers. After the end 
of the test scenarios, The following IMM objects &  replicas are not deleted 
sofo-s3:/dev/shm # immfind | grep 101
safCkpt=all_replicas_ckpt_name_101
safCkpt=collocated_ckpt_name_101
safReplica=safNode=PL-3\,safCluster=myClmCluster,safCkpt=all_replicas_ckpt_name_101
safReplica=safNode=PL-3\,safCluster=myClmCluster,safCkpt=collocated_ckpt_name_101
safReplica=safNode=SC-1\,safCluster=myClmCluster,safCkpt=all_replicas_ckpt_name_101
safReplica=safNode=SC-2\,safCluster=myClmCluster,safCkpt=all_replicas_ckpt_name_101

2.  When ckpt is created with the earlier name (all_replicas_ckpt_name_101)  
observed the following error in syslog. Also CkptOpen failed with ERR_LIBRARY.

>>   saImmOiRtObjectCreate_2 failed with error = 14
>>
Sep  7 17:21:11 sofo-s2 osafimmnd[2137]: NO PBE-OI established on this SC. 
Dumping incrementally to file imm.db
Sep  7 17:21:12 sofo-s2 osafckptd[2284]: ER create_runtime_ckpt_object - 
saImmOiRtObjectCreate_2 failed with error = 14
Sep  7 17:21:12 sofo-s2 osafckptd[2284]: ER create runtime ckpt object failed 
with error: 14
Sep  7 17:21:12 sofo-s2 osafckptd[2284]: ER cpd db add ckpt_node failed for 
ckpt_id:2


4. After some time cpktd seg faulted on active controller
>>
Sep  7 17:21:43 sofo-s2 osafamfnd[2187]: NO 
'safComp=CPD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'
Sep  7 17:21:43 sofo-s2 osafamfnd[2187]: ER 
safComp=CPD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast
Sep  7 17:21:43 sofo-s2 osafamfnd[2187]: Rebooting OpenSAF NodeId = 131599 EE 
Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 
131599, SupervisionTime = 60
Sep  7 17:21:43 sofo-s2 opensaf_reboot: Rebooting local node; timeout=60

5. Below is the bt

0-  0x7fbbd5ffcb20 in memcmp () from /lib64/libc.so.6
1-  0x7fbbd7a10929 in ncs_patricia_tree_get (pTree=0x67b4c8, 
pKey=0x7d22531c "\017\001\002") at patricia.c:435

2-  0x0040800d in cpd_cpnd_info_node_get (cpnd_tree=0x67b4c8, 
dest=0x67ec60, cpnd_info_node=0x7d225350) at cpd_db.c:706

3-  0x0040cd56 in cpd_evt_proc_mds_evt (cb=0x67b340, evt=0x67ec50) at 
cpd_evt.c:1378

4-  0x004091cb in cpd_process_evt (evt=0x67ec40) at cpd_evt.c:107
5-  0x0041185f in cpd_main_process (cb=0x67b340) at cpd_init.c:661
6 - 0x00411b89 in main (argc=1, argv=0x7d225578) at cpd_main.c:74


Notes:
1. Syslog attached
2. bt attached 
3. ckptd traces not enabled


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.--
___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #2011 ckptd seg faulted on active controller when trying to create checkpoint

2016-09-08 Thread Ritu Raj



---

** [tickets:#2011] ckptd seg faulted on active controller when trying to create 
checkpoint**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Thu Sep 08, 2016 07:28 AM UTC by Ritu Raj
**Last Updated:** Thu Sep 08, 2016 07:28 AM UTC
**Owner:** nobody
**Attachments:**

- 
[ckptd_bt](https://sourceforge.net/p/opensaf/tickets/2011/attachment/ckptd_bt) 
(2.6 kB; application/octet-stream)
- 
[messages-20160907.bz2](https://sourceforge.net/p/opensaf/tickets/2011/attachment/messages-20160907.bz2)
 (380.1 kB; application/x-bzip)
- [syslog2](https://sourceforge.net/p/opensaf/tickets/2011/attachment/syslog2) 
(1.4 MB; application/octet-stream)


Environment details

OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 4 nodes ( 2 controllers and 2 payloads with headless feature disabled & 
1PBE enabled with 30K objects )

Summary :

ckptd crashed on active controller when trying to create checkpoint during 
failover

Steps followed & Observed behaviour

1. Initially ran some CKPT test scenarios, along with failovers. After the end 
of the test scenarios, The following IMM objects &  replicas are not deleted 
sofo-s3:/dev/shm # immfind | grep 101
safCkpt=all_replicas_ckpt_name_101
safCkpt=collocated_ckpt_name_101
safReplica=safNode=PL-3\,safCluster=myClmCluster,safCkpt=all_replicas_ckpt_name_101
safReplica=safNode=PL-3\,safCluster=myClmCluster,safCkpt=collocated_ckpt_name_101
safReplica=safNode=SC-1\,safCluster=myClmCluster,safCkpt=all_replicas_ckpt_name_101
safReplica=safNode=SC-2\,safCluster=myClmCluster,safCkpt=all_replicas_ckpt_name_101

2.  When ckpt is created with the earlier name (all_replicas_ckpt_name_101)  
observed the following error in syslog. Also CkptOpen failed with ERR_LIBRARY.

>>   saImmOiRtObjectCreate_2 failed with error = 14
>>
Sep  7 17:21:11 sofo-s2 osafimmnd[2137]: NO PBE-OI established on this SC. 
Dumping incrementally to file imm.db
Sep  7 17:21:12 sofo-s2 osafckptd[2284]: ER create_runtime_ckpt_object - 
saImmOiRtObjectCreate_2 failed with error = 14
Sep  7 17:21:12 sofo-s2 osafckptd[2284]: ER create runtime ckpt object failed 
with error: 14
Sep  7 17:21:12 sofo-s2 osafckptd[2284]: ER cpd db add ckpt_node failed for 
ckpt_id:2


4. After some time cpktd seg faulted on active controller
>>
Sep  7 17:21:43 sofo-s2 osafamfnd[2187]: NO 
'safComp=CPD,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'
Sep  7 17:21:43 sofo-s2 osafamfnd[2187]: ER 
safComp=CPD,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast
Sep  7 17:21:43 sofo-s2 osafamfnd[2187]: Rebooting OpenSAF NodeId = 131599 EE 
Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 
131599, SupervisionTime = 60
Sep  7 17:21:43 sofo-s2 opensaf_reboot: Rebooting local node; timeout=60

5. Below is the bt

0-  0x7fbbd5ffcb20 in memcmp () from /lib64/libc.so.6
1-  0x7fbbd7a10929 in ncs_patricia_tree_get (pTree=0x67b4c8, 
pKey=0x7d22531c "\017\001\002") at patricia.c:435

2-  0x0040800d in cpd_cpnd_info_node_get (cpnd_tree=0x67b4c8, 
dest=0x67ec60, cpnd_info_node=0x7d225350) at cpd_db.c:706

3-  0x0040cd56 in cpd_evt_proc_mds_evt (cb=0x67b340, evt=0x67ec50) at 
cpd_evt.c:1378

4-  0x004091cb in cpd_process_evt (evt=0x67ec40) at cpd_evt.c:107
5-  0x0041185f in cpd_main_process (cb=0x67b340) at cpd_init.c:661
6 - 0x00411b89 in main (argc=1, argv=0x7d225578) at cpd_main.c:74


Notes:
1. Syslog attached
2. bt attached 
3. ckptd traces not enabled


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.--
___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets