- **Milestone**: 4.7.2 --> future
- **Comment**:
As I understand it, you see the issue when a collocated checkpoint replica is
opened while the active replica has a large number of sections (~200k) and each
section is approximately 2k in size.
To debug/assess the issue, first tune the cpsv sync timeout variables below.
If increasing these values resolves the issue, we can think about an
alternative solution, such as dynamically calculating the sync timeout value
for the non-active collocated checkpoint replica being opened:
osaf/libs/common/cpsv/include/cpa_def.h:#define CPSV_WAIT_TIME 1400 /* MDS wait time in case of syncronous call */
osaf/libs/common/cpsv/include/cpnd_cb.h:#define CPSV_WAIT_TIME 1000
osaf/libs/common/cpsv/include/cpd_cb.h:#define CPSV_WAIT_TIME 1000
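
The "dynamically calculated" timeout mentioned above could look roughly like the sketch below. This is only an illustration, not existing cpsv code: the function name, the per-section rate, and the upper cap are all assumptions made up for the example, and the result is meant to be in the same units as CPSV_WAIT_TIME.

```c
#include <stdint.h>

/* All three constants are hypothetical. The base mirrors the
 * CPSV_WAIT_TIME value of 1000 quoted above; the rate and the cap are
 * placeholders that would have to be measured on a real cluster. */
#define SYNC_BASE_WAIT         1000u   /* wait used for a small checkpoint */
#define SYNC_SECTIONS_PER_UNIT 500u    /* assumed sections synced per time unit */
#define SYNC_MAX_WAIT          60000u  /* never wait longer than this */

/* Scale the sync wait with the number of sections in the active replica,
 * clamped to SYNC_MAX_WAIT so a huge checkpoint cannot block forever. */
static uint32_t scaled_sync_wait(uint32_t num_sections)
{
    uint32_t extra = num_sections / SYNC_SECTIONS_PER_UNIT;
    uint32_t wait = SYNC_BASE_WAIT + extra; /* extra is small enough not to overflow */

    return wait > SYNC_MAX_WAIT ? SYNC_MAX_WAIT : wait;
}
```

With these placeholder numbers, a checkpoint with 200k sections would wait 1400 units instead of the fixed 1000. Whether a linear model fits the real sync cost is an open question that the tuning experiment above should answer first.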
---
** [tickets:#1510] CKPT: cpnd crashes during checkpoint open timeout with large
sections**
**Status:** review
**Milestone:** future
**Created:** Thu Oct 01, 2015 04:14 PM UTC by Alex Jones
**Last Updated:** Wed May 04, 2016 07:08 PM UTC
**Owner:** Alex Jones
When opening a collocated checkpoint replica where the active replica has a
large number of sections (~200k), the sync from the active can time out with
error code SA_AIS_ERR_TRY_AGAIN. In this case the code frees the memory for the
node, but does not remove the node from the db. When the checkpoint is accessed
again, the db still references the freed memory, and ckptnd crashes.
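
To make the failure mode concrete, here is a minimal, self-contained sketch of the cleanup order that avoids the dangling entry. It does not use the real cpnd data structures or the patricia tree API; the slot-array "db" and every name in it are hypothetical stand-ins chosen for the example.

```c
#include <stdint.h>
#include <stdlib.h>

#define DB_SLOTS 8

/* Hypothetical stand-ins for the checkpoint node and the node db. */
struct ckpt_node { uint64_t ckpt_id; };
struct ckpt_db   { struct ckpt_node *slot[DB_SLOTS]; };

static int db_add(struct ckpt_db *db, struct ckpt_node *n)
{
    for (int i = 0; i < DB_SLOTS; i++)
        if (db->slot[i] == NULL) { db->slot[i] = n; return 0; }
    return -1; /* db full */
}

static struct ckpt_node *db_get(const struct ckpt_db *db, uint64_t id)
{
    for (int i = 0; i < DB_SLOTS; i++)
        if (db->slot[i] != NULL && db->slot[i]->ckpt_id == id)
            return db->slot[i];
    return NULL;
}

static void db_del(struct ckpt_db *db, const struct ckpt_node *n)
{
    for (int i = 0; i < DB_SLOTS; i++)
        if (db->slot[i] == n)
            db->slot[i] = NULL;
}

/* Returns 0 on success, -1 when the caller should try again.
 * sync_ok simulates whether the sync from the active finished in time. */
static int open_replica(struct ckpt_db *db, uint64_t id, int sync_ok)
{
    struct ckpt_node *n = calloc(1, sizeof *n);
    if (n == NULL)
        return -1;
    n->ckpt_id = id;
    if (db_add(db, n) != 0) { free(n); return -1; }

    if (!sync_ok) {
        db_del(db, n); /* unlink first, so no later lookup can reach it */
        free(n);       /* only then release the memory */
        return -1;
    }
    return 0;
}
```

In this sketch, a lookup after a failed open returns NULL instead of a pointer to freed memory, so a retried open starts from a clean state. Skipping the db_del() call reproduces the pattern described in the ticket.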
Valgrind analysis shows the following:
==53610== Thread 1:
==53610== Invalid read of size 4
==53610== at 0x4E4D7C4: ncs_patricia_tree_get (patricia.c:93)
==53610== by 0x40400D: cpnd_ckpt_node_get (cpnd_db.c:42)
==53610== by 0x40D1A2: cpnd_process_evt (cpnd_evt.c:1957)
==53610== by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610== by 0x403882: main (cpnd_main.c:72)
==53610== Address 0x687de60 is 0 bytes inside a block of size 1,072 free'd
==53610== at 0x4C29D4E: free (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==53610== by 0x40A827: cpnd_evt_proc_ckpt_open (cpnd_evt.c:983)
==53610== by 0x40D426: cpnd_process_evt (cpnd_evt.c:202)
==53610== by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610== by 0x403882: main (cpnd_main.c:72)
==53610==
==53610== Invalid read of size 8
==53610== at 0x4E4D7C0: ncs_patricia_tree_get (patricia.c:90)
==53610== by 0x40400D: cpnd_ckpt_node_get (cpnd_db.c:42)
==53610== by 0x40D1A2: cpnd_process_evt (cpnd_evt.c:1957)
==53610== by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610== by 0x403882: main (cpnd_main.c:72)
==53610== Address 0x687de70 is 16 bytes inside a block of size 1,072 free'd
==53610== at 0x4C29D4E: free (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==53610== by 0x40A827: cpnd_evt_proc_ckpt_open (cpnd_evt.c:983)
==53610== by 0x40D426: cpnd_process_evt (cpnd_evt.c:202)
==53610== by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610== by 0x403882: main (cpnd_main.c:72)
==53610==
==53610== Invalid read of size 8
==53610== at 0x4E4D7FB: ncs_patricia_tree_get (patricia.c:435)
==53610== by 0x40400D: cpnd_ckpt_node_get (cpnd_db.c:42)
==53610== by 0x40D1A2: cpnd_process_evt (cpnd_evt.c:1957)
==53610== by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610== by 0x403882: main (cpnd_main.c:72)
==53610== Address 0x687de78 is 24 bytes inside a block of size 1,072 free'd
==53610== at 0x4C29D4E: free (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==53610== by 0x40A827: cpnd_evt_proc_ckpt_open (cpnd_evt.c:983)
==53610== by 0x40D426: cpnd_process_evt (cpnd_evt.c:202)
==53610== by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610== by 0x403882: main (cpnd_main.c:72)
==53610==
==53610== Invalid read of size 1
==53610== at 0x4C2D0B9: bcmp (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==53610== by 0x4E4D803: ncs_patricia_tree_get (patricia.c:435)
==53610== by 0x40400D: cpnd_ckpt_node_get (cpnd_db.c:42)
==53610== by 0x40D1A2: cpnd_process_evt (cpnd_evt.c:1957)
==53610== by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610== by 0x403882: main (cpnd_main.c:72)
==53610== Address 0x687de80 is 32 bytes inside a block of size 1,072 free'd
==53610== at 0x4C29D4E: free (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==53610== by 0x40A827: cpnd_evt_proc_ckpt_open (cpnd_evt.c:983)
==53610== by 0x40D426: cpnd_process_evt (cpnd_evt.c:202)
==53610== by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610== by 0x403882: main (cpnd_main.c:72)
==53610==
==53610== Invalid read of size 1
==53610== at 0x4C2D0D0: bcmp (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==53610== by 0x4E4D803: ncs_patricia_tree_get (patricia.c:435)
==53610== by 0x40400D: cpnd_ckpt_node_get (cpnd_db.c:42)
==53610== by 0x40D1A2: cpnd_process_evt (cpnd_evt.c:1957)
==53610== by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610== by 0x403882: main (cpnd_main.c:72)
==53610== Address 0x687de81 is 33 bytes inside a block of size 1,072 free'd
==53610== at 0x4C29D4E: free (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==53610== by 0x40A827: cpnd_evt_proc_ckpt_open (cpnd_evt.c:983)
==53610== by 0x40D426: cpnd_process_evt (cpnd_evt.c:202)
==53610== by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610== by 0x403882: main (cpnd_main.c:72)
==53610==
==53610== Invalid read of size 4
==53610== at 0x4E4D7C4: ncs_patricia_tree_get (patricia.c:93)
==53610== by 0x40400D: cpnd_ckpt_node_get (cpnd_db.c:42)
==53610== by 0x405872: cpnd_evt_proc_nd2nd_ckpt_sect_create (cpnd_evt.c:2602)
==53610== by 0x40D2B8: cpnd_process_evt (cpnd_evt.c:335)
==53610== by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610== by 0x403882: main (cpnd_main.c:72)
==53610== Address 0x687de60 is 0 bytes inside a block of size 1,072 free'd
==53610== at 0x4C29D4E: free (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==53610== by 0x40A827: cpnd_evt_proc_ckpt_open (cpnd_evt.c:983)
==53610== by 0x40D426: cpnd_process_evt (cpnd_evt.c:202)
==53610== by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610== by 0x403882: main (cpnd_main.c:72)
==53610==
==53610== Invalid read of size 8
==53610== at 0x4E4D7C0: ncs_patricia_tree_get (patricia.c:90)
==53610== by 0x40400D: cpnd_ckpt_node_get (cpnd_db.c:42)
==53610== by 0x405872: cpnd_evt_proc_nd2nd_ckpt_sect_create (cpnd_evt.c:2602)
==53610== by 0x40D2B8: cpnd_process_evt (cpnd_evt.c:335)
==53610== by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610== by 0x403882: main (cpnd_main.c:72)
==53610== Address 0x687de70 is 16 bytes inside a block of size 1,072 free'd
==53610== at 0x4C29D4E: free (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==53610== by 0x40A827: cpnd_evt_proc_ckpt_open (cpnd_evt.c:983)
==53610== by 0x40D426: cpnd_process_evt (cpnd_evt.c:202)
==53610== by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610== by 0x403882: main (cpnd_main.c:72)
==53610==
==53610== Invalid read of size 8
==53610== at 0x4E4D7FB: ncs_patricia_tree_get (patricia.c:435)
==53610== by 0x40400D: cpnd_ckpt_node_get (cpnd_db.c:42)
==53610== by 0x405872: cpnd_evt_proc_nd2nd_ckpt_sect_create (cpnd_evt.c:2602)
==53610== by 0x40D2B8: cpnd_process_evt (cpnd_evt.c:335)
==53610== by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610== by 0x403882: main (cpnd_main.c:72)
==53610== Address 0x687de78 is 24 bytes inside a block of size 1,072 free'd
==53610== at 0x4C29D4E: free (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==53610== by 0x40A827: cpnd_evt_proc_ckpt_open (cpnd_evt.c:983)
==53610== by 0x40D426: cpnd_process_evt (cpnd_evt.c:202)
==53610== by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610== by 0x403882: main (cpnd_main.c:72)
==53610==
==53610== Invalid read of size 1
==53610== at 0x4C2D0B9: bcmp (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==53610== by 0x4E4D803: ncs_patricia_tree_get (patricia.c:435)
==53610== by 0x40400D: cpnd_ckpt_node_get (cpnd_db.c:42)
==53610== by 0x405872: cpnd_evt_proc_nd2nd_ckpt_sect_create (cpnd_evt.c:2602)
==53610== by 0x40D2B8: cpnd_process_evt (cpnd_evt.c:335)
==53610== by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610== by 0x403882: main (cpnd_main.c:72)
==53610== Address 0x687de80 is 32 bytes inside a block of size 1,072 free'd
==53610== at 0x4C29D4E: free (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==53610== by 0x40A827: cpnd_evt_proc_ckpt_open (cpnd_evt.c:983)
==53610== by 0x40D426: cpnd_process_evt (cpnd_evt.c:202)
==53610== by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610== by 0x403882: main (cpnd_main.c:72)
==53610==
==53610== Invalid read of size 1
==53610== at 0x4C2D0D0: bcmp (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==53610== by 0x4E4D803: ncs_patricia_tree_get (patricia.c:435)
==53610== by 0x40400D: cpnd_ckpt_node_get (cpnd_db.c:42)
==53610== by 0x405872: cpnd_evt_proc_nd2nd_ckpt_sect_create (cpnd_evt.c:2602)
==53610== by 0x40D2B8: cpnd_process_evt (cpnd_evt.c:335)
==53610== by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610== by 0x403882: main (cpnd_main.c:72)
==53610== Address 0x687de81 is 33 bytes inside a block of size 1,072 free'd
==53610== at 0x4C29D4E: free (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==53610== by 0x40A827: cpnd_evt_proc_ckpt_open (cpnd_evt.c:983)
==53610== by 0x40D426: cpnd_process_evt (cpnd_evt.c:202)
==53610== by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610== by 0x403882: main (cpnd_main.c:72)
==53610==
==53610== Invalid read of size 4
==53610== at 0x418613: cpnd_ckpt_sec_get (cpnd_sec.cc:99)
==53610== by 0x418708: cpnd_ckpt_sec_find (cpnd_sec.cc:156)
==53610== by 0x405889: cpnd_evt_proc_nd2nd_ckpt_sect_create (cpnd_evt.c:2609)
==53610== by 0x40D2B8: cpnd_process_evt (cpnd_evt.c:335)
==53610== by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610== by 0x403882: main (cpnd_main.c:72)
==53610== Address 0x687dfd8 is 376 bytes inside a block of size 1,072 free'd
==53610== at 0x4C29D4E: free (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==53610== by 0x40A827: cpnd_evt_proc_ckpt_open (cpnd_evt.c:983)
==53610== by 0x40D426: cpnd_process_evt (cpnd_evt.c:202)
==53610== by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610== by 0x403882: main (cpnd_main.c:72)
==53610==
==53610== Invalid read of size 8
==53610== at 0x418680: cpnd_ckpt_sec_get (cpnd_sec.cc:115)
==53610== by 0x418708: cpnd_ckpt_sec_find (cpnd_sec.cc:156)
==53610== by 0x405889: cpnd_evt_proc_nd2nd_ckpt_sect_create (cpnd_evt.c:2609)
==53610== by 0x40D2B8: cpnd_process_evt (cpnd_evt.c:335)
==53610== by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610== by 0x403882: main (cpnd_main.c:72)
==53610== Address 0x687de80 is 32 bytes inside a block of size 1,072 free'd
==53610== at 0x4C29D4E: free (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==53610== by 0x40A827: cpnd_evt_proc_ckpt_open (cpnd_evt.c:983)
==53610== by 0x40D426: cpnd_process_evt (cpnd_evt.c:202)
==53610== by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610== by 0x403882: main (cpnd_main.c:72)
==53610==
==53610== Invalid read of size 1
==53610== at 0x40589E: cpnd_evt_proc_nd2nd_ckpt_sect_create (cpnd_evt.c:2616)
==53610== by 0x40D2B8: cpnd_process_evt (cpnd_evt.c:335)
==53610== by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610== by 0x403882: main (cpnd_main.c:72)
==53610== Address 0x687e040 is 480 bytes inside a block of size 1,072 free'd
==53610== at 0x4C29D4E: free (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==53610== by 0x40A827: cpnd_evt_proc_ckpt_open (cpnd_evt.c:983)
==53610== by 0x40D426: cpnd_process_evt (cpnd_evt.c:202)
==53610== by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610== by 0x403882: main (cpnd_main.c:72)
==53610==
==53610== Invalid read of size 4
==53610== at 0x411D80: cpnd_ckpt_get_lck_sec_id (cpnd_proc.c:599)
==53610== by 0x4043C7: cpnd_ckpt_sec_add (cpnd_db.c:380)
==53610== by 0x40595E: cpnd_evt_proc_nd2nd_ckpt_sect_create (cpnd_evt.c:2617)
==53610== by 0x40D2B8: cpnd_process_evt (cpnd_evt.c:335)
==53610== by 0x40E9D6: cpnd_main_process (cpnd_init.c:568)
==53610== by 0x403882: main (cpnd_main.c:72)
==53610== Address 0x687dfa8 is 328 bytes inside a block of size 1,072 free'd
==53610== at 0x4C29D4E: free (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==53610== by 0x40A827: cpnd_evt_proc_ckpt_open (cpnd_evt.c:983)