[tickets] [opensaf:tickets] #2101 AMF: Heartbeat timeout observed after ImmNd restart
- **status**: accepted --> not-reproducible

---

**[tickets:#2101] AMF: Heartbeat timeout observed after ImmNd restart**

**Status:** not-reproducible
**Milestone:** 5.2.RC1
**Created:** Fri Oct 07, 2016 01:26 PM UTC by Chani Srivastava
**Last Updated:** Fri Mar 10, 2017 10:23 AM UTC
**Owner:** Nagendra Kumar
**Attachments:**

- [messages](https://sourceforge.net/p/opensaf/tickets/2101/attachment/messages) (777.0 kB; application/octet-stream)
- [osafamfd](https://sourceforge.net/p/opensaf/tickets/2101/attachment/osafamfd) (5.6 MB; application/octet-stream)

OS: SUSE 64-bit
Changeset: 8190
Setup: 4 physical nodes, 1 PBE enabled with 1 lakh load

Steps:

1. Bring up OpenSAF on four nodes.
2. Run the IMM test cases with the ndrestart scenario on the standby controller.
3. After one of the ndrestarts, an amfd heartbeat timeout occurred. The backtrace is below.

Coredump:

0 0x7fa7c680161c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
1 0x0044da7d in avd_imm_reinit_bg () at imm.cc:1949
2 0x00453e33 in main_loop () at main.cc:737
3 0x004541ff in main (argc=2, argv=0x7fff1cbfa268) at main.cc:848

(gdb) thread apply all bt

Thread 6 (Thread 0x7fa7c78fcb00 (LWP 3837)):
0 0x7fa7c57124f6 in poll () from /lib64/libc.so.6
1 0x7fa7c74f80d3 in osaf_ppoll (io_fds=0x7fa7c78fc1c0, i_nfds=1, i_timeout_ts=0x7fa7c78fc180, i_sigmask=0x0) at osaf_poll.c:105
2 0x7fa7c74f7ff9 in osaf_poll (io_fds=0x7fa7c78fc1c0, i_nfds=1, i_timeout=3) at osaf_poll.c:44
3 0x7fa7c74f81c8 in osaf_poll_one_fd (i_fd=13, i_timeout=3) at osaf_poll.c:128
4 0x7fa7c65f3a0c in rda_read_msg (sockfd=13, msg=0x7fa7c78fc260 "10 2", size=64) at rda_papi.cc:673
5 0x7fa7c65f31ec in rda_callback_task (rda_callback_cb=0x7963e0) at rda_papi.cc:150
6 0x7fa7c67fd7b6 in start_thread () from /lib64/libpthread.so.0
7 0x7fa7c571b9cd in clone () from /lib64/libc.so.6
8 0x in ?? ()

Thread 5 (Thread 0x7fa7c4e0b700 (LWP 5557)):
0 0x7fa7c57124f6 in poll () from /lib64/libc.so.6
1 0x7fa7c74f80d3 in osaf_ppoll (io_fds=0x7fa7c4e09d40, i_nfds=1, i_timeout_ts=0x7fa7c4e09d00, i_sigmask=0x0) at osaf_poll.c:105
2 0x7fa7c74f7ff9 in osaf_poll (io_fds=0x7fa7c4e09d40, i_nfds=1, i_timeout=1) at osaf_poll.c:44
3 0x7fa7c74f81c8 in osaf_poll_one_fd (i_fd=29, i_timeout=1) at osaf_poll.c:128
4 0x7fa7c7535605 in mds_mcm_time_wait (sel_obj=0x80c3d8, time_val=1000) at mds_c_sndrcv.c:2570
5 0x7fa7c75351b1 in mcm_pvt_normal_svc_sndrsp (env_hdl=131071, fr_svc_id=26, msg=0x7fa7c4e0a0c0, to_dest=565214705385538, to_svc_id=25, req=0x7fa7c4e09e60, pri=MDS_SEND_PRIORITY_MEDIUM) at mds_c_sndrcv.c:2457
6 0x7fa7c7530d87 in mds_mcm_send (info=0x7fa7c4e09f90) at mds_c_sndrcv.c:690
7 0x7fa7c7530170 in mds_send (info=0x7fa7c4e09f90) at mds_c_sndrcv.c:390
8 0x7fa7c752fde1 in ncsmds_api (svc_to_mds_info=0x7fa7c4e09f90) at mds_papi.c:191
9 0x7fa7c6e527a7 in imma_mds_msg_sync_send (imma_mds_hdl=131071, destination=0x7fa7c707a850, i_evt=0x7fa7c4e0a0c0, o_evt=0x7fa7c4e0a058, timeout=1000) at imma_mds.c:604
10 0x7fa7c6e47bd6 in search_next_common (searchHandle=1475865130618188739, objectName=0x7fa7c4e0a298, attributes=0x7fa7c4e0a318, bUseString=false) at imma_om_api.c:7584
11 0x7fa7c6e475f9 in saImmOmSearchNext_2 (searchHandle=1475865130618188739, objectName=0x7fa7c4e0a370, attributes=0x7fa7c4e0a318) at imma_om_api.c:7444
12 0x004fb140 in immutil_saImmOmSearchNext_2 (searchHandle=1475865130618188739, objectName=0x7fa7c4e0a370, attributes=0x7fa7c4e0a318) at immutil.c:1818
13 0x00431bc2 in avd_compcstype_config_get (name="safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF", comp=0x806e50) at compcstype.cc:306
14 0x00429c5c in avd_comp_config_get (su_name="safSu=SC-2,safSg=2N,safApp=OpenSAF", su=0x7b08c0) at comp.cc:756
15 0x004d9f63 in avd_su_config_get (sg_name="safSg=2N,safApp=OpenSAF", sg=0x7bb4c0) at su.cc:717
16 0x0048c7dc in avd_sg_config_get (app_dn="safApp=OpenSAF", app=0x7c08b0) at sg.cc:457
17 0x0040a88a in avd_app_config_get () at app.cc:460
18 0x0044c154 in avd_imm_config_get () at imm.cc:1574
19 0x0044d6e6 in avd_imm_reinit_bg_thread (_cb=0x75dba0 <_control_block>) at imm.cc:1891
20 0x7fa7c67fd7b6 in start_thread () from /lib64/libpthread.so.0
21 0x7fa7c571b9cd in clone () from /lib64/libc.so.6
22 0x in ?? ()

Thread 4 (Thread 0x7fa7c791cb00 (LWP 3836)):
0 0x7fa7c57124f6 in poll () from /lib64/libc.so.6
1 0x7fa7c7549696 in mdtm_process_recv_events () at mds_dt_tipc.c:665
2 0x7fa7c67fd7b6 in start_thread () from /lib64/libpthread.so.0
3 0x7fa7c571b9cd in clone () from /lib64/libc.so.6
4 0x in ?? ()

Thread 3 (Thread 0x7fa7b700 (LWP 5586)):
0 0x7fa7c6804294 in __lll_lock_wait () from /lib64/libpthread.so.0
1 0x7fa7c67ff619 in _L_lock_1008 () from /lib64/libpthread.so.0
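For readers of the backtrace in this message: the main thread sits in `pthread_cond_wait` inside `avd_imm_reinit_bg`, while thread 5 is still reading the configuration through `saImmOmSearchNext_2`. The sketch below is a generic C++ illustration of a bounded wait, on the assumption (not confirmed anywhere in this ticket) that an unbounded wait in the main loop is what keeps the heartbeat from being serviced. `Completion` and `wait_with_ticks` are hypothetical names; this is not OpenSAF code and not the fix that was applied.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical completion flag shared between a worker thread and a main loop.
class Completion {
 public:
  // Called by the worker when its job is finished.
  void signal() {
    {
      std::lock_guard<std::mutex> lk(m_);
      done_ = true;
    }
    cv_.notify_all();
  }

  // Waits at most `slice`; returns true once signal() has been called.
  bool wait_for(std::chrono::milliseconds slice) {
    std::unique_lock<std::mutex> lk(m_);
    return cv_.wait_for(lk, slice, [this] { return done_; });
  }

 private:
  std::mutex m_;
  std::condition_variable cv_;
  bool done_ = false;
};

// Waits for `c` in short slices and calls `tick` after every slice that
// expires without completion (for example, hypothetical heartbeat servicing).
// Returns the number of times `tick` was called.
template <typename Tick>
int wait_with_ticks(Completion& c, std::chrono::milliseconds slice, Tick tick) {
  int ticks = 0;
  while (!c.wait_for(slice)) {
    tick();
    ++ticks;
  }
  return ticks;
}
```

With this shape the waiting thread never blocks for longer than one slice, so whatever runs in `tick` keeps getting a turn while the worker is still busy.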
[tickets] [opensaf:tickets] #2101 AMF: Heartbeat timeout observed after ImmNd restart
Thanks for the information, Chani. Please reopen the ticket with AMF and IMM traces if it is reproduced later.

---

**[tickets:#2101] AMF: Heartbeat timeout observed after ImmNd restart**

**Status:** accepted
**Milestone:** 5.2.RC1
**Created:** Fri Oct 07, 2016 01:26 PM UTC by Chani Srivastava
**Last Updated:** Fri Mar 10, 2017 09:51 AM UTC
**Owner:** Nagendra Kumar
[tickets] [opensaf:tickets] #2101 AMF: Heartbeat timeout observed after ImmNd restart
Not able to reproduce it with the latest changeset, 8634 (5.2.FC).

---

**[tickets:#2101] AMF: Heartbeat timeout observed after ImmNd restart**

**Status:** accepted
**Milestone:** 5.2.RC1
**Created:** Fri Oct 07, 2016 01:26 PM UTC by Chani Srivastava
**Last Updated:** Wed Mar 01, 2017 06:17 AM UTC
**Owner:** Nagendra Kumar
[tickets] [opensaf:tickets] #2101 AMF: Heartbeat timeout observed after ImmNd restart
- **status**: unassigned --> accepted
- **assigned_to**: Nagendra Kumar

---

**[tickets:#2101] AMF: Heartbeat timeout observed after ImmNd restart**

**Status:** accepted
**Milestone:** 5.2.RC1
**Created:** Fri Oct 07, 2016 01:26 PM UTC by Chani Srivastava
**Last Updated:** Wed Mar 01, 2017 06:17 AM UTC
**Owner:** Nagendra Kumar
[tickets] [opensaf:tickets] #2101 AMF: Heartbeat timeout observed after ImmNd restart
This might have been fixed by:

changeset: 8365:1108027c16f0
branch: opensaf-5.1.x
parent: 8362:13742b479d92
user: Praveen Malviya
date: Fri Nov 25 15:56:05 2016 +0530
summary: amfd: do not spawn multiple threads for imm init [#2188] V2

Please retest on the 5.2 FC release.

---

**[tickets:#2101] AMF: Heartbeat timeout observed after ImmNd restart**

**Status:** unassigned
**Milestone:** 5.2.RC1
**Created:** Fri Oct 07, 2016 01:26 PM UTC by Chani Srivastava
**Last Updated:** Fri Oct 07, 2016 02:42 PM UTC
**Owner:** nobody
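The changeset summary quoted in this message reads "do not spawn multiple threads for imm init". A minimal sketch of that general idea, assuming the goal is simply to skip a new background job while an earlier one is still running, could look like the following. `SingleFlight` is a hypothetical helper, not the code from changeset 8365, and it assumes that `try_start` and `wait` are only ever called from one owner thread (such as a main loop).

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical helper: allows at most one background job at a time.
// try_start() and wait() must be called from a single owner thread.
class SingleFlight {
 public:
  ~SingleFlight() { wait(); }

  // Starts `job` on a worker thread unless one is still running.
  // Returns true if a thread was started, false if the request was skipped.
  bool try_start(std::function<void()> job) {
    bool expected = false;
    if (!running_.compare_exchange_strong(expected, true)) {
      return false;  // an earlier job has not finished yet
    }
    wait();  // reap the previous worker, which has already finished
    worker_ = std::thread([this, job = std::move(job)] {
      job();
      running_ = false;  // allow the next request
    });
    return true;
  }

  // Blocks until the current worker, if any, has exited.
  void wait() {
    if (worker_.joinable()) worker_.join();
  }

 private:
  std::atomic<bool> running_{false};
  std::thread worker_;
};
```

A caller that receives repeated re-initialization triggers can call `try_start` on each trigger and ignore a `false` return, so the triggers never pile up as extra threads.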
[tickets] [opensaf:tickets] #2101 AMF: Heartbeat timeout observed after ImmNd restart
- Description has changed:

Diff:

--- old
+++ new
@@ -1,5 +1,5 @@
 OS : Suse 64bit
-Changeset : 8190 ( 5.1.FC)
+Changeset : 8190
 Setup : 4 physical nodee 1 PBE enabled with 1Lakh load
 Step

---

**[tickets:#2101] AMF: Heartbeat timeout observed after ImmNd restart**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Fri Oct 07, 2016 01:26 PM UTC by Chani Srivastava
**Last Updated:** Fri Oct 07, 2016 01:29 PM UTC
**Owner:** nobody
[tickets] [opensaf:tickets] #2101 AMF: Heartbeat timeout observed after ImmNd restart
- Description has changed:

Diff:

--- old
+++ new
@@ -2,6 +2,10 @@
 Changeset : 8190 ( 5.1.FC)
 Setup : 4 physical nodee 1 PBE enabled with 1Lakh load
+Step
+1. Bringu opensaf on four nodes
+2. Imm test cases running with ndrestart scenario on standby controller
+3. After 1 of the ndrestart amfd heart beat timeout happened. Below is the backtrace.
 Coredump:
 0 0x7fa7c680161c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

---

**[tickets:#2101] AMF: Heartbeat timeout observed after ImmNd restart**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Fri Oct 07, 2016 01:26 PM UTC by Chani Srivastava
**Last Updated:** Fri Oct 07, 2016 01:26 PM UTC
**Owner:** nobody