Not able to reproduce it with latest changeset 8634 ( 5.2.FC)
---
** [tickets:#2101] AMF: Heartbeat timeout observed after ImmNd restart**
**Status:** accepted
**Milestone:** 5.2.RC1
**Created:** Fri Oct 07, 2016 01:26 PM UTC by Chani Srivastava
**Last Updated:** Wed Mar 01, 2017 06:17 AM UTC
**Owner:** Nagendra Kumar
**Attachments:**
-
[messages](https://sourceforge.net/p/opensaf/tickets/2101/attachment/messages)
(777.0 kB; application/octet-stream)
-
[osafamfd](https://sourceforge.net/p/opensaf/tickets/2101/attachment/osafamfd)
(5.6 MB; application/octet-stream)
OS : Suse 64bit
Changeset : 8190
Setup : 4 physical nodee 1 PBE enabled with 1Lakh load
Step
1. Bringu opensaf on four nodes
2. Imm test cases running with ndrestart scenario on standby controller
3. After 1 of the ndrestart amfd heart beat timeout happened. Below is the
backtrace.
Coredump:
0 0x00007fa7c680161c in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
1 0x000000000044da7d in avd_imm_reinit_bg () at imm.cc:1949
2 0x0000000000453e33 in main_loop () at main.cc:737
3 0x00000000004541ff in main (argc=2, argv=0x7fff1cbfa268) at main.cc:848
(gdb) thread apply all bt
Thread 6 (Thread 0x7fa7c78fcb00 (LWP 3837)):
0 0x00007fa7c57124f6 in poll () from /lib64/libc.so.6
1 0x00007fa7c74f80d3 in osaf_ppoll (io_fds=0x7fa7c78fc1c0, i_nfds=1,
i_timeout_ts=0x7fa7c78fc180, i_sigmask=0x0) at osaf_poll.c:105
2 0x00007fa7c74f7ff9 in osaf_poll (io_fds=0x7fa7c78fc1c0, i_nfds=1,
i_timeout=30000) at osaf_poll.c:44
3 0x00007fa7c74f81c8 in osaf_poll_one_fd (i_fd=13, i_timeout=30000) at
osaf_poll.c:128
4 0x00007fa7c65f3a0c in rda_read_msg (sockfd=13, msg=0x7fa7c78fc260 "10 2",
size=64) at rda_papi.cc:673
5 0x00007fa7c65f31ec in rda_callback_task (rda_callback_cb=0x7963e0) at
rda_papi.cc:150
6 0x00007fa7c67fd7b6 in start_thread () from /lib64/libpthread.so.0
7 0x00007fa7c571b9cd in clone () from /lib64/libc.so.6
8 0x0000000000000000 in ?? ()
Thread 5 (Thread 0x7fa7c4e0b700 (LWP 5557)):
0 0x00007fa7c57124f6 in poll () from /lib64/libc.so.6
1 0x00007fa7c74f80d3 in osaf_ppoll (io_fds=0x7fa7c4e09d40, i_nfds=1,
i_timeout_ts=0x7fa7c4e09d00, i_sigmask=0x0) at osaf_poll.c:105
2 0x00007fa7c74f7ff9 in osaf_poll (io_fds=0x7fa7c4e09d40, i_nfds=1,
i_timeout=10000) at osaf_poll.c:44
3 0x00007fa7c74f81c8 in osaf_poll_one_fd (i_fd=29, i_timeout=10000) at
osaf_poll.c:128
4 0x00007fa7c7535605 in mds_mcm_time_wait (sel_obj=0x80c3d8, time_val=1000) at
mds_c_sndrcv.c:2570
5 0x00007fa7c75351b1 in mcm_pvt_normal_svc_sndrsp (env_hdl=131071,
fr_svc_id=26, msg=0x7fa7c4e0a0c0, to_dest=565214705385538, to_svc_id=25,
req=0x7fa7c4e09e60, pri=MDS_SEND_PRIORITY_MEDIUM)
at mds_c_sndrcv.c:2457
6 0x00007fa7c7530d87 in mds_mcm_send (info=0x7fa7c4e09f90) at
mds_c_sndrcv.c:690
7 0x00007fa7c7530170 in mds_send (info=0x7fa7c4e09f90) at mds_c_sndrcv.c:390
8 0x00007fa7c752fde1 in ncsmds_api (svc_to_mds_info=0x7fa7c4e09f90) at
mds_papi.c:191
9 0x00007fa7c6e527a7 in imma_mds_msg_sync_send (imma_mds_hdl=131071,
destination=0x7fa7c707a850 <imma_cb+144>, i_evt=0x7fa7c4e0a0c0,
o_evt=0x7fa7c4e0a058, timeout=1000) at imma_mds.c:604
10 0x00007fa7c6e47bd6 in search_next_common (searchHandle=1475865130618188739,
objectName=0x7fa7c4e0a298, attributes=0x7fa7c4e0a318, bUseString=false) at
imma_om_api.c:7584
11 0x00007fa7c6e475f9 in saImmOmSearchNext_2 (searchHandle=1475865130618188739,
objectName=0x7fa7c4e0a370, attributes=0x7fa7c4e0a318) at imma_om_api.c:7444
12 0x00000000004fb140 in immutil_saImmOmSearchNext_2
(searchHandle=1475865130618188739, objectName=0x7fa7c4e0a370,
attributes=0x7fa7c4e0a318) at immutil.c:1818
13 0x0000000000431bc2 in avd_compcstype_config_get
(name="safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF", comp=0x806e50) at
compcstype.cc:306
14 0x0000000000429c5c in avd_comp_config_get
(su_name="safSu=SC-2,safSg=2N,safApp=OpenSAF", su=0x7b08c0) at comp.cc:756
15 0x00000000004d9f63 in avd_su_config_get (sg_name="safSg=2N,safApp=OpenSAF",
sg=0x7bb4c0) at su.cc:717
16 0x000000000048c7dc in avd_sg_config_get (app_dn="safApp=OpenSAF",
app=0x7c08b0) at sg.cc:457
17 0x000000000040a88a in avd_app_config_get () at app.cc:460
18 0x000000000044c154 in avd_imm_config_get () at imm.cc:1574
19 0x000000000044d6e6 in avd_imm_reinit_bg_thread (_cb=0x75dba0
<_control_block>) at imm.cc:1891
20 0x00007fa7c67fd7b6 in start_thread () from /lib64/libpthread.so.0
21 0x00007fa7c571b9cd in clone () from /lib64/libc.so.6
22 0x0000000000000000 in ?? ()
Thread 4 (Thread 0x7fa7c791cb00 (LWP 3836)):
0 0x00007fa7c57124f6 in poll () from /lib64/libc.so.6
1 0x00007fa7c7549696 in mdtm_process_recv_events () at mds_dt_tipc.c:665
2 0x00007fa7c67fd7b6 in start_thread () from /lib64/libpthread.so.0
3 0x00007fa7c571b9cd in clone () from /lib64/libc.so.6
4 0x0000000000000000 in ?? ()
Thread 3 (Thread 0x7fa7bffff700 (LWP 5586)):
0 0x00007fa7c6804294 in __lll_lock_wait () from /lib64/libpthread.so.0
1 0x00007fa7c67ff619 in _L_lock_1008 () from /lib64/libpthread.so.0
2 0x00007fa7c67ff42e in pthread_mutex_lock () from /lib64/libpthread.so.0
3 0x000000000044d358 in osaf_mutex_lock_ordie (io_mutex=0x75d580
<imm_reinit_mutex>) at
../../../../../osaf/libs/core/common/include/osaf_utility.h:72
4 avd_imm_reinit_bg_thread (_cb=0x75dba0 <_control_block>) at imm.cc:1838
5 0x00007fa7c67fd7b6 in start_thread () from /lib64/libpthread.so.0
6 0x00007fa7c571b9cd in clone () from /lib64/libc.so.6
7 0x0000000000000000 in ?? ()
---Type <return> to continue, or q <return> to quit---
Thread 2 (Thread 0x7fa7c798bb00 (LWP 3835)):
0 0x00007fa7c57124f6 in poll () from /lib64/libc.so.6
1 0x00007fa7c74f80d3 in osaf_ppoll (io_fds=0x7fa7c798b270, i_nfds=1,
i_timeout_ts=0x7fa7c798b2a0, i_sigmask=0x0) at osaf_poll.c:105
2 0x00007fa7c7502627 in ncs_tmr_wait () at sysf_tmr.c:409
3 0x00007fa7c67fd7b6 in start_thread () from /lib64/libpthread.so.0
4 0x00007fa7c571b9cd in clone () from /lib64/libc.so.6
5 0x0000000000000000 in ?? ()
Thread 1 (Thread 0x7fa7c7954720 (LWP 3833)):
0 0x00007fa7c680161c in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
1 0x000000000044da7d in avd_imm_reinit_bg () at imm.cc:1949
2 0x0000000000453e33 in main_loop () at main.cc:737
3 0x00000000004541ff in main (argc=2, argv=0x7fff1cbfa268) at main.cc:84
*Syslog:*
Oct 8 00:02:10 SLES-SLOT2 sudo: tet : TTY=unknown ; PWD=/tmp/05420aa ;
USER=root ; COMMAND=/bin/kill -9 5691
Oct 8 00:02:10 SLES-SLOT2 osafimmd[3774]: NO MDS event from svc_id 25
(change:4, dest:565214705385538)
**Oct 8 00:02:10 SLES-SLOT2 osafamfnd[3843]: NO Restarting a component of
'safSu=SC-2,safSg=NoRed,safApp=OpenSAF' (comp restart count: 2)**
Oct 8 00:02:10 SLES-SLOT2 osafimmd[3774]: IN Standby IMMD recording IMMND DOWN
for node 2020f
Oct 8 00:02:10 SLES-SLOT2 osafamfnd[3843]: NO
'safComp=IMMND,safSu=SC-2,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown'
: Recovery is 'componentRestart'
Oct 8 00:02:10 SLES-SLOT2 osafimmnd[5717]: Started
Oct 8 00:02:10 SLES-SLOT2 osafimmnd[5717]: NO Persistent Back-End capability
configured, Pbe file:imm.db (suffix may get added)
Oct 8 00:02:10 SLES-SLOT2 osafimmd[3774]: NO MDS event from svc_id 25
(change:3, dest:565214705262608)
Oct 8 00:02:10 SLES-SLOT2 osafimmd[3774]: IN Added IMMND node with dest
565214705262608
Oct 8 00:02:10 SLES-SLOT2 osafimmnd[5717]: NO IMMD service is UP ...
ScAbsenseAllowed?:0 introduced?:0
Oct 8 00:02:10 SLES-SLOT2 osafimmnd[5717]: NO SERVER STATE:
IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Oct 8 00:02:10 SLES-SLOT2 osafimmnd[5717]: NO SERVER STATE:
IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
Oct 8 00:02:10 SLES-SLOT2 osafimmnd[5717]: NO SERVER STATE:
IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
Oct 8 00:02:10 SLES-SLOT2 osafimmnd[5717]: NO NODE STATE-> IMM_NODE_ISOLATED
Oct 8 00:02:12 SLES-SLOT2 osafimmd[3774]: NO SBY: Ruling epoch noted as:44
Oct 8 00:02:12 SLES-SLOT2 osafimmd[3774]: NO IMMND coord at 2010f
Oct 8 00:02:12 SLES-SLOT2 osafimmnd[5717]: NO NODE STATE-> IMM_NODE_W_AVAILABLE
Oct 8 00:02:12 SLES-SLOT2 osafimmnd[5717]: NO SERVER STATE:
IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT
**Oct 8 00:02:18 SLES-SLOT2 osafamfnd[3843]: ER AMF director heart beat
timeout, generating core for amfd
Oct 8 00:02:19 SLES-SLOT2 osafamfnd[3843]: Rebooting OpenSAF NodeId = 131599
EE Name = , Reason: AMF director heart beat timeout, OwnNodeId = 131599,
SupervisionTime = 60**
Oct 8 00:02:19 SLES-SLOT2 opensaf_reboot: Rebooting local node; timeout=60
Oct 8 00:02:19 SLES-SLOT2 osafimmd[3774]: NO SBY: New Epoch for IMMND process
at node 2010f old epoch: 43 new epoch:44
Syslog and amd traces attached. Will share immnd traces and other controller
syslog on request as they are huge
---
Sent from sourceforge.net because [email protected] is
subscribed to https://sourceforge.net/p/opensaf/tickets/
To unsubscribe from further messages, a project admin can change settings at
https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a
mailing list, you can unsubscribe from the mailing list.
------------------------------------------------------------------------------
Announcing the Oxford Dictionaries API! The API offers world-renowned
dictionary content that is easy and intuitive to access. Sign up for an
account today to start using our lexical data to power your apps and
projects. Get started today and enter our developer competition.
http://sdm.link/oxford
_______________________________________________
Opensaf-tickets mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets