Hi Jianfeng,

I have raised ticket #2468 for this issue. Please attach the backtrace, logs and traces to the ticket.

Thanks,
Praveen

On 24-May-17 1:25 PM, Jianfeng Dong wrote:
> Thanks Praveen. We tried but couldn't reproduce the issue; it seems to be hard
> to reproduce.
>
> According to the people who found the issue, all boards in the chassis were
> rebooting at the time, as requested by a user command.
>
> Here is the syslog from when the issue occurred:
>
> 2017-05-01T07:52:57.714906-04:00 scm2 kernel: tipc: Resetting link <1.1.16:eth2-1.1.5:bond0>, peer not responding
> 2017-05-01T07:52:57.714935-04:00 scm2 kernel: tipc: Lost link <1.1.16:eth2-1.1.5:bond0> on network plane A
> 2017-05-01T07:52:57.714939-04:00 scm2 kernel: tipc: Lost contact with <1.1.5>
> 2017-05-01T07:52:57.716788-04:00 scm2 osafimmd[3009]: NO MDS event from svc_id 25 (change:4, dest:287038266327043)
> 2017-05-01T07:52:57.717304-04:00 scm2 osafclmd[4259]: NO Node 66831 went down. Not sending track callback for agents on that node
> 2017-05-01T07:52:57.719178-04:00 scm2 osafimmnd[3020]: NO Global discard node received for nodeId:1050f pid:15395
> 2017-05-01T07:52:57.719233-04:00 scm2 osafimmnd[3020]: NO Implementer disconnected 104 <0, 1050f(down)> (MsgQueueService66831)
> 2017-05-01T07:52:57.721345-04:00 scm2 osafamfd[4277]: NO Node 'PLD0105' left the cluster
> 2017-05-01T07:52:57.722778-04:00 scm2 log_demo[6160]: [0.I.Proc]: FYI state change notification from NTF, entity PLD0105 now has new state DISABLED (Oper state safAmfNode=PLD0105,safAmfCluster=myAmfCluster changed)
> 2017-05-01T07:52:57.732796-04:00 scm2 osafamfd[4277]: su.cc:2006: dec_curr_act_si: Assertion 'saAmfSUNumCurrActiveSIs > 0' failed.
> 2017-05-01T07:52:57.778777-04:00 scm2 kernel: tipc: Resetting link <1.1.16:eth2-1.1.6:bond0>, peer not responding
> 2017-05-01T07:52:57.778827-04:00 scm2 kernel: tipc: Lost link <1.1.16:eth2-1.1.6:bond0> on network plane A
> 2017-05-01T07:52:57.778833-04:00 scm2 kernel: tipc: Lost contact with <1.1.6>
> 2017-05-01T07:52:57.777979-04:00 scm2 osafimmd[3009]: NO MDS event from svc_id 25 (change:4, dest:288139774320643)
> 2017-05-01T07:52:57.717343-04:00 scm2 osafclmd[4259]: NO Node 66831 went down. Not sending track callback for agents on that node
> 2017-05-01T07:52:57.779373-04:00 scm2 osafclmd[4259]: NO Node 67087 went down. Not sending track callback for agents on that node
> 2017-05-01T07:52:57.780552-04:00 scm2 osafimmnd[3020]: NO Global discard node received for nodeId:1060f pid:17439
> 2017-05-01T07:52:57.780607-04:00 scm2 osafimmnd[3020]: NO Implementer disconnected 106 <0, 1060f(down)> (MsgQueueService67087)
> 2017-05-01T07:52:57.810785-04:00 scm2 osafamfnd[5281]: WA AMF director unexpectedly crashed
> 2017-05-01T07:52:57.810839-04:00 scm2 osafamfnd[5281]: Rebooting OpenSAF NodeId = 69647 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, OwnNodeId = 69647, SupervisionTime = 0
> 2017-05-01T07:52:57.810978-04:00 scm2 osafimmnd[3020]: NO Implementer locally disconnected. Marking it as doomed 105 <29, 1100f> (safAmfService)
> 2017-05-01T07:52:57.812582-04:00 scm2 osafimmnd[3020]: NO Implementer disconnected 105 <29, 1100f> (safAmfService)
> 2017-05-01T07:52:57.950567-04:00 scm2 opensaf_reboot: Rebooting local node; timeout=0
> 2017-05-01T07:52:58.084968-04:00 scm2 atwdog[28335]: rebooting (-f) local node
>
> Could you please do me a favor and open a ticket for the issue? I tried
> to register on SourceForge but failed; the registration page always
> complains “Form security missing”.
>
> Thanks,
> Jianfeng
>
> -----Original Message-----
> From: praveen malviya [mailto:[email protected]]
> Sent: Wednesday, May 24, 2017 1:40 PM
> To: Jianfeng Dong <[email protected]>; [email protected]
> Subject: Re: [users] osafamfd coredump issue
>
> Hi Jianfeng,
>
> Are there any steps to reproduce it?
>
> While AMFD was performing failover, it found a mismatch in the assignment
> counters and asserted.
>
> Please share the amfd traces if available, and also raise a ticket for this.
>
> Thanks,
> Praveen
>
> On 23-May-17 3:41 PM, Jianfeng Dong wrote:
> > Hi,
> >
> > We also got an 'osafamfd' coredump on our controller board. Could someone
> > please take a look at the issue? Thanks in advance.
> >
> > I have listed the backtrace info here but have not attached the coredump file
> > (due to the email size limit), so please let me know if you need more information.
> >
> > root@scm1:/coredumps/# gdb /usr/lib64/opensaf/osafamfd core.image\=26115.proc\=osafamfd.pid\=4277.signal\=6.time\=1493639577
> > GNU gdb (Wind River Linux Sourcery CodeBench 4.8-28) 7.6
> > Copyright (C) 2013 Free Software Foundation, Inc.
> > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> > This is free software: you are free to change and redistribute it.
> > There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> > and "show warranty" for details.
> > This GDB was configured as "x86_64-wrs-linux-gnu".
> > For bug reporting instructions, please see:
> > <[email protected]>...
> > Reading symbols from /usr/lib64/opensaf/osafamfd...Reading symbols from /usr/lib64/opensaf/.debug/osafamfd...done.
> > done.
> > [New LWP 4277]
> > [New LWP 4279]
> > [New LWP 4280]
> > [New LWP 4282]
> >
> > warning: Could not load shared library symbols for linux-vdso.so.1.
> > Do you need "set solib-search-path" or "set sysroot"?
> > [Thread debugging using libthread_db enabled]
> > Using host libthread_db library "/lib64/libthread_db.so.1".
> > Core was generated by `/usr/lib64/opensaf/osafamfd osafamfd'.
> > Program terminated with signal 6, Aborted.
> > #0 0x0000003d84a353e9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> > 56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> > (gdb) bt full
> > #0 0x0000003d84a353e9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> >         resultvar = 0
> >         pid = 4277
> >         selftid = 4277
> > #1 0x0000003d84a38508 in __GI_abort () at abort.c:89
> >         save_stage = 2
> >         act = {__sigaction_handler = {sa_handler = 0x51560d, sa_sigaction = 0x51560d}, sa_mask = {__val = {2006, 5336880, 5335460, 2130303778826, 5320117, 9977552, 264237561592, 140737298378800, 264235064979, 17179869185, 18442240615826079272, 4294967296, 5873756416, 5321392, 14559416, 140737298378864}}, sa_flags = -2052873586, sa_restorer = 0x0}
> >         sigs = {__val = {32, 0 <repeats 15 times>}}
> > #2 0x0000003d85a2110a in __osafassert_fail (__file=0x51560d "su.cc", __line=2006, __func=0x516f30 <AVD_SU::dec_curr_act_si()::__FUNCTION__> "dec_curr_act_si", __assertion=0x5169a4 "saAmfSUNumCurrActiveSIs > 0") at sysf_def.c:281
> > No locals.
> > #3 0x00000000004d907d in AVD_SU::dec_curr_act_si (this=0xde8390) at su.cc:2006
> >         __FUNCTION__ = "dec_curr_act_si"
> > #4 0x00000000004c0301 in avd_susi_delete (cb=0x75a2e0 <_control_block>, susi=0xd38320, ckpt=false) at siass.cc:554
> >         i_su_si = 0xd38320
> >         su = 0xde8390
> >         __FUNCTION__ = "avd_susi_delete"
> >         p_su_si = 0x0
> >         p_si_su = 0x0
> > #5 0x00000000004964e1 in SG_NORED::node_fail (this=0xd7e9a0, cb=0x75a2e0 <_control_block>, su=0xde8390) at sg_nored_fsm.cc:781
> >         l_si = 0x7ffff4ad31a0
> >         old_state = SA_AMF_HA_QUIESCED
> >         su_node_ptr = 0x0
> >         __FUNCTION__ = "node_fail"
> > #6 0x00000000004b8c78 in avd_node_down_mw_susi_failover (cb=0x75a2e0 <_control_block>, avnd=0x9e3bf0) at sgproc.cc:1983
> >         i_su = @0xde84a0: 0xde8390
> >         __for_range = @0x9e3eb8: {<std::_Vector_base<AVD_SU*, std::allocator<AVD_SU*> >> = {_M_impl = {<std::allocator<AVD_SU*>> = {<__gnu_cxx::new_allocator<AVD_SU*>> = {<No data fields>}, <No data fields>}, _M_start = 0xde84a0, _M_finish = 0xde84a8, _M_end_of_storage = 0xde84a8}}, <No data fields>}
> >         __for_begin = {_M_current = 0xde84a0}
> >         __for_end = {_M_current = 0xde84a8}
> >         __FUNCTION__ = "avd_node_down_mw_susi_failover"
> > #7 0x000000000045eb75 in avd_node_failover (node=0x9e3bf0) at ndproc.cc:1142
> >         __FUNCTION__ = "avd_node_failover"
> > #8 0x0000000000456fea in avd_mds_avnd_down_evh (cb=0x75a2e0 <_control_block>, evt=0x7f5f78000ec0) at ndfsm.cc:684
> >         node = 0x9e3bf0
> >         __FUNCTION__ = "avd_mds_avnd_down_evh"
> > #9 0x00000000004514f5 in process_event (cb_now=0x75a2e0 <_control_block>, evt=0x7f5f78000ec0) at main.cc:775
> >         __FUNCTION__ = "process_event"
> > #10 0x0000000000451211 in main_loop () at main.cc:696
> >         pollretval = 1
> >         evt = 0x7f5f78000ec0
> >         mbx_fd = {raise_obj = 10, rmv_obj = 11}
> >         polltmo = -1
> >         term_fd = 22
> >         __FUNCTION__ = "main_loop"
> >         cb = 0x75a2e0 <_control_block>
> >         error = SA_AIS_OK
> > #11 0x000000000045178f in main (argc=2, argv=0x7ffff4ad33e8) at main.cc:848
> > No locals.
> > (gdb)
> >
> > Regards,
> > Jianfeng

_______________________________________________
Opensaf-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-users
