Hi Jianfeng,

I have raised ticket #2468 for this issue.
Please attach the backtrace, logs, and traces to the ticket.
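
In case it helps, one way to capture a backtrace for attaching is to run gdb non-interactively against the core file. The following is only a minimal sketch, assuming gdb is on the PATH. The helper name gdb_backtrace_command and the core file name are placeholders made up for illustration; the binary path is the one from your earlier mail.

```python
import shlex


def gdb_backtrace_command(binary, core):
    """Build a gdb command line that prints full backtraces for all threads.

    Hypothetical helper for illustration; -batch and -ex are standard
    gdb command-line options.
    """
    return [
        "gdb",
        "-batch",  # run the -ex commands, then exit
        "-ex", "thread apply all bt full",
        binary,
        core,
    ]


if __name__ == "__main__":
    # Placeholder core file name; substitute the real core file.
    cmd = gdb_backtrace_command("/usr/lib64/opensaf/osafamfd", "core.osafamfd")
    # Print a shell-ready command line; redirect its output to a file
    # and attach that file to the ticket.
    print(shlex.join(cmd))
```

Running the printed command with its output redirected to a file gives a text backtrace that can be attached as-is.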

Thanks,
Praveen

On 24-May-17 1:25 PM, Jianfeng Dong wrote:
> Thanks Praveen. We tried but couldn't reproduce the issue; it appears to
> be hard to reproduce.
> 
> According to the people who found the issue, all boards in the chassis
> were rebooting, as requested by a user command.
> 
> Here is the syslog from when the issue occurred:
> 
> 2017-05-01T07:52:57.714906-04:00 scm2 kernel: tipc: Resetting link <1.1.16:eth2-1.1.5:bond0>, peer not responding
> 2017-05-01T07:52:57.714935-04:00 scm2 kernel: tipc: Lost link <1.1.16:eth2-1.1.5:bond0> on network plane A
> 2017-05-01T07:52:57.714939-04:00 scm2 kernel: tipc: Lost contact with <1.1.5>
> 2017-05-01T07:52:57.716788-04:00 scm2 osafimmd[3009]: NO MDS event from svc_id 25 (change:4, dest:287038266327043)
> 2017-05-01T07:52:57.717304-04:00 scm2 osafclmd[4259]: NO Node 66831 went down. Not sending track callback for agents on that node
> 2017-05-01T07:52:57.719178-04:00 scm2 osafimmnd[3020]: NO Global discard node received for nodeId:1050f pid:15395
> 2017-05-01T07:52:57.719233-04:00 scm2 osafimmnd[3020]: NO Implementer disconnected 104 <0, 1050f(down)> (MsgQueueService66831)
> 2017-05-01T07:52:57.721345-04:00 scm2 osafamfd[4277]: NO Node 'PLD0105' left the cluster
> 2017-05-01T07:52:57.722778-04:00 scm2 log_demo[6160]: [0.I.Proc]: FYI state change notification from NTF, entity PLD0105 now has new state DISABLED (Oper state safAmfNode=PLD0105,safAmfCluster=myAmfCluster changed)
> 2017-05-01T07:52:57.732796-04:00 scm2 osafamfd[4277]: su.cc:2006: dec_curr_act_si: Assertion 'saAmfSUNumCurrActiveSIs > 0' failed.
> 2017-05-01T07:52:57.778777-04:00 scm2 kernel: tipc: Resetting link <1.1.16:eth2-1.1.6:bond0>, peer not responding
> 2017-05-01T07:52:57.778827-04:00 scm2 kernel: tipc: Lost link <1.1.16:eth2-1.1.6:bond0> on network plane A
> 2017-05-01T07:52:57.778833-04:00 scm2 kernel: tipc: Lost contact with <1.1.6>
> 2017-05-01T07:52:57.777979-04:00 scm2 osafimmd[3009]: NO MDS event from svc_id 25 (change:4, dest:288139774320643)
> 2017-05-01T07:52:57.717343-04:00 scm2 osafclmd[4259]: NO Node 66831 went down. Not sending track callback for agents on that node
> 2017-05-01T07:52:57.779373-04:00 scm2 osafclmd[4259]: NO Node 67087 went down. Not sending track callback for agents on that node
> 2017-05-01T07:52:57.780552-04:00 scm2 osafimmnd[3020]: NO Global discard node received for nodeId:1060f pid:17439
> 2017-05-01T07:52:57.780607-04:00 scm2 osafimmnd[3020]: NO Implementer disconnected 106 <0, 1060f(down)> (MsgQueueService67087)
> 2017-05-01T07:52:57.810785-04:00 scm2 osafamfnd[5281]: WA AMF director unexpectedly crashed
> 2017-05-01T07:52:57.810839-04:00 scm2 osafamfnd[5281]: Rebooting OpenSAF NodeId = 69647 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, OwnNodeId = 69647, SupervisionTime = 0
> 2017-05-01T07:52:57.810978-04:00 scm2 osafimmnd[3020]: NO Implementer locally disconnected. Marking it as doomed 105 <29, 1100f> (safAmfService)
> 2017-05-01T07:52:57.812582-04:00 scm2 osafimmnd[3020]: NO Implementer disconnected 105 <29, 1100f> (safAmfService)
> 2017-05-01T07:52:57.950567-04:00 scm2 opensaf_reboot: Rebooting local node; timeout=0
> 2017-05-01T07:52:58.084968-04:00 scm2 atwdog[28335]: rebooting (-f) local node
> 
> Could you please do me a favor and open a ticket for the issue? I
> tried to register on SourceForge but failed; the registration page
> always complains with something like “Form security missing”.
> 
> Thanks,
> 
> Jianfeng
> 
> -----Original Message-----
> From: praveen malviya [mailto:[email protected]]
> Sent: Wednesday, May 24, 2017 1:40 PM
> To: Jianfeng Dong <[email protected]>; [email protected]
> Subject: Re: [users] osafamfd coredump issue
> 
> Hi Jianfeng,
> 
> Any steps to reproduce it?
> 
> While AMFD was performing failover, it found a mismatch in the
> assignment counters and asserted.
> 
> Please share the amfd traces if available, and also raise a ticket with them.
> 
> Thanks
> 
> Praveen
> 
> On 23-May-17 3:41 PM, Jianfeng Dong wrote:
> 
>  > Hi,
>  >
>  > We also got an 'osafamfd' coredump on our controller board. Could
>  > someone please take a look at the issue? Thanks in advance.
>  >
>  > I have listed the backtrace below but have not attached the coredump
>  > file (due to the email size limit), so please let me know if you need
>  > more information.
>  >
> 
>  > root@scm1:/coredumps/# gdb /usr/lib64/opensaf/osafamfd core.image\=26115.proc\=osafamfd.pid\=4277.signal\=6.time\=1493639577
>  > GNU gdb (Wind River Linux Sourcery CodeBench 4.8-28) 7.6
>  > Copyright (C) 2013 Free Software Foundation, Inc.
>  > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>  > This is free software: you are free to change and redistribute it.
>  > There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>  > and "show warranty" for details.
>  > This GDB was configured as "x86_64-wrs-linux-gnu".
>  > For bug reporting instructions, please see:
>  > <[email protected]>...
>  > Reading symbols from /usr/lib64/opensaf/osafamfd...Reading symbols from /usr/lib64/opensaf/.debug/osafamfd...done.
>  > done.
>  > [New LWP 4277]
>  > [New LWP 4279]
>  > [New LWP 4280]
>  > [New LWP 4282]
> 
>  > warning: Could not load shared library symbols for linux-vdso.so.1.
>  > Do you need "set solib-search-path" or "set sysroot"?
>  > [Thread debugging using libthread_db enabled]
>  > Using host libthread_db library "/lib64/libthread_db.so.1".
>  > Core was generated by `/usr/lib64/opensaf/osafamfd osafamfd'.
>  > Program terminated with signal 6, Aborted.
>  > #0  0x0000003d84a353e9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
>  > 56      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
>  >
> 
>  > (gdb) bt full
>  > #0  0x0000003d84a353e9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
>  >          resultvar = 0
>  >          pid = 4277
>  >          selftid = 4277
>  > #1  0x0000003d84a38508 in __GI_abort () at abort.c:89
>  >          save_stage = 2
>  >          act = {__sigaction_handler = {sa_handler = 0x51560d, sa_sigaction = 0x51560d}, sa_mask = {__val = {2006, 5336880, 5335460, 2130303778826, 5320117, 9977552, 264237561592, 140737298378800, 264235064979, 17179869185,
>  >                18442240615826079272, 4294967296, 5873756416, 5321392, 14559416, 140737298378864}}, sa_flags = -2052873586, sa_restorer = 0x0}
>  >          sigs = {__val = {32, 0 <repeats 15 times>}}
>  > #2  0x0000003d85a2110a in __osafassert_fail (__file=0x51560d "su.cc", __line=2006, __func=0x516f30 <AVD_SU::dec_curr_act_si()::__FUNCTION__> "dec_curr_act_si", __assertion=0x5169a4 "saAmfSUNumCurrActiveSIs > 0") at sysf_def.c:281
>  > No locals.
>  > #3  0x00000000004d907d in AVD_SU::dec_curr_act_si (this=0xde8390) at su.cc:2006
>  >          __FUNCTION__ = "dec_curr_act_si"
>  > #4  0x00000000004c0301 in avd_susi_delete (cb=0x75a2e0 <_control_block>, susi=0xd38320, ckpt=false) at siass.cc:554
>  >          i_su_si = 0xd38320
>  >          su = 0xde8390
>  >          __FUNCTION__ = "avd_susi_delete"
>  >          p_su_si = 0x0
>  >          p_si_su = 0x0
> 
>  >
> 
>  > #5  0x00000000004964e1 in SG_NORED::node_fail (this=0xd7e9a0, cb=0x75a2e0 <_control_block>, su=0xde8390) at sg_nored_fsm.cc:781
>  >          l_si = 0x7ffff4ad31a0
>  >          old_state = SA_AMF_HA_QUIESCED
>  >          su_node_ptr = 0x0
>  >          __FUNCTION__ = "node_fail"
>  > #6  0x00000000004b8c78 in avd_node_down_mw_susi_failover (cb=0x75a2e0 <_control_block>, avnd=0x9e3bf0) at sgproc.cc:1983
>  >          i_su = @0xde84a0: 0xde8390
>  >          __for_range = @0x9e3eb8: {<std::_Vector_base<AVD_SU*, std::allocator<AVD_SU*> >> = {_M_impl = {<std::allocator<AVD_SU*>> = {<__gnu_cxx::new_allocator<AVD_SU*>> = {<No data fields>}, <No data fields>}, _M_start = 0xde84a0,
>  >                _M_finish = 0xde84a8, _M_end_of_storage = 0xde84a8}}, <No data fields>}
>  >          __for_begin = {_M_current = 0xde84a0}
>  >          __for_end = {_M_current = 0xde84a8}
>  >          __FUNCTION__ = "avd_node_down_mw_susi_failover"
>  > #7  0x000000000045eb75 in avd_node_failover (node=0x9e3bf0) at ndproc.cc:1142
>  >          __FUNCTION__ = "avd_node_failover"
>  > #8  0x0000000000456fea in avd_mds_avnd_down_evh (cb=0x75a2e0 <_control_block>, evt=0x7f5f78000ec0) at ndfsm.cc:684
>  >          node = 0x9e3bf0
>  >          __FUNCTION__ = "avd_mds_avnd_down_evh"
>  > #9  0x00000000004514f5 in process_event (cb_now=0x75a2e0 <_control_block>, evt=0x7f5f78000ec0) at main.cc:775
>  >          __FUNCTION__ = "process_event"
>  > #10 0x0000000000451211 in main_loop () at main.cc:696
>  >          pollretval = 1
>  >          evt = 0x7f5f78000ec0
>  >          mbx_fd = {raise_obj = 10, rmv_obj = 11}
>  >          polltmo = -1
>  >          term_fd = 22
>  >          __FUNCTION__ = "main_loop"
>  >          cb = 0x75a2e0 <_control_block>
>  >          error = SA_AIS_OK
>  > #11 0x000000000045178f in main (argc=2, argv=0x7ffff4ad33e8) at main.cc:848
>  > No locals.
>  > (gdb)
> 
>  >
>  > Regards,
>  > Jianfeng
> 

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Opensaf-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-users
