[tickets] [opensaf:tickets] #2338 amfd got crashed while changing role from queised to active

2017-03-10 Thread Nagendra Kumar
- **status**: review --> fixed
- **Comment**:

changeset:   8687:43deca051ae2
branch:      opensaf-5.0.x
parent:      8682:50a2033a8a8d
user:        Nagendra Kumar
date:        Fri Mar 10 15:30:59 2017 +0530
summary:     amfd: handle TIMEOUT for avd_imm_applier_set [#2338]

changeset:   8688:c4271e0114d8
branch:      opensaf-5.1.x
parent:      8683:59e265654232
user:        Nagendra Kumar
date:        Fri Mar 10 15:31:10 2017 +0530
summary:     amfd: handle TIMEOUT for avd_imm_applier_set [#2338]

changeset:   8689:4cefc956fdf0
tag:         tip
parent:      8686:03647db14f06
user:        Nagendra Kumar
date:        Fri Mar 10 15:31:28 2017 +0530
summary:     amfd: handle TIMEOUT for avd_imm_applier_set [#2338]

[staging:43deca]
[staging:c4271e]
[staging:4cefc9]
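
The patch itself is not quoted in this thread, so the following is only a sketch of one common way to "handle TIMEOUT" instead of asserting: retry the call a bounded number of times while it returns a transient error. The function `call_with_retry`, its parameters, and the retry policy are assumptions made for illustration and are not taken from the amfd source; only the numeric error values follow the SAF AIS `SaAisErrorT` enumeration.

```python
import time

# Values from the SAF AIS SaAisErrorT enumeration.
SA_AIS_OK = 1
SA_AIS_ERR_TIMEOUT = 5
SA_AIS_ERR_TRY_AGAIN = 6

# Errors treated as transient in this sketch (an assumption).
TRANSIENT_ERRORS = (SA_AIS_ERR_TIMEOUT, SA_AIS_ERR_TRY_AGAIN)


def call_with_retry(operation, max_attempts=5, delay_s=0.5, sleep=time.sleep):
    """Call operation() until it stops returning a transient error.

    operation is a hypothetical callable returning an SaAisErrorT-style
    integer. The last return code is handed back so the caller can react
    to a persistent failure instead of aborting the process.
    """
    rc = operation()
    attempts = 1
    while rc in TRANSIENT_ERRORS and attempts < max_attempts:
        sleep(delay_s)
        rc = operation()
        attempts += 1
    return rc
```

With a policy like this, a single timeout during the role change would lead to another attempt, and a persistent failure would reach the caller as a return code that it can log and act on.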




---

** [tickets:#2338] amfd got crashed while changing role from queised to active**

**Status:** fixed
**Milestone:** 5.2.RC1
**Created:** Fri Mar 03, 2017 05:41 AM UTC by Ritu Raj
**Last Updated:** Wed Mar 08, 2017 08:39 AM UTC
**Owner:** Nagendra Kumar
**Attachments:**

- [osafamfd.tgz](https://sourceforge.net/p/opensaf/tickets/2338/attachment/osafamfd.tgz) (2.8 MB; application/octet-stream)
- [syslog.7z](https://sourceforge.net/p/opensaf/tickets/2338/attachment/syslog.7z) (649.4 kB; application/octet-stream)


#Environment details
OS : Suse 64bit
Changeset : 8634 ( 5.2.FC)
Setup : 4 nodes ( 2 controllers and 2 payloads with 1PBE enabled )


#Summary
amfd crashed while changing role from quiesced to active.

#Steps followed & Observed behaviour
   1. Invoke switchovers.
   2. After a few successful switchovers, SC-1 got the active role and SC-2 got 
      the standby role.
   3. Invoke one more switchover. SC-1 got the quiesced role and SC-2 
      successfully became active. After this, cpd crashed on SC-2. While SC-1 
      was changing role from quiesced to active, amfd crashed on SC-1, which 
      resulted in a cluster reset.

>>For the CPD crash, refer to ticket #2337

Syslog of SC-1:
Mar  2 14:12:00 TestBed-R1 osafimmnd[2138]: NO PBE-OI established on this SC. Dumping incrementally to file imm.db
Mar  2 14:12:03 TestBed-R1 osafamfd[2178]: ER Impl Set Failed for SaAmfNodeSwBundle, returned 5
Mar  2 14:12:03 TestBed-R1 osafamfd[2178]: ER avd_imm_applier_set FAILED, 5
Mar  2 14:12:03 TestBed-R1 osafamfd[2178]: src/amf/amfd/role.cc:807: avd_mds_qsd_role_evh: Assertion '0' failed.
Mar  2 14:12:03 TestBed-R1 osafamfnd[2188]: ER AMFD has unexpectedly crashed. Rebooting node
Mar  2 14:12:03 TestBed-R1 osafamfnd[2188]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: AMFD has unexpectedly crashed. Rebooting node, OwnNodeId = 131343, SupervisionTime = 60
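
The return value 5 in the two amfd error lines above corresponds to `SA_AIS_ERR_TIMEOUT` in the SAF AIS `SaAisErrorT` enumeration, which matches the "handle TIMEOUT" wording of the fix. As a small aid for reading such logs, the sketch below maps the trailing numeric code of a log line to its symbolic name. The helper `decode_return_code` and its regular expression are assumptions written for the two line shapes in this excerpt, and only the first six enum values are listed.

```python
import re
from enum import IntEnum


class SaAisError(IntEnum):
    """First few SaAisErrorT values from the SAF AIS specification."""
    SA_AIS_OK = 1
    SA_AIS_ERR_LIBRARY = 2
    SA_AIS_ERR_VERSION = 3
    SA_AIS_ERR_INIT = 4
    SA_AIS_ERR_TIMEOUT = 5
    SA_AIS_ERR_TRY_AGAIN = 6


# Matches a trailing "returned N" or "FAILED, N", as in the excerpt above.
_RC_PATTERN = re.compile(r"(?:returned|FAILED,)\s*(\d+)\s*$")


def decode_return_code(log_line):
    """Return the symbolic name of the trailing return code, or None."""
    match = _RC_PATTERN.search(log_line)
    if match is None:
        return None
    code = int(match.group(1))
    try:
        return SaAisError(code).name
    except ValueError:
        # Code outside the subset listed in this sketch.
        return "UNKNOWN(%d)" % code


line = "Mar  2 14:12:03 TestBed-R1 osafamfd[2178]: ER avd_imm_applier_set FAILED, 5"
print(decode_return_code(line))  # SA_AIS_ERR_TIMEOUT
```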



BT:
(gdb) thread apply all bt

Thread 4 (Thread 0x7f2e04fe4b00 (LWP 2182)):
0  0x7f2e034174f6 in poll () from /lib64/libc.so.6
1  0x7f2e0414633b in osaf_ppoll (io_fds=0x7f2e04fe41c0, i_nfds=1, i_timeout_ts=0x7f2e04fe4180, i_sigmask=0x0) at src/base/osaf_poll.c:105
2  0x7f2e04146261 in osaf_poll (io_fds=0x7f2e04fe41c0, i_nfds=1, i_timeout=3) at src/base/osaf_poll.c:44
3  0x7f2e04146430 in osaf_poll_one_fd (i_fd=15, i_timeout=3) at src/base/osaf_poll.c:128
4  0x7f2e0418d360 in rda_read_msg (sockfd=15, msg=0x7f2e04fe4260 "10 1", size=64) at src/rde/agent/rda_papi.cc:673
5  0x7f2e0418cb40 in rda_callback_task (rda_callback_cb=0x7f2e0549c440) at src/rde/agent/rda_papi.cc:150
6  0x7f2e036c47b6 in start_thread () from /lib64/libpthread.so.0
7  0x7f2e034209cd in clone () from /lib64/libc.so.6
8  0x in ?? ()

Thread 3 (Thread 0x7f2e05004b00 (LWP 2181)):
0  0x7f2e034174f6 in poll () from /lib64/libc.so.6
1  0x7f2e04188958 in mdtm_process_recv_events () at src/mds/mds_dt_tipc.c:669
2  0x7f2e036c47b6 in start_thread () from /lib64/libpthread.so.0
3  0x7f2e034209cd in clone () from /lib64/libc.so.6
4  0x in ?? ()

Thread 2 (Thread 0x7f2e0503ab00 (LWP 2180)):
0  0x7f2e034174f6 in poll () from /lib64/libc.so.6
1  0x7f2e0414633b in osaf_ppoll (io_fds=0x7f2e0503a270, i_nfds=1, i_timeout_ts=0x7f2e0503a2a0, i_sigmask=0x0) at src/base/osaf_poll.c:105
2  0x7f2e04150604 in ncs_tmr_wait () at src/base/sysf_tmr.c:406
3  0x7f2e036c47b6 in start_thread () from /lib64/libpthread.so.0
4  0x7f2e034209cd in clone () from /lib64/libc.so.6
5  0x in ?? ()

Thread 1 (Thread 0x7f2e05007720 (LWP 2178)):
0  0x7f2e0337bb55 in raise () from /lib64/libc.so.6
1  0x7f2e0337d131 in abort () from /lib64/libc.so.6
2  0x7f2e0414b6e7 in __osafassert_fail (__file=0x7f2e05215e2f "src/amf/amfd/role.cc", __line=807, __func=0x7f2e05216c90  "avd_mds_qsd_role_evh", __assertion=0x7f2e05216548 "0") at src/base/sysf_def.c:281
3  0x7f2e05182755 in avd_mds_qsd_role_evh (cb=0x7f2e054640c0 <_control_block>, evt=0x7f2dfc000b20) at src/amf/amfd/role.cc:807
4  0x7f2e05156536 in process_event 

[tickets] [opensaf:tickets] #2338 amfd got crashed while changing role from queised to active

2017-03-07 Thread Nagendra Kumar
- **status**: assigned --> accepted



---

** [tickets:#2338] amfd got crashed while changing role from queised to active**

**Status:** accepted
**Milestone:** 5.2.RC1
**Created:** Fri Mar 03, 2017 05:41 AM UTC by Ritu Raj
**Last Updated:** Tue Mar 07, 2017 07:27 AM UTC
**Owner:** Nagendra Kumar
**Attachments:**

- [osafamfd.tgz](https://sourceforge.net/p/opensaf/tickets/2338/attachment/osafamfd.tgz) (2.8 MB; application/octet-stream)
- [syslog.7z](https://sourceforge.net/p/opensaf/tickets/2338/attachment/syslog.7z) (649.4 kB; application/octet-stream)


#Environment details
OS : Suse 64bit
Changeset : 8634 ( 5.2.FC)
Setup : 4 nodes ( 2 controllers and 2 payloads with 1PBE enabled )


#Summary
amfd crashed while changing role from quiesced to active.

#Steps followed & Observed behaviour
   1. Invoke switchovers.
   2. After a few successful switchovers, SC-1 got the active role and SC-2 got 
      the standby role.
   3. Invoke one more switchover. SC-1 got the quiesced role and SC-2 
      successfully became active. After this, cpd crashed on SC-2. While SC-1 
      was changing role from quiesced to active, amfd crashed on SC-1, which 
      resulted in a cluster reset.

>>For the CPD crash, refer to ticket #2337

Syslog of SC-1:
Mar  2 14:12:00 TestBed-R1 osafimmnd[2138]: NO PBE-OI established on this SC. Dumping incrementally to file imm.db
Mar  2 14:12:03 TestBed-R1 osafamfd[2178]: ER Impl Set Failed for SaAmfNodeSwBundle, returned 5
Mar  2 14:12:03 TestBed-R1 osafamfd[2178]: ER avd_imm_applier_set FAILED, 5
Mar  2 14:12:03 TestBed-R1 osafamfd[2178]: src/amf/amfd/role.cc:807: avd_mds_qsd_role_evh: Assertion '0' failed.
Mar  2 14:12:03 TestBed-R1 osafamfnd[2188]: ER AMFD has unexpectedly crashed. Rebooting node
Mar  2 14:12:03 TestBed-R1 osafamfnd[2188]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: AMFD has unexpectedly crashed. Rebooting node, OwnNodeId = 131343, SupervisionTime = 60



BT:
(gdb) thread apply all bt

Thread 4 (Thread 0x7f2e04fe4b00 (LWP 2182)):
0  0x7f2e034174f6 in poll () from /lib64/libc.so.6
1  0x7f2e0414633b in osaf_ppoll (io_fds=0x7f2e04fe41c0, i_nfds=1, i_timeout_ts=0x7f2e04fe4180, i_sigmask=0x0) at src/base/osaf_poll.c:105
2  0x7f2e04146261 in osaf_poll (io_fds=0x7f2e04fe41c0, i_nfds=1, i_timeout=3) at src/base/osaf_poll.c:44
3  0x7f2e04146430 in osaf_poll_one_fd (i_fd=15, i_timeout=3) at src/base/osaf_poll.c:128
4  0x7f2e0418d360 in rda_read_msg (sockfd=15, msg=0x7f2e04fe4260 "10 1", size=64) at src/rde/agent/rda_papi.cc:673
5  0x7f2e0418cb40 in rda_callback_task (rda_callback_cb=0x7f2e0549c440) at src/rde/agent/rda_papi.cc:150
6  0x7f2e036c47b6 in start_thread () from /lib64/libpthread.so.0
7  0x7f2e034209cd in clone () from /lib64/libc.so.6
8  0x in ?? ()

Thread 3 (Thread 0x7f2e05004b00 (LWP 2181)):
0  0x7f2e034174f6 in poll () from /lib64/libc.so.6
1  0x7f2e04188958 in mdtm_process_recv_events () at src/mds/mds_dt_tipc.c:669
2  0x7f2e036c47b6 in start_thread () from /lib64/libpthread.so.0
3  0x7f2e034209cd in clone () from /lib64/libc.so.6
4  0x in ?? ()

Thread 2 (Thread 0x7f2e0503ab00 (LWP 2180)):
0  0x7f2e034174f6 in poll () from /lib64/libc.so.6
1  0x7f2e0414633b in osaf_ppoll (io_fds=0x7f2e0503a270, i_nfds=1, i_timeout_ts=0x7f2e0503a2a0, i_sigmask=0x0) at src/base/osaf_poll.c:105
2  0x7f2e04150604 in ncs_tmr_wait () at src/base/sysf_tmr.c:406
3  0x7f2e036c47b6 in start_thread () from /lib64/libpthread.so.0
4  0x7f2e034209cd in clone () from /lib64/libc.so.6
5  0x in ?? ()

Thread 1 (Thread 0x7f2e05007720 (LWP 2178)):
0  0x7f2e0337bb55 in raise () from /lib64/libc.so.6
1  0x7f2e0337d131 in abort () from /lib64/libc.so.6
2  0x7f2e0414b6e7 in __osafassert_fail (__file=0x7f2e05215e2f "src/amf/amfd/role.cc", __line=807, __func=0x7f2e05216c90  "avd_mds_qsd_role_evh", __assertion=0x7f2e05216548 "0") at src/base/sysf_def.c:281
3  0x7f2e05182755 in avd_mds_qsd_role_evh (cb=0x7f2e054640c0 <_control_block>, evt=0x7f2dfc000b20) at src/amf/amfd/role.cc:807
4  0x7f2e05156536 in process_event (cb_now=0x7f2e054640c0 <_control_block>, evt=0x7f2dfc000b20) at src/amf/amfd/main.cc:811
5  0x7f2e051560ee in main_loop () at src/amf/amfd/main.cc:702
6  0x7f2e051566fd in main (argc=2, argv=0x7fff5826f318) at src/amf/amfd/main.cc:861
(gdb)





Notes:
1. Syslogs of both controllers attached
2. amfd bt attached
3. amfd trace attached

The two nodes are not in time sync; there is a time gap between them.
Relative to SC-2, SC-1 is about 50 minutes ahead.
Time Diff
==
TestBed-R1:~  date
Thu Mar 2 16:34:45 IST 2017
TestBed-R2:~  date
Thu Mar 2 15:44:30 IST 2017
=
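
When reading the two controllers' syslogs side by side, the clock gap has to be subtracted from SC-1 timestamps. The sketch below derives the offset from the two `date` outputs above and applies it to an SC-1 syslog timestamp. It assumes the two `date` commands were run at practically the same moment, so the result is approximate. The function names and the default year are assumptions for illustration (syslog lines carry no year).

```python
from datetime import datetime

# Format of the `date` output quoted above; "IST" is matched literally.
DATE_FORMAT = "%a %b %d %H:%M:%S IST %Y"


def clock_offset(sc1_date, sc2_date):
    """Return how far SC-1's clock is ahead of SC-2's, as a timedelta."""
    sc1 = datetime.strptime(sc1_date, DATE_FORMAT)
    sc2 = datetime.strptime(sc2_date, DATE_FORMAT)
    return sc1 - sc2


def sc1_to_sc2_time(sc1_stamp, offset, year=2017):
    """Convert an SC-1 syslog timestamp such as 'Mar  2 14:12:03' to SC-2's clock."""
    parsed = datetime.strptime("%d %s" % (year, sc1_stamp), "%Y %b %d %H:%M:%S")
    return parsed - offset


offset = clock_offset("Thu Mar 2 16:34:45 IST 2017", "Thu Mar 2 15:44:30 IST 2017")
print(offset)  # 0:50:15
print(sc1_to_sc2_time("Mar  2 14:12:03", offset))  # 2017-03-02 13:21:48
```

Under this assumption, the amfd assertion logged at 14:12:03 on SC-1 should be compared with SC-2 syslog entries around 13:21:48.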


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

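To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.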

[tickets] [opensaf:tickets] #2338 amfd got crashed while changing role from queised to active

2017-03-06 Thread Nagendra Kumar
- **status**: unassigned --> assigned
- **assigned_to**: Nagendra Kumar



---

** [tickets:#2338] amfd got crashed while changing role from queised to active**

**Status:** assigned
**Milestone:** 5.2.RC1
**Created:** Fri Mar 03, 2017 05:41 AM UTC by Ritu Raj
**Last Updated:** Fri Mar 03, 2017 05:42 AM UTC
**Owner:** Nagendra Kumar
**Attachments:**

- [osafamfd.tgz](https://sourceforge.net/p/opensaf/tickets/2338/attachment/osafamfd.tgz) (2.8 MB; application/octet-stream)
- [syslog.7z](https://sourceforge.net/p/opensaf/tickets/2338/attachment/syslog.7z) (649.4 kB; application/octet-stream)




[tickets] [opensaf:tickets] #2338 amfd got crashed while changing role from queised to active

2017-03-02 Thread Ritu Raj
- Attachments has changed:

Diff:



--- old
+++ new
@@ -0,0 +1,2 @@
+osafamfd.tgz (2.8 MB; application/octet-stream)
+syslog.7z (649.4 kB; application/octet-stream)






---

** [tickets:#2338] amfd got crashed while changing role from queised to active**

**Status:** unassigned
**Milestone:** 5.2.RC1
**Created:** Fri Mar 03, 2017 05:41 AM UTC by Ritu Raj
**Last Updated:** Fri Mar 03, 2017 05:42 AM UTC
**Owner:** nobody
**Attachments:**

- [osafamfd.tgz](https://sourceforge.net/p/opensaf/tickets/2338/attachment/osafamfd.tgz) (2.8 MB; application/octet-stream)
- [syslog.7z](https://sourceforge.net/p/opensaf/tickets/2338/attachment/syslog.7z) (649.4 kB; application/octet-stream)




[tickets] [opensaf:tickets] #2338 amfd got crashed while changing role from queised to active

2017-03-02 Thread Ritu Raj



---

** [tickets:#2338] amfd got crashed while changing role from queised to active**

**Status:** unassigned
**Milestone:** 5.2.RC1
**Created:** Fri Mar 03, 2017 05:41 AM UTC by Ritu Raj
**Last Updated:** Fri Mar 03, 2017 05:41 AM UTC
**Owner:** nobody


