[Users] Concurrent AMF healthcheck timeouts using 1.0-4 without patch

Hans Feldt Wed, 16 Jan 2008 01:27:27 -0800

We have seen a problem that looks pretty bad: AMF reports health check
timeouts for several components at the same time. Since there is
probably nothing wrong with the components themselves, the possible
causes are:
- the missing patch (soon to be integrated)
- AMF stops sending health checks
- MDS/TIPC hang-up


Syslog excerpt:

Jan 10 11:12:52 SC_2_1 ncs_scap: NCS_AvSv: Card going for reboot -safComp=CompT_MQD,safSu=SuT_NCS_CNTLR,safNode=SC_2_1 faulted due to 6 -rcvr=9
Jan 10 11:12:52 SC_2_1 ncs_scap: NCS_AvSv: Card going for reboot -safComp=CompT_GLD,safSu=SuT_NCS_CNTLR,safNode=SC_2_1 faulted due to 6 -rcvr=9
Jan 10 11:12:57 SC_2_1 ncs_scap: NCS_AvSv: Card going for reboot - Some one has reset this card
Jan 10 11:12:58 SC_2_1 shutdown[12802]: shutting down for system reboot
Jan 10 11:13:04 SC_2_1 ncs_scap: NCS_AvSv: Card going for reboot -safComp=CompT_MAS,safSu=SuT_NCS_CNTLR,safNode=SC_2_1 faulted due to 6 -rcvr=9
Jan 10 11:13:06 SC_2_1 ncs_scap: NCS_AvSv: Card going for reboot -safComp=CompT_EDS,safSu=SuT_EDS,safNode=SC_2_1 faulted due to 6 -rcvr=9
Jan 10 11:13:39 SC_2_1 ncs_scap: NCS_AvSv: Card going for reboot -safComp=CompT_DTS,safSu=SuT_NCS_CNTLR,safNode=SC_2_1 faulted due to 6 -rcvr=9
Jan 10 10:13:53 SC_2_1 init: Switching to runlevel: 6
Jan 10 11:13:54 SC_2_1 shutdown: THE SYSTEM IS SHUTTING DOWN

No core dumps, nothing more interesting than this.
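
To see how close together the faults are, the AvSv fault entries in an
excerpt like the one above can be parsed and compared. A minimal Python
sketch, assuming each syslog entry has first been rejoined onto a single
line. The function names are made up for illustration, and the year is an
assumption taken from the mail date, since syslog timestamps omit it:

```python
import re
from datetime import datetime

# Matches one AvSv fault entry; wrapped mail lines must be rejoined first.
FAULT_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) (?P<node>\S+) .*"
    r"safComp=(?P<comp>[^,]+),safSu=(?P<su>[^,]+),.*"
    r"faulted due to (?P<reason>\d+) -rcvr=(?P<rcvr>\d+)"
)

PREFIX = "SC_2_1 ncs_scap: NCS_AvSv: Card going for reboot "
SAMPLE = [
    "Jan 10 11:12:52 " + PREFIX
    + "-safComp=CompT_MQD,safSu=SuT_NCS_CNTLR,safNode=SC_2_1 faulted due to 6 -rcvr=9",
    "Jan 10 11:12:52 " + PREFIX
    + "-safComp=CompT_GLD,safSu=SuT_NCS_CNTLR,safNode=SC_2_1 faulted due to 6 -rcvr=9",
    "Jan 10 11:12:58 SC_2_1 shutdown[12802]: shutting down for system reboot",
    "Jan 10 11:13:04 " + PREFIX
    + "-safComp=CompT_MAS,safSu=SuT_NCS_CNTLR,safNode=SC_2_1 faulted due to 6 -rcvr=9",
    "Jan 10 11:13:06 " + PREFIX
    + "-safComp=CompT_EDS,safSu=SuT_EDS,safNode=SC_2_1 faulted due to 6 -rcvr=9",
    "Jan 10 11:13:39 " + PREFIX
    + "-safComp=CompT_DTS,safSu=SuT_NCS_CNTLR,safNode=SC_2_1 faulted due to 6 -rcvr=9",
]


def parse_faults(lines, year=2008):
    """Return (timestamp, component, service unit) for every fault entry."""
    faults = []
    for line in lines:
        m = FAULT_RE.search(line)
        if m:
            # Syslog has no year, so one is supplied (assumed) for parsing.
            ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
            faults.append((ts, m.group("comp"), m.group("su")))
    return faults


def fault_window_seconds(faults):
    """Seconds between the first and the last reported fault."""
    times = [ts for ts, _, _ in faults]
    return (max(times) - min(times)).total_seconds() if times else 0.0


if __name__ == "__main__":
    faults = parse_faults(SAMPLE)
    for ts, comp, su in faults:
        print(ts.time(), comp, su)
    print("window:", fault_window_seconds(faults), "s")
```

For the five fault entries above, the first and last reported fault are
47 seconds apart.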

The problem has been seen once, maybe twice. Our application was running
on the payloads using checkpoints and events, as mentioned before. The
processor load was probably 50-60% on all processors (controllers and
payloads). To be able to run at 60% load, we doubled rcHbInt to 6s in
BOM.xml.

I will try to generate debug info in the /etc/opt/opensaf/reboot script
that is called by OpenSAF (changing it from a symlink to a script). This
would be helpful if the problem is seen again.
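
As a rough idea of what such a debug step could collect, here is a
minimal Python sketch. It is not the actual OpenSAF reboot script: the
function name, the list of diagnostic commands and the report file name
are all assumptions for illustration. The one design point it tries to
show is that a failing or hanging diagnostic must never block the reboot
itself.

```python
import subprocess
import time
from pathlib import Path

# Hypothetical diagnostics; adjust to what is available on the node.
DEFAULT_COMMANDS = [
    ["uptime"],
    ["ps", "-eo", "pid,ppid,stat,pcpu,pmem,etime,args"],
    ["dmesg"],
]


def collect_debug(out_dir, commands=None, timeout=5.0):
    """Run each command and write its output to one timestamped report."""
    commands = DEFAULT_COMMANDS if commands is None else commands
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    report = out_dir / time.strftime("reboot-debug-%Y%m%d-%H%M%S.txt")
    with report.open("a") as fh:
        for cmd in commands:
            fh.write(f"=== {' '.join(cmd)} ===\n")
            try:
                result = subprocess.run(
                    cmd,
                    capture_output=True,
                    text=True,
                    errors="replace",  # tolerate non-UTF-8 bytes in output
                    timeout=timeout,   # bound the time spent per command
                )
                fh.write(result.stdout)
                fh.write(result.stderr)
            except (OSError, subprocess.TimeoutExpired) as exc:
                # Record the failure and carry on with the next command.
                fh.write(f"[failed: {exc}]\n")
    return report
```

A wrapper would call collect_debug() with a directory that survives the
reboot and then hand over to whatever the original symlink pointed at.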

What is your opinion?

Regards,
Hans
