We have had further crashes/reboots. Yesterday it crashed twice within 10
minutes, then stabilized and hasn't crashed since.
This time, though, I had a serial cable hooked up to the system, and this got
logged to the console both times:
+ AAA SERVICE................................................ [DEAD]
- PGWC SERVICE............................................... [STOP]
- SGWC SERVICE............................................... [STOP]
- S6A SERVICE................................................ [STOP]
- MME SERVICE................................................ [STOP]
- UPGRADE INTERFACE.......................................... [STOP]
WARNING : recevied Error Message with reason=21 and cause =207 from Application/OAM-CL
WARNING : recevied Error Message with reason=21 and cause =157 from Application/OAM-CL
WARNING : recevied Error Message with reason=21 and cause =107 from Application/OAM-CL
WARNING : recevied Error Message with reason=21 and cause =107 from Application/OAM-CL
- CONFIGURATION AGENT........................................ [STOP]
- BASE CONFD................................................. [STOP]
- CONFD PHASE-0.............................................. [STOP]
- FSTS....................................................... [STOP]
- HWIF....................................................... [STOP]
#################################################################################
System failure condition detected!
POWER restart scheduled in 300 second(s)
#################################################################################
Forker timeout has expired. Reset the board...
Requesting Power On system reset...
So I guess the AAA process is failing for some reason. We are using external
HSS (RADIUS), so I presume it has something to do with that.
Guess I'll dig through our FreeRADIUS logs and then open a ticket...
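
For anyone triaging a similar serial capture, here is a minimal Python sketch
that pulls the service status lines out of the console text and flags anything
that is not an ordinary stop. The function names are hypothetical, the pattern
is inferred only from the lines quoted above, and treating [STOP] as the
"normal" state is an assumption rather than anything documented by Telrad.

```python
import re

# Matches status lines such as "+ AAA SERVICE......... [DEAD]".
# The layout is inferred from the console capture in this message only.
STATUS_LINE = re.compile(
    r"^[+-]\s+(?P<name>.+?)\.{2,}\s*\[(?P<state>[A-Z]+)\]\s*$"
)


def parse_service_states(console_text):
    """Return (service name, state) pairs in the order they were logged."""
    states = []
    for line in console_text.splitlines():
        match = STATUS_LINE.match(line.strip())
        if match:
            states.append((match.group("name").strip(), match.group("state")))
    return states


def abnormal_services(states, normal=("STOP",)):
    """Return the names whose state is not in `normal`.

    Assumption: [STOP] is an orderly shutdown and anything else
    (for example [DEAD]) deserves a closer look.
    """
    return [name for name, state in states if state not in normal]
```

Fed the capture above, abnormal_services(parse_service_states(text)) should
single out the AAA service, which matches the reading that AAA dies first and
everything else is then stopped.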
-- Nathan
From: [email protected] On Behalf Of Nathan Anderson
Sent: Saturday, December 31, 2016 2:09 PM
To: Telrad List
Subject: Re: [Telrad] BreezeWAY EPC spontaneous reboots?
If a full reset is what it takes to fix, I am familiar enough with the
procedure that I can do it myself.
To me, it sounds like confd is bombing out, and other processes (mme, pgwc) are
trying to talk to it, unsuccessfully (since it isn't running). A few minutes
later, I presume the 'forker' process sees the problem and issues a system
reboot.
Yesterday was the first and only time it has done this, and it happened 4
times. We have not had a recurrence in 24 hours.
-- Nathan
From: [email protected] On Behalf Of Matthew Carpenter
Sent: Saturday, December 31, 2016 8:00 AM
To: Telrad List
Subject: Re: [Telrad] BreezeWAY EPC spontaneous reboots?
Have not had any issues like this with our 2 EPCs.
I would contact support (Nick) and see about doing a full factory default on
the EPC and set it back up from scratch.
We had some odd issues with an eNB and that was the solution, at least for an
eNB.
The error messages sound more like confd is trying to start a process with
parameters and it's not working.
Matt Carpenter
On Sat, Dec 31, 2016 at 12:34 AM, Nathan Anderson <[email protected]> wrote:
Okay, I take it back. I found a clue in the 'tlsyslog' after all. Before it
reboots, I see several of these logged in there, interwoven with otherwise
normally-expected messages:
MME:1229:MME - Failed to start the session with confd for operational params
(...or...)
PGWC:1287:Failed to start the session with confd for operational params
So it sounds like confd is dying for some reason, and then the watchdog kicks
the box a few minutes later.
So I guess the question is, why is confd dying.
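
To see which processes report the confd failure and how often, a rough Python
sketch along these lines could be run over an exported copy of the tlsyslog.
The pattern is based only on the two sample lines quoted above; the function
name is hypothetical, and real log lines may carry extra prefixes (timestamps,
severities) that this does not account for beyond skipping over them.

```python
import re
from collections import Counter

# Inferred from the two quoted samples:
#   "MME:1229:MME - Failed to start the session with confd ..."
#   "PGWC:1287:Failed to start the session with confd ..."
# The component must start with a letter so that a numeric field such as
# a clock time is not mistaken for a component name.
CONFD_FAILURE = re.compile(
    r"\b(?P<component>[A-Z][A-Z0-9]*):\d+:.*"
    r"Failed to start the session with confd"
)


def count_confd_failures(lines):
    """Count confd session failures per reporting component."""
    counts = Counter()
    for line in lines:
        match = CONFD_FAILURE.search(line)
        if match:
            counts[match.group("component")] += 1
    return counts
```

Passing an open file object works, since the function only iterates over
lines. A burst of failures from several components at once would be
consistent with confd itself going away rather than one client misbehaving.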
-- Nathan
-----Original Message-----
From: [email protected] On Behalf Of Nathan Anderson
Sent: Friday, December 30, 2016 10:27 PM
To: [email protected]
Subject: [Telrad] BreezeWAY EPC spontaneous reboots?
So, we had a new one today. One of our EPCs rebooted itself 4 times within the
span of 90 minutes.
Yes, latest public code level (6.6 729).
Are we the only ones who have seen THIS happen??
I didn't observe this particular detail myself, but others whose eyeballs were
trained on the physical BreezeWAY box at the time say that the alarm light went
red when it stopped responding, sat like that for a few minutes, and then the
box finally rebooted (presumably some sort of watchdog process).
Is there anything that I can look for to explain the reboots? 'show
notification stream alarms' only shows the 'device-is-up-and-running' event,
with nothing suspicious-looking showing up before that (or at least nothing
that managed to get committed to NVRAM before the reboot occurred). Similarly,
the tlsyslog
file just abruptly ends with "-- SYSTEM STARTED --" and "NOTICE:Last restart
type is POWERUP" with nothing suspicious-looking getting logged right before
that.
I now have a serial cable hooked up to it in case it happens again and in case
whatever is causing the crash logs something to the console that isn't getting
written to disk for some reason. But of course it hasn't happened again (it's
been 6 hours since the last event).
We haven't made a change to the config on this thing since the 6.6 upgrade was
installed last August or whenever that was. So, whyyyyyyyy after months of
stability is this happeningggggggg.
...actually, I take that back. We increased the uplink AMBR value to help
troubleshoot the eNB capacity issues we have been seeing. But that's all.
Ugh. If it isn't one thing...
--
Nathan Anderson
First Step Internet, LLC
[email protected]
_______________________________________________
Telrad mailing list
[email protected]
http://lists.wispa.org/mailman/listinfo/telrad
--
Matthew Carpenter
806-316-5071 office
806-236-9558 cell