Okay, I take it back. I found a clue in the 'tlsyslog' after all. Before each reboot, I see several of these logged there, interspersed with otherwise normal, expected messages:
MME:1229:MME - Failed to start the session with confd for operational params
(...or...)
PGWC:1287:Failed to start the session with confd for operational params

So it sounds like confd is dying for some reason, and then the watchdog kicks the box a few minutes later. I guess the question is: why is confd dying?

-- Nathan

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Nathan Anderson
Sent: Friday, December 30, 2016 10:27 PM
To: [email protected]
Subject: [Telrad] BreezeWAY EPC spontaneous reboots?

So, we had a new one today. One of our EPCs rebooted itself 4 times within the span of 90 minutes. Yes, we're on the latest public code level (6.6 729). Are we the only ones who have seen THIS happen??

I didn't observe this particular detail myself, but others whose eyeballs were trained on the physical BreezeWAY box at the time say that the alarm light went red when it stopped responding, stayed that way for a few minutes, and then the box finally rebooted (presumably triggered by some sort of watchdog process).

Is there anything I can look for to explain the reboots? 'show notification stream alarms' only shows the 'device-is-up-and-running' event, with nothing suspicious-looking before it (or at least nothing that managed to get committed to NVRAM before the reboot occurred). Similarly, the tlsyslog file just abruptly ends with "-- SYSTEM STARTED --" and "NOTICE:Last restart type is POWERUP", with nothing suspicious-looking logged right before that.

I now have a serial cable hooked up to it in case it happens again and whatever is causing the crash logs something to the console that isn't getting written to disk. But of course it hasn't happened again (it's been 6 hours since the last event).

We haven't made a change to the config on this thing since the 6.6 upgrade was installed last August or whenever that was. So, whyyyyyyyy after months of stability is this happeningggggggg? ...actually, I take that back.
We increased the uplink AMBR value to help troubleshoot the eNB capacity issues we have been seeing. But that's all. Ugh. If it isn't one thing...

--
Nathan Anderson
First Step Internet, LLC
[email protected]

_______________________________________________
Telrad mailing list
[email protected]
http://lists.wispa.org/mailman/listinfo/telrad
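To check whether the confd session failures actually cluster right before each restart, a small script can scan a saved copy of tlsyslog. This is only a sketch. It assumes the log can be copied off the box as plain text, one message per line. It matches only on the two strings quoted in the messages above ("Failed to start the session with confd for operational params" and "-- SYSTEM STARTED --"), because the exact tlsyslog line format isn't documented here. The filename in the usage comment is hypothetical.

```python
import sys

# Marker strings quoted in the messages above. The rest of each tlsyslog
# line (timestamps, prefixes) is assumed to vary, so only substrings are matched.
CONFD_FAILURE = "Failed to start the session with confd for operational params"
RESTART_MARKER = "-- SYSTEM STARTED --"


def confd_failures_before_restarts(lines):
    """For each restart marker, return how many confd session failures
    were logged since the previous restart (or the start of the log)."""
    counts = []
    pending = 0
    for line in lines:
        if CONFD_FAILURE in line:
            pending += 1
        elif RESTART_MARKER in line:
            counts.append(pending)
            pending = 0
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python confd_scan.py tlsyslog.txt  (hypothetical filename)
    with open(sys.argv[1], errors="replace") as f:
        for i, n in enumerate(confd_failures_before_restarts(f), start=1):
            print(f"restart {i}: {n} confd session failures logged beforehand")
```

A nonzero count before every restart would support the confd-then-watchdog theory. A restart preceded by zero failures would suggest a different cause for that event.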
