Re: [Telrad] UE upgrade failure rate

Nathan Anderson Thu, 09 Feb 2017 02:56:13 -0800

After much trial and error, I managed to learn that the white port is a 
TTL-level serial interface.  And there was much rejoicing.


ALSO, I FIGURED OUT WHAT HAS BEEN KILLING (at least our) CPE8000s.

Remember that problem that the EPC firmware had back when it was first 
released?  Back when root access was still available on the EPC firmware, there 
was an FTP server running on it that accepted connections via the PDN IP 
address, and if you didn't change the root password from the default insecure 
one (which was ironically named), then infected machines trying to spread that 
stupid Photo.scr worm would successfully log into the EPC via FTP and, thinking 
that it had managed to log into a public web server somewhere, upload a 
bajillion copies of the virus to it in various directories, filling up the disk.

The exact same thing is happening here, believe it or not.  It hadn't ever 
occurred to me to test for this, but it turns out that under certain 
circumstances that I haven't yet managed to nail down, the CPE8000 firmware 
actually starts running an FTP server.  Even worse, this FTP server, once 
enabled, does not ask for any credentials.  You can literally type in any 
username when prompted, and you are in.  I see no config option on the web 
interface for the CPE that allows you to turn this on and off...but whatever is 
triggering it ends up creating a ready and completely unsecured backdoor to the 
CPE.

*headdesk*

If you guys give out routable IPs to your LTE users, or if you have somebody on 
your network that has a PC infected with this particular virus, then this 
could also explain your CPE8000 firmware upgrade problems.
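
For anyone who wants to check their own units, here is a minimal Python 
sketch of the test described above: connect to the CPE on the FTP port and 
try to log in with made-up credentials. The function name, the placeholder 
username/password, and the three result strings are all my own choices for 
illustration, not anything from the CPE firmware, and port 21 is only an 
assumption. Only point it at devices you operate.

```python
import ftplib


def probe_ftp(host, port=21, timeout=5.0, ftp_factory=ftplib.FTP):
    """Classify a host by how its FTP service reacts to made-up credentials.

    Returns one of (labels are arbitrary, chosen for this sketch):
      "open"   - the server accepted an arbitrary username/password
      "auth"   - an FTP server answered but rejected the login
      "closed" - nothing usable answered on that port
    """
    ftp = ftp_factory()  # unconnected client object
    try:
        try:
            ftp.connect(host, port, timeout=timeout)
        except ftplib.all_errors:
            return "closed"
        try:
            # Deliberately bogus credentials: a sane server should refuse them.
            ftp.login("anyuser", "anypass")
        except ftplib.error_perm:
            return "auth"
        except ftplib.all_errors:
            return "closed"
        return "open"
    finally:
        ftp.close()
```

Calling probe_ftp() with a CPE's address and getting "open" back would mean 
the unit is exposed in the way described above. The ftp_factory parameter is 
there only so the function can be exercised without a live device.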

After figuring out the serial port bit and examining the "dead" CPEs more 
in-depth, I found the filesystems littered with files named things like 
Photo.scr, IMG001.scr, Info.zip, etc.  Once the writable partition that holds 
the CPE configuration is completely full, a reset-to-defaults, a configuration 
backup upload, or a firmware upgrade (which has to migrate your configuration 
from the old firmware version during the process) will brick the CPE, because 
there isn't enough disk space left for it to properly finish writing the 
config changes to disk.  The write gets only half-done, and the configuration 
is left in an inconsistent state.
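
If you have pulled a copy of a unit's writable filesystem onto a PC, a rough 
Python sketch like the following can list the worm droppings and total up how 
much space they take. It matches only the three filenames mentioned above 
(case-insensitively); the real worm may use other names, so treat the name 
list and the function name as assumptions of this sketch. It is meant to run 
on a workstation against a copied directory tree, not on the CPE itself.

```python
import os

# Filenames reported on the affected units; the list is likely incomplete.
SUSPECT_NAMES = {"photo.scr", "img001.scr", "info.zip"}


def find_worm_files(root):
    """Walk `root` and return (sorted list of matching paths, total bytes)."""
    hits = []
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower() in SUSPECT_NAMES:
                path = os.path.join(dirpath, name)
                # lstat so a symlink is measured itself, not its target
                total += os.lstat(path).st_size
                hits.append(path)
    return sorted(hits), total
```

Comparing the returned byte total against the size of the partition gives a 
quick idea of whether the worm files alone account for the full disk.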

I've managed to fix my dead units, and also found the mechanism for disabling 
the FTP server.  Still not sure how it is getting toggled on in the first place 
(perhaps there is some other vulnerability that is getting exploited first?), 
but I'll keep looking.

I'll write up some instructions for y'all and post them here soon.

-- Nathan

From: [email protected] [mailto:[email protected]] On Behalf Of 
Nathan Anderson
Sent: Wednesday, February 08, 2017 1:49 PM
To: Telrad List; Adam Moffett
Subject: Re: [Telrad] UE upgrade failure rate

Does anybody happen to know if the 6-pin white connector on the 8000's board is 
either a serial port or a JTAG interface?

-- Nathan

From: [email protected]<mailto:[email protected]> 
[mailto:[email protected]] On Behalf Of Nathan Anderson
Sent: Wednesday, February 08, 2017 1:40 PM
To: Telrad List; Adam Moffett
Subject: Re: [Telrad] UE upgrade failure rate

This thread is interesting because I was just complaining last night to our 
vendor about how fragile the software on the CPE8000s seems to be.

We have not had specific issues with flashing CPEs "over the air" from the web 
interface, but sometimes ACS-initiated updates don't complete correctly.  On 
7000s it usually takes the form of the upgrade not completing and the UE 
falling off of the ACS, but the radio stays up and attached to the network.  We 
go in via the web interface OTA and reboot it and it comes back with the same 
version of firmware it was already running.  Second time is usually the charm, 
and I'm thinking that perhaps if the UE had been freshly-rebooted before 
attempting the update, we might have a higher success rate.  (We have also seen 
7000s just stop talking to the ACS without us touching the firmware, and even 
though they are otherwise working fine.  Again, rebooting the CPE fixes this.  
Although it is rare, we have seen this even on the latest .116)

We once had a 7000 that did drop off the network after pushing the update via 
ACS.  We never checked what state it was in from the ethernet side, but we had 
the customer powercycle it themselves and it came back…again running the same 
firmware.  So the upgrade did not take, but it didn't brick it either and 
resetting config to defaults on the UE was not (and at least for us never has 
been) necessary.

So we have never had to truck-roll to a 7000 as a result of a failed firmware 
upgrade.  The 8000s, however, seem to be another story.  I am so scared to 
touch the ones we have in the field anymore.  We have had a couple that seem to 
get their configs corrupted after a firmware change, and get into very funky 
states.

One of them had these symptoms: defaulted to a 192.168.0.1 IP on the ethernet 
(!), no web server running, no DHCP server running, had telnet access that 
didn't prompt for a password (!!).  Fixed it by resetting to defaults (found a 
shell script that performs this function on the CPE's filesystem).  I got lucky 
with this one.

One that I have sitting on my desk now is one that we tried to roll back the 
firmware on (customer was experiencing random network detaches, and the latest 
8000 firmware doesn't reattach for 15 minutes on-the-dot, so customer was -- I 
think justifiably -- getting a bit pissy).  Current symptoms are: NO IPv4 on 
the ethernet, IPv6 link-local responds, no web server running, no DHCP server 
running, telnet responds (calls itself "KZTECH") but default root/root123 
doesn't work, so I have NO way to get in and reset the damn thing, and the 
8000s don't seem to have a reset button.  Thus it seems that it is possible for 
a scrambled config to completely brick an 8000.

If anybody has reliable information on how to get the 8000 to wipe its config 
during bootup even though it seemingly lacks a reset button, I would be 
eternally grateful...

-- Nathan

From: [email protected]<mailto:[email protected]> 
[mailto:[email protected]] On Behalf Of Matthew Carpenter
Sent: Wednesday, February 08, 2017 6:38 AM
To: Adam Moffett; Telrad List
Subject: Re: [Telrad] UE upgrade failure rate

Hi,

So far we have had only one CPE8000 UE that did not come back after a firmware update.  
Normally a hard reboot would fix it, but in this case we had to replace it.
I have that CPE8000 on my desk and need to see what the status is from the LAN 
side.  Thanks for the info on defaulting it, will try it.

Matt C.



On Wed, Feb 8, 2017 at 8:23 AM, Adam Moffett 
<[email protected]<mailto:[email protected]>> wrote:
We've had a helluva time upgrading UE firmware over the air.  It was worse with 
Wimax.  On Wimax it was more like 75% of the time we would lose the channel 
scan table and have to go on site to add it back in.  It became SOP to leave 
the operator password at default so we had the option of having the customer 
log in and fix the scan table for us.

I think we've had more success since going to LTE.  However, failed firmware 
updates were one of the incentives to set up a dedicated management bearer.  I 
was hoping it would help with these things.  We haven't pushed out an update 
recently enough to say whether it helped.

-Adam



------ Original Message ------
From: "Shayne Lebrun" 
<[email protected]<mailto:[email protected]>>
To: [email protected]<mailto:[email protected]>
Sent: 2/8/2017 9:14:36 AM
Subject: [Telrad] UE upgrade failure rate

Does anybody else experience a ten to fifteen percent failure rate when 
upgrading UEs?  The behavior is, you upgrade the firmware, reboot, and the 
device doesn’t come back.  Logging into the UE’s management from LAN, you’ll 
see it’s stuck in ‘device init.’  Defaulting the unit and rebooting allows it 
to boot and attach.

We’re not using the residential gateway device or anything, and the only config 
we put in is device name, SNMP and ACS settings.  Sometimes we hardcode the 
client’s device in the DHCP server, to turn on DMZ to allow port forwarding, 
but that doesn’t seem to be a causal factor.

_______________________________________________
Telrad mailing list
[email protected]<mailto:[email protected]>
http://lists.wispa.org/mailman/listinfo/telrad



--
Matthew Carpenter
806-316-5071 office
806-236-9558 cell
