Re: [Telrad] Uplink throughput again

Adam Moffett Tue, 14 Mar 2017 20:35:41 -0700

We've had EPC crashes, but I haven't gotten to the bottom of any of them. I rebooted and moved on with the day because I have a full-time job outside of troubleshooting Telrad gear... I made sure there's a serial cable connected now so I can hopefully get some info out of it next time.

> Data corruption for user traffic going through the EPC

I have not noticed this one... or I didn't know what I was seeing if I did. What are the symptoms?


In addition to Nathan's list:

* 10-second cycle in traffic through the EPC. This is reproducible even with a single UE in a lab setup, and it's clearly tied to a process in the Breezeway2020. There's a dip in throughput every 10 seconds, and it gets worse with more UEs. When it gets extreme you can see air link utilization flatten out at 65-75% as your eNB cycles between full throughput and zero throughput.

* UE getting stuck at MCS4... apparently until an S1 reset. This may or may not be the same throughput issue that you guys were talking about earlier in the thread.


These two are both known by Telrad and being worked on.
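
For anyone who wants to check their own eNB for the 10-second cycle, here is a minimal Python sketch. It assumes you already have throughput samples taken at a fixed interval, from whatever tool you prefer. The function names, the threshold, and the synthetic trace are all hypothetical and do not come from any Telrad tooling. It measures the spacing between the starts of dips and the fraction of time spent below the threshold.

```python
from statistics import median


def dip_starts(samples, threshold):
    """Indices where throughput falls from >= threshold to below it."""
    return [
        i
        for i in range(1, len(samples))
        if samples[i - 1] >= threshold and samples[i] < threshold
    ]


def dip_period_s(samples, sample_interval_s, threshold):
    """Median time between dip starts in seconds, or None if fewer than two dips."""
    starts = dip_starts(samples, threshold)
    if len(starts) < 2:
        return None
    gaps = [later - earlier for earlier, later in zip(starts, starts[1:])]
    return median(gaps) * sample_interval_s


def stalled_fraction(samples, threshold):
    """Fraction of samples that sit below the threshold."""
    return sum(1 for s in samples if s < threshold) / len(samples)


# Synthetic trace (made-up numbers): one sample per second,
# 7 s at 30 Mbps followed by 3 s at zero, repeated six times.
trace = ([30.0] * 7 + [0.0] * 3) * 6

period = dip_period_s(trace, sample_interval_s=1.0, threshold=15.0)
stalled = stalled_fraction(trace, threshold=15.0)
print(f"dip period: {period} s, stalled {stalled:.0%} of the time")
```

On this synthetic trace the link is stalled 30% of the time, so an otherwise saturated link would average about 70% of its peak. That is at least consistent with the 65-75% plateau described above, but that reading is an assumption, not a confirmed explanation.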

* Saturating the upload on an eNB can cause CQI timeouts, which make the UE disconnect from the eNB. The customer is down for 1-2 minutes. This is the main reason not to use configuration 2. This one (I was told) is not a bug, but simply part of how LTE functions. I'm skeptical of that explanation, but nevertheless if you're having better performance & reliability on configuration 1 this might be why.

The point being that we're not having problems because we're dumb, we're having problems because the product has a few problems. Justin, if you're not seeing these issues, maybe you're using different equipment (embedded core as someone said?), or maybe you'd see them if you went looking for them.

After migrating about 1/3 of our WiMAX to LTE we now have about 400 LTE UEs in service. Now we're in a holding pattern. I know we'll move forward with the rest of the migration, but I think I'm waiting for 6.6M first.




------ Original Message ------
From: "Nathan Anderson" <[email protected]>
To: "[email protected]" <[email protected]>
Sent: 3/14/2017 5:43:50 PM
Subject: Re: [Telrad] Uplink throughput again

There really isn't much for me to add to Jeremy's excellent response.
As far as what could be different with regard to the specific issue we were talking about here? So many things. You could be running a different software release...only people who have been wrestling with (Telrad's acknowledged) issue with uplink performance likely are using bleeding-edge eNB code to deal with it. You could be running a different configuration that is not affected, or not affected as noticeably. We use 15MHz channels. I'd guess not very many others do, and until recently, different channel widths required some special tweaks elsewhere in the config in order to get acceptable performance, which tells you something. We also currently use subframe profile 1, but before the latest CPE7000 firmware version, we *couldn't*, because of bugs in the CPE firmware that just absolutely killed performance with that configuration under some circumstances.
Just because your particular configuration does not trigger some of these bugs does not mean that they don't exist, and I don't think it is fair or accurate for you to compare the issues being discussed here to the "OMG whye cain't I do 100 megawattbits on me neg85 Ubiquitay!!!!!111!!one!!" crowd.
In fact, over the last 3 months or so, the majority of our problems have been traced back to the EPC and have absolutely zero to do with RF. Here are some of the issues we have dealt with there, some of which are finally fixed, and some of which are not...granted, some of these (especially towards the bottom) are on account of us running prerelease code, but at this point, because the fixes for the crashers we found on 6.6 haven't been backported to 6.6, we are stuck between a rock and a hard place and forced to play beta tester on our production network (!!!!):
* The RADIUS/AAA (external HSS) client in the EPC crashing (causing the whole EPC to reboot itself)
* The switch FPGA dying/resetting for several seconds at a time (causing the RADIUS client to crash, causing reboot...beautiful domino effect!)
* The switch FPGA dying/resetting for several MINUTES at a time (first release we got as a fix for this made the problem worse instead of better)
* Dedicated bearers getting carte blanche on the uplink (100% reproducible within 5 minutes of trying to use the feature...how did this manage to ship?)
* SGW process on the EPC crashing in a similar manner to RADIUS client
* Broken eNB tracking (gets confused about which eNB has which IP address; may be related to SGW issue: I suspect a linked list is getting clobbered in memory)
* Data corruption for user traffic going through the EPC (it seemingly throws out and recomputes the TCP payload checksum, making it silent but deadly!)
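
Since a recomputed TCP checksum would hide this kind of damage from the transport layer, one way to look for it is an end-to-end comparison above TCP. The following is a minimal Python sketch, not a description of how the corruption was actually found. It assumes you can send a known payload through the EPC and capture what arrives on the other side; the function names and the chunk size are hypothetical.

```python
import hashlib


def chunk_digests(data, chunk_size=1024):
    """SHA-256 hex digest of each fixed-size chunk of the payload."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def corrupted_offsets(sent_digests, received, chunk_size=1024):
    """Byte offsets of chunks whose digest differs from what was sent."""
    got = chunk_digests(received, chunk_size)
    if len(got) != len(sent_digests):
        raise ValueError("received payload has a different number of chunks")
    return [
        index * chunk_size
        for index, (sent, seen) in enumerate(zip(sent_digests, got))
        if sent != seen
    ]


# Made-up demonstration: flip one byte to stand in for in-flight corruption.
payload = bytes(range(256)) * 16           # 4096 bytes of known content
digests = chunk_digests(payload)           # computed on the sending side

damaged = bytearray(payload)
damaged[2050] ^= 0xFF                      # simulate a single corrupted byte

print(corrupted_offsets(digests, bytes(damaged)))
```

The digests are computed over the application payload, so they do not depend on whatever the path does to the TCP checksum along the way.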
Also, if I may (since I'm on the subject of frustrations, and I don't know where else to put this): why does every little config change (e.g., TFTP server? Really??) on either the Compact or BreezeWay require a full system reboot to be applied? I mean, what year is this?
-- Nathan
From: [email protected] [mailto:[email protected]] On Behalf Of Jeremy Austin
Sent: Thursday, March 09, 2017 11:56 PM
To: [email protected]
Subject: Re: [Telrad] Uplink throughput again



Justin,
"What's Different" is an excellent question; I can't speak for others on this list.
What's different (in my case) is that we start from the assumption that our experience should be as positive as yours. Success stories are great. How a company handles success stories... they print a presser.
What's different is that we start from the assumption that if there were a problem with performance, it was *our* oversight or failure, not Telrad's. We jump through prescribed troubleshooting hoops, repeatedly. We invent new tests when necessary.
What's different? Telrad's initial assumptions about our network and RF conditions prove consistently false. We schedule many complete system outages, and (working hand in hand with Telrad) discover Telrad hardware failure modes that (guess what) nobody has ever seen before.
What's different is that we have had a non-zero percentage of failed Telrad equipment, and Telrad has RMAed, and we have not complained. That can happen to anyone.
What's different is that Telrad has eventually and repeatedly confirmed that we have done everything right: that our network is intelligently designed and built. ("His parents were poor but honest.")
That our customers could not be better installed with attention to signal and conditions.
That poor performance is not, I repeat not, due to our incompetence, our environment, our customers, or our network -- in fact, not due to one single thing *under our control*.
What's different is that we have repeatedly exhausted the abilities of the North American support crew -- terrific as they are -- not because we were asking dumb questions, but because the answers have not yet been found.
What's different is that Ubiquiti won't send you an engineer to check your stuff out. Telrad's commitment to support is real, if spotty. We had a truly enjoyable time spending three days with Guy Shahak from Israel. We learned a great deal. He came ready to help, and earnestly believed that he would fix all problems. And guess what -- we stumped him too.
What's different is that if there were any thing or things that I -- or the entire engineering resources of Telrad that have been brought to bear on this -- could do to find, fix, and permanently stop the whack-a-mole process -- *it would be done by now*.
What's different is that I do not assume that *my* experience invalidates yours. I would hope you could extend the same courtesy to me. Telrad knows, at least by now, that "Works for me!" or "Works in the lab!" does not guarantee 100% success. We too added upwards of 40 UEs a month, until we were forced to stop.
How a company handles failure: that is telling.
I apologize for the length of this, and I appreciate your feedback. The list needs more success stories. I hope to be one of them soon.
On Thu, Mar 9, 2017 at 6:36 PM, Skywerx Support <[email protected]> wrote:
The Telrad Wispa forum is starting to remind me of the Ubiquiti forum. The same exact people posting with the same exact issues. But only like three people!!! Issues like why can't I put 80 people on a Rocket M5, or how do I configure a Rocket M5? Why does service suck on my access points? Oh wait, they all have -81 signals.
We have deployed over 700 UEs on 16 eNBs. Adding around 40 new UEs per month and at least 1 new eNB per month. It's our fastest growing product with the least amount of tech support calls, offering speeds anywhere from 3 Mbps to 30 Mbps on the LTE network.
What's Different?

--
Justin Davis
COO
SkyWerx Industries, LLC


> On Mar 9, 2017, at 5:01 PM, Steve Cole <[email protected]> wrote:
>
>> On 3/9/2017 4:53 PM, Jeremy Austin wrote:
>> I have been fairly quiet on list about our outstanding issues,
>> thinking that they would be better solved by superior troubleshooting
>> and Telrad engineering than by social engineering.
>
> You are certainly not alone.
> _______________________________________________
> Telrad mailing list
> [email protected]
> http://lists.wispa.org/mailman/listinfo/telrad

--

Jeremy Austin



(907) 895-2311

(907) 803-5422

[email protected]



Heritage NetWorks

Whitestone Power & Communications

Vertical Broadband, LLC
Schedule a meeting: http://doodle.com/jermudgeon
