We've had EPC crashes, but I haven't gotten to the bottom of any of
them. I rebooted and moved on with the day because I have a full-time
job outside of troubleshooting Telrad gear. I made sure there's a
serial cable connected now so I can hopefully get some info out of it
next time.
> Data corruption for user traffic going through the EPC
I have not noticed this one...or I didn't know what I was seeing if I
did. What are the symptoms?
In addition to Nathan's list:
* A 10-second cycle in traffic through the EPC. This is reproducible even
with a single UE in a lab setup, and it's clearly tied to a process in
the Breezeway2020. There's a dip in throughput every 10 seconds, and it
gets worse with more UEs. When it gets extreme you can see air link
utilization flatten out at 65-75% as your eNB cycles between full
throughput and zero throughput.
* UEs getting stuck at MCS4, apparently until an S1 reset. This may or
may not be the same throughput issue that you guys were talking about
earlier in the thread.
These two are both known by Telrad and being worked on.
* Saturating the upload on an eNB can cause CQI timeouts, which make UEs
disconnect from the eNB. The customer is down for 1-2 minutes. This is
the main reason not to use configuration 2.
This one (I was told) is not a bug, but simply part of how LTE
functions. I'm skeptical of that explanation, but nevertheless if
you're having better performance & reliability on configuration 1 this
might be why.
The point being that we're not having problems because we're dumb, we're
having problems because the product has a few problems. Justin, if
you're not seeing these issues, maybe you're using different equipment
(embedded core as someone said?), or maybe you'd see them if you go
looking for them.
After migrating about 1/3 of our WiMAX to LTE we now have about 400 LTE
UEs in service. Now we're in a holding pattern. I know we'll move
forward with the rest of the migration, but I think I'm waiting for 6.6M
first.
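
For anyone who wants to check their own network for the 10-second cycle described above, here is a minimal sketch of one way to look for it. It assumes you can already collect one throughput reading per second for a single UE (how you collect it is up to you; nothing here talks to Telrad gear). The function name and the synthetic trace are made up for illustration. It uses a plain autocorrelation to find the strongest repeating period in the trace.

```python
from statistics import mean

def dominant_period(samples, min_lag=2, max_lag=None):
    """Return (lag, score) for the lag with the highest autocorrelation.

    samples: throughput readings taken at a fixed interval (assumed 1 s).
    score is normalized so that 1.0 would be a perfect repeat.
    """
    n = len(samples)
    if max_lag is None:
        max_lag = n // 2
    mu = mean(samples)
    centered = [s - mu for s in samples]
    variance = sum(c * c for c in centered)
    if variance == 0:
        return None, 0.0  # perfectly flat trace: no cycle to find
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(centered[i] * centered[i + lag]
                    for i in range(n - lag)) / variance
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score

# Synthetic 60-second trace (invented numbers): 50 Mbps with a dip to
# 5 Mbps once every 10 seconds.
trace = [5.0 if t % 10 == 0 else 50.0 for t in range(60)]
lag, score = dominant_period(trace)
print(f"strongest period: {lag} samples (score {score:.2f})")
```

A strong score at a lag of 10 samples on a real trace would be consistent with the dip described above, though it says nothing about where in the path the dip comes from.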
------ Original Message ------
From: "Nathan Anderson" <[email protected]>
To: "[email protected]" <[email protected]>
Sent: 3/14/2017 5:43:50 PM
Subject: Re: [Telrad] Uplink throughput again
There really isn't much for me to add to Jeremy's excellent response.
As far as what could be different with regard to the specific issue we
were talking about here? So many things. You could be running a
different software release...only people who have been wrestling with
(Telrad's acknowledged) issue with uplink performance likely are using
bleeding-edge eNB code to deal with it. You could be running a
different configuration that is not affected, or not affected as
noticeably. We use 15MHz channels. I'd guess not very many others do,
and until recently, different channel widths required some special
tweaks elsewhere in the config in order to get acceptable performance,
which tells you something. We also currently use subframe profile 1,
but before the latest CPE7000 firmware version, we *couldn't*, because
of bugs in the CPE firmware that just absolutely killed performance
with that configuration under some circumstances.
Just because your particular configuration does not trigger some of
these bugs does not mean that they don't exist, and I don't think it is
fair or accurate for you to compare the issues being discussed here to
the "OMG whye cain't I do 100 megawattbits on me neg85
Ubiquitay!!!!!111!!one!!" crowd.
In fact, over the last 3 months or so, the majority of our problems
have been traced back to the EPC and have absolutely zero to do with
RF. Here are some of the issues we have dealt with there, some of
which are finally fixed, and some of which are not...granted, some of
these (especially towards the bottom) are on account of us running
prerelease code, but at this point, because the fixes for the crashers
we found on 6.6 haven't been backported to 6.6, we are stuck between a
rock and a hard place and forced to play beta tester on our production
network (!!!!):
* The RADIUS/AAA (external HSS) client in the EPC crashing (causing the
whole EPC to reboot itself)
* The switch FPGA dying/resetting for several seconds at a time
(causing the RADIUS client to crash, causing reboot...beautiful domino
effect!)
* The switch FPGA dying/resetting for several MINUTES at a time (first
release we got as a fix for this made the problem worse instead of
better)
* Dedicated bearers getting carte blanche on the uplink (100%
reproducible within 5 minutes of trying to use the feature...how did
this manage to ship?)
* SGW process on the EPC crashing in a similar manner to RADIUS client
* Broken eNB tracking (gets confused about which eNB has which IP
address; may be related to SGW issue: I suspect a linked list is
getting clobbered in memory)
* Data corruption for user traffic going through the EPC (it seemingly
throws out and recomputes the TCP payload checksum, making it silent
but deadly!)
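
To illustrate why that last item is so hard to spot, here is a small sketch. It assumes a hypothetical device in the path that flips a bit and then recomputes the checksum, and for simplicity it runs the RFC 1071 ones'-complement checksum over the payload only (a real TCP checksum also covers the TCP header and a pseudo-header). None of this reflects the EPC's actual internals; it only shows that a recomputed checksum hides the damage, while an end-to-end digest computed by the sender still catches it.

```python
import hashlib

def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum as described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF

# Sender side: checksum plus an application-level digest.
original = b"customer payload crossing the EPC"
sent_checksum = internet_checksum(original)
sent_digest = hashlib.sha256(original).hexdigest()

# Hypothetical device in the path: flips one bit, then recomputes
# the checksum over the damaged data.
corrupted = bytes([original[0] ^ 0x01]) + original[1:]
rewritten_checksum = internet_checksum(corrupted)

# Receiver side: the rewritten checksum matches the damaged data,
# so the transport layer sees nothing wrong.
checksum_looks_valid = internet_checksum(corrupted) == rewritten_checksum
# Had the original checksum been left alone, it would have flagged it.
stale_checksum_would_catch_it = internet_checksum(corrupted) != sent_checksum
# An end-to-end digest is not touched by the path and still catches it.
digest_catches_it = hashlib.sha256(corrupted).hexdigest() != sent_digest

print(checksum_looks_valid, stale_checksum_would_catch_it, digest_catches_it)
```

In practice this suggests comparing file hashes (or using a protocol with its own integrity check, such as TLS or SSH) across the EPC when hunting for this kind of corruption.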
Also, if I may (since I'm on the subject of frustrations, and I don't
know where else to put this): why does every little config change
(e.g., TFTP server? Really??) on either the Compact or BreezeWay
require a full system reboot to be applied? I mean, what year is this?
-- Nathan
From: [email protected] [mailto:[email protected]] On
Behalf Of Jeremy Austin
Sent: Thursday, March 09, 2017 11:56 PM
To: [email protected]
Subject: Re: [Telrad] Uplink throughput again
Justin,
"What's Different" is an excellent question; I can't speak for others
on this list.
What's different (in my case) is that we start from the assumption that
our experience should be as positive as yours. Success stories are
great. How a company handles success stories... they print a presser.
What's different is that we start from the assumption that if there
were a problem with performance, it was *our* oversight or failure, not
Telrad's. We jump through prescribed troubleshooting hoops, repeatedly.
We invent new tests when necessary.
What's different? Telrad's initial assumptions about our network and RF
conditions prove consistently false. We schedule many complete system
outages, and (working hand in hand with Telrad) discover Telrad
hardware failure modes that (guess what) nobody has ever seen before.
What's different is that we have had a non-zero percentage of failed
Telrad equipment, and Telrad has RMAed, and we have not complained.
That can happen to anyone.
What's different is that Telrad has eventually and repeatedly confirmed
that we have done everything right: that our network is intelligently
designed and built. ("His parents were poor but honest.")
That our customers could not be better installed with attention to
signal and conditions.
That poor performance is not, I repeat not, due to our incompetence,
our environment, our customers, or our network -- in fact, not due to
one single thing *under our control*.
What's different is that we have repeatedly exhausted the abilities of
the North American support crew -- terrific as they are -- not because
we were asking dumb questions, but because the answers have not yet
been found.
What's different is that Ubiquiti won't send you an engineer to check
your stuff out. Telrad's commitment to support is real, if spotty. We
had a truly enjoyable time spending three days with Guy Shahak from
Israel. We learned a great deal. He came ready to help, and earnestly
believed that he would fix all problems. And guess what -- we stumped
him too.
What's different is that if there were any thing or things that I -- or
the entire engineering resources of Telrad that have been brought to
bear on this -- could do to find, fix, and permanently stop the
whack-a-mole process -- *it would be done by now*.
What's different is that I do not assume that *my* experience
invalidates yours. I would hope you could extend the same courtesy to
me. Telrad knows, at least by now, that "Works for me!" or "Works in
the lab!" does not guarantee 100% success. We too added upwards of 40
UEs a month, until we were forced to stop.
How a company handles failure: that is telling.
I apologize for the length of this, and I appreciate your feedback. The
list needs more success stories. I hope to be one of them soon.
On Thu, Mar 9, 2017 at 6:36 PM, Skywerx Support
<[email protected]> wrote:
The Telrad Wispa forum is starting to remind me of the Ubiquiti forum.
The same exact people posting with the same exact issues. But only
like three people!!! Issues like why can't I put 80 people on a Rocket
M5, or how do I configure a Rocket M5? Why does service suck on my
access points? Oh wait, they all have -81 signals.
We have deployed over 700 UEs on 16 eNBs. Adding around 40 new UEs
per month and at least 1 new eNB per month. It's our fastest growing
product with the least amount of tech support calls offering speeds
anywhere from 3 Mbps to 30 Mbps on the LTE network.
What's Different?
--
Justin Davis
COO
SkyWerx Industries, LLC
> On Mar 9, 2017, at 5:01 PM, Steve Cole <[email protected]> wrote:
>
>> On 3/9/2017 4:53 PM, Jeremy Austin wrote:
>> I have been fairly quiet on list about our outstanding issues,
>> thinking that they would be better solved by superior troubleshooting
>> and Telrad engineering than by social engineering.
>
> You are certainly not alone.
> _______________________________________________
> Telrad mailing list
> [email protected]
> http://lists.wispa.org/mailman/listinfo/telrad
--
Jeremy Austin
(907) 895-2311
(907) 803-5422
[email protected]
Heritage NetWorks
Whitestone Power & Communications
Vertical Broadband, LLC
Schedule a meeting: http://doodle.com/jermudgeon