but my impression is that the Compact's WiMAX firmware was in a similar
state for a good while when it was first introduced, but is now
regarded as rock-solid
We ran WiMAX on the Compact for almost 3 years. It's not rock-solid in
the sense of being bug free, but the bugs are known and mostly
avoidable. There was a recurring reset issue that was only resolved a
few months ago.
------ Original Message ------
From: "Nathan Anderson" <[email protected]>
To: "[email protected]" <[email protected]>
Sent: 3/9/2017 5:46:25 PM
Subject: Re: [Telrad] Uplink throughput again
Your post comes at an interesting time.
For the last few weeks, we have been fighting with Telrad engineering
over multiple issues. The vast majority of them have been on the EPC
side (many of the bugs have been crashers or otherwise extremely
service-affecting!), some of which I have detailed in past posts here.
Fortunately, within the last couple of weeks we have actually made
significant progress with these and the latest EPC code is the most
stable we have seen in some time (though there are still one or two
outstanding issues; fortunately they do not seem to rear their heads
very often so by-and-large things are running smoothly).
I am still unconvinced that the dips I screenshotted a while back as
recorded by my realtime SNMP grapher are "real" and not just a function
of the utility I am using. As such I haven't spent much time chasing
that particular issue down. However, on the eNB side we have
definitely run into the exact same issue as your #2 in your list. So
far, we have only noticed it on the eNBs that are running the
pre-release code, and we have only upgraded our 3 most heavily-loaded
eNBs to this release while the rest remain on 6.6 GA (4013). (If the
other eNBs were more loaded, we might feel pressure to upgrade them
past GA in order to resolve the upload performance issues that exist
with the GA release but which seem to be largely dealt with in the
pre-release code. As things stand, though, those sectors seem to be
performing fine, so until they either start seeing more traffic/UEs or
I know that this particular problem has been licked in an
as-yet-unreleased eNB firmware, I will hold off on further upgrades.)
We also seem to be fortunate in that it sounds like it is happening
much less often to us than it is to others (you). We ran pre-release
eNB code for a good 2 weeks before we first encountered the issue (or,
at least before we actually noticed it). We discovered that resetting the RF
interface cured it (S1 reset, quickly toggling the spectrum analyzer on/off,
etc.; a reboot takes way too long). It didn't happen again for a couple
of days but then happened 2 days in a row on the same eNB. During one
of those times, we managed to get one of the Israeli support guys to
remote in and investigate, collect logs, etc., but then after we
thought he was done and we reset the S1 to that eNB, he said there were
a few more things he forgot to collect and to let them know when it
happens again.
And of course, it hasn't! (Actually not true...it happened on a second
sector, but I knew he wasn't going to be around at that very moment and
I wanted to test and see if kicking all UEs off worked just as well as
a full S1 reset, and sure enough it did. It simply hasn't happened
again since then, so there hasn't been another opportunity for them to
remote in and collect more data.) So in our case, it can sometimes be
several days in between incidents. Also, sometimes we will notice (by
reviewing BreezeView KPIs) that it has actually occurred but then
self-corrected with no action on our part, sometimes in as little as 5-10
minutes (1-2 samples).
The guy I was interacting with wouldn't commit to a yes or no when I
point-blank asked him if this was "yet another bug" we were chasing (I
wanted to say "yet another @#$@#$%@ bug" but I bit my tongue ;-)). But
given that it is now clear (thanks to your post) that others are seeing
the exact same thing, I'm sure they know that they have another problem
on their hands (as they CAN'T just have heard about this from the two
of us), as elusive as it might be (impossible to reproduce at-will,
etc.).
I, too, share your view of things: frustrated with the state of the
product and support infrastructure as a whole, but unwilling to pin
blame on any individual I have been in contact with. At some level, I
empathize with them because I have been in similar positions, chasing
problems that are elusive and hard to reproduce while at the same time
having (legitimately) upset customers beating you up over them. It
*really* sucks, and I am *sure* that they acutely feel the pressure to
get these things fixed. At the same time, the sheer quantity of issues
we have experienced over the time we have owned this gear is somewhat
staggering, and often it seems like we trade one issue in for another,
which makes applying upgrades a scary prospect ("what new regression
will we end up fighting with this version?"). Over the last few
months, we have scheduled maintenance windows more times than I can
count at the drop of a hat, often several times in a given week, and
have allowed bleeding-edge/hot off the presses code to touch our
*production* infrastructure, in essence allowing Telrad to use our
network as a guinea pig so that we can aid their engineers as they work
to reproduce these issues (since many of them have not been
reproducible in a lab). I personally have lost countless hours of
sleep and built up a tremendous sleep debt over maintaining this
system, and have fallen behind on other duties (as well as life in
general) as a result. I am trying not to sound snippy here, but it's
getting to the point where I'm seriously considering asking the
question of what sort of compensation we should expect to get in return
for all of this.
At the same time, perhaps partly because I see real progress, and
partly because I think I'm largely an optimist by nature, I still hold
out hope that things are eventually going to work how they ought. I
can't remember where I heard or read this, but my impression is that
the Compact's WiMAX firmware was in a similar state for a good while
when it was first introduced, but is now regarded as rock-solid. So
they are probably just in a similar stage of the development and
evolution of the LTE product. That doesn't make it any less
frustrating, though, that we seem to have been caught in this
particular stage for as long as we have.
Sadly, I won't be at WISPAmerica. If you manage to get some productive
face time with Telrad there, I'd love to hear about it afterward.
-- Nathan
From:[email protected] [mailto:[email protected]] On
Behalf Of Jeremy Austin
Sent: Thursday, March 09, 2017 1:53 PM
To:[email protected]
Subject: Re: [Telrad] Uplink throughput again
On the other hand, the dips are back for us again. This is getting to
be very wearing.
To recap:
1) We are running the prerelease code
2) We have been having to reset S1/reboot ENBs periodically (multiple
times a day on one particular sector) due to a state of stuck high RF
usage
3) The dips are back (60 second cycle, significant drop in throughput
for about 7 seconds)
4) Otherwise, throughput is performing *better than ever*
We are now going on 8 full months with failure to resolve these. (To be
fair, other manufacturers can take a while to fix things as well.)
Customers are complaining.
Telrad has confirmed (multiple times) that there is nothing wrong with
our network/setup/UEs. They have confirmed that we have done every
single thing we can do to verify that performance issues are *not* our
problem, but Telrad's.
However, replacing Telrad is not an option at present.
I doubt this falls under any lemon laws, but I can only describe our
experience of failures as symptomatic of core issues with the Telrad
business. Individually, I take no issue with either the American or
Israeli support team.
Collectively, however, we have a significant problem. There is no 24/7
NOC/TAC… and even if there were, the fact that we probably couldn't
rouse an engineer to observe/collect data when the trouble is occurring
is a serious defect.
I'm looking forward to seeing the Telrad team at WISPAmerica, but with
extremely mixed feelings about the support experience. I have attempted
to burn no bridges, but at the same time be very clear about what I
perceive as a systemic failure to deliver as promised.
I have been fairly quiet on list about our outstanding issues, thinking
that they would be better solved by superior troubleshooting and Telrad
engineering than by social engineering.
Perhaps it is time for that to change. Perhaps I am doing a disservice
to other Telrad customers by keeping quiet.
Thoughts?
On Thu, Feb 16, 2017 at 2:40 AM, Nathan Anderson <[email protected]>
wrote:
Ugh, this is what I get for jumping to conclusions and running my mouth
off before doing just the slightest bit of investigation.
I think it might somehow just be the tool I'm using to do the graphing.
If I watch one of the active bandwidth tests closely while also
watching the graph of the eNB that UE is attached to, I don't (always)
see the same dips.
Sooo, false alarm. Possibly. I'll keep watching things and report
back.
If it's just a graphing error/anomaly, not sure what the problem would
be here. Both the tool and the switch that the eNBs are plugged into
supposedly support SNMP v2c, so we shouldn't be overrunning a 32-bit
integer.
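
A rough sketch of the counter-overrun concern, for reference. Advertising SNMP v2c only makes 64-bit counters available; it is an assumption here that a grapher could still be polling the 32-bit octet counters. The helper names are hypothetical, and the 100 Mbit/s figure is simply the test rate mentioned elsewhere in this thread.

```python
# Sketch only: how quickly a 32-bit octet counter wraps at a steady rate,
# and how a poller can compute a rate across a single wrap.
# Function names are hypothetical, not taken from any particular tool.

COUNTER32_SIZE = 2**32  # number of distinct values in a 32-bit counter


def seconds_to_wrap(rate_bits_per_s, counter_size=COUNTER32_SIZE):
    """Seconds for an octet counter to wrap at a constant bit rate."""
    octets_per_s = rate_bits_per_s / 8
    return counter_size / octets_per_s


def rate_from_samples(prev_octets, curr_octets, interval_s,
                      counter_size=COUNTER32_SIZE):
    """Bits per second between two samples, tolerating at most one wrap."""
    delta = (curr_octets - prev_octets) % counter_size
    return delta * 8 / interval_s


wrap_s = seconds_to_wrap(100e6)  # a bit under six minutes at 100 Mbit/s
wrapped_rate = rate_from_samples(2**32 - 1000, 500, 10)
```

A poller that does not apply the modulo step would report a bogus sample each time the counter wraps, which is one way a graph can show artifacts that a live bandwidth test does not.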
-- Nathan
From:[email protected] [mailto:[email protected]] On
Behalf Of Adam Moffett
Sent: Thursday, February 16, 2017 2:18 AM
To:[email protected]
Subject: Re: [Telrad] Uplink throughput again
Interesting.
------ Original Message ------
From: "Nathan Anderson" <[email protected]>
To: "[email protected]" <[email protected]>
Sent: 2/16/2017 4:24:00 AM
Subject: Re: [Telrad] Uplink throughput again
Jeremy mentioned his periodic traffic dips to me recently off-list. I
haven't seen anything exactly like what either of you two are talking
about, but...attached is an interesting screenshot I just took of
downlink usage on 3 separate eNBs on our network, each of which I am
currently saturating (off-hours) with MT download bandwidth test
(occurring behind 1 UE on each sector, and each UE has been
temporarily granted 100Mbit downlink AMBR).
Notice the little icicle-like formations? Also notice how they seem
to be fairly regular, and also seem to occur at the exact same
interval on every sector, but don't perfectly line up with each other?
WTF is *that* about?
-- Nathan
From:[email protected] [mailto:[email protected]] On
Behalf Of Jeremy Austin
Sent: Wednesday, February 15, 2017 8:44 PM
To: Adam Moffett; [email protected]
Subject: Re: [Telrad] Uplink throughput again
Adam, I'm going to assume that no other traffic on the same equipment
(sans EPC and ENB) shows this periodicity?
I have seen something in the same ballpark, but not identical, since
August. I have been planning to post it to the list to get more eyes
on it (after letting Telrad have some time to look at it first).
Just wanted to check that you had isolated the behavior entirely to
LTE, and not routers/backhauls/switches.
On Wed, Feb 15, 2017 at 7:15 PM Adam Moffett
<[email protected]> wrote:
Weird. Maybe overflow from the dedicated bearer falls into the
default bearer? I also have to wonder if it's a bug in the UE. It
seems like it must fall on the UE to ultimately enforce the rate
limit.
In our uplink throughput issue, I might have tripped over something
of interest. I originally reported to Telrad that I was getting
about half of what I expect for UL throughput. Now I think we
actually do get the expected throughput, but only for a moment. Five
seconds later there's next to nothing, then 5 seconds later back to
full speed, and so on. I see it when looking at the realtime traffic
display on our switch port, but on your typical chart with a 5 minute
average it just looks like you're getting half speed.
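
A minimal sketch of why a 5-minute average hides this pattern. It assumes a simplified square wave (5 seconds at full rate, 5 seconds at zero) and a made-up 10 Mbit/s peak; neither value is a measurement from this thread.

```python
# Sketch only: a 5 s on / 5 s off pattern sampled once per second,
# then averaged into 5-minute buckets the way a typical chart would.
# The 10 Mbit/s peak is an illustrative assumption.


def square_wave(peak_mbps, on_s, off_s, total_s):
    """One sample per second: on_s seconds at peak, then off_s at zero."""
    period = on_s + off_s
    return [peak_mbps if (t % period) < on_s else 0.0
            for t in range(total_s)]


def bucket_averages(samples, bucket_s):
    """Average consecutive, non-overlapping buckets of bucket_s samples."""
    return [sum(samples[i:i + bucket_s]) / bucket_s
            for i in range(0, len(samples), bucket_s)]


samples = square_wave(peak_mbps=10.0, on_s=5, off_s=5, total_s=600)
five_minute = bucket_averages(samples, 300)
```

Every per-second sample is either the full rate or zero, yet each 5-minute bucket averages to exactly half the peak, which matches the "half speed" reading on a coarse chart.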
Weird thing is that it's not happening all the time. I started iPerf
on 6 UE at one site at 4am the other day and when looking at traffic
at the switch port I saw a perfect sine wave with 10 seconds peak to
peak. Later that day I repeated the test to show one of my
co-workers and the damn thing wouldn't do it.
I don't know what to make of it yet.
------ Original Message ------
From: "Nathan Anderson" <[email protected]>
To: "[email protected]" <[email protected]>; "'Adam Moffett'"
<[email protected]>
Sent: 2/10/2017 3:59:40 PM
Subject: RE: [Telrad] Uplink throughput again
So last night, I re-ran this test again, and captured the whole
thing not just at the edge of the LTE network coming out of the EPC,
but between the EPC and eNB, so that I could grab the user traffic
together with the encapsulating GTP headers.
What I found was that when traffic comes from behind the UE with the
proper DSCP value set, it DOES get transmitted by the UE on the
dedicated bearer, but the MBR is still not being enforced. I had a
10Mbit/s UL AMBR configured and a 256Kbit/s UL MBR set on the
dedicated bearer, and when I ran an upload test on the dedicated
bearer, it hit 10 megs. (Download test on the dedicated bearer was
limited to the configured 256Kbit/s DL MBR.)
What makes this so bizarre is that even if there is a bug that
causes the system (which part?) to not enforce the configured rate
limit for the dedicated bearer on the uplink, the UE AMBR should not
be taken into account for GBR bearers, as discussed before. But it
sure seems like what is happening is that whatever is supposed to be
policing the uplink is mistakenly enforcing the UE UL AMBR on the
dedicated bearer instead of the UL MBR.
Ticket opened with Telrad.
-- Nathan
From:[email protected] [mailto:[email protected]] On
Behalf Of Nathan Anderson
Sent: Monday, February 06, 2017 3:56 PM
To: 'Adam Moffett'; [email protected]
Subject: Re: [Telrad] Uplink throughput again
Then maybe the problem is not that the properly-marked upload
traffic isn't getting transmitted on the right bearer, but rather
that the UL GBR/MBR are not being enforced?
Whose responsibility is enforcement of bitrates on uplink? The
UE's? The eNB? The EPC? A little of columns A, B, and C?
-- Nathan
From:[email protected] [mailto:[email protected]] On
Behalf Of Adam Moffett
Sent: Monday, February 06, 2017 2:50 PM
To: [email protected]
Subject: Re: [Telrad] Uplink throughput again
Somewhere there must be traffic counters for each QCI, or for
individual bearers, or something. Without seeing them it's hard to
say for sure.
On a busy eNB (50+ UE), I tried changing the mgmt DSCP value on an
individual UE from 6 to 5 and testing before and after.
With the UE set to DSCP 5 for mgmt, I get 0.1 mbps upload and 7%
packet loss (500 byte pings, 0.1 second interval)
On DSCP 6 I get 0.5mbps and 0% packet loss.
That's not scientific rigor, but it seems like it's working.
On a lighter loaded eNB I was actually getting slightly more UL
throughput with the UE Mgmt DSCP set to 5. I don't know why.
-Adam
------ Original Message ------
From: "Nathan Anderson" <[email protected]>
To: "[email protected]" <[email protected]>; "'Adam Moffett'"
<[email protected]>
Sent: 2/6/2017 5:11:49 PM
Subject: RE: [Telrad] Uplink throughput again
...also, I still remain unconvinced that the UEs are transmitting
any upload traffic -- even when properly marked with the right DSCP
-- on the dedicated bearer. Until it is proven beyond a doubt that
this works, testing upload capacity using dedicated bearers is
probably a waste of time because it isn't doing what you think it
is doing.
I have tested both CPE7000 and CPE8000 at this point, and have the
same issue on both, so I don't think it is a CPE firmware bug (that
would be a freaky coincidence, given that both CPEs are
contract-manufactured by different companies). So I don't know if
this is me being stupid and not configuring my EPCs correctly, or
what. But something is not working here.
-- Nathan
From:[email protected] [mailto:[email protected]] On
Behalf Of Nathan Anderson
Sent: Monday, February 06, 2017 2:06 PM
To: 'Adam Moffett'; [email protected]
Subject: Re: [Telrad] Uplink throughput again
Something that I learned that I should point out:
A dedicated bearer with a higher priority should take precedence
over default bearer traffic, yes. But from what I can tell, the LTE
spec does not have a way of putting a total speed cap on the
entire UE across any and all bearers. The UE AMBRs only restrict
all non-GBR bearers (default or not, even across multiple APNs) but
do NOT take into account GBR bearers, and QCI 1 is GBR.
What this means is that, for example, if you have a default bearer
with QCI 6, and dedicated bearer with QCI 1, and the UE DL and UL
AMBRs are set to 10 and 1 Mbit/s respectively, and your dedicated
bearer's MBRs are set to 5 and 0.5 (half of the UE AMBRs, for the
sake of this example), you haven't actually set up things such that
up to half of the subscriber's AMBRs are given priority on the
dedicated bearer, leaving that user half of his total bandwidth if
you end up filling the dedicated bearer up to its MBR in both
directions. No, instead because the GBR QCIs are not accounted for
within the AMBR, the user can move up to 5x0.5 on the dedicated
bearer and *simultaneously* also move up to 10x1 (assuming there is
enough sector capacity at the time) on the default bearer.
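
The arithmetic in this example can be sketched as follows. The `Bearer` class and `max_ue_throughput` function are hypothetical names for illustration, not part of any EPC configuration interface; the numbers are the ones from the example above.

```python
# Sketch only: upper bound on what one UE can move at once when the UE AMBR
# covers non-GBR bearers and each GBR bearer is limited by its own MBR.
from dataclasses import dataclass


@dataclass
class Bearer:
    name: str
    qci: int
    is_gbr: bool
    mbr_dl: float = 0.0  # Mbit/s, only used for GBR bearers
    mbr_ul: float = 0.0  # Mbit/s, only used for GBR bearers


def max_ue_throughput(ambr_dl, ambr_ul, bearers):
    """Return (DL, UL) ceilings in Mbit/s, ignoring sector capacity."""
    has_non_gbr = any(not b.is_gbr for b in bearers)
    # The AMBR is shared by all non-GBR bearers together.
    dl = ambr_dl if has_non_gbr else 0.0
    ul = ambr_ul if has_non_gbr else 0.0
    # GBR bearers sit outside the AMBR, so their MBRs are additive.
    for b in bearers:
        if b.is_gbr:
            dl += b.mbr_dl
            ul += b.mbr_ul
    return dl, ul


bearers = [
    Bearer("default", qci=6, is_gbr=False),
    Bearer("dedicated", qci=1, is_gbr=True, mbr_dl=5.0, mbr_ul=0.5),
]
ceiling = max_ue_throughput(10.0, 1.0, bearers)
```

With these values the ceiling works out to 15 Mbit/s down and 1.5 Mbit/s up, rather than the 10 and 1 that the AMBR alone would suggest.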
Maybe in some cases, this is desirable. If you use QCI 1 for
VoIP, for example, then you are effectively providing the customer
with a separate channel for their voice calls that does not dip
into their configured speed package, but is instead additive. But
it is something to keep in mind as you are planning and building
your network as well as running tests.
-- Nathan
From:[email protected] [mailto:[email protected]] On
Behalf Of Adam Moffett
Sent: Monday, February 06, 2017 1:48 PM
To: [email protected]
Subject: Re: [Telrad] Uplink throughput again
The EPC and most of the eNB are running the latest general release
available on Zendesk.
A couple of eNB are running some kind of maintenance release that
support wanted us to try.
I'm making sure to run iPerf on the dedicated bearer to eliminate
other user traffic from weaker UE as a factor. At QCI 1 it should
take precedence over the default bearer traffic.
I would definitely take the time to set one up, not necessarily for
this purpose, but rather to ensure you always have access to your
UE. If the default bearer is hosed with a torrent and you don't
have a dedicated bearer for management access then you can be
completely locked out of the unit. Monitoring, management access,
and firmware updates all work more reliably with the dedicated
bearer and I'd strongly recommend it. There's a knowledge base
article in Zendesk about it. Use DSCP 6 because that's tagged by
default in the UE.
------ Original Message ------
From: "Jeremy Austin" <[email protected]>
To: "Adam Moffett" <[email protected]>; [email protected]
Sent: 2/6/2017 4:30:43 PM
Subject: Re: [Telrad] Uplink throughput again
On Mon, Feb 6, 2017 at 12:20 PM, Adam Moffett
<[email protected]> wrote:
Can somebody tell me if they're getting expected uplink
throughput?
What ENB and EPC revisions are you at, Adam?
We're investigating this same issue ourselves, although we haven't
tried a dedicated bearer.
--
Jeremy Austin
(907) 895-2311
(907) 803-5422
[email protected]
Heritage NetWorks
Whitestone Power & Communications
Vertical Broadband, LLC
Schedule a meeting: http://doodle.com/jermudgeon
_______________________________________________
Telrad mailing list
[email protected]
http://lists.wispa.org/mailman/listinfo/telrad