Your post comes at an interesting time. For the last few weeks, we have been fighting with Telrad engineering over multiple issues. The vast majority of them have been on the EPC side (many of the bugs have been crashers or otherwise extremely service-affecting!), some of which I have detailed in past posts here. Fortunately, within the last couple of weeks we have made significant progress on these, and the latest EPC code is the most stable we have seen in some time. There are still one or two outstanding issues, but they do not rear their heads very often, so by and large things are running smoothly.
I am still unconvinced that the dips I screenshotted a while back, as recorded by my realtime SNMP grapher, are "real" and not just an artifact of the utility I am using. As such, I haven't spent much time chasing that particular issue down.

However, on the eNB side we have definitely run into the exact same issue as #2 in your list. So far, we have only noticed it on the eNBs that are running the pre-release code, and we have only upgraded our 3 most heavily-loaded eNBs to this release while the rest remain on 6.6 GA (4013). (If the other eNBs were more loaded, we might feel pressure to upgrade them past GA in order to resolve the upload performance issues that exist in the GA release but which seem to be largely dealt with in the pre-release code. As things stand, though, those sectors seem to be performing fine, so until they either start seeing more traffic/UEs or I know that this particular problem has been licked in an as-yet-unreleased eNB firmware, I will hold off on further upgrades.)

We also seem to be fortunate in that it sounds like it is happening much less often to us than it is to others (you). We ran pre-release eNB code for a good 2 weeks before we first encountered the issue (or, at least, before we actually noticed it). We discovered that resetting the RF interface cures it (S1 reset, quickly toggling the spectrum analyzer on/off, etc.; a reboot takes way too long). It didn't happen again for a couple of days, but then it happened 2 days in a row on the same eNB. During one of those times, we managed to get one of the Israeli support guys to remote in and investigate, collect logs, etc. But after we thought he was done and we reset the S1 to that eNB, he said there were a few more things he had forgotten to collect, and asked us to let them know when it happens again. And of course, it hasn't!
(Actually not true...it happened on a second sector, but I knew he wasn't going to be around at that very moment, and I wanted to test whether kicking all UEs off worked just as well as a full S1 reset; sure enough, it did. It simply hasn't happened again since then, so there hasn't been another opportunity for them to remote in and collect more data.) So in our case, it can sometimes be several days between incidents. Also, sometimes we will notice (by reviewing BreezeView KPIs) that it has occurred but then self-corrected with no action on our part, sometimes in as little as 5-10 minutes (1-2 samples).

The guy I was interacting with wouldn't commit to a yes or no when I asked him point-blank if this was "yet another bug" we were chasing (I wanted to say "yet another @#$@#$%@ bug" but I bit my tongue ;-)). But given that it is now clear (thanks to your post) that others are seeing the exact same thing, I'm sure they know that they have another problem on their hands (as they CAN'T just have heard about this from the two of us), as elusive as it might be (impossible to reproduce at-will, etc.).

I, too, share your view of things: frustrated with the state of the product and support infrastructure as a whole, but unwilling to pin blame on any individual I have been in contact with. At some level, I empathize with them because I have been in similar positions, chasing problems that are elusive and hard to reproduce while at the same time having (legitimately) upset customers beating you up over them. It *really* sucks, and I am *sure* that they acutely feel the pressure to get these things fixed. At the same time, the sheer quantity of issues we have experienced over the time we have owned this gear is somewhat staggering, and often it seems like we trade one issue in for another, which makes applying upgrades a scary prospect ("what new regression will we end up fighting with this version?").
Over the last few months, we have scheduled maintenance windows at the drop of a hat more times than I can count, often several times in a given week, and have allowed bleeding-edge, hot-off-the-presses code to touch our *production* infrastructure, in essence allowing Telrad to use our network as a guinea pig so that we can aid their engineers as they work to reproduce these issues (since many of them have not been reproducible in a lab). I personally have lost countless hours of sleep and built up a tremendous sleep debt maintaining this system, and have fallen behind on other duties (as well as life in general) as a result. I am trying not to sound snippy here, but it's getting to the point where I'm seriously considering asking what sort of compensation we should expect to get in return for all of this.

At the same time, perhaps partly because I see real progress, and partly because I think I'm largely an optimist by nature, I still hold out hope that things are eventually going to work how they ought. I can't remember where I heard or read this, but my impression is that the Compact's WiMAX firmware was in a similar state for a good while when it was first introduced, but is now regarded as rock-solid. So they are probably just in a similar stage of the development and evolution of the LTE product. That doesn't make it any less frustrating, though, that we seem to have been caught in this particular stage for as long as we have.

Sadly, I won't be at WISPAmerica. If you manage to get some productive face time with Telrad there, I'd love to hear about it afterward.

-- Nathan

From: telrad-boun...@wispa.org [mailto:telrad-boun...@wispa.org] On Behalf Of Jeremy Austin
Sent: Thursday, March 09, 2017 1:53 PM
To: telrad@wispa.org
Subject: Re: [Telrad] Uplink throughput again

On the other hand, the dips are back for us again. This is getting to be very wearing.
To recap:

1) We are running the prerelease code
2) We have been having to reset S1/reboot ENBs periodically (multiple times a day on one particular sector) due to a state of stuck high RF usage
3) The dips are back (60 second cycle, significant drop in throughput for about 7 seconds)
4) Otherwise, throughput is performing *better than ever*

We are now going on 8 full months with failure to resolve these. (To be fair, other manufacturers can take a while to fix things as well.) Customers are complaining. Telrad has confirmed (multiple times) that there is nothing wrong with our network/setup/UEs. They have confirmed that we have done every single thing we can do to verify that performance issues are *not* our problem, but Telrad's.

However, replacing Telrad is not an option at present. I doubt this falls under any lemon laws, but I can only describe our experience of failures as symptomatic of core issues with the Telrad business. Individually, I take no issue with either the American or Israeli support team. Collectively, however, we have a significant problem. There is no 24/7 NOC/TAC... and even if there were, the fact that we probably couldn't rouse an engineer to observe/collect data when the trouble is occurring is a serious defect.

I'm looking forward to seeing the Telrad team at WISPAmerica, but with extremely mixed feelings about the support experience. I have attempted to burn no bridges, but at the same time be very clear about what I perceive as a systematic failure to deliver as promised.

I have been fairly quiet on list about our outstanding issues, thinking that they would be better solved by superior troubleshooting and Telrad engineering than by social engineering. Perhaps it is time for that to change. Perhaps I am doing a disservice to other Telrad customers by keeping quiet.

Thoughts?
On Thu, Feb 16, 2017 at 2:40 AM, Nathan Anderson <nath...@fsr.com> wrote:

Ugh, this is what I get for jumping to conclusions and running my mouth off before doing just the slightest bit of investigation.

I think it might somehow just be the tool I'm using to do the graphing. If I watch one of the active bandwidth tests closely while also watching the graph of the eNB that UE is attached to, I don't (always) see the same dips. Sooo, false alarm. Possibly. I'll keep watching things and report back.

If it's just a graphing error/anomaly, not sure what the problem would be here. Both the tool and the switch that the eNBs are plugged into supposedly support SNMP v2c, so we shouldn't be overrunning a 32-bit integer.

-- Nathan

From: telrad-boun...@wispa.org [mailto:telrad-boun...@wispa.org] On Behalf Of Adam Moffett
Sent: Thursday, February 16, 2017 2:18 AM
To: telrad@wispa.org
Subject: Re: [Telrad] Uplink throughput again

Interesting.

------ Original Message ------
From: "Nathan Anderson" <nath...@fsr.com>
To: "telrad@wispa.org" <telrad@wispa.org>
Sent: 2/16/2017 4:24:00 AM
Subject: Re: [Telrad] Uplink throughput again

Jeremy mentioned his periodic traffic dips to me recently off-list. I haven't seen anything exactly like what either of you two are talking about, but...attached is an interesting screenshot I just took of downlink usage on 3 separate eNBs on our network, each of which I am currently saturating (off-hours) with MT download bandwidth test (occurring behind 1 UE on each sector, and each UE has been temporarily granted 100Mbit downlink AMBR).

Notice the little icicle-like formations? Also notice how they seem to be fairly regular, and also seem to occur at the exact same interval on every sector, but don't perfectly line up with each other? WTF is *that* about?
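On the 32-bit counter worry raised earlier in this exchange, a rough sketch may help. It is not the grapher's actual logic (which is not shown anywhere in this thread); it only illustrates how long a 32-bit octet counter such as IF-MIB's ifInOctets takes to wrap at a steady rate, and how a poller can still compute a rate across a single wrap. The function names are hypothetical. The 64-bit ifHCInOctets counters that SNMPv2c makes available use the same arithmetic with a larger modulus.

```python
COUNTER32_MOD = 2 ** 32  # ifInOctets / ifOutOctets wrap here
COUNTER64_MOD = 2 ** 64  # ifHCInOctets / ifHCOutOctets wrap here


def seconds_to_wrap(rate_bps, modulus=COUNTER32_MOD):
    """Seconds for an octet counter to wrap at a steady bit rate."""
    return modulus * 8 / rate_bps


def rate_bps(prev_octets, curr_octets, interval_s, modulus=COUNTER32_MOD):
    """Bits per second between two counter samples.

    The modulo handles a single wrap between samples; two or more
    wraps in one interval cannot be detected from the counter alone.
    """
    delta_octets = (curr_octets - prev_octets) % modulus
    return delta_octets * 8 / interval_s


if __name__ == "__main__":
    # Illustrative rate only: the 100 Mbit/s AMBR used in the test above.
    print(round(seconds_to_wrap(100e6), 1), "seconds to wrap a Counter32")
    # Counter wrapped between these two made-up samples taken 1 s apart.
    print(rate_bps(4294967000, 704, 1.0), "bit/s")
```

At 100 Mbit/s a 32-bit octet counter wraps roughly every 5.7 minutes, so a poller sampling every second or so would see at most one wrap between samples, which the modulo arithmetic absorbs.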
-- Nathan

From: telrad-boun...@wispa.org [mailto:telrad-boun...@wispa.org] On Behalf Of Jeremy Austin
Sent: Wednesday, February 15, 2017 8:44 PM
To: Adam Moffett; telrad@wispa.org
Subject: Re: [Telrad] Uplink throughput again

Adam, I'm going to assume that no other traffic on the same equipment (sans EPC and ENB) shows this periodicity?

I have seen something in the same ballpark, but not identical, since August. I have been planning to post it to the list to get more eyes on it (after letting Telrad have some time to look at it first). Just wanted to check that you had isolated the behavior entirely to LTE, and not routers/backhauls/switches.

On Wed, Feb 15, 2017 at 7:15 PM Adam Moffett <ad...@clarityconnect.com> wrote:

Weird. Maybe overflow from the dedicated bearer falls into the default bearer? I also have to wonder if it's a bug in the UE. It seems like it must fall on the UE to ultimately enforce the rate limit.

In our uplink throughput issue, I might have tripped over something of interest. I originally reported to Telrad that I was getting about half of what I expect for UL throughput. Now I think we actually do get the expected throughput, but only for a moment. Five seconds later there's next to nothing, then 5 seconds later back to full speed, and so on. I see it when looking at the realtime traffic display on our switch port, but on your typical chart with a 5 minute average it just looks like you're getting half speed.

Weird thing is that it's not happening all the time. I started iPerf on 6 UEs at one site at 4am the other day, and when looking at traffic at the switch port I saw a perfect sine wave with 10 seconds peak to peak. Later that day I repeated the test to show one of my co-workers and the damn thing wouldn't do it. I don't know what to make of it yet.
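To illustrate the averaging effect Adam describes, here is a minimal sketch. It assumes an idealized on/off pattern (5 seconds at full rate, 5 seconds at nothing) rather than the sine wave seen on the switch port, and a made-up peak of 10 Mbit/s; the function names are hypothetical.

```python
def square_wave(peak_mbps, on_s, off_s, total_s):
    """One throughput sample per second for a repeating on/off pattern."""
    period = on_s + off_s
    return [peak_mbps if (t % period) < on_s else 0.0 for t in range(total_s)]


def window_average(samples, window_s):
    """Average the per-second samples over consecutive, non-overlapping windows."""
    return [
        sum(samples[i:i + window_s]) / window_s
        for i in range(0, len(samples) - window_s + 1, window_s)
    ]


if __name__ == "__main__":
    per_second = square_wave(peak_mbps=10.0, on_s=5, off_s=5, total_s=600)
    print(per_second[:12])                   # what a realtime display shows
    print(window_average(per_second, 300))   # what a 5 minute chart shows
```

With these assumed numbers, each 5 minute window averages to half the peak, even though no individual second ever runs at half speed. That is why the realtime switch port display and the long-interval chart can tell such different stories about the same traffic.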
------ Original Message ------
From: "Nathan Anderson" <nath...@fsr.com>
To: "telrad@wispa.org" <telrad@wispa.org>; "'Adam Moffett'" <ad...@clarityconnect.com>
Sent: 2/10/2017 3:59:40 PM
Subject: RE: [Telrad] Uplink throughput again

So last night, I re-ran this test, and captured the whole thing not just at the edge of the LTE network coming out of the EPC, but between the EPC and eNB, so that I could grab the user traffic together with the encapsulating GTP headers.

What I found was that when traffic comes from behind the UE with the proper DSCP value set, it DOES get transmitted by the UE on the dedicated bearer, but the MBR is still not being enforced. I had a 10Mbit/s UL AMBR configured and a 256Kbit/s UL MBR set on the dedicated bearer, and when I ran an upload test on the dedicated bearer, it hit 10 megs. (Download test on the dedicated bearer was limited to the configured 256Kbit/s DL MBR.)

What makes this so bizarre is that even if there is a bug that causes the system (which part?) to not enforce the configured rate limit for the dedicated bearer on the uplink, the UE AMBR should not be taken into account for GBR bearers, as discussed before. But it sure seems like what is happening is that whatever is supposed to be policing the uplink is mistakenly enforcing the UE UL AMBR on the dedicated bearer instead of the UL MBR.

Ticket opened with Telrad.

-- Nathan

From: telrad-boun...@wispa.org [mailto:telrad-boun...@wispa.org] On Behalf Of Nathan Anderson
Sent: Monday, February 06, 2017 3:56 PM
To: 'Adam Moffett'; telrad@wispa.org
Subject: Re: [Telrad] Uplink throughput again

Then maybe the problem is not that the properly-marked upload traffic isn't getting transmitted on the right bearer, but rather that the UL GBR/MBR are not being enforced?
Whose responsibility is enforcement of bitrates on uplink? The UE's? The eNB's? The EPC's? A little of columns A, B, and C?

-- Nathan

From: telrad-boun...@wispa.org [mailto:telrad-boun...@wispa.org] On Behalf Of Adam Moffett
Sent: Monday, February 06, 2017 2:50 PM
To: telrad@wispa.org
Subject: Re: [Telrad] Uplink throughput again

Somewhere there must be traffic counters for each QCI, or for individual bearers, or something. Without seeing them it's hard to say for sure.

On a busy eNB (50+ UE), I tried changing the mgmt DSCP value on an individual UE from 6 to 5 and testing before and after. With the UE set to DSCP 5 for mgmt, I get 0.1 Mbps upload and 7% packet loss (500 byte pings, 0.1 second interval). On DSCP 6 I get 0.5 Mbps and 0% packet loss. That's not scientific rigor, but it seems like it's working.

On a more lightly loaded eNB I was actually getting slightly more UL throughput with the UE Mgmt DSCP set to 5. I don't know why.

-Adam

------ Original Message ------
From: "Nathan Anderson" <nath...@fsr.com>
To: "telrad@wispa.org" <telrad@wispa.org>; "'Adam Moffett'" <ad...@clarityconnect.com>
Sent: 2/6/2017 5:11:49 PM
Subject: RE: [Telrad] Uplink throughput again

...also, I still remain unconvinced that the UEs are transmitting any upload traffic -- even when properly marked with the right DSCP -- on the dedicated bearer. Until it is proven beyond a doubt that this works, testing upload capacity using dedicated bearers is probably a waste of time because it isn't doing what you think it is doing. I have tested both CPE7000 and CPE8000 at this point, and have the same issue on both, so I don't think it is a CPE firmware bug (that would be a freaky coincidence, given that both CPEs are contract-manufactured by different companies).
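For anyone wanting to generate marked test traffic from a host behind the UE, here is a minimal sketch of setting a DSCP value on a UDP socket. The helper names are hypothetical, and whether the UE honors, ignores, or rewrites the marking is exactly the open question in this thread, so this only covers the sending side. It relies on the standard layout of the IP TOS byte, where DSCP occupies the upper six bits.

```python
import socket


def dscp_to_tos(dscp):
    """Convert a DSCP value (0-63) to the TOS byte that carries it."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0-63")
    return dscp << 2  # the low two bits of the byte are ECN, left at zero


def open_marked_udp_socket(dscp):
    """Return a UDP socket whose outgoing IPv4 packets carry the given DSCP.

    Assumes a platform where socket.IP_TOS is available and honored
    (for example Linux); some operating systems ignore this option.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock


if __name__ == "__main__":
    print(dscp_to_tos(6))  # DSCP 6, the management marking mentioned above
    print(dscp_to_tos(5))
```

A packet capture on the far side (as was done between the EPC and eNB) is still needed to confirm which bearer the marked traffic actually rides on.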
So I don't know if this is me being stupid and not configuring my EPCs correctly, or what. But something is not working here.

-- Nathan

From: telrad-boun...@wispa.org [mailto:telrad-boun...@wispa.org] On Behalf Of Nathan Anderson
Sent: Monday, February 06, 2017 2:06 PM
To: 'Adam Moffett'; telrad@wispa.org
Subject: Re: [Telrad] Uplink throughput again

Something that I learned that I should point out:

A dedicated bearer with a higher priority should take precedence over default bearer traffic, yes. But from what I can tell, the LTE spec does not have a way of putting a total speed cap on the entire UE across any and all bearers. The UE AMBRs only restrict all non-GBR bearers (default or not, even across multiple APNs) but do NOT take into account GBR bearers, and QCI 1 is GBR.

What this means is that, for example, if you have a default bearer with QCI 6, and a dedicated bearer with QCI 1, and the UE DL and UL AMBRs are set to 10 and 1 Mbit/s respectively, and your dedicated bearer's MBRs are set to 5 and 0.5 (half of the UE AMBRs, for the sake of this example), you haven't actually set up things such that up to half of the subscriber's AMBRs are given priority on the dedicated bearer, leaving that user half of his total bandwidth if you end up filling the dedicated bearer up to its MBR in both directions. No, instead, because the GBR QCIs are not accounted for within the AMBR, the user can move up to 5 Mbit/s down and 0.5 Mbit/s up on the dedicated bearer and *simultaneously* also move up to 10 Mbit/s down and 1 Mbit/s up (assuming there is enough sector capacity at the time) on the default bearer.

Maybe in some cases, this is desirable. If you use QCI 1 for VoIP, for example, then you are effectively providing the customer with a separate channel for their voice calls that does not dip into their configured speed package, but is instead additive.
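The arithmetic in that example can be sketched as follows. This is only an illustration of the additive behavior described above, assuming enough sector capacity and that every configured limit is actually enforced; the function and parameter names are hypothetical and do not correspond to any EPC configuration field.

```python
def max_ue_throughput(ue_dl_ambr, ue_ul_ambr, gbr_bearer_mbrs):
    """Upper bound on total UE throughput in Mbit/s as (downlink, uplink).

    ue_dl_ambr / ue_ul_ambr cap all non-GBR bearers combined.
    gbr_bearer_mbrs is a list of (dl_mbr, ul_mbr) pairs, one per GBR
    bearer; these are not counted against the UE AMBR, so they add.
    """
    dl = ue_dl_ambr + sum(dl_mbr for dl_mbr, _ in gbr_bearer_mbrs)
    ul = ue_ul_ambr + sum(ul_mbr for _, ul_mbr in gbr_bearer_mbrs)
    return dl, ul


if __name__ == "__main__":
    # Values from the example: UE AMBR 10/1, one QCI 1 bearer with MBR 5/0.5.
    print(max_ue_throughput(10.0, 1.0, [(5.0, 0.5)]))
```

With the numbers from the example, the UE's ceiling works out to 15 Mbit/s down and 1.5 Mbit/s up, not the 10 and 1 that the speed package alone would suggest.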
But it is something to keep in mind as you are planning and building your network as well as running tests.

-- Nathan

From: telrad-boun...@wispa.org [mailto:telrad-boun...@wispa.org] On Behalf Of Adam Moffett
Sent: Monday, February 06, 2017 1:48 PM
To: telrad@wispa.org
Subject: Re: [Telrad] Uplink throughput again

The EPC and most of the eNBs are running the latest general release available on Zendesk. A couple of eNBs are running some kind of maintenance release that support wanted us to try.

I'm making sure to run iPerf on the dedicated bearer to eliminate other user traffic from weaker UEs as a factor. At QCI 1 it should take precedence over the default bearer traffic.

I would definitely take the time to set one up, not necessarily for this purpose, but rather to ensure you always have access to your UE. If the default bearer is hosed with a torrent and you don't have a dedicated bearer for management access, then you can be completely locked out of the unit. Monitoring, management access, and firmware updates all work more reliably with the dedicated bearer and I'd strongly recommend it. There's a knowledge base article in Zendesk about it. Use DSCP 6 because that's tagged by default in the UE.

------ Original Message ------
From: "Jeremy Austin" <jhaus...@gmail.com>
To: "Adam Moffett" <ad...@clarityconnect.com>; telrad@wispa.org
Sent: 2/6/2017 4:30:43 PM
Subject: Re: [Telrad] Uplink throughput again

On Mon, Feb 6, 2017 at 12:20 PM, Adam Moffett <ad...@clarityconnect.com> wrote:

Can somebody tell me if they're getting expected uplink throughput?

What ENB and EPC revisions are you at, Adam? We're investigating this same issue ourselves, although we haven't tried a dedicated bearer.
--
Jeremy Austin
(907) 895-2311
(907) 803-5422
jhaus...@gmail.com

Heritage NetWorks
Whitestone Power & Communications
Vertical Broadband, LLC

Schedule a meeting: http://doodle.com/jermudgeon

_______________________________________________
Telrad mailing list
Telrad@wispa.org
http://lists.wispa.org/mailman/listinfo/telrad