So, we do have an update on this situation.

The FCS errors that we get on ports 9-12 appear to just be symptomatic of a 
bigger problem.  They occur on those ports (and only those ports) whenever the 
EPC decides to reinitialize all of the ethernet ports / the switching 
infrastructure, which it decides to do (at least in our case) when it gets into 
a certain state where it claims that the transmit queues are "full" (according 
to the log entries we see).  When this happens, it reinitializes the ports, 
which makes the EPC unavailable/unreachable for a good 4-5 seconds.  During 
this reinit, we see 2 frames with incorrectly computed CRC received from ports 
9-12 but not the other ports.

Moving off of ports 9-12 only made it so that we don't see those FCS 
errors.  It did not address the underlying problem, which is that the EPC 
switch gets into a state that it apparently only knows how to deal with by 
resetting itself completely, which knocks the EPC off-line.  This is the actual 
reason why we are experiencing the SCTP failures to some of our eNBs.  The FCS 
errors on those particular ports are just another symptom (and apparently a 
harmless one).

I happened to be watching the network at the exact instant that one of these 
network "burps" happened, and I was tailing the tlsyslog file at the same 
moment.  Here is what was written to tlsyslog when the EPC became 
unresponsive:

16118:2017-02-20,14:34:05.859171:NOTICE:0:06.06.02161:MEAAPI:1008:mea_adapter.cpp:QueueFullControlHandler(5696):Queue is full for port 102
16119:2017-02-20,14:34:05.859638:NOTICE:0:06.06.02161:MEAAPI:1008:mea_adapter.cpp:QueueFullControlHandler(5701):Tx counter: 16681286/0
16120:2017-02-20,14:34:05.859712:NOTICE:0:06.06.02161:MEAAPI:1008:mea_adapter.cpp:QueueFullControlHandler(5716):Save state for port 102, queue full:Yes, mcqueue full:No, cnt:16681286
16121:2017-02-20,14:34:14.569762:NOTICE:0:06.06.02161:MEAAPI:1008:mea_adapter.cpp:QueueFullControlHandler(5696):Queue is full for port 102
16122:2017-02-20,14:34:14.570168:NOTICE:0:06.06.02161:MEAAPI:1008:mea_adapter.cpp:QueueFullControlHandler(5701):Tx counter: 16681286/16681286
16123:2017-02-20,14:34:14.570279:NOTICE:0:06.06.02161:MEAAPI:1008:mea_adapter.cpp:QueueFullControlHandler(5706):Queue full condition detected for port 102
16124:2017-02-20,14:34:21.717955:NOTICE:0:06.06.02161:MEAAPI:1008:mea_adapter.cpp:QueueFullControlHandler(5789):MEA FPGA reinit is triggered by queue control engine

I found similar-looking errors in tlsyslogs from the past, and I have a feeling 
that if I took the time, I could correlate every SCTP failure that we have seen 
(as well as every logged FCS error) to one of these.
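
For anyone who wants to hunt for these events in their own tlsyslog, here is a rough Python sketch that pulls out each FPGA reinit along with the first "Queue is full" notice that preceded it. The field layout is only inferred from the excerpt above, it assumes one log entry per line (unwrapped), and it treats the leading counter as optional since I don't know whether that is part of the file. The names LINE_RE, FULL_RE and find_reinits are made up for this example and are not anything Telrad ships.

```python
import re
from datetime import datetime

# Layout inferred from the excerpt above (an assumption, not a documented format).
# The leading "NNNNN:" counter is optional in case it is not part of the log itself.
LINE_RE = re.compile(
    r"^(?:\d+:)?"
    r"(?P<ts>\d{4}-\d{2}-\d{2},\d{2}:\d{2}:\d{2}\.\d{6}):"
    r"[A-Z]+:"
    r".*?:(?P<func>\w+)\(\d+\):"
    r"(?P<msg>.*)$"
)
FULL_RE = re.compile(r"Queue is full for port (\d+)")
REINIT_TEXT = "FPGA reinit is triggered"


def find_reinits(lines):
    """Return (reinit_time, port, first_queue_full_time) for each reinit seen."""
    events = []
    first_full = None  # (timestamp, port) of the first queue-full notice
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group("func") != "QueueFullControlHandler":
            continue
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d,%H:%M:%S.%f")
        msg = m.group("msg")
        full = FULL_RE.search(msg)
        if full and first_full is None:
            first_full = (ts, int(full.group(1)))
        elif REINIT_TEXT in msg:
            started, port = first_full if first_full else (None, None)
            events.append((ts, port, started))
            first_full = None  # start fresh for the next incident
    return events


# Two entries copied from the excerpt above, one per line.
PREFIX = ":NOTICE:0:06.06.02161:MEAAPI:1008:mea_adapter.cpp:QueueFullControlHandler"
SAMPLE = [
    "16118:2017-02-20,14:34:05.859171" + PREFIX + "(5696):Queue is full for port 102",
    "16124:2017-02-20,14:34:21.717955" + PREFIX
    + "(5789):MEA FPGA reinit is triggered by queue control engine",
]

for reinit, port, started in find_reinits(SAMPLE):
    print(reinit, port, reinit - started if started else None)
```

On the entries above, that works out to roughly 15.9 seconds between the first queue-full notice for port 102 and the reinit being triggered.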

I have already updated my tickets with Telrad to include this new finding.

-- Nathan

From: telrad-boun...@wispa.org [mailto:telrad-boun...@wispa.org] On Behalf Of 
Tristan Johnson
Sent: Monday, February 20, 2017 3:00 PM
To: telrad@wispa.org
Subject: Re: [Telrad] Ethernet RX FCS errors from BreezeWay?

Our low usage EPC uses network and access on the same port.
It is plugged in to port 9 (I think) on the EPC (we wanted to take advantage of 
the small surge protection on those last 4 ports), going into a MT 
CRS125-24G-1S, and we have no FCS errors. Probably a 3' cat5e patch cable.

We just put our first customer on it a couple days ago though.

Thanks,
Tristan Johnson
Owner
www.wirelessdatanet.net
309-893-4152



------ Original Message ------
From: "Nathan Anderson" <nath...@fsr.com>
To: "'telrad@wispa.org'" <telrad@wispa.org>
Sent: 2/17/2017 2:11:51 PM
Subject: Re: [Telrad] Ethernet RX FCS errors from BreezeWay?

Update on this: to be clear, we are running (as I think I have stated in the 
past) a network-if + access-if config.  And it was the access-if port that has 
been showing the FCS errors.

I realized that the network-if port, which is plugged into an 1100AHx2, has 
been fine.  1100 shows no FCS receive errors.  Huh.  The network-if port 
happened to be port 8, so I got the idea of swapping my network-if port and 
access-if port, and now network-if is port 11 and access-if is port 8.

No more FCS errors, now that I am using port *8*.

??????????????

Also, no FCS errors on port 11 now that it is configured as a network-if port.

?!?!?!?!?!?!?!?!?!?!?!?!?!

Our backup EPC, which is configured identically, has network-if on 8 and 
access-if on 11, and I cannot reproduce the FCS error problem on that box, 
either.

This is SUPER bizarre.  On the bright side, I haven't had an SCTP reset since I 
made this change.

-- Nathan

From: Nathan Anderson
Sent: Friday, February 17, 2017 2:01 AM
To: telrad@wispa.org
Subject: RE: [Telrad] Ethernet RX FCS errors from BreezeWay?

I decided to take a switch that we had lying around, and put it in between the 
EPC and the CCR.

Wouldn't you know it: the FCS errors started showing up on the switchport that 
the EPC is plugged into.  This isn't an EPC <-> CCR issue.  This, at least for 
us, is purely an EPC issue.

This particular issue is really starting to get on my nerves...

-- Nathan

From: telrad-boun...@wispa.org [mailto:telrad-boun...@wispa.org] On Behalf Of 
Nathan Anderson
Sent: Thursday, February 16, 2017 11:11 PM
To: telrad@wispa.org
Subject: Re: [Telrad] Ethernet RX FCS errors from BreezeWay?

The cables we were using were 7ft pre-manufactured 5e patch cables.  So for 
grins, I took 10ft of Cat. 6 off of a spool and put my own ends on.

10 minutes later, what happened?  FCS ERROR.

https://getyarn.io/yarn-clip/0f901d47-7954-40e1-a539-b6cea39d93c0
https://getyarn.io/yarn-clip/a8ae5833-4912-4a5a-81ce-ae8d2cdd2823

-- Nathan

From: telrad-boun...@wispa.org [mailto:telrad-boun...@wispa.org] On Behalf Of 
Skywerx Support
Sent: Thursday, February 16, 2017 5:53 AM
To: telrad@wispa.org
Subject: Re: [Telrad] Ethernet RX FCS errors from BreezeWay?

We went from roughly a 1 meter cable to a 2 meter cable, and from port 5 to 
port 9, on two different EPCs, and that did the trick at the locations where we 
were having an issue.
--
Justin Davis
COO
SkyWerx Industries, LLC

On Feb 16, 2017, at 5:54 AM, Nathan Anderson <nath...@fsr.com> wrote:
Port 9 was the other port I tried.  No difference.

How much is "a bit" longer?  The cable currently being used is I think between 
7 and 10 ft (I'll check tomorrow).

-- Nathan
________________________________
From: telrad-boun...@wispa.org <telrad-boun...@wispa.org> on behalf of 
Skywerx Support <jus...@skywerxsupport.com>
Sent: Thursday, February 16, 2017 4:19 AM
To: telrad@wispa.org
Subject: Re: [Telrad] Ethernet RX FCS errors from BreezeWay?

Nathan, have you tried a slightly longer cable from the EPC to the router, 
along with port 9?  This fixed our FCS error issue.
--
Justin Davis
COO
SkyWerx Industries, LLC

On Feb 16, 2017, at 5:00 AM, Nathan Anderson <nath...@fsr.com> wrote:
So, as an update to this, we are still seeing these FCS errors between the EPC 
(port 11) and the CCR-1036.

I know it is not just a reporting error, because we have been having issues for 
a while with UEs dropping off randomly, and until now we just assumed the 
underlying cause was whatever issue is supposedly being fixed for others in the 
upcoming 6.6M2 release.  However, looking at the EPC logs, it is clear that the 
UEs are dropping, at least in our case, because the SCTP session between the EPC 
and some eNB on the network breaks down.  And every time this happens, I can 
take the timestamp from the EPC log of the SCTP failure, go over to the CCR, 
look through its logs, and find an FCS error log entry with a timestamp that 
matches up with the SCTP failure *exactly*.  Not every FCS error results in an 
eNB disconnect, but every eNB disconnect can be traced back to an FCS error.
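
The timestamp matching described above could be scripted along these lines. This is only a sketch: it assumes the two sets of timestamps have already been pulled out of the EPC and CCR logs as Python datetime objects, that the two clocks are reasonably in sync, and that a one-second tolerance is acceptable. The function name match_events and the tolerance are made up for illustration.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def match_events(failures, fcs_errors, tolerance=timedelta(seconds=1)):
    """Map each SCTP failure time to the nearest FCS error time within
    `tolerance`, or to None if no FCS error is close enough."""
    fcs_sorted = sorted(fcs_errors)
    result = {}
    for t in failures:
        i = bisect_left(fcs_sorted, t)
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = fcs_sorted[max(i - 1, 0):i + 1]
        best = min(candidates, key=lambda c: abs(c - t), default=None)
        if best is not None and abs(best - t) <= tolerance:
            result[t] = best
        else:
            result[t] = None
    return result


# Placeholder timestamps for illustration only; these are not from any real log.
sctp_failures = [datetime(2017, 1, 1, 12, 0, 0), datetime(2017, 1, 1, 13, 0, 0)]
fcs_errors = [datetime(2017, 1, 1, 11, 0, 0), datetime(2017, 1, 1, 12, 0, 0, 400000)]

for failure, fcs in match_events(sctp_failures, fcs_errors).items():
    print(failure, "->", fcs)
```

Any failure that maps to None would be a disconnect with no matching FCS error, which is the case that has not shown up so far.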

So, clearly we need to get this fixed.  To that end, I have tried the 
following:

1) Replaced the ethernet cable
2) Tried a different port on the CCR
3) Tried a different port on the EPC (that's always fun, because "reboots")
4) REPLACED THE CCR

We are still seeing FCS receive errors on that port and that port only.

Anybody have any other ideas?

-- Nathan

From: telrad-boun...@wispa.org [mailto:telrad-boun...@wispa.org] On Behalf Of 
Nathan Anderson
Sent: Monday, January 30, 2017 1:31 PM
To: telrad@wispa.org
Subject: Re: [Telrad] Ethernet RX FCS errors from BreezeWay?

We have always had a habit of doing this.  And we are on 6.32.x.

Searching through the ROS changelogs (which are a lot more verbose these days 
than they used to be) for "FCS" revealed nothing interesting.

-- Nathan

From: telrad-boun...@wispa.org [mailto:telrad-boun...@wispa.org] On Behalf Of 
Shayne Lebrun
Sent: Monday, January 30, 2017 7:15 AM
To: telrad@wispa.org
Subject: Re: [Telrad] Ethernet RX FCS errors from BreezeWay?

And always make sure, when upgrading RouterOS, that you then upgrade the 
Routerboard firmware.  I think it happens semi-automatically now; you no longer 
need to do a '/system routerboard upgrade,' but you do still need a second 
reboot.

From: telrad-boun...@wispa.org [mailto:telrad-boun...@wispa.org] On Behalf Of 
Jesse Dupont
Sent: Monday, January 30, 2017 8:51 AM
To: telrad@wispa.org
Subject: Re: [Telrad] Ethernet RX FCS errors from BreezeWay?

What RouterOS version is on this CCR? I think around the 6.34 mark, these 
started showing up in some circumstances, but a RouterOS upgrade resolves them 
(may just be a reporting error or hardware driver bug).


On Mon, Jan 30, 2017 at 12:37 AM -0700, "Nathan Anderson" <nath...@fsr.com> 
wrote:
We have recently noticed a new problem: we have an access port on our BreezeWay 
(in this case, so happens it's the BW's port 10, but may not be relevant) that 
is plugged into a MikroTik CCR 1036.  The MikroTik is reporting that it is 
sporadically seeing FCS errors on frames received from the BreezeWay.  I have 
replaced the ethernet cable and also tried moving to a different ethernet port 
on the CCR.  Neither has made a difference.

I'm wondering if there is any way I can see any ethernet stats or diagnostic 
information from the BreezeWay's perspective.  In my poking around, so far I 
have come up empty.

I'm not necessarily convinced at this point that this is the BreezeWay's fault, 
mind you.  I'm just wondering if anybody else has seen something similar, and 
how to best go about chasing this problem down.  My perception is that the CCRs 
in particular have had a troubled history when it comes to their copper gig 
ports...search for "Ubiquiti AirFiber MikroTik CCR" if you want some fun 
afternoon light reading.

Thanks,

--
Nathan Anderson
First Step Internet, LLC
nath...@fsr.com

_______________________________________________
Telrad mailing list
Telrad@wispa.org
http://lists.wispa.org/mailman/listinfo/telrad