#19500: Mikrotik Routerboard RB493G switch flakiness
-----------------------+------------------------
Reporter: russell@… | Owner: developers
Type: defect | Status: new
Priority: normal | Milestone:
Component: packages | Version: Trunk
Keywords: |
-----------------------+------------------------
This device has 9 gigabit ports, divided between two AR8316 switch chips
on the board.
Here is a fragment of the boot log from a recent build, r45385:
{{{
[ 3.190000] switch0: Atheros AR8316 rev. 1 switch registered on gpio-ffffffff
[ 3.200000] libphy: GPIO Bitbanged MDIO: probed
[ 3.230000] switch1: Atheros AR8316 rev. 1 switch registered on ag71xx-mdio.0
[ 3.240000] libphy: ag71xx_mdio: probed
[ 3.540000] ar8316: Using port 4 as switch port
[ 3.680000] ag71xx ag71xx.1: connected to PHY at gpio-ffffffff:00 [uid=004dd041, driver=Atheros AR8216/AR8236/AR8316]
[ 3.690000] eth0: Atheros AG71xx at 0xba000000, irq 5, mode:RGMII
[ 4.000000] ar8316: Using port 4 as switch port
[ 4.140000] ag71xx ag71xx.0: connected to PHY at ag71xx-mdio.0:00 [uid=004dd041, driver=Atheros AR8216/AR8236/AR8316]
[ 4.150000] eth1: Atheros AG71xx at 0xb9000000, irq 4, mode:RGMII
}}}
I reported this on #openwrt-devel a few years ago, but apparently never
followed up with a ticket, so here it is. I see strange console messages
regarding one of the switch chips (the one attached to eth0), indicating
speed and duplex changes and/or port status changes. The ports involved
connect to a couple of Ubiquiti Bullet MxHP units on my roof. The Bullets
are not misbehaving, as far as I can tell. Here is a sampling of the
messages:
{{{
[285254.030000] eth0: link up (1000Mbps/Full duplex)
[286028.070000] eth0: link up (1000Mbps/Half duplex)
[286030.070000] eth0: link up (1000Mbps/Full duplex)
[286622.070000] eth0: link up (1000Mbps/Half duplex)
[286624.070000] eth0: link up (1000Mbps/Full duplex)
[286854.070000] eth0: link up (10Mbps/Half duplex)
[286856.070000] eth0: link up (1000Mbps/Full duplex)
[289766.140000] Atheros AR8216/AR8236/AR8316 gpio-ffffffff:00: Port 3 is down
[289768.150000] Atheros AR8216/AR8236/AR8316 gpio-ffffffff:00: Port 3 is up
[297282.220000] Atheros AR8216/AR8236/AR8316 gpio-ffffffff:00: Port 2 is down
[297284.220000] Atheros AR8216/AR8236/AR8316 gpio-ffffffff:00: Port 2 is up
[299710.220000] Atheros AR8216/AR8236/AR8316 gpio-ffffffff:00: Port 3 is down
[299712.230000] Atheros AR8216/AR8236/AR8316 gpio-ffffffff:00: Port 3 is up
[303780.230000] eth0: link up (10Mbps/Half duplex)
[303782.230000] eth0: link up (1000Mbps/Full duplex)
[304704.230000] Atheros AR8216/AR8236/AR8316 gpio-ffffffff:00: Port 2 is down
[304706.230000] Atheros AR8216/AR8236/AR8316 gpio-ffffffff:00: Port 2 is up
[313782.240000] Atheros AR8216/AR8236/AR8316 gpio-ffffffff:00: Port 3 is down
[313784.240000] Atheros AR8216/AR8236/AR8316 gpio-ffffffff:00: Port 3 is up
[315150.240000] Atheros AR8216/AR8236/AR8316 gpio-ffffffff:00: Port 3 is down
[315152.240000] Atheros AR8216/AR8236/AR8316 gpio-ffffffff:00: Port 3 is up
[316114.260000] eth0: link up (1000Mbps/Half duplex)
[316116.260000] eth0: link up (1000Mbps/Full duplex)
[318226.270000] Atheros AR8216/AR8236/AR8316 gpio-ffffffff:00: Port 3 is down
[318228.270000] Atheros AR8216/AR8236/AR8316 gpio-ffffffff:00: Port 3 is up
[319014.270000] eth0: link up (10Mbps/Half duplex)
[319016.270000] eth0: link up (1000Mbps/Full duplex)
[319936.270000] eth0: link up (1000Mbps/Half duplex)
[319938.270000] eth0: link up (1000Mbps/Full duplex)
[323430.300000] eth0: link up (1000Mbps/Half duplex)
[323432.300000] eth0: link up (1000Mbps/Full duplex)
[327352.370000] eth0: link up (1000Mbps/Half duplex)
[327354.370000] eth0: link up (1000Mbps/Full duplex)
[344916.630000] eth0: link up (10Mbps/Half duplex)
[344918.630000] eth0: link up (1000Mbps/Full duplex)
}}}
Note the classic pattern: each glitch is followed by a recovery two
seconds later.
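To check whether every glitch really recovers after about two seconds, the log can be summarized mechanically. The following is only a rough sketch in Python, not a tool that exists in OpenWrt: the function name `find_glitches`, the regular expressions, and the idea that 1000Mbps/Full is the "expected" state are all my assumptions, and it expects one kernel message per line (so wrapped lines must be joined first).

```python
import re

# Matches e.g. "[286028.070000] eth0: link up (1000Mbps/Half duplex)"
LINK_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+(\w+): link up \((\d+)Mbps/(Full|Half) duplex\)")
# Matches e.g. "[289766.140000] Atheros ... gpio-ffffffff:00: Port 3 is down"
PORT_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+.*: Port (\d+) is (up|down)")


def find_glitches(lines, expected=(1000, "Full")):
    """Return (source, start_time, seconds_until_recovery) tuples.

    A glitch starts when a source reports a state other than the expected
    one (hypothetical default: 1000Mbps/Full, or "up" for switch ports) and
    ends at the next message from the same source reporting the expected state.
    """
    pending = {}   # source -> timestamp of the first unrecovered bad event
    glitches = []
    for line in lines:
        m = LINK_RE.search(line)
        if m:
            ts, src = float(m.group(1)), m.group(2)
            good = (int(m.group(3)), m.group(4)) == expected
        else:
            m = PORT_RE.search(line)
            if not m:
                continue  # unrelated kernel message
            ts, src = float(m.group(1)), "port " + m.group(2)
            good = m.group(3) == "up"
        if not good:
            pending.setdefault(src, ts)
        elif src in pending:
            start = pending.pop(src)
            glitches.append((src, start, round(ts - start, 2)))
    return glitches


# Four lines taken from the sample above, one message per line.
sample = """\
[286028.070000] eth0: link up (1000Mbps/Half duplex)
[286030.070000] eth0: link up (1000Mbps/Full duplex)
[289766.140000] Atheros AR8216/AR8236/AR8316 gpio-ffffffff:00: Port 3 is down
[289768.150000] Atheros AR8216/AR8236/AR8316 gpio-ffffffff:00: Port 3 is up
""".splitlines()

for source, start, duration in find_glitches(sample):
    print(source, start, duration)
```

Feeding it the full `dmesg` output would show whether the recovery interval is constant and which of eth0, port 2, and port 3 glitches most often, which may help narrow down where to look.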
Historically, from time to time, the whole switch seems to stop passing
traffic. I have a watchdog script that tries to ping both the Ubiquiti
devices, and if it can't ping either one it reboots the RB493G. I haven't
seen that since the recent image was flashed, but it's only been a few
days.
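For reference, the watchdog logic is roughly as follows. This is a sketch in Python rather than the actual script (which is not shown here), and it makes several assumptions: "can't ping either one" is read as "neither device answers", the retry count is my addition, the addresses in the usage note are placeholders, and the `ping` binary is assumed to accept `-c` and `-W` as the iputils and BusyBox versions do.

```python
import subprocess


def ping_once(host, timeout_s=2):
    """Return True if a single ICMP echo request to host gets a reply."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def should_reboot(hosts, ping=ping_once, attempts=3):
    """Return True only if no host answers in any of the attempts.

    The ping function is injectable so the decision logic can be
    exercised without touching the network.
    """
    for _ in range(attempts):
        if any(ping(host) for host in hosts):
            return False  # at least one device is reachable
    return True
```

A cron job could then call `should_reboot(["192.0.2.10", "192.0.2.11"])` (placeholder addresses) and run `reboot` when it returns True. Requiring all hosts to fail on every attempt keeps a single lost ping from rebooting the router.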
Does anyone know what those messages indicate, and how to make them stop?
Or at least, how to diagnose this?
--
Ticket URL: <https://dev.openwrt.org/ticket/19500>
OpenWrt <http://openwrt.org>
Opensource Wireless Router Technology