#19500: Mikrotik Routerboard RB493G switch flakiness
-----------------------+------------------------
 Reporter:  russell@…  |      Owner:  developers
     Type:  defect     |     Status:  new
 Priority:  normal     |  Milestone:
Component:  packages   |    Version:  Trunk
 Keywords:             |
-----------------------+------------------------
 This device has 9 gigabit ports, divided between two AR8316 switch chips
 on the board.
 Here is a fragment of the boot log from a recent build, r45385:

 {{{
 [    3.190000] switch0: Atheros AR8316 rev. 1 switch registered on gpio-ffffffff
 [    3.200000] libphy: GPIO Bitbanged MDIO: probed
 [    3.230000] switch1: Atheros AR8316 rev. 1 switch registered on ag71xx-mdio.0
 [    3.240000] libphy: ag71xx_mdio: probed
 [    3.540000] ar8316: Using port 4 as switch port
 [    3.680000] ag71xx ag71xx.1: connected to PHY at gpio-ffffffff:00 [uid=004dd041, driver=Atheros AR8216/AR8236/AR8316]
 [    3.690000] eth0: Atheros AG71xx at 0xba000000, irq 5, mode:RGMII
 [    4.000000] ar8316: Using port 4 as switch port
 [    4.140000] ag71xx ag71xx.0: connected to PHY at ag71xx-mdio.0:00 [uid=004dd041, driver=Atheros AR8216/AR8236/AR8316]
 [    4.150000] eth1: Atheros AG71xx at 0xb9000000, irq 4, mode:RGMII
 }}}
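
 The boot log is what ties eth0 to a particular switch chip: switch0
 registers on the bit-banged `gpio-ffffffff` MDIO bus, and the MAC that
 comes up as eth0 connects to the PHY at `gpio-ffffffff:00`. The sketch
 below makes that mapping mechanically. It is a heuristic, assuming
 unwrapped dmesg output (one message per line) and that the "connected to
 PHY" line immediately precedes its `ethN` line, as it does above; the
 function and regex names are hypothetical.

```python
import re

# Heuristic: a switch chip and the PHY an ag71xx MAC attaches to share an MDIO bus name.
SWITCH_RE = re.compile(r"\] (switch\d+): .* switch registered on (\S+)$")
PHY_RE = re.compile(r"connected to PHY at (\S+):[0-9a-f]+")
ETH_RE = re.compile(r"\] (eth\d+): Atheros AG71xx at ")


def map_eth_to_switch(lines):
    """Return {ethN: (switchN or None, mdio_bus)} from unwrapped boot-log lines."""
    bus_to_switch = {}
    pending_bus = None  # bus named by the most recent "connected to PHY" line
    mapping = {}
    for line in lines:
        line = line.strip()
        m = SWITCH_RE.search(line)
        if m:
            bus_to_switch[m.group(2)] = m.group(1)
            continue
        m = PHY_RE.search(line)
        if m:
            pending_bus = m.group(1)
            continue
        m = ETH_RE.search(line)
        if m and pending_bus is not None:
            # Pair this interface with the bus announced just before it.
            mapping[m.group(1)] = (bus_to_switch.get(pending_bus), pending_bus)
            pending_bus = None
    return mapping
```

 Fed the boot log above, this should pair eth0 with switch0 on
 `gpio-ffffffff` and eth1 with switch1 on `ag71xx-mdio.0`.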

 I reported this on #openwrt-devel a few years ago, but apparently never
 followed up with a ticket, so here it is.  I see strange console
 messages about one of the switch chips (the one attached to eth0)
 reporting speed and duplex changes and/or port status changes.  The
 affected ports connect to a couple of Ubiquiti Bullet MxHP units on my
 roof.  The Bullets themselves are not misbehaving, afaict.  Here is a
 sample of the messages:

 {{{
 [285254.030000] eth0: link up (1000Mbps/Full duplex)
 [286028.070000] eth0: link up (1000Mbps/Half duplex)
 [286030.070000] eth0: link up (1000Mbps/Full duplex)
 [286622.070000] eth0: link up (1000Mbps/Half duplex)
 [286624.070000] eth0: link up (1000Mbps/Full duplex)
 [286854.070000] eth0: link up (10Mbps/Half duplex)
 [286856.070000] eth0: link up (1000Mbps/Full duplex)
 [289766.140000] Atheros AR8216/AR8236/AR8316 gpio-ffffffff:00: Port 3 is down
 [289768.150000] Atheros AR8216/AR8236/AR8316 gpio-ffffffff:00: Port 3 is up
 [297282.220000] Atheros AR8216/AR8236/AR8316 gpio-ffffffff:00: Port 2 is down
 [297284.220000] Atheros AR8216/AR8236/AR8316 gpio-ffffffff:00: Port 2 is up
 [299710.220000] Atheros AR8216/AR8236/AR8316 gpio-ffffffff:00: Port 3 is down
 [299712.230000] Atheros AR8216/AR8236/AR8316 gpio-ffffffff:00: Port 3 is up
 [303780.230000] eth0: link up (10Mbps/Half duplex)
 [303782.230000] eth0: link up (1000Mbps/Full duplex)
 [304704.230000] Atheros AR8216/AR8236/AR8316 gpio-ffffffff:00: Port 2 is down
 [304706.230000] Atheros AR8216/AR8236/AR8316 gpio-ffffffff:00: Port 2 is up
 [313782.240000] Atheros AR8216/AR8236/AR8316 gpio-ffffffff:00: Port 3 is down
 [313784.240000] Atheros AR8216/AR8236/AR8316 gpio-ffffffff:00: Port 3 is up
 [315150.240000] Atheros AR8216/AR8236/AR8316 gpio-ffffffff:00: Port 3 is down
 [315152.240000] Atheros AR8216/AR8236/AR8316 gpio-ffffffff:00: Port 3 is up
 [316114.260000] eth0: link up (1000Mbps/Half duplex)
 [316116.260000] eth0: link up (1000Mbps/Full duplex)
 [318226.270000] Atheros AR8216/AR8236/AR8316 gpio-ffffffff:00: Port 3 is down
 [318228.270000] Atheros AR8216/AR8236/AR8316 gpio-ffffffff:00: Port 3 is up
 [319014.270000] eth0: link up (10Mbps/Half duplex)
 [319016.270000] eth0: link up (1000Mbps/Full duplex)
 [319936.270000] eth0: link up (1000Mbps/Half duplex)
 [319938.270000] eth0: link up (1000Mbps/Full duplex)
 [323430.300000] eth0: link up (1000Mbps/Half duplex)
 [323432.300000] eth0: link up (1000Mbps/Full duplex)
 [327352.370000] eth0: link up (1000Mbps/Half duplex)
 [327354.370000] eth0: link up (1000Mbps/Full duplex)
 [344916.630000] eth0: link up (10Mbps/Half duplex)
 [344918.630000] eth0: link up (1000Mbps/Full duplex)
 }}}

 Note the two-second misbehavior and recovery, which is the classic
 pattern here.
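
 The two-second pattern can be measured rather than eyeballed. Below is a
 minimal Python sketch, assuming unwrapped dmesg output (one message per
 line, unlike the wrapped mail above); the function names and the choice
 to treat anything below 1000Mbps/Full as "degraded" are hypothetical,
 not taken from the driver. It pairs each "Port N is down" with the
 next "Port N is up", and each degraded `link up` report with the next
 1000Mbps/Full one, and reports how long each episode lasted.

```python
import re

PORT_RE = re.compile(r"^\[\s*(\d+\.\d+)\].*: Port (\d+) is (up|down)$")
LINK_RE = re.compile(r"^\[\s*(\d+\.\d+)\] (\S+): link up \((\d+)Mbps/(Full|Half) duplex\)$")


def port_flaps(lines):
    """Yield (port, down_timestamp, seconds_down) for each down->up pair."""
    down_at = {}
    for line in lines:
        m = PORT_RE.match(line.strip())
        if not m:
            continue
        ts, port, state = float(m.group(1)), int(m.group(2)), m.group(3)
        if state == "down":
            down_at[port] = ts
        elif port in down_at:
            start = down_at.pop(port)
            yield port, start, round(ts - start, 2)


def degraded_links(lines, full_speed=1000):
    """Yield (iface, start_timestamp, seconds, mode) for spells below full_speed/Full."""
    since = {}
    for line in lines:
        m = LINK_RE.match(line.strip())
        if not m:
            continue
        ts, iface = float(m.group(1)), m.group(2)
        speed, duplex = int(m.group(3)), m.group(4)
        if speed < full_speed or duplex == "Half":
            # Keep only the first degraded report of a spell.
            since.setdefault(iface, (ts, f"{speed}Mbps/{duplex}"))
        elif iface in since:
            start, mode = since.pop(iface)
            yield iface, start, round(ts - start, 2), mode
```

 Run over the sample above, every episode should come out at roughly
 2 seconds, which makes it easy to see whether a future build changes
 the duration or only the frequency.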

 Historically, from time to time, the whole switch seems to stop passing
 traffic.  I have a watchdog script that tries to ping both Ubiquiti
 devices and reboots the RB493G if it can't ping either one.  I haven't
 seen that since the recent image was flashed, but it's only been a few
 days.
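
 A watchdog of that shape can be sketched as follows. This is not the
 reporter's script: it is a hypothetical Python version that reads "can't
 ping either one" as "neither Bullet answers", and the target addresses
 are placeholders. It relies on `ping` exiting 0 when at least one echo
 reply arrives, which is the usual behavior of both BusyBox and iputils
 `ping` with `-c`/`-W`.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical management addresses of the two Bullets; replace with real ones.
TARGETS = ["192.168.1.20", "192.168.1.21"]


def ping_ok(host: str, count: int = 3, wait: int = 2) -> bool:
    """Return True if ping exits 0, i.e. at least one echo reply came back."""
    result = subprocess.run(
        ["ping", "-c", str(count), "-W", str(wait), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def should_reboot(reachable: Sequence[bool]) -> bool:
    # Assumption: reboot only when *none* of the targets answered.
    return not any(reachable)


def watchdog_once(
    targets: Sequence[str] = TARGETS,
    ping: Callable[[str], bool] = ping_ok,
    reboot: Callable[[], object] = lambda: subprocess.run(["reboot"]),
) -> bool:
    """Ping every target once; call reboot() if all fail. Return True if it rebooted."""
    reachable = [ping(host) for host in targets]
    if should_reboot(reachable):
        reboot()
        return True
    return False
```

 Passing `ping` and `reboot` in as parameters keeps the decision logic
 testable off the router; on the device, `watchdog_once()` would be run
 periodically (for example from cron) with the defaults.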

 Does anyone know what those messages indicate, and how to make them stop?
 Or at least, how to diagnose this?

--
Ticket URL: <https://dev.openwrt.org/ticket/19500>
OpenWrt <http://openwrt.org>
Opensource Wireless Router Technology