On 03/02/18(Sat) 18:08, Alexander Bluhm wrote:
> Hi,
> 
> I have seen some network hangs with my gem(4) interface on sparc64.
> Receiving packets stopped sometimes, ifconfig down/up made it work
> again.
> 
> In the tick timeout handler refill the receive ring if it is empty.
> The logic is taken from hme(4).  Also splnet() should protect the
> register access and ifp counters.
> 
> As this did not help, I looked into gem_rx_watchdog().  There is a
> workaround for a hardware bug.  It resets the hardware when there
> is no progress.  I my case the fifo pointers advanced a bit, but
> finally got stuck anyway.  So I restart the receive watchdog timeout
> again.
> 
> Since this fix the interface keeps running.
> 
> ok?

I've seen this problem sporadically on macppc has well.  So ok with me.

> Index: dev/ic/gem.c
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/dev/ic/gem.c,v
> retrieving revision 1.122
> diff -u -p -r1.122 gem.c
> --- dev/ic/gem.c      8 Jun 2017 01:34:00 -0000       1.122
> +++ dev/ic/gem.c      3 Feb 2018 16:49:36 -0000
> @@ -423,6 +423,7 @@ gem_tick(void *arg)
>       int s;
>       u_int32_t v;
>  
> +     s = splnet();
>       /* unload collisions counters */
>       v = bus_space_read_4(t, mac, GEM_MAC_EXCESS_COLL_CNT) +
>           bus_space_read_4(t, mac, GEM_MAC_LATE_COLL_CNT);
> @@ -448,7 +449,15 @@ gem_tick(void *arg)
>       bus_space_write_4(t, mac, GEM_MAC_RX_CRC_ERR_CNT, 0);
>       bus_space_write_4(t, mac, GEM_MAC_RX_CODE_VIOL, 0);
>  
> -     s = splnet();
> +     /*
> +      * If buffer allocation fails, the receive ring may become
> +      * empty. There is no receive interrupt to recover from that.
> +      */
> +     if (if_rxr_inuse(&sc->sc_rx_ring) == 0) {
> +             gem_fill_rx_ring(sc);
> +             bus_space_write_4(t, mac, GEM_RX_KICK, sc->sc_rx_prod);
> +     }
> +
>       mii_tick(&sc->sc_mii);
>       splx(s);
>  
> @@ -1200,17 +1209,26 @@ gem_rx_watchdog(void *arg)
>       rx_fifo_wr_ptr = bus_space_read_4(t, h, GEM_RX_FIFO_WR_PTR);
>       rx_fifo_rd_ptr = bus_space_read_4(t, h, GEM_RX_FIFO_RD_PTR);
>       state = bus_space_read_4(t, h, GEM_MAC_MAC_STATE);
> -     if ((state & GEM_MAC_STATE_OVERFLOW) == GEM_MAC_STATE_OVERFLOW &&
> -         ((rx_fifo_wr_ptr == rx_fifo_rd_ptr) ||
> -          ((sc->sc_rx_fifo_wr_ptr == rx_fifo_wr_ptr) &&
> -           (sc->sc_rx_fifo_rd_ptr == rx_fifo_rd_ptr)))) {
> -             /*
> -              * The RX state machine is still in overflow state and
> -              * the RX FIFO write and read pointers seem to be
> -              * stuck.  Whack the chip over the head to get things
> -              * going again.
> -              */
> -             gem_init(ifp);
> +     if ((state & GEM_MAC_STATE_OVERFLOW) == GEM_MAC_STATE_OVERFLOW) {
> +             if ((rx_fifo_wr_ptr == rx_fifo_rd_ptr) ||
> +                  ((sc->sc_rx_fifo_wr_ptr == rx_fifo_wr_ptr) &&
> +                   (sc->sc_rx_fifo_rd_ptr == rx_fifo_rd_ptr))) {
> +                     /*
> +                      * The RX state machine is still in overflow state and
> +                      * the RX FIFO write and read pointers seem to be
> +                      * stuck.  Whack the chip over the head to get things
> +                      * going again.
> +                      */
> +                     gem_init(ifp);
> +             } else {
> +                     /*
> +                      * We made some progress, but is not certain that the
> +                      * overflow condition has been resolved.  Check again.
> +                      */
> +                     sc->sc_rx_fifo_wr_ptr = rx_fifo_wr_ptr;
> +                     sc->sc_rx_fifo_rd_ptr = rx_fifo_rd_ptr;
> +                     timeout_add_msec(&sc->sc_rx_watchdog, 400);
> +             }
>       }
>  }
>  
> 

Reply via email to