Still looking for OKs...

On Sat, Feb 11, 2012 at 01:27 +0100, Mike Belopuhov wrote:
> Hi,
> 
> As it became evident, ix is driven by Low Latency Interrupts
> on 82599 to do all sorts of processing instead of the regular
> Rx/Tx queue interrupts.  LLI is an additional facility that
> is aimed to do out of band signalling for certain conditions.
> Therefore it has its own interrupt moderation settings
> independent of the Tx/Rx queue interrupts.  There are a bunch
> of conditions that cause LLI but only one of them is turned
> on by default: signal when rx ring shrinks below 64
> descriptors.  Certainly, MCLGETI causes loads of these
> interrupts because most of the time we're running with less
> than 64 rx descriptors.  On top of that this works as a
> positive feedback: you get more LLIs, you lose ticks, ring
> shrinks, you get more LLIs and so on.
> 
> Previously we had LLIs completely unmoderated: we have seen
> 30-50k intr/s under high load.  At that point Claudio and I
> figured out that it was the LLIs that were killing us.  To
> mitigate that, Claudio found a way to
> moderate them.  Unfortunately the longest inter-interrupt
> interval Intel allows us to set up for LLI is 60 us.  That
> accounts for 16 666 intr/s, which is precisely the number
> you'd see with an 82599.  According to the (surprisingly
> incorrectly documented) interrupt throttling parameters used
> in the driver, a regular interrupt would only happen once
> every 500 us.
> 
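As a quick check of the arithmetic above (an illustrative sketch, not
part of the diff; intr_per_sec is a made-up helper name), the interrupt
rate ceiling is simply the reciprocal of the minimum inter-interrupt
interval: 60 us gives 16 666 intr/s and 500 us gives 2000 intr/s.

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: upper bound on the
 * interrupt rate (per second) for a given minimum inter-interrupt
 * interval in microseconds.  Integer division truncates the result.
 */
static uint32_t
intr_per_sec(uint32_t interval_us)
{
	if (interval_us == 0)
		return 0;	/* no moderation interval, no meaningful bound */
	return 1000000 / interval_us;
}
```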
> So in my previous attempt at fixing ix performance I made
> rxeof do more work, staying longer in the interrupt context
> and processing more than 4 rx descriptors per interrupt:
> http://marc.info/?l=openbsd-tech&m=132222989805096&w=2
> Obviously, I was working around LLIs.
> 
> To disable LLI firing in the event of the Rx ring depletion
> one needs to write 0 to the threshold part of the SRRCTL
> register.  In the diff below I suggest starting with a
> zeroed value of the register and building up from that.
> 
> I have also changed how we set up the inter-interrupt
> interval to calculate an appropriate register value based
> on the desired number of interrupts per second.  Tests
> have shown that 8000 intr/s (or 125 us intervals) is good
> enough throughput- and latency-wise.  On a selected platform
> (HP DL160 G6 with Xeon E5504, 2GHz) the 82599 has shown the
> following results:
> 
>  Interrupt rate, | Throughput for 64 byte | Average latency,
>     intr/s       |     packet, kpps       |       ms
>  ----------------+------------------------+-----------------
>       8000       |          535           |      0.1
>       4000       |          550           |      0.3
> 
> As you can see OpenBSD performs very well within the 125 us ..
> 250 us range, though going from 8000 to 4000 intr/s buys just
> a 3% increase in throughput while tripling the average latency.
> 
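For illustration, the register value computation from the diff below
can be tried out in userland.  This is a sketch only: eitr_interval and
enum mac_kind are made-up names, and the IXGBE_EITR_LLI_MOD and
IXGBE_EITR_CNT_WDIS flag bits are left out because their values are not
shown in this mail.  Note that the 0xff8 mask clears the low three
bits, so for 8000 intr/s on the 82599 the quotient 500 ends up as 496.

```c
#include <stdint.h>

#define EITR_ITR_MASK	0xff8	/* same mask the diff applies */

/* Hypothetical stand-in for the driver's MAC type check. */
enum mac_kind { MAC_82598, MAC_82599 };

/*
 * Mirror of the interval calculation in the diff: divide a per-MAC
 * constant by the desired interrupt rate and mask off the low bits.
 * The caller must pass a non-zero rate.
 */
static uint32_t
eitr_interval(enum mac_kind mac, uint32_t ints_per_sec)
{
	uint32_t base = (mac == MAC_82598) ? 8000000 : 4000000;

	return (base / ints_per_sec) & EITR_ITR_MASK;
}
```

With these definitions, eitr_interval(MAC_82599, 8000) yields 496 and
eitr_interval(MAC_82598, 8000) yields 1000.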
> Similar results were obtained from Core 2 Duo and Xeon E56xx
> series CPUs.
> 
> [Please note that test kernels included the well-known
>  icmp error copy fix to achieve better performance]
> 
> Although it's possible to tune it a bit more, I suggest
> the simple diff below for inclusion into 5.1.
> 
> OK?
> 
> Index: if_ix.c
> ===================================================================
> RCS file: /cvs/openbsd/src/sys/dev/pci/if_ix.c,v
> retrieving revision 1.60
> diff -u -p -r1.60 if_ix.c
> --- if_ix.c   20 Jan 2012 14:48:49 -0000      1.60
> +++ if_ix.c   10 Feb 2012 20:52:17 -0000
> @@ -634,8 +634,8 @@ ixgbe_init(void *arg)
>       struct ix_softc *sc = (struct ix_softc *)arg;
>       struct ifnet    *ifp = &sc->arpcom.ac_if;
>       struct rx_ring  *rxr = sc->rx_rings;
> -     uint32_t         k, txdctl, rxdctl, rxctrl, mhadd, gpie;
> -     int              i, s, err, llimode = 0;
> +     uint32_t         k, txdctl, rxdctl, rxctrl, mhadd, gpie, itr;
> +     int              i, s, err;
>  
>       INIT_DEBUGOUT("ixgbe_init: begin");
>  
> @@ -703,7 +703,6 @@ ixgbe_init(void *arg)
>                * interrupts hitting the card when the ring is getting full.
>                */
>               gpie |= 0xf << IXGBE_GPIE_LLI_DELAY_SHIFT;
> -             llimode = IXGBE_EITR_LLI_MOD;
>       }
>  
>       if (sc->msix > 1) {
> @@ -807,9 +806,14 @@ ixgbe_init(void *arg)
>               }
>       }
>  
> -     /* Set moderation on the Link interrupt */
> -     IXGBE_WRITE_REG(&sc->hw, IXGBE_EITR(sc->linkvec),
> -         IXGBE_LINK_ITR | llimode);
> +     /* Setup interrupt moderation */
> +     if (sc->hw.mac.type == ixgbe_mac_82598EB)
> +             itr = (8000000 / IXGBE_INTS_PER_SEC) & 0xff8;
> +     else {
> +             itr = (4000000 / IXGBE_INTS_PER_SEC) & 0xff8;
> +             itr |= IXGBE_EITR_LLI_MOD | IXGBE_EITR_CNT_WDIS;
> +     }
> +     IXGBE_WRITE_REG(&sc->hw, IXGBE_EITR(0), itr);
>  
>       /* Config/Enable Link */
>       ixgbe_config_link(sc);
> @@ -2832,10 +2836,7 @@ ixgbe_initialize_receive_units(struct ix
>                   sc->num_rx_desc * sizeof(union ixgbe_adv_rx_desc));
>  
>               /* Set up the SRRCTL register */
> -             srrctl = IXGBE_READ_REG(&sc->hw, IXGBE_SRRCTL(i));
> -             srrctl &= ~IXGBE_SRRCTL_BSIZEHDR_MASK;
> -             srrctl &= ~IXGBE_SRRCTL_BSIZEPKT_MASK;
> -             srrctl |= bufsz;
> +             srrctl = bufsz;
>               if (rxr->hdr_split) {
>                       /* Use a standard mbuf for the header */
>                       srrctl |= ((IXGBE_RX_HDR <<
> Index: if_ix.h
> ===================================================================
> RCS file: /cvs/openbsd/src/sys/dev/pci/if_ix.h,v
> retrieving revision 1.13
> diff -u -p -r1.13 if_ix.h
> --- if_ix.h   15 Jun 2011 00:03:00 -0000      1.13
> +++ if_ix.h   10 Feb 2012 20:50:59 -0000
> @@ -119,12 +119,9 @@
>  #define IXGBE_QUEUE_HUNG                2
>  
>  /*
> - * Interrupt Moderation parameters 
> + * Interrupt Moderation parameters
>   */
> -#define IXGBE_LOW_LATENCY       128
> -#define IXGBE_AVE_LATENCY       400
> -#define IXGBE_BULK_LATENCY      1200
> -#define IXGBE_LINK_ITR          2000
> +#define IXGBE_INTS_PER_SEC           8000
>  
>  /* Used for auto RX queue configuration */
>  extern int mp_ncpus;
