Hi,

As it turned out, on the 82599 ix is driven by Low Latency
Interrupts to do all sorts of processing instead of the regular
Rx/Tx queue interrupts.  LLI is an additional facility intended
for out-of-band signalling of certain conditions.  Therefore it
has its own interrupt moderation settings, independent of the
Tx/Rx queue interrupts.  There are a bunch of conditions that
can trigger an LLI, but only one of them is turned on by
default: signal when the rx ring shrinks below 64 descriptors.
Naturally, MCLGETI causes loads of these interrupts, because
most of the time we're running with fewer than 64 rx
descriptors.  On top of that, this works as a positive feedback
loop: you get more LLIs, you lose ticks, the ring shrinks, you
get even more LLIs, and so on.

Previously we had LLIs completely unmoderated: 30-50k intr/s
under high load was nothing unusual.  At that point Claudio and
I figured out that it was the LLIs that were killing us.  To
mitigate that, Claudio found a way to moderate them.
Unfortunately, the longest inter-interrupt interval Intel allows
us to set for LLI is 60 us.  That amounts to 16 666 intr/s, and
this is exactly the number you'd see with an 82599.  According
to the (surprisingly incorrectly documented) interrupt
throttling parameters used in the driver, a regular interrupt
would only fire once every 500 us.
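For reference, the interrupt rate implied by a fixed
inter-interrupt interval is just 1 000 000 divided by the
interval in microseconds.  A trivial C helper to check the
numbers above (the function name is my own, not something in
the driver):

```c
#include <assert.h>
#include <stdint.h>

/* Interrupts per second for a fixed interval given in microseconds. */
static uint32_t
interval_us_to_rate(uint32_t interval_us)
{
	assert(interval_us > 0);	/* avoid division by zero */
	return 1000000U / interval_us;	/* integer division truncates */
}
```

So the 60 us LLI ceiling gives interval_us_to_rate(60) == 16666,
while the 500 us regular interval gives only 2000 intr/s: LLIs
could fire more than eight times as often as the regular queue
interrupts.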

So in my previous attempt at fixing ix performance I made
rxeof do more work, staying longer in interrupt context and
processing more than 4 rx descriptors per interrupt:
http://marc.info/?l=openbsd-tech&m=132222989805096&w=2
In effect, I was just working around the LLIs.

To stop LLIs from firing when the Rx ring gets depleted, one
needs to write 0 to the threshold field of the SRRCTL register.
In the diff below I suggest starting with a zeroed register
value and building it up gradually from there.
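To illustrate why building SRRCTL from zero matters, here is a
small C sketch contrasting the old read-modify-write with the
new approach.  The field masks (packet buffer size, header
buffer size, and the rx descriptor minimum threshold that
triggers the LLI) are my assumption of the 82599 layout,
modelled on ixgbe_type.h, and the "previous" register contents
used below are made up; please check the datasheet before
relying on them.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 82599 SRRCTL field masks (verify against the datasheet). */
#define EX_SRRCTL_BSIZEPKT_MASK	0x0000007fU	/* packet buffer size */
#define EX_SRRCTL_BSIZEHDR_MASK	0x00003f00U	/* header buffer size */
#define EX_SRRCTL_RDMTS_MASK	0x01c00000U	/* rx desc. min. threshold */

/* Old approach: clear only the buffer sizes, keep everything else. */
static uint32_t
srrctl_rmw(uint32_t cur, uint32_t bufsz)
{
	assert((bufsz & ~EX_SRRCTL_BSIZEPKT_MASK) == 0);
	cur &= ~EX_SRRCTL_BSIZEHDR_MASK;
	cur &= ~EX_SRRCTL_BSIZEPKT_MASK;
	return cur | bufsz;	/* any threshold bits survive */
}

/* New approach: start from zero, so the threshold field stays 0. */
static uint32_t
srrctl_from_zero(uint32_t bufsz)
{
	assert((bufsz & ~EX_SRRCTL_BSIZEPKT_MASK) == 0);
	return bufsz;	/* header size, descriptor type etc. are OR-ed in later */
}
```

With a hypothetical previous value of 0x00400402 (threshold
field = 1) and bufsz = 2, srrctl_rmw() returns 0x00400002, so
the threshold and therefore the LLI are kept, whereas
srrctl_from_zero() returns 0x00000002.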

I have also changed how we set up the inter-interrupt interval:
the register value is now calculated from the desired number of
interrupts per second.  Tests have shown that 8000 intr/s (i.e.
125 us intervals) is good enough both throughput- and
latency-wise.  On one test platform (HP DL160 G6 with a 2 GHz
Xeon E5504) the 82599 has shown the following results:

 Interrupt rate, | Throughput for 64 byte | Average latency,
    intr/s       |     packet, kpps       |       ms
 ----------------+------------------------+-----------------
      8000       |          535           |      0.1
      4000       |          550           |      0.3

As you can see, OpenBSD performs very well within the
125 us .. 250 us range, though a mere 3% increase in throughput
(535 to 550 kpps) triples the average latency (0.1 to 0.3 ms).

Similar results were obtained from Core2 duo and Xeon E56xx
series CPUs.

[Please note that test kernels included the well-known
 icmp error copy fix to achieve better performance]
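The interval conversion in the diff can be checked in
isolation.  This sketch copies the arithmetic from the
ixgbe_init() hunk (4000000 / rate for the 82599, 8000000 / rate
for the 82598, masked with 0xff8); the function name is
hypothetical, and the IXGBE_EITR_LLI_MOD / IXGBE_EITR_CNT_WDIS
flag bits the diff ORs in are left out:

```c
#include <assert.h>
#include <stdint.h>

#define EX_EITR_INTERVAL_MASK	0xff8U	/* mask used in the diff (bits 3..11) */

/* Register interval value for a desired interrupt rate, as in the diff. */
static uint32_t
eitr_interval(int is_82598, uint32_t ints_per_sec)
{
	uint32_t scale = is_82598 ? 8000000U : 4000000U;

	assert(ints_per_sec > 0);	/* avoid division by zero */
	return (scale / ints_per_sec) & EX_EITR_INTERVAL_MASK;
}
```

At the proposed IXGBE_INTS_PER_SEC of 8000 this yields 0x1f0
(500 with the low three bits masked off) on the 82599 and 0x3e8
on the 82598; at 4000 intr/s the values become 0x3e8 and 0x7d0.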

While it's possible to tune this a bit further, I suggest the
simple diff below for inclusion in 5.1.

OK?

Index: if_ix.c
===================================================================
RCS file: /cvs/openbsd/src/sys/dev/pci/if_ix.c,v
retrieving revision 1.60
diff -u -p -r1.60 if_ix.c
--- if_ix.c     20 Jan 2012 14:48:49 -0000      1.60
+++ if_ix.c     10 Feb 2012 20:52:17 -0000
@@ -634,8 +634,8 @@ ixgbe_init(void *arg)
        struct ix_softc *sc = (struct ix_softc *)arg;
        struct ifnet    *ifp = &sc->arpcom.ac_if;
        struct rx_ring  *rxr = sc->rx_rings;
-       uint32_t         k, txdctl, rxdctl, rxctrl, mhadd, gpie;
-       int              i, s, err, llimode = 0;
+       uint32_t         k, txdctl, rxdctl, rxctrl, mhadd, gpie, itr;
+       int              i, s, err;
 
        INIT_DEBUGOUT("ixgbe_init: begin");
 
@@ -703,7 +703,6 @@ ixgbe_init(void *arg)
                 * interrupts hitting the card when the ring is getting full.
                 */
                gpie |= 0xf << IXGBE_GPIE_LLI_DELAY_SHIFT;
-               llimode = IXGBE_EITR_LLI_MOD;
        }
 
        if (sc->msix > 1) {
@@ -807,9 +806,14 @@ ixgbe_init(void *arg)
                }
        }
 
-       /* Set moderation on the Link interrupt */
-       IXGBE_WRITE_REG(&sc->hw, IXGBE_EITR(sc->linkvec),
-           IXGBE_LINK_ITR | llimode);
+       /* Setup interrupt moderation */
+       if (sc->hw.mac.type == ixgbe_mac_82598EB)
+               itr = (8000000 / IXGBE_INTS_PER_SEC) & 0xff8;
+       else {
+               itr = (4000000 / IXGBE_INTS_PER_SEC) & 0xff8;
+               itr |= IXGBE_EITR_LLI_MOD | IXGBE_EITR_CNT_WDIS;
+       }
+       IXGBE_WRITE_REG(&sc->hw, IXGBE_EITR(0), itr);
 
        /* Config/Enable Link */
        ixgbe_config_link(sc);
@@ -2832,10 +2836,7 @@ ixgbe_initialize_receive_units(struct ix
                    sc->num_rx_desc * sizeof(union ixgbe_adv_rx_desc));
 
                /* Set up the SRRCTL register */
-               srrctl = IXGBE_READ_REG(&sc->hw, IXGBE_SRRCTL(i));
-               srrctl &= ~IXGBE_SRRCTL_BSIZEHDR_MASK;
-               srrctl &= ~IXGBE_SRRCTL_BSIZEPKT_MASK;
-               srrctl |= bufsz;
+               srrctl = bufsz;
                if (rxr->hdr_split) {
                        /* Use a standard mbuf for the header */
                        srrctl |= ((IXGBE_RX_HDR <<
Index: if_ix.h
===================================================================
RCS file: /cvs/openbsd/src/sys/dev/pci/if_ix.h,v
retrieving revision 1.13
diff -u -p -r1.13 if_ix.h
--- if_ix.h     15 Jun 2011 00:03:00 -0000      1.13
+++ if_ix.h     10 Feb 2012 20:50:59 -0000
@@ -119,12 +119,9 @@
 #define IXGBE_QUEUE_HUNG                2
 
 /*
- * Interrupt Moderation parameters 
+ * Interrupt Moderation parameters
  */
-#define IXGBE_LOW_LATENCY       128
-#define IXGBE_AVE_LATENCY       400
-#define IXGBE_BULK_LATENCY      1200
-#define IXGBE_LINK_ITR          2000
+#define IXGBE_INTS_PER_SEC             8000
 
 /* Used for auto RX queue configuration */
 extern int mp_ncpus;
