Hi,

As it became evident, ix on 82599 is driven by Low Latency Interrupts (LLI) to do all sorts of processing instead of the regular Rx/Tx queue interrupts. LLI is an additional facility that is aimed at out-of-band signalling for certain conditions. Therefore it has its own interrupt moderation settings, independent of the Tx/Rx queue interrupts. There are a bunch of conditions that cause an LLI, but only one of them is turned on by default: signal when the rx ring shrinks below 64 descriptors. Certainly, MCLGETI causes loads of these interrupts because most of the time we're running with fewer than 64 rx descriptors. On top of that, this works as a positive feedback loop: you get more LLIs, you lose ticks, the ring shrinks, you get more LLIs and so on.

Previously we had LLIs completely unmoderated: 30-50k intr/s under high load is something we have seen. At that point Claudio and I figured out that it was the LLIs that were killing us. To mitigate that, Claudio found a way to moderate them. Unfortunately, the longest inter-interrupt interval Intel allows us to set up for LLI is 60 us. That accounts for 16 666 intr/s, and this is precisely the number you'd see with an 82599.

According to the (surprisingly incorrectly documented) interrupt throttling parameters used in the driver, a regular interrupt would only happen once in 500 us. So in my previous attempt at fixing ix performance I made rxeof do more work, staying longer in the interrupt context and processing more than 4 rx descriptors per interrupt:

http://marc.info/?l=openbsd-tech&m=132222989805096&w=2

Obviously, I was working around LLIs.

To disable LLI firing in the event of Rx ring depletion, one needs to write 0 to the threshold part of the SRRCTL register. In the diff below I suggest starting with a zeroed value of the register and building it up gradually from that.

I have also changed how we set up the inter-interrupt interval: the register value is now calculated from the desired number of interrupts per second. Tests have shown that 8000 intr/s (or 125 us intervals) is good enough both throughput and latency wise.

On a selected platform (HP DL160 G6 with Xeon E5504, 2GHz) 82599 has shown the following results:

Interrupt rate, | Throughput for 64 byte | Average latency,
     intr/s     |      packet, kpps      |       ms
----------------+------------------------+-----------------
      8000      |          535           |       0.1
      4000      |          550           |       0.3

As you can see, OpenBSD performs very well within the 125 us .. 250 us range, though just a 3% increase in throughput triples the average latency. Similar results were obtained from Core2 Duo and Xeon E56xx series CPUs.
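
For anyone who wants to double-check the interval and rate figures quoted above, here is a minimal sketch of the arithmetic. The function names are made up for illustration and are not part of the driver; integer division truncates, which is why 60 us comes out as 16 666 and not 16 667.

```c
#include <stdint.h>

/*
 * Hypothetical helpers (not in if_ix.c): convert between an
 * inter-interrupt interval in microseconds and an interrupt rate
 * in interrupts per second. Integer division truncates.
 */
uint32_t
ex_intr_per_sec(uint32_t interval_us)
{
	return 1000000U / interval_us;
}

uint32_t
ex_interval_us(uint32_t intr_per_sec)
{
	return 1000000U / intr_per_sec;
}
```

With these, ex_intr_per_sec(60) gives the 16 666 intr/s LLI ceiling, ex_intr_per_sec(500) gives the 2000 intr/s implied by a 500 us interval, and ex_interval_us(8000) and ex_interval_us(4000) give the 125 us and 250 us intervals from the table.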

[Please note that the test kernels included the well-known icmp error copy fix to achieve better performance]

It's possible to tune it a bit more, but for now I suggest the simple diff below for inclusion into 5.1.

OK?

Index: if_ix.c
===================================================================
RCS file: /cvs/openbsd/src/sys/dev/pci/if_ix.c,v
retrieving revision 1.60
diff -u -p -r1.60 if_ix.c
--- if_ix.c	20 Jan 2012 14:48:49 -0000	1.60
+++ if_ix.c	10 Feb 2012 20:52:17 -0000
@@ -634,8 +634,8 @@ ixgbe_init(void *arg)
 	struct ix_softc	*sc = (struct ix_softc *)arg;
 	struct ifnet	*ifp = &sc->arpcom.ac_if;
 	struct rx_ring	*rxr = sc->rx_rings;
-	uint32_t	 k, txdctl, rxdctl, rxctrl, mhadd, gpie;
-	int		 i, s, err, llimode = 0;
+	uint32_t	 k, txdctl, rxdctl, rxctrl, mhadd, gpie, itr;
+	int		 i, s, err;
 
 	INIT_DEBUGOUT("ixgbe_init: begin");
 
@@ -703,7 +703,6 @@ ixgbe_init(void *arg)
 		 * interrupts hitting the card when the ring is getting full.
 		 */
 		gpie |= 0xf << IXGBE_GPIE_LLI_DELAY_SHIFT;
-		llimode = IXGBE_EITR_LLI_MOD;
 	}
 
 	if (sc->msix > 1) {
@@ -807,9 +806,14 @@ ixgbe_init(void *arg)
 		}
 	}
 
-	/* Set moderation on the Link interrupt */
-	IXGBE_WRITE_REG(&sc->hw, IXGBE_EITR(sc->linkvec),
-	    IXGBE_LINK_ITR | llimode);
+	/* Setup interrupt moderation */
+	if (sc->hw.mac.type == ixgbe_mac_82598EB)
+		itr = (8000000 / IXGBE_INTS_PER_SEC) & 0xff8;
+	else {
+		itr = (4000000 / IXGBE_INTS_PER_SEC) & 0xff8;
+		itr |= IXGBE_EITR_LLI_MOD | IXGBE_EITR_CNT_WDIS;
+	}
+	IXGBE_WRITE_REG(&sc->hw, IXGBE_EITR(0), itr);
 
 	/* Config/Enable Link */
 	ixgbe_config_link(sc);
@@ -2832,10 +2836,7 @@ ixgbe_initialize_receive_units(struct ix
 		    sc->num_rx_desc * sizeof(union ixgbe_adv_rx_desc));
 
 		/* Set up the SRRCTL register */
-		srrctl = IXGBE_READ_REG(&sc->hw, IXGBE_SRRCTL(i));
-		srrctl &= ~IXGBE_SRRCTL_BSIZEHDR_MASK;
-		srrctl &= ~IXGBE_SRRCTL_BSIZEPKT_MASK;
-		srrctl |= bufsz;
+		srrctl = bufsz;
 		if (rxr->hdr_split) {
 			/* Use a standard mbuf for the header */
 			srrctl |= ((IXGBE_RX_HDR <<
Index: if_ix.h
===================================================================
RCS file: /cvs/openbsd/src/sys/dev/pci/if_ix.h,v
retrieving revision 1.13
diff -u -p -r1.13 if_ix.h
--- if_ix.h	15 Jun 2011 00:03:00 -0000	1.13
+++ if_ix.h	10 Feb 2012 20:50:59 -0000
@@ -119,12 +119,9 @@
 #define IXGBE_QUEUE_HUNG	2
 
 /*
- * Interrupt Moderation parameters 
+ * Interrupt Moderation parameters
  */
-#define IXGBE_LOW_LATENCY	128
-#define IXGBE_AVE_LATENCY	400
-#define IXGBE_BULK_LATENCY	1200
-#define IXGBE_LINK_ITR		2000
+#define IXGBE_INTS_PER_SEC	8000
 
 /* Used for auto RX queue configuration */
 extern int mp_ncpus;
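
For reference, the interval part of the value that the ixgbe_init hunk writes to EITR can be worked out by hand. The sketch below only mirrors the arithmetic from the diff; the function name and the is_82598 flag are hypothetical, and the IXGBE_EITR_LLI_MOD and IXGBE_EITR_CNT_WDIS bits are left out because their values are not shown in this mail.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not in if_ix.c): reproduce the interval
 * calculation from the diff. The 8000000 and 4000000 constants and
 * the 0xff8 mask are taken verbatim from the patch; the mask clears
 * the three low-order bits of the result.
 */
uint32_t
ex_eitr_interval(int is_82598, uint32_t ints_per_sec)
{
	uint32_t base = is_82598 ? 8000000U : 4000000U;

	return (base / ints_per_sec) & 0xff8;
}
```

With IXGBE_INTS_PER_SEC at 8000 this gives 1000 (0x3e8) for 82598 and 496 (0x1f0) for the other MACs: 4000000 / 8000 is 500 (0x1f4), and the mask rounds that down to 496. At 4000 intr/s the non-82598 value would be 1000.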
