On Tue, Feb 25, 2020 at 03:07:47PM +0100, Stefan Sperling wrote:
> On Tue, Feb 25, 2020 at 01:06:31PM +0100, Tobias Heider wrote:
> > Hi,
> >
> > I tried to figure out the reason for the lost Tx receive interrupt
> > by comparing iwm with iwlwifi and I think our handling of the
> > periodic RX interrupt is a bit off.
>
> Very nice find.
>
> Not sure if the lost Tx interrupt is related, though.
I can't say whether it is actually related. I think writing
IWM_CSR_FH_INT_STATUS when we should not could potentially lead to lost
interrupts. In the end our only chance to find out is to reproduce the
issue and test with and without this patch.
> > In linux on receive of any of the possible RX interrupts the periodic
> > interrupt is disabled and then reenabled. As far as I understand this
> > is to make sure that no matter what kind of RX interrupt was received,
> > we make sure there will be another one in 8 ms when the newly activated
> > periodic interrupt will trigger.
>
> So, as far as I understand:
>
> The interrupt is called "periodic" but that is in fact misleading.
> The firmware offers a periodic interrupt mechanism which the Linux driver
> uses as a one-shot. The driver's goal is to generate just *one* interrupt
> in another 8 ms to address a producer / consumer race between firmware
> and driver during Rx.
>
> See Linux commit 74ba67edfcb235c0415a62d37493866c8380dc1d.
> Could we copy some of the elaborate comments from there?
>
Diff with more comments below.
> > iwm on the other hand disables the periodic interrupt only when a
> > periodic interrupt was received AND at the same time SW_RX or FH_RX is
> > set, and then reenables it only on a SW_RX or FH_RX interrupt that was
> > not periodic.
>
> Do you see a lower interrupt rate under Rx load with this diff?
>
> Currently, if the device is loaded with as much Rx as we can take then
> the system spends a lot of time in interrupt context. There have been
> reports of systems becoming noticeably less responsive in such situations.
> Is the periodic interrupt perhaps firing too often because we leave it
> enabled too long because of the bug you identified?
I have tested this with a 7260 and 8260 in different conditions and couldn't
see any noticeable difference with systat. With and without diff the time
spent in interrupts was at most 10%. The absolute numbers were also quite
similar. Maybe it's a hardware thing and mine is not affected.
> > A discrepany between iwm and iwlwifi is that in iwm
> > IWM_CSR_INT_BIT_RX_PERIODIC
> > alone can lead to a write of IWM_CSR_FH_INT_RX_MASK to IWM_CSR_FH_INT_STATUS
> > (which might be the receive confirmation), which in iwlwifi only happens if
> > SW_RX or FH_RX are set.
> >
> > The attached diff makes iwm behave more like iwlwifi in the cases described
> > above.
> >
> > ok?
>
> Which devices have you tested this with?
I tested with a 7260 and a 8260, patrick@ tested the diff on his 9260.
diff --git a/sys/dev/pci/if_iwm.c b/sys/dev/pci/if_iwm.c
index 2a276915cfe..66e6ad25ba2 100644
--- a/sys/dev/pci/if_iwm.c
+++ b/sys/dev/pci/if_iwm.c
@@ -8615,7 +8615,6 @@ iwm_intr(void *arg)
struct iwm_softc *sc = arg;
int handled = 0;
int rv = 0;
- int isperiodic = 0;
uint32_t r1, r2;
IWM_WRITE(sc, IWM_CSR_INT_MASK, 0);
@@ -8722,27 +8721,32 @@ iwm_intr(void *arg)
wakeup(&sc->sc_fw);
}
- if (r1 & IWM_CSR_INT_BIT_RX_PERIODIC) {
- handled |= IWM_CSR_INT_BIT_RX_PERIODIC;
- IWM_WRITE(sc, IWM_CSR_INT, IWM_CSR_INT_BIT_RX_PERIODIC);
- if ((r1 & (IWM_CSR_INT_BIT_FH_RX | IWM_CSR_INT_BIT_SW_RX)) == 0)
- IWM_WRITE_1(sc,
- IWM_CSR_INT_PERIODIC_REG, IWM_CSR_INT_PERIODIC_DIS);
- isperiodic = 1;
- }
-
- if ((r1 & (IWM_CSR_INT_BIT_FH_RX | IWM_CSR_INT_BIT_SW_RX)) ||
- isperiodic) {
- handled |= (IWM_CSR_INT_BIT_FH_RX | IWM_CSR_INT_BIT_SW_RX);
- IWM_WRITE(sc, IWM_CSR_FH_INT_STATUS, IWM_CSR_FH_INT_RX_MASK);
+ if (r1 & (IWM_CSR_INT_BIT_FH_RX | IWM_CSR_INT_BIT_SW_RX |
+ IWM_CSR_INT_BIT_RX_PERIODIC)) {
+ if (r1 & (IWM_CSR_INT_BIT_FH_RX | IWM_CSR_INT_BIT_SW_RX)) {
+ handled |= (IWM_CSR_INT_BIT_FH_RX |
IWM_CSR_INT_BIT_SW_RX);
+ IWM_WRITE(sc, IWM_CSR_FH_INT_STATUS,
IWM_CSR_FH_INT_RX_MASK);
+ }
+ if (r1 & IWM_CSR_INT_BIT_RX_PERIODIC) {
+ handled |= IWM_CSR_INT_BIT_RX_PERIODIC;
+ IWM_WRITE(sc, IWM_CSR_INT, IWM_CSR_INT_BIT_RX_PERIODIC);
+ }
- iwm_notif_intr(sc);
+ /* Disable periodic interrupt; we use it as just a one-shot. */
+ IWM_WRITE_1(sc, IWM_CSR_INT_PERIODIC_REG,
IWM_CSR_INT_PERIODIC_DIS);
- /* enable periodic interrupt, see above */
- if (r1 & (IWM_CSR_INT_BIT_FH_RX | IWM_CSR_INT_BIT_SW_RX) &&
- !isperiodic)
+ /*
+ * Enable periodic interrupt in 8 msec only if we received
+ * real RX interrupt (instead of just periodic int), to catch
+ * any dangling Rx interrupt. If it was just the periodic
+ * interrupt, there was no dangling Rx activity, and no need
+ * to extend the periodic interrupt; one-shot is enough.
+ */
+ if (r1 & (IWM_CSR_INT_BIT_FH_RX | IWM_CSR_INT_BIT_SW_RX))
IWM_WRITE_1(sc, IWM_CSR_INT_PERIODIC_REG,
IWM_CSR_INT_PERIODIC_ENA);
+
+ iwm_notif_intr(sc);
}
rv = 1;