While testing my RA patches, I've seen iwn "hang" even though the AP and
iwn client were still exchanging packets at the wifi layer, but the upper
layers IP/UDP/TCP etc. were stuck. I could easily trigger this by running
tcpbench and moving towards the edge of the range of my AP.
Traffic would recover by itself after a while.

I can now explain why this is happening and suggest a fix.

When a packet on an aggregation queue fails, the driver sends a block ack
request to the receiver. This request contains the current starting sequence
number (SSN) of the firmware's block ack window. The purpose of this request
is to let the receiver know that any frames lower than the SSN should be
discarded. Only frames within [SSN, SSN+window-size) are valid.
In other words, we are trying to "resync" our block ack Tx window with
the peer after a Tx failure.
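
To illustrate the window check described above, here is a minimal sketch
of how a receiver might decide whether a sequence number falls inside its
block ack window. This is a simplified model, not code from iwn or from any
AP: the function name ba_seq_in_window and the window size of 64 used in the
usage note are assumptions made for illustration. The only fact relied on is
that 802.11 sequence numbers are 12 bits wide and wrap at 4096.

```c
#include <stdint.h>

#define SEQ_MASK 0xfff	/* 802.11 sequence numbers are 12 bits wide */

/*
 * Hypothetical helper: returns 1 if seq lies in [ssn, ssn + winsize)
 * modulo 4096, 0 otherwise. The subtraction is reduced to 12 bits so
 * that windows which straddle the 4095 -> 0 wrap are handled.
 */
static int
ba_seq_in_window(uint16_t ssn, uint16_t winsize, uint16_t seq)
{
	uint16_t dist = (uint16_t)(seq - ssn) & SEQ_MASK;

	return dist < winsize;
}
```

With an assumed window size of 64, ba_seq_in_window(2551, 64, 2552) is 1,
while ba_seq_in_window(0, 64, 2552) is 0: once the receiver's window has
been moved to SSN=0, a frame with sequence number 2552 is no longer inside it.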

Most of the time, this works fine. We send one block ack request with SSN X
and the receiver sends a block ack, having adjusted its receive window such
that X becomes its new lower bound. For example:

  iwn: block ack request SSN=2551
  AP: block ack SSN=2551

Sometimes, however, the firmware (not the driver) sends another block ack
request immediately after the AP's block ack is received, and such block
ack requests contain a bogus SSN. This shows up in monitor mode traces:

  iwn: block ack request SSN=2551
  AP: block ack SSN=2551
  iwn firmware: block ack request SSN=0
  AP: block ack SSN=0

Now the receiver is out of sync, and will discard frames until iwn's sending
window wraps back to zero.  The firmware will happily keep transmitting frames
with sequence numbers 2552, 2553, and traffic is restored when it finally
wraps around at 0xfff == 4095.
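
As a rough sketch of how long such a stall lasts, the number of frames
the sender must go through before its sequence counter wraps back to 0 can
be computed as below. The helper name frames_until_wrap is hypothetical,
and the calculation assumes the receiver keeps discarding every frame
until the wrap, as observed above.

```c
#include <stdint.h>

#define SEQ_MODULO 0x1000	/* 12-bit sequence space: 0 .. 4095 */

/*
 * Hypothetical helper: number of frames that will be sent, starting
 * with sequence number seq, before the counter wraps back to 0.
 * Returns 0 if seq is already 0.
 */
static unsigned int
frames_until_wrap(uint16_t seq)
{
	return (SEQ_MODULO - (seq & (SEQ_MODULO - 1))) & (SEQ_MODULO - 1);
}
```

For the trace above, frames_until_wrap(2552) is 1544, so about 1500 frames
are sent into the void before the receiver starts accepting them again.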

In the cases I observed, the driver-generated BA request was sent at 6 Mbit/s,
which is expected. However, the second frame was sent at 24 Mbit/s, which
indicates that the firmware could be retrying the BA request (frames sent
at a different Tx rate than specified by the driver are generally retries).

BA req frames are control frames, and our driver sends all such non-data
frames via the firmware's broadcast node. This node does not represent the AP.

Sending BA req frames via the firmware node which represents the AP seems to
fix the problem. I have not yet managed to trigger it again with this patch.
My best explanation is that this allows the firmware to retry block ack
requests properly, and to stop retrying once a BA is received from the AP.

ok?

diff 1ff4cf56fdff3473d72fc4b29d69428c688d47c6 /usr/src (staged changes)
blob - afeb963ef626d2e98018b1d405c35936d96ba4e1
blob + 50005e1511b06c99e72952110e4f06fb30cb818b
--- sys/dev/pci/if_iwn.c
+++ sys/dev/pci/if_iwn.c
@@ -3505,7 +3505,10 @@ iwn_tx(struct iwn_softc *sc, struct mbuf *m, struct ie
                }
        }
 
-       if (IEEE80211_IS_MULTICAST(wh->i_addr1) ||
+       if (type == IEEE80211_FC0_TYPE_CTL &&
+           subtype == IEEE80211_FC0_SUBTYPE_BAR)
+               tx->id = wn->id;
+       else if (IEEE80211_IS_MULTICAST(wh->i_addr1) ||
            type != IEEE80211_FC0_TYPE_DATA)
                tx->id = sc->broadcast_id;
        else
