This diff works around FIFO_UNDERRUN (0x84) Tx errors being reported by iwn(4) firmware. When this error occurs, it tends to occur multiple times in a row. Affected frames are lost and never get transmitted, and traffic stalls for a while. This affects tcpbench very visibly.
I don't understand what is causing this. I have found that it only occurs when we ask the firmware to use its multi-rate retry table. If we send frames at a fixed rate, it does not happen.

This error is particularly problematic with block ack, because the failed frames disappear and leave a hole in the receiver's block ack window. The receiver then has to wait a while for the lost frames until it eventually decides to skip them. This problem effectively makes Tx aggregation unusable on iwn(4).

My workaround is to always use a fixed Tx rate for aggregation queues. This is not ideal for single frames, which also get sent from such queues when traffic is low and may now be more likely to fail. But if the firmware decides to aggregate frames during traffic bursts, all frames contained in an aggregate are always sent together at the same rate anyway, so in this case we don't lose anything.

I would like to find a better fix, but this allows me to proceed with additional fixes for aggregation support, and together with those fixes this seems better than the lossy behaviour we have now.

diff 0eca04344da7ad4deb76485dcef00cdf88803be4 6f793971788fd7061f66330336cbeb5103b717c3
blob - 14c2d9a35e2973feb1ec2347eeef3d6041864291
blob + 110bbe97b980b338acd8aa61fbabd33926c207be
--- sys/dev/pci/if_iwn.c
+++ sys/dev/pci/if_iwn.c
@@ -3513,10 +3513,12 @@ iwn_tx(struct iwn_softc *sc, struct mbuf *m, struct ie
 	else
 		tx->rflags = rinfo->flags;
 	/*
-	 * Skip rate control if our Tx rate is fixed.
-	 * Keep the Tx rate constant while mira is probing.
+	 * Keep the Tx rate constant while mira is probing, or if this is
+	 * an aggregation queue in which case a fixed Tx rate works around
+	 * FIFO_UNDERRUN Tx errors.
 	 */
 	if (tx->id == sc->broadcast_id || ieee80211_mira_is_probing(&wn->mn) ||
+	    qid >= sc->first_agg_txq ||
 	    ic->ic_fixed_mcs != -1 || ic->ic_fixed_rate != -1) {
 		/* Group or management frame, or probing, or fixed Tx rate. */
 		tx->linkq = 0;
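
For readers unfamiliar with why a lost frame is so costly under block ack, here is a rough standalone sketch of a receiver-side reorder window. It is not taken from iwn(4) or net80211; the names reorder_buf, rx_frame, skip_hole and the window size WIN_SIZE are all made up for illustration, and sequence number wraparound is ignored. It only shows that frames arriving after a hole stay buffered until the receiver gives up on the missing frame.

```c
#include <assert.h>
#include <stdbool.h>

#define WIN_SIZE 8	/* hypothetical window size, for illustration only */

/* Hypothetical receiver-side reorder state for one block ack session. */
struct reorder_buf {
	unsigned int win_start;		/* next sequence number to deliver */
	bool present[WIN_SIZE];		/* present[i]: frame win_start + i is buffered */
};

/* Advance the window by one slot without counting a delivery. */
static void
shift_window(struct reorder_buf *rb)
{
	for (int i = 0; i + 1 < WIN_SIZE; i++)
		rb->present[i] = rb->present[i + 1];
	rb->present[WIN_SIZE - 1] = false;
	rb->win_start++;
}

/* Deliver buffered frames while the window start is filled. */
static int
flush_in_order(struct reorder_buf *rb)
{
	int delivered = 0;

	while (rb->present[0]) {
		shift_window(rb);
		delivered++;
	}
	return delivered;
}

/* Buffer one received frame; returns how many frames became deliverable. */
static int
rx_frame(struct reorder_buf *rb, unsigned int seq)
{
	unsigned int off = seq - rb->win_start;

	if (off >= WIN_SIZE)
		return 0;	/* outside the window; ignored in this sketch */
	rb->present[off] = true;
	return flush_in_order(rb);
}

/* Gap timeout: give up on the missing frame at the window start. */
static int
skip_hole(struct reorder_buf *rb)
{
	assert(!rb->present[0]);	/* only meaningful if there is a hole */
	shift_window(rb);
	return flush_in_order(rb);
}
```

If frame 100 arrives, frame 101 is dropped by a Tx error, and frames 102 and 103 arrive, then rx_frame() delivers only frame 100. Frames 102 and 103 sit in the buffer until skip_hole() runs, which corresponds to the stall described above.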