Re: iwm(4) timeout cancellation fix

2020-04-23 Thread Jeremie Courreges-Anglas
On Thu, Apr 23 2020, Stefan Sperling  wrote:
> I have observed a uvm fault in ieee80211_mira_probe_timeout_up() while
> testing with iwm(4) and tcpbench:
>
> void
> ieee80211_mira_probe_timeout_up(void *arg)
> {
>   struct ieee80211_mira_node *mn = arg;
>   int s;
>
>   s = splnet();
>   mn->probe_timer_expired[IEEE80211_MIRA_PROBE_TO_UP] = 1;
>   DPRINTFN(3, ("probe up timeout fired\n"));
>   splx(s);
> }
>
> One obvious possibility is that the 'mn' pointer became invalid before the
> timeout was executed. But I am not certain what happened exactly; the info
> in ddb was inconclusive since the console switching ran into splassert
> failures and I didn't see a good backtrace. But r12 in 'show regs' contained
> the address of ieee80211_mira_probe_timeout_up() and it looked like the
> kernel was in softclock context.
>
> In any case, it looks like cancelling timeouts before scheduling the
> iwm_newstate_task can lead to a race:
>
>  - Timeouts are cancelled and iwm_newstate_task is scheduled
>  - Tx done interrupts feed frames to MiRA which adds a new timeout
>  - iwm_newstate_task runs and switches state without cancelling this timeout
>  
> So cancel timeouts when we are actually switching state in the task.
>
> While here, initialize MiRA timeouts and other rate scaling state earlier,
> when the node is allocated.
>
> ok?

Works fine so far on

  iwm0 at pci2 dev 0 function 0 "Intel Dual Band Wireless-AC 8265" rev 0x78, msi
  iwm0: hw rev 0x230, fw ver 34.0.1, address f8:59:71:xx:xx:xx

-- 
jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF  DDCC 0DFA 74AE 1524 E7EE



iwm(4) timeout cancellation fix

2020-04-23 Thread Stefan Sperling
I have observed a uvm fault in ieee80211_mira_probe_timeout_up() while
testing with iwm(4) and tcpbench:

void
ieee80211_mira_probe_timeout_up(void *arg)
{
struct ieee80211_mira_node *mn = arg;
int s;

s = splnet();
mn->probe_timer_expired[IEEE80211_MIRA_PROBE_TO_UP] = 1;
DPRINTFN(3, ("probe up timeout fired\n"));
splx(s);
}

One obvious possibility is that the 'mn' pointer became invalid before the
timeout was executed. But I am not certain what happened exactly; the info
in ddb was inconclusive since the console switching ran into splassert
failures and I didn't see a good backtrace. But r12 in 'show regs' contained
the address of ieee80211_mira_probe_timeout_up() and it looked like the
kernel was in softclock context.

In any case, it looks like cancelling timeouts before scheduling the
iwm_newstate_task can lead to a race:

 - Timeouts are cancelled and iwm_newstate_task is scheduled
 - Tx done interrupts feed frames to MiRA which adds a new timeout
 - iwm_newstate_task runs and switches state without cancelling this timeout
 
So cancel timeouts when we are actually switching state in the task.

While here, initialize MiRA timeouts and other rate scaling state earlier,
when the node is allocated.

ok?

diff 9b00dd5006bdfc0cf9d71dac4defe47f2ccfda0d 
caa4131c31c0809410871043d8c29cf157d68ff3
blob - 01c059a512b8d83fe9900942e07b4c1b128f660c
blob + 196b7814bbfaf625495278070374393409f7cb69
--- sys/dev/pci/if_iwm.c
+++ sys/dev/pci/if_iwm.c
@@ -6770,7 +6770,16 @@ iwm_run_stop(struct iwm_softc *sc)
 struct ieee80211_node *
 iwm_node_alloc(struct ieee80211com *ic)
 {
-   return malloc(sizeof (struct iwm_node), M_DEVBUF, M_NOWAIT | M_ZERO);
+   struct iwm_softc *sc = IC2IFP(ic)->if_softc;
+   struct iwm_node *in;
+
+   in = malloc(sizeof (struct iwm_node), M_DEVBUF, M_NOWAIT | M_ZERO);
+   if (in == NULL)
+   return NULL;
+
+   ieee80211_amrr_node_init(&sc->sc_amrr, &in->in_amn);
+   ieee80211_mira_node_init(&in->in_mn);
+   return (struct ieee80211_node *)in;
 }
 
 void
@@ -6961,6 +6970,14 @@ iwm_newstate_task(void *psc)
int arg = sc->ns_arg;
int err = 0, s = splnet();
 
+   if (ic->ic_state == IEEE80211_S_RUN) {
+   struct iwm_node *in = (void *)ic->ic_bss;
+   timeout_del(&sc->sc_calib_to);
+   ieee80211_mira_cancel_timeouts(&in->in_mn);
+   iwm_del_task(sc, systq, &sc->ba_task);
+   iwm_del_task(sc, systq, &sc->htprot_task);
+   }
+
if (sc->sc_flags & IWM_FLAG_SHUTDOWN) {
/* iwm_stop() is waiting for us. */
refcnt_rele_wake(&sc->task_refs);
@@ -7057,14 +7074,6 @@ iwm_newstate(struct ieee80211com *ic, enum ieee80211_s
 {
struct ifnet *ifp = IC2IFP(ic);
struct iwm_softc *sc = ifp->if_softc;
-   struct iwm_node *in = (void *)ic->ic_bss;
-
-   if (ic->ic_state == IEEE80211_S_RUN) {
-   timeout_del(&sc->sc_calib_to);
-   ieee80211_mira_cancel_timeouts(&in->in_mn);
-   iwm_del_task(sc, systq, &sc->ba_task);
-   iwm_del_task(sc, systq, &sc->htprot_task);
-   }
 
sc->ns_nstate = nstate;
sc->ns_arg = arg;