Re: [PATCH v2] mwifiex: fix kernel crash after shutdown command timeout
On Thu, Mar 16, 2017 at 12:38:57PM -0700, Brian Norris wrote: > On Thu, Mar 16, 2017 at 11:41:15AM -0700, Brian Norris wrote: > > On Thu, Mar 16, 2017 at 11:33:17AM -0700, Dmitry Torokhov wrote: > > > You need to check this flag before queueing firmware dump work, and > > > make sure it is not racy with setting this flag in mwifiex_pcie_remove() > > > (and sdio). > > > > That's another approach that could work, but it's a little more > > invasive. > > Never mind, that isn't too invasive. There's only one schedule_work() in > pcie.c and two in sdio.c. We could even factor out a helper, that knows > how to check the appropriate MWIFIEX_IFACE_* flags, if we really wanted > to... OK, so I took a crack at implementing this, and after thinking about it, the "make sure it is not racy with setting this flag" part is tougher than it seems. In the end, I think the key is that to eliminate the race between setting and checking the flag, we just want to halt all sources of more work -- e.g., commands (which could time out), or debugfs entries (which could trigger a FW dump manually) -- without fiddling with extra flags. We do this already in the first half of mwifiex_remove_card(), when we terminate the main workqueue(s) and unregister the net and wiphy devices. IOW, we can move the cancel_work_sync() into the .cleanup_if() callback, which occurs after the above described teardown, but before the PCIe driver has actually called things like pci_disable_device() [1]. Then we don't need any DONT_RUN flag either. I'll test the above a bit more here, then send a v3 myself, with the above reasoning captured. I *think* that should eliminate all the races we've discussed here. Brian [1] BTW, I think I previously blamed mwifiex_init_shutdown_fw() for racing with the FW dumper; I think that is not actually the smoking gun (it was an educated guess). Based on testing, I see aborts if we're still accessing the PCIe device (e.g., in the FW dumper) after mwifiex_cleanup_pcie() -> pci_disable_device().
Re: [PATCH v2] mwifiex: fix kernel crash after shutdown command timeout
Hi Dmitry and Amit, On Thu, Mar 16, 2017 at 11:41:15AM -0700, Brian Norris wrote: > On Thu, Mar 16, 2017 at 11:33:17AM -0700, Dmitry Torokhov wrote: > > On Thu, Mar 16, 2017 at 03:58:52PM +0530, Amitkumar Karwar wrote: > > > We observed a SHUTDOWN command timeout during reboot stress test > > > due to a corner case firmware bug. It leads to use-after-free on > > > adapter structure pointer and crash. > > > > > > Let's add MWIFIEX_IFACE_WORK_DONT_RUN work flag to avoid executing BTW, the 'DONT_RUN' suggestion was more of a pseudo-code suggestion than a real name, but I guess it's not terrible :) > > > any work scheduled after cancel_work_sync() call in teardown path > > > to resolve the issue. > > > > > > Signed-off-by: Amitkumar Karwar> > > --- > > > v2: New work_flag has been added to resolve the issue cleanly as per > > > Brian's suggestion. > > > --- > > > drivers/net/wireless/marvell/mwifiex/main.h | 1 + > > > drivers/net/wireless/marvell/mwifiex/pcie.c | 4 > > > drivers/net/wireless/marvell/mwifiex/sdio.c | 4 > > > 3 files changed, 9 insertions(+) > > > > > > diff --git a/drivers/net/wireless/marvell/mwifiex/main.h > > > b/drivers/net/wireless/marvell/mwifiex/main.h > > > index 5c82972..d5b1fd6 100644 > > > --- a/drivers/net/wireless/marvell/mwifiex/main.h > > > +++ b/drivers/net/wireless/marvell/mwifiex/main.h > > > @@ -510,6 +510,7 @@ struct mwifiex_roc_cfg { > > > enum mwifiex_iface_work_flags { > > > MWIFIEX_IFACE_WORK_DEVICE_DUMP, > > > MWIFIEX_IFACE_WORK_CARD_RESET, > > > + MWIFIEX_IFACE_WORK_DONT_RUN, > > > }; > > > > > > struct mwifiex_private { > > > diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c > > > b/drivers/net/wireless/marvell/mwifiex/pcie.c > > > index a0d9180..bb3d798 100644 > > > --- a/drivers/net/wireless/marvell/mwifiex/pcie.c > > > +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c > > > @@ -294,6 +294,7 @@ static void mwifiex_pcie_remove(struct pci_dev *pdev) > > > if (!adapter || !adapter->priv_num) > > > return; > > > > > > + set_bit(MWIFIEX_IFACE_WORK_DONT_RUN, >work_flags); > > > cancel_work_sync(>work); > > > > > > reg = card->pcie.reg; > > > @@ -2721,6 +2722,9 @@ static void mwifiex_pcie_work(struct work_struct > > > *work) > > > struct pcie_service_card *card = > > > container_of(work, struct pcie_service_card, work); > > > > > > + if (test_bit(MWIFIEX_IFACE_WORK_DONT_RUN, >work_flags)) > > > + return; > > > > I do not see how this could possible prevent use-after-free, assuming > > that the "card" memory is gone by the time mwifiex_pcie_work() gets to > > run. > > The 'card' memory isn't getting freed; it's the 'adapter' memory we're > worried about. This is either already freed (because the FW init > procedure failed), or else it's freed later in this function via > mwifiex_remove_card(). I guess there was a slight miscommunication here: Dmitry pointed out to me that he *was* actually talking about 'card' getting freed -- when it gets freed after remove() finishes. So the sequence would have to go like: 1. enter remove() 2. set DONT_RUN flag; cancel_work_sync() 3. begin to shutdown firmware 4. hit, e.g., a command timeout that schedules the work again 5. ** scheduler decides not to schedule the work for a while ** 6. we finish mwifiex_remove_card(), and exit from remove() successfully 7. devm_* frees the pcie_service_card (and enclosed work_struct) 8. scheduler tries to run our work item 9. use-after-free! However unlikely that the delay from 4 to 8 might be, this is indeed a race condition. > (We're also worried about having the FW dump race with the FW shutdown > sequence, which can begin later in this function. This patch blocks both > races AFAICT.) > > > You need to check this flag before queueing firmware dump work, and > > make sure it is not racy with setting this flag in mwifiex_pcie_remove() > > (and sdio). > > That's another approach that could work, but it's a little more > invasive. Never mind, that isn't too invasive. There's only one schedule_work() in pcie.c and two in sdio.c. We could even factor out a helper, that knows how to check the appropriate MWIFIEX_IFACE_* flags, if we really wanted to... Brian
Re: [PATCH v2] mwifiex: fix kernel crash after shutdown command timeout
Hi Dmitry, On Thu, Mar 16, 2017 at 11:33:17AM -0700, Dmitry Torokhov wrote: > On Thu, Mar 16, 2017 at 03:58:52PM +0530, Amitkumar Karwar wrote: > > We observed a SHUTDOWN command timeout during reboot stress test > > due to a corner case firmware bug. It leads to use-after-free on > > adapter structure pointer and crash. > > > > Let's add MWIFIEX_IFACE_WORK_DONT_RUN work flag to avoid executing > > any work scheduled after cancel_work_sync() call in teardown path > > to resolve the issue. > > > > Signed-off-by: Amitkumar Karwar> > --- > > v2: New work_flag has been added to resolve the issue cleanly as per > > Brian's suggestion. > > --- > > drivers/net/wireless/marvell/mwifiex/main.h | 1 + > > drivers/net/wireless/marvell/mwifiex/pcie.c | 4 > > drivers/net/wireless/marvell/mwifiex/sdio.c | 4 > > 3 files changed, 9 insertions(+) > > > > diff --git a/drivers/net/wireless/marvell/mwifiex/main.h > > b/drivers/net/wireless/marvell/mwifiex/main.h > > index 5c82972..d5b1fd6 100644 > > --- a/drivers/net/wireless/marvell/mwifiex/main.h > > +++ b/drivers/net/wireless/marvell/mwifiex/main.h > > @@ -510,6 +510,7 @@ struct mwifiex_roc_cfg { > > enum mwifiex_iface_work_flags { > > MWIFIEX_IFACE_WORK_DEVICE_DUMP, > > MWIFIEX_IFACE_WORK_CARD_RESET, > > + MWIFIEX_IFACE_WORK_DONT_RUN, > > }; > > > > struct mwifiex_private { > > diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c > > b/drivers/net/wireless/marvell/mwifiex/pcie.c > > index a0d9180..bb3d798 100644 > > --- a/drivers/net/wireless/marvell/mwifiex/pcie.c > > +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c > > @@ -294,6 +294,7 @@ static void mwifiex_pcie_remove(struct pci_dev *pdev) > > if (!adapter || !adapter->priv_num) > > return; > > > > + set_bit(MWIFIEX_IFACE_WORK_DONT_RUN, >work_flags); > > cancel_work_sync(>work); > > > > reg = card->pcie.reg; > > @@ -2721,6 +2722,9 @@ static void mwifiex_pcie_work(struct work_struct > > *work) > > struct pcie_service_card *card = > > container_of(work, struct pcie_service_card, work); > > > > + if (test_bit(MWIFIEX_IFACE_WORK_DONT_RUN, >work_flags)) > > + return; > > I do not see how this could possible prevent use-after-free, assuming > that the "card" memory is gone by the time mwifiex_pcie_work() gets to > run. The 'card' memory isn't getting freed; it's the 'adapter' memory we're worried about. This is either already freed (because the FW init procedure failed), or else it's freed later in this function via mwifiex_remove_card(). (We're also worried about having the FW dump race with the FW shutdown sequence, which can begin later in this function. This patch blocks both races AFAICT.) > You need to check this flag before queueing firmware dump work, and > make sure it is not racy with setting this flag in mwifiex_pcie_remove() > (and sdio). That's another approach that could work, but it's a little more invasive. I'm still reviewing and testing this, but I believe this is nearly equivalent to my own draft version, which tested out fine. Brian
Re: [PATCH v2] mwifiex: fix kernel crash after shutdown command timeout
On Thu, Mar 16, 2017 at 03:58:52PM +0530, Amitkumar Karwar wrote: > We observed a SHUTDOWN command timeout during reboot stress test > due to a corner case firmware bug. It leads to use-after-free on > adapter structure pointer and crash. > > Let's add MWIFIEX_IFACE_WORK_DONT_RUN work flag to avoid executing > any work scheduled after cancel_work_sync() call in teardown path > to resolve the issue. > > Signed-off-by: Amitkumar Karwar> --- > v2: New work_flag has been added to resolve the issue cleanly as per > Brian's suggestion. > --- > drivers/net/wireless/marvell/mwifiex/main.h | 1 + > drivers/net/wireless/marvell/mwifiex/pcie.c | 4 > drivers/net/wireless/marvell/mwifiex/sdio.c | 4 > 3 files changed, 9 insertions(+) > > diff --git a/drivers/net/wireless/marvell/mwifiex/main.h > b/drivers/net/wireless/marvell/mwifiex/main.h > index 5c82972..d5b1fd6 100644 > --- a/drivers/net/wireless/marvell/mwifiex/main.h > +++ b/drivers/net/wireless/marvell/mwifiex/main.h > @@ -510,6 +510,7 @@ struct mwifiex_roc_cfg { > enum mwifiex_iface_work_flags { > MWIFIEX_IFACE_WORK_DEVICE_DUMP, > MWIFIEX_IFACE_WORK_CARD_RESET, > + MWIFIEX_IFACE_WORK_DONT_RUN, > }; > > struct mwifiex_private { > diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c > b/drivers/net/wireless/marvell/mwifiex/pcie.c > index a0d9180..bb3d798 100644 > --- a/drivers/net/wireless/marvell/mwifiex/pcie.c > +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c > @@ -294,6 +294,7 @@ static void mwifiex_pcie_remove(struct pci_dev *pdev) > if (!adapter || !adapter->priv_num) > return; > > + set_bit(MWIFIEX_IFACE_WORK_DONT_RUN, >work_flags); > cancel_work_sync(>work); > > reg = card->pcie.reg; > @@ -2721,6 +2722,9 @@ static void mwifiex_pcie_work(struct work_struct *work) > struct pcie_service_card *card = > container_of(work, struct pcie_service_card, work); > > + if (test_bit(MWIFIEX_IFACE_WORK_DONT_RUN, >work_flags)) > + return; I do not see how this could possible prevent use-after-free, assuming that the "card" memory is gone by the time mwifiex_pcie_work() gets to run. You need to check this flag before queueing firmware dump work, and make sure it is not racy with setting this flag in mwifiex_pcie_remove() (and sdio). Thanks. -- Dmitry
[PATCH v2] mwifiex: fix kernel crash after shutdown command timeout
We observed a SHUTDOWN command timeout during reboot stress test due to a corner case firmware bug. It leads to use-after-free on adapter structure pointer and crash. Let's add MWIFIEX_IFACE_WORK_DONT_RUN work flag to avoid executing any work scheduled after cancel_work_sync() call in teardown path to resolve the issue. Signed-off-by: Amitkumar Karwar--- v2: New work_flag has been added to resolve the issue cleanly as per Brian's suggestion. --- drivers/net/wireless/marvell/mwifiex/main.h | 1 + drivers/net/wireless/marvell/mwifiex/pcie.c | 4 drivers/net/wireless/marvell/mwifiex/sdio.c | 4 3 files changed, 9 insertions(+) diff --git a/drivers/net/wireless/marvell/mwifiex/main.h b/drivers/net/wireless/marvell/mwifiex/main.h index 5c82972..d5b1fd6 100644 --- a/drivers/net/wireless/marvell/mwifiex/main.h +++ b/drivers/net/wireless/marvell/mwifiex/main.h @@ -510,6 +510,7 @@ struct mwifiex_roc_cfg { enum mwifiex_iface_work_flags { MWIFIEX_IFACE_WORK_DEVICE_DUMP, MWIFIEX_IFACE_WORK_CARD_RESET, + MWIFIEX_IFACE_WORK_DONT_RUN, }; struct mwifiex_private { diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c index a0d9180..bb3d798 100644 --- a/drivers/net/wireless/marvell/mwifiex/pcie.c +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c @@ -294,6 +294,7 @@ static void mwifiex_pcie_remove(struct pci_dev *pdev) if (!adapter || !adapter->priv_num) return; + set_bit(MWIFIEX_IFACE_WORK_DONT_RUN, >work_flags); cancel_work_sync(>work); reg = card->pcie.reg; @@ -2721,6 +2722,9 @@ static void mwifiex_pcie_work(struct work_struct *work) struct pcie_service_card *card = container_of(work, struct pcie_service_card, work); + if (test_bit(MWIFIEX_IFACE_WORK_DONT_RUN, >work_flags)) + return; + if (test_and_clear_bit(MWIFIEX_IFACE_WORK_DEVICE_DUMP, >work_flags)) mwifiex_pcie_device_dump_work(card->adapter); diff --git a/drivers/net/wireless/marvell/mwifiex/sdio.c b/drivers/net/wireless/marvell/mwifiex/sdio.c index a4b356d..8140bb4 100644 --- a/drivers/net/wireless/marvell/mwifiex/sdio.c +++ b/drivers/net/wireless/marvell/mwifiex/sdio.c @@ -387,6 +387,7 @@ static int mwifiex_check_winner_status(struct mwifiex_adapter *adapter) if (!adapter || !adapter->priv_num) return; + set_bit(MWIFIEX_IFACE_WORK_DONT_RUN, >work_flags); cancel_work_sync(>work); mwifiex_dbg(adapter, INFO, "info: SDIO func num=%d\n", func->num); @@ -2514,6 +2515,9 @@ static void mwifiex_sdio_work(struct work_struct *work) struct sdio_mmc_card *card = container_of(work, struct sdio_mmc_card, work); + if (test_bit(MWIFIEX_IFACE_WORK_DONT_RUN, >work_flags)) + return; + if (test_and_clear_bit(MWIFIEX_IFACE_WORK_DEVICE_DUMP, >work_flags)) mwifiex_sdio_device_dump_work(card->adapter); -- 1.9.1