Re: [PATCH v3 4/5] mwifiex: wait firmware dump complete during card remove process
Hi, On Thu, Dec 01, 2016 at 02:02:43PM +, Amitkumar Karwar wrote: > > > I could not find async version of cancel_work(). > > > > cancel_work() *is* asynchronous. It does not synchronize with the last > > event, so you won't have the deadlock. (Remember: the synchronous > > version is cancel_work_sync().) > > My bad! What I meant is "I could not find async version of cancel_work_sync()" > cancel_work() isn't available in > http://lxr.free-electrons.com/source/kernel/workqueue.c It's in 4.9-rc1 (and it's available at the above link, at least by now). See: commit f72b8792d180948b4b3898374998f5ac8c02e539 Author: Jens AxboeDate: Wed Aug 24 15:51:50 2016 -0600 workqueue: add cancel_work() But anyway: > Anyways, clear_bit() after remove() during card reset would address the > problem. Yes, I think that's OK. Brian
RE: [PATCH v3 4/5] mwifiex: wait firmware dump complete during card remove process
Hi Brian, > From: Brian Norris [mailto:briannor...@chromium.org] > Sent: Thursday, December 01, 2016 12:04 AM > To: Amitkumar Karwar > Cc: linux-wireless@vger.kernel.org; Cathy Luo; Nishant Sarmukadam; > raja...@google.com; dmitry.torok...@gmail.com; Xinming Hu > Subject: Re: [PATCH v3 4/5] mwifiex: wait firmware dump complete during > card remove process > > On Wed, Nov 30, 2016 at 12:39:11PM +, Amitkumar Karwar wrote: > > > > Ugh, yet another band-aid? You might do better to still cancel any > > > pending work, just don't do it synchronously. i.e., do > cancel_work() > > > after you've removed the card. It doesn't make sense to do a FW > dump > > > on the "new" adapter when it was requested for the old one. > > > > I could not find async version of cancel_work(). > > cancel_work() *is* asynchronous. It does not synchronize with the last > event, so you won't have the deadlock. (Remember: the synchronous > version is cancel_work_sync().) My bad! What I meant is "I could not find async version of cancel_work_sync()" cancel_work() isn't available in http://lxr.free-electrons.com/source/kernel/workqueue.c Anyways, clear_bit() after remove() during card reset would address the problem. > > > We can fix this problem with below change at the end of > > mwifiex_sdio_work(). All pending work requests would be ignored. > > > > > > @ -2571,6 +2571,8 @@ static void mwifiex_sdio_work(struct work_struct > *work) > > if (test_and_clear_bit(MWIFIEX_IFACE_WORK_CARD_RESET, > >_work_flags)) > > mwifiex_sdio_card_reset_work(save_adapter); > > + clear_bit(MWIFIEX_IFACE_WORK_DEVICE_DUMP, _work_flags); > > + clear_bit(MWIFIEX_IFACE_WORK_CARD_RESET, _work_flags); > > } > > -- > > I don't think that's exactly what you want. That might lose events, > won't it? I'd rather this sort of hack go into > mwifiex_recreate_adapter(), in between the remove() and probe() calls, > where you don't expect any new events to trigger. And maybe include a > comment as to why. Right. I have just posted a patch for this. > > > > I think I've asked elsewhere but didn't receive an answer: why is > > > SDIO's mwifiex_recreate_adapter() so much different from PCIe's > > > mwifiex_do_flr()? It seems like the latter should be refactored to > > > remove some of the PCIe-specific cruft from main.c and then reused > > > for SDIO. > > > > Our initial SDIO card reset implementation was based on MMC APIs > where > > remove() and probe() would automatically get called by MMC subsystem > > after power cycle. > > https://www.spinics.net/lists/linux-wireless/msg98435.html > > Later it was improved by Andreas Fenkart by replacing those power > > cycle APIs with mmc_hw_reset(). > > Right. > > > For PCIe, function level reset is standard feature. We implemented > > ".reset_notify" handler which gets called after and before FLR. > > OK. > > > You are right. We can have SDIO's handling similar to PCIe and avoid > > destroying+recreating adapter/card. > > So all in all, you're saying it's just an artifact of history, and > there's no good reason they are so different? If so, then this looks > like another instance where you would have done well to refactor and > improve the existing mechanisms at the same time as you added new > features (i.e., PCIe FLR). I've seen this problem already several > times, where it seems development for your SDIO/PCIe/USB interface > drivers occur almost in isolation. IMO, it'd do you well to notice > these patterns while implementing features in the first place. The more > code you can share, the fewer bugs you (or I) will have to chase down. Thanks for your guidance. I'll follow this for future development. Regards, Amitkumar
RE: [PATCH v3 4/5] mwifiex: wait firmware dump complete during card remove process
Hi Brian, > > > > * > > > > @@ -2227,7 +2237,7 @@ static void mwifiex_recreate_adapter(struct > > > sdio_mmc_card *card) > > > > * discovered and initializes them from scratch. > > > > */ > > > > > > > > - mwifiex_sdio_remove(func); > > > > + __mwifiex_sdio_remove(func); > > > > > > ^^ So here, you're trying to avoid syncing with the card-reset work > > > event, except that function will free up all your resources > > > (including the static save_adapter). Thus, you're explicitly > > > allowing a use-after- free error here. That seems unwise. > > > > Even if firmware dump is triggered after card reset is started, it > > will execute after card reset is completed as discussed above. Only > > problem I can see is with "save_adapter". We can put new_adapter > > pointer in "save_adapter" at the end of mwifiex_recreate_adapter() to > > solve the issue. > > Ugh, yet another band-aid? You might do better to still cancel any > pending work, just don't do it synchronously. i.e., do cancel_work() > after you've removed the card. It doesn't make sense to do a FW dump on > the "new" adapter when it was requested for the old one. I could not find async version of cancel_work(). We can fix this problem with below change at the end of mwifiex_sdio_work(). All pending work requests would be ignored. @ -2571,6 +2571,8 @@ static void mwifiex_sdio_work(struct work_struct *work) if (test_and_clear_bit(MWIFIEX_IFACE_WORK_CARD_RESET, _work_flags)) mwifiex_sdio_card_reset_work(save_adapter); + clear_bit(MWIFIEX_IFACE_WORK_DEVICE_DUMP, _work_flags); + clear_bit(MWIFIEX_IFACE_WORK_CARD_RESET, _work_flags); } -- > > > > Instead, you should actually retain the invariant that you're doing > > > a full remove/reinitialize here, which includes doing the *same* > > > cancel_work_sync() here in mwifiex_recreate_adapter() as you would > > > in any other remove(). > > > > We are executing mwifiex_recreate_adapter() as a part of sdio_work(). > > Calling > > cancel_work_sync() here would cause deadlock. The API is supposed to > > waits until sdio_work() is finished. > > You are correct. So using the _sync() version would be wrong. > > > > > > > IOW, kill the __mwifiex_sdio_remove() and just call > > > mwifiex_sdio_remove() as you were. > > > > > > That also means that you can do the same per-adapter cleanup in the > > > following patch as you do for PCIe. > > > > Currently as sdio_work recreates "card", the work structure can't be > moved inside card structure. > > Let me know your suggestions. > > If you address the TODO in mwifiex_create_adapter() instead, you can > get past this problem: > > /* TODO mmc_hw_reset does not require destroying and re-probing > the > * whole adapter. Hence there was no need to for this rube- > goldberg > * design to reload the fw from an external workqueue. If we > don't > * destroy the adapter we could reload the fw from > * mwifiex_main_work_queue directly. > > The "save_adapter" is an abomination that should be terminated swiftly, > but it is perpetuated in part by the hacks noted in the TODO. > > So I'd recommend addressing the TODO ASAP, but in the meantime, a hack > like my suggestion (cancel the FW dump work w/o synchronizing) or -- > less preferably -- yours (manually set 'save_adapter' again) might be > OK. > > I think I've asked elsewhere but didn't receive an answer: why is > SDIO's mwifiex_recreate_adapter() so much different from PCIe's > mwifiex_do_flr()? It seems like the latter should be refactored to > remove some of the PCIe-specific cruft from main.c and then reused for > SDIO. Our initial SDIO card reset implementation was based on MMC APIs where remove() and probe() would automatically get called by MMC subsystem after power cycle. https://www.spinics.net/lists/linux-wireless/msg98435.html Later it was improved by Andreas Fenkart by replacing those power cycle APIs with mmc_hw_reset(). For PCIe, function level reset is standard feature. We implemented ".reset_notify" handler which gets called after and before FLR. You are right. We can have SDIO's handling similar to PCIe and avoid destroying+recreating adapter/card. We have started working on this. We will get rid of global save_adapter, sdio_work etc. Meanwhile I will post above mentioned change if it looks good to you. Regards, Amitkumar
Re: [PATCH v3 4/5] mwifiex: wait firmware dump complete during card remove process
Hi Amit, On Thu, Nov 24, 2016 at 12:14:07PM +, Amitkumar Karwar wrote: > > From: Brian Norris [mailto:briannor...@chromium.org] > > Sent: Monday, November 21, 2016 11:06 PM > > To: Amitkumar Karwar > > Cc: linux-wireless@vger.kernel.org; Cathy Luo; Nishant Sarmukadam; > > raja...@google.com; dmitry.torok...@gmail.com; Xinming Hu > > Subject: Re: [PATCH v3 4/5] mwifiex: wait firmware dump complete during > > card remove process > > > > On Wed, Nov 16, 2016 at 06:39:08PM +0530, Amitkumar Karwar wrote: > > > From: Xinming Hu <h...@marvell.com> > > > > > > Wait for firmware dump complete in card remove function. > > > For sdio interface, there are two diffenrent cases, card reset > > trigger > > > sdio_work and firmware dump trigger sdio_work. > > > Do code rearrangement for distinguish between these two cases. > > > > On second review of the SDIO card reset code (which I'll repeat is > > quite ugly), you seem to be making a bad distinction here. What if > > there is a firmware dump happening concurrently with your card-reset > > handling? > > You *do* want to synchronize with the firmware dump before completing the > > card reset, or else you might be freeing up internal card resources > > that are still in use. See below. > > I ran some tests and observed that if same work function is scheduled > by two threads, it won't have re-entrant calls. They will be executed > one after another. In SDIO work function, we have SDIO card reset call > after completing firmware dump. So firmware dump won't run > concurrently with card-reset as per my understanding. Ah, you're correct. It's somewhat obscure and potentially fragile, but correct AFAICT. As you noted though, you do still have a use-after-free bug, even if the concurrency isn't quite as high as I thought. > > > Signed-off-by: Xinming Hu <h...@marvell.com> > > > Signed-off-by: Amitkumar Karwar <akar...@marvell.com> > > > --- ... > > > --- a/drivers/net/wireless/marvell/mwifiex/sdio.c > > > +++ b/drivers/net/wireless/marvell/mwifiex/sdio.c > > > @@ -46,6 +46,9 @@ > > > */ > > > static u8 user_rmmod; > > > > > > +static void mwifiex_sdio_work(struct work_struct *work); static > > > +DECLARE_WORK(sdio_work, mwifiex_sdio_work); > > > + > > > static struct mwifiex_if_ops sdio_ops; static unsigned long > > > iface_work_flags; > > > > > > @@ -220,7 +223,7 @@ static int mwifiex_sdio_resume(struct device > > *dev) > > > * This function removes the interface and frees up the card > > structure. > > > */ > > > static void > > > -mwifiex_sdio_remove(struct sdio_func *func) > > > +__mwifiex_sdio_remove(struct sdio_func *func) > > > { > > > struct sdio_mmc_card *card; > > > struct mwifiex_adapter *adapter; > > > @@ -249,6 +252,13 @@ static int mwifiex_sdio_resume(struct device > > *dev) > > > mwifiex_remove_card(adapter); > > > } > > > > > > +static void > > > +mwifiex_sdio_remove(struct sdio_func *func) { > > > + cancel_work_sync(_work); > > > + __mwifiex_sdio_remove(func); > > > +} > > > + > > > /* > > > * SDIO suspend. > > > * > > > @@ -2227,7 +2237,7 @@ static void mwifiex_recreate_adapter(struct > > sdio_mmc_card *card) > > >* discovered and initializes them from scratch. > > >*/ > > > > > > - mwifiex_sdio_remove(func); > > > + __mwifiex_sdio_remove(func); > > > > ^^ So here, you're trying to avoid syncing with the card-reset work > > event, except that function will free up all your resources (including > > the static save_adapter). Thus, you're explicitly allowing a use-after- > > free error here. That seems unwise. > > Even if firmware dump is triggered after card reset is started, it > will execute after card reset is completed as discussed above. Only > problem I can see is with "save_adapter". We can put new_adapter > pointer in "save_adapter" at the end of mwifiex_recreate_adapter() to > solve the issue. Ugh, yet another band-aid? You might do better to still cancel any pending work, just don't do it synchronously. i.e., do cancel_work() after you've removed the card. It doesn't make sense to do a FW dump on the "new" adapter when it was requested for the old one. > > Instead, you should actually retain the invariant that you're doing a > > full remove/reinitialize here, which includes doing the *same* > > cancel_work_sync
Re: [PATCH v3 4/5] mwifiex: wait firmware dump complete during card remove process
Hi, On Wed, Nov 16, 2016 at 06:39:08PM +0530, Amitkumar Karwar wrote: > From: Xinming Hu> > Wait for firmware dump complete in card remove function. > For sdio interface, there are two diffenrent cases, > card reset trigger sdio_work and firmware dump trigger sdio_work. > Do code rearrangement for distinguish between these two cases. On second review of the SDIO card reset code (which I'll repeat is quite ugly), you seem to be making a bad distinction here. What if there is a firmware dump happening concurrently with your card-reset handling? You *do* want to synchronize with the firmware dump before completing the card reset, or else you might be freeing up internal card resources that are still in use. See below. > > Signed-off-by: Xinming Hu > Signed-off-by: Amitkumar Karwar > --- > v2: 1. Get rid of reset_triggered flag. Instead split the code and use > __mwifiex_sdio_remove() (Brian Norris/Dmitry Torokhov) > 2. "v1 4/5 mwifiex: firmware dump code rearrangement.." is dropped. So > rebased accordingly. > v3: same as [v2,5/5]. The improvement of 'moving pcie_work to card struct' > suggested by Brian is taken care in next patch. > --- > drivers/net/wireless/marvell/mwifiex/pcie.c | 6 +- > drivers/net/wireless/marvell/mwifiex/sdio.c | 15 --- > 2 files changed, 17 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c > b/drivers/net/wireless/marvell/mwifiex/pcie.c > index dd8f7aa..c8e69a4 100644 > --- a/drivers/net/wireless/marvell/mwifiex/pcie.c > +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c > @@ -51,6 +51,9 @@ static int mwifiex_pcie_probe_of(struct device *dev) > return 0; > } > > +static void mwifiex_pcie_work(struct work_struct *work); > +static DECLARE_WORK(pcie_work, mwifiex_pcie_work); > + > static int > mwifiex_map_pci_memory(struct mwifiex_adapter *adapter, struct sk_buff *skb, > size_t size, int flags) > @@ -254,6 +257,8 @@ static void mwifiex_pcie_remove(struct pci_dev *pdev) > if (!adapter || !adapter->priv_num) > return; > > + cancel_work_sync(_work); > + > if (user_rmmod && !adapter->mfg_mode) { > mwifiex_deauthenticate_all(adapter); > > @@ -2722,7 +2727,6 @@ static void mwifiex_pcie_work(struct work_struct *work) > mwifiex_pcie_device_dump_work(save_adapter); > } > > -static DECLARE_WORK(pcie_work, mwifiex_pcie_work); > /* This function dumps FW information */ > static void mwifiex_pcie_device_dump(struct mwifiex_adapter *adapter) > { > diff --git a/drivers/net/wireless/marvell/mwifiex/sdio.c > b/drivers/net/wireless/marvell/mwifiex/sdio.c > index 16d1d30..78f2cc9 100644 > --- a/drivers/net/wireless/marvell/mwifiex/sdio.c > +++ b/drivers/net/wireless/marvell/mwifiex/sdio.c > @@ -46,6 +46,9 @@ > */ > static u8 user_rmmod; > > +static void mwifiex_sdio_work(struct work_struct *work); > +static DECLARE_WORK(sdio_work, mwifiex_sdio_work); > + > static struct mwifiex_if_ops sdio_ops; > static unsigned long iface_work_flags; > > @@ -220,7 +223,7 @@ static int mwifiex_sdio_resume(struct device *dev) > * This function removes the interface and frees up the card structure. > */ > static void > -mwifiex_sdio_remove(struct sdio_func *func) > +__mwifiex_sdio_remove(struct sdio_func *func) > { > struct sdio_mmc_card *card; > struct mwifiex_adapter *adapter; > @@ -249,6 +252,13 @@ static int mwifiex_sdio_resume(struct device *dev) > mwifiex_remove_card(adapter); > } > > +static void > +mwifiex_sdio_remove(struct sdio_func *func) > +{ > + cancel_work_sync(_work); > + __mwifiex_sdio_remove(func); > +} > + > /* > * SDIO suspend. > * > @@ -2227,7 +2237,7 @@ static void mwifiex_recreate_adapter(struct > sdio_mmc_card *card) >* discovered and initializes them from scratch. >*/ > > - mwifiex_sdio_remove(func); > + __mwifiex_sdio_remove(func); ^^ So here, you're trying to avoid syncing with the card-reset work event, except that function will free up all your resources (including the static save_adapter). Thus, you're explicitly allowing a use-after-free error here. That seems unwise. Instead, you should actually retain the invariant that you're doing a full remove/reinitialize here, which includes doing the *same* cancel_work_sync() here in mwifiex_recreate_adapter() as you would in any other remove(). IOW, kill the __mwifiex_sdio_remove() and just call mwifiex_sdio_remove() as you were. That also means that you can do the same per-adapter cleanup in the following patch as you do for PCIe. Brian > > /* >* Normally, we would let the driver core take care of releasing these. > @@ -2568,7 +2578,6 @@ static void mwifiex_sdio_work(struct work_struct *work) > mwifiex_sdio_card_reset_work(save_adapter); > } > > -static DECLARE_WORK(sdio_work, mwifiex_sdio_work); >
Re: [PATCH v3 4/5] mwifiex: wait firmware dump complete during card remove process
On Wed, Nov 16, 2016 at 06:39:08PM +0530, Amitkumar Karwar wrote: > From: Xinming Hu> > Wait for firmware dump complete in card remove function. > For sdio interface, there are two diffenrent cases, > card reset trigger sdio_work and firmware dump trigger sdio_work. > Do code rearrangement for distinguish between these two cases. > > Signed-off-by: Xinming Hu > Signed-off-by: Amitkumar Karwar > --- > v2: 1. Get rid of reset_triggered flag. Instead split the code and use > __mwifiex_sdio_remove() (Brian Norris/Dmitry Torokhov) > 2. "v1 4/5 mwifiex: firmware dump code rearrangement.." is dropped. So > rebased accordingly. > v3: same as [v2,5/5]. The improvement of 'moving pcie_work to card struct' > suggested by Brian is taken care in next patch. > --- > drivers/net/wireless/marvell/mwifiex/pcie.c | 6 +- > drivers/net/wireless/marvell/mwifiex/sdio.c | 15 --- > 2 files changed, 17 insertions(+), 4 deletions(-) I've expressed my unhappiness with the SDIO card-reset code that already exists (and prevents doing sane code improvements on the rest of the driver), but the current change looks OK anyway. I haven't reviewed patch 1 as closely, but for 2 to 5: Reviewed-by: Brian Norris