Re: [PATCH v3 4/5] mwifiex: wait firmware dump complete during card remove process

2017-01-03 Thread Brian Norris
Hi,

On Thu, Dec 01, 2016 at 02:02:43PM +, Amitkumar Karwar wrote:
> > > I could not find async version of cancel_work().
> > 
> > cancel_work() *is* asynchronous. It does not synchronize with the last
> > event, so you won't have the deadlock. (Remember: the synchronous
> > version is cancel_work_sync().)
> 
> My bad! What I meant is "I could not find async version of cancel_work_sync()"
> cancel_work() isn't available in 
> http://lxr.free-electrons.com/source/kernel/workqueue.c

It's in 4.9-rc1 (and it's available at the above link, at least by now).
See:

commit f72b8792d180948b4b3898374998f5ac8c02e539
Author: Jens Axboe 
Date:   Wed Aug 24 15:51:50 2016 -0600

workqueue: add cancel_work()

But anyway:

> Anyways, clear_bit() after remove() during card reset would address the 
> problem.

Yes, I think that's OK.

Brian


RE: [PATCH v3 4/5] mwifiex: wait firmware dump complete during card remove process

2016-12-01 Thread Amitkumar Karwar
Hi Brian,

> From: Brian Norris [mailto:briannor...@chromium.org]
> Sent: Thursday, December 01, 2016 12:04 AM
> To: Amitkumar Karwar
> Cc: linux-wireless@vger.kernel.org; Cathy Luo; Nishant Sarmukadam;
> raja...@google.com; dmitry.torok...@gmail.com; Xinming Hu
> Subject: Re: [PATCH v3 4/5] mwifiex: wait firmware dump complete during
> card remove process
> 
> On Wed, Nov 30, 2016 at 12:39:11PM +, Amitkumar Karwar wrote:
> 
> > > Ugh, yet another band-aid? You might do better to still cancel any
> > > pending work, just don't do it synchronously. i.e., do
> cancel_work()
> > > after you've removed the card. It doesn't make sense to do a FW
> dump
> > > on the "new" adapter when it was requested for the old one.
> >
> > I could not find async version of cancel_work().
> 
> cancel_work() *is* asynchronous. It does not synchronize with the last
> event, so you won't have the deadlock. (Remember: the synchronous
> version is cancel_work_sync().)

My bad! What I meant is "I could not find async version of cancel_work_sync()"
cancel_work() isn't available in 
http://lxr.free-electrons.com/source/kernel/workqueue.c
Anyways, clear_bit() after remove() during card reset would address the problem.

> 
> > We can fix this problem with below change at the end of
> > mwifiex_sdio_work(). All pending work requests would be ignored.
> >
> > 
> > @ -2571,6 +2571,8 @@ static void mwifiex_sdio_work(struct work_struct
> *work)
> > if (test_and_clear_bit(MWIFIEX_IFACE_WORK_CARD_RESET,
> >_work_flags))
> > mwifiex_sdio_card_reset_work(save_adapter);
> > +   clear_bit(MWIFIEX_IFACE_WORK_DEVICE_DUMP, _work_flags);
> > +   clear_bit(MWIFIEX_IFACE_WORK_CARD_RESET, _work_flags);
> >  }
> > --
> 
> I don't think that's exactly what you want. That might lose events,
> won't it? I'd rather this sort of hack go into
> mwifiex_recreate_adapter(), in between the remove() and probe() calls,
> where you don't expect any new events to trigger. And maybe include a
> comment as to why.

Right. I have just posted a patch for this.

> 
> > > I think I've asked elsewhere but didn't receive an answer: why is
> > > SDIO's mwifiex_recreate_adapter() so much different from PCIe's
> > > mwifiex_do_flr()? It seems like the latter should be refactored to
> > > remove some of the PCIe-specific cruft from main.c and then reused
> > > for SDIO.
> >
> > Our initial SDIO card reset implementation was based on MMC APIs
> where
> > remove() and probe() would automatically get called by MMC subsystem
> > after power cycle.
> > https://www.spinics.net/lists/linux-wireless/msg98435.html
> > Later it was improved by Andreas Fenkart by replacing those power
> > cycle APIs with mmc_hw_reset().
> 
> Right.
> 
> > For PCIe, function level reset is standard feature. We implemented
> > ".reset_notify" handler which gets called after and before FLR.
> 
> OK.
> 
> > You are right. We can have SDIO's handling similar to PCIe and avoid
> > destroying+recreating adapter/card.
> 
> So all in all, you're saying it's just an artifact of history, and
> there's no good reason they are so different? If so, then this looks
> like another instance where you would have done well to refactor and
> improve the existing mechanisms at the same time as you added new
> features (i.e., PCIe FLR). I've seen this problem already several
> times, where it seems development for your SDIO/PCIe/USB interface
> drivers occur almost in isolation. IMO, it'd do you well to notice
> these patterns while implementing features in the first place. The more
> code you can share, the fewer bugs you (or I) will have to chase down.

Thanks for your guidance. I'll follow this for future development.

Regards,
Amitkumar


RE: [PATCH v3 4/5] mwifiex: wait firmware dump complete during card remove process

2016-11-30 Thread Amitkumar Karwar
Hi Brian,

> > > >   *
> > > > @@ -2227,7 +2237,7 @@ static void mwifiex_recreate_adapter(struct
> > > sdio_mmc_card *card)
> > > >  * discovered and initializes them from scratch.
> > > >  */
> > > >
> > > > -   mwifiex_sdio_remove(func);
> > > > +   __mwifiex_sdio_remove(func);
> > >
> > > ^^ So here, you're trying to avoid syncing with the card-reset work
> > > event, except that function will free up all your resources
> > > (including the static save_adapter). Thus, you're explicitly
> > > allowing a use-after- free error here. That seems unwise.
> >
> > Even if firmware dump is triggered after card reset is started, it
> > will execute after card reset is completed as discussed above. Only
> > problem I can see is with "save_adapter". We can put new_adapter
> > pointer in "save_adapter" at the end of mwifiex_recreate_adapter() to
> > solve the issue.
> 
> Ugh, yet another band-aid? You might do better to still cancel any
> pending work, just don't do it synchronously. i.e., do cancel_work()
> after you've removed the card. It doesn't make sense to do a FW dump on
> the "new" adapter when it was requested for the old one.

I could not find async version of cancel_work().
We can fix this problem with below change at the end of mwifiex_sdio_work(). 
All pending work requests would be ignored.


@ -2571,6 +2571,8 @@ static void mwifiex_sdio_work(struct work_struct *work)
if (test_and_clear_bit(MWIFIEX_IFACE_WORK_CARD_RESET,
   _work_flags))
mwifiex_sdio_card_reset_work(save_adapter);
+   clear_bit(MWIFIEX_IFACE_WORK_DEVICE_DUMP, _work_flags);
+   clear_bit(MWIFIEX_IFACE_WORK_CARD_RESET, _work_flags);
 }
--

> 
> > > Instead, you should actually retain the invariant that you're doing
> > > a full remove/reinitialize here, which includes doing the *same*
> > > cancel_work_sync() here in mwifiex_recreate_adapter() as you would
> > > in any other remove().
> >
> > We are executing mwifiex_recreate_adapter() as a part of sdio_work().
> > Calling
> > cancel_work_sync() here would cause deadlock. The API is supposed to
> > waits until sdio_work() is finished.
> 
> You are correct. So using the _sync() version would be wrong.
> 
> > >
> > > IOW, kill the __mwifiex_sdio_remove() and just call
> > > mwifiex_sdio_remove() as you were.
> > >
> > > That also means that you can do the same per-adapter cleanup in the
> > > following patch as you do for PCIe.
> >
> > Currently as sdio_work recreates "card", the work structure can't be
> moved inside card structure.
> > Let me know your suggestions.
> 
> If you address the TODO in mwifiex_create_adapter() instead, you can
> get past this problem:
> 
> /* TODO mmc_hw_reset does not require destroying and re-probing
> the
>  * whole adapter. Hence there was no need to for this rube-
> goldberg
>  * design to reload the fw from an external workqueue. If we
> don't
>  * destroy the adapter we could reload the fw from
>  * mwifiex_main_work_queue directly.
> 
> The "save_adapter" is an abomination that should be terminated swiftly,
> but it is perpetuated in part by the hacks noted in the TODO.
> 
> So I'd recommend addressing the TODO ASAP, but in the meantime, a hack
> like my suggestion (cancel the FW dump work w/o synchronizing) or --
> less preferably -- yours (manually set 'save_adapter' again) might be
> OK.
> 
> I think I've asked elsewhere but didn't receive an answer: why is
> SDIO's mwifiex_recreate_adapter() so much different from PCIe's
> mwifiex_do_flr()? It seems like the latter should be refactored to
> remove some of the PCIe-specific cruft from main.c and then reused for
> SDIO.

Our initial SDIO card reset implementation was based on MMC APIs where remove() 
and probe() would automatically get called by MMC subsystem after power cycle.
https://www.spinics.net/lists/linux-wireless/msg98435.html
Later it was improved by Andreas Fenkart by replacing those power cycle APIs 
with mmc_hw_reset().

For PCIe, function level reset is standard feature. We implemented 
".reset_notify" handler which gets called after and before FLR.

You are right. We can have SDIO's handling similar to PCIe and avoid 
destroying+recreating adapter/card.
We have started working on this. We will get rid of global save_adapter, 
sdio_work etc. Meanwhile I will post above mentioned change if it looks good to 
you.

Regards,
Amitkumar


Re: [PATCH v3 4/5] mwifiex: wait firmware dump complete during card remove process

2016-11-28 Thread Brian Norris
Hi Amit,

On Thu, Nov 24, 2016 at 12:14:07PM +, Amitkumar Karwar wrote:
> > From: Brian Norris [mailto:briannor...@chromium.org]
> > Sent: Monday, November 21, 2016 11:06 PM
> > To: Amitkumar Karwar
> > Cc: linux-wireless@vger.kernel.org; Cathy Luo; Nishant Sarmukadam;
> > raja...@google.com; dmitry.torok...@gmail.com; Xinming Hu
> > Subject: Re: [PATCH v3 4/5] mwifiex: wait firmware dump complete during
> > card remove process
> > 
> > On Wed, Nov 16, 2016 at 06:39:08PM +0530, Amitkumar Karwar wrote:
> > > From: Xinming Hu <h...@marvell.com>
> > >
> > > Wait for firmware dump complete in card remove function.
> > > For sdio interface, there are two diffenrent cases, card reset
> > trigger
> > > sdio_work and firmware dump trigger sdio_work.
> > > Do code rearrangement for distinguish between these two cases.
> > 
> > On second review of the SDIO card reset code (which I'll repeat is
> > quite ugly), you seem to be making a bad distinction here. What if
> > there is a firmware dump happening concurrently with your card-reset
> > handling?
> > You *do* want to synchronize with the firmware dump before completing the
> > card reset, or else you might be freeing up internal card resources
> > that are still in use. See below.
> 
> I ran some tests and observed that if same work function is scheduled
> by two threads, it won't have re-entrant calls. They will be executed
> one after another. In SDIO work function, we have SDIO card reset call
> after completing firmware dump. So firmware dump won't run
> concurrently with card-reset as per my understanding.

Ah, you're correct. It's somewhat obscure and potentially fragile, but
correct AFAICT. As you noted though, you do still have a use-after-free
bug, even if the concurrency isn't quite as high as I thought.

> > > Signed-off-by: Xinming Hu <h...@marvell.com>
> > > Signed-off-by: Amitkumar Karwar <akar...@marvell.com>
> > > ---

...

> > > --- a/drivers/net/wireless/marvell/mwifiex/sdio.c
> > > +++ b/drivers/net/wireless/marvell/mwifiex/sdio.c
> > > @@ -46,6 +46,9 @@
> > >   */
> > >  static u8 user_rmmod;
> > >
> > > +static void mwifiex_sdio_work(struct work_struct *work); static
> > > +DECLARE_WORK(sdio_work, mwifiex_sdio_work);
> > > +
> > >  static struct mwifiex_if_ops sdio_ops;  static unsigned long
> > > iface_work_flags;
> > >
> > > @@ -220,7 +223,7 @@ static int mwifiex_sdio_resume(struct device
> > *dev)
> > >   * This function removes the interface and frees up the card
> > structure.
> > >   */
> > >  static void
> > > -mwifiex_sdio_remove(struct sdio_func *func)
> > > +__mwifiex_sdio_remove(struct sdio_func *func)
> > >  {
> > >   struct sdio_mmc_card *card;
> > >   struct mwifiex_adapter *adapter;
> > > @@ -249,6 +252,13 @@ static int mwifiex_sdio_resume(struct device
> > *dev)
> > >   mwifiex_remove_card(adapter);
> > >  }
> > >
> > > +static void
> > > +mwifiex_sdio_remove(struct sdio_func *func) {
> > > + cancel_work_sync(_work);
> > > + __mwifiex_sdio_remove(func);
> > > +}
> > > +
> > >  /*
> > >   * SDIO suspend.
> > >   *
> > > @@ -2227,7 +2237,7 @@ static void mwifiex_recreate_adapter(struct
> > sdio_mmc_card *card)
> > >* discovered and initializes them from scratch.
> > >*/
> > >
> > > - mwifiex_sdio_remove(func);
> > > + __mwifiex_sdio_remove(func);
> > 
> > ^^ So here, you're trying to avoid syncing with the card-reset work
> > event, except that function will free up all your resources (including
> > the static save_adapter). Thus, you're explicitly allowing a use-after-
> > free error here. That seems unwise.
> 
> Even if firmware dump is triggered after card reset is started, it
> will execute after card reset is completed as discussed above. Only
> problem I can see is with "save_adapter". We can put new_adapter
> pointer in "save_adapter" at the end of mwifiex_recreate_adapter() to
> solve the issue.

Ugh, yet another band-aid? You might do better to still cancel any
pending work, just don't do it synchronously. i.e., do cancel_work()
after you've removed the card. It doesn't make sense to do a FW dump on
the "new" adapter when it was requested for the old one.

> > Instead, you should actually retain the invariant that you're doing a
> > full remove/reinitialize here, which includes doing the *same*
> > cancel_work_sync

Re: [PATCH v3 4/5] mwifiex: wait firmware dump complete during card remove process

2016-11-21 Thread Brian Norris
Hi,

On Wed, Nov 16, 2016 at 06:39:08PM +0530, Amitkumar Karwar wrote:
> From: Xinming Hu 
> 
> Wait for firmware dump complete in card remove function.
> For sdio interface, there are two diffenrent cases,
> card reset trigger sdio_work and firmware dump trigger sdio_work.
> Do code rearrangement for distinguish between these two cases.

On second review of the SDIO card reset code (which I'll repeat is quite
ugly), you seem to be making a bad distinction here. What if there is a
firmware dump happening concurrently with your card-reset handling? You
*do* want to synchronize with the firmware dump before completing the
card reset, or else you might be freeing up internal card resources that
are still in use. See below.

> 
> Signed-off-by: Xinming Hu 
> Signed-off-by: Amitkumar Karwar 
> ---
> v2: 1. Get rid of reset_triggered flag. Instead split the code and use
> __mwifiex_sdio_remove() (Brian Norris/Dmitry Torokhov)
> 2. "v1 4/5 mwifiex: firmware dump code rearrangement.." is dropped. So
> rebased accordingly.
> v3: same as [v2,5/5]. The improvement of 'moving pcie_work to card struct'
> suggested by Brian is taken care in next patch.
> ---
>  drivers/net/wireless/marvell/mwifiex/pcie.c |  6 +-
>  drivers/net/wireless/marvell/mwifiex/sdio.c | 15 ---
>  2 files changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c 
> b/drivers/net/wireless/marvell/mwifiex/pcie.c
> index dd8f7aa..c8e69a4 100644
> --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> @@ -51,6 +51,9 @@ static int mwifiex_pcie_probe_of(struct device *dev)
>   return 0;
>  }
>  
> +static void mwifiex_pcie_work(struct work_struct *work);
> +static DECLARE_WORK(pcie_work, mwifiex_pcie_work);
> +
>  static int
>  mwifiex_map_pci_memory(struct mwifiex_adapter *adapter, struct sk_buff *skb,
>  size_t size, int flags)
> @@ -254,6 +257,8 @@ static void mwifiex_pcie_remove(struct pci_dev *pdev)
>   if (!adapter || !adapter->priv_num)
>   return;
>  
> + cancel_work_sync(_work);
> +
>   if (user_rmmod && !adapter->mfg_mode) {
>   mwifiex_deauthenticate_all(adapter);
>  
> @@ -2722,7 +2727,6 @@ static void mwifiex_pcie_work(struct work_struct *work)
>   mwifiex_pcie_device_dump_work(save_adapter);
>  }
>  
> -static DECLARE_WORK(pcie_work, mwifiex_pcie_work);
>  /* This function dumps FW information */
>  static void mwifiex_pcie_device_dump(struct mwifiex_adapter *adapter)
>  {
> diff --git a/drivers/net/wireless/marvell/mwifiex/sdio.c 
> b/drivers/net/wireless/marvell/mwifiex/sdio.c
> index 16d1d30..78f2cc9 100644
> --- a/drivers/net/wireless/marvell/mwifiex/sdio.c
> +++ b/drivers/net/wireless/marvell/mwifiex/sdio.c
> @@ -46,6 +46,9 @@
>   */
>  static u8 user_rmmod;
>  
> +static void mwifiex_sdio_work(struct work_struct *work);
> +static DECLARE_WORK(sdio_work, mwifiex_sdio_work);
> +
>  static struct mwifiex_if_ops sdio_ops;
>  static unsigned long iface_work_flags;
>  
> @@ -220,7 +223,7 @@ static int mwifiex_sdio_resume(struct device *dev)
>   * This function removes the interface and frees up the card structure.
>   */
>  static void
> -mwifiex_sdio_remove(struct sdio_func *func)
> +__mwifiex_sdio_remove(struct sdio_func *func)
>  {
>   struct sdio_mmc_card *card;
>   struct mwifiex_adapter *adapter;
> @@ -249,6 +252,13 @@ static int mwifiex_sdio_resume(struct device *dev)
>   mwifiex_remove_card(adapter);
>  }
>  
> +static void
> +mwifiex_sdio_remove(struct sdio_func *func)
> +{
> + cancel_work_sync(_work);
> + __mwifiex_sdio_remove(func);
> +}
> +
>  /*
>   * SDIO suspend.
>   *
> @@ -2227,7 +2237,7 @@ static void mwifiex_recreate_adapter(struct 
> sdio_mmc_card *card)
>* discovered and initializes them from scratch.
>*/
>  
> - mwifiex_sdio_remove(func);
> + __mwifiex_sdio_remove(func);

^^ So here, you're trying to avoid syncing with the card-reset work
event, except that function will free up all your resources (including
the static save_adapter). Thus, you're explicitly allowing a
use-after-free error here. That seems unwise.

Instead, you should actually retain the invariant that you're doing a
full remove/reinitialize here, which includes doing the *same*
cancel_work_sync() here in mwifiex_recreate_adapter() as you would in
any other remove().

IOW, kill the __mwifiex_sdio_remove() and just call
mwifiex_sdio_remove() as you were.

That also means that you can do the same per-adapter cleanup in the
following patch as you do for PCIe.

Brian

>  
>   /*
>* Normally, we would let the driver core take care of releasing these.
> @@ -2568,7 +2578,6 @@ static void mwifiex_sdio_work(struct work_struct *work)
>   mwifiex_sdio_card_reset_work(save_adapter);
>  }
>  
> -static DECLARE_WORK(sdio_work, mwifiex_sdio_work);
>  

Re: [PATCH v3 4/5] mwifiex: wait firmware dump complete during card remove process

2016-11-16 Thread Brian Norris
On Wed, Nov 16, 2016 at 06:39:08PM +0530, Amitkumar Karwar wrote:
> From: Xinming Hu 
> 
> Wait for firmware dump complete in card remove function.
> For sdio interface, there are two diffenrent cases,
> card reset trigger sdio_work and firmware dump trigger sdio_work.
> Do code rearrangement for distinguish between these two cases.
> 
> Signed-off-by: Xinming Hu 
> Signed-off-by: Amitkumar Karwar 
> ---
> v2: 1. Get rid of reset_triggered flag. Instead split the code and use
> __mwifiex_sdio_remove() (Brian Norris/Dmitry Torokhov)
> 2. "v1 4/5 mwifiex: firmware dump code rearrangement.." is dropped. So
> rebased accordingly.
> v3: same as [v2,5/5]. The improvement of 'moving pcie_work to card struct'
> suggested by Brian is taken care in next patch.
> ---
>  drivers/net/wireless/marvell/mwifiex/pcie.c |  6 +-
>  drivers/net/wireless/marvell/mwifiex/sdio.c | 15 ---
>  2 files changed, 17 insertions(+), 4 deletions(-)

I've expressed my unhappiness with the SDIO card-reset code that already
exists (and prevents doing sane code improvements on the rest of the
driver), but the current change looks OK anyway.

I haven't reviewed patch 1 as closely, but for 2 to 5:

Reviewed-by: Brian Norris