Re: [PATCH] ioat: fix tasklet tear down

2014-02-21 Thread Thomas Gleixner
On Thu, 20 Feb 2014, Dan Williams wrote:
> On Thu, Feb 20, 2014 at 2:30 AM, Thomas Gleixner  wrote:
> > That's 18 of 30 usage sites. Impressive
> >
> > We need to poke the relevant maintainers to get this solved.
> >
> 
> Maybe also rename tasklet_disable() to tasklet_pause() to make it
> clearer "this isn't the tasklet cleanup routine you're looking for"?

Indeed.

 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ioat: fix tasklet tear down

2014-02-21 Thread Thomas Gleixner
On Thu, 20 Feb 2014, Dan Williams wrote:
 On Thu, Feb 20, 2014 at 2:30 AM, Thomas Gleixner t...@linutronix.de wrote:
  That's 18 of 30 usage sites. Impressive
 
  We need to poke the relevant maintainers to get this solved.
 
 
 Maybe also rename tasklet_disable() to tasklet_pause() to make it
 clearer this isn't the tasklet cleanup routine you're looking for?

Indeed.

 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ioat: fix tasklet tear down

2014-02-20 Thread Dan Williams
On Thu, Feb 20, 2014 at 2:30 AM, Thomas Gleixner  wrote:
> B1;3202;0cOn Wed, 19 Feb 2014, Dan Williams wrote:
>
>> Since commit 77873803363c "net_dma: mark broken" we no longer pin dma
>> engines active for the network-receive-offload use case.  As a result
>> the ->free_chan_resources() that occurs after the driver self-test no
>> longer has a NET_DMA induced ->alloc_chan_resources() to back it up.  A
>> late firing irq can lead to ksoftirqd spinning indefinitely due to the
>> tasklet_disable() performed by ->free_chan_resources().  Only
>> ->alloc_chan_resources() can clear this condition in affected kernels.
>>
>> This problem has been present since commit 3e037454bcfa "I/OAT: Add
>> support for MSI and MSI-X" in 2.6.24, but is now exposed. Given the
>> NET_DMA use case is deprecated we can revisit moving the driver to use
>> threaded irqs.  For now, just tear down the irq and tasklet properly by:
>
> Right, moving to threaded irqs would get rid of the whole tasklet
> mess.
>
>> 1/ Disable the irq from triggering the tasklet
>>
>> 2/ Disable the irq from re-arming
>>
>> 3/ Flush inflight interrupts
>>
>> 4/ Flush the timer
>>
>> 5/ Flush inflight tasklets
>>
>> References:
>> https://lkml.org/lkml/2014/1/27/282
>> https://lkml.org/lkml/2014/2/19/672
>>
>> Cc: Thomas Gleixner 
>> Cc: Ingo Molnar 
>> Cc: Steven Rostedt 
>> Cc: 
>> Reported-by: Mike Galbraith 
>> Reported-by: Stanislav Fomichev 
>> Signed-off-by: Dan Williams 
>
> Reviewed-by: Thomas Gleixner 
>
> As Mike pointed out tsi721_free_chan_resources() has the same issue.
>
> I did a quick scan of all tasklet_disable() sites. The teardown or
> similar wreckage is available in:
>
> drivers/atm/he.c
> drivers/dma/at_hdmac.c
> drivers/dma/pch_dma.c
> drivers/input/keyboard/omap-keypad.c
> drivers/isdn/gigaset/interface.c
> drivers/media/pci/mantis/mantis_dvb.c
> drivers/mmc/host/s3cmci.c
> drivers/net/ethernet/jme.c
> drivers/net/ethernet/silan/sc92031.c
> drivers/net/usb/r8152.c
> drivers/net/wireless/mwl8k.c
> drivers/ntb/ntb_hw.c
> drivers/rapidio/devices/tsi721_dma.c
> drivers/s390/crypto/ap_bus.c
> drivers/spi/spi-pl022.c
> drivers/staging/cxt1e1/linux.c
> drivers/staging/ozwpan/ozhcd.c
> drivers/usb/gadget/fsl_qe_udc.c
>
> That's 18 of 30 usage sites. Impressive
>
> We need to poke the relevant maintainers to get this solved.
>

Maybe also rename tasklet_disable() to tasklet_pause() to make it
clearer "this isn't the tasklet cleanup routine you're looking for"?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ioat: fix tasklet tear down

2014-02-20 Thread Thomas Gleixner
B1;3202;0cOn Wed, 19 Feb 2014, Dan Williams wrote:

> Since commit 77873803363c "net_dma: mark broken" we no longer pin dma
> engines active for the network-receive-offload use case.  As a result
> the ->free_chan_resources() that occurs after the driver self-test no
> longer has a NET_DMA induced ->alloc_chan_resources() to back it up.  A
> late firing irq can lead to ksoftirqd spinning indefinitely due to the
> tasklet_disable() performed by ->free_chan_resources().  Only
> ->alloc_chan_resources() can clear this condition in affected kernels.
> 
> This problem has been present since commit 3e037454bcfa "I/OAT: Add
> support for MSI and MSI-X" in 2.6.24, but is now exposed. Given the
> NET_DMA use case is deprecated we can revisit moving the driver to use
> threaded irqs.  For now, just tear down the irq and tasklet properly by:

Right, moving to threaded irqs would get rid of the whole tasklet
mess.
 
> 1/ Disable the irq from triggering the tasklet
> 
> 2/ Disable the irq from re-arming
> 
> 3/ Flush inflight interrupts
> 
> 4/ Flush the timer
> 
> 5/ Flush inflight tasklets
> 
> References:
> https://lkml.org/lkml/2014/1/27/282
> https://lkml.org/lkml/2014/2/19/672
> 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Steven Rostedt 
> Cc: 
> Reported-by: Mike Galbraith 
> Reported-by: Stanislav Fomichev 
> Signed-off-by: Dan Williams 

Reviewed-by: Thomas Gleixner 

As Mike pointed out tsi721_free_chan_resources() has the same issue.

I did a quick scan of all tasklet_disable() sites. The teardown or
similar wreckage is available in:

drivers/atm/he.c
drivers/dma/at_hdmac.c
drivers/dma/pch_dma.c
drivers/input/keyboard/omap-keypad.c
drivers/isdn/gigaset/interface.c
drivers/media/pci/mantis/mantis_dvb.c
drivers/mmc/host/s3cmci.c
drivers/net/ethernet/jme.c
drivers/net/ethernet/silan/sc92031.c
drivers/net/usb/r8152.c
drivers/net/wireless/mwl8k.c
drivers/ntb/ntb_hw.c
drivers/rapidio/devices/tsi721_dma.c
drivers/s390/crypto/ap_bus.c
drivers/spi/spi-pl022.c
drivers/staging/cxt1e1/linux.c
drivers/staging/ozwpan/ozhcd.c
drivers/usb/gadget/fsl_qe_udc.c

That's 18 of 30 usage sites. Impressive

We need to poke the relevant maintainers to get this solved.

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ioat: fix tasklet tear down

2014-02-20 Thread Stanislav Fomichev
> Passes my testing, but would appreciate a tested-by.
Tested-by: Stanislav Fomichev 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ioat: fix tasklet tear down

2014-02-20 Thread Mike Galbraith
That was quick.  Afflicted box is now a happy camper.

(gee, systemd isn't _completely_ evil, it dug up a bug;)

Tested-by: Mike Galbraith 

On Wed, 2014-02-19 at 23:13 -0800, Dan Williams wrote: 
> Since commit 77873803363c "net_dma: mark broken" we no longer pin dma
> engines active for the network-receive-offload use case.  As a result
> the ->free_chan_resources() that occurs after the driver self-test no
> longer has a NET_DMA induced ->alloc_chan_resources() to back it up.  A
> late firing irq can lead to ksoftirqd spinning indefinitely due to the
> tasklet_disable() performed by ->free_chan_resources().  Only
> ->alloc_chan_resources() can clear this condition in affected kernels.
> 
> This problem has been present since commit 3e037454bcfa "I/OAT: Add
> support for MSI and MSI-X" in 2.6.24, but is now exposed. Given the
> NET_DMA use case is deprecated we can revisit moving the driver to use
> threaded irqs.  For now, just tear down the irq and tasklet properly by:
> 
> 1/ Disable the irq from triggering the tasklet
> 
> 2/ Disable the irq from re-arming
> 
> 3/ Flush inflight interrupts
> 
> 4/ Flush the timer
> 
> 5/ Flush inflight tasklets
> 
> References:
> https://lkml.org/lkml/2014/1/27/282
> https://lkml.org/lkml/2014/2/19/672
> 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Steven Rostedt 
> Cc: 
> Reported-by: Mike Galbraith 
> Reported-by: Stanislav Fomichev 
> Signed-off-by: Dan Williams 
> ---
> 
> Passes my testing, but would appreciate a tested-by.
> 
>  drivers/dma/ioat/dma.c|   52 
> +++--
>  drivers/dma/ioat/dma.h|1 +
>  drivers/dma/ioat/dma_v2.c |   11 --
>  drivers/dma/ioat/dma_v3.c |3 +++
>  4 files changed, 54 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/dma/ioat/dma.c b/drivers/dma/ioat/dma.c
> index 87529181efcc..4e3549a16132 100644
> --- a/drivers/dma/ioat/dma.c
> +++ b/drivers/dma/ioat/dma.c
> @@ -77,7 +77,8 @@ static irqreturn_t ioat_dma_do_interrupt(int irq, void 
> *data)
>   attnstatus = readl(instance->reg_base + IOAT_ATTNSTATUS_OFFSET);
>   for_each_set_bit(bit, , BITS_PER_LONG) {
>   chan = ioat_chan_by_index(instance, bit);
> - tasklet_schedule(>cleanup_task);
> + if (test_bit(IOAT_RUN, >state))
> + tasklet_schedule(>cleanup_task);
>   }
>  
>   writeb(intrctrl, instance->reg_base + IOAT_INTRCTRL_OFFSET);
> @@ -93,7 +94,8 @@ static irqreturn_t ioat_dma_do_interrupt_msix(int irq, void 
> *data)
>  {
>   struct ioat_chan_common *chan = data;
>  
> - tasklet_schedule(>cleanup_task);
> + if (test_bit(IOAT_RUN, >state))
> + tasklet_schedule(>cleanup_task);
>  
>   return IRQ_HANDLED;
>  }
> @@ -116,7 +118,6 @@ void ioat_init_channel(struct ioatdma_device *device, 
> struct ioat_chan_common *c
>   chan->timer.function = device->timer_fn;
>   chan->timer.data = data;
>   tasklet_init(>cleanup_task, device->cleanup_fn, data);
> - tasklet_disable(>cleanup_task);
>  }
>  
>  /**
> @@ -354,13 +355,49 @@ static int ioat1_dma_alloc_chan_resources(struct 
> dma_chan *c)
>   writel(((u64) chan->completion_dma) >> 32,
>  chan->reg_base + IOAT_CHANCMP_OFFSET_HIGH);
>  
> - tasklet_enable(>cleanup_task);
> + set_bit(IOAT_RUN, >state);
>   ioat1_dma_start_null_desc(ioat);  /* give chain to dma device */
>   dev_dbg(to_dev(chan), "%s: allocated %d descriptors\n",
>   __func__, ioat->desccount);
>   return ioat->desccount;
>  }
>  
> +void ioat_stop(struct ioat_chan_common *chan)
> +{
> + struct ioatdma_device *device = chan->device;
> + struct pci_dev *pdev = device->pdev;
> + int chan_id = chan_num(chan);
> + struct msix_entry *msix;
> +
> + /* 1/ stop irq from firing tasklets
> +  * 2/ stop the tasklet from re-arming irqs
> +  */
> + clear_bit(IOAT_RUN, >state);
> +
> + /* flush inflight interrupts */
> + switch (device->irq_mode) {
> + case IOAT_MSIX:
> + msix = >msix_entries[chan_id];
> + synchronize_irq(msix->vector);
> + break;
> + case IOAT_MSI:
> + case IOAT_INTX:
> + synchronize_irq(pdev->irq);
> + break;
> + default:
> + break;
> + }
> +
> + /* flush inflight timers */
> + del_timer_sync(>timer);
> +
> + /* flush inflight tasklet runs */
> + tasklet_kill(>cleanup_task);
> +
> + /* final cleanup now that everything is quiesced and can't re-arm */
> + device->cleanup_fn((unsigned long) >common);
> +}
> +
>  /**
>   * ioat1_dma_free_chan_resources - release all the descriptors
>   * @chan: the channel to be cleaned
> @@ -379,9 +416,7 @@ static void ioat1_dma_free_chan_resources(struct dma_chan 
> *c)
>   if (ioat->desccount == 0)
>   return;
>  
> - tasklet_disable(>cleanup_task);
> - del_timer_sync(>timer);
> - ioat1_cleanup(ioat);
> + ioat_stop(chan);
>  

Re: [PATCH] ioat: fix tasklet tear down

2014-02-20 Thread Mike Galbraith
That was quick.  Afflicted box is now a happy camper.

(gee, systemd isn't _completely_ evil, it dug up a bug;)

Tested-by: Mike Galbraith bitbuc...@online.de

On Wed, 2014-02-19 at 23:13 -0800, Dan Williams wrote: 
 Since commit 77873803363c net_dma: mark broken we no longer pin dma
 engines active for the network-receive-offload use case.  As a result
 the -free_chan_resources() that occurs after the driver self-test no
 longer has a NET_DMA induced -alloc_chan_resources() to back it up.  A
 late firing irq can lead to ksoftirqd spinning indefinitely due to the
 tasklet_disable() performed by -free_chan_resources().  Only
 -alloc_chan_resources() can clear this condition in affected kernels.
 
 This problem has been present since commit 3e037454bcfa I/OAT: Add
 support for MSI and MSI-X in 2.6.24, but is now exposed. Given the
 NET_DMA use case is deprecated we can revisit moving the driver to use
 threaded irqs.  For now, just tear down the irq and tasklet properly by:
 
 1/ Disable the irq from triggering the tasklet
 
 2/ Disable the irq from re-arming
 
 3/ Flush inflight interrupts
 
 4/ Flush the timer
 
 5/ Flush inflight tasklets
 
 References:
 https://lkml.org/lkml/2014/1/27/282
 https://lkml.org/lkml/2014/2/19/672
 
 Cc: Thomas Gleixner t...@linutronix.de
 Cc: Ingo Molnar mi...@elte.hu
 Cc: Steven Rostedt rost...@goodmis.org
 Cc: sta...@vger.kernel.org
 Reported-by: Mike Galbraith bitbuc...@online.de
 Reported-by: Stanislav Fomichev stfomic...@yandex-team.ru
 Signed-off-by: Dan Williams dan.j.willi...@intel.com
 ---
 
 Passes my testing, but would appreciate a tested-by.
 
  drivers/dma/ioat/dma.c|   52 
 +++--
  drivers/dma/ioat/dma.h|1 +
  drivers/dma/ioat/dma_v2.c |   11 --
  drivers/dma/ioat/dma_v3.c |3 +++
  4 files changed, 54 insertions(+), 13 deletions(-)
 
 diff --git a/drivers/dma/ioat/dma.c b/drivers/dma/ioat/dma.c
 index 87529181efcc..4e3549a16132 100644
 --- a/drivers/dma/ioat/dma.c
 +++ b/drivers/dma/ioat/dma.c
 @@ -77,7 +77,8 @@ static irqreturn_t ioat_dma_do_interrupt(int irq, void 
 *data)
   attnstatus = readl(instance-reg_base + IOAT_ATTNSTATUS_OFFSET);
   for_each_set_bit(bit, attnstatus, BITS_PER_LONG) {
   chan = ioat_chan_by_index(instance, bit);
 - tasklet_schedule(chan-cleanup_task);
 + if (test_bit(IOAT_RUN, chan-state))
 + tasklet_schedule(chan-cleanup_task);
   }
  
   writeb(intrctrl, instance-reg_base + IOAT_INTRCTRL_OFFSET);
 @@ -93,7 +94,8 @@ static irqreturn_t ioat_dma_do_interrupt_msix(int irq, void 
 *data)
  {
   struct ioat_chan_common *chan = data;
  
 - tasklet_schedule(chan-cleanup_task);
 + if (test_bit(IOAT_RUN, chan-state))
 + tasklet_schedule(chan-cleanup_task);
  
   return IRQ_HANDLED;
  }
 @@ -116,7 +118,6 @@ void ioat_init_channel(struct ioatdma_device *device, 
 struct ioat_chan_common *c
   chan-timer.function = device-timer_fn;
   chan-timer.data = data;
   tasklet_init(chan-cleanup_task, device-cleanup_fn, data);
 - tasklet_disable(chan-cleanup_task);
  }
  
  /**
 @@ -354,13 +355,49 @@ static int ioat1_dma_alloc_chan_resources(struct 
 dma_chan *c)
   writel(((u64) chan-completion_dma)  32,
  chan-reg_base + IOAT_CHANCMP_OFFSET_HIGH);
  
 - tasklet_enable(chan-cleanup_task);
 + set_bit(IOAT_RUN, chan-state);
   ioat1_dma_start_null_desc(ioat);  /* give chain to dma device */
   dev_dbg(to_dev(chan), %s: allocated %d descriptors\n,
   __func__, ioat-desccount);
   return ioat-desccount;
  }
  
 +void ioat_stop(struct ioat_chan_common *chan)
 +{
 + struct ioatdma_device *device = chan-device;
 + struct pci_dev *pdev = device-pdev;
 + int chan_id = chan_num(chan);
 + struct msix_entry *msix;
 +
 + /* 1/ stop irq from firing tasklets
 +  * 2/ stop the tasklet from re-arming irqs
 +  */
 + clear_bit(IOAT_RUN, chan-state);
 +
 + /* flush inflight interrupts */
 + switch (device-irq_mode) {
 + case IOAT_MSIX:
 + msix = device-msix_entries[chan_id];
 + synchronize_irq(msix-vector);
 + break;
 + case IOAT_MSI:
 + case IOAT_INTX:
 + synchronize_irq(pdev-irq);
 + break;
 + default:
 + break;
 + }
 +
 + /* flush inflight timers */
 + del_timer_sync(chan-timer);
 +
 + /* flush inflight tasklet runs */
 + tasklet_kill(chan-cleanup_task);
 +
 + /* final cleanup now that everything is quiesced and can't re-arm */
 + device-cleanup_fn((unsigned long) chan-common);
 +}
 +
  /**
   * ioat1_dma_free_chan_resources - release all the descriptors
   * @chan: the channel to be cleaned
 @@ -379,9 +416,7 @@ static void ioat1_dma_free_chan_resources(struct dma_chan 
 *c)
   if (ioat-desccount == 0)
   return;
  
 - tasklet_disable(chan-cleanup_task);
 - 

Re: [PATCH] ioat: fix tasklet tear down

2014-02-20 Thread Stanislav Fomichev
 Passes my testing, but would appreciate a tested-by.
Tested-by: Stanislav Fomichev stfomic...@yandex-team.ru
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ioat: fix tasklet tear down

2014-02-20 Thread Thomas Gleixner
B1;3202;0cOn Wed, 19 Feb 2014, Dan Williams wrote:

 Since commit 77873803363c net_dma: mark broken we no longer pin dma
 engines active for the network-receive-offload use case.  As a result
 the -free_chan_resources() that occurs after the driver self-test no
 longer has a NET_DMA induced -alloc_chan_resources() to back it up.  A
 late firing irq can lead to ksoftirqd spinning indefinitely due to the
 tasklet_disable() performed by -free_chan_resources().  Only
 -alloc_chan_resources() can clear this condition in affected kernels.
 
 This problem has been present since commit 3e037454bcfa I/OAT: Add
 support for MSI and MSI-X in 2.6.24, but is now exposed. Given the
 NET_DMA use case is deprecated we can revisit moving the driver to use
 threaded irqs.  For now, just tear down the irq and tasklet properly by:

Right, moving to threaded irqs would get rid of the whole tasklet
mess.
 
 1/ Disable the irq from triggering the tasklet
 
 2/ Disable the irq from re-arming
 
 3/ Flush inflight interrupts
 
 4/ Flush the timer
 
 5/ Flush inflight tasklets
 
 References:
 https://lkml.org/lkml/2014/1/27/282
 https://lkml.org/lkml/2014/2/19/672
 
 Cc: Thomas Gleixner t...@linutronix.de
 Cc: Ingo Molnar mi...@elte.hu
 Cc: Steven Rostedt rost...@goodmis.org
 Cc: sta...@vger.kernel.org
 Reported-by: Mike Galbraith bitbuc...@online.de
 Reported-by: Stanislav Fomichev stfomic...@yandex-team.ru
 Signed-off-by: Dan Williams dan.j.willi...@intel.com

Reviewed-by: Thomas Gleixner t...@linutronix.de

As Mike pointed out tsi721_free_chan_resources() has the same issue.

I did a quick scan of all tasklet_disable() sites. The teardown or
similar wreckage is available in:

drivers/atm/he.c
drivers/dma/at_hdmac.c
drivers/dma/pch_dma.c
drivers/input/keyboard/omap-keypad.c
drivers/isdn/gigaset/interface.c
drivers/media/pci/mantis/mantis_dvb.c
drivers/mmc/host/s3cmci.c
drivers/net/ethernet/jme.c
drivers/net/ethernet/silan/sc92031.c
drivers/net/usb/r8152.c
drivers/net/wireless/mwl8k.c
drivers/ntb/ntb_hw.c
drivers/rapidio/devices/tsi721_dma.c
drivers/s390/crypto/ap_bus.c
drivers/spi/spi-pl022.c
drivers/staging/cxt1e1/linux.c
drivers/staging/ozwpan/ozhcd.c
drivers/usb/gadget/fsl_qe_udc.c

That's 18 of 30 usage sites. Impressive

We need to poke the relevant maintainers to get this solved.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ioat: fix tasklet tear down

2014-02-20 Thread Dan Williams
On Thu, Feb 20, 2014 at 2:30 AM, Thomas Gleixner t...@linutronix.de wrote:
 B1;3202;0cOn Wed, 19 Feb 2014, Dan Williams wrote:

 Since commit 77873803363c net_dma: mark broken we no longer pin dma
 engines active for the network-receive-offload use case.  As a result
 the -free_chan_resources() that occurs after the driver self-test no
 longer has a NET_DMA induced -alloc_chan_resources() to back it up.  A
 late firing irq can lead to ksoftirqd spinning indefinitely due to the
 tasklet_disable() performed by -free_chan_resources().  Only
 -alloc_chan_resources() can clear this condition in affected kernels.

 This problem has been present since commit 3e037454bcfa I/OAT: Add
 support for MSI and MSI-X in 2.6.24, but is now exposed. Given the
 NET_DMA use case is deprecated we can revisit moving the driver to use
 threaded irqs.  For now, just tear down the irq and tasklet properly by:

 Right, moving to threaded irqs would get rid of the whole tasklet
 mess.

 1/ Disable the irq from triggering the tasklet

 2/ Disable the irq from re-arming

 3/ Flush inflight interrupts

 4/ Flush the timer

 5/ Flush inflight tasklets

 References:
 https://lkml.org/lkml/2014/1/27/282
 https://lkml.org/lkml/2014/2/19/672

 Cc: Thomas Gleixner t...@linutronix.de
 Cc: Ingo Molnar mi...@elte.hu
 Cc: Steven Rostedt rost...@goodmis.org
 Cc: sta...@vger.kernel.org
 Reported-by: Mike Galbraith bitbuc...@online.de
 Reported-by: Stanislav Fomichev stfomic...@yandex-team.ru
 Signed-off-by: Dan Williams dan.j.willi...@intel.com

 Reviewed-by: Thomas Gleixner t...@linutronix.de

 As Mike pointed out tsi721_free_chan_resources() has the same issue.

 I did a quick scan of all tasklet_disable() sites. The teardown or
 similar wreckage is available in:

 drivers/atm/he.c
 drivers/dma/at_hdmac.c
 drivers/dma/pch_dma.c
 drivers/input/keyboard/omap-keypad.c
 drivers/isdn/gigaset/interface.c
 drivers/media/pci/mantis/mantis_dvb.c
 drivers/mmc/host/s3cmci.c
 drivers/net/ethernet/jme.c
 drivers/net/ethernet/silan/sc92031.c
 drivers/net/usb/r8152.c
 drivers/net/wireless/mwl8k.c
 drivers/ntb/ntb_hw.c
 drivers/rapidio/devices/tsi721_dma.c
 drivers/s390/crypto/ap_bus.c
 drivers/spi/spi-pl022.c
 drivers/staging/cxt1e1/linux.c
 drivers/staging/ozwpan/ozhcd.c
 drivers/usb/gadget/fsl_qe_udc.c

 That's 18 of 30 usage sites. Impressive

 We need to poke the relevant maintainers to get this solved.


Maybe also rename tasklet_disable() to tasklet_pause() to make it
clearer this isn't the tasklet cleanup routine you're looking for?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] ioat: fix tasklet tear down

2014-02-19 Thread Dan Williams
Since commit 77873803363c "net_dma: mark broken" we no longer pin dma
engines active for the network-receive-offload use case.  As a result
the ->free_chan_resources() that occurs after the driver self-test no
longer has a NET_DMA induced ->alloc_chan_resources() to back it up.  A
late firing irq can lead to ksoftirqd spinning indefinitely due to the
tasklet_disable() performed by ->free_chan_resources().  Only
->alloc_chan_resources() can clear this condition in affected kernels.

This problem has been present since commit 3e037454bcfa "I/OAT: Add
support for MSI and MSI-X" in 2.6.24, but is now exposed. Given the
NET_DMA use case is deprecated we can revisit moving the driver to use
threaded irqs.  For now, just tear down the irq and tasklet properly by:

1/ Disable the irq from triggering the tasklet

2/ Disable the irq from re-arming

3/ Flush inflight interrupts

4/ Flush the timer

5/ Flush inflight tasklets

References:
https://lkml.org/lkml/2014/1/27/282
https://lkml.org/lkml/2014/2/19/672

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Steven Rostedt 
Cc: 
Reported-by: Mike Galbraith 
Reported-by: Stanislav Fomichev 
Signed-off-by: Dan Williams 
---

Passes my testing, but would appreciate a tested-by.

 drivers/dma/ioat/dma.c|   52 +++--
 drivers/dma/ioat/dma.h|1 +
 drivers/dma/ioat/dma_v2.c |   11 --
 drivers/dma/ioat/dma_v3.c |3 +++
 4 files changed, 54 insertions(+), 13 deletions(-)

diff --git a/drivers/dma/ioat/dma.c b/drivers/dma/ioat/dma.c
index 87529181efcc..4e3549a16132 100644
--- a/drivers/dma/ioat/dma.c
+++ b/drivers/dma/ioat/dma.c
@@ -77,7 +77,8 @@ static irqreturn_t ioat_dma_do_interrupt(int irq, void *data)
attnstatus = readl(instance->reg_base + IOAT_ATTNSTATUS_OFFSET);
for_each_set_bit(bit, , BITS_PER_LONG) {
chan = ioat_chan_by_index(instance, bit);
-   tasklet_schedule(>cleanup_task);
+   if (test_bit(IOAT_RUN, >state))
+   tasklet_schedule(>cleanup_task);
}
 
writeb(intrctrl, instance->reg_base + IOAT_INTRCTRL_OFFSET);
@@ -93,7 +94,8 @@ static irqreturn_t ioat_dma_do_interrupt_msix(int irq, void 
*data)
 {
struct ioat_chan_common *chan = data;
 
-   tasklet_schedule(>cleanup_task);
+   if (test_bit(IOAT_RUN, >state))
+   tasklet_schedule(>cleanup_task);
 
return IRQ_HANDLED;
 }
@@ -116,7 +118,6 @@ void ioat_init_channel(struct ioatdma_device *device, 
struct ioat_chan_common *c
chan->timer.function = device->timer_fn;
chan->timer.data = data;
tasklet_init(>cleanup_task, device->cleanup_fn, data);
-   tasklet_disable(>cleanup_task);
 }
 
 /**
@@ -354,13 +355,49 @@ static int ioat1_dma_alloc_chan_resources(struct dma_chan 
*c)
writel(((u64) chan->completion_dma) >> 32,
   chan->reg_base + IOAT_CHANCMP_OFFSET_HIGH);
 
-   tasklet_enable(>cleanup_task);
+   set_bit(IOAT_RUN, >state);
ioat1_dma_start_null_desc(ioat);  /* give chain to dma device */
dev_dbg(to_dev(chan), "%s: allocated %d descriptors\n",
__func__, ioat->desccount);
return ioat->desccount;
 }
 
+void ioat_stop(struct ioat_chan_common *chan)
+{
+   struct ioatdma_device *device = chan->device;
+   struct pci_dev *pdev = device->pdev;
+   int chan_id = chan_num(chan);
+   struct msix_entry *msix;
+
+   /* 1/ stop irq from firing tasklets
+* 2/ stop the tasklet from re-arming irqs
+*/
+   clear_bit(IOAT_RUN, >state);
+
+   /* flush inflight interrupts */
+   switch (device->irq_mode) {
+   case IOAT_MSIX:
+   msix = >msix_entries[chan_id];
+   synchronize_irq(msix->vector);
+   break;
+   case IOAT_MSI:
+   case IOAT_INTX:
+   synchronize_irq(pdev->irq);
+   break;
+   default:
+   break;
+   }
+
+   /* flush inflight timers */
+   del_timer_sync(>timer);
+
+   /* flush inflight tasklet runs */
+   tasklet_kill(>cleanup_task);
+
+   /* final cleanup now that everything is quiesced and can't re-arm */
+   device->cleanup_fn((unsigned long) >common);
+}
+
 /**
  * ioat1_dma_free_chan_resources - release all the descriptors
  * @chan: the channel to be cleaned
@@ -379,9 +416,7 @@ static void ioat1_dma_free_chan_resources(struct dma_chan 
*c)
if (ioat->desccount == 0)
return;
 
-   tasklet_disable(>cleanup_task);
-   del_timer_sync(>timer);
-   ioat1_cleanup(ioat);
+   ioat_stop(chan);
 
/* Delay 100ms after reset to allow internal DMA logic to quiesce
 * before removing DMA descriptor resources.
@@ -526,8 +561,11 @@ ioat1_dma_prep_memcpy(struct dma_chan *c, dma_addr_t 
dma_dest,
 static void ioat1_cleanup_event(unsigned long data)
 {
struct ioat_dma_chan *ioat = to_ioat_chan((void *) data);
+   struct ioat_chan_common 

[PATCH] ioat: fix tasklet tear down

2014-02-19 Thread Dan Williams
Since commit 77873803363c net_dma: mark broken we no longer pin dma
engines active for the network-receive-offload use case.  As a result
the -free_chan_resources() that occurs after the driver self-test no
longer has a NET_DMA induced -alloc_chan_resources() to back it up.  A
late firing irq can lead to ksoftirqd spinning indefinitely due to the
tasklet_disable() performed by -free_chan_resources().  Only
-alloc_chan_resources() can clear this condition in affected kernels.

This problem has been present since commit 3e037454bcfa I/OAT: Add
support for MSI and MSI-X in 2.6.24, but is now exposed. Given the
NET_DMA use case is deprecated we can revisit moving the driver to use
threaded irqs.  For now, just tear down the irq and tasklet properly by:

1/ Disable the irq from triggering the tasklet

2/ Disable the irq from re-arming

3/ Flush inflight interrupts

4/ Flush the timer

5/ Flush inflight tasklets

References:
https://lkml.org/lkml/2014/1/27/282
https://lkml.org/lkml/2014/2/19/672

Cc: Thomas Gleixner t...@linutronix.de
Cc: Ingo Molnar mi...@elte.hu
Cc: Steven Rostedt rost...@goodmis.org
Cc: sta...@vger.kernel.org
Reported-by: Mike Galbraith bitbuc...@online.de
Reported-by: Stanislav Fomichev stfomic...@yandex-team.ru
Signed-off-by: Dan Williams dan.j.willi...@intel.com
---

Passes my testing, but would appreciate a tested-by.

 drivers/dma/ioat/dma.c|   52 +++--
 drivers/dma/ioat/dma.h|1 +
 drivers/dma/ioat/dma_v2.c |   11 --
 drivers/dma/ioat/dma_v3.c |3 +++
 4 files changed, 54 insertions(+), 13 deletions(-)

diff --git a/drivers/dma/ioat/dma.c b/drivers/dma/ioat/dma.c
index 87529181efcc..4e3549a16132 100644
--- a/drivers/dma/ioat/dma.c
+++ b/drivers/dma/ioat/dma.c
@@ -77,7 +77,8 @@ static irqreturn_t ioat_dma_do_interrupt(int irq, void *data)
attnstatus = readl(instance-reg_base + IOAT_ATTNSTATUS_OFFSET);
for_each_set_bit(bit, attnstatus, BITS_PER_LONG) {
chan = ioat_chan_by_index(instance, bit);
-   tasklet_schedule(chan-cleanup_task);
+   if (test_bit(IOAT_RUN, chan-state))
+   tasklet_schedule(chan-cleanup_task);
}
 
writeb(intrctrl, instance-reg_base + IOAT_INTRCTRL_OFFSET);
@@ -93,7 +94,8 @@ static irqreturn_t ioat_dma_do_interrupt_msix(int irq, void 
*data)
 {
struct ioat_chan_common *chan = data;
 
-   tasklet_schedule(chan-cleanup_task);
+   if (test_bit(IOAT_RUN, chan-state))
+   tasklet_schedule(chan-cleanup_task);
 
return IRQ_HANDLED;
 }
@@ -116,7 +118,6 @@ void ioat_init_channel(struct ioatdma_device *device, 
struct ioat_chan_common *c
chan-timer.function = device-timer_fn;
chan-timer.data = data;
tasklet_init(chan-cleanup_task, device-cleanup_fn, data);
-   tasklet_disable(chan-cleanup_task);
 }
 
 /**
@@ -354,13 +355,49 @@ static int ioat1_dma_alloc_chan_resources(struct dma_chan 
*c)
writel(((u64) chan-completion_dma)  32,
   chan-reg_base + IOAT_CHANCMP_OFFSET_HIGH);
 
-   tasklet_enable(chan-cleanup_task);
+   set_bit(IOAT_RUN, chan-state);
ioat1_dma_start_null_desc(ioat);  /* give chain to dma device */
dev_dbg(to_dev(chan), %s: allocated %d descriptors\n,
__func__, ioat-desccount);
return ioat-desccount;
 }
 
+void ioat_stop(struct ioat_chan_common *chan)
+{
+   struct ioatdma_device *device = chan-device;
+   struct pci_dev *pdev = device-pdev;
+   int chan_id = chan_num(chan);
+   struct msix_entry *msix;
+
+   /* 1/ stop irq from firing tasklets
+* 2/ stop the tasklet from re-arming irqs
+*/
+   clear_bit(IOAT_RUN, chan-state);
+
+   /* flush inflight interrupts */
+   switch (device-irq_mode) {
+   case IOAT_MSIX:
+   msix = device-msix_entries[chan_id];
+   synchronize_irq(msix-vector);
+   break;
+   case IOAT_MSI:
+   case IOAT_INTX:
+   synchronize_irq(pdev-irq);
+   break;
+   default:
+   break;
+   }
+
+   /* flush inflight timers */
+   del_timer_sync(chan-timer);
+
+   /* flush inflight tasklet runs */
+   tasklet_kill(chan-cleanup_task);
+
+   /* final cleanup now that everything is quiesced and can't re-arm */
+   device-cleanup_fn((unsigned long) chan-common);
+}
+
 /**
  * ioat1_dma_free_chan_resources - release all the descriptors
  * @chan: the channel to be cleaned
@@ -379,9 +416,7 @@ static void ioat1_dma_free_chan_resources(struct dma_chan 
*c)
if (ioat-desccount == 0)
return;
 
-   tasklet_disable(chan-cleanup_task);
-   del_timer_sync(chan-timer);
-   ioat1_cleanup(ioat);
+   ioat_stop(chan);
 
/* Delay 100ms after reset to allow internal DMA logic to quiesce
 * before removing DMA descriptor resources.
@@ -526,8 +561,11 @@