Re: WARN_ON(irqs_disabled()) in dma_free_attrs?

2018-06-22 Thread Fredrik Noring
Hi Robin, Christoph,

On Thu, Mar 15, 2018 at 01:11:03PM +, Robin Murphy wrote:
> On 15/03/18 07:58, Christoph Hellwig wrote:
> > On Wed, Mar 14, 2018 at 05:43:46PM +, Robin Murphy wrote:
> > > > So maybe for now the quick fix is to move the sleep check as suggested
> > > > earlier in this thread, but in the long run we probably need to do some
> > > > major rework of how dma_declare_coherent_memory and friends work.
> > > 
> > > Maybe; I do think the specific hcd_alloc_coherent() case could still be
> > > fixed within the scope of the existing code, but it's not quite as clean
> > > and straightforward as I first thought, and the practical impact of
> > > tweaking the WARN should be effectively zero despite the theoretical edge
> > > cases it opens up. Do you want me to write it up as a proper patch?
> > 
> > Yes.  Including a proper comment on why the might_sleep is placed there.
> 
> OK, will do.

How is to going with moving the sleep check?

Another regression triggered by the OHCI turned up in v4.16 with commit
205e1b7f51e4af26 "dma-mapping: warn when there is no coherent_dma_mask"
by Christoph Hellwig:

[ cut here ]
WARNING: CPU: 0 PID: 62 at ./include/linux/dma-mapping.h:516 
ohci_setup+0x41c/0x424 [ohci_hcd]
Modules linked in: ohci_ps2(+) ohci_hcd usbcore usb_common sd_mod iop iop_fio 
iop_module iop_memory sif
CPU: 0 PID: 62 Comm: modprobe Not tainted 4.16.0+ #1533
Stack :   80747392 0037 81c6eb0c 804f32e7 80493b24 003e
80743498 0204 0001 c01c 802a2fa0 10058c00 81ea5a68 804facc0
  8074 0007  0060  
3a6d6d6f  005f 646f6d20 8000  c01e66e8 c01e813c
0009 0204 0001 c01c 0018 80278fe0 0007579f 0001
...
Call Trace:
[<8001d6e4>] show_stack+0x74/0x104
[<800323a8>] __warn+0x118/0x120
[<8003246c>] warn_slowpath_null+0x44/0x58
[] ohci_setup+0x41c/0x424 [ohci_hcd]
[] ohci_ps2_reset+0x30/0x70 [ohci_ps2]
[] usb_add_hcd+0x2d4/0x89c [usbcore]
[] ohci_hcd_ps2_probe+0x284/0x2a4 [ohci_ps2]
[<802a8a74>] platform_drv_probe+0x2c/0x68
[<802a70b4>] driver_probe_device+0x22c/0x2e4
[<802a71f0>] __driver_attach+0x84/0xc8
[<802a53fc>] bus_for_each_dev+0x60/0x90
[<802a6580>] bus_add_driver+0x1b8/0x200
[<802a7980>] driver_register+0xc0/0x100
[<800106bc>] do_one_initcall+0x17c/0x190
[<800841f4>] do_init_module+0x74/0x1f0
[<80082f30>] load_module+0x1680/0x2044
[<80083adc>] SyS_finit_module+0xa0/0xb8
[<8002190c>] syscall_common+0x34/0x58
---[ end trace e71738b5fa6bf9aa ]---

Fredrik
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: WARN_ON(irqs_disabled()) in dma_free_attrs?

2018-03-15 Thread Robin Murphy

On 15/03/18 07:58, Christoph Hellwig wrote:

On Wed, Mar 14, 2018 at 05:43:46PM +, Robin Murphy wrote:

Looking back I don't really understand why we even indirect the "classic"
per-device dma_declare_coherent_memory use case through the DMA API.


It certainly makes sense for devices which can exist in both shared-memory
and device-local-memory configurations, so the driver doesn't have to care
about the details (particularly on mobile SoCs where the 'local' memory
might just be a chunk of system RAM reserved by the bootloader, and it's
just a matter of different use-cases on identical hardware).


Well, the "classic" case for me is memory buffers in the device.  Setting
some memory aside, either in a global pool as now done for arm-nommu
or even per-device as on some ARM SOCs is different indeed.

As far as I can tell the few devices that use 'local' memory always
use that.


I was thinking mostly of GPUs, where the same IP core might be available 
in a discrete PCIe chip with its own GDDR controller and on-board local 
RAM, and also in an APU/SoC-type configuration reliant on the CPU memory 
bus. But I guess in practice those drivers are already well beyond the 
generic DMA API and into complicated explicitly-managed stuff like TTM.



It seems like a pretty different use case to me.  In the USB case we
also have the following additional twist in that it doesn't even need
the mapping to be coherent.


I'm pretty sure it does (in the sense that it needs to ensure the arch code
makes the mapping non-cacheable), otherwise I can't see how the bouncing
could work properly. I think the last bit of the comment above
hcd_alloc_coherent() is a bit misleading.


Well, if it isn't marked non-cacheable we'd have to do dma_cache_sync
operations for it.  Which would probably still be faster than non-cacheable
mappings.


Ah, I see, we crossed wires there - the *current* implementation 
definitely requires a coherent mapping, but in general you're right that 
there are other ways to solve this particular problem which wouldn't. 
This case really wants to be able to overload the streaming map/unmap 
calls to automatically bounce through device-local memory, but I'm sure 
that's not worth the complexity or effort in general :)



So maybe for now the quick fix is to move the sleep check as suggested
earlier in this thread, but in the long run we probably need to do some
major rework of how dma_declare_coherent_memory and friends work.


Maybe; I do think the specific hcd_alloc_coherent() case could still be
fixed within the scope of the existing code, but it's not quite as clean
and straightforward as I first thought, and the practical impact of
tweaking the WARN should be effectively zero despite the theoretical edge
cases it opens up. Do you want me to write it up as a proper patch?


Yes.  Including a proper comment on why the might_sleep is placed there.


OK, will do.


My mid-term plan was to actually remove the gfp flags argument from
the dma alloc routines as is creates more confusion than it helps.
I guess this means we'll at least need to introduce a DMA_ATTR_NON_BLOCK
or similar flag instead then unfortunately.


At least we already have DMA_ATTR_NO_WARN, which could be implemented 
consistently to clean up the existing __GFP_NOWARN users.


Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: WARN_ON(irqs_disabled()) in dma_free_attrs?

2018-03-15 Thread Christoph Hellwig
On Wed, Mar 14, 2018 at 05:43:46PM +, Robin Murphy wrote:
>> Looking back I don't really understand why we even indirect the "classic"
>> per-device dma_declare_coherent_memory use case through the DMA API.
>
> It certainly makes sense for devices which can exist in both shared-memory 
> and device-local-memory configurations, so the driver doesn't have to care 
> about the details (particularly on mobile SoCs where the 'local' memory 
> might just be a chunk of system RAM reserved by the bootloader, and it's 
> just a matter of different use-cases on identical hardware).

Well, the "classic" case for me is memory buffers in the device.  Setting
some memory aside, either in a global pool as now done for arm-nommu
or even per-device as on some ARM SOCs is different indeed.

As far as I can tell the few devices that use 'local' memory always
use that.

>> It seems like a pretty different use case to me.  In the USB case we
>> also have the following additional twist in that it doesn't even need
>> the mapping to be coherent.
>
> I'm pretty sure it does (in the sense that it needs to ensure the arch code 
> makes the mapping non-cacheable), otherwise I can't see how the bouncing 
> could work properly. I think the last bit of the comment above 
> hcd_alloc_coherent() is a bit misleading.

Well, if it isn't marked non-cacheable we'd have to do dma_cache_sync
operations for it.  Which would probably still be faster than non-cacheable
mappings.

>> So maybe for now the quick fix is to move the sleep check as suggested
>> earlier in this thread, but in the long run we probably need to do some
>> major rework of how dma_declare_coherent_memory and friends work.
>
> Maybe; I do think the specific hcd_alloc_coherent() case could still be 
> fixed within the scope of the existing code, but it's not quite as clean 
> and straightforward as I first thought, and the practical impact of 
> tweaking the WARN should be effectively zero despite the theoretical edge 
> cases it opens up. Do you want me to write it up as a proper patch?

Yes.  Including a proper comment on why the might_sleep is placed there.

My mid-term plan was to actually remove the gfp flags argument from
the dma alloc routines as is creates more confusion than it helps.
I guess this means we'll at least need to introduce a DMA_ATTR_NON_BLOCK
or similar flag instead then unfortunately.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: WARN_ON(irqs_disabled()) in dma_free_attrs?

2018-03-14 Thread Robin Murphy

On 13/03/18 13:17, Christoph Hellwig wrote:

On Tue, Mar 13, 2018 at 12:11:49PM +, Robin Murphy wrote:

Taking a step back, though, provided the original rationale about
dma_declare_coherent_memory() is still valid, I wonder if we should simply
permit the USB code to call dma_{alloc,free}_from_dev_coherent() directly
here instead of being "good" and indirecting through the top-level DMA API
(which is the part which leads to problems). Given that it's a specific DMA
bounce buffer implementation within a core API, not just any old driver
code, I personally would consider that reasonable.


Looking back I don't really understand why we even indirect the "classic"
per-device dma_declare_coherent_memory use case through the DMA API.


It certainly makes sense for devices which can exist in both 
shared-memory and device-local-memory configurations, so the driver 
doesn't have to care about the details (particularly on mobile SoCs 
where the 'local' memory might just be a chunk of system RAM reserved by 
the bootloader, and it's just a matter of different use-cases on 
identical hardware).



It seems like a pretty different use case to me.  In the USB case we
also have the following additional twist in that it doesn't even need
the mapping to be coherent.


I'm pretty sure it does (in the sense that it needs to ensure the arch 
code makes the mapping non-cacheable), otherwise I can't see how the 
bouncing could work properly. I think the last bit of the comment above 
hcd_alloc_coherent() is a bit misleading.



So maybe for now the quick fix is to move the sleep check as suggested
earlier in this thread, but in the long run we probably need to do some
major rework of how dma_declare_coherent_memory and friends work.


Maybe; I do think the specific hcd_alloc_coherent() case could still be 
fixed within the scope of the existing code, but it's not quite as clean 
and straightforward as I first thought, and the practical impact of 
tweaking the WARN should be effectively zero despite the theoretical 
edge cases it opens up. Do you want me to write it up as a proper patch?


Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: WARN_ON(irqs_disabled()) in dma_free_attrs?

2018-03-13 Thread Christoph Hellwig
On Tue, Mar 13, 2018 at 12:11:49PM +, Robin Murphy wrote:
> Taking a step back, though, provided the original rationale about 
> dma_declare_coherent_memory() is still valid, I wonder if we should simply 
> permit the USB code to call dma_{alloc,free}_from_dev_coherent() directly 
> here instead of being "good" and indirecting through the top-level DMA API 
> (which is the part which leads to problems). Given that it's a specific DMA 
> bounce buffer implementation within a core API, not just any old driver 
> code, I personally would consider that reasonable.

Looking back I don't really understand why we even indirect the "classic"
per-device dma_declare_coherent_memory use case through the DMA API.

It seems like a pretty different use case to me.  In the USB case we
also have the following additional twist in that it doesn't even need
the mapping to be coherent.

So maybe for now the quick fix is to move the sleep check as suggested
earlier in this thread, but in the long run we probably need to do some
major rework of how dma_declare_coherent_memory and friends work.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: WARN_ON(irqs_disabled()) in dma_free_attrs?

2018-03-13 Thread Robin Murphy

On 11/03/18 18:01, Fredrik Noring wrote:

Hi Christoph,


The point is that you should always use a pool, period.
dma_alloc*/dma_free* are fundamentally expensive operations on my
architectures, so if you call them from a fast path you are doing
something wrong.


The author's comment in commit b3476675320e "usb: dma bounce buffer support"
seems to suggest that the greatest concern is space efficiency as opposed to
speed. I tried to evaluate strict pool allocations, similar to the patch
below, but that didn't turn out as I expected.

I chose a 64 KiB pool maximum since it has been the largest requested size I
have observed in USB traces (which may not hold in general, of course). This
change caused the USB mass storage driver to get stuck in some kind of memory
deadlock, with endless busy-looping on 64 KiB allocation failures.

I also tried a progression of pool sizes including nonpowers of two, for
example 12288, to make better use of the 256 KiB memory capacity. However,
the following part of dma_pool_create in linux/mm/dmapool.c is somewhat
puzzling:

 if (!boundary)
 boundary = allocation;
 else if ((boundary < size) || (boundary & (boundary - 1)))
 return NULL;

Is the boundary variable required to be a power of two only when it is
explicitly given as nonzero?


I would think so; realistically, the notion of non-power-of-two 
boundaries doesn't make a great deal of sense. IIRC, XHCI has a 64KB 
boundary limitation, but [OE]HCI don't (as far as I could see from the 
specs).




Fredrik

diff --git a/drivers/usb/core/buffer.c b/drivers/usb/core/buffer.c
index 77eef8acff94..8cc8fbc91c76 100644
--- a/drivers/usb/core/buffer.c
+++ b/drivers/usb/core/buffer.c
@@ -26,7 +26,7 @@

  /* FIXME tune these based on pool statistics ... */
  static size_t pool_max[HCD_BUFFER_POOLS] = {
-   32, 128, 512, 2048,
+   32, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536
  };

  void __init usb_init_pool_max(void)
@@ -76,7 +76,7 @@ int hcd_buffer_create(struct usb_hcd *hcd)
 continue;
 snprintf(name, sizeof(name), "buffer-%d", size);
 hcd->pool[i] = dma_pool_create(name, hcd->self.sysdev,
-   size, size, 0);
+   size, min_t(size_t, 4096, size), 0);
 if (!hcd->pool[i]) {
 hcd_buffer_destroy(hcd);
 return -ENOMEM;
@@ -140,7 +140,7 @@ void *hcd_buffer_alloc(
 if (size <= pool_max[i])
 return dma_pool_alloc(hcd->pool[i], mem_flags, dma);
 }
-   return dma_alloc_coherent(hcd->self.sysdev, size, dma, mem_flags);


Taking a step back, though, provided the original rationale about 
dma_declare_coherent_memory() is still valid, I wonder if we should 
simply permit the USB code to call dma_{alloc,free}_from_dev_coherent() 
directly here instead of being "good" and indirecting through the 
top-level DMA API (which is the part which leads to problems). Given 
that it's a specific DMA bounce buffer implementation within a core API, 
not just any old driver code, I personally would consider that reasonable.


Robin.


+   return NULL;
  }

  void hcd_buffer_free(
@@ -169,5 +169,5 @@ void hcd_buffer_free(
 return;
 }
 }
-   dma_free_coherent(hcd->self.sysdev, size, addr, dma);
+   BUG();  /* The buffer could not have been allocated. */
  }
diff --git a/include/linux/usb/hcd.h b/include/linux/usb/hcd.h
index 176900528822..2075f1e22e32 100644
--- a/include/linux/usb/hcd.h
+++ b/include/linux/usb/hcd.h
@@ -189,7 +189,7 @@ struct usb_hcd {
 struct usb_hcd  *primary_hcd;


-#define HCD_BUFFER_POOLS   4
+#define HCD_BUFFER_POOLS   11
 struct dma_pool *pool[HCD_BUFFER_POOLS];

 int state;


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: WARN_ON(irqs_disabled()) in dma_free_attrs?

2018-03-11 Thread Fredrik Noring
Hi Christoph,

> The point is that you should always use a pool, period.
> dma_alloc*/dma_free* are fundamentally expensive operations on my
> architectures, so if you call them from a fast path you are doing
> something wrong.

The author's comment in commit b3476675320e "usb: dma bounce buffer support"
seems to suggest that the greatest concern is space efficiency as opposed to
speed. I tried to evaluate strict pool allocations, similar to the patch
below, but that didn't turn out as I expected.

I chose a 64 KiB pool maximum since it has been the largest requested size I
have observed in USB traces (which may not hold in general, of course). This
change caused the USB mass storage driver to get stuck in some kind of memory
deadlock, with endless busy-looping on 64 KiB allocation failures.

I also tried a progression of pool sizes including nonpowers of two, for
example 12288, to make better use of the 256 KiB memory capacity. However,
the following part of dma_pool_create in linux/mm/dmapool.c is somewhat
puzzling:

if (!boundary)
boundary = allocation;
else if ((boundary < size) || (boundary & (boundary - 1)))
return NULL;

Is the boundary variable required to be a power of two only when it is
explicitly given as nonzero?

Fredrik

diff --git a/drivers/usb/core/buffer.c b/drivers/usb/core/buffer.c
index 77eef8acff94..8cc8fbc91c76 100644
--- a/drivers/usb/core/buffer.c
+++ b/drivers/usb/core/buffer.c
@@ -26,7 +26,7 @@
 
 /* FIXME tune these based on pool statistics ... */
 static size_t pool_max[HCD_BUFFER_POOLS] = {
-   32, 128, 512, 2048,
+   32, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536
 };
 
 void __init usb_init_pool_max(void)
@@ -76,7 +76,7 @@ int hcd_buffer_create(struct usb_hcd *hcd)
continue;
snprintf(name, sizeof(name), "buffer-%d", size);
hcd->pool[i] = dma_pool_create(name, hcd->self.sysdev,
-   size, size, 0);
+   size, min_t(size_t, 4096, size), 0);
if (!hcd->pool[i]) {
hcd_buffer_destroy(hcd);
return -ENOMEM;
@@ -140,7 +140,7 @@ void *hcd_buffer_alloc(
if (size <= pool_max[i])
return dma_pool_alloc(hcd->pool[i], mem_flags, dma);
}
-   return dma_alloc_coherent(hcd->self.sysdev, size, dma, mem_flags);
+   return NULL;
 }
 
 void hcd_buffer_free(
@@ -169,5 +169,5 @@ void hcd_buffer_free(
return;
}
}
-   dma_free_coherent(hcd->self.sysdev, size, addr, dma);
+   BUG();  /* The buffer could not have been allocated. */
 }
diff --git a/include/linux/usb/hcd.h b/include/linux/usb/hcd.h
index 176900528822..2075f1e22e32 100644
--- a/include/linux/usb/hcd.h
+++ b/include/linux/usb/hcd.h
@@ -189,7 +189,7 @@ struct usb_hcd {
struct usb_hcd  *primary_hcd;
 
 
-#define HCD_BUFFER_POOLS   4
+#define HCD_BUFFER_POOLS   11
struct dma_pool *pool[HCD_BUFFER_POOLS];
 
int state;
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: WARN_ON(irqs_disabled()) in dma_free_attrs?

2018-03-05 Thread Christoph Hellwig
On Sat, Mar 03, 2018 at 07:19:06PM +0100, Fredrik Noring wrote:
> Christoph, Alan,
> 
> > If it is allocating / freeing this memory all the time in the hot path
> > it should really use a dma pool (see include/ilinux/dmapool.h).
> > The dma coherent APIs aren't really built for being called in the
> > hot path.
> 
> hcd_buffer_free uses a combination of dma pools and dma coherent APIs:
> 
>   ...
>   for (i = 0; i < HCD_BUFFER_POOLS; i++) {
>   if (size <= pool_max[i]) {
>   dma_pool_free(hcd->pool[i], addr, dma);
>   return;
>   }
>   }
>   dma_free_coherent(hcd->self.sysdev, size, addr, dma);
> 
> Alan, can dma_free_coherent be delayed to a point when IRQs are enabled?

The point is that you should always use a pool, period.
dma_alloc*/dma_free* are fundamentally expensive operations on my
architectures, so if you call them from a fast path you are doing
something wrong.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: WARN_ON(irqs_disabled()) in dma_free_attrs?

2018-03-04 Thread Alan Stern
On Sun, 4 Mar 2018, Fredrik Noring wrote:

> Hi Alan Stern,
> 
> > > Alan, can dma_free_coherent be delayed to a point when IRQs are enabled?
> > 
> > Yes, subject to the usual concerns about not being delayed for too 
> > long.  Also, some HCDs are highly memory-constrained.  I don't know if 
> > they use this API, but if they do then delaying a free could result in 
> > not enough memory being available when an allocation is needed.
> 
> The PS2 HCD in this case is limited to 256 KiB and memory allocation
> failures can be observed frequently in USB traces (102 subsequent failures
> in the trace below involving mass storage transactions, for example). Is
> there any smartness to the allocations?

Not as far as I know.

>  My impression is that the USB core
> loops until it gets what it wants, and then happily resumes. Does it busy
> wait?

No, it doesn't loop and it doesn't busy-wait.  It just fails if memory 
can't be allocated immediately.  Maybe the higher-level driver has a 
retry loop of some sort.

> The RT3070 wireless USB device driver, for example, has hardcoded buffer
> limits that exceed the total amount of available memory. It refuses to
> accept devices unless adjusted in the source (as in the patch below), after
> which it works nicely.
> 
> Other USB device drivers such as the one for the AR9271 wireless device
> trigger endlessly repeating kernel warnings claiming
> 
>   BOGUS urb xfer, pipe 1 != type 3
> 
> as shown in this kernel backtrace:

...

> I don't know if this is related to memory limitations or some other problem
> though.

Another problem.  This message indicates there's a bug in the ath9k
driver; it says that the driver submitted an interrupt URB for a bulk
endpoint.

Alan Stern

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: WARN_ON(irqs_disabled()) in dma_free_attrs?

2018-03-04 Thread Fredrik Noring
Hi Alan Stern,

> > Alan, can dma_free_coherent be delayed to a point when IRQs are enabled?
> 
> Yes, subject to the usual concerns about not being delayed for too 
> long.  Also, some HCDs are highly memory-constrained.  I don't know if 
> they use this API, but if they do then delaying a free could result in 
> not enough memory being available when an allocation is needed.

The PS2 HCD in this case is limited to 256 KiB and memory allocation
failures can be observed frequently in USB traces (102 subsequent failures
in the trace below involving mass storage transactions, for example). Is
there any smartness to the allocations? My impression is that the USB core
loops until it gets what it wants, and then happily resumes. Does it busy
wait?

The RT3070 wireless USB device driver, for example, has hardcoded buffer
limits that exceed the total amount of available memory. It refuses to
accept devices unless adjusted in the source (as in the patch below), after
which it works nicely.

Other USB device drivers such as the one for the AR9271 wireless device
trigger endlessly repeating kernel warnings claiming

BOGUS urb xfer, pipe 1 != type 3

as shown in this kernel backtrace:

[ cut here ]
WARNING: CPU: 0 PID: 15 at drivers/usb/core/urb.c:471 usb_submit_urb+0x280/0x528
usb 1-1: BOGUS urb xfer, pipe 1 != type 3
Modules linked in: ath9k_htc ath9k_common ath9k_hw ath mac80211 cfg80211 usbmon
CPU: 0 PID: 15 Comm: kworker/0:1 Tainted: GW4.15.0+ #831
Workqueue: events request_firmware_work_func
Stack : 0009 01d7 0001 81714028 81714000 8006161c 80519af4 000f
805e3b08 01d7 80519a68 81cd3ad4 81714000 10058c00 81cd3aa8 80587340
  03f8 0007  03f8  
0040 0008  81ccd288   803036a8 8053e854
0009 01d7 0001 81714028 0008 ffc7  805e
...
Call Trace:
[<2318fb9b>] show_stack+0x74/0x104
[] __warn+0x118/0x120
[<01359d10>] warn_slowpath_fmt+0x30/0x3c
[] usb_submit_urb+0x280/0x528
[<89eb7b59>] hif_usb_send+0x3b0/0x3e0 [ath9k_htc]
[<51446a9c>] ath9k_wmi_cmd+0x194/0x228 [ath9k_htc]
[<2119badc>] ath9k_regread+0x5c/0x88 [ath9k_htc]
[] ath9k_hw_wait+0xa4/0xc0 [ath9k_hw]
[<368b3c01>] ath9k_hw_set_reset_reg+0x23c/0x288 [ath9k_hw]
[<28807ea6>] ath9k_hw_init+0x3e8/0xa24 [ath9k_hw]
[] ath9k_htc_probe_device+0x38c/0xa2c [ath9k_htc]
[] ath9k_htc_hw_init+0x20/0x4c [ath9k_htc]
[<6500cd86>] ath9k_hif_usb_firmware_cb+0x780/0x854 [ath9k_htc]
[] request_firmware_work_func+0x40/0x70
[<54f8834a>] process_one_work+0x1f4/0x358
[<23b84d12>] worker_thread+0x354/0x4b8
[<583bf61b>] kthread+0x134/0x13c
[<213b5c9e>] ret_from_kernel_thread+0x14/0x1c
---[ end trace 2a97c69dce30b650 ]---

I don't know if this is related to memory limitations or some other problem
though.

Fredrik

--- a/drivers/net/wireless/ralink/rt2x00/rt2800usb.c
+++ b/drivers/net/wireless/ralink/rt2x00/rt2800usb.c
@@ -871,7 +871,7 @@ static void rt2800usb_queue_init(struct data_queue *queue)
 
switch (queue->qid) {
case QID_RX:
-   queue->limit = 128;
+   queue->limit = 8;
queue->data_size = AGGREGATION_SIZE;
queue->desc_size = RXINFO_DESC_SIZE;
queue->winfo_size = rxwi_size;
@@ -882,7 +882,7 @@ static void rt2800usb_queue_init(struct data_queue *queue)
case QID_AC_VI:
case QID_AC_BE:
case QID_AC_BK:
-   queue->limit = 16;
+   queue->limit = 8;
queue->data_size = AGGREGATION_SIZE;
queue->desc_size = TXINFO_DESC_SIZE;
queue->winfo_size = txwi_size;

...
81f58280 299082198 C Bi:1:003:1 0 13 = 55534253 ...
81f58280 299082510 S Bo:1:003:2 -150 31 = 55534243 ...
81f58280 299083200 C Bo:1:003:2 0 31 >
81f9aa00 299083298 S Bi:1:003:1 -150 65536 <
81f9a800 299093849 S Ci:1:002:0 s c0 07  1700 0004 4 <
81f9ab00 299095041 S Bi:1:003:1 -150 8192 <
81f9a800 299095229 C Ci:1:002:0 0 4 = 2d00
81f9a680 299096808 S Bi:1:003:1 -150 49152 <
81f9a680 299096838 E Bi:1:003:1 -12 0
81f9a680 299097868 S Bi:1:003:1 -150 49152 <
81f9a680 299097896 E Bi:1:003:1 -12 0
81f9a680 299098894 S Bi:1:003:1 -150 49152 <
... 97 allocation failures repeated ...
81f9a680 299151156 S Bi:1:003:1 -150 49152 <
81f9a680 299151166 E Bi:1:003:1 -12 0
81f9a680 299151179 S Bi:1:003:1 -150 49152 <
81f9a680 299151190 E Bi:1:003:1 -12 0
81f9a680 299151203 S Bi:1:003:1 -150 49152 <
81f9a680 299151216 E Bi:1:003:1 -12 0
81f9a680 299151231 S Bi:1:003:1 -150 49152 <
81f9a680 299159909 S Bi:1:003:1 -150 49152 <
81f9a680 299160814 S Bi:1:003:1 -150 49152 <
81f9a680 299161732 S Bi:1:003:1 -150 49152 <
81f9a680 299162644 S Bi:1:003:1 -150 49152 <
81f9a680 299163574 S Bi:1:003:1 -150 49152 <
81f9a680 299164490 S Bi:1:003:1 -150 49152 <
81f9aa00 299172593 C Bi:1:003:1 0 65536 = f7b07b39 ...
81f9ab00 299175259 C Bi:1:003:1 0 8192 = 26e2c254 

Re: WARN_ON(irqs_disabled()) in dma_free_attrs?

2018-03-04 Thread Alan Stern
On Sat, 3 Mar 2018, Fredrik Noring wrote:

> Christoph, Alan,
> 
> > If it is allocating / freeing this memory all the time in the hot path
> > it should really use a dma pool (see include/ilinux/dmapool.h).
> > The dma coherent APIs aren't really built for being called in the
> > hot path.
> 
> hcd_buffer_free uses a combination of dma pools and dma coherent APIs:
> 
>   ...
>   for (i = 0; i < HCD_BUFFER_POOLS; i++) {
>   if (size <= pool_max[i]) {
>   dma_pool_free(hcd->pool[i], addr, dma);
>   return;
>   }
>   }
>   dma_free_coherent(hcd->self.sysdev, size, addr, dma);
> 
> Alan, can dma_free_coherent be delayed to a point when IRQs are enabled?

Yes, subject to the usual concerns about not being delayed for too 
long.  Also, some HCDs are highly memory-constrained.  I don't know if 
they use this API, but if they do then delaying a free could result in 
not enough memory being available when an allocation is needed.

Alan Stern

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: WARN_ON(irqs_disabled()) in dma_free_attrs?

2018-03-03 Thread Fredrik Noring
Christoph, Alan,

> If it is allocating / freeing this memory all the time in the hot path
> it should really use a dma pool (see include/ilinux/dmapool.h).
> The dma coherent APIs aren't really built for being called in the
> hot path.

hcd_buffer_free uses a combination of dma pools and dma coherent APIs:

...
for (i = 0; i < HCD_BUFFER_POOLS; i++) {
if (size <= pool_max[i]) {
dma_pool_free(hcd->pool[i], addr, dma);
return;
}
}
dma_free_coherent(hcd->self.sysdev, size, addr, dma);

Alan, can dma_free_coherent be delayed to a point when IRQs are enabled?

[ Links to previous messages on this topic are listed below. ]

Fredrik

https://www.spinics.net/lists/linux-usb/msg162817.html
https://lists.linuxfoundation.org/pipermail/iommu/2018-March/026334.html
https://lists.linuxfoundation.org/pipermail/iommu/2018-March/026335.html
https://lists.linuxfoundation.org/pipermail/iommu/2018-March/026336.html
https://lists.linuxfoundation.org/pipermail/iommu/2018-March/026337.html
https://lists.linuxfoundation.org/pipermail/iommu/2018-March/026338.html
https://lists.linuxfoundation.org/pipermail/iommu/2018-March/026339.html
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: WARN_ON(irqs_disabled()) in dma_free_attrs?

2018-03-03 Thread Christoph Hellwig
On Sat, Mar 03, 2018 at 09:22:35AM +0100, Fredrik Noring wrote:
> Critically, it performs the following calls:

If it is allocating / freeing this memory all the time in the hot path
it should really use a dma pool (see include/ilinux/dmapool.h).
The dma coherent APIs aren't really built for being called in the
hot path.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: WARN_ON(irqs_disabled()) in dma_free_attrs?

2018-03-03 Thread Fredrik Noring
Hi Christoph,

> Why do you want to free coherent dma allocations from irq context?
> They generally are a long-term resource that as a rule of thumb should
> be allocated in ->probe and freed in ->remove.

The device specific HCD only does dma_declare_coherent_memory (with
HCD_LOCAL_MEM) in ->probe, and dma_release_declared_memory in ->remove.

However, the generic USB core does a lot more:

https://lists.linuxfoundation.org/pipermail/iommu/2018-March/026338.html

Critically, it performs the following calls:

   usb_hcd_irq
-> ohci_irq
-> ohci_work
-> process_done_list
-> takeback_td
-> finish_urb
-> usb_hcd_giveback_urb
-> __usb_hcd_giveback_urb
-> unmap_urb_for_dma
-> usb_hcd_unmap_urb_for_dma
-> hcd_free_coherent
-> hcd_buffer_free
-> dma_free_coherent
-> dma_free_attrs

Could this be avoided in a reasonable way?

Fredrik
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: WARN_ON(irqs_disabled()) in dma_free_attrs?

2018-03-03 Thread Fredrik Noring
Hi Robin,

> Historically, that particular line of code appears to date back to commit
> aa24886e379d (and tracking it's ancestry was quite fun).
> 
> Now, I'm sure not all of the considerations of 11-and-a-half years ago still
> apply, but one certainly does: ARM* still uses non-cacheable mappings for
> coherent allocations on systems which aren't hardware-coherent (i.e. most of
> them), thus the alloc and free paths may respectively set up and tear down
> those mappings, and the latter involves this guy:
> 
>   void vunmap(const void *addr)
>   {
>   BUG_ON(in_interrupt());
>   might_sleep();
>   if (addr)
>   __vunmap(addr, 0);
>   }
> 
> Even in the non-coherent ARM case it *would* technically be viable to free a
> coherent buffer from interrupt context iff it was allocated with GFP_ATOMIC,
> as those are satisfied from a statically-mapped pool rather than dynamically
> vmapped, but that doesn't really expand to the general case, and I certainly
> can't speak for all the other architectures.

Ah, thanks, good to know!

> As Alan implies, I guess the way forward is to figure out how similar
> drivers manage - your backtrace suggests that you might be using
> HCD_LOCAL_MEM, which probably isn't the common case but does seem to appear
> in at least a couple of other OHCI drivers.

There are two other OHCI drivers that do (as far as I understand)
essentially the same thing with dma_declare_coherent_memory and
HCD_LOCAL_MEM:

drivers/usb/host/ohci-sm501.c
drivers/usb/host/ohci-tmio.c

They are simple, but I cannot see how they could possibly avoid this issue
given the design of the USB core and the offending call trace which does not
pass through the device specific HCD:

   usb_hcd_irq
-> ohci_irq
-> ohci_work
-> process_done_list
-> takeback_td
-> finish_urb
-> usb_hcd_giveback_urb
-> __usb_hcd_giveback_urb
-> unmap_urb_for_dma
-> usb_hcd_unmap_urb_for_dma
-> hcd_free_coherent
-> hcd_buffer_free
-> dma_free_coherent
-> dma_free_attrs

The hc_driver struct is set to defaults, and they don't manage DMA
allocations except for probe and remove. How could they avoid the WARN_ON?

[ For reference, I attached the PS2 OHCI driver below. It has been tested
and works well provided the WARN_ON in dma_free_attrs is removed. ]

Fredrik

/*
 * PlayStation 2 USB OHCI HCD (Host Controller Driver)
 *
 * Copyright (C) 2017 Jürgen Urban
 * Copyright (C) 2017 Fredrik Noring
 *
 * SPDX-License-Identifier: GPL-2.0
 */

#include 
#include 
#include 
#include 
#include 

#include 

#include "ohci.h"

/* Enable USB OHCI hardware. */
#define DPCR2_ENA_USB 0x0800

/* Enable PS2DEV (required for PATA and USB). */
#define DPCR2_ENA_PS2DEV 0x0080

#define DRIVER_DESC "OHCI PS2 driver"
#define DRV_NAME "ohci-ps2"

/* Size allocated from IOP heap (maximum size of DMA memory). */
#define DMA_BUFFER_SIZE (256 * 1024)

/* Get driver private data. */
#define hcd_to_priv(hcd) (struct ps2_hcd *)(hcd_to_ohci(hcd)->priv)

struct ps2_hcd {
void __iomem *dpcr2;
dma_addr_t iop_dma_addr;
bool wakeup;/* Saved wake-up state for resume. */
};

static struct hc_driver __read_mostly ohci_ps2_hc_driver;
static irqreturn_t (*ohci_irq)(struct usb_hcd *hcd);

static void ohci_ps2_enable(struct usb_hcd *hcd)
{
struct ohci_hcd *ohci = hcd_to_ohci(hcd);

BUG_ON(!ohci->regs);

/* This is needed to get USB working on PS2. */
ohci_writel(ohci, 1, >regs->roothub.portstatus[11]);
}

static void ohci_ps2_disable(struct usb_hcd *hcd)
{
struct ohci_hcd *ohci = hcd_to_ohci(hcd);

WARN_ON(!ohci->regs);

if (ohci->regs)
ohci_writel(ohci, 0, >regs->roothub.portstatus[11]);
}

static void ohci_ps2_start_hc(struct usb_hcd *hcd)
{
struct ps2_hcd *ps2priv = hcd_to_priv(hcd);

/*
 * Enable USB and PS2DEV.
 *
 * FIXME: What is the purpose of PS2DEV for USB?
 * FIXME: As far as I remember the following call enables the clock,
 * so that ohci->regs->fminterval can count.
 */
writel(readl(ps2priv->dpcr2) | DPCR2_ENA_USB | DPCR2_ENA_PS2DEV,
ps2priv->dpcr2);
}

static void ohci_ps2_stop_hc(struct usb_hcd *hcd)
{
struct ps2_hcd *ps2priv = hcd_to_priv(hcd);

/* Disable USB. Leave PS2DEV enabled (could be still needed for PATA). 
*/
writel(readl(ps2priv->dpcr2) & ~DPCR2_ENA_USB, ps2priv->dpcr2);
}

static int ohci_ps2_reset(struct usb_hcd *hcd)
{
int ret;

ohci_ps2_start_hc(hcd);

ret = ohci_setup(hcd);

if (ret < 0) {
ohci_ps2_stop_hc(hcd);
return ret;
}

ohci_ps2_enable(hcd);

return ret;
}

static irqreturn_t ohci_ps2_irq(struct usb_hcd *hcd)
{
struct ohci_hcd 

Re: WARN_ON(irqs_disabled()) in dma_free_attrs?

2018-03-02 Thread Christoph Hellwig
On Fri, Mar 02, 2018 at 07:55:48PM +, Robin Murphy wrote:
> On the other hand, HCD_LOCAL_MEM implies a per-device coherent pool. Since 
> those bypass the arch-specific code, then depending on how unlikely we 
> think it is for devices covered by a single driver to exist both with and 
> without their own memory then it *might* also be reasonable to make the 
> change below. Christoph, what do you reckon?

For the case where the coherent pool really is per-device this looks
technically ok.  But it would also cover cases like arm nommu where
we have a single global nommu pool and we wouldn't warn for a driver
that frees with irqs disabled.

That being said I think allowing coherent frees from irqs disabled
context is a very bad idea to do in general, so if we can at all I'd
rather fix the driver than lifting any restrictions.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: WARN_ON(irqs_disabled()) in dma_free_attrs?

2018-03-02 Thread Christoph Hellwig
On Fri, Mar 02, 2018 at 07:07:06PM +0100, Fredrik Noring wrote:
> What is the best way to proceed? Can dma_free_attrs be reworked to handle
> disabled IRQs?

Why do you want to free coherent dma allocations from irq context?
They generally are a long-term resource that as a rule of thumb should
be allocated in ->probe and freed in ->remove.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: WARN_ON(irqs_disabled()) in dma_free_attrs?

2018-03-02 Thread Robin Murphy

Hi Fredrik,

On 02/03/18 18:07, Fredrik Noring wrote:

Hello DMA mapping helpers maintainers,

I'm porting the PS2 OHCI driver to v4.15 and it triggers

WARN_ON(irqs_disabled());

in include/linux/dma-mapping.h:dma_free_attrs (trace below). USB maintainer
Alan Stern notes in

https://www.spinics.net/lists/linux-usb/msg162817.html

that

This is caused by a deficiency in the DMA core: dma_free_coherent()
wants interrupts to be enabled when it is called.  I'm not sure how
the other host controller drivers cope with this.

What is the best way to proceed? Can dma_free_attrs be reworked to handle
disabled IRQs?


Historically, that particular line of code appears to date back to 
commit aa24886e379d (and tracking it's ancestry was quite fun).


Now, I'm sure not all of the considerations of 11-and-a-half years ago 
still apply, but one certainly does: ARM* still uses non-cacheable 
mappings for coherent allocations on systems which aren't 
hardware-coherent (i.e. most of them), thus the alloc and free paths may 
respectively set up and tear down those mappings, and the latter 
involves this guy:


void vunmap(const void *addr)
{
BUG_ON(in_interrupt());
might_sleep();
if (addr)
__vunmap(addr, 0);
}

Even in the non-coherent ARM case it *would* technically be viable to 
free a coherent buffer from interrupt context iff it was allocated with 
GFP_ATOMIC, as those are satisfied from a statically-mapped pool rather 
than dynamically vmapped, but that doesn't really expand to the general 
case, and I certainly can't speak for all the other architectures.


As Alan implies, I guess the way forward is to figure out how similar 
drivers manage - your backtrace suggests that you might be using 
HCD_LOCAL_MEM, which probably isn't the common case but does seem to 
appear in at least a couple of other OHCI drivers.


On the other hand, HCD_LOCAL_MEM implies a per-device coherent pool. 
Since those bypass the arch-specific code, then depending on how 
unlikely we think it is for devices covered by a single driver to exist 
both with and without their own memory then it *might* also be 
reasonable to make the change below. Christoph, what do you reckon?


Robin.

*plus now arm64 and probably others too

->8-
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 34fe8463d10e..eb8e42ce89a0 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -542,11 +542,11 @@ static inline void dma_free_attrs(struct device 
*dev, size_t size,

const struct dma_map_ops *ops = get_dma_ops(dev);

BUG_ON(!ops);
-   WARN_ON(irqs_disabled());

if (dma_release_from_dev_coherent(dev, get_order(size), cpu_addr))
return;

+   WARN_ON(irqs_disabled());
if (!ops->free || !cpu_addr)
return;

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu