Re: [RFC PATCH 1/2] USB: add HCD_BOUNCE_BUFFERS host controller driver flag

2010-02-07 Thread Albert Herranz
Alan Stern wrote:
On a 64-bit processor, some of the accesses will be 64 bits wide
instead of 32.  Does that matter for your purposes?


The wii uses a 32-bit processor, so this is safe in this case.

What about ohci-hcd and uhci-hcd?  They both use non-32-bit accesses to 
structures in coherent memory.


The wii has no uhci, but has 2 ohci controllers.
For ohci we need a similar approach as done for ehci.

If you do it as described above then the buffers you're worried about
won't be allocated in coherent memory to begin with, so no problems 
will arise.

It turns out that we have more limitations.
The wii has 2 discontiguous memory areas (usually called MEM1 and MEM2). I have 
checked that the ehci controller doesn't work properly when performing dma to 
buffers allocated in MEM1 (it corrupts part of the data) but has no problems if 
the buffers sit within MEM2.
So usb buffers will need to be bounced anyway if they are part of MEM1.

This worked in the original patch as buffers were always bounced to MEM2 
buffers. Sigh.

Alan Stern


Thanks,
Albert

PS: Your reply didn't get to me. I looked at the ML and found it (I'm not 
subscribed). Sorry for the late answer.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [RFC PATCH 1/2] USB: add HCD_BOUNCE_BUFFERS host controller driver flag

2010-02-07 Thread Alan Stern
On Sun, 7 Feb 2010, Albert Herranz wrote:

 The wii has no uhci, but has 2 ohci controllers.
 For ohci we need a similar approach as done for ehci.

So you'll need to write a patch splitting up the OHCI data structures 
in the same way the EHCI qh was split up.

 If you do it as described above then the buffers you're worried about
 won't be allocated in coherent memory to begin with, so no problems 
 will arise.
 
 It turns out that we have more limitations.
 The wii has 2 discontiguous memory areas (usually called MEM1 and MEM2). I 
 have checked that the ehci controller doesn't work properly when performing 
 dma to buffers allocated in MEM1 (it corrupts part of the data) but has no 
 problems if the buffers sit within MEM2.
 So usb buffers will need to be bounced anyway if they are part of MEM1.

This sounds like the sort of restriction that dma_map_single() should 
be capable of handling.

Alan Stern

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [RFC PATCH 1/2] USB: add HCD_BOUNCE_BUFFERS host controller driver flag

2010-02-07 Thread Albert Herranz
Alan Stern wrote:
 On Sun, 7 Feb 2010, Albert Herranz wrote:
 
 The wii has no uhci, but has 2 ohci controllers.
 For ohci we need a similar approach as done for ehci.
 
 So you'll need to write a patch splitting up the OHCI data structures 
 in the same way the EHCI qh was split up.
 

Yes.

 It turns out that we have more limitations.
 The wii has 2 discontiguous memory areas (usually called MEM1 and MEM2). I 
 have checked that the ehci controller doesn't work properly when performing 
 dma to buffers allocated in MEM1 (it corrupts part of the data) but has no 
 problems if the buffers sit within MEM2.
 So usb buffers will need to be bounced anyway if they are part of MEM1.
 
 This sounds like the sort of restriction that dma_map_single() should 
 be capable of handling.
 

On powerpc you can have per-device specific dma ops.
I'll work on that direction and create a special dma ops set for devices which 
need their dma buffers on mem2, and then use those for ehci-hlwd.

 Alan Stern
 

Thanks,
Albert

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [RFC PATCH 1/2] USB: add HCD_BOUNCE_BUFFERS host controller driver flag

2010-02-04 Thread Albert Herranz
Hi Alan,

Alan Stern wrote:
 This description sounds hopelessly confused.  Maybe you're just
 misusing the term coherent.  The patch itself doesn't affect the
 coherent DMA mappings anyway; it affects the streaming mappings.  Or to
 put it another way, what's the justification for replacing a call to
 dma_map_single() with a call to dma_alloc_coherent()?
 
 Since the patch doesn't affect any of the coherent mappings (see for 
 example the calls to dma_pool_create() in ehci-mem.c), I don't see how 
 it can possibly do what you claim.
 

Thanks for your comments. Let's try to hopefully clarify this a bit.

I've used the term coherent as described in Documentation/DMA-API.txt (aka 
consistent as used in PCI-related functions).
I've tried to describe first the limitations of the platform that I'm working 
on. Basically, one of the annoying things of that platform is that writes to 
uncached memory (as used in coherent memory) can only be reliably performed 
in 32-bit accesses.

The USB subsystem ends up using coherent memory for buffers and/or other 
structures in different ways.

The coherent memory allocated in dma_pool_create() in ehci-mem.c that you 
report is not a problem at all because it is always accessed in 32-bit chunks 
(it hasn't been always like that but since commit 
3807e26d69b9ad3864fe03224ebebc9610d5802e USB: EHCI: split ehci_qh into hw and 
sw parts this got addressed as a side effect, so I didn't need to post another 
patch for that).

Other possible interactions with coherent memory are those involving buffers 
used in USB transactions, which may be allocated via the USB subsystem (at 
usb_buffer_alloc() or when bounced via hcd_alloc_coherent()) or which may come 
already allocated and ready for use (URB_NO_{SETUP,TRANSFER}_DMA_MAP).

The patch, as posted, allocates normal memory for USB buffers _within_ the USB 
subsystem and invariably bounces all buffers to new coherent buffers.
So, basically, what the patch claims (avoid 32-bit writes for coherent memory 
within the USB subsystem) is done (hey, it actually works ;-).

But I think you have raised valid points here :)

If the coherent memory is already allocated and passed (as already 
dma-mapped) to the USB subsystem then there is no gain in bouncing the buffer:
- if a non-32 bit write was done to that coherent memory the damage is 
already done
- if the coherent memory was written always in 32-bit accesses then we can 
just safely use it
So bouncing here should be avoided as it is unneeded.

On the other hand, we can control USB buffers managed by the USB subsystem 
itself.
That's what the patch does. It avoids access restrictions to USB buffers by 
allocating them from normal memory (leaving USB drivers free to access those 
buffers in whatever bus width they need, as they do today) ... and bouncing 
them.
The thing here is that it makes no sense to bounce them to coherent memory if 
they can be dma-mapped directly (as you point in your 
dma_map_single-vs-dma_alloc_coherent comment).

So... that's what RFCs are for :)
I'll take a look again at the patch.

 +/**
 + * hcd_memcpy32_to_coherent - copy data to a bounce buffer
 + * @dst: destination dma bounce buffer
 + * @src: source buffer
 + * @len: number of bytes to copy
 + *
 + * This function copies @len bytes from @src to @dst in 32 bit chunks.
 + * The caller must guarantee that @dst length is 4 byte aligned and
 + * that @dst length is greater than or equal to @src length.
 + */
 +static void *hcd_memcpy32_to_coherent(void *dst, const void *src, size_t 
 len)
 +{
 +u32 *q = dst, *p = (void *)src;
 +u8 *s;
 +
 +while (len = 4) {
 +*q++ = *p++;
 +len -= 4;
 +}
 +s = (u8 *)p;
 +switch (len) {
 +case 3:
 +*q = s[0]  24 | s[1]  16 | s[2]  8;
 +break;
 +case 2:
 +*q = s[0]  24 | s[1]  16;
 +break;
 +case 1:
 +*q = s[0]  24;
 +break;
 +default:
 +break;
 +}
 +return dst;
 +}
 
 What happens if somebody tries to use this code on a little-endian CPU?
 

It will fail.


 It seems that every time somebody comes up with a new kind of 
 memory-access restriction, this function grows by a factor of 2.  After 
 a few more iterations it will be larger than the rest of the kernel!
 
 There must be a better way to structure the requirements here.
 

Hopefully I didn't miss any of your concerns and managed to explain the problem.

 Alan Stern
 
 

Thanks,
Albert

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [RFC PATCH 1/2] USB: add HCD_BOUNCE_BUFFERS host controller driver flag

2010-02-04 Thread Alan Stern
On Thu, 4 Feb 2010, Albert Herranz wrote:

 Hi Alan,
 
 Alan Stern wrote:
  This description sounds hopelessly confused.  Maybe you're just
  misusing the term coherent.  The patch itself doesn't affect the
  coherent DMA mappings anyway; it affects the streaming mappings.  Or to
  put it another way, what's the justification for replacing a call to
  dma_map_single() with a call to dma_alloc_coherent()?
  
  Since the patch doesn't affect any of the coherent mappings (see for 
  example the calls to dma_pool_create() in ehci-mem.c), I don't see how 
  it can possibly do what you claim.
  
 
 Thanks for your comments. Let's try to hopefully clarify this a bit.
 
 I've used the term coherent as described in Documentation/DMA-API.txt (aka 
 consistent as used in PCI-related functions).
 I've tried to describe first the limitations of the platform that I'm working 
 on. Basically, one of the annoying things of that platform is that writes to 
 uncached memory (as used in coherent memory) can only be reliably performed 
 in 32-bit accesses.
 
 The USB subsystem ends up using coherent memory for buffers and/or other 
 structures in different ways.
 
 The coherent memory allocated in dma_pool_create() in ehci-mem.c that you 
 report is not a problem at all because it is always accessed in 32-bit chunks 
 (it hasn't been always like that but since commit 
 3807e26d69b9ad3864fe03224ebebc9610d5802e USB: EHCI: split ehci_qh into hw 
 and sw parts this got addressed as a side effect, so I didn't need to post 
 another patch for that).

On a 64-bit processor, some of the accesses will be 64 bits wide
instead of 32.  Does that matter for your purposes?

What about ohci-hcd and uhci-hcd?  They both use non-32-bit accesses to 
structures in coherent memory.

 Other possible interactions with coherent memory are those involving 
 buffers used in USB transactions, which may be allocated via the USB 
 subsystem (at usb_buffer_alloc() or when bounced via hcd_alloc_coherent()) or 
 which may come already allocated and ready for use 
 (URB_NO_{SETUP,TRANSFER}_DMA_MAP).

Ah yes, quite correct.  And this indicates that you need to concentrate
on usb_buffer_alloc().  On your system (or rather, whenever the
HCD_NO_COHERENT_MEM flag is set) it should allocate normal memory and
set the DMA pointer to NULL.

Then map_urb_for_dma() should check the urb-setup_dma and
urb-transfer_dma pointers; if a pointer is NULL then the
corresponding urb-URB_NO_SETUP_DMA_MAP or urb-NO_TRANSFER_DMA_MAP
flag should be ignored (and probably should be cleared so as to 
avoid confusing unmap_urb_for_dma()).

 The patch, as posted, allocates normal memory for USB buffers _within_ the 
 USB subsystem and invariably bounces all buffers to new coherent buffers.
 So, basically, what the patch claims (avoid 32-bit writes for coherent 
 memory within the USB subsystem) is done (hey, it actually works ;-).
 
 But I think you have raised valid points here :)
 
 If the coherent memory is already allocated and passed (as already 
 dma-mapped) to the USB subsystem then there is no gain in bouncing the buffer:
 - if a non-32 bit write was done to that coherent memory the damage is 
 already done
 - if the coherent memory was written always in 32-bit accesses then we can 
 just safely use it
 So bouncing here should be avoided as it is unneeded.
 
 On the other hand, we can control USB buffers managed by the USB subsystem 
 itself.
 That's what the patch does. It avoids access restrictions to USB buffers by 
 allocating them from normal memory (leaving USB drivers free to access those 
 buffers in whatever bus width they need, as they do today) ... and bouncing 
 them.
 The thing here is that it makes no sense to bounce them to coherent memory 
 if they can be dma-mapped directly (as you point in your 
 dma_map_single-vs-dma_alloc_coherent comment).
 
 So... that's what RFCs are for :)

If you do it as described above then the buffers you're worried about
won't be allocated in coherent memory to begin with, so no problems 
will arise.

Alan Stern

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[RFC PATCH 1/2] USB: add HCD_BOUNCE_BUFFERS host controller driver flag

2010-02-03 Thread Albert Herranz
The HCD_BOUNCE_BUFFERS USB host controller driver flag can be enabled
to instruct the USB stack to always bounce USB buffers to/from coherent
memory buffers _just_ before/after a host controller transmission.

This setting allows overcoming some platform-specific limitations.

For example, the Nintendo Wii video game console is a NOT_COHERENT_CACHE
platform that is unable to safely perform non-32 bit uncached writes
to RAM because the byte enables are not connected to the bus.
Thus, in that platform, coherent DMA buffers cannot be directly used
by the kernel code unless it guarantees that all write accesses
to said buffers are done in 32 bit chunks (which is not the case in the
USB subsystem).

To avoid this unwanted behaviour HCD_BOUNCE_BUFFERS can be enabled at
the HCD controller, causing buffer allocations to be satisfied from
normal memory and, only at the very last moment, before the actual
transfer, buffers get copied to/from their corresponding DMA coherent
bounce buffers.

Note that HCD_LOCAL_MEM doesn't help in solving this problem as in that
case buffers may be allocated from coherent memory in the first place
and thus potentially accessed in non-32 bit chuncks by USB drivers.

Signed-off-by: Albert Herranz albert_herr...@yahoo.es
---
 drivers/usb/core/hcd.c |  337 +++
 drivers/usb/core/hcd.h |   13 +-
 2 files changed, 286 insertions(+), 64 deletions(-)

diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
index 80995ef..befca85 100644
--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -1260,6 +1260,173 @@ static void hcd_free_coherent(struct usb_bus *bus, 
dma_addr_t *dma_handle,
*dma_handle = 0;
 }
 
+/*
+ * The HCD_BOUNCE_BUFFERS USB host controller driver flag can be enabled
+ * to instruct the USB stack to always bounce USB buffers to/from coherent
+ * memory buffers _just_ before/after a host controller transmission.
+ *
+ * This setting allows overcoming some platform-specific limitations.
+ *
+ * For example, the Nintendo Wii video game console is a NOT_COHERENT_CACHE
+ * platform that is unable to safely perform non-32 bit uncached writes
+ * to RAM because the byte enables are not connected to the bus.
+ * Thus, in that platform, coherent DMA buffers cannot be directly used
+ * by the kernel code unless it guarantees that all write accesses
+ * to said buffers are done in 32 bit chunks (which is not the case in the
+ * USB subsystem).
+ *
+ * To avoid this unwanted behaviour HCD_BOUNCE_BUFFERS can be enabled at
+ * the HCD controller, causing buffer allocations to be satisfied from
+ * normal memory and, only at the very last moment, before the actual
+ * transfer, buffers get copied to/from their corresponding DMA coherent
+ * bounce buffers.
+ *
+ * Note that HCD_LOCAL_MEM doesn't help in solving this problem as in that
+ * case buffers may be allocated from coherent memory in the first place
+ * and thus potentially accessed in non-32 bit chuncks by USB drivers.
+ *
+ */
+
+#define HCD_BOUNCE_ALIGN   4
+
+#define hcd_align_up(addr, size)   (((addr)+((size)-1))(~((size)-1)))
+
+struct hcd_coherent_buffer_ctx {
+   unsigned char *vaddr;
+   dma_addr_t dma_handle;
+};
+
+/**
+ * hcd_memcpy32_to_coherent - copy data to a bounce buffer
+ * @dst: destination dma bounce buffer
+ * @src: source buffer
+ * @len: number of bytes to copy
+ *
+ * This function copies @len bytes from @src to @dst in 32 bit chunks.
+ * The caller must guarantee that @dst length is 4 byte aligned and
+ * that @dst length is greater than or equal to @src length.
+ */
+static void *hcd_memcpy32_to_coherent(void *dst, const void *src, size_t len)
+{
+   u32 *q = dst, *p = (void *)src;
+   u8 *s;
+
+   while (len = 4) {
+   *q++ = *p++;
+   len -= 4;
+   }
+   s = (u8 *)p;
+   switch (len) {
+   case 3:
+   *q = s[0]  24 | s[1]  16 | s[2]  8;
+   break;
+   case 2:
+   *q = s[0]  24 | s[1]  16;
+   break;
+   case 1:
+   *q = s[0]  24;
+   break;
+   default:
+   break;
+   }
+   return dst;
+}
+
+/**
+ * hcd_memcpy32_from_coherent - copy data from a bounce buffer
+ * @dst: destination buffer
+ * @src: source dma bounce buffer
+ * @len: number of bytes to copy
+ *
+ * This function copies @len bytes from @src to @dst in 32 bit chunks.
+ * The caller must guarantee that @src length is 4 byte aligned and
+ * that @src length is greater than or equal to @dst length.
+ */
+static void *hcd_memcpy32_from_coherent(void *dst, const void *src, size_t len)
+{
+   u32 *q = dst, *p = (void *)src;
+   u32 v;
+   u8 *d;
+
+   while (len = 4) {
+   *q++ = *p++;
+   len -= 4;
+   }
+   if (len) {
+   d = (u8 *)q;
+   v = p[0];
+   switch (len) {
+   case 3:
+   d[2] = (v  8)  

Re: [RFC PATCH 1/2] USB: add HCD_BOUNCE_BUFFERS host controller driver flag

2010-02-03 Thread Greg KH
On Wed, Feb 03, 2010 at 07:30:39PM +0100, Albert Herranz wrote:
 +/**
 + * hcd_memcpy32_to_coherent - copy data to a bounce buffer
 + * @dst: destination dma bounce buffer
 + * @src: source buffer
 + * @len: number of bytes to copy
 + *
 + * This function copies @len bytes from @src to @dst in 32 bit chunks.
 + * The caller must guarantee that @dst length is 4 byte aligned and
 + * that @dst length is greater than or equal to @src length.
 + */
 +static void *hcd_memcpy32_to_coherent(void *dst, const void *src, size_t len)

Why isn't there platform-specific functions for this already?  It seems
a bit odd to bury them in the USB hcd core, when I'm sure that other
people need these, if they haven't already created them.

thanks,

greg k-h
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev