Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP

2011-10-31 Thread Pandita, Vikram
On Thu, Sep 8, 2011 at 3:41 PM, Mark Salter msal...@redhat.com wrote:
 On Wed, 2011-08-31 at 20:35 +0100, Will Deacon wrote:
 On Wed, Aug 31, 2011 at 07:19:33PM +0100, Rob Herring wrote:
  On 08/31/2011 12:51 PM, Will Deacon wrote:
   Another thing that Marc and I tried on OMAP4 was not bringing up the 
   secondary
   CPU during boot (by commenting out most of smp_init). In this case, I/O
   performance was good until we tried to online the secondary CPU. The 
   online
   failed but after that the I/O performance was certainly degraded.
  
 
  Was the SCU enabled at that point? One diff between nosmp boot and
  offlining the 2nd core would be that the SCU remains enabled in the
  latter case. I think the SCU does not get enabled for nosmp.

 Our rudimentary test (printing out the SCU control register during boot)
 showed that it *was* enabled for nosmp. I think this is due to the secure
 world having to do that on OMAP so it's probably not true for other
 platforms.

 I've done a little test and found that turning on the MMU of the second
 core causes the problem to show up. I patched head.S so I stopped the
 second core in an infinite loop just before turning on the MMU. The
 system continues booting on core#0 and I see ~20MB/s with hdparm -t to
 an attached usb disk. Same setup but with second core being stopped with
 infinite loop just after MMU is enabled shows ~5MB/s. So whatever is
 going wrong, its not because of anything the second core is doing beyond
 turning on its MMU and doing an empty loop.

what was the final take on the optimization?
Excuse if i could not follow the whole thread - could someone
summarize for the benefit of many.

Thanks


 --Mark




 ___
 linux-arm-kernel mailing list
 linux-arm-ker...@lists.infradead.org
 http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP

2011-09-08 Thread Mark Salter
On Wed, 2011-08-31 at 20:35 +0100, Will Deacon wrote:
 On Wed, Aug 31, 2011 at 07:19:33PM +0100, Rob Herring wrote:
  On 08/31/2011 12:51 PM, Will Deacon wrote:
   Another thing that Marc and I tried on OMAP4 was not bringing up the 
   secondary
   CPU during boot (by commenting out most of smp_init). In this case, I/O
   performance was good until we tried to online the secondary CPU. The 
   online
   failed but after that the I/O performance was certainly degraded.
   
  
  Was the SCU enabled at that point? One diff between nosmp boot and
  offlining the 2nd core would be that the SCU remains enabled in the
  latter case. I think the SCU does not get enabled for nosmp.
 
 Our rudimentary test (printing out the SCU control register during boot)
 showed that it *was* enabled for nosmp. I think this is due to the secure
 world having to do that on OMAP so it's probably not true for other
 platforms.

I've done a little test and found that turning on the MMU of the second
core causes the problem to show up. I patched head.S so I stopped the
second core in an infinite loop just before turning on the MMU. The
system continues booting on core#0 and I see ~20MB/s with hdparm -t to
an attached usb disk. Same setup but with second core being stopped with
infinite loop just after MMU is enabled shows ~5MB/s. So whatever is
going wrong, its not because of anything the second core is doing beyond
turning on its MMU and doing an empty loop.

--Mark



--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP

2011-09-02 Thread Marc Zyngier
On 01/09/11 20:08, Stephen Warren wrote:
 Marc Dietich wrote at Thursday, September 01, 2011 5:14 AM:
 I'll add Stephen Warren from NVIDIA to the CC list. He has more HW to test 
 on.
 
 Here are the results I found:
 
 Harmony:
 Tegra USB3 - SMSC9514 hub: NOT affected
 (Unplugging LAN cable, or disabling SMSC9514 LAN driver doesn't change this)
 
 Seaboard (springbank; clamshell):
 Tegra USB1 - no hub: Affected
 
 Seaboard (seaboard non-clamshell):
 Tegra USB1 - no hub: Affected
 Tegra USB3 - no hub: Affected
 
 TrimSlice:
 Tegra USB3 - unknown hub: Affected
 
 This implies there's something different about Harmony.
 
 Is the USB hub a clue? Seaboard doesn't have one, and although I don't
 know what model TrimSlice uses, I assume it's different since I know
 TrimSlice's Ethernet is not the same as Harmony's.

Panda has the exact same USB hub configuration, and is affected. So we
can rule this out.

 I don't see anything in board-harmony.c vs. board-seaboard.c that'd affect
 anything USB-related.
 
 Perhaps there's some kind of bootloader or BCT difference. However, my
 Harmony and both Seaboards both use (a very old) U-Boot and BCT from
 ChromeOS, so I don't imagine there's actually much difference there.

I just noticed something else. Harmony is fast *most of the time*. In
about one in 3 reboots, I get the slow behavior. When USB is fast, I
also have I2C interrupts screaming:

 85: 294321  0   GIC  tegra-i2c
116:  0  0   GIC  tegra-i2c
118:  98542  0   GIC  tps6586x

This is a couple of seconds after boot.

When USB is slow, I see the following:
[0.385270] tps6586x 3-0034: Chip ID read failed: -121
[0.390584] tps6586x: probe of 3-0034 failed with error -5

... and I2C interrupt is quiet.

The I2C interrupt handler calls writel(), which does a cache sync. That
would explain the fast behavior of Harmony.

Do you see the same this on your board?

M.
-- 
Jazz is not dead. It just smells funny...
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP

2011-09-02 Thread Marc Dietich
just another measurement point

 Stephen Warren wrote at Thursday:

 Here are the results I found:
 
 Harmony:
 Tegra USB3 - SMSC9514 hub: NOT affected
 (Unplugging LAN cable, or disabling SMSC9514 LAN driver doesn't change
 this)
 
 Seaboard (springbank; clamshell):
 Tegra USB1 - no hub: Affected
 
 Seaboard (seaboard non-clamshell):
 Tegra USB1 - no hub: Affected
 Tegra USB3 - no hub: Affected
 
 TrimSlice:
 Tegra USB3 - unknown hub: Affected

PAZ00:
ULPI - SMCS 2512: Affected
Tegra USB3 - SMSC 2514: Affected

The patch also cures high latencies/packet drops on wifi connected to ULPI via 
the 2512 hub. The pen drive was connected to USB3/2514.

 This implies there's something different about Harmony.
 
 Is the USB hub a clue? Seaboard doesn't have one, and although I don't
 know what model TrimSlice uses, I assume it's different since I know
 TrimSlice's Ethernet is not the same as Harmony's.
 
 I don't see anything in board-harmony.c vs. board-seaboard.c that'd affect
 anything USB-related.
 
 Perhaps there's some kind of bootloader or BCT difference. However, my
 Harmony and both Seaboards both use (a very old) U-Boot and BCT from
 ChromeOS, so I don't imagine there's actually much difference there.
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP

2011-09-02 Thread Stephen Warren
Marc Zyngier wrote at Friday, September 02, 2011 3:51 AM:
 On 01/09/11 20:08, Stephen Warren wrote:
  Marc Dietich wrote at Thursday, September 01, 2011 5:14 AM:
  I'll add Stephen Warren from NVIDIA to the CC list. He has more HW to test 
  on.
 
  Here are the results I found:
 
  Harmony:
  Tegra USB3 - SMSC9514 hub: NOT affected
  (Unplugging LAN cable, or disabling SMSC9514 LAN driver doesn't change this)
...
 I just noticed something else. Harmony is fast *most of the time*. In
 about one in 3 reboots, I get the slow behavior. When USB is fast, I
 also have I2C interrupts screaming:
 
  85: 294321  0   GIC  tegra-i2c
 116:  0  0   GIC  tegra-i2c
 118:  98542  0   GIC  tps6586x
 
 This is a couple of seconds after boot.
 
 When USB is slow, I see the following:
 [0.385270] tps6586x 3-0034: Chip ID read failed: -121
 [0.390584] tps6586x: probe of 3-0034 failed with error -5
 
 ... and I2C interrupt is quiet.
 
 The I2C interrupt handler calls writel(), which does a cache sync. That
 would explain the fast behavior of Harmony.
 
 Do you see the same this on your board?

Yes, I re-ran the test a few more times and see those exact same symptoms.

In a case with the screaming I2C interrupts and fast USB, I then did:

echo 3-0034  /sys/bus/i2c/drivers/tps6586x/unbind
(I got a kernel BUG and bash crashed here, but just logged back in)

which caused the I2C interrupt handler to stop, then re-ran the test.
I then saw the slow USB speed.

So, now I think *all* platforms(boards) are affected, right?

-- 
nvpublic

--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP

2011-09-01 Thread Marc Zyngier
Hi Marc,

On 31/08/11 17:55, Marc Dietrich wrote:
 Am Mittwoch 31 August 2011, 18:12:48 schrieb Marc Zyngier:
 [...]
 Oddly enough, this patch doesn't do anything on my Tegra setup. In both
 cases, I get around 17MB/s from a crap SD card plugged in a USB reader.

 This leads me to suspect that this issue is very much OMAP4 specific.
 Can anyone verify this theory on other some A9 platforms?
 
 That's odd. On my Tegra2 (on ac100) it boosts the transfer rate from 7 to 
 17 MB/s. 

I'm using a Harmony board. Could you share your kernel version, .config
and dmesg?

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP

2011-09-01 Thread Marc Dietich
 Hi Marc,

^dito,

 On 31/08/11 17:55, Marc Dietrich wrote:
  Am Mittwoch 31 August 2011, 18:12:48 schrieb Marc Zyngier:
  [...]
  Oddly enough, this patch doesn't do anything on my Tegra setup. In both
  cases, I get around 17MB/s from a crap SD card plugged in a USB reader.
  
  This leads me to suspect that this issue is very much OMAP4 specific.
  Can anyone verify this theory on other some A9 platforms?
  
  That's odd. On my Tegra2 (on ac100) it boosts the transfer rate from 7 to
  17 MB/s.
 
 I'm using a Harmony board. Could you share your kernel version, .config
 and dmesg?
 
 Thanks,
 
   M.

I use the chromiumos tree (for chromiumos 2.6.38 kernel: 
http://git.chromium.org/chromiumos/third_party/kernel-next.git) with some 
additions to make it run on the AC100. This modified tree is on 
git://gitorious.org/~marvin24/ac100/marvin24s-kernel.git. The config is 
paz00_defconfig and a dmesg you can get e.g. from http://pastebin.com/9uVfDWma
(it's not very current, but it should be sufficient).

I'll add Stephen Warren from NVIDIA to the CC list. He has more HW to test on. 

Btw, this is the patch I used: http://gitorious.org/~marvin24/ac100/marvin24s-
kernel/commit/cce8d9e25d009a45c219a6ad0b9ac4e27d034ab0

Marc
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP

2011-09-01 Thread Stephen Warren
Marc Dietich wrote at Thursday, September 01, 2011 5:14 AM:
 I'll add Stephen Warren from NVIDIA to the CC list. He has more HW to test on.

Here are the results I found:

Harmony:
Tegra USB3 - SMSC9514 hub: NOT affected
(Unplugging LAN cable, or disabling SMSC9514 LAN driver doesn't change this)

Seaboard (springbank; clamshell):
Tegra USB1 - no hub: Affected

Seaboard (seaboard non-clamshell):
Tegra USB1 - no hub: Affected
Tegra USB3 - no hub: Affected

TrimSlice:
Tegra USB3 - unknown hub: Affected

This implies there's something different about Harmony.

Is the USB hub a clue? Seaboard doesn't have one, and although I don't
know what model TrimSlice uses, I assume it's different since I know
TrimSlice's Ethernet is not the same as Harmony's.

I don't see anything in board-harmony.c vs. board-seaboard.c that'd affect
anything USB-related.

Perhaps there's some kind of bootloader or BCT difference. However, my
Harmony and both Seaboards both use (a very old) U-Boot and BCT from
ChromeOS, so I don't imagine there's actually much difference there.

-- 
nvpublic

--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP

2011-09-01 Thread Grant Grundler
Mark Salter msalter at redhat.com writes:
 On Wed, 2011-08-31 at 00:03 +0800, ming.lei at canonical.com wrote:
...
  +#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
  +static inline void ehci_sync_mem()
  +{
  +   mb();
  +}
  +#else
  +static inline void ehci_sync_mem()
  +{
  +}
  +#endif
...

Consider moving the #ifdef inside the function. :)

 I'm wondering if this doesn't really belong in the DMA API for any
 future architectures that can't avoid prolonged write buffering to DMA
 coherent memory.

I suspect the semantics needed exist in mmiowb().  Honestly, I'm not an expert
of either ehci USB and ARM platforms but it looks that way to me after reading
the thread.

BTW, ChromiumOS has had a performance bug open for quite a while:
http://code.google.com/p/chromium-os/issues/detail?id=11503

Vince Palatin said he would try it.


mmiowb() was originally implemented for SGI SN2 (IA64/Altix) machines but
applies to any case where MMIO (any uncached writes really) and regular (ie
cached) memory writes need to be ordered.

The wmb() vs mmiowb() thread explains this pretty well:
http://www.mail-archive.com/linux-ia64@vger.kernel.org/msg03379.html


 IIUC, ARM mitigates this for most drivers by including
 an implicit write buffer flush in the mmio write routines.

The write flush is part of the MMIO writel() semantics AFAIK because MMIO writes
have to be strongly ordered (PCI requirement).

 This takes
 care of the drivers which write to a mmio device register after writing
 something to shared DMA memory. IIUC, this doesn't help ehci because the
 host controller is polling to see what the cpu writes to the shared
 memory.

Write flush in writel() does help for the not RUNNING - RUNNING state
transition (I'm looking at qh_link_async()). It doesn't help for already
RUNNING case.

 Other hardware which polls shared memory like that will likely
 have the same problem and could use buffer drain helpers as well.

Yup - agreed.

cheers,
grant


--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP

2011-08-31 Thread Will Deacon
On Wed, Aug 31, 2011 at 01:23:47AM +0100, Chen Peter-B29397 wrote:
 One question: why this write buffer issue did not happen at UP ARM V7 
 platform, whose dma buffer
 also uncache, but bufferable?

Which CPU was on this platform?

Will
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP

2011-08-31 Thread Chen Peter-B29397


Best Regard,
Peter Chen
 



On Aug 31, 2011, at 4:49 PM, Will Deacon wrote:

 On Wed, Aug 31, 2011 at 01:23:47AM +0100, Chen Peter-B29397 wrote:
 One question: why this write buffer issue did not happen at UP ARM V7 
 platform, whose dma buffer
 also uncache, but bufferable?
 
 Which CPU was on this platform?

Cortex A8 UP system (freescale i.MX5x platform)
 
 Will
 


--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP

2011-08-31 Thread Mark Salter
On Wed, 2011-08-31 at 09:49 +0100, Will Deacon wrote:
 On Wed, Aug 31, 2011 at 01:23:47AM +0100, Chen Peter-B29397 wrote:
  One question: why this write buffer issue did not happen at UP ARM V7 
  platform, whose dma buffer
  also uncache, but bufferable?
 
 Which CPU was on this platform?

Using a 3.1.0-rc4+ kernel on a Pandaboard, and running 'hdparm -t' on a
usb disk drive, I see ~5.8MB/s read speed. Same kernel, but passing
nosmp on the commandline, I see 20.3MB/s.

Can someone explain why nosmp would make such a difference?

--Mark


--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP

2011-08-31 Thread Will Deacon
On Wed, Aug 31, 2011 at 02:43:33PM +0100, Mark Salter wrote:
 On Wed, 2011-08-31 at 09:49 +0100, Will Deacon wrote:
  On Wed, Aug 31, 2011 at 01:23:47AM +0100, Chen Peter-B29397 wrote:
   One question: why this write buffer issue did not happen at UP ARM V7 
   platform, whose dma buffer
   also uncache, but bufferable?
  
  Which CPU was on this platform?
 
 Using a 3.1.0-rc4+ kernel on a Pandaboard, and running 'hdparm -t' on a
 usb disk drive, I see ~5.8MB/s read speed. Same kernel, but passing
 nosmp on the commandline, I see 20.3MB/s.
 
 Can someone explain why nosmp would make such a difference?

Oh gawd, that's horrible. I have a feeling it's probably a separate issue
though, caused by:

omap_modify_auxcoreboot0(0x200, 0xfdff);

in boot_secondary for OMAP. Unfortunately I have no idea what that line is
doing because it ends up talking to the secure monitor.

Will
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP

2011-08-31 Thread Mark Salter
On Wed, 2011-08-31 at 16:21 +0100, Will Deacon wrote:
 On Wed, Aug 31, 2011 at 02:43:33PM +0100, Mark Salter wrote:
  On Wed, 2011-08-31 at 09:49 +0100, Will Deacon wrote:
   On Wed, Aug 31, 2011 at 01:23:47AM +0100, Chen Peter-B29397 wrote:
One question: why this write buffer issue did not happen at UP ARM V7 
platform, whose dma buffer
also uncache, but bufferable?
   
   Which CPU was on this platform?
  
  Using a 3.1.0-rc4+ kernel on a Pandaboard, and running 'hdparm -t' on a
  usb disk drive, I see ~5.8MB/s read speed. Same kernel, but passing
  nosmp on the commandline, I see 20.3MB/s.
  
  Can someone explain why nosmp would make such a difference?
 
 Oh gawd, that's horrible. I have a feeling it's probably a separate issue
 though, caused by:
 
 omap_modify_auxcoreboot0(0x200, 0xfdff);
 
 in boot_secondary for OMAP. Unfortunately I have no idea what that line is
 doing because it ends up talking to the secure monitor.

Okay, I may poke around a bit with that to see I can get a better
understanding.

With the patched ehci-q.c, I see no noticeable difference between smp
and nosmp. Both get me around 23.5MB/s with my setup.

--Mark


--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP

2011-08-31 Thread Marc Zyngier
On 31/08/11 16:27, Mark Salter wrote:
 On Wed, 2011-08-31 at 16:21 +0100, Will Deacon wrote:
 On Wed, Aug 31, 2011 at 02:43:33PM +0100, Mark Salter wrote:
 On Wed, 2011-08-31 at 09:49 +0100, Will Deacon wrote:
 On Wed, Aug 31, 2011 at 01:23:47AM +0100, Chen Peter-B29397 wrote:
 One question: why this write buffer issue did not happen at UP ARM V7 
 platform, whose dma buffer
 also uncache, but bufferable?

 Which CPU was on this platform?

 Using a 3.1.0-rc4+ kernel on a Pandaboard, and running 'hdparm -t' on a
 usb disk drive, I see ~5.8MB/s read speed. Same kernel, but passing
 nosmp on the commandline, I see 20.3MB/s.

 Can someone explain why nosmp would make such a difference?

 Oh gawd, that's horrible. I have a feeling it's probably a separate issue
 though, caused by:

 omap_modify_auxcoreboot0(0x200, 0xfdff);

 in boot_secondary for OMAP. Unfortunately I have no idea what that line is
 doing because it ends up talking to the secure monitor.
 
 Okay, I may poke around a bit with that to see I can get a better
 understanding.
 
 With the patched ehci-q.c, I see no noticeable difference between smp
 and nosmp. Both get me around 23.5MB/s with my setup.

Oddly enough, this patch doesn't do anything on my Tegra setup. In both
cases, I get around 17MB/s from a crap SD card plugged in a USB reader.

This leads me to suspect that this issue is very much OMAP4 specific.
Can anyone verify this theory on other some A9 platforms?

Cheers,

M.
-- 
Jazz is not dead. It just smells funny...
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP

2011-08-31 Thread Marc Dietrich
Am Mittwoch 31 August 2011, 18:12:48 schrieb Marc Zyngier:
 [...]
 Oddly enough, this patch doesn't do anything on my Tegra setup. In both
 cases, I get around 17MB/s from a crap SD card plugged in a USB reader.
 
 This leads me to suspect that this issue is very much OMAP4 specific.
 Can anyone verify this theory on other some A9 platforms?

That's odd. On my Tegra2 (on ac100) it boosts the transfer rate from 7 to 
17 MB/s. 

Marc


--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP

2011-08-31 Thread Nicolas Pitre
On Wed, 31 Aug 2011, Will Deacon wrote:

 On Wed, Aug 31, 2011 at 02:43:33PM +0100, Mark Salter wrote:
  On Wed, 2011-08-31 at 09:49 +0100, Will Deacon wrote:
   On Wed, Aug 31, 2011 at 01:23:47AM +0100, Chen Peter-B29397 wrote:
One question: why this write buffer issue did not happen at UP ARM V7 
platform, whose dma buffer
also uncache, but bufferable?
   
   Which CPU was on this platform?
  
  Using a 3.1.0-rc4+ kernel on a Pandaboard, and running 'hdparm -t' on a
  usb disk drive, I see ~5.8MB/s read speed. Same kernel, but passing
  nosmp on the commandline, I see 20.3MB/s.
  
  Can someone explain why nosmp would make such a difference?
 
 Oh gawd, that's horrible. I have a feeling it's probably a separate issue
 though, caused by:
 
 omap_modify_auxcoreboot0(0x200, 0xfdff);
 
 in boot_secondary for OMAP. Unfortunately I have no idea what that line is
 doing because it ends up talking to the secure monitor.

Well, this issue is apparently affecting other ARMv9 implementations 
too.  In which case this code in arch/arm/mm/mmu.c could be responsible:

if (is_smp()) {
/*
 * Mark memory with the shared attribute
 * for SMP systems
 */
user_pgprot |= L_PTE_SHARED;
kern_pgprot |= L_PTE_SHARED;
vecs_pgprot |= L_PTE_SHARED;
mem_types[MT_DEVICE_WC].prot_sect |= PMD_SECT_S;
mem_types[MT_DEVICE_WC].prot_pte |= L_PTE_SHARED;
mem_types[MT_DEVICE_CACHED].prot_sect |= PMD_SECT_S;
mem_types[MT_DEVICE_CACHED].prot_pte |= L_PTE_SHARED;
mem_types[MT_MEMORY].prot_sect |= PMD_SECT_S;
mem_types[MT_MEMORY].prot_pte |= L_PTE_SHARED;
mem_types[MT_MEMORY_NONCACHED].prot_sect |= PMD_SECT_S;
mem_types[MT_MEMORY_NONCACHED].prot_pte |= L_PTE_SHARED;
}

However I don't see the nosmp kernel argument having any effect on the 
result from is_smp().


Nicolas
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP

2011-08-31 Thread Rob Herring
On 08/31/2011 12:51 PM, Will Deacon wrote:
 On Wed, Aug 31, 2011 at 06:46:50PM +0100, Nicolas Pitre wrote:
 On Wed, 31 Aug 2011, Will Deacon wrote:

 On Wed, Aug 31, 2011 at 02:43:33PM +0100, Mark Salter wrote:
 On Wed, 2011-08-31 at 09:49 +0100, Will Deacon wrote:
 On Wed, Aug 31, 2011 at 01:23:47AM +0100, Chen Peter-B29397 wrote:
 One question: why this write buffer issue did not happen at UP ARM V7 
 platform, whose dma buffer
 also uncache, but bufferable?

 Which CPU was on this platform?

 Using a 3.1.0-rc4+ kernel on a Pandaboard, and running 'hdparm -t' on a
 usb disk drive, I see ~5.8MB/s read speed. Same kernel, but passing
 nosmp on the commandline, I see 20.3MB/s.

 Can someone explain why nosmp would make such a difference?

 Oh gawd, that's horrible. I have a feeling it's probably a separate issue
 though, caused by:

 omap_modify_auxcoreboot0(0x200, 0xfdff);

 in boot_secondary for OMAP. Unfortunately I have no idea what that line is
 doing because it ends up talking to the secure monitor.

 Well, this issue is apparently affecting other ARMv9 implementations 
 too.  In which case this code in arch/arm/mm/mmu.c could be responsible:

 if (is_smp()) {
 /*
  * Mark memory with the shared attribute
  * for SMP systems
  */
 user_pgprot |= L_PTE_SHARED;
 kern_pgprot |= L_PTE_SHARED;
 vecs_pgprot |= L_PTE_SHARED;
 mem_types[MT_DEVICE_WC].prot_sect |= PMD_SECT_S;
 mem_types[MT_DEVICE_WC].prot_pte |= L_PTE_SHARED;
 mem_types[MT_DEVICE_CACHED].prot_sect |= PMD_SECT_S;
 mem_types[MT_DEVICE_CACHED].prot_pte |= L_PTE_SHARED;
 mem_types[MT_MEMORY].prot_sect |= PMD_SECT_S;
 mem_types[MT_MEMORY].prot_pte |= L_PTE_SHARED;
 mem_types[MT_MEMORY_NONCACHED].prot_sect |= 
 PMD_SECT_S;
 mem_types[MT_MEMORY_NONCACHED].prot_pte |= 
 L_PTE_SHARED;
 }

 However I don't see the nosmp kernel argument having any effect on the 
 result from is_smp().
 
 Yes, the first thing that sprung to mind was the shared attribute, but like
 you say, that doesn't seem to be affected by the nosmp command line
 argument.
 
 Another thing that Marc and I tried on OMAP4 was not bringing up the secondary
 CPU during boot (by commenting out most of smp_init). In this case, I/O
 performance was good until we tried to online the secondary CPU. The online
 failed but after that the I/O performance was certainly degraded.
 

Was the SCU enabled at that point? One diff between nosmp boot and
offlining the 2nd core would be that the SCU remains enabled in the
latter case. I think the SCU does not get enabled for nosmp.

Do we really know which write buffer the data is sitting? Some
experiments to only flush the L1 write buffer would be interesting.
Perhaps something executed on the 2nd core has a mb which doesn't help
for SMP because the other core's L1 write buffer is not flushed, but it
helps for nosmp because everything runs on 1 core and any occurrence of
a mb will flush all data out. I wouldn't expect the behavior to be so
consistent though. Could it be something is not visible to the other
core rather than not visible to the EHCI controller?

Rob
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP

2011-08-31 Thread Mark Salter
On Wed, 2011-08-31 at 13:49 -0500, Rob Herring wrote:
 An outer_sync will only drain the write buffer of the L2. It does not
 flush the cache though. If the write buffer does in fact keep data as
 long as possible (until it needs a free slot or the line is full), then
 long delays to write out data are certainly possible. The exact
 operation is not documented AFAIR.

Ah, thanks for that. I really haven't been paying close attention to
ARMv6/7 hardware and I'm in the process of playing catch up.

--Mark


--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP

2011-08-30 Thread ming . lei
From: Ming Lei ming@canonical.com

This patch introduces the helper of ehci_sync_mem to flush
qtd/qh into memory immediately on some ARM, so that HC can
see the up-to-date qtd/qh descriptor asap.

This patch fixs one performance bug on ARM Cortex A9 dual core
platform, which has been reported on quite a few ARM machines
(OMAP4, Tegra 2, snowball...), see details from link of
https://bugs.launchpad.net/bugs/709245.

The patch has been tested ok on OMAP4 panda A1 board, and the
performance of 'dd' over usb mass storage can be increased from
4~5MB/sec to 14~16MB/sec after applying this patch.

Cc: Alan Stern st...@rowland.harvard.edu
Cc: Russell King li...@arm.linux.org.uk
Signed-off-by: Ming Lei ming@canonical.com
---
 drivers/usb/host/ehci-q.c |   18 ++
 drivers/usb/host/ehci.h   |   17 +
 2 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/drivers/usb/host/ehci-q.c b/drivers/usb/host/ehci-q.c
index 0917e3a..2719879 100644
--- a/drivers/usb/host/ehci-q.c
+++ b/drivers/usb/host/ehci-q.c
@@ -995,6 +995,12 @@ static void qh_link_async (struct ehci_hcd *ehci, struct 
ehci_qh *qh)
head-qh_next.qh = qh;
head-hw-hw_next = dma;
 
+   /*
+* flush qh descriptor into memory immediately,
+* see comments in qh_append_tds.
+* */
+   ehci_sync_mem();
+
qh_get(qh);
qh-xacterrs = 0;
qh-qh_state = QH_STATE_LINKED;
@@ -1082,6 +1088,18 @@ static struct ehci_qh *qh_append_tds (
wmb ();
dummy-hw_token = token;
 
+   /*
+* Writing to dma coherent buffer on ARM may
+* be delayed to reach memory, so HC may not see
+* hw_token of dummy qtd in time, which can cause
+* the qtd transaction to be executed very late,
+* and degrade performance a lot. ehci_sync_mem
+* is added to flush 'token' immediatelly into
+* memory, so that ehci can execute the transaction
+* ASAP.
+* */
+   ehci_sync_mem();
+
urb-hcpriv = qh_get (qh);
}
}
diff --git a/drivers/usb/host/ehci.h b/drivers/usb/host/ehci.h
index cc7d337..313d9d6 100644
--- a/drivers/usb/host/ehci.h
+++ b/drivers/usb/host/ehci.h
@@ -738,6 +738,23 @@ static inline u32 hc32_to_cpup (const struct ehci_hcd 
*ehci, const __hc32 *x)
 
 #endif
 
+/*
+ * Writing to dma coherent memory on ARM may be delayed via L2
+ * writing buffer, so introduce the helper which can flush L2 writing
+ * buffer into memory immediately, especially used to flush ehci
+ * descriptor to memory.
+ * */
+#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
+static inline void ehci_sync_mem()
+{
+   mb();
+}
+#else
+static inline void ehci_sync_mem()
+{
+}
+#endif
+
 /*-*/
 
 #ifndef DEBUG
-- 
1.7.4.1

--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP

2011-08-30 Thread Alan Stern
On Wed, 31 Aug 2011 ming@canonical.com wrote:

 From: Ming Lei ming@canonical.com
 
 This patch introduces the helper of ehci_sync_mem to flush
 qtd/qh into memory immediately on some ARM, so that HC can
 see the up-to-date qtd/qh descriptor asap.
 
 This patch fixs one performance bug on ARM Cortex A9 dual core
 platform, which has been reported on quite a few ARM machines
 (OMAP4, Tegra 2, snowball...), see details from link of
 https://bugs.launchpad.net/bugs/709245.
 
 The patch has been tested ok on OMAP4 panda A1 board, and the
 performance of 'dd' over usb mass storage can be increased from
 4~5MB/sec to 14~16MB/sec after applying this patch.
 
 Cc: Alan Stern st...@rowland.harvard.edu
 Cc: Russell King li...@arm.linux.org.uk
 Signed-off-by: Ming Lei ming@canonical.com
 ---
  drivers/usb/host/ehci-q.c |   18 ++
  drivers/usb/host/ehci.h   |   17 +
  2 files changed, 35 insertions(+), 0 deletions(-)
 
 diff --git a/drivers/usb/host/ehci-q.c b/drivers/usb/host/ehci-q.c
 index 0917e3a..2719879 100644
 --- a/drivers/usb/host/ehci-q.c
 +++ b/drivers/usb/host/ehci-q.c
 @@ -995,6 +995,12 @@ static void qh_link_async (struct ehci_hcd *ehci, struct 
 ehci_qh *qh)
   head-qh_next.qh = qh;
   head-hw-hw_next = dma;
  
 + /*
 +  * flush qh descriptor into memory immediately,
 +  * see comments in qh_append_tds.
 +  * */

Comments are supposed to look like this:

/*
 * Blah blah blah
 * blah blah blah
 */

 + ehci_sync_mem();
 +
   qh_get(qh);
   qh-xacterrs = 0;
   qh-qh_state = QH_STATE_LINKED;
 @@ -1082,6 +1088,18 @@ static struct ehci_qh *qh_append_tds (
   wmb ();
   dummy-hw_token = token;
  
 + /*
 +  * Writing to dma coherent buffer on ARM may
 +  * be delayed to reach memory, so HC may not see
 +  * hw_token of dummy qtd in time, which can cause
 +  * the qtd transaction to be executed very late,
 +  * and degrade performance a lot. ehci_sync_mem
 +  * is added to flush 'token' immediatelly into
 +  * memory, so that ehci can execute the transaction
 +  * ASAP.
 +  * */

Here too.

 + ehci_sync_mem();
 +
   urb-hcpriv = qh_get (qh);
   }
   }
 diff --git a/drivers/usb/host/ehci.h b/drivers/usb/host/ehci.h
 index cc7d337..313d9d6 100644
 --- a/drivers/usb/host/ehci.h
 +++ b/drivers/usb/host/ehci.h
 @@ -738,6 +738,23 @@ static inline u32 hc32_to_cpup (const struct ehci_hcd 
 *ehci, const __hc32 *x)
  
  #endif
  
 +/*
 + * Writing to dma coherent memory on ARM may be delayed via L2
 + * writing buffer, so introduce the helper which can flush L2 writing
 + * buffer into memory immediately, especially used to flush ehci
 + * descriptor to memory.
 + * */

And here.

 +#ifdef   CONFIG_ARM_DMA_MEM_BUFFERABLE
 +static inline void ehci_sync_mem()
 +{
 + mb();
 +}
 +#else
 +static inline void ehci_sync_mem()
 +{
 +}
 +#endif
 +

Except for the formatting of the comments, this is fine.  When you fix 
up the comments, you can add:

Signed-off-by: Alan Stern st...@rowland.harvard.edu

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP

2011-08-30 Thread Mark Salter
On Wed, 2011-08-31 at 00:03 +0800, ming@canonical.com wrote:
 +/*
 + * Writing to dma coherent memory on ARM may be delayed via L2
 + * writing buffer, so introduce the helper which can flush L2 writing
 + * buffer into memory immediately, especially used to flush ehci
 + * descriptor to memory.
 + * */
 +#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
 +static inline void ehci_sync_mem()
 +{
 +   mb();
 +}
 +#else
 +static inline void ehci_sync_mem()
 +{
 +}
 +#endif
 +

I'm wondering if this doesn't really belong in the DMA API for any
future architectures that can't avoid prolonged write buffering to DMA
coherent memory. IIUC, ARM mitigates this for most drivers by including
an implicit write buffer flush in the mmio write routines. This takes
care of the drivers which write to a mmio device register after writing
something to shared DMA memory. IIUC, this doesn't help ehci because the
host controller is polling to see what the cpu writes to the shared
memory. Other hardware which polls shared memory like that will likely
have the same problem and could use buffer drain helpers as well.

--Mark


--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP

2011-08-30 Thread Alan Stern
On Tue, 30 Aug 2011, Mark Salter wrote:

 On Wed, 2011-08-31 at 00:03 +0800, ming@canonical.com wrote:
  +/*
  + * Writing to dma coherent memory on ARM may be delayed via L2
  + * writing buffer, so introduce the helper which can flush L2 writing
  + * buffer into memory immediately, especially used to flush ehci
  + * descriptor to memory.
  + * */
  +#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
  +static inline void ehci_sync_mem()
  +{
  +   mb();
  +}
  +#else
  +static inline void ehci_sync_mem()
  +{
  +}
  +#endif
  +
 
 I'm wondering if this doesn't really belong in the DMA API for any
 future architectures that can't avoid prolonged write buffering to DMA
 coherent memory. IIUC, ARM mitigates this for most drivers by including
 an implicit write buffer flush in the mmio write routines. This takes
 care of the drivers which write to a mmio device register after writing
 something to shared DMA memory. IIUC, this doesn't help ehci because the
 host controller is polling to see what the cpu writes to the shared
 memory. Other hardware which polls shared memory like that will likely
 have the same problem and could use buffer drain helpers as well.

This would be a good thing to define centrally.  Would you like to 
post an RFC on LKML?

Do you know of any other examples of hardware that polls shared DMA 
memory?

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP

2011-08-30 Thread Will Deacon
On Tue, Aug 30, 2011 at 05:38:30PM +0100, Mark Salter wrote:
 On Wed, 2011-08-31 at 00:03 +0800, ming@canonical.com wrote:
  +/*
  + * Writing to dma coherent memory on ARM may be delayed via L2
  + * writing buffer, so introduce the helper which can flush L2 writing
  + * buffer into memory immediately, especially used to flush ehci
  + * descriptor to memory.
  + * */
  +#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
  +static inline void ehci_sync_mem()
  +{
  +   mb();
  +}
  +#else
  +static inline void ehci_sync_mem()
  +{
  +}
  +#endif
  +
 
 I'm wondering if this doesn't really belong in the DMA API for any
 future architectures that can't avoid prolonged write buffering to DMA
 coherent memory. IIUC, ARM mitigates this for most drivers by including
 an implicit write buffer flush in the mmio write routines. This takes
 care of the drivers which write to a mmio device register after writing
 something to shared DMA memory. IIUC, this doesn't help ehci because the
 host controller is polling to see what the cpu writes to the shared
 memory. Other hardware which polls shared memory like that will likely
 have the same problem and could use buffer drain helpers as well.

Right. In this case the buffering is happening at L2 which becomes
noticeable when measuring performance. We also buffer stores at the
CPU (regardless of memory type) but because these tend to become visible
fairly quickly, there isn't a comparable performance problem.

Given that I would expect other architectures to buffer writes at the CPU,
would it not be worth having an API for flushing to L3 (devices)? It seems
like this would be a useful addition to the coherent DMA API on platforms
that handle coherency with non-cacheable memory attributes.

Will
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP

2011-08-30 Thread Greg KH
On Tue, Aug 30, 2011 at 06:26:42PM +0100, Will Deacon wrote:
 On Tue, Aug 30, 2011 at 05:38:30PM +0100, Mark Salter wrote:
  On Wed, 2011-08-31 at 00:03 +0800, ming@canonical.com wrote:
   +/*
   + * Writing to dma coherent memory on ARM may be delayed via L2
   + * writing buffer, so introduce the helper which can flush L2 writing
   + * buffer into memory immediately, especially used to flush ehci
   + * descriptor to memory.
   + * */
   +#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
   +static inline void ehci_sync_mem()
   +{
   +   mb();
   +}
   +#else
   +static inline void ehci_sync_mem()
   +{
   +}
   +#endif
   +
  
  I'm wondering if this doesn't really belong in the DMA API for any
  future architectures that can't avoid prolonged write buffering to DMA
  coherent memory. IIUC, ARM mitigates this for most drivers by including
  an implicit write buffer flush in the mmio write routines. This takes
  care of the drivers which write to a mmio device register after writing
  something to shared DMA memory. IIUC, this doesn't help ehci because the
  host controller is polling to see what the cpu writes to the shared
  memory. Other hardware which polls shared memory like that will likely
  have the same problem and could use buffer drain helpers as well.
 
 Right. In this case the buffering is happening at L2 which becomes
 noticeable when measuring performance. We also buffer stores at the
 CPU (regardless of memory type) but because these tend to become visible
 fairly quickly, there isn't a comparable performance problem.
 
 Given that I would expect other architectures to buffer writes at the CPU,
 would it not be worth having an API for flushing to L3 (devices)? It seems
 like this would be a useful addition to the coherent DMA API on platforms
 that handle coherency with non-cacheable memory attributes.

I agree, this seems to be a new type of barrier that is needed, as the
code comment above seems to go against what the kernel memory barrier
documentation says about what a memory barrier really does on the
hardware.

greg k-h
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP

2011-08-30 Thread Will Deacon
On Tue, Aug 30, 2011 at 06:48:59PM +0100, Greg KH wrote:
 On Tue, Aug 30, 2011 at 06:26:42PM +0100, Will Deacon wrote:
  On Tue, Aug 30, 2011 at 05:38:30PM +0100, Mark Salter wrote:
   On Wed, 2011-08-31 at 00:03 +0800, ming@canonical.com wrote:
+/*
+ * Writing to dma coherent memory on ARM may be delayed via L2
+ * writing buffer, so introduce the helper which can flush L2 writing
+ * buffer into memory immediately, especially used to flush ehci
+ * descriptor to memory.
+ * */
+#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
+static inline void ehci_sync_mem()
+{
+   mb();
+}
+#else
+static inline void ehci_sync_mem()
+{
+}
+#endif
+
   
   I'm wondering if this doesn't really belong in the DMA API for any
   future architectures that can't avoid prolonged write buffering to DMA
   coherent memory. IIUC, ARM mitigates this for most drivers by including
   an implicit write buffer flush in the mmio write routines. This takes
   care of the drivers which write to a mmio device register after writing
   something to shared DMA memory. IIUC, this doesn't help ehci because the
   host controller is polling to see what the cpu writes to the shared
   memory. Other hardware which polls shared memory like that will likely
   have the same problem and could use buffer drain helpers as well.
  
  Right. In this case the buffering is happening at L2 which becomes
  noticeable when measuring performance. We also buffer stores at the
  CPU (regardless of memory type) but because these tend to become visible
  fairly quickly, there isn't a comparable performance problem.
  
  Given that I would expect other architectures to buffer writes at the CPU,
  would it not be worth having an API for flushing to L3 (devices)? It seems
  like this would be a useful addition to the coherent DMA API on platforms
  that handle coherency with non-cacheable memory attributes.
 
 I agree, this seems to be a new type of barrier that is needed, as the
 code comment above seems to go against what the kernel memory barrier
 documentation says about what a memory barrier really does on the
 hardware.

Although this doesn't have anything to do with ordering; it's all to do with
immediacy so I think we should try to avoiding using the term `barrier'. If
this can be made part of the coherent DMA API, that might be the best place
for it (I can't think of any other areas this is needed given that the
streaming DMA API and I/O accessors already deal with it).

Will
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP

2011-08-30 Thread Mark Salter
On Tue, 2011-08-30 at 13:15 -0400, Alan Stern wrote:
 On Tue, 30 Aug 2011, Mark Salter wrote:
 
  On Wed, 2011-08-31 at 00:03 +0800, ming@canonical.com wrote:
   +/*
   + * Writing to dma coherent memory on ARM may be delayed via L2
   + * writing buffer, so introduce the helper which can flush L2 writing
   + * buffer into memory immediately, especially used to flush ehci
   + * descriptor to memory.
   + * */
   +#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
   +static inline void ehci_sync_mem()
   +{
   +   mb();
   +}
   +#else
   +static inline void ehci_sync_mem()
   +{
   +}
   +#endif
   +
  
  I'm wondering if this doesn't really belong in the DMA API for any
  future architectures that can't avoid prolonged write buffering to DMA
  coherent memory. IIUC, ARM mitigates this for most drivers by including
  an implicit write buffer flush in the mmio write routines. This takes
  care of the drivers which write to a mmio device register after writing
  something to shared DMA memory. IIUC, this doesn't help ehci because the
  host controller is polling to see what the cpu writes to the shared
  memory. Other hardware which polls shared memory like that will likely
  have the same problem and could use buffer drain helpers as well.
 
 This would be a good thing to define centrally.  Would you like to 
 post an RFC on LKML?

Yes, I can take a stab at that.

 
 Do you know of any other examples of hardware that polls shared DMA 
 memory?

Not offhand nor after a quick search. I don't think it is a common
way of doing things.

--Mark


--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP

2011-08-30 Thread Chen Peter-B29397


On Aug 31, 2011, at 1:54 AM, Will Deacon wrote:

 On Tue, Aug 30, 2011 at 06:48:59PM +0100, Greg KH wrote:
 On Tue, Aug 30, 2011 at 06:26:42PM +0100, Will Deacon wrote:
 On Tue, Aug 30, 2011 at 05:38:30PM +0100, Mark Salter wrote:
 On Wed, 2011-08-31 at 00:03 +0800, ming@canonical.com wrote:
 +/*
 + * Writing to dma coherent memory on ARM may be delayed via L2
 + * writing buffer, so introduce the helper which can flush L2 writing
 + * buffer into memory immediately, especially used to flush ehci
 + * descriptor to memory.
 + * */
 +#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
 +static inline void ehci_sync_mem()
 +{
 +   mb();
 +}
 +#else
 +static inline void ehci_sync_mem()
 +{
 +}
 +#endif
 +
 
 I'm wondering if this doesn't really belong in the DMA API for any
 future architectures that can't avoid prolonged write buffering to DMA
 coherent memory. IIUC, ARM mitigates this for most drivers by including
 an implicit write buffer flush in the mmio write routines. This takes
 care of the drivers which write to a mmio device register after writing
 something to shared DMA memory. IIUC, this doesn't help ehci because the
 host controller is polling to see what the cpu writes to the shared
 memory. Other hardware which polls shared memory like that will likely
 have the same problem and could use buffer drain helpers as well.
 
 Right. In this case the buffering is happening at L2 which becomes
 noticeable when measuring performance. We also buffer stores at the
 CPU (regardless of memory type) but because these tend to become visible
 fairly quickly, there isn't a comparable performance problem.
 
 Given that I would expect other architectures to buffer writes at the CPU,
 would it not be worth having an API for flushing to L3 (devices)? It seems
 like this would be a useful addition to the coherent DMA API on platforms
 that handle coherency with non-cacheable memory attributes.
 
 I agree, this seems to be a new type of barrier that is needed, as the
 code comment above seems to go against what the kernel memory barrier
 documentation says about what a memory barrier really does on the
 hardware.
 
 Although this doesn't have anything to do with ordering; it's all to do with
 immediacy so I think we should try to avoiding using the term `barrier'. If
 this can be made part of the coherent DMA API, that might be the best place
 for it (I can't think of any other areas this is needed given that the
 streaming DMA API and I/O accessors already deal with it).
I am agree with you. I met the same issue at both usb device driver (adding 
next dTD pointer which
the current one is handling) and usb host driver (performance issue this thread 
have discussed) at
Freescale i.MX6Q platform (4 Cores, ARM SMP).
So, now I need to add two barriers at two different drivers.

One question: why this write buffer issue did not happen at UP ARM V7 platform, 
whose dma buffer
also uncache, but bufferable?

 
 Will
 --
 To unsubscribe from this list: send the line unsubscribe linux-usb in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 

Best Regard,
Peter Chen
 
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP

2011-08-30 Thread Ming Lei
On Wed, Aug 31, 2011 at 1:54 AM, Will Deacon will.dea...@arm.com wrote:

 Although this doesn't have anything to do with ordering; it's all to do with
 immediacy so I think we should try to avoiding using the term `barrier'. If
 this can be made part of the coherent DMA API, that might be the best place
 for it (I can't think of any other areas this is needed given that the
 streaming DMA API and I/O accessors already deal with it).

Agree too.


thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html