Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-22 Thread Benjamin Herrenschmidt
On Tue, 2008-05-20 at 15:53 -0700, David Miller wrote: From: Scott Wood [EMAIL PROTECTED] Date: Tue, 20 May 2008 17:43:58 -0500 David Miller wrote: The __volatile__ in the asm construct disallows movement of the inline asm relative to statements surrounding it. The only reason

Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-22 Thread Trent Piepho
On Fri, 23 May 2008, Benjamin Herrenschmidt wrote: On Tue, 2008-05-20 at 15:53 -0700, David Miller wrote: From: Scott Wood [EMAIL PROTECTED] Date: Tue, 20 May 2008 17:43:58 -0500 David Miller wrote: The __volatile__ in the asm construct disallows movement of the inline asm relative to

Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-21 Thread Andreas Schwab
Trent Piepho [EMAIL PROTECTED] writes: On Wed, 21 May 2008, Andreas Schwab wrote: Trent Piepho [EMAIL PROTECTED] writes: It's the _le versions that have a problem, since we can't get gcc to just use the register indexed mode. It seems like an obvious thing to have a constraint for, but I

Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-21 Thread Benjamin Herrenschmidt
Depends on what you define as necessary. It seems clear that I/O accessors _do not_ need to be strictly ordered with respect to normal memory accesses, by what's defined in memory-barriers.txt. So if by necessary you mean what the Linux standard for I/O accessors requires (and what other

Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-21 Thread Benjamin Herrenschmidt
On Tue, 2008-05-20 at 15:55 -0700, Trent Piepho wrote: There doesn't appear to be any barriers to use for coherent dma other than mb() and wmb(). Correct me if I'm wrong, but I think the sync isn't actually _required_ (by memory-barriers.txt's definitions), and it would be enough to use

Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-21 Thread Trent Piepho
On Wed, 21 May 2008, Benjamin Herrenschmidt wrote: Depends on what you define as necessary. It seems clear that I/O accessors _do not_ need to be strictly ordered with respect to normal memory accesses, by what's defined in memory-barriers.txt. So if by necessary you mean what the Linux

Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-21 Thread Trent Piepho
On Wed, 21 May 2008, Andreas Schwab wrote: Trent Piepho [EMAIL PROTECTED] writes: On Wed, 21 May 2008, Andreas Schwab wrote: Trent Piepho [EMAIL PROTECTED] writes: It's the _le versions that have a problem, since we can't get gcc to just use the register indexed mode. It seems like an

Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-21 Thread Benjamin Herrenschmidt
On Wed, 2008-05-21 at 12:44 -0700, Trent Piepho wrote: Someone should update memory-barriers.txt, because it doesn't say that, and all I/O accessors for all the arches, because none of them are. There have been long discussions about that. The end result was that being too weakly ordered is

Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-20 Thread Scott Wood
Benjamin Herrenschmidt wrote: On Tue, 2008-05-20 at 13:40 -0700, Trent Piepho wrote: There was some discussion on a Freescale list if the powerpc I/O accessors should be strictly ordered w.r.t. normal memory. Currently they are not. It does not appear as if any other architecture's I/O
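
To make the ordering question concrete, here is a minimal sketch of the driver pattern the thread is arguing about: a descriptor is written to ordinary memory, then an MMIO write tells the device to fetch it. It assumes the I/O accessor itself gives no ordering guarantee, so the barrier is written out explicitly. Every name here (sketch_desc, sketch_wmb, sketch_start_dma, the doorbell register) is hypothetical, and __sync_synchronize() is only a stand-in for the kernel's wmb(), not its actual definition.

```c
#include <stdint.h>

/* Hypothetical descriptor layout, for illustration only. */
struct sketch_desc {
    uint32_t buf_addr;
    uint32_t len;
};

/* Stand-in for the kernel's wmb(): GCC's full-barrier builtin, which is
 * stronger than a write barrier strictly needs to be. */
#define sketch_wmb() __sync_synchronize()

static void sketch_start_dma(struct sketch_desc *desc,
                             volatile uint32_t *doorbell,
                             uint32_t buf_addr, uint32_t len)
{
    /* 1. Fill in the descriptor in ordinary (coherent) memory. */
    desc->buf_addr = buf_addr;
    desc->len = len;

    /* 2. Order the descriptor stores before the doorbell store. */
    sketch_wmb();

    /* 3. Ring the doorbell; a real driver would use an I/O accessor here. */
    *doorbell = 1;
}
```

If the accessor were strictly ordered against normal memory, step 2 would be redundant; whether drivers may rely on that is the point under debate.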

Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-20 Thread Andreas Schwab
Trent Piepho [EMAIL PROTECTED] writes: For the LE versions, eventually they boil down to an asm that will look something like this: asm("sync; stwbrx %1,0,%2" : "=m" (*addr) : "r" (val), "r" (addr)); While not perfect, this appears to be the best one can do. The issue is that the stwbrx
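
For readers who want to try the quoted asm, the following is a rough sketch of how it could be wrapped in a function. It is not the kernel's out_le32(): the name sketch_out_le32 is made up, and the non-PowerPC branch is an assumed portable fallback added only so the example builds elsewhere. The "=m" (*addr) output tells gcc which memory is written, while the address is also passed in a register because stwbrx only exists in register-indexed form.

```c
#include <stdint.h>

/* Hypothetical helper: store val at addr in little-endian byte order. */
static inline void sketch_out_le32(volatile uint32_t *addr, uint32_t val)
{
#if defined(__powerpc__) && defined(__BYTE_ORDER__) && \
    (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
    /* The asm from the thread: sync, then a byte-reversed store. */
    __asm__ __volatile__("sync; stwbrx %1,0,%2"
                         : "=m" (*addr)
                         : "r" (val), "r" (addr));
#else
    /* Assumed fallback for other targets: swap only on big-endian hosts. */
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
    val = __builtin_bswap32(val);
#endif
    *addr = val;
#endif
}
```

Calling sketch_out_le32(&reg, 0x11223344) should leave the least significant byte, 0x44, at the lowest address of reg on either kind of host.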

Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-20 Thread Benjamin Herrenschmidt
On Tue, 2008-05-20 at 16:38 -0500, Scott Wood wrote: It looks like we rely on -fno-strict-aliasing to prevent reordering ordinary memory accesses (such as to DMA descriptors) past the I/O access. It won't prevent reordering of memory reads around an I/O read, though, which could be a

Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-20 Thread Trent Piepho
On Tue, 20 May 2008, Benjamin Herrenschmidt wrote: On Tue, 2008-05-20 at 13:40 -0700, Trent Piepho wrote: There was some discussion on a Freescale list if the powerpc I/O accessors should be strictly ordered w.r.t. normal memory. Currently they are not. It does not appear as if any other

Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-20 Thread Trent Piepho
On Wed, 21 May 2008, Andreas Schwab wrote: Trent Piepho [EMAIL PROTECTED] writes: For the LE versions, eventually they boil down to an asm that will look something like this: asm("sync; stwbrx %1,0,%2" : "=m" (*addr) : "r" (val), "r" (addr)); While not perfect, this appears to be the best one can do.

Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-20 Thread Trent Piepho
On Tue, 20 May 2008, Benjamin Herrenschmidt wrote: On Tue, 2008-05-20 at 16:38 -0500, Scott Wood wrote: It looks like we rely on -fno-strict-aliasing to prevent reordering ordinary memory accesses (such as to DMA descriptors) past the I/O access. It won't prevent reordering of memory reads

Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-20 Thread Scott Wood
Alan Cox wrote: It looks like we rely on -fno-strict-aliasing to prevent reordering ordinary memory accesses (such as to DMA descriptors) past the I/O DMA descriptors in main memory are dependent on cache behaviour anyway and the dma_* operators should be the ones enforcing the needed

Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-20 Thread David Miller
From: Scott Wood [EMAIL PROTECTED] Date: Tue, 20 May 2008 17:35:56 -0500 Alan Cox wrote: It looks like we rely on -fno-strict-aliasing to prevent reordering ordinary memory accesses (such as to DMA descriptors) past the I/O DMA descriptors in main memory are dependent on cache

Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-20 Thread Scott Wood
David Miller wrote: From: Scott Wood [EMAIL PROTECTED] Date: Tue, 20 May 2008 17:35:56 -0500 Alan Cox wrote: It looks like we rely on -fno-strict-aliasing to prevent reordering ordinary memory accesses (such as to DMA descriptors) past the I/O DMA descriptors in main memory are dependent on

Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-20 Thread Andreas Schwab
Trent Piepho [EMAIL PROTECTED] writes: It's the _le versions that have a problem, since we can't get gcc to just use the register indexed mode. It seems like an obvious thing to have a constraint for, but I guess there weren't enough instructions that only come in 'x' versions to bother with

Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-20 Thread David Miller
From: Scott Wood [EMAIL PROTECTED] Date: Tue, 20 May 2008 17:43:58 -0500 David Miller wrote: The __volatile__ in the asm construct disallows movement of the inline asm relative to statements surrounding it. The only reason barrier() in kernel.h needs a memory clobber is because of a
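
As an illustration of the memory clobber mentioned here, this is a small sketch of a compiler-only barrier written in the style of the kernel's barrier(). The names sketch_barrier, sketch_data, sketch_flag and sketch_publish are hypothetical. The sketch shows only the compiler-level effect: the empty asm emits no instruction, so it does nothing to order accesses in hardware.

```c
/* Empty asm with a "memory" clobber: gcc must assume memory may have
 * changed, so it cannot keep values cached in registers across this
 * point or move memory accesses over it. */
#define sketch_barrier() __asm__ __volatile__("" : : : "memory")

static int sketch_data;
static int sketch_flag;

static void sketch_publish(int value)
{
    sketch_data = value;
    sketch_barrier();   /* the compiler may not sink the data store below here */
    sketch_flag = 1;
}
```

Without the "memory" clobber, the asm statement would say nothing about memory, and gcc would be free to reorder the two plain stores around it.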

Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-20 Thread Alan Cox
It looks like we rely on -fno-strict-aliasing to prevent reordering ordinary memory accesses (such as to DMA descriptors) past the I/O DMA descriptors in main memory are dependent on cache behaviour anyway and the dma_* operators should be the ones enforcing the needed behaviour. Alan

Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-20 Thread Trent Piepho
On Tue, 20 May 2008, Scott Wood wrote: Alan Cox wrote: It looks like we rely on -fno-strict-aliasing to prevent reordering ordinary memory accesses (such as to DMA descriptors) past the I/O DMA descriptors in main memory are dependent on cache behaviour anyway and the dma_* operators

Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-20 Thread Trent Piepho
On Wed, 21 May 2008, Andreas Schwab wrote: Trent Piepho [EMAIL PROTECTED] writes: It's the _le versions that have a problem, since we can't get gcc to just use the register indexed mode. It seems like an obvious thing to have a constraint for, but I guess there weren't enough instructions