Nick Piggin writes:
/* turn off LED */
val64 = readq(&bar0->adapter_control);
val64 = val64 & (~ADAPTER_LED_ON);
writeq(val64, &bar0->adapter_control);
s2io_link(nic, LINK_DOWN);
}
On Thursday 12 June 2008 22:14, Paul Mackerras wrote:
Nick Piggin writes:
/* turn off LED */
val64 = readq(&bar0->adapter_control);
val64 = val64 & (~ADAPTER_LED_ON);
writeq(val64, &bar0->adapter_control);
On Thu, Jun 05, 2008 at 06:43:53PM +1000, Benjamin Herrenschmidt wrote:
Note that the powerpc implementation currently clears the flag
on spin_lock and tests it on unlock. We are considering changing
that to not touch the flag on spin_lock and just clear it whenever
we do a sync (ie, on
On Thu, 2008-06-12 at 09:07 -0600, Matthew Wilcox wrote:
On Thu, Jun 05, 2008 at 06:43:53PM +1000, Benjamin Herrenschmidt wrote:
Note that the powerpc implementation currently clears the flag
on spin_lock and tests it on unlock. We are considering changing
that to not touch the flag on
On Wednesday 11 June 2008 15:35, Nick Piggin wrote:
On Wednesday 11 June 2008 15:13, Paul Mackerras wrote:
Nick Piggin writes:
I just wish we had even one actual example of things going wrong with
the current rules we have on powerpc to motivate changing to this
model.
Nick Piggin writes:
Now that doesn't leave weaker ordering architectures lumped with slow old
x86 semantics. Think of it as giving them the benefit of sharing x86
development and testing :)
Worth something, but not perhaps as much as you think, given that many
x86 driver writers still don't
On Wed, 11 Jun 2008, Nick Piggin wrote:
I can't actually find the definitive statement in the Intel manuals
saying UC is strongly ordered also WRT WB. Linus?
Definitive? Dunno. But look in the Architecture manual, volume 3A, 10.3
Methods of Caching Available, and then under the bullet
On Tuesday, June 10, 2008 8:29 pm Nick Piggin wrote:
On Wednesday 11 June 2008 05:19, Jesse Barnes wrote:
On Tuesday, June 10, 2008 12:05 pm Roland Dreier wrote:
me too. That's the whole basis for readX_relaxed() and its cohorts:
we make our weirdest machines (like altix) conform to
On Wednesday 04 June 2008 05:07, Linus Torvalds wrote:
On Tue, 3 Jun 2008, Trent Piepho wrote:
On Tue, 3 Jun 2008, Linus Torvalds wrote:
On Tue, 3 Jun 2008, Nick Piggin wrote:
Linus: on x86, memory operations to wc and wc+ memory are not ordered
with one another, or operations to
On Monday, June 09, 2008 11:56 pm Nick Piggin wrote:
So that still doesn't tell us what *minimum* level of ordering we should
provide in the cross platform readl/writel API. Some relatively sane
suggestions would be:
- as strong as x86. guaranteed not to break drivers that work on x86,
but
On Tue, 2008-06-10 at 10:41 -0700, Jesse Barnes wrote:
On Monday, June 09, 2008 11:56 pm Nick Piggin wrote:
So that still doesn't tell us what *minimum* level of ordering we should
provide in the cross platform readl/writel API. Some relatively sane
suggestions would be:
- as strong as
me too. That's the whole basis for readX_relaxed() and its cohorts: we
make our weirdest machines (like altix) conform to the x86 norm. Then
where it really kills us we introduce additional semantics to selected
drivers that enable us to recover I/O speed on the abnormal platforms.
On Tuesday, June 10, 2008 12:05 pm Roland Dreier wrote:
me too. That's the whole basis for readX_relaxed() and its cohorts: we
make our weirdest machines (like altix) conform to the x86 norm. Then
where it really kills us we introduce additional semantics to selected
drivers that
On Wednesday 11 June 2008 05:19, Jesse Barnes wrote:
On Tuesday, June 10, 2008 12:05 pm Roland Dreier wrote:
me too. That's the whole basis for readX_relaxed() and its cohorts:
we make our weirdest machines (like altix) conform to the x86 norm.
Then where it really kills us we
On Wed, 2008-06-11 at 13:29 +1000, Nick Piggin wrote:
Exactly, yes. I guess everybody has had good intentions here, but
as noticed, what is lacking is coordination and documentation.
You mention strong ordering WRT spin_unlock, which suggests that
you would prefer to take option #2 (the
On Wednesday 11 June 2008 13:40, Benjamin Herrenschmidt wrote:
On Wed, 2008-06-11 at 13:29 +1000, Nick Piggin wrote:
Exactly, yes. I guess everybody has had good intentions here, but
as noticed, what is lacking is coordination and documentation.
You mention strong ordering WRT
Nick Piggin writes:
OK, I'm still not quite sure where this has ended up. I guess you are happy
with x86 semantics as they are now. That is, all IO accesses are strongly
ordered WRT one another and WRT cacheable memory (which includes keeping
them within spinlocks),
My understanding was that
On Wednesday 11 June 2008 14:18, Paul Mackerras wrote:
Nick Piggin writes:
OK, I'm still not quite sure where this has ended up. I guess you are
happy with x86 semantics as they are now. That is, all IO accesses are
strongly ordered WRT one another and WRT cacheable memory (which includes
Nick Piggin writes:
I just wish we had even one actual example of things going wrong with
the current rules we have on powerpc to motivate changing to this
model.
~/usr/src/linux-2.6 git grep test_and_set_bit drivers/ | wc -l
506
How sure are you that none of those forms part of a
Jesse Barnes wrote:
Now, in hindsight, using a PIO write set & test flag approach in
writeX/spin_unlock (ala powerpc) might have been a better approach, but iirc
that never came up in the discussion, probably because we were focused on PCI
posting and not uncached vs. cached ordering.
Hi
On Thu, 2008-06-05 at 10:40 +0200, Jes Sorensen wrote:
Jesse Barnes wrote:
Now, in hindsight, using a PIO write set & test flag approach in
writeX/spin_unlock (ala powerpc) might have been a better approach, but iirc
that never came up in the discussion, probably because we were focused
On Mon, 2 Jun 2008, Haavard Skinnemoen wrote:
So what happened to the old idea of putting the accessor function pointers
in the device/bus structure?
Don't know. I think it sounds like overkill to replace a simple load or
store with an indirect function call.
Indeed. *Especially* as
On Tue, Jun 03, 2008 at 06:19:05PM +1000, Nick Piggin wrote:
On Tuesday 03 June 2008 18:15, Jeremy Higdon wrote:
On Tue, Jun 03, 2008 at 02:33:11PM +1000, Nick Piggin wrote:
On Monday 02 June 2008 19:56, Jes Sorensen wrote:
Would we be able to use Ben's trick of setting a per cpu flag in
On Tuesday 03 June 2008 18:15, Jeremy Higdon wrote:
On Tue, Jun 03, 2008 at 02:33:11PM +1000, Nick Piggin wrote:
On Monday 02 June 2008 19:56, Jes Sorensen wrote:
Would we be able to use Ben's trick of setting a per cpu flag in
writel() then and checking that in spin unlock issuing the
Scott Wood [EMAIL PROTECTED] wrote:
On Mon, Jun 02, 2008 at 10:11:02AM +0200, Haavard Skinnemoen wrote:
Geert Uytterhoeven [EMAIL PROTECTED] wrote:
On Fri, 30 May 2008, Haavard Skinnemoen wrote:
Maybe we need another interface that does not do byteswapping but
provides stronger
On Tue, 2008-06-03 at 16:11 +1000, Nick Piggin wrote:
- readl is synchronous (ie, makes the CPU think the
data was actually used before executing subsequent
instructions, thus waits for the data to come back,
for example to ensure that a read used to
On Tuesday 03 June 2008 16:53, Paul Mackerras wrote:
Nick Piggin writes:
So your readl can pass an earlier cacheable store or earlier writel?
No. It's quite gross at the moment, it has a sync before the access
(i.e. a full mb()) and a twi; isync sequence after the access that
stalls
On Tue, Jun 03, 2008 at 02:33:11PM +1000, Nick Piggin wrote:
On Monday 02 June 2008 19:56, Jes Sorensen wrote:
Jeremy Higdon wrote:
We don't actually have that problem on the Altix. All writes issued
by CPU X will be ordered with respect to each other. But writes by
CPU X and CPU Y
Nick Piggin writes:
So your readl can pass an earlier cacheable store or earlier writel?
No. It's quite gross at the moment, it has a sync before the access
(i.e. a full mb()) and a twi; isync sequence after the access that
stalls execution until the data comes back.
We don't provide
On Tue, 3 Jun 2008, Nick Piggin wrote:
Linus: on x86, memory operations to wc and wc+ memory are not ordered
with one another, or operations to other memory types (ie. load/load
and store/store reordering is allowed). Also, as you know, store/load
reordering is explicitly allowed as well,
On Tue, 3 Jun 2008, Trent Piepho wrote:
On Tue, 3 Jun 2008, Linus Torvalds wrote:
On Tue, 3 Jun 2008, Nick Piggin wrote:
Linus: on x86, memory operations to wc and wc+ memory are not ordered
with one another, or operations to other memory types (ie. load/load
and store/store
On Tue, 3 Jun 2008, Linus Torvalds wrote:
On Tue, 3 Jun 2008, Nick Piggin wrote:
Linus: on x86, memory operations to wc and wc+ memory are not ordered
with one another, or operations to other memory types (ie. load/load
and store/store reordering is allowed). Also, as you know, store/load
On Tue, Jun 03, 2008 at 11:47:00AM -0700, Trent Piepho wrote:
On Tue, 3 Jun 2008, Linus Torvalds wrote:
On Tue, 3 Jun 2008, Nick Piggin wrote:
Linus: on x86, memory operations to wc and wc+ memory are not ordered
with one another, or operations to other memory types (ie. load/load
and
On Tue, 3 Jun 2008, Matthew Wilcox wrote:
On Tue, Jun 03, 2008 at 11:47:00AM -0700, Trent Piepho wrote:
On Tue, 3 Jun 2008, Linus Torvalds wrote:
On Tue, 3 Jun 2008, Nick Piggin wrote:
Linus: on x86, memory operations to wc and wc+ memory are not ordered
with one another, or operations to
On Tue, 3 Jun 2008, Nick Piggin wrote:
On Monday 02 June 2008 17:24, Russell King wrote:
So, can the semantics of what's expected from these IO accessor
functions be documented somewhere. Please? Before this thread gets
lost in the depths of time?
This whole thread also ties in with my
On Tue, Jun 03, 2008 at 12:43:21PM -0700, Trent Piepho wrote:
IOW, there are four ways one can define endianness/swapping:
1) Little-endian
2) Big-endian
3) Native-endian aka non-byte-swapping
4) Foreign-endian aka byte-swapping
1 and 2 are by far the most used. Some code wants 3. No
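The four cases listed above can be illustrated with a small user-space sketch. This is not kernel code and not anything proposed in the thread: every `sketch_*` name is hypothetical, and the loaders work on a plain byte buffer rather than on MMIO. The kernel has its own helpers for this (for example `le32_to_cpu()` and `be32_to_cpu()`); the sketch only shows how the fixed-endian cases differ from the native and byte-swapping ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reverse the four bytes of a 32-bit value. */
static inline uint32_t sketch_swab32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* 1) Little-endian: byte 0 is least significant, on any host. */
static inline uint32_t sketch_load_le32(const uint8_t *p)
{
	assert(p != NULL);
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* 2) Big-endian: byte 0 is most significant, on any host. */
static inline uint32_t sketch_load_be32(const uint8_t *p)
{
	assert(p != NULL);
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* 3) Native-endian: whatever the host does, no swapping. */
static inline uint32_t sketch_load_native32(const uint8_t *p)
{
	uint32_t v;

	assert(p != NULL);
	memcpy(&v, p, sizeof(v));
	return v;
}

/* 4) Foreign-endian: always swap relative to the host. */
static inline uint32_t sketch_load_foreign32(const uint8_t *p)
{
	return sketch_swab32(sketch_load_native32(p));
}
```

Cases 1 and 2 give the same answer on every host, which is why they suit device registers with a defined byte order; cases 3 and 4 change meaning with the host.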
On Tue, Jun 03, 2008 at 12:57:56PM -0700, Trent Piepho wrote:
On Tue, 3 Jun 2008, Matthew Wilcox wrote:
On Tue, Jun 03, 2008 at 11:47:00AM -0700, Trent Piepho wrote:
On Tue, 3 Jun 2008, Linus Torvalds wrote:
On Tue, 3 Jun 2008, Nick Piggin wrote:
Linus: on x86, memory operations to wc and
On Tue, 3 Jun 2008, Matthew Wilcox wrote:
On Tue, Jun 03, 2008 at 12:43:21PM -0700, Trent Piepho wrote:
IOW, there are four ways one can define endianness/swapping:
1) Little-endian
2) Big-endian
3) Native-endian aka non-byte-swapping
4) Foreign-endian aka byte-swapping
1 and 2 are by far the
On Tue, 3 Jun 2008, Matthew Wilcox wrote:
On Tue, Jun 03, 2008 at 12:57:56PM -0700, Trent Piepho wrote:
On Tue, 3 Jun 2008, Matthew Wilcox wrote:
On Tue, Jun 03, 2008 at 11:47:00AM -0700, Trent Piepho wrote:
On Tue, 3 Jun 2008, Linus Torvalds wrote:
On Tue, 3 Jun 2008, Nick Piggin wrote:
On Tue, 2008-06-03 at 12:43 -0700, Trent Piepho wrote:
Byte-swapping vs not byte-swapping is not usually what the programmer wants.
Usually your device's registers are defined as being big-endian or
little-endian and you want whatever is needed to give you that.
Yes, which is why I (and
On Wednesday 04 June 2008 07:58, Trent Piepho wrote:
On Tue, 3 Jun 2008, Matthew Wilcox wrote:
On Tue, Jun 03, 2008 at 12:57:56PM -0700, Trent Piepho wrote:
On Tue, 3 Jun 2008, Matthew Wilcox wrote:
On Tue, Jun 03, 2008 at 11:47:00AM -0700, Trent Piepho wrote:
On Tue, 3 Jun 2008, Linus
On Wednesday 04 June 2008 05:07, Linus Torvalds wrote:
On Tue, 3 Jun 2008, Trent Piepho wrote:
On Tue, 3 Jun 2008, Linus Torvalds wrote:
On Tue, 3 Jun 2008, Nick Piggin wrote:
Linus: on x86, memory operations to wc and wc+ memory are not ordered
with one another, or operations to
On Wednesday 04 June 2008 00:47, Linus Torvalds wrote:
On Tue, 3 Jun 2008, Nick Piggin wrote:
Linus: on x86, memory operations to wc and wc+ memory are not ordered
with one another, or operations to other memory types (ie. load/load
and store/store reordering is allowed). Also, as you know,
On Wednesday 04 June 2008 07:44, Trent Piepho wrote:
On Tue, 3 Jun 2008, Matthew Wilcox wrote:
I don't understand why you keep talking about DMA. Are you talking
about ordering between readX() and DMA? PCI provides those guarantees.
I guess you haven't been reading the whole thread. The
On Wed, 4 Jun 2008, Nick Piggin wrote:
Actually, according to the document I am looking at (the AMD one), a UC
store may pass a previous WC store.
Hmm. Intel arch manual, Vol 3, 10.3 (page 10-7 in my version):
If the WC buffer is partially filled, the writes may be delayed until
the
On Tue, May 27, 2008 at 02:55:56PM -0700, Linus Torvalds wrote:
On Wed, 28 May 2008, Benjamin Herrenschmidt wrote:
A problem with __raw_ though is that they -also- don't do byteswap,
Well, that's why there is __readl() and __raw_readl(), no?
Neither does ordering, and __raw_readl()
Geert Uytterhoeven [EMAIL PROTECTED] wrote:
On Fri, 30 May 2008, Haavard Skinnemoen wrote:
Maybe we need another interface that does not do byteswapping but
provides stronger ordering guarantees?
The byte swapping depends on the device/bus.
Of course. But isn't it reasonable to assume
Jeremy Higdon wrote:
We don't actually have that problem on the Altix. All writes issued
by CPU X will be ordered with respect to each other. But writes by
CPU X and CPU Y will not be, unless an mmiowb() is done by the
original CPU before the second CPU writes. I.e.
CPU X writel
* Linus Torvalds [EMAIL PROTECTED] wrote:
Here's a UNTESTED patch for x86 that may or may not compile and
work, and which serializes (on a compiler level) the IO accesses
against regular memory accesses.
Ok, so it at least boots on x86-32. Thus probably on x86-64 too (since
the code
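For readers unfamiliar with what "serializes on a compiler level" means, here is a minimal user-space sketch. It is not the patch discussed in this message: it uses the GCC/Clang idiom of an empty `asm` statement with a "memory" clobber around a volatile access, and the `sketch_*` names are hypothetical. The clobber only stops the compiler from moving other memory accesses across the I/O access; it emits no CPU barrier instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Empty asm with a "memory" clobber: a compiler-only barrier (GCC/Clang). */
#define sketch_compiler_barrier() __asm__ __volatile__("" : : : "memory")

/* Store to a register-like location, fenced against compiler reordering. */
static inline void sketch_writel(uint32_t val, volatile uint32_t *reg)
{
	assert(reg != NULL);
	sketch_compiler_barrier();
	*reg = val;
	sketch_compiler_barrier();
}

/* Load from a register-like location, fenced the same way. */
static inline uint32_t sketch_readl(const volatile uint32_t *reg)
{
	uint32_t v;

	assert(reg != NULL);
	sketch_compiler_barrier();
	v = *reg;
	sketch_compiler_barrier();
	return v;
}
```

On a weakly ordered CPU this would not be enough by itself, which is the subject of much of the rest of the thread.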
Pavel Machek wrote:
Still better than changing semantics of writel for _all_ drivers.
If you are really sure driver does not depend on writel order, it is
just a sed/// command, so I don't see any big code maintenance
issues...
This isn't changing the semantics for all drivers, it means it
On Mon, Jun 02, 2008 at 10:11:02AM +0200, Haavard Skinnemoen wrote:
Geert Uytterhoeven [EMAIL PROTECTED] wrote:
On Fri, 30 May 2008, Haavard Skinnemoen wrote:
Maybe we need another interface that does not do byteswapping but
provides stronger ordering guarantees?
The byte swapping
On Mon, Jun 02, 2008 at 11:56:39AM +0200, Jes Sorensen wrote:
Jeremy Higdon wrote:
We don't actually have that problem on the Altix. All writes issued
by CPU X will be ordered with respect to each other. But writes by
CPU X and CPU Y will not be, unless an mmiowb() is done by the
original
On Mon, 2008-06-02 at 12:36 +0200, Ingo Molnar wrote:
The patch passed initial light testing in -tip (~30 successful random
self-builds and bootups on various mixed 32-bit/64-bit boxes) but it's
still v2.6.27 material IMO.
I think adding the memory clobber should be .26 and even -stable. We
On Monday 02 June 2008 19:56, Jes Sorensen wrote:
Jeremy Higdon wrote:
We don't actually have that problem on the Altix. All writes issued
by CPU X will be ordered with respect to each other. But writes by
CPU X and CPU Y will not be, unless an mmiowb() is done by the
original CPU
Hi!
Though it's my understanding that at least ia64 does require the
explicit barriers anyway, so we are still in a dodgy situation here
where it's not clear what drivers should do and we end up with
possibly excessive barriers on powerpc where I end up with both
the wmb/rmb/mb
Hi!
The only way to guarantee ordering in the above setup, is to either
make writel() fully ordered or add the mmiowb()'s in between the two
writel's. On Altix you have to go and read from the PCI bridge to
ensure all writes to it have been flushed, which is also what mmiowb()
is
On Thu, May 29, 2008 at 10:47:18AM -0400, Jes Sorensen wrote:
That's not going to solve the problem on Altix. On Altix the issue is
that there can be multiple paths through the NUMA fabric from cpuX to
PCI bridge Y.
Consider this uber-cool(tm) ascii art - NR is my abbreviation for NUMA
router:
On Fri, May 30, 2008 at 10:21:00AM -0700, Jesse Barnes wrote:
On Friday, May 30, 2008 2:36 am Jes Sorensen wrote:
James Bottomley wrote:
The only way to guarantee ordering in the above setup, is to either
make writel() fully ordered or add the mmiowb()'s in between the two
writel's.
On Fri, 30 May 2008 11:13:23 +1000
Benjamin Herrenschmidt [EMAIL PROTECTED] wrote:
Currently, this is the only interface I know that can do native-endian
accesses, so if you take it away, I'm gonna need an alternative
interface that doesn't do byteswapping.
Are you aware that these
On Fri, 2008-05-30 at 08:07 +0200, Haavard Skinnemoen wrote:
On Fri, 30 May 2008 11:13:23 +1000
Benjamin Herrenschmidt [EMAIL PROTECTED] wrote:
Currently, this is the only interface I know that can do native-endian
accesses, so if you take it away, I'm gonna need an alternative
On Fri, 30 May 2008 17:24:27 +1000
Benjamin Herrenschmidt [EMAIL PROTECTED] wrote:
On Fri, 2008-05-30 at 08:07 +0200, Haavard Skinnemoen wrote:
I think the drivers I've written have the necessary barriers (or dma
ops with implicit barriers) that they don't actually depend on any
DMA vs.
On Fri, 30 May 2008, Haavard Skinnemoen wrote:
Maybe we need another interface that does not do byteswapping but
provides stronger ordering guarantees?
The byte swapping depends on the device/bus.
So what happened to the old idea of putting the accessor function pointers
in the device/bus
James Bottomley wrote:
The only way to guarantee ordering in the above setup, is to either
make writel() fully ordered or add the mmiowb()'s in between the two
writel's. On Altix you have to go and read from the PCI bridge to
ensure all writes to it have been flushed, which is also what
Jesse Barnes wrote:
On Thursday, May 29, 2008 2:40 pm Benjamin Herrenschmidt wrote:
On Thu, 2008-05-29 at 10:47 -0400, Jes Sorensen wrote:
The only way to guarantee ordering in the above setup, is to either
make writel() fully ordered or add the mmiowb()'s in between the two
writel's. On
Benjamin Herrenschmidt wrote:
On Thu, 2008-05-29 at 10:47 -0400, Jes Sorensen wrote:
The only way to guarantee ordering in the above setup, is to either
make writel() fully ordered or add the mmiowb()'s in between the two
writel's. On Altix you have to go and read from the PCI bridge to
ensure
On Friday, May 30, 2008 2:36 am Jes Sorensen wrote:
James Bottomley wrote:
The only way to guarantee ordering in the above setup, is to either
make writel() fully ordered or add the mmiowb()'s in between the two
writel's. On Altix you have to go and read from the PCI bridge to
ensure all
On Wednesday 28 May 2008, Benjamin Herrenschmidt wrote:
On Tue, 2008-05-27 at 14:55 -0700, Linus Torvalds wrote:
On Wed, 28 May 2008, Benjamin Herrenschmidt wrote:
A problem with __raw_ though is that they -also- don't do byteswap,
Well, that's why there is __readl() and
It's not exactly a well-established interface. Only five architectures
define these functions, and there is not a single user in the kernel
source outside of these architecture's io.h files.
That is because the drivers using them had them removed (eg I²O) - mostly
because it didn't compile on
On 28 May 2008, at 11:36 AM, Haavard Skinnemoen wrote:
Benjamin Herrenschmidt [EMAIL PROTECTED] wrote:
I'm happy to say that __raw is purely about ordering and make them
byteswap on powerpc tho (ie, make them little endian like the non-raw
counterpart).
That would break a lot of drivers.
On Thu, 2008-05-29 at 10:47 -0400, Jes Sorensen wrote:
Roland == Roland Dreier [EMAIL PROTECTED] writes:
This is a different issue. We deal with it on powerpc by having
writel set a per-cpu flag and spin_unlock() test it, and do the
barrier if needed there.
Roland Cool... I assume you
Roland == Roland Dreier [EMAIL PROTECTED] writes:
This is a different issue. We deal with it on powerpc by having
writel set a per-cpu flag and spin_unlock() test it, and do the
barrier if needed there.
Roland Cool... I assume you do this for mutex_unlock() etc?
Roland Is there any reason
On Thu, 2008-05-29 at 10:47 -0400, Jes Sorensen wrote:
The only way to guarantee ordering in the above setup, is to either
make writel() fully ordered or add the mmiowb()'s in between the two
writel's. On Altix you have to go and read from the PCI bridge to
ensure all writes to it have been
On Fri, 30 May 2008, Benjamin Herrenschmidt wrote:
On Thu, 2008-05-29 at 10:47 -0400, Jes Sorensen wrote:
Interesting. I've always been taught by ia64 people that mmiowb() was
intended to be used solely between writel() and spin_unlock().
That's what I gathered too, based on what's written in
On Thursday, May 29, 2008 2:40 pm Benjamin Herrenschmidt wrote:
On Thu, 2008-05-29 at 10:47 -0400, Jes Sorensen wrote:
The only way to guarantee ordering in the above setup, is to either
make writel() fully ordered or add the mmiowb()'s in between the two
writel's. On Altix you have to go
The problem is that your two writel's, despite being both issued on
cpu X, due to the spin lock, in your example, can end up with the
first one going through NR 1 and the second one going through NR 2. If
there's contention on NR 1, the write going via NR 2 may hit the PCI
bridge prior
On Thu, 2008-05-29 at 14:48 -0700, Trent Piepho wrote:
I wrote a JTAG over gpio driver for the powerpc MPC8572DS platform. With the
non-raw io accessors, the JTAG clock can run at almost ~9.5 MHz. Using raw
versions (which I had to write since powerpc doesn't have any), the clock
speed
I do -- in all the drivers for on-chip peripherals that are shared
between AT91 ARM (LE) and AVR32 (BE). Since everything goes on inside
the chip, we must use LE accesses on ARM and BE accesses on AVR32.
Currently, this is the only interface I know that can do native-endian
accesses, so if
On Fri, 30 May 2008, Benjamin Herrenschmidt wrote:
On Thu, 2008-05-29 at 14:48 -0700, Trent Piepho wrote:
I wrote a JTAG over gpio driver for the powerpc MPC8572DS platform. With the
non-raw io accessors, the JTAG clock can run at almost ~9.5 MHz. Using raw
versions (which I had to write
Trent Piepho writes:
On Thu, 29 May 2008, Roland Dreier wrote:
The problem is that your two writel's, despite being both issued on
cpu X, due to the spin lock, in your example, can end up with the
first one going through NR 1 and the second one going through NR 2. If
there's
Benjamin Herrenschmidt [EMAIL PROTECTED] wrote:
I'm happy to say that __raw is purely about ordering and make them
byteswap on powerpc tho (ie, make them little endian like the non-raw
counterpart).
That would break a lot of drivers.
How many actually use __raw_ * ?
I do --
On Tue, 27 May 2008, Linus Torvalds wrote:
Here's a UNTESTED patch for x86 that may or may not compile and work, and
which serializes (on a compiler level) the IO accesses against regular
memory accesses.
Ok, so it at least boots on x86-32. Thus probably on x86-64 too (since the
code is
On Tue, 2008-05-27 at 08:50 -0700, Roland Dreier wrote:
Though it's my understanding that at least ia64 does require the
explicit barriers anyway, so we are still in a dodgy situation here
where it's not clear what drivers should do and we end up with
possibly excessive barriers on
Actually, this specifically should not be. The need for mmiowb on altix
is because it explicitly violates some of the PCI rules that would
otherwise impede performance. The compromise is that readX on altix
contains the needed dma flush but there's a variant operator,
readX_relaxed
On Tue, 2008-05-27 at 10:38 -0700, Roland Dreier wrote:
Actually, this specifically should not be. The need for mmiowb on altix
is because it explicitly violates some of the PCI rules that would
otherwise impede performance. The compromise is that readX on altix
contains the needed
Um, OK, you've said write twice now ... I was assuming you meant read.
Even on an x86, writes are posted, so there's no way a spin lock could
serialise a write without an intervening read to flush the posting
(that's why only reads have a relaxed version on altix). Or is there
something
Writes are posted yes, but not reordered arbitrarily.
on standard x86 I mean here...
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev
On Tue, 27 May 2008, Linus Torvalds wrote:
On Tue, 27 May 2008, Benjamin Herrenschmidt wrote:
Yes. As it is today, tg3 for example is potentially broken on all archs
with newer gcc unless we either add memory clobber to readl/writel or
stick some wmb's in there (just a random driver I picked).
Trent Piepho wrote:
Is there an issue with anything _besides_ coherent DMA?
Could one have a special version of the accessors for drivers that
want to assume they are strictly ordered vs coherent DMA memory?
That would be much easier to get right, without slowing _everything_
down.
It's
On Tue, 2008-05-27 at 08:35 -0700, Linus Torvalds wrote:
On Tue, 27 May 2008, Benjamin Herrenschmidt wrote:
Yes. As it is today, tg3 for example is potentially broken on all archs
with newer gcc unless we either add memory clobber to readl/writel or
stick some wmb's in there (just a
On Tue, 2008-05-27 at 08:50 -0700, Roland Dreier wrote:
Though it's my understanding that at least ia64 does require the
explicit barriers anyway, so we are still in a dodgy situation here
where it's not clear what drivers should do and we end up with
possibly excessive barriers on
On Tue, 2008-05-27 at 09:47 -0700, Linus Torvalds wrote:
__read[bwlq]()/__write[bwlq]() are not serialized with a :memory
barrier, although since they still use asm volatile I suspect that in
practice they are probably serial too. Did not look very closely at any
generated code (only
Roland Dreier wrote:
Writes are posted yes, but not reordered arbitrarily. If I have code like:
spin_lock(mmio_lock);
writel(val1, reg1);
writel(val2, reg2);
spin_unlock(mmio_lock);
then I have a reasonable expectation that if two CPUs run this at the
same
Writes are posted yes, but not reordered arbitrarily. If I have code like:
spin_lock(mmio_lock);
writel(val1, reg1);
writel(val2, reg2);
spin_unlock(mmio_lock);
then I have a reasonable expectation that if two CPUs run this at the
same time, their writes to
On Wed, 28 May 2008, Benjamin Herrenschmidt wrote:
On Tue, 2008-05-27 at 08:35 -0700, Linus Torvalds wrote:
Expecting people to fix up all drivers is simply not going to happen. And
serializing things shouldn't be *that* expensive. People who cannot take
the expense can continue to
This is a different issue. We deal with it on powerpc by having writel
set a per-cpu flag and spin_unlock() test it, and do the barrier if
needed there.
Cool... I assume you do this for mutex_unlock() etc?
Is there any reason why ia64 can't do this too so we can kill mmiowb and
save
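The scheme described in this message (writel sets a per-cpu flag, spin_unlock() tests it and does the barrier only if needed) can be modelled in a few lines of user-space C. This is a sketch of the idea only, not the powerpc implementation: the struct, the function names, and the barrier counter are all hypothetical, and the "barrier" is just a counter increment so the behaviour can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU state for the model. */
struct sketch_cpu {
	bool io_sync_pending;	/* set by an MMIO write, tested at unlock */
	unsigned int barriers;	/* how many barriers the model has issued */
};

/* Stand-in for the real I/O barrier; here it only counts and clears. */
static void sketch_io_barrier(struct sketch_cpu *cpu)
{
	cpu->barriers++;
	cpu->io_sync_pending = false;
}

/* Model of writel(): do the store, then note that a barrier is owed. */
static void sketch_writel(struct sketch_cpu *cpu, uint32_t val, uint32_t *reg)
{
	assert(cpu != NULL && reg != NULL);
	*reg = val;
	cpu->io_sync_pending = true;
}

/* Model of spin_unlock(): pay for the barrier only if an MMIO write ran. */
static void sketch_spin_unlock(struct sketch_cpu *cpu)
{
	assert(cpu != NULL);
	if (cpu->io_sync_pending)
		sketch_io_barrier(cpu);
	/* a real implementation would release the lock here */
}
```

In this model, a critical section with several writes costs one barrier at unlock, and a critical section with no MMIO writes costs none.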
So practically speaking, I suspect that the right approach is to just say
ok, x86 will continue to be pretty darn ordered, and the barriers won't
really matter (*) but at the same time also saying we wish reality was
different, and well-maintained drivers should probably try to work in the
re-ordering, even though I doubt it will be visible in practice. So if you
use the __ versions, you'd better have barriers even on x86!
Are we also going to have __ioread*/__iowrite* ?
Also are the semantics of __readl/__writel defined for all architectures -
I used it ages ago in the i2o
On Tue, May 27, 2008 at 10:38:22PM +0100, Alan Cox wrote:
re-ordering, even though I doubt it will be visible in practice. So if you
use the __ versions, you'd better have barriers even on x86!
Are we also going to have __ioread*/__iowrite* ?
Didn't we already define ioread*() to have
On Wed, 28 May 2008, Benjamin Herrenschmidt wrote:
A problem with __raw_ though is that they -also- don't do byteswap,
Well, that's why there is __readl() and __raw_readl(), no?
Neither does ordering, and __raw_readl() doesn't do byte-swap.
Of course, I'm not going to guarantee every
On Tue, 27 May 2008, Alan Cox wrote:
re-ordering, even though I doubt it will be visible in practice. So if you
use the __ versions, you'd better have barriers even on x86!
Are we also going to have __ioread*/__iowrite* ?
I doubt there is any reason to. Let's just keep them very
1 - 100 of 107 matches