Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-15 Thread Nick Piggin

Paul E. McKenney wrote:

On Tue, Aug 14, 2007 at 03:34:25PM +1000, Nick Piggin wrote:



Maybe it is the safe way to go, but it does obscure cases where there
is a real need for barriers.



I prefer burying barriers into other primitives.


When they should naturally be there, eg. locking or the RCU primitives,
I agree.

I don't like having them scattered in various just-in-case places,
because it makes both the users and the APIs hard to understand and
change.

Especially since several big architectures don't have volatile in their
atomic_get and _set, I think it would be a step backwards to add them in
as a just-in-case thing now (unless there is a better reason).



Many atomic operations are allowed to be reordered between CPUs, so
I don't have a good idea for the rationale to order them within the
CPU (also loads and stores to long and ptr types are not ordered like
this, although we do consider those to be atomic operations too).

barrier() in a way is like enforcing sequential memory ordering
between process and interrupt context, whereas volatile is just
enforcing coherency of a single memory location (and as such is
cheaper).



barrier() is useful, but it has the very painful side-effect of forcing
the compiler to dump temporaries.  So we do need something that is
not quite so global in effect.


Yep.



What do you think of this crazy idea?

/* Enforce a compiler barrier for only operations to location X.
* Call multiple times to provide an ordering between multiple
* memory locations. Other memory operations can be assumed by
* the compiler to remain unchanged and may be reordered
*/
#define order(x) asm volatile("" : "+m" (x))



There was something very similar discussed earlier in this thread,
with quite a bit of debate as to exactly what the "m" flag should
look like.  I suggested something similar named ACCESS_ONCE in the
context of RCU (http://lkml.org/lkml/2007/7/11/664):


Oh, I missed that earlier debate. Will go have a look.



#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

The nice thing about this is that it works for both loads and stores.
Not clear that order() above does this -- I get compiler errors when
I try something like b = order(a) or order(a) = 1 using gcc 4.1.2.


As Arnd pointed out, order() is not supposed to be an lvalue, but a
statement like the rest of our memory ordering API.

As to your followup question of why to use it over ACCESS_ONCE. I
guess, aside from consistency with the rest of the barrier APIs, you
can use it in other primitives when you don't actually know what the
caller is going to do or if it even will make an access. You could
also use it between calls to _other_ primitives, etc... it just
seems more flexible to me, but I haven't actually used such a thing
in real code...

ACCESS_ONCE doesn't seem as descriptive. What it results in is the
memory location being loaded or stored (presumably once exactly),
but I think the more general underlying idea is a barrier point.
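
To make that contrast concrete, here is a minimal sketch (an editorial
illustration, not from the original mail; the struct and helper are
hypothetical) of a location-scoped barrier sitting inside a primitive that
does not itself access the location:

#define order(x) asm volatile("" : "+m" (x))

struct ring { int head, tail; };

/*
 * The caller may or may not read r->head after this returns; all that is
 * promised is that any later read will not be satisfied from a value the
 * compiler cached before this call.  ACCESS_ONCE() cannot express that,
 * because it is itself a load or a store.
 */
static inline void ring_sync_head(struct ring *r)
{
	order(r->head);
}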

--
SUSE Labs, Novell Inc.


Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-15 Thread Arnd Bergmann
On Wednesday 15 August 2007, Paul E. McKenney wrote:

 ACCESS_ONCE() is indeed intended to be used when actually loading or
 storing the variable.  That said, I must admit that it is not clear to me
 why you would want to add an extra order() rather than ACCESS_ONCE()ing
 one or both of the adjacent accesses to that same variable.
 
 So, what am I missing?

You're probably right, the only case I can construct is something like

	if (ACCESS_ONCE(x)) {
		...
		ACCESS_ONCE(x)++;
	}

which would be slightly less efficient than

	if (x)
		x++;
	order(x);

because in the first case, you need to do two ordered accesses
but only one in the second case. However, I can't think of a case
where this actually makes a noticeable difference in real life.

Arnd 


Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-15 Thread Michael Buesch
On Wednesday 15 August 2007 15:29:43 Arnd Bergmann wrote:
 On Wednesday 15 August 2007, Paul E. McKenney wrote:
 
  ACCESS_ONCE() is indeed intended to be used when actually loading or
  storing the variable.  That said, I must admit that it is not clear to me
  why you would want to add an extra order() rather than ACCESS_ONCE()ing
  one or both of the adjacent accesses to that same variable.
  
  So, what am I missing?
 
 You're probably right, the only case I can construct is something like
 
   if (ACCESS_ONCE(x)) {
   ...
   ACCESS_ONCE(x)++;
   }
 
 which would be slightly less efficient than
 
   if (x)
   x++;
   order(x);
 
 because in the first case, you need to do two ordered accesses
 but only one in the second case. However, I can't think of a case
 where this actually makes a noticeable difference in real life.

How can this example actually get used in a sane and race-free
way? This would need locking around the whole if
statement. But locking is a barrier.

-- 
Greetings Michael.


Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-15 Thread Segher Boessenkool
Well if there is only one memory location involved, then smp_rmb() isn't
going to really do anything anyway, so it would be incorrect to use it.


rmb() orders *any* two reads; that includes two reads from the same
location.


Consider that smp_rmb basically will do anything from flushing the
pipeline to invalidating loads speculatively executed out of order. AFAIK
it will not control the visibility of stores coming from other CPUs (that
is up to the cache coherency).


The writer side should typically use wmb() whenever the reader side
uses rmb(), sure.


Segher



Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-15 Thread Paul E. McKenney
On Wed, Aug 15, 2007 at 08:51:58PM +0200, Segher Boessenkool wrote:
 Well if there is only one memory location involved, then smp_rmb() 
 isn't
 going to really do anything anyway, so it would be incorrect to use it.
 
 rmb() orders *any* two reads; that includes two reads from the same
 location.

If the two reads are to the same location, all CPUs I am aware of
will maintain the ordering without need for a memory barrier.

Thanx, Paul

 Consider that smp_rmb basically will do anything from flushing the
 pipeline to invalidating loads speculatively executed out of order. 
 AFAIK
 it will not control the visibility of stores coming from other CPUs 
 (that
 is up to the cache coherency).
 
 The writer side should typically use wmb() whenever the reader side
 uses rmb(), sure.
 
 
 Segher
 


Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-15 Thread Segher Boessenkool

Well if there is only one memory location involved, then smp_rmb() isn't
going to really do anything anyway, so it would be incorrect to use it.


rmb() orders *any* two reads; that includes two reads from the same
location.


If the two reads are to the same location, all CPUs I am aware of
will maintain the ordering without need for a memory barrier.


That's true of course, although there is no real guarantee for that.


Segher



Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-15 Thread Segher Boessenkool

Well if there is only one memory location involved, then smp_rmb() isn't
going to really do anything anyway, so it would be incorrect to use it.


rmb() orders *any* two reads; that includes two reads from the same
location.


If the two reads are to the same location, all CPUs I am aware of
will maintain the ordering without need for a memory barrier.


That's true of course, although there is no real guarantee for that.


A CPU that did not provide this property (cache coherence) would be
most emphatically reviled.


That doesn't have anything to do with coherency as far as I can see.

It's just about the order in which a CPU (speculatively) performs the loads
(which isn't necessarily the same as the order in which it executes the
corresponding instructions, even).


So we are pretty safe assuming that CPUs
will provide it.


Yeah, pretty safe.  I just don't like undocumented assumptions :-)


Segher



Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-15 Thread Paul E. McKenney
On Wed, Aug 15, 2007 at 11:30:05PM +1000, Nick Piggin wrote:
 Paul E. McKenney wrote:
 On Tue, Aug 14, 2007 at 03:34:25PM +1000, Nick Piggin wrote:
 
 Maybe it is the safe way to go, but it does obscure cases where there
 is a real need for barriers.
 
 
 I prefer burying barriers into other primitives.
 
 When they should naturally be there, eg. locking or the RCU primitives,
 I agree.
 
 I don't like having them scattered in various just-in-case places,
 because it makes both the users and the APIs hard to understand and
 change.

I certainly agree that we shouldn't do volatile just for the fun of it,
and also that use of volatile should be quite rare.

 Especially since several big architectures don't have volatile in their
 atomic_get and _set, I think it would be a step backwards to add them in
 as a just-in-case thing now (unless there is a better reason).

Good point, except that I would expect gcc's optimization to continue
to improve.  I would like the kernel to be able to take advantage of
improved optimization, which means that we are going to have to make
a few changes.  Adding volatile to atomic_get() and atomic_set() is
IMHO one of those changes.

 Many atomic operations are allowed to be reordered between CPUs, so
 I don't have a good idea for the rationale to order them within the
 CPU (also loads and stores to long and ptr types are not ordered like
 this, although we do consider those to be atomic operations too).
 
 barrier() in a way is like enforcing sequential memory ordering
 between process and interrupt context, whereas volatile is just
 enforcing coherency of a single memory location (and as such is
 cheaper).
 
 
 barrier() is useful, but it has the very painful side-effect of forcing
 the compiler to dump temporaries.  So we do need something that is
 not quite so global in effect.
 
 Yep.
 
 What do you think of this crazy idea?
 
 /* Enforce a compiler barrier for only operations to location X.
 * Call multiple times to provide an ordering between multiple
 * memory locations. Other memory operations can be assumed by
 * the compiler to remain unchanged and may be reordered
 */
 #define order(x) asm volatile("" : "+m" (x))
 
 There was something very similar discussed earlier in this thread,
 with quite a bit of debate as to exactly what the "m" flag should
 look like.  I suggested something similar named ACCESS_ONCE in the
 context of RCU (http://lkml.org/lkml/2007/7/11/664):
 
 Oh, I missed that earlier debate. Will go have a look.
 
  #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
 
 The nice thing about this is that it works for both loads and stores.
 Not clear that order() above does this -- I get compiler errors when
 I try something like b = order(a) or order(a) = 1 using gcc 4.1.2.
 
 As Arnd pointed out, order() is not supposed to be an lvalue, but a
 statement like the rest of our memory ordering API.
 
 As to your followup question of why to use it over ACCESS_ONCE. I
 guess, aside from consistency with the rest of the barrier APIs, you
 can use it in other primitives when you don't actually know what the
 caller is going to do or if it even will make an access. You could
 also use it between calls to _other_ primitives, etc... it just
 seems more flexible to me, but I haven't actually used such a thing
 in real code...
 
 ACCESS_ONCE doesn't seem as descriptive. What it results in is the
 memory location being loaded or stored (presumably once exactly),
 but I think the more general underlying idea is a barrier point.

OK, first, I am not arguing that ACCESS_ONCE() can replace all current
uses of barrier().  Similarly, rcu_dereference(), rcu_assign_pointer(),
and the various RCU versions of the list-manipulation API in no way
replaced all uses of explicit memory barriers.  However, I do believe that
these API are much easier to use (where they apply, of course) than were
the earlier idioms involving explicit memory barriers.

But we of course need to keep the explicit memory and compiler barriers
for other situations.

Thanx, Paul


Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-15 Thread Paul E. McKenney
On Wed, Aug 15, 2007 at 09:46:55PM +0200, Segher Boessenkool wrote:
 Well if there is only one memory location involved, then smp_rmb()
 isn't
 going to really do anything anyway, so it would be incorrect to use 
 it.
 
 rmb() orders *any* two reads; that includes two reads from the same
 location.
 
 If the two reads are to the same location, all CPUs I am aware of
 will maintain the ordering without need for a memory barrier.
 
 That's true of course, although there is no real guarantee for that.

A CPU that did not provide this property (cache coherence) would be
most emphatically reviled.  So we are pretty safe assuming that CPUs
will provide it.

Thanx, Paul


Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-15 Thread Paul E. McKenney
On Wed, Aug 15, 2007 at 10:13:49PM +0200, Segher Boessenkool wrote:
 Well if there is only one memory location involved, then smp_rmb()
 isn't
 going to really do anything anyway, so it would be incorrect to use
 it.
 
 rmb() orders *any* two reads; that includes two reads from the same
 location.
 
 If the two reads are to the same location, all CPUs I am aware of
 will maintain the ordering without need for a memory barrier.
 
 That's true of course, although there is no real guarantee for that.
 
 A CPU that did not provide this property (cache coherence) would be
 most emphatically reviled.
 
 That doesn't have anything to do with coherency as far as I can see.
 
 It's just about the order in which a CPU (speculatively) performs the 
 loads
 (which isn't necessarily the same as the order in which it executes the
 corresponding instructions, even).

Please check the definition of cache coherence.

Summary: the CPU is indeed within its rights to execute loads and stores
to a single variable out of order, -but- only if it gets the same result
that it would have obtained by executing them in order.  Which means that
any reordering of accesses by a single CPU to a single variable will be
invisible to the software.
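
A litmus-style sketch of the property being claimed (editorial illustration,
not from the thread; it assumes the ACCESS_ONCE() definition discussed
earlier):

#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

int x;	/* initially 0 */

void reader(int *r1, int *r2)
{
	*r1 = ACCESS_ONCE(x);	/* first read  */
	*r2 = ACCESS_ONCE(x);	/* second read */
}

void writer(void)
{
	ACCESS_ONCE(x) = 1;
}

/*
 * With reader() and writer() running on different CPUs, cache coherence
 * forbids the outcome *r1 == 1 && *r2 == 0: two reads of the same
 * location by one CPU never appear to have executed out of order.
 */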

 So we are pretty safe assuming that CPUs
 will provide it.
 
 Yeah, pretty safe.  I just don't like undocumented assumptions :-)

Can't help you there!  ;-)

Thanx, Paul


Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-15 Thread Segher Boessenkool

Please check the definition of cache coherence.


Which of the twelve thousand such definitions?  :-)

Summary: the CPU is indeed within its rights to execute loads and stores
to a single variable out of order, -but- only if it gets the same result
that it would have obtained by executing them in order.  Which means that
any reordering of accesses by a single CPU to a single variable will be
invisible to the software.


I'm still not sure if that applies to all architectures.
Doesn't matter anyway, let's kill this thread :-)


Segher



Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-15 Thread Nick Piggin

Segher Boessenkool wrote:

Please check the definition of cache coherence.



Which of the twelve thousand such definitions?  :-)


Every definition I have seen says that writes to a single memory
location have a serial order as seen by all CPUs, and that a read
will return the most recent write in the sequence (with a bit of
extra mumbo jumbo to cover store queues and store forwarding).

Within such a definition, I don't see how it would be allowed for a
single CPU to execute reads out of order and have the second read
return an earlier write than the first read.

--
SUSE Labs, Novell Inc.


Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-15 Thread Nick Piggin

Paul E. McKenney wrote:

On Wed, Aug 15, 2007 at 11:30:05PM +1000, Nick Piggin wrote:



Especially since several big architectures don't have volatile in their
atomic_get and _set, I think it would be a step backwards to add them in
as a just-in-case thing now (unless there is a better reason).



Good point, except that I would expect gcc's optimization to continue
to improve.  I would like the kernel to be able to take advantage of
improved optimization, which means that we are going to have to make
a few changes.  Adding volatile to atomic_get() and atomic_set() is
IMHO one of those changes.


What optimisations? gcc already does most of the things you need a
barrier/volatile for, like reordering non-dependent loads and stores,
and eliminating mem ops completely by caching in registers.
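
As an editorial aside, a small hypothetical example of the optimisations
being referred to:

int shared, a, b;

void demo(void)
{
	/* Without volatile or a barrier, the compiler may treat the first
	 * store as dead and emit only the second one. */
	shared = 1;
	shared = 2;

	/* These stores are independent, so the compiler is free to emit
	 * them in either order. */
	a = 1;
	b = 2;
}

int spin(void)
{
	/* The load of 'shared' may be hoisted out of the loop and kept in
	 * a register, turning this into an infinite loop. */
	while (shared != 3)
		;
	return shared;
}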



As to your followup question of why to use it over ACCESS_ONCE. I
guess, aside from consistency with the rest of the barrier APIs, you
can use it in other primitives when you don't actually know what the
caller is going to do or if it even will make an access. You could
also use it between calls to _other_ primitives, etc... it just
seems more flexible to me, but I haven't actually used such a thing
in real code...

ACCESS_ONCE doesn't seem as descriptive. What it results in is the
memory location being loaded or stored (presumably once exactly),
but I think the more general underlying idea is a barrier point.



OK, first, I am not arguing that ACCESS_ONCE() can replace all current
uses of barrier().


OK. Well I also wasn't saying that ACCESS_ONCE should not be
implemented. But if we want something like it, then it would make
sense to have an equivalent barrier statement as well (ie. order()).

--
SUSE Labs, Novell Inc.


Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-15 Thread Paul E. McKenney
On Thu, Aug 16, 2007 at 11:09:25AM +1000, Nick Piggin wrote:
 Paul E. McKenney wrote:
 On Wed, Aug 15, 2007 at 11:30:05PM +1000, Nick Piggin wrote:
 
 Especially since several big architectures don't have volatile in their
 atomic_get and _set, I think it would be a step backwards to add them in
 as a just-in-case thing now (unless there is a better reason).
 
 Good point, except that I would expect gcc's optimization to continue
 to improve.  I would like the kernel to be able to take advantage of
 improved optimization, which means that we are going to have to make
 a few changes.  Adding volatile to atomic_get() and atomic_set() is
 IMHO one of those changes.
 
 What optimisations? gcc already does most of the things you need a
 barrier/volatile for, like reordering non-dependent loads and stores,
 and eliminating mem ops completely by caching in registers.

Yep, most of the currently practiced optimizations.  Given that CPU clock
frequencies are not rising as quickly as they once did, I would expect
that there will be added effort on implementing the known more-aggressive
optimization techniques, and on coming up with new ones as well.

Some of these new optimizations will likely be inappropriate for kernel
code, but I expect that some will be things that we want.

 As to your followup question of why to use it over ACCESS_ONCE. I
 guess, aside from consistency with the rest of the barrier APIs, you
 can use it in other primitives when you don't actually know what the
 caller is going to do or if it even will make an access. You could
 also use it between calls to _other_ primitives, etc... it just
 seems more flexible to me, but I haven't actually used such a thing
 in real code...
 
 ACCESS_ONCE doesn't seem as descriptive. What it results in is the
 memory location being loaded or stored (presumably once exactly),
 but I think the more general underlying idea is a barrier point.
 
 OK, first, I am not arguing that ACCESS_ONCE() can replace all current
 uses of barrier().
 
 OK. Well I also wasn't saying that ACCESS_ONCE should not be
 implemented. But if we want something like it, then it would make
 sense to have an equivalent barrier statement as well (ie. order()).

And I am not arguing against use of asms to implement the volatility
in atomic_read() and atomic_set(), and in fact it appears that one
of the architectures will be taking this approach.

Sounds like we might be in violent agreement.  ;-)

Thanx, Paul


Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-14 Thread Herbert Xu
On Tue, Aug 14, 2007 at 03:34:25PM +1000, Nick Piggin wrote:

 What do you think of this crazy idea?
 
 /* Enforce a compiler barrier for only operations to location X.
  * Call multiple times to provide an ordering between multiple
  * memory locations. Other memory operations can be assumed by
  * the compiler to remain unchanged and may be reordered
  */
 #define order(x) asm volatile("" : "+m" (x))

Yes this is a very good idea.  This also makes it explicit
that the coder is depending on this rather than the more
vague connotations of atomic_read.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-14 Thread Paul E. McKenney
On Tue, Aug 14, 2007 at 03:34:25PM +1000, Nick Piggin wrote:
 Paul E. McKenney wrote:
 On Mon, Aug 13, 2007 at 01:15:52PM +0800, Herbert Xu wrote:
 
 Paul E. McKenney [EMAIL PROTECTED] wrote:
 
 On Sat, Aug 11, 2007 at 08:54:46AM +0800, Herbert Xu wrote:
 
 Chris Snook [EMAIL PROTECTED] wrote:
 
 cpu_relax() contains a barrier, so it should do the right thing.  For 
 non-smp architectures, I'm concerned about interacting with interrupt 
 handlers.  Some drivers do use atomic_* operations.
 
 What problems with interrupt handlers? Access to int/long must
 be atomic or we're in big trouble anyway.
 
 Reordering due to compiler optimizations.  CPU reordering does not
 affect interactions with interrupt handlers on a given CPU, but
 reordering due to compiler code-movement optimization does.  Since
 volatile can in some cases suppress code-movement optimizations,
 it can affect interactions with interrupt handlers.
 
 If such reordering matters, then you should use one of the
 *mb macros or barrier() rather than relying on possibly
 hidden volatile cast.
 
 
 If communicating among CPUs, sure.  However, when communicating between
 mainline and interrupt/NMI handlers on the same CPU, the barrier() and
 most especially the *mb() macros are gross overkill.  So there really
 truly is a place for volatile -- not a large place, to be sure, but a
 place nonetheless.
 
 I really would like all volatile users to go away and be replaced
 by explicit barriers. It makes things nicer and more explicit... for
 atomic_t type there probably aren't many optimisations that can be
 made which volatile would disallow (in actual kernel code), but for
 others (eg. bitops, maybe atomic ops in UP kernels), there would be.
 
 Maybe it is the safe way to go, but it does obscure cases where there
 is a real need for barriers.

I prefer burying barriers into other primitives.

 Many atomic operations are allowed to be reordered between CPUs, so
 I don't have a good idea for the rationale to order them within the
 CPU (also loads and stores to long and ptr types are not ordered like
 this, although we do consider those to be atomic operations too).
 
 barrier() in a way is like enforcing sequential memory ordering
 between process and interrupt context, whereas volatile is just
 enforcing coherency of a single memory location (and as such is
 cheaper).

barrier() is useful, but it has the very painful side-effect of forcing
the compiler to dump temporaries.  So we do need something that is
not quite so global in effect.
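
A hedged sketch of the cost being described (the function and struct are
hypothetical; order() is the location-scoped barrier proposed below):

#define order(x) asm volatile("" : "+m" (x))

struct acct { long x, y; };
int flag;

long poll_and_sum(struct acct *a)
{
	long t = a->x + a->y;	/* temporaries the compiler may keep in registers */

	while (!flag)
		order(flag);	/* forces only 'flag' to be re-read each iteration */

	/*
	 * Because order() clobbers nothing but 'flag', the values of a->x and
	 * a->y loaded above may be reused here.  A barrier() in the loop would
	 * force them to be spilled and re-loaded as well.
	 */
	return t + a->x + a->y;
}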

 What do you think of this crazy idea?
 
 /* Enforce a compiler barrier for only operations to location X.
  * Call multiple times to provide an ordering between multiple
  * memory locations. Other memory operations can be assumed by
  * the compiler to remain unchanged and may be reordered
  */
 #define order(x) asm volatile("" : "+m" (x))

There was something very similar discussed earlier in this thread,
with quite a bit of debate as to exactly what the "m" flag should
look like.  I suggested something similar named ACCESS_ONCE in the
context of RCU (http://lkml.org/lkml/2007/7/11/664):

#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

The nice thing about this is that it works for both loads and stores.
Not clear that order() above does this -- I get compiler errors when
I try something like b = order(a) or order(a) = 1 using gcc 4.1.2.
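
For reference, a small editorial sketch (not from the original mail) of why
those forms differ: ACCESS_ONCE() is an expression and so usable as an
lvalue, while order() expands to an asm statement:

#define ACCESS_ONCE(x)	(*(volatile typeof(x) *)&(x))
#define order(x)	asm volatile("" : "+m" (x))

int a, b;

void example(void)
{
	b = ACCESS_ONCE(a);	/* volatile load  */
	ACCESS_ONCE(a) = 1;	/* volatile store */

	order(a);		/* fine: a statement, like barrier()          */
	/* b = order(a);	   does not compile: it is not an expression */
}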

Thanx, Paul


Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-14 Thread Arnd Bergmann
On Tuesday 14 August 2007, Paul E. McKenney wrote:
  #define order(x) asm volatile("" : "+m" (x))
 
 There was something very similar discussed earlier in this thread,
 with quite a bit of debate as to exactly what the "m" flag should
 look like.  I suggested something similar named ACCESS_ONCE in the
 context of RCU (http://lkml.org/lkml/2007/7/11/664):
 
 #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
 
 The nice thing about this is that it works for both loads and stores.
 Not clear that order() above does this -- I get compiler errors when
 I try something like b = order(a) or order(a) = 1 using gcc 4.1.2.

Well, it serves a different purpose: While your ACCESS_ONCE() macro is
an lvalue, the order() macro is a statement that can be used in place
of the barrier() macro. order() is the most lightweight barrier as it
only enforces ordering on a single variable in the compiler, but does
not have any side-effects visible to other threads, like the cache
line access in ACCESS_ONCE has.

Arnd 


Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-14 Thread Paul E. McKenney
On Wed, Aug 15, 2007 at 12:01:54AM +0200, Arnd Bergmann wrote:
 On Tuesday 14 August 2007, Paul E. McKenney wrote:
   #define order(x) asm volatile("" : "+m" (x))
  
  There was something very similar discussed earlier in this thread,
  with quite a bit of debate as to exactly what the "m" flag should
  look like.  I suggested something similar named ACCESS_ONCE in the
  context of RCU (http://lkml.org/lkml/2007/7/11/664):
  
  #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
  
  The nice thing about this is that it works for both loads and stores.
  Not clear that order() above does this -- I get compiler errors when
  I try something like b = order(a) or order(a) = 1 using gcc 4.1.2.
 
 Well, it serves a different purpose: While your ACCESS_ONCE() macro is
 an lvalue, the order() macro is a statement that can be used in place
 of the barrier() macro. order() is the most lightweight barrier as it
 only enforces ordering on a single variable in the compiler, but does
 not have any side-effects visible to other threads, like the cache
 line access in ACCESS_ONCE has.

ACCESS_ONCE() is indeed intended to be used when actually loading or
storing the variable.  That said, I must admit that it is not clear to me
why you would want to add an extra order() rather than ACCESS_ONCE()ing
one or both of the adjacent accesses to that same variable.

So, what am I missing?

Thanx, Paul


Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-13 Thread Paul E. McKenney
On Mon, Aug 13, 2007 at 01:15:52PM +0800, Herbert Xu wrote:
 Paul E. McKenney [EMAIL PROTECTED] wrote:
  On Sat, Aug 11, 2007 at 08:54:46AM +0800, Herbert Xu wrote:
  Chris Snook [EMAIL PROTECTED] wrote:
   
   cpu_relax() contains a barrier, so it should do the right thing.  For 
   non-smp architectures, I'm concerned about interacting with interrupt 
   handlers.  Some drivers do use atomic_* operations.
  
  What problems with interrupt handlers? Access to int/long must
  be atomic or we're in big trouble anyway.
  
  Reordering due to compiler optimizations.  CPU reordering does not
  affect interactions with interrupt handlers on a given CPU, but
  reordering due to compiler code-movement optimization does.  Since
  volatile can in some cases suppress code-movement optimizations,
  it can affect interactions with interrupt handlers.
 
 If such reordering matters, then you should use one of the
 *mb macros or barrier() rather than relying on possibly
 hidden volatile cast.

If communicating among CPUs, sure.  However, when communicating between
mainline and interrupt/NMI handlers on the same CPU, the barrier() and
most especially the *mb() macros are gross overkill.  So there really
truly is a place for volatile -- not a large place, to be sure, but a
place nonetheless.
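
A minimal sketch of the same-CPU case being described (editorial
illustration; the flag and handler names are hypothetical):

/*
 * Shared only between mainline code and an interrupt handler running on
 * the same CPU, so no smp_*mb() is needed; the only requirement is that
 * the compiler actually re-reads the flag on each pass.
 */
static volatile int work_done;		/* or use a volatile cast at each read */

static void note_work_done(void)	/* called from the interrupt handler */
{
	work_done = 1;
}

static void wait_for_irq(void)
{
	while (!work_done)
		cpu_relax();
}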

Thanx, Paul


Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-13 Thread Chris Snook

David Howells wrote:

Chris Snook [EMAIL PROTECTED] wrote:


cpu_relax() contains a barrier, so it should do the right thing.  For non-smp
architectures, I'm concerned about interacting with interrupt handlers.  Some
drivers do use atomic_* operations.


I'm not sure that actually answers my question.  Why not smp_rmb()?

David


I would assume because we want to waste time efficiently even on non-smp 
architectures, rather than frying the CPU or draining the battery.  Certain 
looping execution patterns can cause the CPU to operate above thermal design 
power.  I have fans on my workstation that only ever come on when running 
LINPACK, and that's generally memory bandwidth-bound.  Just imagine what happens 
when you're executing the same few non-serializing instructions in a tight loop 
without ever stalling on memory fetches, or being scheduled out.


If there's another reason, I'd like to hear it too, because I'm just guessing 
here.

-- Chris


Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-13 Thread Nick Piggin

Paul E. McKenney wrote:

On Mon, Aug 13, 2007 at 01:15:52PM +0800, Herbert Xu wrote:


Paul E. McKenney [EMAIL PROTECTED] wrote:


On Sat, Aug 11, 2007 at 08:54:46AM +0800, Herbert Xu wrote:


Chris Snook [EMAIL PROTECTED] wrote:

cpu_relax() contains a barrier, so it should do the right thing.  For 
non-smp architectures, I'm concerned about interacting with interrupt 
handlers.  Some drivers do use atomic_* operations.


What problems with interrupt handlers? Access to int/long must
be atomic or we're in big trouble anyway.


Reordering due to compiler optimizations.  CPU reordering does not
affect interactions with interrupt handlers on a given CPU, but
reordering due to compiler code-movement optimization does.  Since
volatile can in some cases suppress code-movement optimizations,
it can affect interactions with interrupt handlers.


If such reordering matters, then you should use one of the
*mb macros or barrier() rather than relying on possibly
hidden volatile cast.



If communicating among CPUs, sure.  However, when communicating between
mainline and interrupt/NMI handlers on the same CPU, the barrier() and
most especially the *mb() macros are gross overkill.  So there really
truly is a place for volatile -- not a large place, to be sure, but a
place nonetheless.


I really would like all volatile users to go away and be replaced
by explicit barriers. It makes things nicer and more explicit... for
atomic_t type there probably aren't many optimisations that can be
made which volatile would disallow (in actual kernel code), but for
others (eg. bitops, maybe atomic ops in UP kernels), there would be.

Maybe it is the safe way to go, but it does obscure cases where there
is a real need for barriers.

Many atomic operations are allowed to be reordered between CPUs, so
I don't have a good idea for the rationale to order them within the
CPU (also loads and stores to long and ptr types are not ordered like
this, although we do consider those to be atomic operations too).

barrier() in a way is like enforcing sequential memory ordering
between process and interrupt context, whereas volatile is just
enforcing coherency of a single memory location (and as such is
cheaper).

What do you think of this crazy idea?

/* Enforce a compiler barrier for only operations to location X.
 * Call multiple times to provide an ordering between multiple
 * memory locations. Other memory operations can be assumed by
 * the compiler to remain unchanged and may be reordered
 */
#define order(x) asm volatile("" : "+m" (x))
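
As an editorial illustration of the "call multiple times" usage described in
that comment (a hypothetical publish/flag pair, using the order() macro
defined above; this only constrains the compiler, so cross-CPU publication
would still want smp_wmb()/smp_rmb() or the usual primitives):

static int data, ready;

static void publish(int value)
{
	data = value;
	order(data);	/* the store to 'data' cannot be moved below this point */
	order(ready);	/* the store to 'ready' cannot be moved above this point */
	ready = 1;
}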

--
SUSE Labs, Novell Inc.


Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-13 Thread Nick Piggin

Chris Snook wrote:

David Howells wrote:


Chris Snook [EMAIL PROTECTED] wrote:

cpu_relax() contains a barrier, so it should do the right thing.  For non-smp
architectures, I'm concerned about interacting with interrupt handlers.  Some
drivers do use atomic_* operations.



I'm not sure that actually answers my question.  Why not smp_rmb()?

David



I would assume because we want to waste time efficiently even on non-smp 
architectures, rather than frying the CPU or draining the battery.  
Certain looping execution patterns can cause the CPU to operate above 
thermal design power.  I have fans on my workstation that only ever come 
on when running LINPACK, and that's generally memory bandwidth-bound.  
Just imagine what happens when you're executing the same few 
non-serializing instructions in a tight loop without ever stalling on 
memory fetches, or being scheduled out.


If there's another reason, I'd like to hear it too, because I'm just 
guessing here.


Well if there is only one memory location involved, then smp_rmb() isn't
going to really do anything anyway, so it would be incorrect to use it.

Consider that smp_rmb basically will do anything from flushing the
pipeline to invalidating loads speculatively executed out of order. AFAIK
it will not control the visibility of stores coming from other CPUs (that
is up to the cache coherency).
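
For contrast, a hedged sketch of where smp_rmb()/smp_wmb() do earn their
keep, namely ordering accesses to two different locations (hypothetical
producer/consumer, not from the thread):

static int payload;
static int ready;

static void producer(int v)
{
	payload = v;
	smp_wmb();		/* order the payload store before the flag store */
	ready = 1;
}

static int consumer(void)
{
	while (!ready)
		cpu_relax();	/* its barrier keeps 'ready' being re-read */
	smp_rmb();		/* order the flag read before the payload read */
	return payload;
}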

--
SUSE Labs, Novell Inc.


Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-12 Thread Herbert Xu
Paul E. McKenney [EMAIL PROTECTED] wrote:
 On Sat, Aug 11, 2007 at 08:54:46AM +0800, Herbert Xu wrote:
 Chris Snook [EMAIL PROTECTED] wrote:
  
  cpu_relax() contains a barrier, so it should do the right thing.  For 
  non-smp architectures, I'm concerned about interacting with interrupt 
  handlers.  Some drivers do use atomic_* operations.
 
 What problems with interrupt handlers? Access to int/long must
 be atomic or we're in big trouble anyway.
 
 Reordering due to compiler optimizations.  CPU reordering does not
 affect interactions with interrupt handlers on a given CPU, but
 reordering due to compiler code-movement optimization does.  Since
 volatile can in some cases suppress code-movement optimizations,
 it can affect interactions with interrupt handlers.

If such reordering matters, then you should use one of the
*mb macros or barrier() rather than relying on possibly
hidden volatile cast.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-11 Thread David Howells
Chris Snook [EMAIL PROTECTED] wrote:

 cpu_relax() contains a barrier, so it should do the right thing.  For non-smp
 architectures, I'm concerned about interacting with interrupt handlers.  Some
 drivers do use atomic_* operations.

I'm not sure that actually answers my question.  Why not smp_rmb()?

David


Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-10 Thread David Howells
Chris Snook [EMAIL PROTECTED] wrote:

 To head off the criticism, I admit this is an oversimplification, and true
 busy-waiters should be using cpu_relax(), which contains a barrier.

Why would you want to use cpu_relax()?  That's there to waste time efficiently,
isn't it?  Shouldn't you be using smp_rmb() or something like that?

David


Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-10 Thread Chris Snook

David Howells wrote:

Chris Snook [EMAIL PROTECTED] wrote:


To head off the criticism, I admit this is an oversimplification, and true
busy-waiters should be using cpu_relax(), which contains a barrier.


Why would you want to use cpu_relax()?  That's there to waste time efficiently,
isn't it?  Shouldn't you be using smp_rmb() or something like that?

David


cpu_relax() contains a barrier, so it should do the right thing.  For 
non-smp architectures, I'm concerned about interacting with interrupt 
handlers.  Some drivers do use atomic_* operations.


-- Chris


Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-10 Thread Herbert Xu
Chris Snook [EMAIL PROTECTED] wrote:
 
 cpu_relax() contains a barrier, so it should do the right thing.  For 
 non-smp architectures, I'm concerned about interacting with interrupt 
 handlers.  Some drivers do use atomic_* operations.

What problems with interrupt handlers? Access to int/long must
be atomic or we're in big trouble anyway.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-10 Thread Paul E. McKenney
On Sat, Aug 11, 2007 at 08:54:46AM +0800, Herbert Xu wrote:
 Chris Snook [EMAIL PROTECTED] wrote:
  
  cpu_relax() contains a barrier, so it should do the right thing.  For 
  non-smp architectures, I'm concerned about interacting with interrupt 
  handlers.  Some drivers do use atomic_* operations.
 
 What problems with interrupt handlers? Access to int/long must
 be atomic or we're in big trouble anyway.

Reordering due to compiler optimizations.  CPU reordering does not
affect interactions with interrupt handlers on a given CPU, but
reordering due to compiler code-movement optimization does.  Since
volatile can in some cases suppress code-movement optimizations,
it can affect interactions with interrupt handlers.

Thanx, Paul


[PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-09 Thread Chris Snook
From: Chris Snook [EMAIL PROTECTED]

Make atomic_read() volatile on frv.  This ensures that busy-waiting
for an interrupt handler to change an atomic_t won't get compiled to an
infinite loop, consistent with SMP architectures.

Signed-off-by: Chris Snook [EMAIL PROTECTED]

--- linux-2.6.23-rc2-orig/include/asm-frv/atomic.h  2007-07-08 19:32:17.0 -0400
+++ linux-2.6.23-rc2/include/asm-frv/atomic.h   2007-08-09 06:41:48.0 -0400
@@ -40,7 +40,12 @@ typedef struct {
 } atomic_t;
 
 #define ATOMIC_INIT(i) { (i) }
-#define atomic_read(v) ((v)->counter)
+
+/*
+ * Casting to volatile here minimizes the need for barriers,
+ * without having to declare the type itself as volatile.
+ */
+#define atomic_read(v) (*(volatile int *)&(v)->counter)
 #define atomic_set(v, i)   (((v)->counter) = (i))
 
 #ifndef CONFIG_FRV_OUTOFLINE_ATOMIC_OPS
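
For context, a hedged sketch (not part of the patch; the counter and the
function are hypothetical) of the busy-wait pattern the changelog has in mind:

static atomic_t done = ATOMIC_INIT(0);

static void busy_wait_for_irq(void)
{
	/*
	 * Naive loop with no cpu_relax()/barrier in the body: with the old
	 * non-volatile atomic_read() the compiler could hoist the load and
	 * spin forever; the volatile cast makes it re-read v->counter on
	 * every pass.  (As noted elsewhere in the thread, a true busy-waiter
	 * should use cpu_relax(), whose barrier also prevents the hoisting.)
	 */
	while (atomic_read(&done) == 0)
		;
}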


Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-09 Thread Chris Snook

Chris Snook wrote:

From: Chris Snook [EMAIL PROTECTED]

Make atomic_read() volatile on frv.  This ensures that busy-waiting
for an interrupt handler to change an atomic_t won't get compiled to an
infinite loop, consistent with SMP architectures.


To head off the criticism, I admit this is an oversimplification, and true 
busy-waiters should be using cpu_relax(), which contains a barrier.  As 
discussed in recent threads, there are other cases which can be optimized by 
removing the need for a barrier, and having behavior consistent with 
architectures where the benefit is more profound is also valuable.


-- Chris