Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()

2007-08-24 Thread Denys Vlasenko
On Thursday 16 August 2007 01:39, Satyam Sharma wrote:

  static inline void wait_for_init_deassert(atomic_t *deassert)
  {
 - while (!atomic_read(deassert));
 + while (!atomic_read(deassert))
 + cpu_relax();
   return;
  }

For less-than-briliant people like me, it's totally non-obvious that
cpu_relax() is needed for correctness here, not just to make P4 happy.

IOW: atomic_read name quite unambiguously means I will read
this variable from main memory. Which is not true and creates
potential for confusion and bugs.
--
vda
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()

2007-08-24 Thread Kenn Humborg
 On Thursday 16 August 2007 01:39, Satyam Sharma wrote:
 
   static inline void wait_for_init_deassert(atomic_t *deassert)
   {
  -   while (!atomic_read(deassert));
  +   while (!atomic_read(deassert))
  +   cpu_relax();
  return;
   }
 
 For less-than-briliant people like me, it's totally non-obvious that
 cpu_relax() is needed for correctness here, not just to make P4 happy.
 
 IOW: atomic_read name quite unambiguously means I will read
 this variable from main memory. Which is not true and creates
 potential for confusion and bugs.

To me, atomic_read means a read which is synchronized with other 
changes to the variable (using the atomic_XXX functions) in such 
a way that I will always only see the before or after
state of the variable - never an intermediate state while a 
modification is happening.  It doesn't imply that I have to 
see the after state immediately after another thread modifies
it.

Perhaps the Linux atomic_XXX functions work like that, or used
to work like that, but it's counter-intuitive to me that atomic
should imply a memory read.

Later,
Kenn

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()

2007-08-24 Thread Andi Kleen
On Friday 24 August 2007 13:59:32 Denys Vlasenko wrote:
 On Thursday 16 August 2007 01:39, Satyam Sharma wrote:
 
   static inline void wait_for_init_deassert(atomic_t *deassert)
   {
  -   while (!atomic_read(deassert));
  +   while (!atomic_read(deassert))
  +   cpu_relax();
  return;
   }
 
 For less-than-briliant people like me, it's totally non-obvious that
 cpu_relax() is needed for correctness here, not just to make P4 happy.

I find it also non obvious. It would be really better to have a barrier
or equivalent (volatile or variable clobber) in the atomic_read()
 
 IOW: atomic_read name quite unambiguously means I will read
 this variable from main memory. Which is not true and creates
 potential for confusion and bugs.

Agreed.

-Andi
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()

2007-08-24 Thread Satyam Sharma
Hi Denys,


On Fri, 24 Aug 2007, Denys Vlasenko wrote:

 On Thursday 16 August 2007 01:39, Satyam Sharma wrote:
 
   static inline void wait_for_init_deassert(atomic_t *deassert)
   {
  -   while (!atomic_read(deassert));
  +   while (!atomic_read(deassert))
  +   cpu_relax();
  return;
   }
 
 For less-than-briliant people like me, it's totally non-obvious that
 cpu_relax() is needed for correctness here, not just to make P4 happy.

This thread has been round-and-round with exactly the same discussions
:-) I had proposed few such variants to make a compiler barrier implicit
in atomic_{read,set} myself, but frankly, at least personally speaking
(now that I know better), I'm not so much in favour of implicit barriers
(compiler, memory or both) in atomic_{read,set}.

This might sound like an about-turn if you read my own postings to Nick
Piggin from a week back, but I do agree with most his opinions on the
matter now -- separation of barriers from atomic ops is actually good,
beneficial to certain code that knows what it's doing, explicit usage
of barriers stands out more clearly (most people here who deal with it
do know cpu_relax() is an explicit compiler barrier) compared to an
implicit usage in an atomic_read() or such variant ...


 IOW: atomic_read name quite unambiguously means I will read
 this variable from main memory. Which is not true and creates
 potential for confusion and bugs.

I'd have to disagree here -- atomic ops are all about _atomicity_ of
memory accesses, not _making_ them happen (or visible to other CPUs)
_then and there_ itself. The latter are the job of barriers.

The behaviour (and expectations) are quite comprehensively covered in
atomic_ops.txt -- let alone atomic_{read,set}, even atomic_{inc,dec}
are permitted by archs' implementations to _not_ have any memory
barriers, for that matter. [It is unrelated that on x86 making them
SMP-safe requires the use of the LOCK prefix that also happens to be
an implicit memory barrier.]

An argument was also made about consistency of atomic_{read,set} w.r.t.
the other atomic ops -- but clearly, they are all already consistent!
All of them are atomic :-) The fact that atomic_{read,set} do _not_
require any inline asm or LOCK prefix whereas the others do, has to do
with the fact that unlike all others, atomic_{read,set} are not RMW ops
and hence guaranteed to be atomic just as they are in plain  simple C.

But if people do seem to have a mixed / confused notion of atomicity
and barriers, and if there's consensus, then as I'd said earlier, I
have no issues in going with the consensus (eg. having API variants).
Linus would be more difficult to convince, however, I suspect :-)


Satyam
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()

2007-08-24 Thread Denys Vlasenko
On Friday 24 August 2007 13:12, Kenn Humborg wrote:
  On Thursday 16 August 2007 01:39, Satyam Sharma wrote:
static inline void wait_for_init_deassert(atomic_t *deassert)
{
   - while (!atomic_read(deassert));
   + while (!atomic_read(deassert))
   + cpu_relax();
 return;
}
 
  For less-than-briliant people like me, it's totally non-obvious that
  cpu_relax() is needed for correctness here, not just to make P4 happy.
 
  IOW: atomic_read name quite unambiguously means I will read
  this variable from main memory. Which is not true and creates
  potential for confusion and bugs.

 To me, atomic_read means a read which is synchronized with other
 changes to the variable (using the atomic_XXX functions) in such
 a way that I will always only see the before or after
 state of the variable - never an intermediate state while a
 modification is happening.  It doesn't imply that I have to
 see the after state immediately after another thread modifies
 it.

So you are ok with compiler propagating n1 to n2 here:

n1 += atomic_read(x);
other_variable++;
n2 += atomic_read(x);

without accessing x second time. What's the point? Any sane coder
will say that explicitly anyway:

tmp = atomic_read(x);
n1 += tmp;
other_variable++;
n2 += tmp;

if only for the sake of code readability. Because first code
is definitely hinting that it reads RAM twice, and it's actively *bad*
for code readability when in fact it's not the case!

Locking, compiler and CPU barriers are complicated enough already,
please don't make them even harder to understand.
--
vda
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()

2007-08-24 Thread Luck, Tony
  static inline void wait_for_init_deassert(atomic_t *deassert)
  {
 -while (!atomic_read(deassert));
 +while (!atomic_read(deassert))
 +cpu_relax();
  return;
  }

 For less-than-briliant people like me, it's totally non-obvious that
 cpu_relax() is needed for correctness here, not just to make P4 happy.

Not just P4 ... there are other threaded cpus where it is useful to
let the core know that this is a busy loop so it would be a good thing
to let other threads have priority.

Even on a non-threaded cpu the cpu_relax() could be useful in the
future to hint to the cpu that it could drop into a lower power
hogging state.

But I agree with your main point that the loop without the cpu_relax()
looks like it ought to work because atomic_read() ought to actually
go out and read memory each time around the loop.

-Tony
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()

2007-08-24 Thread Christoph Lameter
On Fri, 24 Aug 2007, Satyam Sharma wrote:

 But if people do seem to have a mixed / confused notion of atomicity
 and barriers, and if there's consensus, then as I'd said earlier, I
 have no issues in going with the consensus (eg. having API variants).
 Linus would be more difficult to convince, however, I suspect :-)

The confusion may be the result of us having barrier semantics in 
atomic_read. If we take that out then we may avoid future confusions.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()

2007-08-24 Thread Linus Torvalds


On Fri, 24 Aug 2007, Denys Vlasenko wrote:
 
 So you are ok with compiler propagating n1 to n2 here:
 
 n1 += atomic_read(x);
 other_variable++;
 n2 += atomic_read(x);
 
 without accessing x second time. What's the point? Any sane coder
 will say that explicitly anyway:

No.

This is a common mistake, and it's total crap.

Any sane coder will often use inline functions, macros, etc helpers to 
do certain abstract things. Those things may contain atomic_read() 
calls.

The biggest reason for compilers doing CSE is exactly the fact that many 
opportunities for CSE simple *are*not*visible* on a source code level. 

That is true of things like atomic_read() equally as to things like shared 
offsets inside structure member accesses. No difference what-so-ever.

Yes, we have, traditionally, tried to make it *easy* for the compiler to 
generate good code. So when we can, and when we look at performance for 
some really hot path, we *will* write the source code so that the compiler 
doesn't even have the option to screw it up, and that includes things like 
doing CSE at a source code level so that we don't see the compiler 
re-doing accesses unnecessarily.

And I'm not saying we shouldn't do that. But performance is not an 
either-or kind of situation, and we should:

 - spend the time at a source code level: make it reasonably easy for the 
   compiler to generate good code, and use the right algorithms at a 
   higher level (and order structures etc so that they have good cache 
   behaviour).

 - .. *and* expect the compiler to handle the cases we didn't do by hand
   pretty well anyway. In particular, quite often, abstraction levels at a 
   software level means that we give compilers stupid code, because some 
   function may have a certain high-level abstraction rule, but then on a 
   particular architecture it's actually a no-op, and the compiler should 
   get to untangle our stupid code and generate good end results.

 - .. *and* expect the hardware to be sane and do a good job even when the 
   compiler didn't generate perfect code or there were unlucky cache miss
   patterns etc.

and if we do all of that, we'll get good performance. But you really do 
want all three levels. It's not enough to be good at any one level (or 
even any two).

Linus
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()

2007-08-24 Thread Denys Vlasenko
On Friday 24 August 2007 18:06, Christoph Lameter wrote:
 On Fri, 24 Aug 2007, Satyam Sharma wrote:
  But if people do seem to have a mixed / confused notion of atomicity
  and barriers, and if there's consensus, then as I'd said earlier, I
  have no issues in going with the consensus (eg. having API variants).
  Linus would be more difficult to convince, however, I suspect :-)

 The confusion may be the result of us having barrier semantics in
 atomic_read. If we take that out then we may avoid future confusions.

I think better name may help. Nuke atomic_read() altogether.

n = atomic_value(x);// doesnt hint as strongly at reading as atomic_read
n = atomic_fetch(x);// yes, we _do_ touch RAM
n = atomic_read_uncached(x); // or this

How does that sound?
--
vda
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()

2007-08-24 Thread Chris Snook

Denys Vlasenko wrote:

On Friday 24 August 2007 18:06, Christoph Lameter wrote:

On Fri, 24 Aug 2007, Satyam Sharma wrote:

But if people do seem to have a mixed / confused notion of atomicity
and barriers, and if there's consensus, then as I'd said earlier, I
have no issues in going with the consensus (eg. having API variants).
Linus would be more difficult to convince, however, I suspect :-)

The confusion may be the result of us having barrier semantics in
atomic_read. If we take that out then we may avoid future confusions.


I think better name may help. Nuke atomic_read() altogether.

n = atomic_value(x);// doesnt hint as strongly at reading as atomic_read
n = atomic_fetch(x);// yes, we _do_ touch RAM
n = atomic_read_uncached(x); // or this

How does that sound?


atomic_value() vs. atomic_fetch() should be rather unambiguous. 
atomic_read_uncached() begs the question of precisely which cache we are 
avoiding, and could itself cause confusion.


So, if I were writing atomic.h from scratch, knowing what I know now, I think 
I'd use atomic_value() and atomic_fetch().  The problem is that there are a lot 
of existing users of atomic_read(), and we can't write a script to correctly 
guess their intent.  I'm not sure auditing all uses of atomic_read() is really 
worth the comparatively miniscule benefits.


We could play it safe and convert them all to atomic_fetch(), or we could 
acknowledge that changing the semantics 8 months ago was not at all disastrous, 
and make them all atomic_value(), allowing people to use atomic_fetch() where 
they really care.


-- Chris
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html