Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()
On Thursday 16 August 2007 01:39, Satyam Sharma wrote: static inline void wait_for_init_deassert(atomic_t *deassert) { - while (!atomic_read(deassert)); + while (!atomic_read(deassert)) + cpu_relax(); return; } For less-than-briliant people like me, it's totally non-obvious that cpu_relax() is needed for correctness here, not just to make P4 happy. IOW: atomic_read name quite unambiguously means I will read this variable from main memory. Which is not true and creates potential for confusion and bugs. -- vda - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()
On Thursday 16 August 2007 01:39, Satyam Sharma wrote: static inline void wait_for_init_deassert(atomic_t *deassert) { - while (!atomic_read(deassert)); + while (!atomic_read(deassert)) + cpu_relax(); return; } For less-than-briliant people like me, it's totally non-obvious that cpu_relax() is needed for correctness here, not just to make P4 happy. IOW: atomic_read name quite unambiguously means I will read this variable from main memory. Which is not true and creates potential for confusion and bugs. To me, atomic_read means a read which is synchronized with other changes to the variable (using the atomic_XXX functions) in such a way that I will always only see the before or after state of the variable - never an intermediate state while a modification is happening. It doesn't imply that I have to see the after state immediately after another thread modifies it. Perhaps the Linux atomic_XXX functions work like that, or used to work like that, but it's counter-intuitive to me that atomic should imply a memory read. Later, Kenn - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()
On Friday 24 August 2007 13:59:32 Denys Vlasenko wrote: On Thursday 16 August 2007 01:39, Satyam Sharma wrote: static inline void wait_for_init_deassert(atomic_t *deassert) { - while (!atomic_read(deassert)); + while (!atomic_read(deassert)) + cpu_relax(); return; } For less-than-briliant people like me, it's totally non-obvious that cpu_relax() is needed for correctness here, not just to make P4 happy. I find it also non obvious. It would be really better to have a barrier or equivalent (volatile or variable clobber) in the atomic_read() IOW: atomic_read name quite unambiguously means I will read this variable from main memory. Which is not true and creates potential for confusion and bugs. Agreed. -Andi - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()
Hi Denys, On Fri, 24 Aug 2007, Denys Vlasenko wrote: On Thursday 16 August 2007 01:39, Satyam Sharma wrote: static inline void wait_for_init_deassert(atomic_t *deassert) { - while (!atomic_read(deassert)); + while (!atomic_read(deassert)) + cpu_relax(); return; } For less-than-briliant people like me, it's totally non-obvious that cpu_relax() is needed for correctness here, not just to make P4 happy. This thread has been round-and-round with exactly the same discussions :-) I had proposed few such variants to make a compiler barrier implicit in atomic_{read,set} myself, but frankly, at least personally speaking (now that I know better), I'm not so much in favour of implicit barriers (compiler, memory or both) in atomic_{read,set}. This might sound like an about-turn if you read my own postings to Nick Piggin from a week back, but I do agree with most his opinions on the matter now -- separation of barriers from atomic ops is actually good, beneficial to certain code that knows what it's doing, explicit usage of barriers stands out more clearly (most people here who deal with it do know cpu_relax() is an explicit compiler barrier) compared to an implicit usage in an atomic_read() or such variant ... IOW: atomic_read name quite unambiguously means I will read this variable from main memory. Which is not true and creates potential for confusion and bugs. I'd have to disagree here -- atomic ops are all about _atomicity_ of memory accesses, not _making_ them happen (or visible to other CPUs) _then and there_ itself. The latter are the job of barriers. The behaviour (and expectations) are quite comprehensively covered in atomic_ops.txt -- let alone atomic_{read,set}, even atomic_{inc,dec} are permitted by archs' implementations to _not_ have any memory barriers, for that matter. [It is unrelated that on x86 making them SMP-safe requires the use of the LOCK prefix that also happens to be an implicit memory barrier.] An argument was also made about consistency of atomic_{read,set} w.r.t. the other atomic ops -- but clearly, they are all already consistent! All of them are atomic :-) The fact that atomic_{read,set} do _not_ require any inline asm or LOCK prefix whereas the others do, has to do with the fact that unlike all others, atomic_{read,set} are not RMW ops and hence guaranteed to be atomic just as they are in plain simple C. But if people do seem to have a mixed / confused notion of atomicity and barriers, and if there's consensus, then as I'd said earlier, I have no issues in going with the consensus (eg. having API variants). Linus would be more difficult to convince, however, I suspect :-) Satyam - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()
On Friday 24 August 2007 13:12, Kenn Humborg wrote: On Thursday 16 August 2007 01:39, Satyam Sharma wrote: static inline void wait_for_init_deassert(atomic_t *deassert) { - while (!atomic_read(deassert)); + while (!atomic_read(deassert)) + cpu_relax(); return; } For less-than-briliant people like me, it's totally non-obvious that cpu_relax() is needed for correctness here, not just to make P4 happy. IOW: atomic_read name quite unambiguously means I will read this variable from main memory. Which is not true and creates potential for confusion and bugs. To me, atomic_read means a read which is synchronized with other changes to the variable (using the atomic_XXX functions) in such a way that I will always only see the before or after state of the variable - never an intermediate state while a modification is happening. It doesn't imply that I have to see the after state immediately after another thread modifies it. So you are ok with compiler propagating n1 to n2 here: n1 += atomic_read(x); other_variable++; n2 += atomic_read(x); without accessing x second time. What's the point? Any sane coder will say that explicitly anyway: tmp = atomic_read(x); n1 += tmp; other_variable++; n2 += tmp; if only for the sake of code readability. Because first code is definitely hinting that it reads RAM twice, and it's actively *bad* for code readability when in fact it's not the case! Locking, compiler and CPU barriers are complicated enough already, please don't make them even harder to understand. -- vda - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()
static inline void wait_for_init_deassert(atomic_t *deassert) { -while (!atomic_read(deassert)); +while (!atomic_read(deassert)) +cpu_relax(); return; } For less-than-briliant people like me, it's totally non-obvious that cpu_relax() is needed for correctness here, not just to make P4 happy. Not just P4 ... there are other threaded cpus where it is useful to let the core know that this is a busy loop so it would be a good thing to let other threads have priority. Even on a non-threaded cpu the cpu_relax() could be useful in the future to hint to the cpu that it could drop into a lower power hogging state. But I agree with your main point that the loop without the cpu_relax() looks like it ought to work because atomic_read() ought to actually go out and read memory each time around the loop. -Tony - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()
On Fri, 24 Aug 2007, Satyam Sharma wrote: But if people do seem to have a mixed / confused notion of atomicity and barriers, and if there's consensus, then as I'd said earlier, I have no issues in going with the consensus (eg. having API variants). Linus would be more difficult to convince, however, I suspect :-) The confusion may be the result of us having barrier semantics in atomic_read. If we take that out then we may avoid future confusions. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()
On Fri, 24 Aug 2007, Denys Vlasenko wrote: So you are ok with compiler propagating n1 to n2 here: n1 += atomic_read(x); other_variable++; n2 += atomic_read(x); without accessing x second time. What's the point? Any sane coder will say that explicitly anyway: No. This is a common mistake, and it's total crap. Any sane coder will often use inline functions, macros, etc helpers to do certain abstract things. Those things may contain atomic_read() calls. The biggest reason for compilers doing CSE is exactly the fact that many opportunities for CSE simple *are*not*visible* on a source code level. That is true of things like atomic_read() equally as to things like shared offsets inside structure member accesses. No difference what-so-ever. Yes, we have, traditionally, tried to make it *easy* for the compiler to generate good code. So when we can, and when we look at performance for some really hot path, we *will* write the source code so that the compiler doesn't even have the option to screw it up, and that includes things like doing CSE at a source code level so that we don't see the compiler re-doing accesses unnecessarily. And I'm not saying we shouldn't do that. But performance is not an either-or kind of situation, and we should: - spend the time at a source code level: make it reasonably easy for the compiler to generate good code, and use the right algorithms at a higher level (and order structures etc so that they have good cache behaviour). - .. *and* expect the compiler to handle the cases we didn't do by hand pretty well anyway. In particular, quite often, abstraction levels at a software level means that we give compilers stupid code, because some function may have a certain high-level abstraction rule, but then on a particular architecture it's actually a no-op, and the compiler should get to untangle our stupid code and generate good end results. - .. *and* expect the hardware to be sane and do a good job even when the compiler didn't generate perfect code or there were unlucky cache miss patterns etc. and if we do all of that, we'll get good performance. But you really do want all three levels. It's not enough to be good at any one level (or even any two). Linus - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()
On Friday 24 August 2007 18:06, Christoph Lameter wrote: On Fri, 24 Aug 2007, Satyam Sharma wrote: But if people do seem to have a mixed / confused notion of atomicity and barriers, and if there's consensus, then as I'd said earlier, I have no issues in going with the consensus (eg. having API variants). Linus would be more difficult to convince, however, I suspect :-) The confusion may be the result of us having barrier semantics in atomic_read. If we take that out then we may avoid future confusions. I think better name may help. Nuke atomic_read() altogether. n = atomic_value(x);// doesnt hint as strongly at reading as atomic_read n = atomic_fetch(x);// yes, we _do_ touch RAM n = atomic_read_uncached(x); // or this How does that sound? -- vda - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()
Denys Vlasenko wrote: On Friday 24 August 2007 18:06, Christoph Lameter wrote: On Fri, 24 Aug 2007, Satyam Sharma wrote: But if people do seem to have a mixed / confused notion of atomicity and barriers, and if there's consensus, then as I'd said earlier, I have no issues in going with the consensus (eg. having API variants). Linus would be more difficult to convince, however, I suspect :-) The confusion may be the result of us having barrier semantics in atomic_read. If we take that out then we may avoid future confusions. I think better name may help. Nuke atomic_read() altogether. n = atomic_value(x);// doesnt hint as strongly at reading as atomic_read n = atomic_fetch(x);// yes, we _do_ touch RAM n = atomic_read_uncached(x); // or this How does that sound? atomic_value() vs. atomic_fetch() should be rather unambiguous. atomic_read_uncached() begs the question of precisely which cache we are avoiding, and could itself cause confusion. So, if I were writing atomic.h from scratch, knowing what I know now, I think I'd use atomic_value() and atomic_fetch(). The problem is that there are a lot of existing users of atomic_read(), and we can't write a script to correctly guess their intent. I'm not sure auditing all uses of atomic_read() is really worth the comparatively miniscule benefits. We could play it safe and convert them all to atomic_fetch(), or we could acknowledge that changing the semantics 8 months ago was not at all disastrous, and make them all atomic_value(), allowing people to use atomic_fetch() where they really care. -- Chris - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html