Re: [PATCH v3] powerpc: Implement csum_ipv6_magic in assembly
On Thu, May 24, 2018 at 10:18:44AM +, Christophe Leroy wrote:
> On 05/24/2018 06:20 AM, Christophe LEROY wrote:
> > Le 23/05/2018 à 20:34, Segher Boessenkool a écrit :
> > > On Tue, May 22, 2018 at 08:57:01AM +0200, Christophe Leroy wrote:
> > > > The generic csum_ipv6_magic() generates a pretty bad result
> > >
> > > Please try with a more recent compiler, what you used is pretty
> > > ancient.  It's not like recent compilers do great on this either,
> > > but it's not *that* bad anymore ;-)
>
> Here is what I get with GCC 8.1
> It doesn't look much better, does it ?

There are no more mfocrf, which is a big speedup.  Other than that it
is pretty lousy still, I totally agree.  This improvement happened
quite a while ago; it's fixed in GCC 6.

Segher
Re: [PATCH v3] powerpc: Implement csum_ipv6_magic in assembly
On Thu, May 24, 2018 at 08:20:16AM +0200, Christophe LEROY wrote:
> Le 23/05/2018 à 20:34, Segher Boessenkool a écrit :
> > On Tue, May 22, 2018 at 08:57:01AM +0200, Christophe Leroy wrote:
> > > +_GLOBAL(csum_ipv6_magic)
> > > +	lwz	r8, 0(r3)
> > > +	lwz	r9, 4(r3)
> > > +	lwz	r10, 8(r3)
> > > +	lwz	r11, 12(r3)
> > > +	addc	r0, r5, r6
> > > +	adde	r0, r0, r7
> > > +	adde	r0, r0, r8
> > > +	adde	r0, r0, r9
> > > +	adde	r0, r0, r10
> > > +	adde	r0, r0, r11
> > > +	lwz	r8, 0(r4)
> > > +	lwz	r9, 4(r4)
> > > +	lwz	r10, 8(r4)
> > > +	lwz	r11, 12(r4)
> > > +	adde	r0, r0, r8
> > > +	adde	r0, r0, r9
> > > +	adde	r0, r0, r10
> > > +	adde	r0, r0, r11
> > > +	addze	r0, r0
> > > +	rotlwi	r3, r0, 16
> > > +	add	r3, r0, r3
> > > +	not	r3, r3
> > > +	rlwinm	r3, r3, 16, 16, 31
> > > +	blr
> > > +EXPORT_SYMBOL(csum_ipv6_magic)
> >
> > Clustering the loads and carry insns together is pretty much the
> > worst you can do on most 32-bit CPUs.
>
> Oh, really ?  __csum_partial is written that way too.

I thought I told you about this before?  Maybe not.

> Right, now I tried interleaving the lwz and adde. I get no improvment
> at all on a 885, but I get a 15% improvment on a 8321.

It won't likely help on single-issue cores (like the one 885 has), yes.

Segher
Re: [PATCH v3] powerpc: Implement csum_ipv6_magic in assembly
On Tue, May 22, 2018 at 08:57:01AM +0200, Christophe Leroy wrote:
> The generic csum_ipv6_magic() generates a pretty bad result

Please try with a more recent compiler, what you used is pretty
ancient.  It's not like recent compilers do great on this either, but
it's not *that* bad anymore ;-)

> --- a/arch/powerpc/lib/checksum_32.S
> +++ b/arch/powerpc/lib/checksum_32.S
> @@ -293,3 +293,36 @@ dst_error:
> 	EX_TABLE(51b, dst_error);
>
> EXPORT_SYMBOL(csum_partial_copy_generic)
> +
> +/*
> + * static inline __sum16 csum_ipv6_magic(const struct in6_addr *saddr,
> + *                                       const struct in6_addr *daddr,
> + *                                       __u32 len, __u8 proto, __wsum sum)
> + */
> +
> +_GLOBAL(csum_ipv6_magic)
> +	lwz	r8, 0(r3)
> +	lwz	r9, 4(r3)
> +	lwz	r10, 8(r3)
> +	lwz	r11, 12(r3)
> +	addc	r0, r5, r6
> +	adde	r0, r0, r7
> +	adde	r0, r0, r8
> +	adde	r0, r0, r9
> +	adde	r0, r0, r10
> +	adde	r0, r0, r11
> +	lwz	r8, 0(r4)
> +	lwz	r9, 4(r4)
> +	lwz	r10, 8(r4)
> +	lwz	r11, 12(r4)
> +	adde	r0, r0, r8
> +	adde	r0, r0, r9
> +	adde	r0, r0, r10
> +	adde	r0, r0, r11
> +	addze	r0, r0
> +	rotlwi	r3, r0, 16
> +	add	r3, r0, r3
> +	not	r3, r3
> +	rlwinm	r3, r3, 16, 16, 31
> +	blr
> +EXPORT_SYMBOL(csum_ipv6_magic)

Clustering the loads and carry insns together is pretty much the worst
you can do on most 32-bit CPUs.

Segher
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
> > > > And no, RMW on MMIO isn't problematic at all, either.  An RMW op
> > > > is a read op, a modify op, and a write op, all rolled into one
> > > > opcode.  But three actual operations.
> > >
> > > Maybe for some CPUs, but not all.  ARM for instance can't use the
> > > load exclusive and store exclusive instructions to MMIO space.
> >
> > Sure, your CPU doesn't have RMW instructions -- how to emulate those
> > if you don't have them is a totally different thing.
>
> Let me say it more clearly: On ARM, it is impossible to perform
> atomic operations on MMIO space.

It's all completely beside the point, see the other subthread, but...

Yeah, you can't do LL/SC to MMIO space; ARM isn't alone in that.  You
could still implement atomic operations on MMIO space by taking a lock
elsewhere, in normal cacheable memory space.  Why you would do this is
a separate question; you probably don't want it :-)

Segher
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
> > Let me say it more clearly: On ARM, it is impossible to perform
> > atomic operations on MMIO space.
>
> Actually, no one is suggesting that we try to do that at all.
>
> The discussion about RMW ops on MMIO space started with a comment
> attributed to the gcc developers that one reason why gcc on x86
> doesn't use instructions that do RMW ops on volatile variables is
> that volatile is used to mark MMIO addresses, and there was some
> uncertainty about whether (non-atomic) RMW ops on x86 could be used
> on MMIO.

This is in regard to the question about why gcc on x86 always moves
a volatile variable into a register before doing anything to it.

This question is GCC PR33102, which was incorrectly closed as a
duplicate of PR3506 -- and *that* PR was closed because its reporter
seemed to claim the GCC-generated code for an increment on a volatile
(namely, three machine instructions: load, modify, store) was
incorrect, and that it has to be one machine instruction.

> So the whole discussion is irrelevant to ARM, PowerPC and any other
> architecture except x86[-64].

And even there, it's not something the kernel can take advantage of
before GCC 4.4 is in widespread use, if then.  Let's move on.

Segher
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
> At some point in the future, barrier() will be universally regarded
> as a hammer too big for most purposes.  Whether or not removing it
> now

You can't just remove it, it is needed in some places; you want to
replace it in most places with a more fine-grained compiler barrier,
I presume?

> constitutes premature optimization is arguable, but I think we should
> allow such optimization to happen (or not happen) in
> architecture-dependent code, and provide a consistent API that
> doesn't require the use of such things in arch-independent code where
> it might turn into a totally superfluous performance killer depending
> on what hardware it gets compiled for.

Explicit barrier()s won't be too hard to replace -- but what to do
about the implicit barrier()s in rmb() etc. etc. -- *those* will be
hard to get rid of, if only because it is hard enough to teach driver
authors how to use those primitives *already*.

It is far from clear what a good interface like that would look like,
anyway.  Probably we should first start experimenting with a
forget()-style micro-barrier (but please, find a better name), and see
if a nice usage pattern shows up that can be turned into an API.

Segher
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
> > Right. ROTFL... volatile actually breaks atomic_t instead of making
> > it safe.  x++ becomes a register load, increment and a register
> > store.  Without volatile we can increment the memory directly.
>
> It seems that volatile requires that the variable is loaded into a
> register first and then operated upon.  Understandable when you think
> about volatile being used to access memory mapped I/O registers where
> a RMW operation could be problematic.
>
> So, if we want consistent behavior, we're pretty much screwed unless
> we use inline assembler everywhere?

Nah, this whole argument is flawed -- without volatile we still
*cannot* increment the memory directly.  On x86, you need a lock
prefix; on other archs, some other mechanism to make the memory
increment an *atomic* memory increment.

And no, RMW on MMIO isn't problematic at all, either.  An RMW op is a
read op, a modify op, and a write op, all rolled into one opcode.  But
three actual operations.

The advantages of asm code for atomic_{read,set} are:

1) all the other atomic ops are implemented that way already;
2) you have full control over the asm insns selected; in particular,
   you can guarantee you *do* get an atomic op;
3) you don't need to use volatile data, which generates
   not-all-that-good code on archs like x86, and we want to get rid of
   it anyway since it is problematic in many ways;
4) you don't need to use *(volatile type*)data, which
   a) doesn't exist in C;
   b) isn't documented or supported in GCC;
   c) has a recent history of bugginess;
   d) _still_ uses volatile objects;
   e) _still_ is problematic in almost all those same ways as in 3);
5) you can mix atomic and non-atomic accesses to the atomic_t, which
   you cannot with the other alternatives.

The only disadvantage I know of is potentially slightly worse
instruction scheduling.  This is a generic asm() problem: GCC cannot
see what actual insns are inside the asm() block.
Segher
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
> > Such code generally doesn't care precisely when it gets the update,
> > just that the update is atomic, and it doesn't loop forever.

Yes, it _does_ care that it gets the update _at all_, and preferably
as early as possible.

> Regardless, I'm convinced we just need to do it all in assembly.

So do you want volatile asm or plain asm, for atomic_read()?  The asm
version has two ways to go about it too...

Segher
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
> > And no, RMW on MMIO isn't problematic at all, either.  An RMW op is
> > a read op, a modify op, and a write op, all rolled into one opcode.
> > But three actual operations.
>
> Maybe for some CPUs, but not all.  ARM for instance can't use the
> load exclusive and store exclusive instructions to MMIO space.

Sure, your CPU doesn't have RMW instructions -- how to emulate those
if you don't have them is a totally different thing.

Segher
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
> > > > > Part of the motivation here is to fix heisenbugs.  If I knew
> > > > > where they
> > > >
> > > > By the same token we should probably disable optimisations
> > > > altogether since that too can create heisenbugs.
> > >
> > > Almost everything is a tradeoff; and so is this.  I don't believe
> > > most people would find disabling all compiler optimisations an
> > > acceptable price to pay for some peace of mind.
> >
> > So why is this a good tradeoff?
> >
> > > It certainly is better than disabling all compiler optimisations!
>
> It's easy to be better than something really stupid :)

Sure, it wasn't me who made the comparison though.

> So i386 and x86-64 don't have volatiles there, and it saves them a
> few K of kernel text.

Which has to be investigated.  A few kB is a lot more than expected.

> What you need to justify is why it is a good tradeoff to make them
> volatile (which btw, is much harder to go the other way after we let
> people make those assumptions).

My point is that people *already* made those assumptions.  There are
two ways to clean up this mess:

1) Have the volatile semantics by default, change the users that don't
   need it;
2) Have non-volatile semantics by default, change the users that do
   need it.

Option 2) randomly breaks stuff all over the place; option 1) doesn't.
Yeah, 1) could cause some extremely minor speed or code size
regression, but only temporarily until everything has been audited.

> > > I also think that just adding things to APIs in the hope it might
> > > fix up some bugs isn't really a good road to go down.  Where do
> > > you stop?
> >
> > I look at it the other way: keeping the volatile semantics in
> > atomic_XXX() (or adding them to it, whatever) helps _prevent_ bugs;
>
> Yeah, but we could add lots of things to help prevent bugs and they
> would never be included.  I would also contend that it helps _hide_
> bugs and encourages people to be lazy when thinking about these
> things.

Sure.  We aren't _adding_ anything here though, not on the platforms
where it is most likely to show up, anyway.

> Also, you dismiss the fact that we'd actually be *adding* volatile
> semantics back to the 2 most widely tested architectures (in terms of
> test time, number of testers, variety of configurations, and coverage
> of driver code).

I'm not dismissing that.  x86 however is one of the few architectures
where mistakenly leaving out a volatile will not easily show up in
user testing, since the compiler will very often produce a memory
reference anyway because it has no registers to play with.

> This is a very important difference from just keeping volatile
> semantics because it is basically a one-way API change.

That's a good point.  Maybe we should create _two_ new APIs, one
explicitly going each way.

> > certainly most people expect that behaviour, and also that
> > behaviour is *needed* in some places and no other interface
> > provides that functionality.
>
> I don't know that most people would expect that behaviour.

I didn't conduct any formal poll either :-)

> Is there any documentation anywhere that would suggest this?

Not really, I think, no.  But not the other way around, either.  Most
uses of it seem to expect it though.

> > [some confusion about barriers wrt atomics snipped]
>
> What were you confused about?

Me?  Not much.

Segher
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
> Of course, since *normal* accesses aren't necessarily limited wrt
> re-ordering, the question then becomes one of "with regard to *what*
> does it limit re-ordering?".
>
> A C compiler that re-orders two different volatile accesses that have
> a sequence point in between them is pretty clearly a buggy compiler.
> So at a minimum, it limits re-ordering wrt other volatiles (assuming
> sequence points exist).  It also means that the compiler cannot move
> it speculatively across conditionals, but other than that it's
> starting to get fuzzy.

This is actually really well-defined in C, not fuzzy at all.  Volatile
accesses are a side effect, and no side effects can be reordered with
respect to sequence points.  The side effects that matter in the
kernel environment are:

1) accessing a volatile object;
2) modifying an object;
3) volatile asm();
4) calling a function that does any of these.

We certainly should avoid volatile whenever possible, but "because
it's fuzzy wrt reordering" is not a reason -- all alternatives have
exactly the same issues.

Segher
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
> (and yes, it is perfectly legitimate to want a non-volatile read for
> a data type that you also want to do atomic RMW operations on)

...which is undefined behaviour in C (and GCC) when that data is
declared volatile, which is a good argument against implementing
atomics that way in itself.

Segher
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
> In a reasonable world, gcc should just make that be (on x86)
>
> 	addl $1,i(%rip)
>
> on x86-64, which is indeed what it does without the volatile.  But
> with the volatile, the compiler gets really nervous, and doesn't dare
> do it in one instruction, and thus generates crap like
>
> 	movl	i(%rip), %eax
> 	addl	$1, %eax
> 	movl	%eax, i(%rip)
>
> instead.  For no good reason, except that volatile just doesn't have
> any good/clear semantics for the compiler, so most compilers will
> just make it be "I will not touch this access in any way, shape, or
> form".  Including even trivially correct instruction
> optimization/combination.

It's just a (target-specific, perhaps) missed-optimisation kind of bug
in GCC.  Care to file a bug report?

> but is (again) something that gcc doesn't dare do, since "i" is
> volatile.

Just nobody taught it it can do this; perhaps no one wanted to add
optimisations like that, maybe with a reasoning like "people who hit
the go-slow-in-unspecified-ways button should get what they deserve"
;-)

Segher
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
> Now the second wording *IS* technically correct, but come on, it's
> 24 words long whereas the original one was 3 -- and hopefully anybody
> reading the shorter phrase *would* have known anyway what was meant,
> without having to be pedantic about it :-)

Well, you were talking pretty formal (and detailed) stuff, so IMHO
it's good to have that exactly correct.  Sure, it's nicer to use small
words most of the time :-)

Segher
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
> #define forget(a)	__asm__ __volatile__ ("" : "=m" (a) : "m" (a))
>
> [ This is exactly equivalent to using "+m" in the constraints, as
>   recently explained on a GCC list somewhere, in response to the
>   patch in my bitops series a few weeks back where I thought "+m"
>   was bogus. ]

[It wasn't explained on a GCC list in response to your patch, as far
as I can see -- if I missed it, please point me to an archived version
of it].

One last time: it isn't equivalent on older (but still supported by
Linux) versions of GCC.  Current versions of GCC allow it, but it has
no documented behaviour at all, so use it at your own risk.

Segher
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
> > > Here, I should obviously admit that the semantics of
> > > *(volatile int *) aren't any neater or well-defined in the
> > > _language standard_ at all.  The standard does say (verbatim)
> > > "precisely what constitutes an access to an object of
> > > volatile-qualified type is implementation-defined", but GCC does
> > > help us out here by doing the right thing.
> >
> > Where do you get that idea?
>
> Try a testcase (experimentally verify).

That doesn't prove anything.  Experiments can only disprove things.

The GCC manual, section 6.1, "When is a Volatile Object Accessed?",
doesn't say anything of the kind.

> True, "implementation-defined" as per the C standard _is_ supposed to
> mean "unspecified behaviour where each implementation documents how
> the choice is made".  So ok, probably GCC isn't documenting this
> implementation-defined behaviour which it is supposed to, but can't
> really fault them much for this, probably.

GCC _is_ documenting this, namely in this section 6.1.  It doesn't
mention volatile-casted stuff.  Draw your own conclusions.

Segher
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
> > > #define forget(a)	__asm__ __volatile__ ("" : "=m" (a) : "m" (a))
> > >
> > > [ This is exactly equivalent to using "+m" in the constraints, as
> > >   recently explained on a GCC list somewhere, in response to the
> > >   patch in my bitops series a few weeks back where I thought "+m"
> > >   was bogus. ]
> >
> > [It wasn't explained on a GCC list in response to your patch, as
> > far as I can see -- if I missed it, please point me to an archived
> > version of it].
>
> http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01758.html

Ah yes, that old thread, thank you.

> That's when _I_ came to know how GCC interprets "+m", but probably
> this has been explained on those lists multiple times.  Who cares,
> anyway?

I just couldn't find the thread you meant; I thought I might have
missed it, that's all :-)

Segher
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
> > > > atomic_dec() writes to memory, so it _does_ have volatile
> > > > semantics, implicitly, as long as the compiler cannot optimise
> > > > the atomic variable away completely -- any store counts as a
> > > > side effect.
> > >
> > > I don't think an atomic_dec() implemented as an inline asm
> > > volatile or one that uses a "forget" macro would have the same
> > > re-ordering guarantees as an atomic_dec() that uses a volatile
> > > access cast.
> >
> > The asm volatile implementation does have exactly the same
> > reordering guarantees as the volatile cast thing,
>
> I don't think so.

"asm volatile" creates a side effect.  Side effects aren't allowed to
be reordered wrt sequence points.  This is exactly the same reason why
volatile accesses cannot be reordered.

> > if that is implemented by GCC in the obvious way.  Even a plain
> > asm() will do the same.
> >
> > Read the relevant GCC documentation.
>
> I did, yes.
>
> [ of course, if the (latest) GCC documentation is *yet again* wrong,
>   then alright, not much I can do about it, is there. ]

There was (and is) nothing wrong about the "+m" documentation, if that
is what you are talking about.  It could be extended now, to allow
"+m" -- but that takes more than just fixing the documentation.

Segher
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
> > > No it does not have any volatile semantics.  atomic_dec() can be
> > > reordered at will by the compiler within the current basic unit
> > > if you do not add a barrier.
> >
> > "volatile" has nothing to do with reordering.
>
> If you're talking of "volatile" the type-qualifier keyword, then
> http://lkml.org/lkml/2007/8/16/231 (and the sub-thread below it)
> shows otherwise.

I'm not sure what in that mail you mean, but anyway...

Yes, of course, the fact that "volatile" creates a side effect
prevents certain things from being reordered wrt the atomic_dec(); but
the atomic_dec() has a side effect *already*, so the volatile doesn't
change anything.

> > atomic_dec() writes to memory, so it _does_ have volatile
> > semantics, implicitly, as long as the compiler cannot optimise the
> > atomic variable away completely -- any store counts as a side
> > effect.
>
> I don't think an atomic_dec() implemented as an inline asm volatile
> or one that uses a "forget" macro would have the same re-ordering
> guarantees as an atomic_dec() that uses a volatile access cast.

The asm volatile implementation does have exactly the same reordering
guarantees as the volatile cast thing, if that is implemented by GCC
in the obvious way.  Even a plain asm() will do the same.

Segher
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
> > > The asm volatile implementation does have exactly the same
> > > reordering guarantees as the volatile cast thing,
> >
> > I don't think so.
>
> "asm volatile" creates a side effect.

Yeah.

> Side effects aren't allowed to be reordered wrt sequence points.

Yeah.

> This is exactly the same reason why volatile accesses cannot be
> reordered.
>
> No, the code in that sub-thread I earlier pointed you at *WAS*
> written such that there was a sequence point after all the uses of
> that volatile access cast macro, and _therefore_ we were safe from
> re-ordering (behaviour guaranteed by the C standard).

And exactly the same is true for the asm version.

> Now, one cannot fantasize that "volatile asms" are also sequence
> points.

Sure you can do that.  I don't though.

> In fact such an argument would be sadly mistaken, because sequence
> points are defined by the C standard and it'd be horribly wrong to
> even _try_ claiming that the C standard knows about "volatile asms".

That's nonsense.  GCC can extend the C standard any way they bloody
well please -- witness the fact that they added an extra class of side
effects...

> > Read the relevant GCC documentation.
>
> I did, yes.

No, you didn't read:

http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html

Read the bit about the need for artificial dependencies, and the
example given there:

    asm volatile("mtfsf 255,%0" : : "f" (fpenv));
    sum = x + y;

The docs explicitly say the addition can be moved before the volatile
asm.

> Hopefully, as you know, (x + y) is a C expression and hence a
> sequence point as defined by the standard.

The _end of a full expression_ is a sequence point, not every
expression.  And that is irrelevant here anyway.

It is perfectly fine to compute x+y any time before the assignment;
the C compiler is allowed to compute it _after_ the assignment even,
if it could figure out how ;-)

x+y does not contain a side effect, you know.

> I know there is also stuff written about side-effects there which
> _could_ give the same semantic w.r.t. sequence points as the volatile
> access casts,

s/could/does/

> but hey, it's GCC's own documentation, you obviously can't find fault
> with _me_ if there's wrong stuff written in there.  Say that to
> GCC ...

There's nothing wrong there.

> See, volatile C keyword, for all its ill-definition and dodgy
> semantics, is still at least given somewhat of a treatment in the C
> standard (whose quality is ... ummm, sadly not always good and clear,
> but unsurprisingly, still about 5,482 orders-of-magnitude times
> better than GCC docs).

If you find any problems/shortcomings in the GCC documentation, please
file a PR, don't go whine on some unrelated mailing lists.  Thank you.

> Semantics of "volatile" as applies to inline asm, OTOH?  You're
> completely relying on the compiler for that ...

Yes, and?  GCC promises the behaviour it has documented.

> > > [ of course, if the (latest) GCC documentation is *yet again*
> > >   wrong, then alright, not much I can do about it, is there. ]
> >
> > There was (and is) nothing wrong about the "+m" documentation, if
> > that is what you are talking about.  It could be extended now, to
> > allow "+m" -- but that takes more than just fixing the
> > documentation.
>
> No, there was (and is) _everything_ wrong about the "+" documentation
> as applies to memory-constrained operands.  I don't give a whit if
> it's some workaround in their gimplifier, or the other, that makes it
> possible to use "+m" (like the current kernel code does).  The docs
> suggest otherwise, so there's obviously a clear disconnect between
> the docs and actual GCC behaviour.

The documentation simply doesn't say "+m" is allowed.  The code to
allow it was added for the benefit of people who do not read the
documentation.  Documentation for "+m" might get added later if it is
decided this [the code, not the documentation] is a sane thing to have
(which isn't directly obvious).

> [ You seem to often take issue with _amazingly_ petty and pedantic
>   things, by the way :-) ]

If you're talking details, you better get them right.  Handwaving is
fine with me as long as you're not purporting you're not.  And I
simply cannot stand false assertions.

You can always ignore me if _you_ take issue with _that_ :-)

Segher
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
> > The documentation simply doesn't say "+m" is allowed.  The code to
> > allow it was added for the benefit of people who do not read the
> > documentation.  Documentation for "+m" might get added later if it
> > is decided this [the code, not the documentation] is a sane thing
> > to have (which isn't directly obvious).
>
> Huh?  If the (current) documentation doesn't match up with the
> (current) code, then _at least one_ of them has to be (as of current)
> wrong.  I wonder how you could even try to disagree with that.

Easy.  The GCC documentation you're referring to is the user's manual.
See the blurb on the first page:

"This manual documents how to use the GNU compilers, as well as their
features and incompatibilities, and how to report bugs.  It
corresponds to GCC version 4.3.0.  The internals of the GNU compilers,
including how to port them to new targets and some information about
how to write front ends for new languages, are documented in a
separate manual."

_How to use_.  This documentation doesn't describe in minute detail
everything the compiler does (see the source code for that -- no, it
isn't described in the internals manual either).

If it doesn't tell you how to use "+m", and even tells you _not_ to
use it, maybe that is what it means to say?  It doesn't mean "+m"
doesn't actually do something.  It also doesn't mean it does what you
think it should do.  It might do just that, of course.  But treating
writing C code as an empirical science isn't such a smart idea.

> And I didn't go whining about this ... you asked me.  (I think I'd
> said something to the effect of "GCC docs are often wrong",

No need to guess at what you said; even if you managed to delete your
own mail already, there are plenty of free web-based archives around.
You said:

"See, volatile C keyword, for all it's ill-definition and dodgy
semantics, is still at least given somewhat of a treatment in the C
standard (whose quality is ... ummm, sadly not always good and clear,
but unsurprisingly, still about 5,482 orders-of-magnitude times better
than GCC docs)."

and that to me reads as complaining that the ISO C standard isn't very
good, and that the GCC documentation is 10**5482 times worse even.
Which of course is hyperbole and cannot be true.  It also isn't
helpful in any way or form for anyone on this list.  I call that
whining.

> which is true,

Yes, documentation of that size often has shortcomings.  No surprise
there.  However, great effort is made to make it better documentation,
and especially to keep it up to date; if you find any errors or
omissions, please report them.  There are many ways how to do that,
see the GCC homepage.  </end-of-marketing-blurb>

> but probably you feel saying that is not allowed on non-gcc lists?)

You're allowed to say whatever you want.  Let's have a quote again,
shall we?  I said:

"If you find any problems/shortcomings in the GCC documentation,
please file a PR, don't go whine on some unrelated mailing lists.
Thank you."

I read that as a friendly request, not a prohibition.  Well, maybe not
actually friendly, more a bit angry.  A request, either way.

> As for the PR

"Problem report", a bugzilla ticket.  Sorry for using terminology
unknown to you.

> you're requesting me to file with GCC for this, that gcc-patches@
> thread did precisely that

Actually not -- PRs make sure issues aren't forgotten (although they
might gather dust, sure).  But yes, submitting patches is a Great
Thing(tm).

> and more (submitted a patch to said documentation -- and no, saying
> "documentation might get added later" is totally bogus and
> nonsensical -- documentation exists to document current behaviour,
> not past).

When code like you want to write becomes a supported feature, that
will be reflected in the user manual.  It is completely nonsensical to
expect everything that is *not* a supported feature to be mentioned
there.

> I wouldn't have replied, really, if you weren't so provoking.

Hey, maybe that character trait is good for something, then.  Now to
build a business plan around it...

Segher
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
>>>> Part of the motivation here is to fix heisenbugs. If I knew where they

>>> By the same token we should probably disable optimisations altogether since that too can create heisenbugs.

>> Almost everything is a tradeoff; and so is this. I don't believe most people would find disabling all compiler optimisations an acceptable price to pay for some peace of mind.

> So why is this a good tradeoff?

It certainly is better than disabling all compiler optimisations!

> I also think that just adding things to APIs in the hope it might fix up some bugs isn't really a good road to go down. Where do you stop?

I look at it the other way: keeping the volatile semantics in atomic_XXX() (or adding them to it, whatever) helps _prevent_ bugs; certainly most people expect that behaviour, and also that behaviour is *needed* in some places and no other interface provides that functionality.

[some confusion about barriers wrt atomics snipped]

Segher
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
>> The only thing volatile on an asm does is create a side effect on the asm statement; in effect, it tells the compiler "do not remove this asm even if you don't need any of its outputs." It's not disabling optimisation likely to result in bugs, heisen- or otherwise; _not_ putting the volatile on an asm that needs it simply _is_ a bug :-)

> Yep. And the reason it is a bug is that it fails to disable the relevant compiler optimizations. So I suspect that we might actually be saying the same thing here.

We're not saying the same thing, but we do agree :-)

Segher
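The point above -- that volatile on an asm only pins the statement itself -- is exactly what the kernel's compiler barrier relies on. A minimal sketch (barrier() matches the kernel's definition; flag_copy() is a hypothetical demo function, not kernel code):

```c
/* An empty volatile asm with a "memory" clobber: the volatile keyword
 * is what keeps the statement from being deleted even though it has no
 * outputs, and the clobber tells the compiler that memory may have
 * changed across it. */
#define barrier() __asm__ __volatile__("" : : : "memory")

/* Hypothetical demo: without the barrier the compiler could reuse the
 * value of *flag already held in a register for the second read. */
int flag_copy(int *flag)
{
    int a = *flag;
    barrier();      /* *flag may not be cached across this point */
    int b = *flag;  /* a genuine second load */
    return a + b;
}
```

Note that the barrier constrains only the compiler; on an SMP machine the CPU may still reorder the loads, which is what rmb() and friends are for.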
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
> I'd go so far as to say that anywhere where you want a non-volatile atomic_read, either your code is buggy, or else an int would work just as well.

Even, the only way to implement a non-volatile atomic_read() is essentially as a plain int (you can do some tricks so you cannot assign to the result and stuff like that, but that's not the issue here).

So if that would be the behaviour we wanted, just get rid of that whole atomic_read() thing, so no one can misuse it anymore.

Segher
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
>> I can't speak for this particular case, but there could be similar code examples elsewhere, where we do the atomic ops on an atomic_t object inside a higher-level locking scheme that would take care of the kind of problem you're referring to here. It would be useful for such or similar code if the compiler kept the value of that atomic object in a register.

> If there is a higher-level locking scheme then there is no point to using atomic_t variables. Atomic_t is specifically for the situation where multiple CPUs are updating a variable without locking.

And don't forget about the case where it is an I/O device that is updating the memory (in buffer descriptors or similar). The driver needs to do a volatile atomic read to get at the most recent version of that data, which can be important for optimising latency (or throughput even). There is no other way the kernel can get that info -- doing an MMIO read is way way too expensive.

Segher
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
> Note that volatile is a type-qualifier, not a type itself, so a cast of the _object_ itself to a qualified-type, i.e. (volatile int), would not make the access itself volatile-qualified.

There is no such thing as "volatile-qualified access" defined anywhere; there only is the concept of a volatile-qualified *object*.

> To serve our purposes, it is necessary for us to take the address of this (non-volatile) object, cast the resulting _pointer_ to the corresponding volatile-qualified pointer-type, and then dereference it. This makes that particular _access_ be volatile-qualified, without the object itself being such. Also note that the (dereferenced) result is also a valid lvalue and hence can be used in *(volatile int *)a = b; kind of construction (which we use for the atomic_set case).

There is a quite convincing argument that such an access _is_ an access to a volatile object; see GCC PR21568 comment #9. This probably isn't the last word on the matter though...

Segher
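The take-the-address-then-cast construction being discussed looks roughly like this (a sketch only -- the struct wrapper and function name are illustrative, not the actual kernel definitions):

```c
/* The object itself is a plain int; only the *access* goes through a
 * volatile-qualified pointer, which is the construction under debate. */
typedef struct { int counter; } atomic_t;

#define atomic_read(v)   (*(volatile int *)&(v)->counter)
#define atomic_set(v, i) (*(volatile int *)&(v)->counter = (i))

/* Hypothetical demo of both macros. */
int atomic_demo(void)
{
    atomic_t a = { 0 };
    atomic_set(&a, 42);      /* volatile store: never deleted      */
    return atomic_read(&a);  /* volatile load: re-read from memory */
}
```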
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
> Here, I should obviously admit that the semantics of *(volatile int *) aren't any neater or well-defined in the _language standard_ at all. The standard does say (verbatim) "what constitutes an access to an object that has volatile-qualified type is implementation-defined", but GCC does help us out here by doing the right thing.

Where do you get that idea? The GCC manual, section 6.1, "When is a Volatile Object Accessed?", doesn't say anything of the kind. PR33053 and some others.

> Honestly, given such confusion, and the propensity of the volatile type-qualifier keyword to be ill-defined (or at least poorly understood, often inconsistently implemented), I'd (again) express my opinion that it would be best to avoid its usage, given other alternatives do exist.

Yeah. Or we can have an email thread like this every time someone proposes a patch that uses an atomic variable ;-)

Segher
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
> Volatile behaviour itself isn't consistently defined (at least definitely not consistently implemented in various gcc versions across platforms),

It should be consistent across platforms; if not, file a bug please.

> but it is /expected/ to mean something like: ensure that every such access actually goes all the way to memory, and is not re-ordered w.r.t. other accesses, as far as the compiler can take care of these. The last "as far as the compiler can take care" disclaimer comes about due to CPUs doing their own re-ordering nowadays.

You can *expect* whatever you want, but this isn't in line with reality at all.

volatile _does not_ make accesses go all the way to memory. volatile _does not_ prevent reordering wrt other accesses.

What volatile does are a) never optimise away a read (or write) to the object, since the data can change in ways the compiler cannot see; and b) never move stores to the object across a sequence point. This does not mean other accesses cannot be reordered wrt the volatile access.

If the abstract machine would do an access to a volatile-qualified object, the generated machine code will do that access too. But, for example, it can still be optimised away by the compiler, if it can prove it is allowed to.

If you want stuff to go all the way to memory, you need some architecture-specific flush sequence; to make a store globally visible before another store, you need mb(); before some following read, you need mb(); to prevent reordering you need a barrier.

Segher
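What volatile actually guarantees -- one real load per abstract-machine access, nothing about ordering or caches -- is visible in the classic polling loop (wait_nonzero() is a hypothetical helper, not kernel code):

```c
/* Because the access is through a volatile-qualified pointer, the
 * compiler must perform a real load of *p on every iteration; with a
 * plain int pointer it could hoist the load out of the loop and spin
 * forever on a stale value.  Note volatile says nothing about ordering
 * against *other* memory accesses, nor does it bypass the CPU cache. */
int wait_nonzero(volatile int *p)
{
    while (*p == 0)
        ;            /* each test re-reads the object */
    return *p;
}
```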
Re: [PATCH 6/24] make atomic_read() behave consistently on frv
> Well if there is only one memory location involved, then smp_rmb() isn't going to really do anything anyway, so it would be incorrect to use it.

rmb() orders *any* two reads; that includes two reads from the same location.

> Consider that smp_rmb basically will do anything from flushing the pipeline to invalidating loads speculatively executed out of order. AFAIK it will not control the visibility of stores coming from other CPUs (that is up to the cache coherency).

The writer side should typically use wmb() whenever the reader side uses rmb(), sure.

Segher
Re: [PATCH 6/24] make atomic_read() behave consistently on frv
>>> Well if there is only one memory location involved, then smp_rmb() isn't going to really do anything anyway, so it would be incorrect to use it.

>> rmb() orders *any* two reads; that includes two reads from the same location.

> If the two reads are to the same location, all CPUs I am aware of will maintain the ordering without need for a memory barrier.

That's true of course, although there is no real guarantee for that.

Segher
Re: [PATCH 6/24] make atomic_read() behave consistently on frv
>>>>> Well if there is only one memory location involved, then smp_rmb() isn't going to really do anything anyway, so it would be incorrect to use it.

>>>> rmb() orders *any* two reads; that includes two reads from the same location.

>>> If the two reads are to the same location, all CPUs I am aware of will maintain the ordering without need for a memory barrier.

>> That's true of course, although there is no real guarantee for that.

> A CPU that did not provide this property (cache coherence) would be most emphatically reviled.

That doesn't have anything to do with coherency as far as I can see. It's just about the order in which a CPU (speculatively) performs the loads (which isn't necessarily the same as the order in which it executes the corresponding instructions, even).

> So we are pretty safe assuming that CPUs will provide it.

Yeah, pretty safe. I just don't like undocumented assumptions :-)

Segher
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
>> How does the compiler know that msleep() has got barrier()s?

> Because msleep_interruptible() is in a separate compilation unit, the compiler has to assume that it might modify any arbitrary global.

No; compilation units have nothing to do with it, GCC can optimise across compilation unit boundaries just fine, if you tell it to compile more than one compilation unit at once.

What you probably mean is that the compiler has to assume any code it cannot currently see can do anything (insofar as allowed by the relevant standards etc.)

> In many cases, the compiler also has to assume that msleep_interruptible() might call back into a function in the current compilation unit, thus possibly modifying global static variables.

It most often is smart enough to see what compilation-unit-local variables might be modified that way, though :-)

Segher
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
>> I think this was just terminology confusion here again. Isn't "any code that it cannot currently see" the same as "another compilation unit", and wouldn't the compilation unit itself expand if we ask gcc to compile more than one unit at once? Or is there some more specific definition for "compilation unit" (in gcc lingo, possibly?)

> This is indeed my understanding -- "compilation unit" is whatever the compiler looks at in one go. I have heard the word "module" used for the minimal compilation unit covering a single .c file and everything that it #includes, but there might be a better name for this.

Yes, that's what's called "compilation unit" :-)

[/me double checks]

Erm, the C standard actually calls it "translation unit". To be exact, to avoid any more confusion, 5.1.1.1/1 says:

    A C program need not all be translated at the same time. The text of the program is kept in units called source files, (or preprocessing files) in this International Standard. A source file together with all the headers and source files included via the preprocessing directive #include is known as a preprocessing translation unit. After preprocessing, a preprocessing translation unit is called a translation unit.

Segher
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
>> What volatile does are a) never optimise away a read (or write) to the object, since the data can change in ways the compiler cannot see; and b) never move stores to the object across a sequence point. This does not mean other accesses cannot be reordered wrt the volatile access.
>>
>> If the abstract machine would do an access to a volatile-qualified object, the generated machine code will do that access too. But, for example, it can still be optimised away by the compiler, if it can prove it is allowed to.

> As (now) indicated above, I had meant multiple volatile accesses to the same object, obviously.

Yes, accesses to volatile objects are never reordered with respect to each other.

> BTW:
>
>     #define atomic_read(a)  (*(volatile int *)(a))
>     #define atomic_set(a,i) (*(volatile int *)(a) = (i))
>
>     int a;
>
>     void func(void)
>     {
>         int b;
>
>         b = atomic_read(a);
>         atomic_set(a, 20);
>         b = atomic_read(a);
>     }
>
> gives:
>
>     func:
>         pushl   %ebp
>         movl    a, %eax
>         movl    %esp, %ebp
>         movl    $20, a
>         movl    a, %eax
>         popl    %ebp
>         ret
>
> so the first atomic_read() wasn't optimized away.

Of course. It is executed by the abstract machine, so it will be executed by the actual machine. On the other hand, try

    b = 0;
    if (b)
        b = atomic_read(a);

or similar.

Segher
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
>> What you probably mean is that the compiler has to assume any code it cannot currently see can do anything (insofar as allowed by the relevant standards etc.)

> I think this was just terminology confusion here again. Isn't "any code that it cannot currently see" the same as "another compilation unit",

It is not; try gcc -combine or the upcoming link-time optimisation stuff, for example.

> and wouldn't the compilation unit itself expand if we ask gcc to compile more than one unit at once? Or is there some more specific definition for "compilation unit" (in gcc lingo, possibly?)

"Compilation unit" is a C standard term. It typically boils down to a single .c file.

Segher
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
>>> Of course, if we find there are more callers in the kernel who want the volatility behaviour than those who don't care, we can re-define the existing ops to such variants, and re-name the existing definitions to something else, say atomic_read_nonvolatile, for all I care.

>> Do we really need another set of APIs?

> Well, if there's one set of users who do care about volatile behaviour, and another set that doesn't, it only sounds correct to provide both those APIs, instead of forcing one behaviour on all users.

But since there currently is only one such API, and there are users expecting the stronger behaviour, the only sane thing to do is let the API provide that behaviour. You can always add a new API with weaker behaviour later, and move users that are okay with it over to that new API.

Segher
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
>> No; compilation units have nothing to do with it, GCC can optimise across compilation unit boundaries just fine, if you tell it to compile more than one compilation unit at once.

> Last I checked, the Linux kernel build system did compile each .c file as a separate compilation unit.

I have some patches to use -combine -fwhole-program for Linux. Highly experimental, you need a patched bleeding edge toolchain. If there's interest I'll clean it up and put it online.

David Woodhouse had some similar patches about a year ago.

>>> In many cases, the compiler also has to assume that msleep_interruptible() might call back into a function in the current compilation unit, thus possibly modifying global static variables.

>> It most often is smart enough to see what compilation-unit-local variables might be modified that way, though :-)

> Yep. For example, if it knows the current value of a given such local variable, and if all code paths that would change some other variable cannot be reached given that current value of the first variable.

Or the most common thing: if neither the address of the translation-unit-local variable nor the address of any function writing to that variable can escape from that translation unit, nothing outside the translation unit can write to the variable.

> At least given that gcc doesn't know about multiple threads of execution!

Heh, only about the threads it creates itself (not relevant to the kernel, for sure :-) )

Segher
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
> Possibly these were too trivial to expose any potential problems that you may have been referring to, so it would be helpful if you could write a more concrete example / sample code.

The trick is to have a sufficiently complicated expression to force the compiler to run out of registers. You can use -ffixed-XXX to keep the testcase simple.

Segher
Re: [PATCH 6/24] make atomic_read() behave consistently on frv
> Please check the definition of "cache coherence."

Which of the twelve thousand such definitions? :-)

> Summary: the CPU is indeed within its rights to execute loads and stores to a single variable out of order, -but- only if it gets the same result that it would have obtained by executing them in order. Which means that any reordering of accesses by a single CPU to a single variable will be invisible to the software.

I'm still not sure if that applies to all architectures.

Doesn't matter anyway, let's kill this thread :-)

Segher
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
>>>> No; compilation units have nothing to do with it, GCC can optimise across compilation unit boundaries just fine, if you tell it to compile more than one compilation unit at once.

>>> Last I checked, the Linux kernel build system did compile each .c file as a separate compilation unit.

>> I have some patches to use -combine -fwhole-program for Linux. Highly experimental, you need a patched bleeding edge toolchain. If there's interest I'll clean it up and put it online. David Woodhouse had some similar patches about a year ago.

> Sounds exciting... ;-)

Yeah, the breakage is *quite* spectacular :-)

>>>>> In many cases, the compiler also has to assume that msleep_interruptible() might call back into a function in the current compilation unit, thus possibly modifying global static variables.

>>>> It most often is smart enough to see what compilation-unit-local variables might be modified that way, though :-)

>>> Yep. For example, if it knows the current value of a given such local variable, and if all code paths that would change some other variable cannot be reached given that current value of the first variable.

>> Or the most common thing: if neither the address of the translation-unit-local variable nor the address of any function writing to that variable can escape from that translation unit, nothing outside the translation unit can write to the variable.

> But there is usually at least one externally callable function in a .c file.

Of course, but often none of those will (indirectly) write a certain static variable.

Segher
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
>> Part of the motivation here is to fix heisenbugs. If I knew where they

> By the same token we should probably disable optimisations altogether since that too can create heisenbugs.

Almost everything is a tradeoff; and so is this. I don't believe most people would find disabling all compiler optimisations an acceptable price to pay for some peace of mind.

Segher
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
>>> Part of the motivation here is to fix heisenbugs. If I knew where they

>> By the same token we should probably disable optimisations altogether since that too can create heisenbugs.

> Precisely the point -- use of volatile (whether in casts or on asms) in these cases is intended to disable those optimizations likely to result in heisenbugs.

The only thing volatile on an asm does is create a side effect on the asm statement; in effect, it tells the compiler "do not remove this asm even if you don't need any of its outputs."

It's not disabling optimisation likely to result in bugs, heisen- or otherwise; _not_ putting the volatile on an asm that needs it simply _is_ a bug :-)

Segher
Re:
>> "Compilation unit" is a C standard term. It typically boils down to a single .c file.

> As you mentioned later, a single .c file with all the other files (headers or other .c files) that it pulls in via #include is actually a "translation unit", both in the C standard as well as gcc docs.

Yeah, a single .c file after preprocessing. Same thing :-)

> "Compilation unit" doesn't seem to be nearly as standard a term, though in most places it is indeed meant to be the same as "translation unit", but with the new gcc inter-module-analysis stuff that you referred to above, I suspect one may reasonably want to call a "compilation unit" all that the compiler sees at a given instant.

That would be a bit confusing, would it not? They'd better find some better name for that if they want to name it at all (remember, none of these optimisations should have any effect on the semantics of the program, you just get fewer .o files etc.).

Segher
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
>> A volatile default would disable optimizations for atomic_read. atomic_read without volatile would allow for full optimization by the compiler. Seems that this is what one wants in many cases.

> Name one such case. An atomic_read should do a load from memory. If the programmer puts an atomic_read() in the code then the compiler should emit a load for it, not re-use a value returned by a previous atomic_read. I do not believe it would ever be useful for the compiler to collapse two atomic_read statements into a single load.

An atomic_read() implemented as a normal C variable read would allow that read to be combined with another normal read from that variable. This could perhaps be marginally useful, although I'd bet you cannot see it unless counting cycles on a simulator or counting bits in the binary size.

With an asm() implementation, the compiler cannot do this; with a volatile implementation (either volatile variable or volatile cast), this invokes undefined behaviour (in both C and GCC).

Segher
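An asm()-based read of the kind alluded to could be sketched as below. This is a portable illustration with an empty template and a matching constraint; a real per-architecture implementation would put an actual load instruction in the template, and the function name is made up:

```c
/* Sketch only -- not the real kernel atomic_read().  The matching
 * constraint "0" makes the compiler materialise a load of *p to feed
 * the asm, and because each volatile asm is a side effect in its own
 * right, two calls cannot be collapsed into a single load the way two
 * plain C reads of the same int could be. */
static inline int atomic_read_asm(int *p)
{
    int val;
    __asm__ __volatile__("" : "=r"(val) : "0"(*p));
    return val;
}
```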
Re: [PATCH] make atomic_t volatile on all architectures
>> Note that last line.

> Segher, how about you just accept that Linux uses gcc as per reality, and that sometimes the reality is different from your expectations? +m works.

It works _most of the time_. Ask Martin. Oh, you don't even have to, he told you two mails ago.

My last mail simply pointed out that this isn't a GCC bug, but merely documented behaviour. Well okay, it was documented only after people found problems with it; it is an edge case and pretty hard to trigger (impossible to trigger on RISC architectures, and not very likely on x86 either, for different reasons).

Since realistic people keep using +m anyhow, current GCC has added a workaround that splits +m into an =m and an m constraint. It likely will be accepted as a permanent feature. However, that workaround doesn't yet exist on some of the compilers Linux still allows building with (but the problem does).

> We use it. It's better than the alternatives.

How is this better than simply writing the slightly more verbose

    asm(... : "=m"(XX) : "m"(XX))

? It's a few more characters. asm() is hard to write (correctly) anyway, I don't see the problem.

> Pointing to stale documentation doesn't change anything.

It's not stale documentation, it's freshly built from GCC mainline. It's also not stale in the sense that it isn't updated; in fact, the email I pointed to is from a thread where a change in this exact paragraph in the documentation was suggested (a couple of weeks ago).

> The same is true of accesses through a volatile pointer. The kernel has done that for a *loong* time (all MMIO IO is done that way),

(All MMIO on certain architectures.) This however is a completely different case; there _is_ no underlying C object declared anywhere, so no way can GCC prove that that object isn't volatile, so it _has_ to assume it is. Also, in the case of e.g. readb(), if the compiler can prove the resulting value isn't used it doesn't have to perform the access at all -- the documentation (again, not stale documentation) points this out quite literally.

> and it's totally pointless to say that volatile isn't guaranteed to do what it does. I actually asked a couple of other GCC developers last night, and they all agreed with me.

If you look at historic GCC problem reports (I pointed at a few earlier in these threads) you would see that the situation is far from clear.

> It works, and quite frankly, if it didn't work, it would be a gcc BUG.

How can it be a bug, if the semantics you think are guaranteed are guaranteed in your (and others') minds only?

Of course, GCC likes to cater to its users, and tries to make things work the way the programmer probably intended. However, it a) cannot *actually* read your mind, it just looks that way; b) _also_ has to implement the _correct_ semantics of volatile; c) since this isn't part of any C version (no, not GNU99 or similar either), sometimes GCC developers do not think about what some people expect the compiler should do with such code, but just implement the requirements they *know* exist. And that's when BUGs happen.

If you want GCC to do what you want it to do instead of what it does, why not file an enhancement request? http://gcc.gnu.org/bugzilla. To save you the trouble, I just did: gcc.gnu.org/PR33053. Let's see what comes from it.

> And again, this is not a C standard issue. It's a "sane implementation" issue.

Maybe what you consider sane isn't as clear-cut as you think? I'm not alone with this opinion, as the mailing list archives readily show.

> Linux expects the compiler to be sane.

And the compiler expects the code it is asked to translate to be reasonably sane. Nothing new here.

The compiler is more forgiving than the Linux code, IMHO; that's why I suggested using the obviously correct asm implementation of the atomic get/set, rather than using the not-so-obviously-correct and error-prone volatile thing. The semantics of the asm() are exactly the semantics you expect from an atomic get/set, while those of the volatile implementation are *not*; and on top of that, it is very well-tested, well-understood technology.

> If it isn't, that's not our problem.

What, if the compiler doesn't work as you expect, like a silent miscompilation, or even only the compiler ICEs, that is not a problem for users and/or developers of Linux? Interesting opinion. I'd rather stay clear of known pitfalls.

> gcc *is* sane,

Yes, it is.

> and I don't see why you constantly act as if it wasn't.

I guess I didn't express myself clearly enough then. Hopefully some other people _did_ understand what I meant.

In very harsh words: not all (proposed) kernel code is sane.

Segher
Re: [PATCH] make atomic_t volatile on all architectures
>>> Yes, though I would use =m on the output list and m on the input list. The reason is that I've seen gcc fall on its face with an ICE on s390 due to "+m". The explanation I've got from our compiler people was quite esoteric; as far as I remember gcc splits "+m" into an input operand and an output operand. Now it can happen that the compiler chooses two different registers to access the same memory location. "+m" requires that the two memory references be identical, which causes the ICE if they are not.

>> The problem is very nicely described here, last paragraph: http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01816.html It's not a problem anymore in (very) recent GCC, although that of course won't help you in the kernel (yet).

> So you are saying that gcc 3.x still has this problem?

Yes. A warning ("read-write constraint does not allow a register") was added for GCC 3.4, but the fix/workaround is more recent (4.2 I believe; perhaps it was backported to the 4.1 branch).

> I do not know if the current compilers still do this. Has anyone else seen this happen?

In recent GCC, it's actually documented:

    The ordinary output operands must be write-only; GCC will assume that the values in these operands before the instruction are dead and need not be generated. Extended asm supports input-output or read-write operands. Use the constraint character `+' to indicate such an operand and list it with the output operands. You should only use read-write operands when the constraints for the operand (or the operand in which only some of the bits are to be changed) allow a register.

Note that last line.

> I see, thanks for the info.

My pleasure.

Segher
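The workaround being discussed -- spelling the read-write operand as a separate "=m" output plus "m" input instead of "+m" -- looks like this. The empty template and the helper names are illustrative assumptions, not kernel code:

```c
/* Sketch of the workaround: instead of the single read-write operand
 *
 *     __asm__ __volatile__("" : "+m"(*p));
 *
 * which could ICE older (pre-4.0) GCC on some targets, list the same
 * lvalue once as an "=m" output and once as an "m" input.  The empty
 * template does nothing at run time; it only tells the compiler the
 * object is read and written in memory at this point. */
static inline void touch_mem(int *p)
{
    __asm__ __volatile__("" : "=m"(*p) : "m"(*p));
}

/* Hypothetical demo: the update happens in plain C; the asm merely
 * forces it to be carried out on the in-memory object. */
int touch_demo(void)
{
    int x = 5;
    ++x;
    touch_mem(&x);
    return x;
}
```

This is the "duplication of =m and m" called annoying later in the thread; GCC 4.0 and newer perform the equivalent split of "+m" internally.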
Re: [PATCH] make atomic_t volatile on all architectures
+m works. We use it. It's better than the alternatives. Pointing to stale documentation doesn't change anything. Well, perhaps on i386. I've seen some older versions of the s390 gcc die with an ICE because I have used +m in some kernel inline assembly. I'm happy to hear that this issue is fixed in recent gcc. Now I'll have to find out if this is already true with gcc 3.x. It was fixed (that is, +m is translated into a separate read and write by GCC itself) in GCC-4.0.0, I just learnt. The duplication of =m and m with the same constraint is rather annoying. Yeah. Compiler errors are more annoying though I dare say ;-) Segher
Re: [PATCH] make atomic_t volatile on all architectures
Yeah. Compiler errors are more annoying though I dare say ;-) Actually, compile-time errors are fine, Yes, they don't cause data corruption or anything like that, but I still don't think the 390 people want to ship a kernel that doesn't build -- and it seems they still need to support GCC versions before 4.0. Up until the day they can finally drop 3.x, they'll have to... and easy to work around. ...use the simple workaround, annoying as it may be, at least it works. That's all I'm saying. *Much* more annoying is when gcc actively generates subtly bad code. Quite so. We've had use-after-free issues due to incorrect gcc liveness calculations etc, and inline asm has been one of the more common causes - exactly because the kernel is one of the few users (along with glibc) that uses it at all. The good news is that things are getting better since more and more of the RTL transformations are ripped out. Now *those* are hard to find - the code works most of the time, but the compiler has inserted a really subtle race condition into the code (deallocated a local stack entry before last use). We had that with our semaphore code at some point. Yeah, magnifying glass + lots of time kind of bug. Lovely :-/ Segher
Re: [PATCH] make atomic_t volatile on all architectures
You'd have to use +m. Yes, though I would use =m on the output list and m on the input list. The reason is that I've seen gcc fall on its face with an ICE on s390 due to +m. The explanation I've got from our compiler people was quite esoteric, as far as I remember gcc splits +m to an input operand and an output operand. Now it can happen that the compiler chooses two different registers to access the same memory location. +m requires that the two memory references are identical which causes the ICE if they are not. The problem is very nicely described here, last paragraph: http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01816.html It's not a problem anymore in (very) recent GCC, although that of course won't help you in the kernel (yet). I do not know if the current compilers still do this. Has anyone else seen this happen ? In recent GCC, it's actually documented: The ordinary output operands must be write-only; GCC will assume that the values in these operands before the instruction are dead and need not be generated. Extended asm supports input-output or read-write operands. Use the constraint character `+' to indicate such an operand and list it with the output operands. You should only use read-write operands when the constraints for the operand (or the operand in which only some of the bits are to be changed) allow a register. Note that last line. Segher
Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
The compiler is within its rights to read a 32-bit quantity 16 bits at a time, even on a 32-bit machine. I would be glad to help pummel any compiler writer that pulls such a dirty trick, but the C standard really does permit this. Code all over the kernel assumes that 32-bit reads/writes are atomic so while such a compiler might be legal it certainly can't compile Linux. That means GCC cannot compile Linux; it already optimises some accesses to scalars to smaller accesses when it knows it is allowed to. Not often though, since it hardly ever helps in the cost model it employs. Segher
Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
That means GCC cannot compile Linux; it already optimises some accesses to scalars to smaller accesses when it knows it is allowed to. Not often though, since it hardly ever helps in the cost model it employs. Please give an example code snippet + gcc version + arch to back this up.

unsigned char f(unsigned long *p) { return *p & 1; }

with both powerpc64-linux-gcc (GCC) 4.3.0 20070731 (experimental) and powerpc64-linux-gcc-4.2.0 (GCC) 4.2.0 (sorry, I don't have anything newer or older right now; if you really care, I can test with those too) generate (in 64-bit mode):

.L.f:
	lbz 3,7(3)
	rldicl 3,3,0,63
	blr

and in 32-bit mode:

f:
	stwu 1,-16(1)
	nop
	nop
	lbz 3,3(3)
	addi 1,1,16
	rlwinm 3,3,0,31,31
	blr

(the nops are because I use --with-cpu=970). But perhaps you do not care for PowerPC, in which case: i686-linux-gcc (GCC) 4.2.0 20060410 (experimental) (sorry for the old version, I don't build x86 compilers all that often; also I don't have a 64-bit version right now):

f:
	pushl %ebp
	movl %esp, %ebp
	movl 8(%ebp), %eax
	popl %ebp
	movzbl (%eax), %eax
	andl $1, %eax
	ret

If you want testing with any other versions, and/or for any other target architecture, I can do that; it takes a few minutes to build a compiler. It is quite hard to build a testcase that reads more than one part of the long, since for small testcases the compiler will almost always be smart enough to do one bigger read instead; but it certainly isn't inconceivable, and anyway the compiler would be fully in its right to do reads non-atomically if not instructed otherwise. Segher
Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
That means GCC cannot compile Linux; it already optimises some accesses to scalars to smaller accesses when it knows it is allowed to. Not often though, since it hardly ever helps in the cost model it employs. Please give an example code snippet + gcc version + arch to back this up. unsigned char f(unsigned long *p) { return *p & 1; } This doesn't really matter since we only care about the LSB. It is exactly what I claimed, and what you asked proof of. Do you have an example where gcc reads it non-atomically and we care about all parts? Like I explained in the original mail; no, I suspect such a testcase will be really hard to construct, esp. as a small testcase. I have no reason to believe it is impossible to do so though -- maybe someone else can write trickier code than I can, in which case, please do so. Segher
Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
We can't have split stores because we don't use atomic64_t on 32-bit architectures. That's not true; the compiler is free to split all stores (and reads) from memory however it wants. It is debatable whether volatile would prevent this as well, certainly it is unsafe if you want to be portable. GCC will do its best to not split volatile memory accesses, but bugs in this area do happen a lot (because the compiler code for volatile isn't as well exercised as most other compiler code, and because it is simply a hard subject; and there is no real formalised model for what GCC should do). The only safe way to get atomic accesses is to write assembler code. Are there any downsides to that? I don't see any. Segher
Re: [PATCH 24/24] document volatile atomic_read() behavior
Historically this has been +accomplished by declaring the counter itself to be volatile, but the +ambiguity of the C standard on the semantics of volatile make this practice +vulnerable to overly creative interpretation by compilers. It's even worse when accessing through a volatile casted pointer; see for example the recent(*) GCC bugs in that area. (*) Well, not _all_ that recent. No one should be using the 3.x series anymore, right? Explicit +casting in atomic_read() ensures consistent behavior across architectures +and compilers. Even modulo compiler bugs, what makes you believe that? Segher
Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
The only safe way to get atomic accesses is to write assembler code. Are there any downsides to that? I don't see any. The assumption that aligned word reads and writes are atomic, and that words are aligned unless explicitly packed otherwise, is endemic in the kernel. No sane compiler violates this assumption. It's true that we're not portable to insane compilers after this patch, but we never were in the first place. You didn't answer my question: are there any downsides to using explicit coded-in-assembler accesses for atomic accesses? You can handwave all you want that it should just work with volatile accesses, but volatility != atomicity, volatile in C is really badly defined, GCC never officially gave stronger guarantees, and we have a bugzilla full of PRs to show what a minefield it is. So, why not use the well-defined alternative? Segher
Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
The compiler is within its rights to read a 32-bit quantity 16 bits at a time, even on a 32-bit machine. I would be glad to help pummel any compiler writer that pulls such a dirty trick, but the C standard really does permit this. Yes, but we don't write code for these compilers. There are countless pieces of kernel code which would break in this condition, and there doesn't seem to be any interest in fixing this. Other things are broken too. Great argument :-) Sequence points enforce read-after-write ordering, not write-after-write. Sequence points order *all* side effects; sequence points exist in the domain of the abstract sequential model of the C language only. The compiler translates that to machine code that is equivalent to that C code under the as-if rule; but this is still in that abstract model, which doesn't include things such as SMP, visibility by I/O devices, store queues, etc. etc. We flush writes with reads for MMIO because of this effect as well as the CPU/bus effects. You cannot flush all MMIO writes with reads; this is a PCI-specific thing. And even then, you need more than just the read itself: you have to make sure the read completed and returned data. In short, please retain atomic_set()'s volatility, especially on those architectures that declared the atomic_t's counter to be volatile. Like i386 and x86_64? These used to have volatile in the atomic_t declaration. We removed it, and the sky did not fall. And this proves what? Lots of stuff works by accident. Segher
Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
The only safe way to get atomic accesses is to write assembler code. Are there any downsides to that? I don't see any. The assumption that aligned word reads and writes are atomic, and that words are aligned unless explicitly packed otherwise, is endemic in the kernel. No sane compiler violates this assumption. It's true that we're not portable to insane compilers after this patch, but we never were in the first place. You didn't answer my question: are there any downsides to using explicit coded-in-assembler accesses for atomic accesses? You can handwave all you want that it should just work with volatile accesses, but volatility != atomicity, volatile in C is really badly defined, GCC never officially gave stronger guarantees, and we have a bugzilla full of PRs to show what a minefield it is. So, why not use the well-defined alternative? Because we don't need to, You don't need to use volatile objects, or accesses through volatile-cast pointers, either. and it hurts performance. Please show how it does this -- one load is one load either way, and one store is one store. Segher
Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
If you need to guarantee that the value is written to memory at a particular time in your execution sequence, you either have to read it from memory to force the compiler to store it first That isn't enough. The CPU will happily read the datum back from its own store queue before it ever hit memory or cache or any external bus. If you need the value to be globally visible before another store, you use wmb(); if you need it before some read, you use mb(). These concepts are completely separate from both atomicity and volatility (which aren't the same themselves). No, but I can reason about it and be confident that this won't break anything that isn't already broken. At worst, this patch will make any existing subtly incorrect uses of atomic_t much more obvious and easier to track down. PR22278, PR29753 -- both P2 bugs, and both about basic *(volatile *) usage. Yeah, it's definitely not problematic, esp. not if you want to support older compilers than 4.2 too. /sarcasm Segher
Re: [PATCH 24/24] document volatile atomic_read() behavior
Explicit +casting in atomic_read() ensures consistent behavior across architectures +and compilers. Even modulo compiler bugs, what makes you believe that? When you declare a variable volatile, you don't actually tell the compiler where you want to override its default optimization behavior, giving it some freedom to guess your intentions incorrectly. When you put the cast on the data access itself, there is no question about precisely where in the code you want to override the compiler's default optimization behavior. ...except for the small point that this isn't how volatile works. Rule of thumb: even people who know the semantics of volatile shouldn't use it. If the compiler doesn't do what you want with a volatile declaration, it might have a plausible excuse in the ambiguity of the C standard. If the compiler doesn't do what you want in a cast specific to a single dereference, it's just plain broken. The other way around. volatile has pretty weak semantics, and certainly not the semantics you think it has, or that you wish it had; but *(volatile XX *) doesn't have *any* semantics. However much you cast that pointer it still doesn't point to a volatile object. GCC will generally take the path of least surprise and perform a volatile access anyway, but this has only been decided recently (a year ago or so), and there very likely still are some bugs in that area. We try to be compatible with plausibly correct compilers, but if they're completely broken, we're screwed no matter what. If you don't know what to expect, you're screwed for sure. Anyway, what's the supposed advantage of *(volatile *) vs. using a real volatile object? That you can access that same object in a non-volatile way? Segher
Re: [PATCH 24/24] document volatile atomic_read() behavior
Anyway, what's the supposed advantage of *(volatile *) vs. using a real volatile object? That you can access that same object in a non-volatile way? That's my understanding. That way accesses where you don't care about volatility may be optimised. But those accesses might be done non-atomically then (for example, if the compiler knows it only needs to write one byte, it might not bother writing the rest), so that's no good if you want to read the thing atomically too. For instance, in cases where there are already other things controlling visibility (as are needed for atomic increment, for example) you don't need to make the access itself volatile. Hrm, you mean within a lock or similar? You'll get the same semantics as volatile anyway there. Segher
Re: [PATCH 24/24] document volatile atomic_read() behavior
Anyway, what's the supposed advantage of *(volatile *) vs. using a real volatile object? That you can access that same object in a non-volatile way? You'll have to take that up with Linus and the minds behind Volatile Considered Harmful, but the crux of it is that volatile objects are prone to compiler bugs too, and if we have to track down a compiler bug, it's a lot easier when we know exactly where the load is supposed to be because we deliberately put it there, rather than letting the compiler re-order everything that lacks a strict data dependency and trying to figure out where in a thousand lines of assembler the compiler should have put the load for the volatile object. So, why not do the access explicitly via an inline asm? It generates the same code, it's obviously correct, and it's even *actually* correct. Plus, you get good compiler support (and compiler people support). If we're going to assume that the compiler has bugs we'll never be able to find, we all need to find new careers. If we cannot find the bug in finite time, we cannot observe the bug in finite time either, so either way that's fine :-) If we're going to assume that it has bugs we *can* find, then let's use code that makes it easier to do that. And I'm saying this is a step in the wrong direction for that. I initially proposed a patch that made all the objects volatile, on the grounds that this was a special case where there wasn't much room to have a misunderstanding that resulted in anything worse than wasted loads. Linus objected, and now that I've seen all the responses to the new patchset, I understand exactly why. If our compilers really suck as much as everyone says they do, it'll be much easier to detect that with volatile casts than with volatile declarations. Except that accesses via volatile pointer casts open up a whole new can of worms. 
Segher
Re: [PATCH 1/24] make atomic_read() behave consistently on alpha
So, why not use the well-defined alternative? Because we don't need to, and it hurts performance. It hurts performance by implementing 32-bit atomic reads in assembler? No, I misunderstood the question. Implementing 32-bit atomic reads in assembler is redundant, because any sane compiler, *particularly* an optimizing compiler (and we're only in this mess because of optimizing compilers) Oh please, don't tell me you don't want an optimising compiler. And if you _do_ want one, well you're in this mess because you chose C as implementation language and C has some pretty strange rules. Trying to use not-all-that-well-defined-and-completely-misunderstood features of the language doesn't make things easier; trying to use something that isn't even part of the language and that your particular compiler originally supported by accident, and that isn't yet an officially supported feature, and that on top of it all has a track record of problems -- well it makes me wonder if you're in this game for fun or what. will give us that automatically without the assembler. No, it does *not* give it to you automatically; you have to do either the asm() thing, or the not-defined-at-all *(volatile *) thing. Yes, it is legal for a compiler to violate this assumption. It is also legal for us to refuse to maintain compatibility with compilers that suck this badly. So that's rm include/linux/compiler-gcc*.h then. Good luck with the intel compiler, maybe it works more to your liking. Segher
Re: [PATCH] ucc_geth.c, make PHY device optional.
How about separate autoneg to a property dumb-phy, which indicates the PHY/switch doesn't provide MII register interface. Something like that I suppose. But don't call it dumb phy, nor fake phy, nor anything similar -- there simply is _no_ phy. If the Linux code wants to pretend there is one, that's one thing, but there is no need to do any of this trickery in the device tree. Therefore, it should use the fixed speed and duplex from device node rather than registers. Yes. Segher
Re: [Cbe-oss-dev] [PATCH 0/15] spidernet driver bug fixes
I wish there was a git option to just make my shit look like the remote, dammit! The above is the easiest way I know how to do that. git-fetch -f remote:local ? Segher
Re: [Cbe-oss-dev] [PATCH 4/10] spidernet: zero out a pointer.
If you're going to be paranoid, shouldn't you do something here to make sure the value's hit the device? I thought the whole point of paranoia is that it's inexplicable. Here's a delusional reply: I didn't see any point to it. 1) a wmb would add overhead A wmb() doesn't guarantee the write has reached the device. 2) the hardware is supposed to be looking at the status flag, anyway, and not misbehaving. But you're paranoid, right? Can't trust that device! :-) Segher
Re: [PATCH v2] gianfar: Add I/O barriers when touching buffer descriptor ownership.
Well, Segher doesn't want me to use iobarrier (because it's not I/O). Andy doesn't want me to use wmb() (because it's sync). I don't think something like gfar_wmb() would be appropriate. So the remaining options are either eieio(), ? Just curious... the original intent of eieio was to order I/O, such as MMIO; it has no effect on memory that isn't marked cache-inhibited or write-through or guarded. Has this changed? eieio orders all accesses to address space that is WIMG=1xxx or WIMG=x1x1; separately, it orders stores to address space that is WIMG=001x. I guess I haven't kept up with the times ... is eieio now being used to provide some other kind of barrier? Nothing changed. Is eieio providing some sort of SMP synchronization side-effect? It orders stores to well-behaved memory yes. Point being: if Segher doesn't let you use iobarrier (because it's not I/O), then I don't understand why eieio would work (since that's for io only). iobarrier() is a kernel-level primitive, meant for ordering I/O only, as its name indicates. eieio is a CPU insn that orders stores to main memory (amongst other things), not that its name would tell you. Segher
Re: [PATCH v2] gianfar: Add I/O barriers when touching buffer descriptor ownership.
So what about something like this where we do the read only once? - k

diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index a06d8d1..9cd7d1e 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -1438,31 +1438,35 @@ int gfar_clean_rx_ring(struct net_device *dev, int rx_work_limit)
 {
 	struct rxbd8 *bdp;
 	struct sk_buff *skb;
-	u16 pkt_len;
+	u16 pkt_len, status;
+	u32 bd_info;

I suggested that on IRC yesterday, and Segher was concerned that the compiler might, in theory, optimize it into two lhz instructions. Yes. The same is true of the original code btw, but since you test only one bit there, all is fine. I'm rather skeptical that it would actually do so (even if it needs to load twice due to register pressure, why not just use lwz both times?), Sure. That doesn't make this code correct though. and there's probably many other places that would break if it did, Most other network drivers read from an MMIO reg to see which RX ring entries are kernel owned AFAICS. but I wasn't up for digging around GCC to prove otherwise. It doesn't matter what current GCC does -- simply look at what it is *allowed* to do instead. If you want a 32-bit read to be atomic, you should do the read via a (volatile u32 *). Doing this with a cast in the places where you need the atomic access makes sure you don't get unnecessary rereads. Plus, that wouldn't synchronize the bd_info read with the buffer data reads. Yes, you still need the rmb(). Segher
Re: [PATCH] gianfar: Add I/O barriers when touching buffer descriptor ownership.
AFAICS you need stronger barriers though; {w,r,}mb(), to prevent _any_ reordering of those memory accesses, not just the compiler-generated ones. My impression was that the eieio used by iobarrier would be sufficient for that, as we're not trying to synchronize between accesses to different types of memory. Is sync really required here? For accesses to main system memory, eieio only orders writes, not reads, so iobarrier_r() doesn't do what you want; and iobarrier_w() isn't meant to be used for main memory access ordering either. Also, it is better to not use powerpc-specific interfaces in a device driver if you don't have a strong reason to. Segher
Re: [PATCH v2] gianfar: Add I/O barriers when touching buffer descriptor ownership.
And the driver is already ppc-specific; it uses in/out_be32. True, but it's hidden behind the gfar_read/write accessors. Your change is a bit more blatant. Well, Segher doesn't want me to use iobarrier (because it's not I/O). Andy doesn't want me to use wmb() (because it's sync). You should use wmb(), but unfortunately too strong semantics are required for that (ordering wrt I/O) so it's a full sync on PowerPC. I don't believe a priori that that would be notably slower, but if it actually is, you could use eieio() I suppose since you say the driver is powerpc specific -- but please put a comment in the source code then saying why you don't use wmb() there. Segher
Re: [PATCH] gianfar: Add I/O barriers when touching buffer descriptor ownership.
The hardware must not see that it is given ownership of a buffer until it is completely written, and when the driver receives ownership of a buffer, it must ensure that any other reads to the buffer reflect its final state. Thus, I/O barriers are added where required. Without this patch, I have observed GCC reordering the setting of bdp->length and bdp->status in gfar_new_skb. The ::: "memory" in the barriers you used prevents GCC from reordering accesses around the barriers. AFAICS you need stronger barriers though; {w,r,}mb(), to prevent _any_ reordering of those memory accesses, not just the compiler-generated ones. Segher
Re: [RFC] div64_64 support
I thought the motivation for div64() was that a 64:32->32 divide could be done a lot faster on a number of platforms (including the important x86) than a generic 64:64->64 divide, but gcc doesn't handle the devolution automatically -- there is no such libgcc function. That there's no such function in libgcc doesn't mean GCC cannot handle this; libgcc is a bunch of library functions that are really needed for generated code (because you really don't want to expand those functions inline everywhere) -- you won't find an addsi3 in libgcc either. There does exist a divmoddisi4, sort of. It used to be defined in three GCC targets, but commented out in all three. The NS32k is long gone. For Vax, a comment says the machine insn for this isn't used because it is just too slow. And the i386 version is disabled because it returns the wrong result on overflow (not the truncated 64-bit result, required by the implicit cast to 32-bit, but the i386 arch traps to the overflow handler). The only way to express the semantics you want in (GNU) C is to use asm() -- and that's exactly what div64() does :-) Blame it on the C language, but not on GCC. Not this time. Segher
Re: [PATCH] Remove powerpc specific parts of 3c509 driver
Sure, PCI busses are little-endian. But is readX()/writeX() for PCI only? Yes. For other buses, use foo_writel(), etc. Can this please be documented then? Never heard this before... You have come late to the party. What do you mean here? Could you please explain? This has been the case for many, many years. No, it was never documented AFAICS. And there is no point in a massive rename to pci_writel(), either. That would be really inconvenient, sure. It's also inconvenient that all the nice short names are PCI-only. Segher
Re: [PATCH] Remove powerpc specific parts of 3c509 driver
Nah. We have the basic rule that readl/writel are little endian. PowerPC additionally provides arch specific low level in_{be,le}32 type accessors with explicit endianness. Or you can also use cpu_to_le32/le32_to_cpu kind of macros to convert between native and explicit endianness. Sure, PCI busses are little-endian. But is readX()/writeX() for PCI only? I sure hope not. It would make a lot more sense if readX()/writeX() used the endianness of the bus they are performed on. PowerPC byteswaps are cheap -- for 16- and 32-bit accesses. They're quite bad for 64-bit though; it would be a pity to end up doing two of those for a 64-bit big-endian I/O access (one on the access itself, one to convert the data back to CPU order). This would happily solve the problem of the various variations of byte-swapping bus bridges, too (natural swap, 32-bit swap, 64-bit swap, perhaps others that I thankfully have never seen or cannot remember). Now you can say, use readl_be() or something similar, but that's a) ugly, b) error-prone, c) exponential interface explosion, d) ugly. Segher
Re: [PATCH] Remove powerpc specific parts of 3c509 driver
>> Sure, PCI busses are little-endian. But is readX()/writeX() for PCI
>> only? I sure hope not.
> It's defined for PCI and possibly ISA memory. You can use it for other
> things if you wish to, but other things are arch specific in any case.

Huh? You're saying that only PCI and ISA are standardised busses?

>> It would make a lot more sense if readX()/writeX() used the endianness
>> of the bus they are performed on.
> No way! Again, it's evil if such a simple thing starts doing different
> things depending on random external factors.

That's your opinion, yes. I'm saying it's *not* doing different things: in
both cases it just does the correct-endian access. Also it doesn't depend
on random external factors -- they're not random factors, and not external
either: it only depends on the bus the access is done on.

> Different bus - different accessor.

Then please rename readX()/writeX() to pci_readX()/pci_writeX().

>> Now you can say, use readl_be() or something similar, but that's
>> a) ugly, b) error-prone, c) exponential interface explosion, d) ugly.
> I'd rather have an interface explosion than having black endian magic
> happening inside of the accessors.

Any comments on a), b) and d) as well?

Segher
Re: [PATCH] Remove powerpc specific parts of 3c509 driver
>> Sure, PCI busses are little-endian. But is readX()/writeX() for PCI
>> only?
> Yes. For other buses, use foo_writel(), etc.

Can this please be documented then? Never heard this before...

Segher
Re: [PATCH] Remove powerpc specific parts of 3c509 driver
> Well, I'm having trouble thinking of other busses that have as strong a
> sense of the address-data style I/O as PCI. Busses like scsi and ide are
> primarily command-data or data-data in style. Only the address-data style
> busses need readl/writel-style routines.

SBUS, JBUS, VMEbus, NuBus, RapidIO, eBus, various SoC busses, to name a
few. There are many, and most are big-endian.

Segher
Re: TG3 data corruption (TSO ?)
>> #define tw32_rx_mbox(reg, val) do { wmb(); tp->write32_rx_mbox(tp, reg, val); } while(0)
>> #define tw32_tx_mbox(reg, val) do { wmb(); tp->write32_tx_mbox(tp, reg, val); } while(0)
>>
>> That should do it.
>>
>> I think we need those tcpdump after all. Can you send it to me?
> Looks like adding a sync to writel does fix it though... I'm trying to
> figure out which specific writel in the driver makes a difference. I'll
> then look into slicing those tcpdumps.

Adding a PowerPC sync to every writel() will slow down things by so much
that it could well just hide the problem, not solve it...

Michael, we see this problem only with TSO on, and then the failure we see
is bad data being sent out, with the correct header (but the header is
constructed by the tg3 in this case, so no surprise). I'm theorizing that
this same failure can happen with TSO off as well, but the header sent on
the wire will be bogus as well as the data, so the receiving NIC won't
pick it up. tcpdump probably won't see it either... will need a low-level
ethernet analyser.

Segher
Re: TG3 data corruption (TSO ?)
>> I've been chasing with Segher a data corruption problem lately.
>> Basically transferring huge amount of data (several Gb) and I get
>> corrupted data at the rx side. I cannot tell for sure whether what I've
>> been observing here is the same problem that Segher's been seeing on
>> his blades, he will confirm or not. He also seemed to imply that
>> reverting to an older kernel on the -receiver- side fixed it, which
>> makes me wonder, since it looks really like a sending side problem (see
>> explanation below), if some change in, for example, window scaling,
>> might hide or trigger it.
> Please send me lspci and tg3 probing output so that I know what tg3
> hardware you're using.

I use a 5780 rev. A3, but the problem is not limited to this chip.

> I also want to look at the tcpdump or ethereal on the mirrored port that
> shows the packet being corrupted.

I don't have such, sorry.

>> That's all the data I have at this point. I can't guarantee 100% that
>> it's a TSO bug (it might be a bug that TSO renders visible due to
>> timing effects) but it looks like it since I've not reproduced yet with
>> TSO disabled.

It seems indeed to only be exposed by TSO, not actually a bug of it /an
sich/.

I've got a patch that seems to solve the problem, it needs more testing
though (maybe Ben can do this :-) ). The problem is that there should be
quite a few wmb()'s in the code that are just not there; adding some to
tg3_set_txd() seems to fix the immediate problem but more is needed (and I
don't see why those should be needed, unless tg3_set_txd() is updating a
live ring entry in place or something like that). More testing is needed,
but the problem is definitely the lack of memory ordering.

Segher
Re: sky2 problem on powerpc
> The patch has a couple of places where I reversed 2 assignments, they
> are harmless, it was before I figured out that the chip will
> (apparently) not access a descriptor before it's been told to do so via
> MMIO, and thus the order of the writes to the descriptors is irrelevant
> (I was also adding wmb's though I removed them).

wmb() between writing the descriptors to main memory and writing an MMIO
to the device to tell it to go fetch them? That's still required. The 970
you tested on won't care though ;-)

> There is a couple of places where we were doing a BE and not LE
> conversion of a descriptor field (typically in the VLAN code). I'm not
> sure what's up there but BE felt wrong. I have turned them into LE
> conversions but then I haven't tested VLAN, and I might just
> misunderstand what's happening there so I'll let you decide what to do
> about those.

VLAN tags are big-endian on the wire, it might well be that these chips
never swizzle that?

> --- linux-work.orig/drivers/net/sky2.c	2006-09-04 17:35:20.0 +1000
> +++ linux-work/drivers/net/sky2.c	2006-09-04 17:41:50.0 +1000
> @@ -811,8 +811,9 @@ static void rx_set_checksum(struct sky2_
>  	struct sky2_rx_le *le;
>
>  	le = sky2_next_rx(sky2);
> -	le->addr = (ETH_HLEN << 16) | ETH_HLEN;
> +	le->addr = cpu_to_le32((ETH_HLEN << 16) | ETH_HLEN);
>  	le->ctrl = 0;
> +	le->length = 0;
>  	le->opcode = OP_TCPSTART | HW_OWNER;

Is this le->length thing a separate bugfix? Or just overly protective
coding (you write the field later again).

> +	/* Don't let it byte swap descriptors in hardware. Clear the bit
> +	 * in case a previous driver have set it
> +	 */
> +	reg = sky2_pci_read32(hw, PCI_DEV_REG2);
> +	reg &= ~PCI_REV_DESC;
> +	sky2_pci_write32(hw, PCI_DEV_REG2, reg);

Probably not needed, the chip gets a full reset at startup (or so I hope
anyway, heh).

Segher