Re: [RFC][PATCH 0/5] arch: atomic rework
On Tue, 2014-02-18 at 23:48 +, Peter Sewell wrote: On 18 February 2014 20:43, Torvald Riegel trie...@redhat.com wrote: On Tue, 2014-02-18 at 12:12 +, Peter Sewell wrote: Several of you have said that the standard and compiler should not permit speculative writes of atomics, or (effectively) that the compiler should preserve dependencies. In simple examples it's easy to see what that means, but in general it's not so clear what the language should guarantee, because dependencies may go via non-atomic code in other compilation units, and we have to consider the extent to which it's desirable to limit optimisation there. [...] 2) otherwise, the language definition should prohibit it but the compiler would have to preserve dependencies even in compilation units that have no mention of atomics. It's unclear what the (runtime and compiler development) cost of that would be in practice - perhaps Torvald could comment? If I'm reading the standard correctly, it requires that data dependencies are preserved through loads and stores, including nonatomic ones. That sounds convenient because it allows programmers to use temporary storage. The standard only needs this for consume chains, That's right, and the runtime cost / implementation problems of mo_consume was what I was making statements about. Sorry if that wasn't clear.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Tue, 2014-02-18 at 22:52 +0100, Peter Zijlstra wrote: 4.Some drivers allow user-mode code to mmap() some of their state. Any changes undertaken by the user-mode code would be invisible to the compiler. A good point, but a compiler that doesn't try to (incorrectly) assume something about the semantics of mmap will simply see that the mmap'ed data will escape to stuff if can't analyze, so it will not be able to make a proof. This is different from, for example, malloc(), which is guaranteed to return fresh nonaliasing memory. The kernel side of this is different.. it looks like 'normal' memory, we just happen to allow it to end up in userspace too. But on that point; how do you tell the compiler the difference between malloc() and mmap()? Is that some function attribute? Yes: malloc The malloc attribute is used to tell the compiler that a function may be treated as if any non-NULL pointer it returns cannot alias any other pointer valid when the function returns and that the memory has undefined content. This often improves optimization. Standard functions with this property include malloc and calloc. realloc-like functions do not have this property as the memory pointed to does not have undefined content. I'm not quite sure whether GCC assumes malloc() to be indeed C's malloc even if the function attribute isn't used, and/or whether that is different for freestanding environments.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Tue, 2014-02-18 at 14:58 -0800, Paul E. McKenney wrote: On Tue, Feb 18, 2014 at 10:40:15PM +0100, Torvald Riegel wrote: xagsmtp4.20140218214207.8...@vmsdvm9.vnet.ibm.com X-Xagent-Gateway: vmsdvm9.vnet.ibm.com (XAGSMTP4 at VMSDVM9) On Tue, 2014-02-18 at 09:16 -0800, Paul E. McKenney wrote: On Tue, Feb 18, 2014 at 08:49:13AM -0800, Linus Torvalds wrote: On Tue, Feb 18, 2014 at 7:31 AM, Torvald Riegel trie...@redhat.com wrote: On Mon, 2014-02-17 at 16:05 -0800, Linus Torvalds wrote: And exactly because I know enough, I would *really* like atomics to be well-defined, and have very clear - and *local* - rules about how they can be combined and optimized. Local? Yes. So I think that one of the big advantages of atomics over volatile is that they *can* be optimized, and as such I'm not at all against trying to generate much better code than for volatile accesses. But at the same time, that can go too far. For example, one of the things we'd want to use atomics for is page table accesses, where it is very important that we don't generate multiple accesses to the values, because parts of the values can be change *by*hardware* (ie accessed and dirty bits). So imagine that you have some clever global optimizer that sees that the program never ever actually sets the dirty bit at all in any thread, and then uses that kind of non-local knowledge to make optimization decisions. THAT WOULD BE BAD. Might as well list other reasons why value proofs via whole-program analysis are unreliable for the Linux kernel: 1.As Linus said, changes from hardware. This is what's volatile is for, right? (Or the weak-volatile idea I mentioned). Compilers won't be able to prove something about the values of such variables, if marked (weak-)volatile. Yep. 2.Assembly code that is not visible to the compiler. Inline asms will -normally- let the compiler know what memory they change, but some just use the memory tag. Worse yet, I suspect that most compilers don't look all that carefully at .S files. Any number of other programs contain assembly files. Are the annotations of changed memory really a problem? If the memory tag exists, isn't that supposed to mean all memory? To make a proof about a program for location X, the compiler has to analyze all uses of X. Thus, as soon as X escapes into an .S file, then the compiler will simply not be able to prove a thing (except maybe due to the data-race-free requirement for non-atomics). The attempt to prove something isn't unreliable, simply because a correct compiler won't claim to be able to prove something. I am indeed less worried about inline assembler than I am about files full of assembly. Or files full of other languages. One reason that could corrupt this is that if program addresses objects other than through the mechanisms defined in the language. For example, if one thread lays out a data structure at a constant fixed memory address, and another one then uses the fixed memory address to get access to the object with a cast (e.g., (void*)0x123). Or if the program uses gcc linker scripts to get the same effect. 3.Kernel modules that have not yet been written. Now, the compiler could refrain from trying to prove anything about an EXPORT_SYMBOL() or EXPORT_SYMBOL_GPL() variable, but there is currently no way to communicate this information to the compiler other than marking the variable volatile. Even if the variable is just externally accessible, then the compiler knows that it can't do whole-program analysis about it. It is true that whole-program analysis will not be applicable in this case, but it will not be unreliable. I think that's an important difference. Let me make sure that I understand what you are saying. If my program has extern int foo;, the compiler will refrain from doing whole-program analysis involving foo? Yes. If it can't be sure to actually have the whole program available, it can't do whole-program analysis, right? Things like the linker scripts you mention or other stuff outside of the language semantics complicates this somewhat, and maybe some compilers assume too much. There's also the point that data-race-freedom is required for non-atomics even if those are shared with non-C-code. But except those corner cases, a compiler sees whether something escapes and becomes visible/accessible to other entities. Or to ask it another way, when you say whole-program analysis, are you restricting that analysis to the current translation unit? No. I mean, you can do analysis of the current translation unit, but that will do just that; if the variable, for example, is accessible outside of this translation unit, the compiler can't make a
Re: [RFC][PATCH 0/5] arch: atomic rework
On Tue, 2014-02-18 at 22:47 +0100, Peter Zijlstra wrote: On Tue, Feb 18, 2014 at 10:21:56PM +0100, Torvald Riegel wrote: Yes, I do. But that seems to be volatile territory. It crosses the boundaries of the abstract machine, and thus is input/output. Which fraction of your atomic accesses can read values produced by hardware? I would still suppose that lots of synchronization is not affected by this. Its not only hardware; also the kernel/user boundary has this same problem. We cannot a-priory say what userspace will do; in fact, because we're a general purpose OS, we must assume it will willfully try its bestest to wreck whatever assumptions we make about its behaviour. That's a good note, and I think a distinct case from those below, because here you're saying that you can't assume that the userspace code follows the C11 semantics ... We also have loadable modules -- much like regular userspace DSOs -- so there too we cannot say what will or will not happen. We also have JITs that generate code on demand. .. whereas for those, you might assume that the other code follows C11 semantics and the same ABI, which makes this just a normal case already handled as (see my other replies nearby in this thread). And I'm absolutely sure (with the exception of the JITs, its not an area I've worked on) that we have atomic usage across all those boundaries. That would be fine as long as all involved parties use the same memory model and ABI to implement it. (Of course, I'm assuming here that the compiler is aware of sharing with other entities, which is always the case except in those corner case like accesses to (void*)0x123 magically aliasing with something else).
Re: [RFC][PATCH 0/5] arch: atomic rework
On Wed, Feb 19, 2014 at 12:07:02PM +0100, Torvald Riegel wrote: Its not only hardware; also the kernel/user boundary has this same problem. We cannot a-priory say what userspace will do; in fact, because we're a general purpose OS, we must assume it will willfully try its bestest to wreck whatever assumptions we make about its behaviour. That's a good note, and I think a distinct case from those below, because here you're saying that you can't assume that the userspace code follows the C11 semantics ... Right; we can malfunction in those cases though; as long as the malfunctioning happens on the userspace side. That is, whatever userspace does should not cause the kernel to crash, but userspace crashing itself, or getting crap data or whatever is its own damn fault for not following expected behaviour. To stay on topic; if the kernel/user interface requires memory ordering and userspace explicitly omits the barriers all malfunctioning should be on the user. For instance it might loose a fwd progress guarantee or data integrity guarantees. In specific, given a kernel/user lockless producer/consumer buffer, if the user-side allows the tail write to happen before its data reads are complete, the kernel might overwrite the data its still reading. Or in case of futexes, if the user side doesn't use the appropriate operations its lock state gets corrupt but only userspace should suffer. But yes, this does require some care and consideration from our side.
Stack layout change during lra
Vlad, When fixing PR60169, I found that reload fail to assert verify_initial_elim_offsets () if (insns_need_reload != 0 || something_needs_elimination || something_needs_operands_changed) { HOST_WIDE_INT old_frame_size = get_frame_size (); reload_as_needed (global); gcc_assert (old_frame_size == get_frame_size ()); gcc_assert (verify_initial_elim_offsets ()); } The reason is that stack layout changes during reload_as_needed as a result of a thumb1 backend heuristic. I have a patch to make sure the heuristic doesn't change stack layout during and after reload, and the assertion disappeared. However, I'm not sure if it will also be a problem in lra. Here is the question more specific: Is that any chance during lra_in_progress that: stack layout can no longer be altered, but insns can still be added or changed? Thanks, Joey
Re: ARM inline assembly usage in Linux kernel
Andrew Pinski pins...@gmail.com writes: On Tue, Feb 18, 2014 at 6:56 PM, Saleem Abdulrasool compn...@compnerd.org wrote: Hello. I am sending this at the behest of Renato. I have been working on the ARM integrated assembler in LLVM and came across an interesting item in the Linux kernel. I am wondering if this is an unstated covenant between the kernel and GCC or simply a clever use of an unintended/undefined behaviour. The Linux kernel uses the *compiler* as a fancy preprocessor to generate a specially crafted assembly file. This file is then post-processed via sed to generate a header containing constants which is shared across assembly and C sources. In order to clarify the question, I am selecting a particular example and pulling out the relevant bits of the source code below. #define DEFINE(sym, val) asm volatile(\n- #sym %0 #val : : i (val)) #define __NR_PAGEFLAGS 22 void definitions(void) { DEFINE(NR_PAGEFLAGS, __NR_PAGEFLAGS); } This is then assembled to generate the following: -NR_PAGEFLAGS #22 __NR_PAGEFLAGS This will later be post-processed to generate: #define NR_PAGELAGS 22 /* __NR_PAGEFLAGS */ By using the inline assembler to evaluate (constant) expressions into constant values and then emit that using a special identifier (-) is a fairly clever trick. This leads to my question: is this just use of an unintentional feature or something that was worked out between the two projects. If the output is being post-processed by sed then maybe you could put a comment character at the beginning of the line and sed it out? But I tend to agree with Andrew that for -S output the compiler should be prepared to accept asm strings that it can't parse, even if the integrated assembler thinks it understands every instruction. I don't see why this is a bad use of the inline-asm. GCC does not know and is not supposed to know what the string inside the inline-asm is going to be. In fact if you have a newer assembler than the compiler, you could use instructions that GCC does not even know about. Yeah, FWIW, I agree this is a valid use of inline asm. The use of volatile in a reachable part of definitions() means that the asm (and thus the asm string) must be kept if definitions() is kept. I doubt the idea was agreed with GCC developers because no GCC changes were needed to use inline asm this way. This is the purpose of inline-asm. I think it was a bad design decision on LLVM/clang's part that it would check the assembly code up front. Being able to parse it is a useful feature. E.g. it means you can get an accurate byte length for the asm, which is something that we otherwise have to guess by multiplying the number of lines by a constant factor. (And that's wrong for MIPS assembly macros, unless you use a very conservative constant factor.) I agree that having an unrecognised asm shouldn't be a hard error until assembly time though. Saleem, is the problem that this is being rejected earlier? Thanks, Richard
Re: TYPE_BINFO and canonical types at LTO
On Tue, 18 Feb 2014, Jan Hubicka wrote: Non-ODR types born from other frontends will then need to be made to alias all the ODR variants that can be done by storing them into the current canonical type hash. (I wonder if we want to support cross language aliasing for non-POD?) Surely for accessing components of non-POD types, no? Like class Foo { Foo(); int *get_data (); int *data; } glob_foo; extern C int *get_foo_data() { return glob_foo.get_data(); } OK, if we want to support this, then we want to merge. What about types with vtbl pointer? :) I can easily create a C struct variant covering that. Basically in _practice_ I can inter-operate with any language from C if I know its ABI. Do we really want to make this undefined? See the (even standard) Fortran - C interoperability spec. I'm sure something exists for Ada interoperating with C (or even C++). ? But you are talking about the tree merging part using ODR info to also merge types which differ in completeness of contained pointer types, right? (exactly equal cases should be already merged) Actually I was speaking of canonical types here. I want to preserve more of TBAA via honoring ODR and local types. So, are you positive there will be a net gain in optimization when doing that? Please factor in the surprises you'll get when code gets miscompiled because of slight ODR violations or interoperability that no longer works. I want to change lto to not merge canonical types for pairs of types of same layout (i.e. equivalent in the current canonical type definition) but with different mangled names. Names are nothing ;) In C I very often see different _names_ used in headers vs. implementation (when the implementation uses a different internal header). You have struct Foo; in public headers vs. struct Foo_impl; in the implementation. I also want it to never merge when types are local. For inter-language TBAA we will need to ensure aliasing in between non-ODR type of same layout and all unmerged variants of ODR type. Can it be done by attaching chains of ODR types into the canonical type hash and when non-ODR type appears, just make it alias with all of them? No, how would that work? It would make sense to ODR merge in tree merging, too, but I am not sure if this fits the current design, since you would need to merge SCC components of different shape then that seems hard, right? Right. You'd lose the nice incremental SCC merging (where we haven't even yet implemented the nicest way - avoid re-materializing the SCC until we know it prevails). It may be easier to ODR merge after streaming (during DECL fixup) just to make WPA streaming cheaper and to reduce debug info size. If you use -fdump-ipa-devirt, it will dump you ODR types that did not get merged (only ones with vtable pointers in them ATM) and there are quite long chains for firefox. Surely then hundreds of duplicated ODR types will end up in the ltrans partition streams and they eventually hit debug output machinery. Eric sent me presentation about doing this in LLVM. http://llvm.org/devmtg/2013-11/slides/Christopher-DebugInfo.pdf Debuginfo is sth completely separate and should be done separately (early debug), avoiding to stream the types in the first place. The canonical type computation happens separately (only for prevailing types, of course), and there we already merge types which differ in completeness. Canonical type merging is conservative the other way aroud - if we merge _all_ types to a single canonical type then TBAA is still correct (we get a single alias set). Yes, I think I understand that. One equivalence is kind of minimal so we merge only if we are sure there is no informationloss, other is maximal so we are sure that types that needs to be equivalent by whatever underlying langauge TBAA rules are actually equivalent. The former is just not correct - it would mean that not merging at all would be valid, which it is not (you'd create wrong-code all over the place). We still don't merge enough (because of latent bugs that I didn't manage to fix in time) - thus we do not merge all structurally equivalent types right now. I also think we want explicit representation of types known to be local to compilation unit - anonymous namespaces in C/C++, types defined within function bodies in C and god knows what in Ada/Fortran/Java. But here you get into the idea of improving TBAA, thus having _more_ distinct canonical types? Yes. Just to make sure to not mix those two ;) And whatever frontend knowledge we want to excercise - please make sure we get a reliable way for the middle-end to see that frontend knowledge (no langhooks!). Thus, make it middle-end knowledge. Sure that is what I am proposing - just have DECL_ASSEMBLER_NAME on TYPE_DECL and ODR flag. Middle-end when comparing types will test ODR flag and if flag is
Re: [RFC][PATCH 0/5] arch: atomic rework
On Tue, 2014-02-18 at 14:14 -0800, Linus Torvalds wrote: On Tue, Feb 18, 2014 at 1:21 PM, Torvald Riegel trie...@redhat.com wrote: So imagine that you have some clever global optimizer that sees that the program never ever actually sets the dirty bit at all in any thread, and then uses that kind of non-local knowledge to make optimization decisions. THAT WOULD BE BAD. Do you see what I'm aiming for? Yes, I do. But that seems to be volatile territory. It crosses the boundaries of the abstract machine, and thus is input/output. Which fraction of your atomic accesses can read values produced by hardware? I would still suppose that lots of synchronization is not affected by this. The hardware can change things case is indeed pretty rare. But quite frankly, even when it isn't hardware, as far as the compiler is concerned you have the exact same issue - you have TLB faults happening on other CPU's that do the same thing asynchronously using software TLB fault handlers. So *semantically*, it really doesn't make any difference what-so-ever if it's a software TLB handler on another CPU, a microcoded TLB fault, or an actual hardware path. I think there are a few semantic differences: * If a SW handler uses the C11 memory model, it will synchronize like any other thread. HW might do something else entirely, including synchronizing differently, not using atomic accesses, etc. (At least that's the constraints I had in mind). * If we can treat any interrupt handler like Just Another Thread, then the next question is whether the compiler will be aware that there is another thread. I think that in practice it will be: You'll set up the handler in some way by calling a function the compiler can't analyze, so the compiler will know that stuff accessible to the handler (e.g., global variables) will potentially be accessed by other threads. * Similarly, if the C code is called from some external thing, it also has to assume the presence of other threads. (Perhaps this is what the compiler has to assume in a freestanding implementation anyway...) However, accessibility will be different for, say, stack variables that haven't been shared with other functions yet; those are arguably not reachable by other things, at least not through mechanisms defined by the C standard. So optimizing these should be possible with the assumption that there is no other thread (at least as default -- I'm not saying that this is the only reasonable semantics). So if the answer for all of the above is use volatile, then I think that means that the C11 atomics are badly designed. The whole *point* of atomic accesses is that stuff like above should JustWork(tm) I think that it should in the majority of cases. If the other thing potentially accessing can do as much as a valid C11 thread can do, the synchronization itself will work just fine. In most cases except the (void*)0x123 example (or linker scripts etc.) the compiler is aware when data is made visible to other threads or other non-analyzable functions that may spawn other threads (or just by being a plain global variable accessible to other (potentially .S) translation units. Do you perhaps want a weaker form of volatile? That is, one that, for example, allows combining of two adjacent loads of the dirty bits, but will make sure that this is treated as if there is some imaginary external thread that it cannot analyze and that may write? Yes, that's basically what I would want. And it is what I would expect an atomic to be. Right now we tend to use ACCESS_ONCE(), which is a bit of a misnomer, because technically we really generally want ACCESS_AT_MOST_ONCE() (but once is what we get, because we use volatile, and is a hell of a lot simpler to write ;^). So we obviously use volatile for this currently, and generally the semantics we really want are: - the load or store is done as a single access (atomic) - the compiler must not try to re-materialize the value by reloading it from memory (this is the at most once part) In the presence of other threads performing operations unknown to the compiler, that's what you should get even if the compiler is trying to optimize C11 atomics. The first requirement is clear, and the at most once follows from another thread potentially writing to the variable. The only difference I can see right now is that a compiler may be able to *prove* that it doesn't matter whether it reloaded the value or not. But this seems very hard to prove for me, and likely to require whole-program analysis (which won't be possible because we don't know what other threads are doing). I would guess that this isn't a problem in practice. I just wanted to note it because it theoretically does have a different semantics than plain volatiles. and quite frankly, volatile is a big hammer for this. In practice it tends to work pretty well, though, because in _most_ cases, there really is just
Re: [RFC][PATCH 0/5] arch: atomic rework
On Tue, 18 Feb 2014, Torvald Riegel wrote: On Tue, 2014-02-18 at 22:40 +0100, Peter Zijlstra wrote: On Tue, Feb 18, 2014 at 10:21:56PM +0100, Torvald Riegel wrote: Well, that's how atomics that aren't volatile are defined in the standard. I can see that you want something else too, but that doesn't mean that the other thing is broken. Well that other thing depends on being able to see the entire program at compile time. PaulMck already listed various ways in which this is not feasible even for normal userspace code. In particular; DSOs and JITs were mentioned. No it doesn't depend on whole-program analysis being possible. Because if it isn't, then a correct compiler will just not do certain optimizations simply because it can't prove properties required for the optimization to hold. With the exception of access to objects via magic numbers (e.g., fixed and known addresses (see my reply to Paul), which are outside of the semantics specified in the standard), I don't see a correctness problem here. Are you really sure that the compiler can figure out every possible thing that a loadable module or JITed code can access? That seems like a pretty strong claim. David Lang
Re: [RFC][PATCH 0/5] arch: atomic rework
On Wed, Feb 19, 2014 at 11:59:08AM +0100, Torvald Riegel wrote: On Tue, 2014-02-18 at 14:58 -0800, Paul E. McKenney wrote: On Tue, Feb 18, 2014 at 10:40:15PM +0100, Torvald Riegel wrote: xagsmtp4.20140218214207.8...@vmsdvm9.vnet.ibm.com X-Xagent-Gateway: vmsdvm9.vnet.ibm.com (XAGSMTP4 at VMSDVM9) On Tue, 2014-02-18 at 09:16 -0800, Paul E. McKenney wrote: On Tue, Feb 18, 2014 at 08:49:13AM -0800, Linus Torvalds wrote: On Tue, Feb 18, 2014 at 7:31 AM, Torvald Riegel trie...@redhat.com wrote: On Mon, 2014-02-17 at 16:05 -0800, Linus Torvalds wrote: And exactly because I know enough, I would *really* like atomics to be well-defined, and have very clear - and *local* - rules about how they can be combined and optimized. Local? Yes. So I think that one of the big advantages of atomics over volatile is that they *can* be optimized, and as such I'm not at all against trying to generate much better code than for volatile accesses. But at the same time, that can go too far. For example, one of the things we'd want to use atomics for is page table accesses, where it is very important that we don't generate multiple accesses to the values, because parts of the values can be change *by*hardware* (ie accessed and dirty bits). So imagine that you have some clever global optimizer that sees that the program never ever actually sets the dirty bit at all in any thread, and then uses that kind of non-local knowledge to make optimization decisions. THAT WOULD BE BAD. Might as well list other reasons why value proofs via whole-program analysis are unreliable for the Linux kernel: 1. As Linus said, changes from hardware. This is what's volatile is for, right? (Or the weak-volatile idea I mentioned). Compilers won't be able to prove something about the values of such variables, if marked (weak-)volatile. Yep. 2. Assembly code that is not visible to the compiler. Inline asms will -normally- let the compiler know what memory they change, but some just use the memory tag. Worse yet, I suspect that most compilers don't look all that carefully at .S files. Any number of other programs contain assembly files. Are the annotations of changed memory really a problem? If the memory tag exists, isn't that supposed to mean all memory? To make a proof about a program for location X, the compiler has to analyze all uses of X. Thus, as soon as X escapes into an .S file, then the compiler will simply not be able to prove a thing (except maybe due to the data-race-free requirement for non-atomics). The attempt to prove something isn't unreliable, simply because a correct compiler won't claim to be able to prove something. I am indeed less worried about inline assembler than I am about files full of assembly. Or files full of other languages. One reason that could corrupt this is that if program addresses objects other than through the mechanisms defined in the language. For example, if one thread lays out a data structure at a constant fixed memory address, and another one then uses the fixed memory address to get access to the object with a cast (e.g., (void*)0x123). Or if the program uses gcc linker scripts to get the same effect. 3. Kernel modules that have not yet been written. Now, the compiler could refrain from trying to prove anything about an EXPORT_SYMBOL() or EXPORT_SYMBOL_GPL() variable, but there is currently no way to communicate this information to the compiler other than marking the variable volatile. Even if the variable is just externally accessible, then the compiler knows that it can't do whole-program analysis about it. It is true that whole-program analysis will not be applicable in this case, but it will not be unreliable. I think that's an important difference. Let me make sure that I understand what you are saying. If my program has extern int foo;, the compiler will refrain from doing whole-program analysis involving foo? Yes. If it can't be sure to actually have the whole program available, it can't do whole-program analysis, right? Things like the linker scripts you mention or other stuff outside of the language semantics complicates this somewhat, and maybe some compilers assume too much. There's also the point that data-race-freedom is required for non-atomics even if those are shared with non-C-code. But except those corner cases, a compiler sees whether something escapes and becomes visible/accessible to other entities. The traditional response to except those corner cases is of course Murphy was an optimist. ;-) That said, point taken -- you
Re: Stack layout change during lra
On 2/19/2014, 6:54 AM, Joey Ye wrote: Vlad, When fixing PR60169, I found that reload fail to assert verify_initial_elim_offsets () if (insns_need_reload != 0 || something_needs_elimination || something_needs_operands_changed) { HOST_WIDE_INT old_frame_size = get_frame_size (); reload_as_needed (global); gcc_assert (old_frame_size == get_frame_size ()); gcc_assert (verify_initial_elim_offsets ()); } The reason is that stack layout changes during reload_as_needed as a result of a thumb1 backend heuristic. I have a patch to make sure the heuristic doesn't change stack layout during and after reload, and the assertion disappeared. However, I'm not sure if it will also be a problem in lra. Here is the question more specific: Is that any chance during lra_in_progress that: stack layout can no longer be altered, but insns can still be added or changed? I believe LRA is less prone to the above reload problem because of its design. You can change stack in the backend and all will work if you provide the right sfp to hfp/sp offsets. In fact, LRA itself can allocate stack slots several times and add and change insns between these allocations.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Wed, 2014-02-19 at 07:14 -0800, Paul E. McKenney wrote: On Wed, Feb 19, 2014 at 11:59:08AM +0100, Torvald Riegel wrote: On Tue, 2014-02-18 at 14:58 -0800, Paul E. McKenney wrote: On Tue, Feb 18, 2014 at 10:40:15PM +0100, Torvald Riegel wrote: xagsmtp4.20140218214207.8...@vmsdvm9.vnet.ibm.com X-Xagent-Gateway: vmsdvm9.vnet.ibm.com (XAGSMTP4 at VMSDVM9) On Tue, 2014-02-18 at 09:16 -0800, Paul E. McKenney wrote: On Tue, Feb 18, 2014 at 08:49:13AM -0800, Linus Torvalds wrote: On Tue, Feb 18, 2014 at 7:31 AM, Torvald Riegel trie...@redhat.com wrote: On Mon, 2014-02-17 at 16:05 -0800, Linus Torvalds wrote: And exactly because I know enough, I would *really* like atomics to be well-defined, and have very clear - and *local* - rules about how they can be combined and optimized. Local? Yes. So I think that one of the big advantages of atomics over volatile is that they *can* be optimized, and as such I'm not at all against trying to generate much better code than for volatile accesses. But at the same time, that can go too far. For example, one of the things we'd want to use atomics for is page table accesses, where it is very important that we don't generate multiple accesses to the values, because parts of the values can be change *by*hardware* (ie accessed and dirty bits). So imagine that you have some clever global optimizer that sees that the program never ever actually sets the dirty bit at all in any thread, and then uses that kind of non-local knowledge to make optimization decisions. THAT WOULD BE BAD. Might as well list other reasons why value proofs via whole-program analysis are unreliable for the Linux kernel: 1.As Linus said, changes from hardware. This is what's volatile is for, right? (Or the weak-volatile idea I mentioned). Compilers won't be able to prove something about the values of such variables, if marked (weak-)volatile. Yep. 2.Assembly code that is not visible to the compiler. Inline asms will -normally- let the compiler know what memory they change, but some just use the memory tag. Worse yet, I suspect that most compilers don't look all that carefully at .S files. Any number of other programs contain assembly files. Are the annotations of changed memory really a problem? If the memory tag exists, isn't that supposed to mean all memory? To make a proof about a program for location X, the compiler has to analyze all uses of X. Thus, as soon as X escapes into an .S file, then the compiler will simply not be able to prove a thing (except maybe due to the data-race-free requirement for non-atomics). The attempt to prove something isn't unreliable, simply because a correct compiler won't claim to be able to prove something. I am indeed less worried about inline assembler than I am about files full of assembly. Or files full of other languages. One reason that could corrupt this is that if program addresses objects other than through the mechanisms defined in the language. For example, if one thread lays out a data structure at a constant fixed memory address, and another one then uses the fixed memory address to get access to the object with a cast (e.g., (void*)0x123). Or if the program uses gcc linker scripts to get the same effect. 3.Kernel modules that have not yet been written. Now, the compiler could refrain from trying to prove anything about an EXPORT_SYMBOL() or EXPORT_SYMBOL_GPL() variable, but there is currently no way to communicate this information to the compiler other than marking the variable volatile. Even if the variable is just externally accessible, then the compiler knows that it can't do whole-program analysis about it. It is true that whole-program analysis will not be applicable in this case, but it will not be unreliable. I think that's an important difference. Let me make sure that I understand what you are saying. If my program has extern int foo;, the compiler will refrain from doing whole-program analysis involving foo? Yes. If it can't be sure to actually have the whole program available, it can't do whole-program analysis, right? Things like the linker scripts you mention or other stuff outside of the language semantics complicates this somewhat, and maybe some compilers assume too much. There's also the point that data-race-freedom is required for non-atomics even if those are shared with non-C-code. But except those corner cases, a compiler sees whether something escapes and becomes visible/accessible
Re: [RFC][PATCH 0/5] arch: atomic rework
On Wed, 2014-02-19 at 07:23 -0800, David Lang wrote: On Tue, 18 Feb 2014, Torvald Riegel wrote: On Tue, 2014-02-18 at 22:40 +0100, Peter Zijlstra wrote: On Tue, Feb 18, 2014 at 10:21:56PM +0100, Torvald Riegel wrote: Well, that's how atomics that aren't volatile are defined in the standard. I can see that you want something else too, but that doesn't mean that the other thing is broken. Well that other thing depends on being able to see the entire program at compile time. PaulMck already listed various ways in which this is not feasible even for normal userspace code. In particular; DSOs and JITs were mentioned. No it doesn't depend on whole-program analysis being possible. Because if it isn't, then a correct compiler will just not do certain optimizations simply because it can't prove properties required for the optimization to hold. With the exception of access to objects via magic numbers (e.g., fixed and known addresses (see my reply to Paul), which are outside of the semantics specified in the standard), I don't see a correctness problem here. Are you really sure that the compiler can figure out every possible thing that a loadable module or JITed code can access? That seems like a pretty strong claim. If the other code can be produced by a C translation unit that is valid to be linked with the rest of the program, then I'm pretty sure the compiler has a well-defined notion of whether it does or does not see all other potential accesses. IOW, if the C compiler is dealing with C semantics and mechanisms only (including the C mechanisms for sharing with non-C code!), then it will know what to do. If you're playing tricks behind the C compiler's back using implementation-defined stuff outside of the C specification, then there's nothing the compiler really can do. For example, if you're trying to access a variable on a function's stack from some other function, you better know how the register allocator of the compiler operates. In contrast, if you let this function simply export the address of the variable to some external place, all will be fine. The documentation of GCC's -fwhole-program and -flto might also be interesting for you. GCC wouldn't need to have -fwhole-program if it weren't conservative by default (correctly so).
Re: TYPE_BINFO and canonical types at LTO
On Tue, 18 Feb 2014, Jan Hubicka wrote: Non-ODR types born from other frontends will then need to be made to alias all the ODR variants that can be done by storing them into the current canonical type hash. (I wonder if we want to support cross language aliasing for non-POD?) Surely for accessing components of non-POD types, no? Like class Foo { Foo(); int *get_data (); int *data; } glob_foo; extern C int *get_foo_data() { return glob_foo.get_data(); } OK, if we want to support this, then we want to merge. What about types with vtbl pointer? :) I can easily create a C struct variant covering that. Basically in _practice_ I can inter-operate with any language from C if I know its ABI. Do we really want to make this undefined? See the (even standard) Fortran - C interoperability spec. I'm sure something exists for Ada interoperating with C (or even C++). Well, if you know the ABI, you can interoperate C struct {int a,b;}; with struct {struct {int a;} a; struct {int b;}}; So we don't interoperate everything. We may eventually want to define what kind of inter-operation we support. non-POD class layout is not always a natural extension of C struct layout as one would expect http://mentorembedded.github.io/cxx-abi/abi.html#layout so it may make sense to declare it uninteroperable if it helps something in real world (which I am not quite sure about) I am not really shooting for this. I just want to get something that would improve TBAA in pre-dominantly C++ programs (with some C code in them) such as firefox, chromium, openoffice or GCC. Because I see us giving up on TBAA very often when playing with cases that should be devirtualized by GVN/aggregate propagation in ipa-prop but aren't. Also given that C++ standard actually have notion of inter-module type equivalence it may be good idea to honnor it. ? But you are talking about the tree merging part using ODR info to also merge types which differ in completeness of contained pointer types, right? (exactly equal cases should be already merged) Actually I was speaking of canonical types here. I want to preserve more of TBAA via honoring ODR and local types. So, are you positive there will be a net gain in optimization when doing that? Please factor in the surprises you'll get when code gets miscompiled because of slight ODR violations or interoperability that no longer works. I want to change lto to not merge canonical types for pairs of types of same layout (i.e. equivalent in the current canonical type definition) but with different mangled names. Names are nothing ;) In C I very often see different _names_ used in headers vs. implementation (when the implementation uses a different internal header). You have struct Foo; in public headers vs. struct Foo_impl; in the implementation. Yes, I am aware of that - I had to go through all those issues with --combine code in mid 2000s. C++ is different. What I want is to make TBAA in between C++ types stronger and TBAA between C++ and other languages to do pretty much what we donot, if possible. I also want it to never merge when types are local. For inter-language TBAA we will need to ensure aliasing in between non-ODR type of same layout and all unmerged variants of ODR type. Can it be done by attaching chains of ODR types into the canonical type hash and when non-ODR type appears, just make it alias with all of them? No, how would that work? This is what I am asking you about. At the end of streaming process, we will have canonical type hash with one leader and list of ODR variants that was not merged based on ODR rules. If leader happens to be non-ODR, then it must be made to alias all of them. I think either we can rewrite all ODR cases to have the non-ODR as canonical type or make something more fine grained so we can force loads/stores through non-ODR types to alias with all of them, but loads/stores through ODR types to alias as within C++ compilation unit. It would make sense to ODR merge in tree merging, too, but I am not sure if this fits the current design, since you would need to merge SCC components of different shape then that seems hard, right? Right. You'd lose the nice incremental SCC merging (where we haven't even yet implemented the nicest way - avoid re-materializing the SCC until we know it prevails). Yes, SCC merging w/o re-materialization is a priority, indeed. It may be easier to ODR merge after streaming (during DECL fixup) just to make WPA streaming cheaper and to reduce debug info size. If you use -fdump-ipa-devirt, it will dump you ODR types that did not get merged (only ones with vtable pointers in them ATM) and there are quite long chains for firefox. Surely then hundreds of duplicated ODR types will end up in the ltrans partition streams and they eventually hit debug output machinery. Eric sent me
Re: [RFC][PATCH 0/5] arch: atomic rework
On Wed, Feb 19, 2014 at 6:40 AM, Torvald Riegel trie...@redhat.com wrote: If all those other threads written in whichever way use the same memory model and ABI for synchronization (e.g., choice of HW barriers for a certain memory_order), it doesn't matter whether it's a hardware thread, microcode, whatever. In this case, C11 atomics should be fine. (We have this in userspace already, because correct compilers will have to assume that the code generated by them has to properly synchronize with other code generated by different compilers.) If the other threads use a different model, access memory entirely differently, etc, then we might be back to volatile because we don't know anything, and the very strict rules about execution steps of the abstract machine (ie, no as-if rule) are probably the safest thing to do. Oh, I don't even care about architectures that don't have real hardware atomics. So if there's a software protocol for atomics, all bets are off. The compiler almost certainly has to do atomics with function calls anyway, and we'll just plug in out own. And frankly, nobody will ever care, because those architectures aren't relevant, and never will be. Sure, there are some ancient Sparc platforms that only support a single-byte ldstub and there are some embedded chips that don't really do SMP, but have some pseudo-smp with special separate locking. Really, nobody cares. The C standard has that crazy lock-free atomic tests, and talks about address-free, but generally we require both lock-free and address-free in the kernel, because otherwise it's just too painful to do interrupt-safe locking, or do atomics in user-space (for futexes). So if your worry is just about software protocols for CPU's that aren't actually designed for modern SMP, that's pretty much a complete non-issue. Linus
Re: Update x86-64 PLT for MPX
On Mon, Jan 27, 2014 at 1:50 PM, H.J. Lu hjl.to...@gmail.com wrote: On Mon, Jan 27, 2014 at 1:42 PM, H.J. Lu hjl.to...@gmail.com wrote: On Sat, Jan 18, 2014 at 8:11 AM, H.J. Lu hjl.to...@gmail.com wrote: Hi, Here is the proposal to update x86-64 PLT for MPX. The linker change is implemented on hjl/mpx/pltext8 branch. Any comments/feedbacks? Thanks. -- H.J. --- Intel MPX: http://software.intel.com/en-us/file/319433-017pdf introduces 4 bound registers, which will be used for parameter passing in x86-64. Bound registers are cleared by branch instructions. Branch instructions with BND prefix will keep bound register contents. This leads to 2 requirements to 64-bit MPX run-time: 1. Dynamic linker (ld.so) should save and restore bound registers during symbol lookup. 2. Change the current 16-byte PLT0: ff 35 08 00 00 00pushq GOT+8(%rip) ff 25 00 10 00jmpq *GOT+16(%rip) 0f 1f 40 00nopl 0x0(%rax) and 16-byte PLT1: ff 25 00 00 00 00jmpq *name@GOTPCREL(%rip) 68 00 00 00 00 pushq $index e9 00 00 00 00 jmpq PLT0 which clear bound registers, to preserve bound registers. We use 2 new relocations: #define R_X86_64_PC32_BND 39 /* PC relative 32 bit signed with BND prefix */ #define R_X86_64_PLT32_BND 40 /* 32 bit PLT address with BND prefix */ to mark branch instructions with BND prefix. When linker sees any R_X86_64_PC32_BND or R_X86_64_PLT32_BND relocations, it switches to a different PLT0: ff 35 08 00 00 00pushq GOT+8(%rip) f2 ff 25 00 10 00bnd jmpq *GOT+16(%rip) 0f 1f 00nopl (%rax) to preserve bound registers for symbol lookup and it also creates an external PLT section, .pl.bnd. Linker will create a BND PLT1 entry in .plt: 68 00 00 00 00 pushq $index f2 e9 00 00 00 00 bnd jmpq PLT0 0f 1f 44 00 00nopl 0(%rax,%rax,1) and a 8-byte BND PLT entry in .plt.bnd: f2 ff 25 00 00 00 00 bnd jmpq *name@GOTPCREL(%rip) 90nop Otherwise, linker will create a legacy PLT entry in .plt: 68 00 00 00 00 pushq $index e9 00 00 00 00jmpq PLT0 66 0f 1f 44 00 00 nopw 0(%rax,%rax,1) and a 8-byte legacy PLT in .plt.bnd: ff 25 00 00 00 00 jmpq *name@GOTPCREL(%rip) 66 90 xchg %ax,%ax The initial value of the GOT entry for name will be set to the the pushq instruction in the corresponding entry in .plt. Linker will resolve reference of symbol name to the entry in the second PLT, .plt.bnd. Prelink stores the offset of pushq of PLT1 (plt_base + 0x10) in GOT[1] and GOT[1] is stored in GOT[3]. We can undo prelink in GOT by computing the corresponding the pushq offset with GOT[1] + (GOT offset - GOT[3]) * 2 Since for each entry in .plt except for PLT0 we create a 8-byte entry in .plt.bnd, there is extra 8-byte per PLT symbol. We also investigated the 16-byte entry for .plt.bnd. We compared the 8-byte entry vs the the 16-byte entry for .plt.bnd on Sandy Bridge. There are no performance differences in SPEC CPU 2000/2006 as well as micro benchmarks. Pros: No change to undo prelink in dynamic linker. Only 8-byte memory overhead for each PLT symbol. Cons: Extra .plt.bnd section is needed. Extra 8 byte for legacy branches to PLT. GDB is unware of the new layout of .plt and .plt.bnd. Hi, I am enclosing the updated x86-64 psABI with PLT change. I checkeMy email is rejected due to PDF attachment. I am resubmitting it with out PDF file. d it onto hjl/mpx/master branch at https://github.com/hjl-tools/x86-64-psABI I will check in the binutils changes if there are no disagreements in 2 weeks. Thanks. My email is rejected due to PDF attachment. I am resubmitting it with out PDF file. I pushed the MPX binutils change into master: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=0ff2b86e7c14177ec7f9e1257f8e697814794017 -- H.J.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Wed, Feb 19, 2014 at 06:55:51PM +0100, Torvald Riegel wrote: On Wed, 2014-02-19 at 07:14 -0800, Paul E. McKenney wrote: On Wed, Feb 19, 2014 at 11:59:08AM +0100, Torvald Riegel wrote: [ . . . ] On both sides, the compiler will see that mmap() (or similar) is called, so that means the data escapes to something unknown, which could create threads and so on. So first, it can't do whole-program analysis for this state anymore, and has to assume that other C11 threads are accessing this memory. Next, lock-free atomics are specified to be address-free, meaning that they must work independent of where in memory the atomics are mapped (see C++ (e.g., N3690) 29.4p3; that's a should and non-normative, but essential IMO). Thus, this then boils down to just a simple case of synchronization. (Of course, the rest of the ABI has to match too for the data exchange to work.) The compiler will see mmap() on the user side, but not on the kernel side. On the kernel side, something special is required. Maybe -- you'll certainly know better :) But maybe it's not that hard: For example, if the memory is in current code made available to userspace via calling some function with an asm implementation that the compiler can't analyze, then this should be sufficient. The kernel code would need to explicitly tell the compiler what portions of the kernel address space were covered by this. I would not want the compiler to have to work it out based on observing interactions with the page tables. ;-) Agree that address-free would be nice as shall rather than should. I echo Peter's question about how one tags functions like mmap(). I will also remember this for the next time someone on the committee discounts volatile. ;-) 5. JITed code produced based on BPF: https://lwn.net/Articles/437981/ This might be special, or not, depending on how the JITed code gets access to data. If this is via fixed addresses (e.g., (void*)0x123), then see above. If this is through function calls that the compiler can't analyze, then this is like 4. It could well be via the kernel reading its own symbol table, sort of a poor-person's reflection facility. I guess that would be for all intents and purposes equivalent to your (void*)0x123. If it is replacing code generated by the compiler, then yes. If the JIT is just filling in functions that had been undefined yet declared before, then the compiler will have seen the data escape through the function interfaces, and should be aware that there is other stuff. So one other concern would then be things things like ftrace, kprobes, ksplice, and so on. These rewrite the kernel binary at runtime, though in very limited ways. Yes. Nonetheless, I wouldn't see a problem if they, say, rewrite with C11-compatible code (and same ABI) on a function granularity (and when the function itself isn't executing concurrently) -- this seems to be similar to just having another compiler compile this particular function. Well, they aren't using C11-compatible code yet. They do patch within functions. And in some cases, they make staged sequences of changes to allow the patching to happen concurrently with other CPUs executing the code being patched. Not sure that any of the latter is actually in the kernel at the moment, but it has at least been prototyped and discussed. Thanx, Paul
Re: ARM inline assembly usage in Linux kernel
On 19 February 2014 11:58, Richard Sandiford rsand...@linux.vnet.ibm.com wrote: I agree that having an unrecognised asm shouldn't be a hard error until assembly time though. Saleem, is the problem that this is being rejected earlier? Hi Andrew, Richard, Thanks for your reviews! We agree that we should actually just ignore the contents until object emission. Just for context, one of the reasons why we enabled inline assembly checks is for some obscure cases when the snippet changes the instructions set (arm - thumb) and the rest of the function becomes garbage. Our initial implementation was to always emit .arm/.thumb after *any* inline assembly, which would become a nop in the worst case. But since we had easy access to the assembler, we thought: why not?. The idea is now to try to parse the snippet for cases like .arm/.thumb but only emit a warning IFF -Wbad-inline-asm (or whatever) is set (and not to make it on by default), otherwise, ignore. We're hoping our assembler will be able to cope with the multiple levels of indirection automagically. ;) Thanks again! --renato
Re: ARM inline assembly usage in Linux kernel
On Wed, Feb 19, 2014 at 3:17 PM, Renato Golin renato.go...@linaro.org wrote: On 19 February 2014 11:58, Richard Sandiford rsand...@linux.vnet.ibm.com wrote: I agree that having an unrecognised asm shouldn't be a hard error until assembly time though. Saleem, is the problem that this is being rejected earlier? Hi Andrew, Richard, Thanks for your reviews! We agree that we should actually just ignore the contents until object emission. Just for context, one of the reasons why we enabled inline assembly checks is for some obscure cases when the snippet changes the instructions set (arm - thumb) and the rest of the function becomes garbage. Our initial implementation was to always emit .arm/.thumb after *any* inline assembly, which would become a nop in the worst case. But since we had easy access to the assembler, we thought: why not?. With the unified assembly format, you should not need those .arm/.thumb and in fact emitting them can make things even worse. Thanks, Andrew Pinski The idea is now to try to parse the snippet for cases like .arm/.thumb but only emit a warning IFF -Wbad-inline-asm (or whatever) is set (and not to make it on by default), otherwise, ignore. We're hoping our assembler will be able to cope with the multiple levels of indirection automagically. ;) Thanks again! --renato
Re: ARM inline assembly usage in Linux kernel
On 19 February 2014 23:19, Andrew Pinski pins...@gmail.com wrote: With the unified assembly format, you should not need those .arm/.thumb and in fact emitting them can make things even worse. If only we could get rid or all pre-UAL inline assembly on the planet... :) The has been the only reason why we added support for those in our assembler, because GAS supports them and people still use (or have legacy code they won't change). If the binutils folks (and you guys) are happy to start seriously de-phasing pre-UAL support, I'd be more than happy to do so on our end. Do you think I should start that conversation on the binutils list? Maybe a new serious compulsory warning, to start? cheers, --renato
Re: [RFC][PATCH 0/5] arch: atomic rework
On Tue, Feb 18, 2014 at 11:47 AM, Torvald Riegel trie...@redhat.com wrote: On Tue, 2014-02-18 at 09:44 -0800, Linus Torvalds wrote: Can you point to it? Because I can find a draft standard, and it sure as hell does *not* contain any clarity of the model. It has a *lot* of verbiage, but it's pretty much impossible to actually understand, even for somebody who really understands memory ordering. http://www.cl.cam.ac.uk/~mjb220/n3132.pdf This has an explanation of the model up front, and then the detailed formulae in Section 6. This is from 2010, and there might have been smaller changes since then, but I'm not aware of any bigger ones. Ahh, this is different from what others pointed at. Same people, similar name, but not the same paper. I will read this version too, but from reading the other one and the standard in parallel and trying to make sense of it, it seems that I may have originally misunderstood part of the whole control dependency chain. The fact that the left side of ? :, and || breaks data dependencies made me originally think that the standard tried very hard to break any control dependencies. Which I felt was insane, when then some of the examples literally were about the testing of the value of an atomic read. The data dependency matters quite a bit. The fact that the other Mathematical paper then very much talked about consume only in the sense of following a pointer made me think so even more. But reading it some more, I now think that the whole data dependency logic (which is where the special left-hand side rule of the ternary and logical operators come in) are basically an exception to the rule that sequence points end up being also meaningful for ordering (ok, so C11 seems to have renamed sequence points to sequenced before). So while an expression like atomic_read(p, consume) ? a : b; doesn't have a data dependency from the atomic read that forces serialization, writing if (atomic_read(p, consume)) a; else b; the standard *does* imply that the atomic read is happens-before wrt a, and I'm hoping that there is no question that the control dependency still acts as an ordering point. THAT was one of my big confusions, the discussion about control dependencies and the fact that the logical ops broke the data dependency made me believe that the standard tried to actively avoid the whole issue with control dependencies can break ordering dependencies on some CPU's due to branch prediction and memory re-ordering by the CPU. But after all the reading, I'm starting to think that that was never actually the implication at all, and the logical ops breaks the data dependency rule is simply an exception to the sequence point rule. All other sequence points still do exist, and do imply an ordering that matters for consume Am I now reading it right? So the clarification is basically to the statement that the if (consume(p)) a version *would* have an ordering guarantee between the read of p and a, but the consume(p) ? a : b would *not* have such an ordering guarantee. Yes? Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Wed, Feb 19, 2014 at 04:53:49PM -0800, Linus Torvalds wrote: On Tue, Feb 18, 2014 at 11:47 AM, Torvald Riegel trie...@redhat.com wrote: On Tue, 2014-02-18 at 09:44 -0800, Linus Torvalds wrote: Can you point to it? Because I can find a draft standard, and it sure as hell does *not* contain any clarity of the model. It has a *lot* of verbiage, but it's pretty much impossible to actually understand, even for somebody who really understands memory ordering. http://www.cl.cam.ac.uk/~mjb220/n3132.pdf This has an explanation of the model up front, and then the detailed formulae in Section 6. This is from 2010, and there might have been smaller changes since then, but I'm not aware of any bigger ones. Ahh, this is different from what others pointed at. Same people, similar name, but not the same paper. I will read this version too, but from reading the other one and the standard in parallel and trying to make sense of it, it seems that I may have originally misunderstood part of the whole control dependency chain. The fact that the left side of ? :, and || breaks data dependencies made me originally think that the standard tried very hard to break any control dependencies. Which I felt was insane, when then some of the examples literally were about the testing of the value of an atomic read. The data dependency matters quite a bit. The fact that the other Mathematical paper then very much talked about consume only in the sense of following a pointer made me think so even more. But reading it some more, I now think that the whole data dependency logic (which is where the special left-hand side rule of the ternary and logical operators come in) are basically an exception to the rule that sequence points end up being also meaningful for ordering (ok, so C11 seems to have renamed sequence points to sequenced before). So while an expression like atomic_read(p, consume) ? a : b; doesn't have a data dependency from the atomic read that forces serialization, writing if (atomic_read(p, consume)) a; else b; the standard *does* imply that the atomic read is happens-before wrt a, and I'm hoping that there is no question that the control dependency still acts as an ordering point. The control dependency should order subsequent stores, at least assuming that a and b don't start off with identical stores that the compiler could pull out of the if and merge. The same might also be true for ?: for all I know. (But see below) That said, in this case, you could substitute relaxed for consume and get the same effect. The return value from atomic_read() gets absorbed into the if condition, so there is no dependency-ordered-before relationship, so nothing for consume to do. One caution... The happens-before relationship requires you to trace a full path between the two operations of interest. This is illustrated by the following example, with both x and y initially zero: T1: atomic_store_explicit(x, 1, memory_order_relaxed); r1 = atomic_load_explicit(y, memory_order_relaxed); T2: atomic_store_explicit(y, 1, memory_order_relaxed); r2 = atomic_load_explicit(x, memory_order_relaxed); There is a happens-before relationship between T1's load and store, and another happens-before relationship between T2's load and store, but there is no happens-before relationship from T1 to T2, and none in the other direction, either. And you don't get to assume any ordering based on reasoning about these two disjoint happens-before relationships. So it is quite possible for r1==1r2==1 after both threads complete. Which should be no surprise: This misordering can happen even on x86, which would need a full smp_mb() to prevent it. THAT was one of my big confusions, the discussion about control dependencies and the fact that the logical ops broke the data dependency made me believe that the standard tried to actively avoid the whole issue with control dependencies can break ordering dependencies on some CPU's due to branch prediction and memory re-ordering by the CPU. But after all the reading, I'm starting to think that that was never actually the implication at all, and the logical ops breaks the data dependency rule is simply an exception to the sequence point rule. All other sequence points still do exist, and do imply an ordering that matters for consume Am I now reading it right? As long as there is an unbroken chain of -data- dependencies from the consume to the later access in question, and as long as that chain doesn't go through the excluded operations, yes. So the clarification is basically to the statement that the if (consume(p)) a version *would* have an ordering guarantee between the read of p and a, but the consume(p) ? a : b would *not* have such an ordering guarantee. Yes? Neither has a data-dependency guarantee, because there is no data dependency from the load to either a or b.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Wed, Feb 19, 2014 at 8:01 PM, Paul E. McKenney paul...@linux.vnet.ibm.com wrote: The control dependency should order subsequent stores, at least assuming that a and b don't start off with identical stores that the compiler could pull out of the if and merge. The same might also be true for ?: for all I know. (But see below) Stores I don't worry about so much because (a) you can't sanely move stores up in a compiler anyway (b) no sane CPU or moves stores up, since they aren't on the critical path so a read-cmp-store is actually really hard to make anything sane re-order. I'm sure it can be done, and I'm sure it's stupid as hell. But that it's hard to screw up is *not* true for a load-cmp-load. So lets make this really simple: if you have a consume-cmp-read, is the ordering of the two reads guaranteed? As long as there is an unbroken chain of -data- dependencies from the consume to the later access in question, and as long as that chain doesn't go through the excluded operations, yes. So let's make it *really* specific, and make it real code doing a real operation, that is actually realistic and reasonable in a threaded environment, and may even be in some critical code. The issue is the read-side ordering guarantee for 'a' and 'b', for this case: - Initial state: a = b = 0; - Thread 1 (consumer): if (atomic_read(a, consume)) return b; /* not yet initialized */ return -1; - Thread 2 (initializer): b = some_value_lets_say_42; /* We are now ready to party */ atomic_write(a, 1, release); and quite frankly, if there is no ordering guarantee between the read of a and the read of b in the consumer thread, then the C atomics standard is broken. Put another way: I claim that if thread 1 ever sees a return value other than -1 or 42, then the whole definition of atomics is broken. Question 2: and what changes if the atomic_read() is turned into an acquire, and why? Does it start working? Neither has a data-dependency guarantee, because there is no data dependency from the load to either a or b. After all, the value loaded got absorbed into the if condition. However, according to discussions earlier in this thread, the if variant would have a control-dependency ordering guarantee for any stores in a and b (but not loads!). So exactly what part of the standard allows the loads to be re-ordered, and why? Quite frankly, I'd think that any sane person will agree that the above code snippet is realistic, and that my requirement that thread 1 sees either -1 or 42 is valid. And if the C standards body has said that control dependencies break the read ordering, then I really think that the C standards committee has screwed up. If the consumer of an atomic load isn't a pointer chasing operation, then the consume should be defined to be the same as acquire. None of this conditionals break consumers. No, conditionals on the dependency path should turn consumers into acquire, because otherwise the consume load is dangerous as hell. And if the definition of acquire doesn't include the control dependency either, then the C atomic memory model is just completely and utterly broken, since the above *trivial* and clearly useful example is broken. I really think the above example is pretty damn black-and-white. Either it works, or the standard isn't worth wiping your ass with. Linus
Re: Building GCC with -Wmissing-declarations and addressing its warnings
On 13 February 2014 20:47, Patrick Palka wrote: On a related note, would a patch to officially enable -Wmissing-declarations in the build process be well regarded? What would be the advantage? Since -Wmissing-prototypes is currently enabled, I assume it is the intention of the GCC devs to address these warnings, and that during the transition from a C to C++ bootstrap compiler a small oversight was made (that -Wmissing-prototypes is a no-op against C++ source files). The additional safety provided by -Wmissing-prototypes is already guaranteed for C++. In C a missing prototype causes the compiler to guess, probably incorrectly, how to call the function. In C++ a function cannot be called without a previous declaration and the linker will notice if you declare a function with one signature and define it differently.
[Bug c++/60258] Member initialization for atomic fail.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60258 Jonathan Wakely redi at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2014-02-19 Ever confirmed|0 |1
[Bug c/57896] [4.8 Regression] ICE in expand_expr_real_2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57896 Marek Polacek mpolacek at gcc dot gnu.org changed: What|Removed |Added CC||mpolacek at gcc dot gnu.org Target Milestone|--- |4.8.4 Summary|ICE in expand_expr_real_2 |[4.8 Regression] ICE in ||expand_expr_real_2 --- Comment #7 from Marek Polacek mpolacek at gcc dot gnu.org --- Currently, I see ICE only with 4.8: ./cc1 xx.c -m32 -quiet -march=x86-64 In function ‘__get_cpuid_max’: Segmentation fault } ^ 0xa3924f crash_signal /home/marek/src/gcc/gcc/toplev.c:332 0x11fc5fe pp_base_string(pretty_print_info*, char const*) /home/marek/src/gcc/gcc/pretty-print.c:835 0x11fd0a4 pp_base_format(pretty_print_info*, text_info*) /home/marek/src/gcc/gcc/pretty-print.c:496 0x11fa7ac diagnostic_report_diagnostic(diagnostic_context*, diagnostic_info*) /home/marek/src/gcc/gcc/diagnostic.c:755 0x11faabb internal_error(char const*, ...) /home/marek/src/gcc/gcc/diagnostic.c:1094 0x9b9aa7 rtl_check_failed_code1(rtx_def const*, rtx_code, char const*, int, char const*) /home/marek/src/gcc/gcc/rtl.c:773 0xc93a5d pro_epilogue_adjust_stack /home/marek/src/gcc/gcc/config/i386/i386.c:9517 0xca0b20 ix86_expand_prologue() /home/marek/src/gcc/gcc/config/i386/i386.c:10491 0xd645da gen_prologue() /home/marek/src/gcc/gcc/config/i386/i386.md:11810 0x80dc79 thread_prologue_and_epilogue_insns /home/marek/src/gcc/gcc/function.c:5949 0x8110a2 rest_of_handle_thread_prologue_and_epilogue /home/marek/src/gcc/gcc/function.c:6973 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See http://gcc.gnu.org/bugs.html for instructions.
[Bug target/57896] [4.8 Regression] ICE in expand_expr_real_2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57896 Marek Polacek mpolacek at gcc dot gnu.org changed: What|Removed |Added Component|c |target --- Comment #8 from Marek Polacek mpolacek at gcc dot gnu.org --- Not a C FE issue.
[Bug c/60170] No -Wtype-limits warning with -O1
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60170 Marek Polacek mpolacek at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2014-02-19 CC||mpolacek at gcc dot gnu.org Ever confirmed|0 |1
[Bug c/59933] for loop goes wild with assert() enabled
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59933 Marek Polacek mpolacek at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED CC||mpolacek at gcc dot gnu.org Resolution|--- |INVALID --- Comment #4 from Marek Polacek mpolacek at gcc dot gnu.org --- There's no good reproducer, but if -fno-aggressive-loop-optimizations helps, most likely the code invokes undefined behavior. Tentatively closing as invalid (PR59982 was closed too).
[Bug c++/60267] ICE in c_pp_lookup_pragma, at c-family/c-pragma.c:1232; ICE in tsubst_copy, at cp/pt.c:12887
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60267 --- Comment #5 from Jakub Jelinek jakub at gcc dot gnu.org --- Created attachment 32167 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=32167action=edit gcc49-pr60267.patch Untested fix for the preprocessing ICE. So, with this patch you should be able to preprocess the file now.
[Bug rtl-optimization/60268] [4.9 regression] ICE: in rank_for_schedule, at haifa-sched.c:2557
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60268 Richard Biener rguenth at gcc dot gnu.org changed: What|Removed |Added Target Milestone|--- |4.9.0
[Bug ipa/60266] [4.9 Regression] ICE: in ipa_get_parm_lattices, at ipa-cp.c:261 during LibreOffice LTO build
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60266 Richard Biener rguenth at gcc dot gnu.org changed: What|Removed |Added Target Milestone|--- |4.9.0 --- Comment #1 from Richard Biener rguenth at gcc dot gnu.org --- Similar to recently fixed maybe this is either the call fn or the static chain?
[Bug fortran/59537] Automatic array cannot have an initializer, for -finit-real and a SAVE statement present in subroutine
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59537 --- Comment #1 from Lorenz Hüdepohl bugs at stellardeath dot org --- Maybe related to the already fixed #51800?
[Bug fortran/59537] Automatic array cannot have an initializer, for -finit-real and a SAVE statement present in subroutine
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59537 Dominique d'Humieres dominiq at lps dot ens.fr changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2014-02-19 Ever confirmed|0 |1 --- Comment #2 from Dominique d'Humieres dominiq at lps dot ens.fr --- Maybe related to the already fixed #51800? Maybe, but this PR is still present on trunk (4.9 r207856), 4.7.4 and 4.8.2.
[Bug ipa/60243] IPA is slow on large cgraph tree
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243 --- Comment #8 from Richard Biener rguenth at gcc dot gnu.org --- Author: rguenth Date: Wed Feb 19 09:29:34 2014 New Revision: 207879 URL: http://gcc.gnu.org/viewcvs?rev=207879root=gccview=rev Log: 2014-02-19 Richard Biener rguent...@suse.de PR ipa/60243 * ipa-prop.c: Include stringpool.h and tree-ssanames.h. (ipa_modify_call_arguments): Emit an argument load explicitely and preserve virtual SSA form there and for the replacement call. Do not update SSA form nor free dominance info. Modified: trunk/gcc/ChangeLog trunk/gcc/ipa-prop.c
[Bug rtl-optimization/60268] [4.9 regression] ICE: in rank_for_schedule, at haifa-sched.c:2557
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60268 Andrey Belevantsev abel at gcc dot gnu.org changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |abel at gcc dot gnu.org --- Comment #3 from Andrey Belevantsev abel at gcc dot gnu.org --- I'm out of office today, so I'll have a look properly tomorrow, but... (In reply to Jakub Jelinek from comment #2) So perhaps: --- gcc/haifa-sched.c 2014-02-18 08:18:53.045024428 +0100 +++ gcc/haifa-sched.c 2014-02-19 07:58:38.191381581 +0100 @@ -2550,7 +2550,7 @@ rank_for_schedule (const void *x, const return INSN_LUID (tmp) - INSN_LUID (tmp2); } - if (live_range_shrinkage_p) + if (live_range_shrinkage_p sched_pressure != SCHED_PRESSURE_NONE) { /* Don't use SCHED_PRESSURE_MODEL -- it results in much worse code. */ ... the fired assert below this code means that we have turned off sched-pressure on the new region (unexpectedly to live_range_shrinkage) and I'd like to know how this region was added. I guess I missed some entry point within the new scheduler code when fixing the previous PR. BTW, why if (sched_pressure != SCHED_PRESSURE_NONE) free_global_sched_pressure_data (); when free_global_sched_pressure_data () contains the same guard and thus could be called unconditionally? Pilot error while being over cautious, I will simplify that too.
[Bug c++/60267] ICE in c_pp_lookup_pragma, at c-family/c-pragma.c:1232; ICE in tsubst_copy, at cp/pt.c:12887
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60267 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2014-02-19 Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #6 from Jakub Jelinek jakub at gcc dot gnu.org --- Created attachment 32168 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=32168action=edit gcc49-pr60267-2.patch Untested fix for the tsubst ICE. Of course, without preprocessed testcase I can't be sure if this patch fixed it.
[Bug c/59193] Unused postfix operator temporaries
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59193 Manuel López-Ibáñez manu at gcc dot gnu.org changed: What|Removed |Added CC||manu at gcc dot gnu.org --- Comment #5 from Manuel López-Ibáñez manu at gcc dot gnu.org --- (In reply to Max TenEyck Woodbury from comment #4) Since there are hundreds, if not thousands of instances of this defect in the GCC code and there is no urgency in correcting these defects, this bug will only get resolved slowly. Closing it for invalid reasons does the community a disservice. Are you planning to help in fixing these and other problems? If so, please start the copyright assignment process: http://gcc.gnu.org/contribute.html#legal Then, to get your feet wet, it would be better to start with some uncontroversial bugs like: PR25801, or PR55080, or PR57622 or PR52347. I have a long list of easy hacks that would help a lot GCC and its users.
[Bug c++/60269] New: #pragma simd tsubst related ICE
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60269 Bug ID: 60269 Summary: #pragma simd tsubst related ICE Product: gcc Version: 4.9.0 Status: UNCONFIRMED Keywords: ice-on-valid-code Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: jakub at gcc dot gnu.org CC: bviyer at gcc dot gnu.org template int N void foo (int *a, int *b, int *c) { #pragma simd vectorlength (N) for (int i = 0; i N; i++) a[i] = b[i] * c[i]; } void bar (int *a, int *b, int *c) { foo 64 (a, b, c); } ICEs with -fcilkplus, so clearly tsubst on it is not performed correctly.
[Bug target/60204] struct with __m512i is mishandled in function parameter passing and return
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60204 Uroš Bizjak ubizjak at gmail dot com changed: What|Removed |Added CC||ubizjak at gmail dot com Target Milestone|--- |4.9.0 Severity|normal |major --- Comment #3 from Uroš Bizjak ubizjak at gmail dot com --- (In reply to H.J. Lu from comment #2) It is a bug in psABI. It should read as eight \eightbytes. So, let's fix the psABI first and then fix gcc. This PR should be resolved before 4.9 is released.
[Bug target/60204] struct with __m512i is mishandled in function parameter passing and return
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60204 --- Comment #4 from Uroš Bizjak ubizjak at gmail dot com --- (In reply to Uroš Bizjak from comment #3) So, let's fix the psABI first and then fix gcc. psABI is fixed in [1]. [1] https://github.com/hjl-tools/x86-64-psABI/commit/6d7ccd614fe67111d2aecec853c3df0310b372d2
[Bug target/57232] wcstol.c:213:1: internal compiler error
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57232 Alexandre Oliva aoliva at gcc dot gnu.org changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |aoliva at gcc dot gnu.org --- Comment #15 from Alexandre Oliva aoliva at gcc dot gnu.org --- Mine. It looks like the call to cselib_preserve_cfa_base_value in vt_initialize should be guarded by some conditions, like this: if (reg != hard_frame_pointer_rtx fixed_regs[REGNO (reg)]) cselib_preserve_cfa_base_value (val, REGNO (reg)); This fixes the reported problem for me (though I haven't otherwise regtested it). Daniel, Jon, Nick, Sebastian, does it fix the problem for you?
[Bug c/59193] Unused postfix operator temporaries
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59193 Andrew Pinski pinskia at gcc dot gnu.org changed: What|Removed |Added Status|REOPENED|RESOLVED Resolution|--- |INVALID --- Comment #6 from Andrew Pinski pinskia at gcc dot gnu.org --- Take: int f(int a) { a++; return a; } int g(int a) { ++a; return a; } --- CUT --- The gimplifier produces the exact same IR for both cases: f (int a) { int D.1790; a = a + 1; D.1790 = a; return D.1790; } g (int a) { int D.1792; a = a + 1; D.1792 = a; return D.1792; } --- CUT --- So the compiler is already smart enough to remove the temporary storage even at -O0. With: int f(int a) { int b = a++; return a; } It does not remove it but that is because the result of a++ is not unused. So it is the gimplifier knows if the result is unused and will not use them otherwise. This is the same issue as memcpy and its return value.
[Bug fortran/60255] Deferred character length variable at (1) cannot yet be associated with unlimited polymorphic entities
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60255 Dominique d'Humieres dominiq at lps dot ens.fr changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2014-02-19 Ever confirmed|0 |1
[Bug fortran/60238] Allow colon-separated triplet in array initialization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60238 Dominique d'Humieres dominiq at lps dot ens.fr changed: What|Removed |Added Status|UNCONFIRMED |WAITING Last reconfirmed||2014-02-19 Ever confirmed|0 |1 --- Comment #2 from Dominique d'Humieres dominiq at lps dot ens.fr --- Anybody against closing this PR as WONTFIX?
[Bug c/59193] Unused postfix operator temporaries
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59193 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #7 from Jakub Jelinek jakub at gcc dot gnu.org --- Also note that even at -O0, at no point during compilation with GCC/G++ if the value of ++a or a++ isn't used and a has integral/pointer type one form is more efficient than the other. It is just a different tree code ({PRE,POST}{IN,DE}CREMENT_EXPR), but with the same operand, type etc. So, there is no waste of any resources, it is not a defect to use either style, it is purely coding convention matter.
[Bug libstdc++/60270] New: [C++1y] std::quoted is too eager to clear the string
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60270 Bug ID: 60270 Summary: [C++1y] std::quoted is too eager to clear the string Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: redi at gcc dot gnu.org Lars Gullik Bjønnes pointed out that we clear the string too early in std::quoted, which causes this to fail: #include string #include sstream #include iomanip #include cassert int main() { std::istringstream in; std::string s = xxx; in s; assert( !s.empty() ); in std::quoted(s); assert( !s.empty() ); // fails }
[Bug libstdc++/60271] New: [C++1y] std::max(initializer_listT) cannot use std::max_element
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60271 Bug ID: 60271 Summary: [C++1y] std::max(initializer_listT) cannot use std::max_element Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: redi at gcc dot gnu.org http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3893.html#2350 (approved in Issaquah) adds constexpr to std::max(initializer_listT), which means we can't use std::max_element to implement it.
[Bug target/60204] struct with __m512i is mishandled in function parameter passing and return
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60204 --- Comment #5 from tocarip.intel at gmail dot com --- Created attachment 32169 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=32169action=edit Proposed patch. Currently testing attached patch.
[Bug c++/60272] New: atomic::compare_exchange_weak has spurious store and can cause race conditions
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60272 Bug ID: 60272 Summary: atomic::compare_exchange_weak has spurious store and can cause race conditions Product: gcc Version: 4.8.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: anthony.ajw at gmail dot com Created attachment 32170 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=32170action=edit Sample code that demonstrates the problem G++ 4.8.1 is producing incorrect code for std::atomic::compare_exchange_weak on x86-64 linux. In particular, if the exchange succeeds, then there is an additional spurious store to the expected parameter after the exchange, which may race with other threads and cause problems. e.g. #include atomic struct Node { Node* next; }; void Push(std::atomicNode* head, Node* node) { node-next = head.load(); while(!head.compare_exchange_weak(node-next, node)) ; } When compiled with g++-4.8 -S -std=c++11 -pthread -O3 t.cpp the generated code is: movq(%rdi), %rax movq%rax, (%rsi) movq(%rsi), %rax .p2align 4,,10 .p2align 3 .L3: lock; cmpxchgq%rsi, (%rdi) movq%rax, (%rsi) *** jne.L3 rep; ret The line marked *** is an unconditional store to node-next in this example, and will be executed even if the exchange is successful. This will cause a race with code that uses the compare-exchange to order memory operations. e.g. void Pop(std::atomicNode* head){ for(;;){ Node* value=head.exchange(nullptr); if(value){ delete value; break; } } } If the exchange successfully retrieves a non-null value, it should be OK to delete it (assuming the node was allocated with new). However, if one thread is calling Push() and is suspended after the CMPXCHG and before the line marked *** is executed then another thread running Pop() can successfully complete the exchange and call delete. When the first thread is resumed, the line marked *** will then store to deallocated memory. This is in contradiction to 29.6.5p21 of the C++ Standard, which states that expected is only updated in the case of failure.
[Bug c++/60272] atomic::compare_exchange_weak has spurious store and can cause race conditions
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60272 Jonathan Wakely redi at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2014-02-19 Ever confirmed|0 |1
[Bug tree-optimization/60172] ARM performance regression from trunk@207239
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60172 --- Comment #10 from Joey Ye joey.ye at arm dot com --- (In reply to rguent...@suse.de from comment #9) On Mon, 17 Feb 2014, joey.ye at arm dot com wrote: But that doesn't make sense - it means that -fdisable-tree-forwprop4 should get numbers back to good speed, no? Because that's the only change forwprop4 does. -fdisable-tree-forwprop4 dooms other transformation and results slightly worse code than before. So the number isn't back to the best. I think forwprop4 does some good stuff here and disabling it isn't the solution. For completeness please base checks on r207316 (it contains a fix for the blamed revision, but as far as I can see it shouldn't make a difference for the testcase). I'm playing with r207686 and it is the same for this case. Did you check whether my hackish patch fixes things? I did with trunk 20140208. But it didn't make any difference to Proc_8
[Bug tree-optimization/60172] ARM performance regression from trunk@207239
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60172 --- Comment #11 from Joey Ye joey.ye at arm dot com --- Repost from another record. It is annoying that after commenting one record it automatically jumps to the next. Here is good expansion: ;; _41 = _42 * 4; (insn 20 19 0 (set (reg:SI 126 [ D.5038 ]) (ashift:SI (reg/v:SI 131 [ Int_1_Par_Val ]) (const_int 2 [0x2]))) -1 (nil)) ;; _40 = _2 + _41; (insn 21 20 22 (set (reg:SI 136 [ D.5035 ]) (plus:SI (reg/v/f:SI 130 [ Arr_2_Par_Ref ]) (reg:SI 119 [ D.5036 ]))) -1 (nil)) (insn 22 21 0 (set (reg/f:SI 125 [ D.5035 ]) (plus:SI (reg:SI 136 [ D.5035 ]) (reg:SI 126 [ D.5038 ]))) -1 (nil)) ;; MEM[(int[25] *)_51 + 20B] = _34; (insn 29 28 30 (set (reg:SI 139) (plus:SI (reg/v/f:SI 130 [ Arr_2_Par_Ref ]) (reg:SI 119 [ D.5036 ]))) Proc_8.c:23 -1 (nil)) (insn 30 29 31 (set (reg:SI 140) (plus:SI (reg:SI 139) (reg:SI 126 [ D.5038 ]))) Proc_8.c:23 -1 (nil)) (insn 31 30 32 (set (reg/f:SI 141) (plus:SI (reg:SI 140) (const_int 1000 [0x3e8]))) Proc_8.c:23 -1 (nil)) (insn 32 31 0 (set (mem:SI (plus:SI (reg/f:SI 141) (const_int 20 [0x14])) [2 MEM[(int[25] *)_51 + 20B]+0 S4 A32]) (reg:SI 124 [ D.5039 ])) Proc_8.c:23 -1 (nil)) After cse1 140 can be replaced by 125, thus lead a series of transformation make it much more efficient. Here is bad expansion: ;; _40 = Arr_2_Par_Ref_22(D) + _12; (insn 22 21 23 (set (reg:SI 138 [ D.5038 ]) (plus:SI (reg:SI 128 [ D.5038 ]) (reg:SI 121 [ D.5036 ]))) -1 (nil)) (insn 23 22 0 (set (reg/f:SI 127 [ D.5035 ]) (plus:SI (reg/v/f:SI 132 [ Arr_2_Par_Ref ]) (reg:SI 138 [ D.5038 ]))) -1 (nil)) ;; _32 = _20 + 1000; (insn 29 28 0 (set (reg:SI 124 [ D.5038 ]) (plus:SI (reg:SI 121 [ D.5036 ]) (const_int 1000 [0x3e8]))) Proc_8.c:23 -1 (nil)) ;; MEM[(int[25] *)_51 + 20B] = _34; (insn 32 31 33 (set (reg:SI 141) (plus:SI (reg/v/f:SI 132 [ Arr_2_Par_Ref ]) (reg:SI 124 [ D.5038 ]))) Proc_8.c:23 -1 (nil)) (insn 33 32 34 (set (reg/f:SI 142) (plus:SI (reg:SI 141) (reg:SI 128 [ D.5038 ]))) Proc_8.c:23 -1 (nil)) (insn 34 33 0 (set (mem:SI (plus:SI (reg/f:SI 142) (const_int 20 [0x14])) [2 MEM[(int[25] *)_51 + 20B]+0 S4 A32]) (reg:SI 126 [ D.5039 ])) Proc_8.c:23 -1 (nil)) Here cse doesn't happen, resulting in less optimal insns. Reason why cse doesn't happen is unclear yet.
[Bug tree-optimization/54742] Switch elimination in FSM loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54742 --- Comment #36 from Joey Ye joey.ye at arm dot com --- Please ignore previous comment as it shouldn't be here.
[Bug fortran/60232] [OOP] The rank of the element in the structure constructor does not match that of the component
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60232 --- Comment #5 from janus at gcc dot gnu.org --- Author: janus Date: Wed Feb 19 11:52:39 2014 New Revision: 207896 URL: http://gcc.gnu.org/viewcvs?rev=207896root=gccview=rev Log: 2014-02-19 Janus Weil ja...@gcc.gnu.org PR fortran/60232 * expr.c (gfc_get_variable_expr): Don't add REF_ARRAY for dimensionful functions, which are used as procedure pointer target. 2014-02-19 Janus Weil ja...@gcc.gnu.org PR fortran/60232 * gfortran.dg/typebound_proc_33.f90: New. Added: trunk/gcc/testsuite/gfortran.dg/typebound_proc_33.f90 Modified: trunk/gcc/fortran/ChangeLog trunk/gcc/fortran/expr.c trunk/gcc/testsuite/ChangeLog
[Bug fortran/60232] [OOP] The rank of the element in the structure constructor does not match that of the component
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60232 janus at gcc dot gnu.org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #6 from janus at gcc dot gnu.org --- Fixed on trunk with r207896. Closing. Thanks for the report!
[Bug fortran/60255] Deferred character length variable at (1) cannot yet be associated with unlimited polymorphic entities
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60255 --- Comment #4 from janus at gcc dot gnu.org --- Antony, is it possible for you to try the patch in comment 2, in order to check if it produces the expected runtime behavior for your code?
[Bug debug/56563] no debuginfo for explicit operator
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56563 Mark Wielaard mark at gcc dot gnu.org changed: What|Removed |Added CC||mark at gcc dot gnu.org --- Comment #1 from Mark Wielaard mark at gcc dot gnu.org --- PR debug/37959 did add support for explicit constructors by adding a LANG_HOOKS_FUNCTION_DECL_EXPLICIT_P that dwarf2out uses to tag functions with DW_AT_explicit attributes.
[Bug c/59933] for loop goes wild with assert() enabled
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59933 --- Comment #5 from Mark Warner warnerme at ptd dot net --- sizeof(NSQ_del_dec_struct) / sizeof(opus_int32) is guaranteed to produced a even number with a remainder of 0. Note the __attribute__ ((__aligned__ (8))) to make it a multiple of 8 in size.
[Bug c++/60272] atomic::compare_exchange_weak has spurious store and can cause race conditions
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60272 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added CC||jakub at gcc dot gnu.org, ||rth at gcc dot gnu.org, ||torvald at gcc dot gnu.org --- Comment #1 from Jakub Jelinek jakub at gcc dot gnu.org --- Even our __atomic_compare_exchange* documentation states that: If they are not equal, the current contents of @code{*@var{ptr}} is written into @code{*@var{expected}}. But then expand_builtin_atomic_compare_exchange doesn't care: oldval = expect; if (!expand_atomic_compare_and_swap ((target == const0_rtx ? NULL : target), oldval, mem, oldval, desired, is_weak, success, failure)) return NULL_RTX; if (oldval != expect) emit_move_insn (expect, oldval); That effectively means that expect will be stored unconditionally. So, either we'd need to change this function, so that it sets oldval to NULL_RTX first, and passes ..., oldval, mem, expected, ... and needs to also always ask for target, then conditionally on target store to expected, or perhaps add extra parameter to expand_atomic_compare_and_swap and do the store only conditionally in that case. Richard/Torvald?
[Bug c++/60273] New: gcc gets confused when one class uses variadic
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60273 Bug ID: 60273 Summary: gcc gets confused when one class uses variadic Product: gcc Version: 4.8.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: walter.mascarenhas at gmail dot com Created attachment 32171 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=32171action=edit gcc asked me to submit this file // // When compiling this file in Ubuntu 13.04, gcc 4.8.1 crashes with the // following message. Clang 3.0 compiles the file with no problems. // // /home/walter/code/klein/tests/platform/gcc_bug/main.cc:-1: In instantiation of 'struct BarFooint ': // /home/walter/code/klein/tests/platform/gcc_bug/main.cc:25: required from here // /home/walter/code/klein/tests/platform/gcc_bug/main.cc:20: internal compiler error: Segmentation fault // templateint... Ns using Buggy = typename X::template SNs...; // :-1: error: [main.o] Error 1^ // struct A {}; template class X struct Foo { using Type = int; // if the next line is replaced by template int... N then all is fine template int N1, int N2 using S = A; }; template class X struct Bar { // if the next line is commented then all is fine. using Type = typename X::Type; // if the next line is commented then all is fine. templateint... Ns using Buggy = typename X::template SNs...; }; void foobar() { Bar Fooint bf; }
[Bug c++/60267] ICE in c_pp_lookup_pragma, at c-family/c-pragma.c:1232; ICE in tsubst_copy, at cp/pt.c:12887
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60267 --- Comment #7 from Sylwester Arabas slayoo at staszic dot waw.pl --- Created attachment 32172 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=32172action=edit preprocessed source trigerring ICE with g++ snapshot 20140212 Thanks a lot for looking at it. I'm attaching the source proprocessed with g++ 4.8.2. It gives the tsubst_copy ICE with the 20140212 g++ snapshot. HTH, Sylwester
[Bug c++/60274] New: String as template parameter - regression in 4.8.2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60274 Bug ID: 60274 Summary: String as template parameter - regression in 4.8.2 Product: gcc Version: 4.8.2 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: ondrej.kolacek1 at centrum dot cz Greetings, this issue happened to me with Debian's g++ so it is possible it is just their bug but hopefully (well this is debatable :) ) it is not. begin file test.cpp typedef const char *const ProtocolIdType; //typedef int ProtocolIdType; template ProtocolIdType protocolId class C { public: typedef int ProtocolVersion; class D { public: ProtocolVersion GetProtocolVersion(); }; }; template ProtocolIdType protocolId typename CprotocolId::ProtocolVersion CprotocolId::D::GetProtocolVersion() { return 1; } int main(void) { } end file test.cpp g++ test.cpp test.cpp:18:41: error: prototype for ‘typename CprotocolId::ProtocolVersion CprotocolId::D::GetProtocolVersion()’ does not match any in class ‘CprotocolId::D’ typename CprotocolId::ProtocolVersion CprotocolId::D::GetProtocolVersion() ^ test.cpp:13:19: error: candidate is: CprotocolId::ProtocolVersion CprotocolId::D::GetProtocolVersion() ProtocolVersion GetProtocolVersion(); The code used to work for ages, is compilable with MSVC, clang and gcc on various platforms, was compilable with 4.8.1 but broke with 4.8.2. The issue is with string template parameter; replacing typedef const char *const ProtocolIdType; by typedef int ProtocolIdType; makes the error go away. g++ -v Using built-in specs. COLLECT_GCC=g++ COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.8/lto-wrapper Target: x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Debian 4.8.2-15' --with-bugurl=file:///usr/share/doc/gcc-4.8/README.Bugs --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.8 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.8 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --disable-libmudflap --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.8-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --with-arch-32=i586 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu Thread model: posix gcc version 4.8.2 (Debian 4.8.2-15)
[Bug c++/60267] ICE in c_pp_lookup_pragma, at c-family/c-pragma.c:1232; ICE in tsubst_copy, at cp/pt.c:12887
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60267 --- Comment #8 from Sylwester Arabas slayoo at staszic dot waw.pl --- BTW, I have initially reported it as a comment to http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60198 (the same file/line in ICE error message). S.
[Bug c++/60274] [4.8/4.9 Regression] String as template parameter - regression in 4.8.3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60274 Richard Biener rguenth at gcc dot gnu.org changed: What|Removed |Added Priority|P3 |P1 Status|UNCONFIRMED |NEW Known to work||4.8.2 Keywords||rejects-valid Last reconfirmed||2014-02-19 Ever confirmed|0 |1 Summary|String as template |[4.8/4.9 Regression] String |parameter - regression in |as template parameter - |4.8.2 |regression in 4.8.3 Target Milestone|--- |4.8.3 Known to fail||4.8.3, 4.9.0 --- Comment #1 from Richard Biener rguenth at gcc dot gnu.org --- Actually 4.8.2 works but the top of the branch doesn't.
[Bug sanitizer/60275] New: [UBSAN] Add -f[no-]sanitize-recover/-fsanitize-undefined-trap-on-error to make UBSAN's runtime errors fatal
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60275 Bug ID: 60275 Summary: [UBSAN] Add -f[no-]sanitize-recover/-fsanitize-undefined-trap-on-e rror to make UBSAN's runtime errors fatal Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: sanitizer Assignee: unassigned at gcc dot gnu.org Reporter: burnus at gcc dot gnu.org CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org, jakub at gcc dot gnu.org, kcc at gcc dot gnu.org, mpolacek at gcc dot gnu.org While I personally would like to see more fine tuning via UBSAN_FLAGS - similar to ASAN, LSAN and TSAN, adding CLANG's -fsanitize-recover/-fno-sanitize-recover and -fsanitize-undefined-trap-on-error would be useful as additional feature. From CLANG: Extra features of UndefinedBehaviorSanitizer: - ``-fno-sanitize-recover``: By default, after a sanitizer diagnoses an issue, it will attempt to continue executing the program if there is a reasonable behavior it can give to the faulting operation. This option causes the program to abort instead. - ``-fsanitize-undefined-trap-on-error``: Causes traps to be emitted rather than calls to runtime libraries when a problem is detected. This option is intended for use in cases where the sanitizer runtime cannot be used (for instance, when building libc or a kernel module). This is only compatible with the sanitizers in the ``undefined-trap`` group. That would be BUILT_IN_UNREACHABLE and BUILT_IN_TRAP. (But unreachable shouldn't be dressed by SANITIZE_UNREACHABLE ;-) See also LLVM's * tools/clang/docs/UsersManual.rst * tools/clang/lib/CodeGen/CGExpr.cpp (search for SanitizeUndefinedTrapOnError and SanitizeRecover)
[Bug c/59933] for loop goes wild with assert() enabled
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59933 --- Comment #6 from Mark Warner warnerme at ptd dot net --- If it is invalid, why does -Wall not trigger anything ?
[Bug ipa/60266] [4.9 Regression] ICE: in ipa_get_parm_lattices, at ipa-cp.c:261 during LibreOffice LTO build
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60266 --- Comment #2 from Markus Trippelsdorf trippels at gcc dot gnu.org --- It's caused by mixing -O0 and -O2 with LTO: markus@x4 ~S % cat TableCopyHelper.ii namespace com { namespace sun { namespace star {} } } namespace css = com::sun::star; namespace com { namespace sun { namespace star { class A {}; template class interface_type class C : A { public: interface_type *operator-(); }; } } typedef struct { } uno_Any; namespace sun { namespace star { class D : uno_Any {}; class B { virtual css::D m_fn1(); virtual void m_fn2(); virtual void m_fn3(); }; class F : css::B { virtual int m_fn4(); }; namespace sdb { namespace application { class XCopyTableWizard : css::F { public: virtual void m_fn5(); virtual void m_fn6(); }; } } } } using namespace com::sun::star; using namespace com::sun::star::sdb::application; void fn1(Cint ) try { CXCopyTableWizard a; a-m_fn6(); } catch (int ) { } } markus@x4 ~S % cat copytablewizard.ii namespace com { namespace sun { namespace star { class A {}; namespace sdb { namespace application { class XCopyTableWizard { virtual int m_fn1(); }; } } } } } class OPropertyArrayUsageHelper { public: virtual ~OPropertyArrayUsageHelper(); }; using com::sun::star::A; using com::sun::star::sdb::application::XCopyTableWizard; class CopyTableWizard : XCopyTableWizard, OPropertyArrayUsageHelper { ~CopyTableWizard(); }; CopyTableWizard::~CopyTableWizard() try {} catch (A ) { } markus@x4 ~S % g++ -flto -fPIC -O0 -c copytablewizard.ii markus@x4 ~S % g++ -flto -fPIC -std=gnu++11 -O2 -c TableCopyHelper.ii markus@x4 ~S % g++ -w -r -nostdlib -O2 TableCopyHelper.o copytablewizard.o lto1: internal compiler error: in ipa_get_parm_lattices, at ipa-cp.c:261 0x50c7ac ipa_get_parm_lattices ../../gcc/gcc/ipa-cp.c:261 0xc41824 ipa_get_parm_lattices ../../gcc/gcc/ipa-cp.c:261 0xc41824 propagate_constants_accross_call ../../gcc/gcc/ipa-cp.c:1443 0xc44308 propagate_constants_topo ../../gcc/gcc/ipa-cp.c:2231 0xc44308 ipcp_propagate_stage ../../gcc/gcc/ipa-cp.c:2327 0xc44308 ipcp_driver ../../gcc/gcc/ipa-cp.c:3705 0xc44308 execute ../../gcc/gcc/ipa-cp.c:3804 Please submit a full bug report, with preprocessed source if appropriate. (I think the ODR violation got introduced during reduction.)
[Bug sanitizer/60275] [UBSAN] Add -f[no-]sanitize-recover/-fsanitize-undefined-trap-on-error to make UBSAN's runtime errors fatal
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60275 Marek Polacek mpolacek at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2014-02-19 Assignee|unassigned at gcc dot gnu.org |mpolacek at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #1 from Marek Polacek mpolacek at gcc dot gnu.org --- Mine. I think this is 5.0 material.
[Bug other/50925] [4.7/4.8/4.9 Regression][avr] ICE at spill_failure, at reload1.c:2118
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50925 Jorn Wolfgang Rennecke amylaar at gcc dot gnu.org changed: What|Removed |Added CC||amylaar at gcc dot gnu.org --- Comment #28 from Jorn Wolfgang Rennecke amylaar at gcc dot gnu.org --- I can't reproduce this with the current trunk. Can was mark this as known to work for 4.9 ?
[Bug ipa/60243] IPA is slow on large cgraph tree
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243 --- Comment #9 from Richard Biener rguenth at gcc dot gnu.org --- Author: rguenth Date: Wed Feb 19 14:25:47 2014 New Revision: 207899 URL: http://gcc.gnu.org/viewcvs?rev=207899root=gccview=rev Log: 2014-02-19 Richard Biener rguent...@suse.de PR ipa/60243 * tree-inline.c (estimate_num_insns): Avoid calling cgraph_get_node for all calls. Modified: trunk/gcc/ChangeLog trunk/gcc/tree-inline.c
[Bug c/59933] for loop goes wild with assert() enabled
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59933 --- Comment #7 from Marek Polacek mpolacek at gcc dot gnu.org --- (int)(sizeof(NSQ_del_dec_struct) / sizeof(opus_int32) seems to be 1168/4 = 292, but sLPC_Q14 has only 112 elements.
[Bug rtl-optimization/60155] ICE: in get_pressure_class_and_nregs at gcse.c:3438
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60155 John David Anglin danglin at gcc dot gnu.org changed: What|Removed |Added CC||mikulas at artax dot karlin.mff.cu ||ni.cz --- Comment #5 from John David Anglin danglin at gcc dot gnu.org --- *** Bug 54737 has been marked as a duplicate of this bug. ***
[Bug rtl-optimization/54737] ICE on PA-RISC with LTO and -ftrapv
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54737 John David Anglin danglin at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED CC||danglin at gcc dot gnu.org Resolution|--- |DUPLICATE --- Comment #2 from John David Anglin danglin at gcc dot gnu.org --- There is a patch in 60155 that probably fixes this PR. *** This bug has been marked as a duplicate of bug 60155 ***
[Bug target/57232] wcstol.c:213:1: internal compiler error
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57232 --- Comment #16 from Sebastian Götte gcc at jaseg dot net --- Alexandre, curiously, applying this patch to the cross-compiler source tree fixes the problem for me building 4.8.2 for rx-elf using a 4.8.2 x86_64 host gcc. I did not even have to rebuild the host gcc.
[Bug target/59799] aarch64_pass_by_reference never passes arrays by value, contrary to ABI documentation
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59799 --- Comment #9 from yroux at gcc dot gnu.org --- Author: yroux Date: Wed Feb 19 15:32:54 2014 New Revision: 207908 URL: http://gcc.gnu.org/viewcvs?rev=207908root=gccview=rev Log: 2014-02-19 Michael Hudson-Doyle michael.hud...@linaro.org PR target/59799 * config/aarch64/aarch64.c (aarch64_pass_by_reference): The rules for passing arrays in registers are the same as for structs, so remove the special case for them. Modified: trunk/gcc/ChangeLog trunk/gcc/config/aarch64/aarch64.c
[Bug target/57232] wcstol.c:213:1: internal compiler error
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57232 --- Comment #17 from Nick Clifton nickc at redhat dot com --- Hi Alex, if (reg != hard_frame_pointer_rtx fixed_regs[REGNO (reg)]) cselib_preserve_cfa_base_value (val, REGNO (reg)); This works for the RX port - thanks! Cheers Nick
[Bug c++/60274] [4.8/4.9 Regression] String as template parameter - regression in 4.8.3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60274 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added CC||jakub at gcc dot gnu.org, ||jason at gcc dot gnu.org --- Comment #2 from Jakub Jelinek jakub at gcc dot gnu.org --- Started with r207167.
[Bug c++/60064] [c++1y] ICE with auto as parameter of friend function
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60064 Volker Reichelt reichelt at gcc dot gnu.org changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED Target Milestone|--- |4.9.0 --- Comment #3 from Volker Reichelt reichelt at gcc dot gnu.org --- Fixed with Adam's patch.
[Bug debug/56563] no debuginfo for explicit operator
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56563 --- Comment #2 from Mark Wielaard mark at gcc dot gnu.org --- Jakub proposed a patch: http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01166.html
[Bug c/59933] for loop goes wild with assert() enabled
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59933 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #8 from Jakub Jelinek jakub at gcc dot gnu.org --- The code is not invalid C, just triggers undefined behavior, so it is not invalid at compile time, just at runtime if you ever hit this. GCC optimizes based on the assumption that undefined behavior doesn't happen in a correct program. While we have -Waggressive-loop-optimizations warning, it (intentionally) warns solely about the case where the loop has single exit and constant loop iteration count, which is not the case here, the number of iterations is i = 292 ? 0 : 292 - i. The loop will trigger undefined behavior whenever i is 292, if it is bigger, then there is no bug.
[Bug fortran/51976] [F2003] Support deferred-length character components of derived types (allocatable string length)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51976 --- Comment #13 from janus at gcc dot gnu.org --- The latest patch posted at http://gcc.gnu.org/ml/fortran/2014-02/msg00109.html works smoothly on the test case in comment 12.
[Bug target/59794] [4.7/4.8 Regression] i386 backend fails to detect MMX/SSE/AVX ABI changes
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59794 --- Comment #17 from uros at gcc dot gnu.org --- Author: uros Date: Wed Feb 19 15:53:59 2014 New Revision: 207910 URL: http://gcc.gnu.org/viewcvs?rev=207910root=gccview=rev Log: PR target/59794 * config/i386/i386.c (type_natural_mode): Warn for ABI changes only when -Wpsabi is enabled. testsuite/ChangeLog: PR target/59794 * gcc.target/i386/pr39162.c: Add dg-prune-output. (dg-options): Remove -Wno-psabi. * gcc.target/i386/59794-2.c: Ditto. * gcc.target/i386/60205-1.c: Ditto. * gcc.target/i386/sse-5.c: Ditto. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.c trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/i386/pr39162.c trunk/gcc/testsuite/gcc.target/i386/pr59794-2.c trunk/gcc/testsuite/gcc.target/i386/pr59794-3.c trunk/gcc/testsuite/gcc.target/i386/pr60205-1.c trunk/gcc/testsuite/gcc.target/i386/sse-5.c
[Bug c++/60251] [4.9 Regression] [c++11] ICE capturing variable-length array
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60251 --- Comment #2 from Paolo Carlini paolo.carlini at oracle dot com --- Not sure this is valid. Anyway, the ICE is due to the COMPONENT_REF being wrapped in a NOP_EXPR.
[Bug target/59797] GCC doesn't warn AVX-512 ABI change
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59797 Uroš Bizjak ubizjak at gmail dot com changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |DUPLICATE --- Comment #3 from Uroš Bizjak ubizjak at gmail dot com --- Dup. *** This bug has been marked as a duplicate of bug 60205 ***
[Bug target/60205] No ABI warning for AVX-512
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60205 --- Comment #5 from Uroš Bizjak ubizjak at gmail dot com --- *** Bug 59797 has been marked as a duplicate of this bug. ***
[Bug rtl-optimization/57320] [4.9 Regression] Shrink-wrapping leaves unreachable blocks in the CFG
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57320 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #2 from Jakub Jelinek jakub at gcc dot gnu.org --- This has been fixed by r204211 on the trunk, any reason to keep this PR open? Note that Steven's patch has been approved, but never committed: http://gcc.gnu.org/ml/gcc-patches/2013-05/msg01020.html
[Bug c/59933] for loop goes wild with assert() enabled
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59933 --- Comment #9 from Ian Hamilton ian at g0tcd dot com --- Yes, that's all proper and correct. The invalid C code induces undefined behaviour. I don't think anyone is disputing that. However, to be pragmatic for a moment, the experience of thousands of developers out there, working with legacy code, and trying to update their toolset to include gcc 4.8 is that code which compiled without warnings and worked with the old gcc compiler now still compiles without warnings, but fails at runtime with the 4.8 series compiler. Sometimes, the runtime failures are occasional and difficult to track down if (for example) it lies on an error handling path. This makes it even harder for these developers to figure out what's going on. If the compiler could provide a warning when it encounters this sort of invalid code, that would be a good thing, as it would highlight the old latent bugs and give developers the opportunity to fix them. However, it doesn't, so the developers working on legacy code really have no alternative to either using the -fno-aggressive-loop-optimizations switch to stabilse their legacy code (even assuming they understand what's happening), or sticking with the old version of the compiler. So I think the request to the gcc developers is to find a way of providing a compiler warning when the loop optimiser encounters problem code, to give developers a fighting chance of debugging their legacy code.
[Bug c/59933] for loop goes wild with assert() enabled
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59933 --- Comment #10 from Jakub Jelinek jakub at gcc dot gnu.org --- We have -fsanitize=undefined which can catch some issues, though the array bounds instrumentation (nor __builtin_object_size based instrumentation) has not been added yet for GCC 4.9, will be hopefully there in the next release. As for warnings, even the current -Waggressive-loop-optimizationsh warning (for the const number of iterations, single loop exit easy case where you know that if the loop is reachable, if there is undefined behavior in some loop iteration, you will trigger it) still has occassional false positives (various PRs about that, usually the issue is that while it is true there is such a loop, the loop is actually dead), further warnings would have huge false positive rate, to the extent that it would be rarely useful.
[Bug c++/60267] ICE in c_pp_lookup_pragma, at c-family/c-pragma.c:1232; ICE in tsubst_copy, at cp/pt.c:12887
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60267 --- Comment #9 from Jakub Jelinek jakub at gcc dot gnu.org --- Author: jakub Date: Wed Feb 19 16:45:21 2014 New Revision: 207911 URL: http://gcc.gnu.org/viewcvs?rev=207911root=gccview=rev Log: PR c++/60267 * c-pragma.c (init_pragma): Don't call cpp_register_deferred_pragma for PRAGMA_IVDEP if flag_preprocess_only. * gcc.dg/pr60267.c: New test. Added: trunk/gcc/testsuite/gcc.dg/pr60267.c Modified: trunk/gcc/c-family/ChangeLog trunk/gcc/c-family/c-pragma.c trunk/gcc/testsuite/ChangeLog
[Bug target/59794] [4.7/4.8 Regression] i386 backend fails to detect MMX/SSE/AVX ABI changes
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59794 --- Comment #18 from uros at gcc dot gnu.org --- Author: uros Date: Wed Feb 19 16:50:22 2014 New Revision: 207912 URL: http://gcc.gnu.org/viewcvs?rev=207912root=gccview=rev Log: Backport from mainline 2014-02-19 Uros Bizjak ubiz...@gmail.com PR target/59794 * config/i386/i386.c (type_natural_mode): Warn for ABI changes only when -Wpsabi is enabled. testsuite/ChangeLog: Backport from mainline 2014-02-19 Uros Bizjak ubiz...@gmail.com PR target/59794 * gcc.target/i386/pr39162.c: Add dg-prune-output. (dg-options): Remove -Wno-psabi. * gcc.target/i386/pr59794-2.c: Ditto. * gcc.target/i386/sse-5.c: Ditto. Added: branches/gcc-4_8-branch/gcc/testsuite/gcc.target/i386/pr59794-1.c branches/gcc-4_8-branch/gcc/testsuite/gcc.target/i386/pr59794-2.c branches/gcc-4_8-branch/gcc/testsuite/gcc.target/i386/pr59794-3.c branches/gcc-4_8-branch/gcc/testsuite/gcc.target/i386/pr59794-4.c branches/gcc-4_8-branch/gcc/testsuite/gcc.target/i386/pr59794-5.c branches/gcc-4_8-branch/gcc/testsuite/gcc.target/i386/pr59794-6.c branches/gcc-4_8-branch/gcc/testsuite/gcc.target/i386/pr59794-7.c Modified: branches/gcc-4_8-branch/gcc/ChangeLog branches/gcc-4_8-branch/gcc/config/i386/i386.c branches/gcc-4_8-branch/gcc/testsuite/ChangeLog branches/gcc-4_8-branch/gcc/testsuite/gcc.target/i386/pr39162.c branches/gcc-4_8-branch/gcc/testsuite/gcc.target/i386/sse-5.c
[Bug target/59797] GCC doesn't warn AVX-512 ABI change
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59797 Bug 59797 depends on bug 59794, which changed state. Bug 59794 Summary: [4.7/4.8 Regression] i386 backend fails to detect MMX/SSE/AVX ABI changes http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59794 What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED
[Bug target/59794] [4.7/4.8 Regression] i386 backend fails to detect MMX/SSE/AVX ABI changes
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59794 Uroš Bizjak ubizjak at gmail dot com changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED Target Milestone|4.7.4 |4.8.3 --- Comment #19 from Uroš Bizjak ubizjak at gmail dot com --- This is now implemented for 4.8.3 and 4.9, including -Wpsabi option. No plan to backport these patches to 4.7.x. FIXED for 4.8.3+.
[Bug other/50925] [4.7/4.8/4.9 Regression][avr] ICE at spill_failure, at reload1.c:2118
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50925 Ralf Corsepius corsepiu at gcc dot gnu.org changed: What|Removed |Added CC||corsepiu at gcc dot gnu.org --- Comment #29 from Ralf Corsepius corsepiu at gcc dot gnu.org --- (In reply to Jorn Wolfgang Rennecke from comment #28) I can't reproduce this with the current trunk. Confirmed. gcc-4.9 doesn't show this bug for --target=avr-rtems4.11, anymore. Can was mark this as known to work for 4.9 ? I am inclined to agree.
[Bug c/37743] Bogus printf format warning with __builtin_bswap32.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37743 --- Comment #11 from joseph at codesourcery dot com joseph at codesourcery dot com --- Yes, we could do something like that (but I also think it's time to put the targets without this type information on the deprecation list and warn their maintainers that the target support will be removed in the absence of this information being added soon).
[Bug target/57896] [4.8 Regression] ICE in expand_expr_real_2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57896 --- Comment #9 from Uroš Bizjak ubizjak at gmail dot com --- (In reply to Vittorio Zecca from comment #6) As an aside, in gcc 4.8.1 source code, before line 6995 of gcc/expr.c I put printf(\nexpr.c:6995 value-code=%d NUM_RTX_CODE=%d\n,(int) value-code,NUM_RTX_CODE); gcc_assert((int) value-code NUM_RTX_CODE); and I get an ICE there because value-code is 34816 and NUM_RTX_CODE is 145 Indeed at line 6995 ARITHMETIC_P (value) accesses rtx_class[(int) value-code] but the array rtx_class has only NUM_RTX_CODE elements. However, I do not know how this is relevant to this issue. This one points to infrastructure problem. Adding a debug patch: --cut here-- Index: explow.c === --- explow.c(revision 207910) +++ explow.c(working copy) @@ -186,8 +186,13 @@ plus_constant (enum machine_mode mode, rtx x, HOST } if (c != 0) -x = gen_rtx_PLUS (mode, x, GEN_INT (c)); +{ + rtx z = GEN_INT (c); + printf (cc, %li\n, c); + debug_rtx (z); + x = gen_rtx_PLUS (mode, x, z); +} if (GET_CODE (x) == SYMBOL_REF || GET_CODE (x) == LABEL_REF) return x; else if (all_constant) --cut here-- ~/gcc-build-48/gcc/cc1 pr57896.c ... __get_cpuidcc, -4 (const_int -4 [0xfffc]) cc, -16 (const_int -16 [0xfff0]) cc, -24 (const_int -24 [0xffe8]) cc, -32 (const_int -32 [0xffe0]) cc, -40 (??? bad code 47104 ) pr57896.c: In function ‘__get_cpuid’: pr57896.c:5:5: internal compiler error: in emit_move_insn_1, at expr.c:3437 int __get_cpuid (unsigned int __level, unsigned int *__eax, unsigned int *__ebx, unsigned int *__ecx, unsigned int *__edx) { ^ 0x62d74d emit_move_insn_1(rtx_def*, rtx_def*) /home/uros/gcc-svn/branches/gcc-4_8-branch/gcc/expr.c:3437 0x62d7b5 emit_move_insn(rtx_def*, rtx_def*) /home/uros/gcc-svn/branches/gcc-4_8-branch/gcc/expr.c:3535 Please note that the debug patch only encloses GEN_INT (...)
[Bug target/57896] [4.8 Regression] ICE in expand_expr_real_2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57896 --- Comment #10 from Uroš Bizjak ubizjak at gmail dot com --- (In reply to Vittorio Zecca from comment #5) Adding option -m32 I get ICE in ix86_expand_prologue, at config/i386/i386.c:10559 I can confirm this with: gcc version 4.8.3 20140219 (prerelease) [gcc-4_8-branch revision 207910] (GCC) ~/gcc-build-48/gcc/cc1 -m32 -march=x86-64 pr57896.c pr57896.c: In function ‘__get_cpuid_max’: pr57896.c:4:1: internal compiler error: in ix86_expand_prologue, at config/i386/i386.c:10539 } ^ 0x98fdf5 ix86_expand_prologue() /home/uros/gcc-svn/branches/gcc-4_8-branch/gcc/config/i386/i386.c:10539 0xa1ce5a gen_prologue() /home/uros/gcc-svn/branches/gcc-4_8-branch/gcc/config/i386/i386.md:11829 0x673927 thread_prologue_and_epilogue_insns /home/uros/gcc-svn/branches/gcc-4_8-branch/gcc/function.c:5949 0x673927 rest_of_handle_thread_prologue_and_epilogue /home/uros/gcc-svn/branches/gcc-4_8-branch/gcc/function.c:6973 The output from the debug patch doesn't look suspicious, so this looks like a real bug to me. Can someone please bisect which commit fixed/hid this bug?
[Bug c/37743] Bogus printf format warning with __builtin_bswap32.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37743 --- Comment #12 from Jakub Jelinek jakub at gcc dot gnu.org --- Created attachment 32173 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=32173action=edit gcc49-pr37743.patch Untested fix. The deprecation can hopefully be done separately.
[Bug target/60207] Wrong TFmode check in construct_container
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60207 --- Comment #2 from hjl at gcc dot gnu.org hjl at gcc dot gnu.org --- Author: hjl Date: Wed Feb 19 18:10:04 2014 New Revision: 207913 URL: http://gcc.gnu.org/viewcvs?rev=207913root=gccview=rev Log: Remove TFmode check for X86_64_INTEGER_CLASS PR target/60207 * config/i386/i386.c (construct_container): Remove TFmode check for X86_64_INTEGER_CLASS. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.c