Re: Vectorizer Pragmas
On 16 February 2014 23:44, Tim Prince <n...@aol.com> wrote:
> I don't think many people want to use both OpenMP 4 and older Intel
> directives together.

I'm having less and less incentive to use anything other than omp4, cilk and whatever. I think we should be able to map all our internal needs to those pragmas.

On the other hand, if you guys have any cross discussion with Intel folks about it, I'd love to hear. Since our support for those directives is a bit behind, it would be good not to duplicate the efforts in the long run.

Thanks!
--renato
Re: TYPE_BINFO and canonical types at LTO
On Mon, 17 Feb 2014, Jan Hubicka wrote:
> On Fri, 14 Feb 2014, Jan Hubicka wrote:
> > > This smells bad, since it is given a canonical type that is after the
> > > structural equivalency merging that ignores BINFOs, so it may be a
> > > completely different class with completely different bases than the
> > > original.
> >
> > Bases are structurally merged, too, and may be exchanged for normal
> > fields, because DECL_ARTIFICIAL (that separates bases and fields) does
> > not seem to be part of the canonical type definition in LTO.
>
> Can you elaborate on that DECL_ARTIFICIAL thing? That is, what is
> broken by considering all fields during that merging?

To make the code work with LTO, one cannot merge

  struct B { struct A a; };
  struct B : A {};

These IMO differ only by the DECL_ARTIFICIAL flag on the fields.

> The code == that BINFO walk?

Yes.

> Is that because we walk a completely unrelated BINFO chain? I'd say we
> should have merged its types so that difference shouldn't matter.
> Hopefully ;)

I am trying to make the point that it will matter. Here is the completed testcase from above:

  struct A { int a; };
  struct C : A {};
  struct B { struct A a; };
  struct C *p2;
  struct B *p1;
  int t() { p1->a.a = 2; return p2->a; }

With the patch

  Index: lto/lto.c
  ===================================================================
  --- lto/lto.c   (revision 20)
  +++ lto/lto.c   (working copy)
  @@ -49,6 +49,8 @@ along with GCC; see the file COPYING3.
   #include "data-streamer.h"
   #include "context.h"
   #include "pass_manager.h"
  +#include "print-tree.h"

   /* Number of parallel tasks to run, -1 if we want to use GNU Make jobserver.  */
  @@ -619,6 +621,15 @@ gimple_canonical_type_eq (const void *p1
   {
     const_tree t1 = (const_tree) p1;
     const_tree t2 = (const_tree) p2;
  +  if (gimple_canonical_types_compatible_p (CONST_CAST_TREE (t1),
  +                                           CONST_CAST_TREE (t2))
  +      && TREE_CODE (CONST_CAST_TREE (t1)) == RECORD_TYPE)
  +    {
  +      debug_tree (CONST_CAST_TREE (t1));
  +      fprintf (stderr, "bases:%i\n", BINFO_BASE_BINFOS (TYPE_BINFO (t1))->length ());
  +      debug_tree (CONST_CAST_TREE (t2));
  +      fprintf (stderr, "bases:%i\n", BINFO_BASE_BINFOS (TYPE_BINFO (t2))->length ());
  +    }
     return gimple_canonical_types_compatible_p (CONST_CAST_TREE (t1),
                                                 CONST_CAST_TREE (t2));
   }

I get:

  record_type 0x76c52888 B SI
    size: integer_cst 0x76ae83a0 type integer_type 0x76ae5150 bitsizetype, constant 32
    unit size: integer_cst 0x76ae83c0 type integer_type 0x76ae50a8 sizetype, constant 4
    align 32  symtab 0  alias set -1  canonical type 0x76c52888
    fields: field_decl 0x76adec78 a
      type: record_type 0x76c52738 A SI
        size: integer_cst 0x76ae83a0 32  unit size: integer_cst 0x76ae83c0 4
        align 32  symtab 0  alias set -1  canonical type 0x76c52738
        fields: field_decl 0x76adebe0 a
        context: translation_unit_decl 0x76af2e60 D.2821
        chain: type_decl 0x76af2f18 A
      nonlocal SI  file t.C  line 3  col 20
      size: integer_cst 0x76ae83a0 32  unit size: integer_cst 0x76ae83c0 4
      align 32  offset_align 128
      offset: integer_cst 0x76ae8060 constant 0
      bit offset: integer_cst 0x76ae80e0 constant 0
      context: record_type 0x76c52888 B
      chain: type_decl 0x76c55170 B
        type: record_type 0x76c52930 B
        nonlocal VOID  file t.C  line 3  col 10  align 1
        context: record_type 0x76c52888 B
        result: record_type 0x76c52888 B
    context: translation_unit_decl 0x76af2e60 D.2821
    pointer_to_this: pointer_type 0x76c529d8
    chain: type_decl 0x76c550b8 B
  bases:0

  record_type 0x76c52b28 C SI
    size: integer_cst 0x76ae83a0 type integer_type 0x76ae5150 bitsizetype, constant 32
    unit size: integer_cst 0x76ae83c0 type integer_type 0x76ae50a8 sizetype, constant 4
    align 32  symtab 0  alias set -1  structural equality
    fields: field_decl 0x76adeda8 D.2831
      type: record_type 0x76c52738 A SI
        size: integer_cst 0x76ae83a0 32  unit size: integer_cst 0x76ae83c0 4
        align 32  symtab 0  alias set -1  canonical type 0x76c52738
        fields: field_decl 0x76adebe0 a
        context: translation_unit_decl 0x76af2e60 D.2821
        chain: type_decl 0x76af2f18 A
      ignored SI  file t.C  line 2  col 8
      size: integer_cst 0x76ae83a0 32  unit size: integer_cst 0x76ae83c0 4
      align 32  offset_align 128
      offset: integer_cst 0x76ae8060 constant 0
      bit offset: integer_cst 0x76ae80e0 constant 0
      context: record_type 0x76c52a80 C
      chain: type_decl 0x76c552e0 C
        type: record_type 0x76c52b28 C
        nonlocal VOID  file t.C  line 2  col 12  align 1
        context: record_type 0x76c52a80 C
        result: record_type 0x76c52a80 C

Note the differences: B has the real field "a" where C has the artificial base field "D.2831" (flagged "ignored"), and C still says "structural equality" where B already has a canonical type.
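A minimal C++ sketch of the situation in the dump above (the types are the thread's testcase; the check itself is illustrative, not GCC code): B, which wraps A as an ordinary field, and C, which derives from A, are layout-identical, which is why a purely structural canonical-type hash that ignores DECL_ARTIFICIAL can conflate them.

```cpp
#include <cstddef>

struct A { int a; };
struct C : A {};      // A becomes an artificial base field at offset 0
struct B { A a; };    // A is an ordinary field at offset 0

// True when the two records are indistinguishable by size and field
// placement - which is all a structural merge that ignores
// DECL_ARTIFICIAL gets to see.
bool layouts_match() {
  return sizeof(B) == sizeof(C) && offsetof(B, a) == 0;
}
```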
Re: Need help: Is a VAR_DECL type builtin or not?
On Fri, Feb 14, 2014 at 02:40:44PM +0100, Richard Biener wrote:
> On Fri, Feb 14, 2014 at 9:59 AM, Dominik Vogt <v...@linux.vnet.ibm.com> wrote:
> > Given a specific VAR_DECL tree node, I need to find out whether its
> > type is built in or not. Up to now I have
> >
> >   tree tn = TYPE_NAME (TREE_TYPE (var_decl));
> >   if (tn != NULL_TREE && TREE_CODE (tn) == TYPE_DECL && DECL_NAME (tn))
> >     { ... }
> >
> > This if-condition is true both for
> >
> >   int x;
> >   const int x;
> >   ...
> >
> > and
> >
> >   typedef int i_t;
> >   i_t x;
> >   const i_t x;
> >   ...
> >
> > I need to weed out the class of VAR_DECLs that directly use built in
> > types.
>
> Try DECL_IS_BUILTIN. But I question how you define "builtin" here?

Well, actually I'm working on the variable output function in godump.c. At the moment, if the code comes across

  typedef char c_t;
  char c1;
  c_t c2;

it emits

  type _c_t byte
  var c1 byte
  var c2 byte

This is fine for c1, but for c2 it should really use the type:

  var c2 _c_t

So the rule I'm trying to implement is: given a tree node that is a VAR_DECL, if its type is an alias (defined with typedef/union/struct/class etc.), use the name of the alias; otherwise resolve the type recursively until only types built into the language are left. It's really only about the underlying data types (int, float, _Complex etc.), not about storage classes, pointers, attributes, qualifiers etc.

Well, since godump.c already caches all declarations it has come across, I could assume that these declarations are not built-in and use that in the rule above.

Ciao

Dominik ^_^  ^_^

--
Dominik Vogt
IBM Germany
Re: Need help: Is a VAR_DECL type builtin or not?
On Mon, Feb 17, 2014 at 1:15 PM, Dominik Vogt <v...@linux.vnet.ibm.com> wrote:
> [...]
> So the rule I'm trying to implement is: given a tree node that is a
> VAR_DECL, if its type is an alias (defined with typedef/union/struct/
> class etc.), use the name of the alias; otherwise resolve the type
> recursively until only types built into the language are left.
> [...]
> Well, since godump.c already caches all declarations it has come
> across, I could assume that these declarations are not built-in and
> use that in the rule above.

Not sure what GO presents us as location info, but DECL_IS_BUILTIN checks whether the line the type was declared at is sth impossible (reserved and supposed to be used for all types that do not have to be declared).

Richard.
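The rule Dominik describes can be sketched as a small lookup. This is a hypothetical, string-based model (the real godump.c walks TYPE_NAME/TREE_TYPE tree nodes, and the builtin-to-Go mapping here is invented for illustration):

```cpp
#include <map>
#include <string>

// Hypothetical alias and builtin tables (invented for illustration).
static const std::map<std::string, std::string> aliases = {
  {"c_t", "char"},            // typedef char c_t;
};
static const std::map<std::string, std::string> builtin_to_go = {
  {"char", "byte"}, {"int", "int32"},
};

// If the variable's type is a declared alias, emit a reference to the
// alias (godump prefixes it with "_"); otherwise emit the Go spelling
// of the language builtin.
std::string emit_type(const std::string &ty) {
  if (aliases.count(ty))
    return "_" + ty;                 // e.g. "var c2 _c_t"
  auto it = builtin_to_go.find(ty);
  return it != builtin_to_go.end() ? it->second : ty;
}
```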
Re: Vectorizer Pragmas
On 2/17/2014 4:42 AM, Renato Golin wrote:
> I'm having less and less incentive to use anything other than omp4,
> cilk and whatever. I think we should be able to map all our internal
> needs to those pragmas. On the other hand, if you guys have any cross
> discussion with Intel folks about it, I'd love to hear. Since our
> support for those directives is a bit behind, it would be good not to
> duplicate the efforts in the long run.

I'm continuing discussions with former Intel colleagues. If you are asking for insight into how Intel priorities vary over time, I don't expect much, unless the next beta compiler provides some inferences. They have talked about implementing all of OpenMP 4.0 except user-defined reductions this year. That would imply more activity in that area than on cilkplus, although some fixes have come in the latter. On the other hand, I had an issue on omp simd reduction(max: ) closed with the decision "will not be fixed". I have an icc problem report in on fixing omp simd safelen so it is more like the standard and less like the obsolete pragma simd vectorlength. Also, I have some problem reports active attempting to get clarification of their omp target implementation.

You may have noticed that omp parallel for simd in current Intel compilers can be used for combined thread and simd parallelism, including the case where the outer loop is parallelizable and vectorizable but the inner one is not.

--
Tim Prince
Re: Vectorizer Pragmas
On 17 February 2014 14:47, Tim Prince <n...@aol.com> wrote:
> I'm continuing discussions with former Intel colleagues. If you are
> asking for insight into how Intel priorities vary over time, I don't
> expect much, unless the next beta compiler provides some inferences.
> They have talked about implementing all of OpenMP 4.0 except
> user-defined reductions this year. That would imply more activity in
> that area than on cilkplus,

I'm expecting this. Any proposal to support Cilk in LLVM would be purely temporary and not endorsed in any way.

> although some fixes have come in the latter. On the other hand I had
> an issue on omp simd reduction(max: ) closed with the decision "will
> not be fixed".

We still haven't got pragmas for induction/reduction logic, so I'm not too worried about them.

> I have an icc problem report in on fixing omp simd safelen so it is
> more like the standard and less like the obsolete pragma simd
> vectorlength.

Our "width" metadata is slightly different in that it means "try to use that length", rather than "it's safe to use that length". This is why I'm holding off on using safelen for the moment.

> Also, I have some problem reports active attempting to get
> clarification of their omp target implementation.

Same here... RTFM is not enough in this case. ;)

> You may have noticed that omp parallel for simd in current Intel
> compilers can be used for combined thread and simd parallelism,
> including the case where the outer loop is parallelizable and
> vectorizable but the inner one is not.

That's my fear of going with omp simd directly. I don't want to be throwing threads all over the place when all I really want is vector code. For the time being, my proposal is to use the legacy pragmas: vector/novector, unroll/nounroll and simd vectorlength, which map nicely to the metadata we already have and don't incur OpenMP overhead. Later on, if OpenMP ends up with simple non-threaded pragmas, we should use those and deprecate the legacy ones.

If GCC is trying to do the same thing regarding non-threaded-vector code, I'd be glad to be involved in the discussion. Some LLVM folks think this should be an OpenMP discussion; I personally think it's pushing the boundaries a bit too much on an inherently threaded library extension.

cheers,
--renato
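The distinction drawn above shows up in what safelen actually promises. A sketch (function name invented): the clause is a programmer guarantee that no loop-carried dependence exists at distances shorter than 4, so a compiler may vectorize up to that width but is free to pick less.

```cpp
// "safelen(4)" asserts that iterations at distance < 4 are independent,
// so vectorizing with width up to 4 is safe; it does not demand that
// width.  Without -fopenmp the pragma is simply ignored, and the loop
// stays correct either way.
void scale(float *out, const float *in, int n) {
#pragma omp simd safelen(4)
  for (int i = 0; i < n; ++i)
    out[i] = 2.0f * in[i];
}
```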
Re: [RFC] Offloading Support in libgomp
On 14 Feb 16:43, Jakub Jelinek wrote:
> So, perhaps we should just stop for now oring the copyfrom in and just
> use the copyfrom from the very first mapping only, and wait for what
> the committee actually agrees on.
>
> Jakub

Like this?

  @@ -171,11 +171,16 @@ gomp_map_vars_existing (splay_tree_key oldn, splay_tree_key newn,
                   "[%p..%p) is already mapped",
                   (void *) newn->host_start, (void *) newn->host_end,
                   (void *) oldn->host_start, (void *) oldn->host_end);
  +#if 0
  +  /* FIXME: Remove this when OpenMP 4.0 will be standardized.  Currently it's
  +     unclear regarding overwriting copy_from for the existing mapping.
  +     See http://gcc.gnu.org/ml/gcc/2014-02/msg00208.html for details.  */
     if (((kind & 7) == 2 || (kind & 7) == 3)
         && !oldn->copy_from
         && oldn->host_start == newn->host_start
         && oldn->host_end == newn->host_end)
       oldn->copy_from = true;
  +#endif
     oldn->refcount++;
   }

-- Ilya
Re: [RFC] Offloading Support in libgomp
On Mon, Feb 17, 2014 at 07:59:16PM +0400, Ilya Verbin wrote:
> On 14 Feb 16:43, Jakub Jelinek wrote:
> > So, perhaps we should just stop for now oring the copyfrom in and
> > just use the copyfrom from the very first mapping only, and wait for
> > what the committee actually agrees on.
>
> Like this?
> [...]

Well, OpenMP 4.0 is a released standard, just in some cases ambiguous or buggy. I'd just remove the code rather than putting it into #if 0, patch preapproved. It will stay in the SVN history...

Jakub
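A tiny model (all names invented) of the semantics the thread settles on: after the change, re-mapping an already-mapped range only bumps the reference count, and the copy_from decision made by the very first mapping is never upgraded by later map kinds.

```cpp
#include <cstdint>

// Hypothetical miniature of libgomp's splay-tree entry.
struct mapping {
  std::uintptr_t host_start, host_end;
  bool copy_from;   // copy device data back to the host at unmap?
  int refcount;
};

// Re-mapping an existing range: the removed code would have set
// copy_from = true here when the new kind was "from"/"tofrom"; now the
// first mapping's choice sticks and only the refcount changes.
void map_existing(mapping &m, bool /*new_kind_wants_copy_from*/) {
  ++m.refcount;
}
```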
Re: Need help: Is a VAR_DECL type builtin or not?
On Mon, Feb 17, 2014 at 5:28 AM, Richard Biener <richard.guent...@gmail.com> wrote:
> On Mon, Feb 17, 2014 at 1:15 PM, Dominik Vogt <v...@linux.vnet.ibm.com> wrote:
> [...]
> > Well, since godump.c already caches all declarations it has come
> > across, I could assume that these declarations are not built-in and
> > use that in the rule above.
>
> Not sure what GO presents us as location info, but DECL_IS_BUILTIN
> checks whether the line the type was declared at is sth impossible
> (reserved and supposed to be used for all types that do not have to be
> declared).

godump.c is actually not used by the Go frontend. The purpose of godump.c is to read C header files and dump them in a Go representation. It's used when building the Go library, to get Go versions of system structures like struct stat.

I'm not quite sure what Dominik is after. For system structures, using the basic type, the underlying type of a typedef, is normally what you want. But to answer the question as stated, I think I would look at functions like is_naming_typedef_decl in dwarf2out.c, since this sounds like the kind of question that debug info needs to sort out.

Ian
Re: [RFC][PATCH 0/5] arch: atomic rework
On Wed, Feb 12, 2014 at 07:12:05PM +0100, Peter Zijlstra wrote:
> On Wed, Feb 12, 2014 at 09:42:09AM -0800, Paul E. McKenney wrote:
> > You need volatile semantics to force the compiler to ignore any
> > proofs it might otherwise attempt to construct. Hence all the
> > ACCESS_ONCE() calls in my email to Torvald. (Hopefully I translated
> > your example reasonably.)
>
> My brain gave out for today; but it did appear to have the right
> structure.

I can relate. ;-)

> I would prefer it if C11 would not require the volatile casts. It
> should simply _never_ speculate with atomic writes, volatile or not.

I agree with not needing volatiles to prevent speculated writes. However, they will sometimes be needed to prevent excessive load/store combining. The compiler doesn't have the runtime feedback mechanisms that the hardware has, and thus will need help from the developer from time to time. Or maybe the Linux kernel simply waits to transition to C11 relaxed atomics until the compiler has learned to be sufficiently conservative in its load-store combining decisions.

Thanx, Paul
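For readers unfamiliar with the idiom: ACCESS_ONCE() in the kernel is, roughly, a volatile cast. This is a sketch of the shape, not the kernel's exact definition; the variable name is invented.

```cpp
// Roughly what the kernel's ACCESS_ONCE() does: force exactly one real
// load or store by going through a volatile-qualified lvalue.
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

int shared_flag;

int read_flag() {
  // The compiler must emit one load here; it may not merge this with
  // neighbouring accesses or "prove" the value is unchanged.
  return ACCESS_ONCE(shared_flag);
}
```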
Re: [RFC][PATCH 0/5] arch: atomic rework
On Sat, 15 Feb 2014, Torvald Riegel wrote:
> glibc is a counterexample that comes to mind, although it's a smaller
> code base. (It's currently not using C11 atomics, but transitioning
> there makes sense, and is something I want to get to eventually.)

glibc is using C11 atomics (GCC builtins rather than _Atomic / stdatomic.h, but using __atomic_* with explicitly specified memory model rather than the older __sync_*) on AArch64, plus in certain cases on ARM and MIPS.

--
Joseph S. Myers
jos...@codesourcery.com
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 06:59:31PM +0000, Joseph S. Myers wrote:
> glibc is using C11 atomics (GCC builtins rather than _Atomic /
> stdatomic.h, but using __atomic_* with explicitly specified memory
> model rather than the older __sync_*) on AArch64, plus in certain
> cases on ARM and MIPS.

Hmm, actually that results in a change in behaviour for the __sync_* primitives on AArch64. The documentation for those states that:

  `In most cases, these built-in functions are considered a full
  barrier. That is, no memory operand is moved across the operation,
  either forward or backward. Further, instructions are issued as
  necessary to prevent the processor from speculating loads across the
  operation and from queuing stores after the operation.'

which is stronger than simply mapping them to memory_model_seq_cst, which seems to be what the AArch64 compiler is doing (so you get acquire + release instead of a full fence).

Will
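The two builtin families in question, side by side (illustrative; the AArch64 subtlety concerns the barriers emitted around these operations, which portable code cannot observe, so the results below are identical on any target):

```cpp
// The same read-modify-write through both families.  __sync_* is
// documented as a "full barrier"; __atomic_* with __ATOMIC_SEQ_CST maps
// onto the C11 seq-cst model, which on AArch64 may be implemented as a
// weaker acquire+release pair.
long counter;

long bump_sync()   { return __sync_fetch_and_add(&counter, 1); }
long bump_atomic() { return __atomic_fetch_add(&counter, 1, __ATOMIC_SEQ_CST); }
```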
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, 2014-02-17 at 18:59 +0000, Joseph S. Myers wrote:
> glibc is using C11 atomics (GCC builtins rather than _Atomic /
> stdatomic.h, but using __atomic_* with explicitly specified memory
> model rather than the older __sync_*) on AArch64, plus in certain
> cases on ARM and MIPS.

I think the major steps remaining are moving the other architectures over, and rechecking concurrent code (e.g., for the code that I have seen, it was either asm variants (eg, on x86), or built before C11; ARM pthread_once was lacking memory barriers (see the pthread_once unification patches I posted)). We also need/should move towards using relaxed-MO atomic loads instead of plain loads.
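In C++11 terms (the __atomic_* builtins implement the same model), the last point looks like this: a plain load of a concurrently-written variable is a data race, while a relaxed-MO atomic load provides atomicity without imposing any ordering. Names are invented for illustration.

```cpp
#include <atomic>

std::atomic<int> shared{0};

int peek() {
  // Atomic, but imposes no ordering on surrounding accesses; this is
  // the intended replacement for a racy plain load of a shared int.
  return shared.load(std::memory_order_relaxed);
}
```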
Re: [RFC][PATCH 0/5] arch: atomic rework
On Sat, 2014-02-15 at 10:49 -0800, Linus Torvalds wrote:
> On Sat, Feb 15, 2014 at 9:45 AM, Torvald Riegel <trie...@redhat.com> wrote:
> > I think a major benefit of C11's memory model is that it gives a
> > *precise* specification for how a compiler is allowed to optimize.
>
> Clearly it does *not*. This whole discussion is proof of that. It's
> not at all clear,

It might not be an easy-to-understand specification, but as far as I'm aware it is precise. The Cambridge group's formalization certainly is precise. From that, one can derive (together with the usual rules for as-if etc.) what a compiler is allowed to do (assuming that the standard is indeed precise). My replies in this discussion have been based on reasoning about the standard, and not secret knowledge (with the exception of no-out-of-thin-air, which is required in the standard's prose but not yet formalized). I agree that I'm using the formalization as a kind of placeholder for the standard's prose (which isn't all that easy to follow for me either), but I guess there's no way around an ISO standard using prose. If you see a case in which the standard isn't precise, please bring it up or open a C++ CWG issue for it.

> and the standard apparently is at least debatably allowing things
> that shouldn't be allowed.

Which example do you have in mind here? Haven't we resolved all the debated examples, or did I miss any?

> It's also a whole lot more complicated than volatile, so the
> likelihood of a compiler writer actually getting it right - even if
> the standard does - is lower.

It's not easy, that's for sure, but none of the high-performance alternatives are easy either. There are testing tools out there based on the formalization of the model, and we've found bugs with them.

And the alternative of using something not specified by the standard is even worse, I think, because then you have to guess what a compiler might do, without having any constraints; IOW, one is resorting to "no sane compiler would do that", and that doesn't seem very robust either.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 11:55 AM, Torvald Riegel <trie...@redhat.com> wrote:
> Which example do you have in mind here? Haven't we resolved all the
> debated examples, or did I miss any?

Well, Paul seems to still think that the standard possibly allows speculative writes or possibly value speculation in ways that break the hardware-guaranteed orderings.

And personally, I can't read standards paperwork. It is invariably written in some basically impossible-to-understand lawyeristic mode, and then it is read by people (compiler writers) that intentionally try to mis-use the words and do language-lawyering ("that depends on what the meaning of 'is' is"). The whole lvalue vs rvalue expression vs "what is a volatile access" thing for C++ was/is a great example of that.

So quite frankly, as a result I refuse to have anything to do with the process directly.

Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 08:55:47PM +0100, Torvald Riegel wrote:
> It might not be an easy-to-understand specification, but as far as I'm
> aware it is precise. The Cambridge group's formalization certainly is
> precise. From that, one can derive (together with the usual rules for
> as-if etc.) what a compiler is allowed to do.
> [...]
> If you see a case in which the standard isn't precise, please bring it
> up or open a C++ CWG issue for it.

I suggest that I go through the Linux kernel's requirements for atomics and memory barriers and see how they map to C11 atomics. With that done, we would have very specific examples to go over. Without that done, the discussion won't converge very well. Seem reasonable?

Thanx, Paul
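One entry of the kind of kernel-to-C11 mapping proposed above, sketched with C++11 atomics (the same model as C11): the kernel's smp_store_release()/smp_load_acquire() pairing corresponds to release/acquire operations. Names and values are invented for illustration.

```cpp
#include <atomic>

std::atomic<int> payload{0};
std::atomic<int> ready{0};

// Kernel analogue: ACCESS_ONCE(payload) = 42; smp_store_release(&ready, 1);
void publish() {
  payload.store(42, std::memory_order_relaxed);
  ready.store(1, std::memory_order_release);
}

// Kernel analogue: if (smp_load_acquire(&ready)) use payload.
// The acquire load synchronizes with the release store, so the payload
// write is guaranteed visible once ready == 1 is observed.
int consume() {
  if (ready.load(std::memory_order_acquire))
    return payload.load(std::memory_order_relaxed);
  return -1;   // not published yet
}
```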
Re: [RFC][PATCH 0/5] arch: atomic rework
On February 17, 2014 7:18:15 PM GMT+01:00, Paul E. McKenney <paul...@linux.vnet.ibm.com> wrote:
> I agree with not needing volatiles to prevent speculated writes.
> However, they will sometimes be needed to prevent excessive load/store
> combining. The compiler doesn't have the runtime feedback mechanisms
> that the hardware has, and thus will need help from the developer from
> time to time. Or maybe the Linux kernel simply waits to transition to
> C11 relaxed atomics until the compiler has learned to be sufficiently
> conservative in its load-store combining decisions.

Sounds backwards. Currently the compiler does nothing to the atomics. I'm sure we'll eventually add something. But if testing coverage is zero outside then surely things get worse, not better with time.

Richard.

Thanx, Paul
FreeBSD users of gcc
Greetings,

I am the named maintainer of the freebsd port. I have been for approximately twelve years, although I haven't been very active for the last four. The last major work I put into the freebsd port was at the end of 2009. I have reviewed others' patches since then; but it really hasn't required anything major since David O'Brien and I did the foundational work in the early 200Xs (which itself was based on many others' foundations). Gerald Pfeifer has also done much to keep the port in good shape. (I also don't want to ignore the many patches that came from members of the FreeBSD core team and other FreeBSD users.) To complicate matters, I haven't been using FreeBSD on my primary desktop or otherwise since early 2011.

FreeBSD is listed as a tier one platform. Therefore, I am looking for someone to whom both the GCC steering committee and I would be willing to hand over the reins before I drop my officially-listed maintainership. The expected person will likely already have Write After Approval status. Please contact me directly if you are qualified and interested in becoming the freebsd OS port maintainer.

Regards,
Loren
Re: TYPE_BINFO and canonical types at LTO
> Yeah, ok. But we treat those types (B and C) TBAA equivalent because
> structurally they are the same ;)
>
> Luckily C has a proper field for its base ("proper" meaning that
> offset and size are correct, as well as the type). It indeed has
> DECL_ARTIFICIAL set and yes, we treat those as real fields when doing
> the structural comparison.

Yep, the difference is that depending on whether C or D wins, we will end up walking the BINFO or not. So we should not depend on the BINFO walk for correctness.

> More interesting is of course when we can re-use tail-padding in one
> but not the other (works as expected - not merged).

Yep.

  struct A { A (); short x; bool a; };
  struct C : A { bool b; };
  struct B { struct A a; bool b; };
  struct C *p2;
  struct B *p1;
  int t() { p1->a.a = 2; return p2->a; }

> Yes, zero sized classes are those having no fields (but other stuff,
> type decls, bases etc.)

Yeah, but TBAA obviously doesn't care about type decls and bases. So I guess the conclusion is that the BINFO walk in alias.c is pointless?

Concerning the merging details and LTO aliasing, I think for 4.10 we should make C++ compute mangled names of types (i.e. call DECL_ASSEMBLER_NAME on the associated type_decl + explicitly mark that the type is driven by ODR) and then we can do merging driven by the ODR rule. Non-ODR types born from other frontends will then need to be made to alias all the ODR variants, which can be done by storing them into the current canonical type hash. (I wonder if we want to support cross-language aliasing for non-POD?)

I also think we want an explicit representation of types known to be local to the compilation unit - anonymous namespaces in C/C++, types defined within function bodies in C, and god knows what in Ada/Fortran/Java.

Honza
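The tail-padding example above can be checked directly. The sizes are ABI-specific: under the Itanium C++ ABI (the common ELF ABI), A is non-POD because of its user-declared constructor, so the derived C may place C::b inside A's tail padding, while B, which embeds A as a plain field, may not; hence the two records are not layout-equivalent and are correctly not merged.

```cpp
// A: short(2) + bool(1) + 1 byte of tail padding => size 4, align 2.
struct A { A(); short x; bool a; };
A::A() : x(0), a(false) {}

struct C : A { bool b; };   // b can live in A's tail padding: size 4
struct B { A a; bool b; };  // b must follow the complete A: size 6

// True on Itanium-ABI targets: reusing tail padding makes C smaller
// than the field-wrapping B.
bool tail_padding_reused() {
  return sizeof(C) < sizeof(B);
}
```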
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, 2014-02-17 at 12:23 -0800, Paul E. McKenney wrote:
> I suggest that I go through the Linux kernel's requirements for
> atomics and memory barriers and see how they map to C11 atomics. With
> that done, we would have very specific examples to go over. Without
> that done, the discussion won't converge very well. Seem reasonable?

Sounds good!
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, 2014-02-17 at 12:18 -0800, Linus Torvalds wrote:
> On Mon, Feb 17, 2014 at 11:55 AM, Torvald Riegel <trie...@redhat.com> wrote:
> > Which example do you have in mind here? Haven't we resolved all the
> > debated examples, or did I miss any?
>
> Well, Paul seems to still think that the standard possibly allows
> speculative writes or possibly value speculation in ways that break
> the hardware-guaranteed orderings.

That's true, I just didn't see any specific examples so far.

> And personally, I can't read standards paperwork. It is invariably
> written in some basically impossible-to-understand lawyeristic mode,

Yeah, it's not the most intuitive form for things like the memory model.

> and then it is read by people (compiler writers) that intentionally
> try to mis-use the words and do language-lawyering ("that depends on
> what the meaning of 'is' is").

That assumption about people working on compilers is a little too broad, don't you think? I think that it is important to stick to a specification, in the same way that one wouldn't expect a program with undefined behavior to make any sense of it, magically, in cases where stuff is undefined. However, that of course doesn't include trying to exploit weasel-wording (BTW, both users and compiler writers try to do it). IMHO, weasel-wording in a standard is a problem in itself even if not exploited, and often it indicates that there is a real issue. There might be reasons to have weasel-wording (e.g., because there's no known better way to express it, like in the case of the not really precise no-out-of-thin-air rule today), but nonetheless those aren't ideal.

> The whole lvalue vs rvalue expression vs 'what is a volatile access'
> thing for C++ was/is a great example of that.

I'm not aware of the details of this.

> So quite frankly, as a result I refuse to have anything to do with the
> process directly.

That's unfortunate. Then please work with somebody that isn't uncomfortable with participating directly in the process. But be warned, it may very well be a person working on compilers :)

Have you looked at the formalization of the model by Batty et al.? The overview of this is prose, but the formalized model itself is all formal relations and logic. So there should be no language-lawyering issues with that form. (For me, the formalized model is much easier to reason about.)
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 1:21 PM, Torvald Riegel trie...@redhat.com wrote: On Mon, 2014-02-17 at 12:18 -0800, Linus Torvalds wrote: and then it is read by people (compiler writers) that intentionally try to mis-use the words and do language-lawyering (that depends on what the meaning of 'is' is). That assumption about people working on compilers is a little too broad, don't you think? Let's just say that *some* are that way, and those are the ones that I end up butting heads with. The sane ones I never have to argue with - point them at a bug, and they just say yup, bug. The insane ones say we don't need to fix that, because if you read this copy of the standards that have been translated to chinese and back, it clearly says that this is acceptable. The whole lvalue vs rvalue expression vs 'what is a volatile access' thing for C++ was/is a great example of that. I'm not aware of the details of this. The argument was that an lvalue doesn't actually access the memory (an rvalue does), so this: volatile int *p = ...; *p; doesn't need to generate a load from memory, because *p is still an lvalue (since you could assign things to it). This isn't an issue in C, because in C, expression statements are always rvalues, but C++ changed that. The people involved with the C++ standards have generally been totally clueless about their subtle changes. I may have misstated something, but basically some C++ people tried very hard to make volatile useless. We had other issues too. Like C compiler people who felt that the type-based aliasing should always override anything else, even if the variable accessed (through different types) was statically clearly aliasing and used the exact same pointer. That made it impossible to do a syntactically clean model of this aliases, since the _only_ exception to the type-based aliasing rule was to generate a union for every possible access pairing. 
We turned off type-based aliasing (as I've mentioned before, I think it's a fundamentally broken feature to begin with, and a horrible horrible hack that adds no value for anybody but the HPC people). Gcc eventually ended up having some sane syntax for overriding it, but by then I was too disgusted with the people involved to even care. Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Sat, 2014-02-15 at 11:15 -0800, Linus Torvalds wrote: On Sat, Feb 15, 2014 at 9:30 AM, Torvald Riegel trie...@redhat.com wrote: I think the example is easy to misunderstand, because the context isn't clear. Therefore, let me first try to clarify the background. (1) The abstract machine does not write speculatively. (2) Emitting a branch instruction and executing a branch at runtime is not part of the specified behavior of the abstract machine. Of course, the abstract machine performs conditional execution, but that just specifies the output / side effects that it must produce (e.g., volatile stores) -- not with which hardware instructions it is producing this. (3) A compiled program must produce the same output as if executed by the abstract machine. Ok, I'm fine with that. Thus, we need to be careful what "speculative store" is meant to refer to. A few examples: if (atomic_load(x, mo_relaxed) == 1) atomic_store(y, 3, mo_relaxed); No, please don't use this idiotic example. It is wrong. It won't be useful in practice in a lot of cases, but that doesn't mean it's wrong. It's clearly not illegal code. It also serves a purpose: a simple example to reason about a few aspects of the memory model. The fact is, if a compiler generates anything but the obvious sequence (read/cmp/branch/store - where branch/store might obviously be done with some other machine conditional like a predicate), the compiler is wrong. Why? I've reasoned why (1) to (3) above allow in certain cases (i.e., the first load always returning 1) for the branch (or other machine conditional) to not be emitted. So please either poke holes into this reasoning, or clarify that you don't in fact, contrary to what you wrote above, agree with (1) to (3). Anybody who argues anything else is wrong, or confused, or confusing. I appreciate your opinion, and maybe I'm just one of the three things above (my vote is on confusing). But just saying so, without saying why, doesn't help me see what the misunderstanding is. 
Instead, argue about *other* sequences where the compiler can do something. I'd prefer if we could clarify the misunderstanding for the simple case first that doesn't involve stronger ordering requirements in the form of non-relaxed MOs. For example, this sequence: atomic_store(x, a, mo_relaxed); b = atomic_load(x, mo_relaxed); can validly be transformed to atomic_store(x, a, mo_relaxed); b = (typeof(x)) a; and I think everybody agrees about that. In fact, that optimization can be done even for mo_strict. Yes. But even that obvious optimization has subtle cases. What if the store is relaxed, but the load is strict? You can't do the optimization without a lot of thought, because dropping the strict load would drop an ordering point. So even the store followed by the exact same load case has subtle issues. Yes, if a compiler wants to optimize that, it has to give it more thought. My gut feeling is that either the store should get the stronger ordering, or the accesses should be merged. But I'd have to think more about that one (which I can do on request). With similar caveats, it is perfectly valid to merge two consecutive loads, and to merge two consecutive stores. Now that means that the sequence atomic_store(x, 1, mo_relaxed); if (atomic_load(x, mo_relaxed) == 1) atomic_store(y, 3, mo_relaxed); can first be optimized to atomic_store(x, 1, mo_relaxed); if (1 == 1) atomic_store(y, 3, mo_relaxed); and then you get the end result that you wanted in the first place (including the ability to re-order the two stores due to the relaxed ordering, assuming they can be proven to not alias - and please don't use the idiotic type-based aliasing rules). Bringing up your first example is pure and utter confusion. Sorry if it was confusing. But then maybe we need to talk about it more, because it shouldn't be confusing if we agree on what the memory model allows and what not. I had originally picked the example because it was related to the example Paul/Peter brought up. Don't do it. 
Instead, show what are obvious and valid transformations, and then you can bring up these kinds of combinations as look, this is obviously also correct. I have my doubts whether the best way to reason about the memory model is by thinking about specific compiler transformations. YMMV, obviously. The -- kind of vague -- reason is that the allowed transformations will be more complicated to reason about than the allowed output of a concurrent program when understanding the memory model (ie, ordering and interleaving of memory accesses, etc.). However, I can see that when trying to optimize with a hardware memory model in mind, this might look appealing. What the compiler will do is exploiting knowledge about all possible executions. For example, if it knows that x is always 1, it will do the transform. The user would
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 09:39:54PM +0100, Richard Biener wrote: On February 17, 2014 7:18:15 PM GMT+01:00, Paul E. McKenney paul...@linux.vnet.ibm.com wrote: On Wed, Feb 12, 2014 at 07:12:05PM +0100, Peter Zijlstra wrote: On Wed, Feb 12, 2014 at 09:42:09AM -0800, Paul E. McKenney wrote: You need volatile semantics to force the compiler to ignore any proofs it might otherwise attempt to construct. Hence all the ACCESS_ONCE() calls in my email to Torvald. (Hopefully I translated your example reasonably.) My brain gave out for today; but it did appear to have the right structure. I can relate. ;-) I would prefer if C11 would not require the volatile casts. It should simply _never_ speculate with atomic writes, volatile or not. I agree with not needing volatiles to prevent speculated writes. However, they will sometimes be needed to prevent excessive load/store combining. The compiler doesn't have the runtime feedback mechanisms that the hardware has, and thus will need help from the developer from time to time. Or maybe the Linux kernel simply waits to transition to C11 relaxed atomics until the compiler has learned to be sufficiently conservative in its load-store combining decisions. Sounds backwards. Currently the compiler does nothing to the atomics. I'm sure we'll eventually add something. But if testing coverage is zero outside then surely things get worse, not better with time. Perhaps we solve this chicken-and-egg problem by creating a test suite? Thanx, Paul
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, 2014-02-17 at 14:02 -0800, Linus Torvalds wrote: On Mon, Feb 17, 2014 at 1:21 PM, Torvald Riegel trie...@redhat.com wrote: On Mon, 2014-02-17 at 12:18 -0800, Linus Torvalds wrote: and then it is read by people (compiler writers) that intentionally try to mis-use the words and do language-lawyering (that depends on what the meaning of 'is' is). That assumption about people working on compilers is a little too broad, don't you think? Let's just say that *some* are that way, and those are the ones that I end up butting heads with. The sane ones I never have to argue with - point them at a bug, and they just say yup, bug. The insane ones say we don't need to fix that, because if you read this copy of the standards that have been translated to chinese and back, it clearly says that this is acceptable. The whole lvalue vs rvalue expression vs 'what is a volatile access' thing for C++ was/is a great example of that. I'm not aware of the details of this. The argument was that an lvalue doesn't actually access the memory (an rvalue does), so this: volatile int *p = ...; *p; doesn't need to generate a load from memory, because *p is still an lvalue (since you could assign things to it). This isn't an issue in C, because in C, expression statements are always rvalues, but C++ changed that. Huhh. I can see the problems that this creates in terms of C/C++ compatibility. The people involved with the C++ standards have generally been totally clueless about their subtle changes. This isn't a fair characterization. There are many people that do care, and certainly not all are clueless. But it's a limited set of people, bugs happen, and not all of them will have the same goals. I think one way to prevent such problems in the future could be to have someone in the kernel community volunteer to look through standard revisions before they are published. 
The standard needs to be fixed, because compilers need to conform to the standard (e.g., a compiler's extension fixing the above wouldn't be conforming anymore because it emits more volatile reads than specified). Or maybe those of us working on the standard need to flag potential changes of interest to the kernel folks. But that may be less reliable than someone from the kernel side looking at them; I don't know.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 2:09 PM, Torvald Riegel trie...@redhat.com wrote: On Sat, 2014-02-15 at 11:15 -0800, Linus Torvalds wrote: if (atomic_load(x, mo_relaxed) == 1) atomic_store(y, 3, mo_relaxed); No, please don't use this idiotic example. It is wrong. It won't be useful in practice in a lot of cases, but that doesn't mean it's wrong. It's clearly not illegal code. It also serves a purpose: a simple example to reason about a few aspects of the memory model. It's not illegal code, but if you claim that you can make that store unconditional, it's a pointless and wrong example. The fact is, if a compiler generates anything but the obvious sequence (read/cmp/branch/store - where branch/store might obviously be done with some other machine conditional like a predicate), the compiler is wrong. Why? I've reasoned why (1) to (3) above allow in certain cases (i.e., the first load always returning 1) for the branch (or other machine conditional) to not be emitted. So please either poke holes into this reasoning, or clarify that you don't in fact, contrary to what you wrote above, agree with (1) to (3). The thing is, the first load DOES NOT RETURN 1. It returns whatever that memory location contains. End of story. Stop claiming it can return 1. It *never* returns 1 unless you do the load and *verify* it, or unless the load itself can be made to go away. And with the code sequence given, that just doesn't happen. END OF STORY. So your argument is *shit*. Why do you continue to argue it? I told you how that load can go away, and you agreed. But IT CANNOT GO AWAY any other way. You cannot claim the compiler knows. The compiler doesn't know. It's that simple. So why do I say you are wrong, after I just gave you an example of how it happens? Because my example went back to the *real* issue, and there are actual real semantically meaningful details with doing things like load merging. 
To give an example, let's rewrite things a bit more to use an extra variable: atomic_store(x, 1, mo_relaxed); a = atomic_load(x, mo_relaxed); if (a == 1) atomic_store(y, 3, mo_relaxed); which looks exactly the same. I'm confused. Is this a new example? That is a new example. The important part is that it has left a trace for the programmer: because 'a' contains the value, the programmer can now look at the value later and say oh, we know we did a store iff a was 1 This sequence: atomic_store(x, 1, mo_relaxed); a = atomic_load(x, mo_relaxed); atomic_store(y, 3, mo_relaxed); is actually - and very seriously - buggy. Why? Because you have effectively split the atomic_load into two loads - one for the value of 'a', and one for your 'proof' that the store is unconditional. I can't follow that, because it isn't clear to me which code sequences are meant to belong together, and which transformations the compiler is supposed to make. If you would clarify that, then I can reply to this part. Basically, if the compiler allows the condition of "I wrote 3 to y, but the programmer sees 'a' has another value than 1 later" then the compiler is one buggy pile of shit. It fundamentally broke the whole concept of atomic accesses. Basically the atomic access to 'x' turned into two different accesses: the one that proved that x had the value 1 (and caused the value 3 to be written), and the other load that then writes that other value into 'a'. It's really not that complicated. And this is why descriptions like this should ABSOLUTELY NOT BE WRITTEN as if the compiler can prove that 'x' had the value 1, it can remove the branch. Because that IS NOT SUFFICIENT. That was not a valid transformation of the atomic load. The only valid transformation was the one I stated, namely to remove the load entirely and replace it with the value written earlier in the same execution context. Really, why is so hard to understand? Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, 2014-02-17 at 14:14 -0800, Paul E. McKenney wrote: On Mon, Feb 17, 2014 at 09:39:54PM +0100, Richard Biener wrote: On February 17, 2014 7:18:15 PM GMT+01:00, Paul E. McKenney paul...@linux.vnet.ibm.com wrote: On Wed, Feb 12, 2014 at 07:12:05PM +0100, Peter Zijlstra wrote: On Wed, Feb 12, 2014 at 09:42:09AM -0800, Paul E. McKenney wrote: You need volatile semantics to force the compiler to ignore any proofs it might otherwise attempt to construct. Hence all the ACCESS_ONCE() calls in my email to Torvald. (Hopefully I translated your example reasonably.) My brain gave out for today; but it did appear to have the right structure. I can relate. ;-) I would prefer if C11 would not require the volatile casts. It should simply _never_ speculate with atomic writes, volatile or not. I agree with not needing volatiles to prevent speculated writes. However, they will sometimes be needed to prevent excessive load/store combining. The compiler doesn't have the runtime feedback mechanisms that the hardware has, and thus will need help from the developer from time to time. Or maybe the Linux kernel simply waits to transition to C11 relaxed atomics until the compiler has learned to be sufficiently conservative in its load-store combining decisions. Sounds backwards. Currently the compiler does nothing to the atomics. I'm sure we'll eventually add something. But if testing coverage is zero outside then surely things get worse, not better with time. Perhaps we solve this chicken-and-egg problem by creating a test suite? Perhaps. The test suite might also be a good set of examples showing which cases we expect to be optimized in a certain way, and which not. I suppose the uses of (the equivalent) of atomics in the kernel would be a good start.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 2:25 PM, Torvald Riegel trie...@redhat.com wrote: On Mon, 2014-02-17 at 14:02 -0800, Linus Torvalds wrote: The argument was that an lvalue doesn't actually access the memory (an rvalue does), so this: volatile int *p = ...; *p; doesn't need to generate a load from memory, because *p is still an lvalue (since you could assign things to it). This isn't an issue in C, because in C, expression statements are always rvalues, but C++ changed that. Huhh. I can see the problems that this creates in terms of C/C++ compatibility. That's not the biggest problem. The biggest problem is that you have compiler writers that don't care about sane *use* of the features they write a compiler for, they just care about the standard. So they don't care about C vs C++ compatibility. Even more importantly, they don't care about the *user* that uses only C++ and the fact that their reading of the standard results in *meaningless* behavior. They point to the standard and say "that's what the standard says, suck it", and silently generate code (or in this case, avoid generating code) that makes no sense. So it's not about C++ being incompatible with C, it's about C++ having insane and bad semantics unless you just admit that oh, ok, I need to not just read the standard, I also need to use my brain, and admit that a C++ statement expression needs to act as if it is an access wrt volatile variables. In other words, as a compiler person, you do need to read more than the paper of the standard. You need to also take into account what is reasonable behavior even when the standard could possibly be read some other way. And some compiler people don't. The volatile access in statement expressions did get resolved, sanely, at least in gcc. I think gcc warns about some remaining cases. 
Btw, afaik, C++11 actually clarifies the standard to require the reads, because everybody *knew* that not requiring the read was insane and meaningless behavior, and clearly against the intent of volatile. But that didn't stop compiler writers from saying hey, the standard allows my insane and meaningless behavior, so I'll implement it and not consider it a bug. Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, 17 Feb 2014, Torvald Riegel wrote: On Mon, 2014-02-17 at 18:59 +0000, Joseph S. Myers wrote: On Sat, 15 Feb 2014, Torvald Riegel wrote: glibc is a counterexample that comes to mind, although it's a smaller code base. (It's currently not using C11 atomics, but transitioning there makes sense, and something I want to get to eventually.) glibc is using C11 atomics (GCC builtins rather than _Atomic / stdatomic.h, but using __atomic_* with explicitly specified memory model rather than the older __sync_*) on AArch64, plus in certain cases on ARM and MIPS. I think the major steps remaining are moving the other architectures over, and rechecking concurrent code (e.g., for the code that I have I don't think we'll be ready to require GCC >= 4.7 to build glibc for another year or two, although probably we could move the requirement up from 4.4 to 4.6. (And some platforms only had the C11 atomics optimized later than 4.7.) -- Joseph S. Myers jos...@codesourcery.com
Re: MSP430 in gcc4.9 ... enable interrupts?
I presume these will be part of the headers for the library distributed for msp430 gcc by TI/Redhat? I can't speak for TI's or Red Hat's plans. GNU's typical non-custom embedded runtime is newlib/libgloss, which usually doesn't have that much in the way of chip-specific headers or library functions. is that for the critical attribute that exists in the old msp430 port (which disables interrupts for the duration of the function)? Yes, for things like that. They're documented under Function Attributes in the Extensions to the C Language Family chapter of the current GCC manual.
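A sketch of what such declarations look like for the new GCC MSP430 port (attribute names per the MSP430 entries under Function Attributes in the GCC manual; the vector number is hypothetical, and this only compiles with an msp430 cross-compiler):

```c
/* Interrupts are disabled on entry and restored on exit, like the
   old port's critical attribute. */
void __attribute__((critical)) update_shared_state(void) {
    /* ... touch state shared with interrupt handlers ... */
}

/* Interrupt service routine for a hypothetical vector number. */
void __attribute__((interrupt(11))) timer_isr(void) {
    /* ... */
}
```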
Re: [RFC][PATCH 0/5] arch: atomic rework
On 17/02/14 20:18, Linus Torvalds wrote: On Mon, Feb 17, 2014 at 11:55 AM, Torvald Riegel trie...@redhat.com wrote: Which example do you have in mind here? Haven't we resolved all the debated examples, or did I miss any? Well, Paul seems to still think that the standard possibly allows speculative writes or possibly value speculation in ways that break the hardware-guaranteed orderings. And personally, I can't read standards paperwork. It is invariably Can't = Don't - evidently. written in some basically impossible-to-understand lawyeristic mode, You mean unambiguous - try reading a patent (Apple have 1000s of trivial ones, I tried reading one once thinking how could they have phrased it so this got approved, their technique was to make the reader want to start cutting themselves to prove they weren't numb to everything) and then it is read by people (compiler writers) that intentionally try to mis-use the words and do language-lawyering (that depends on what the meaning of 'is' is). The whole lvalue vs rvalue expression vs 'what is a volatile access' thing for C++ was/is a great example of that. I'm not going to teach you what rvalues and lvalues are, but! http://lmgtfy.com/?q=what+are+rvalues might help. So quite frankly, as a result I refuse to have anything to do with the process directly. Is this goodbye? Linus That aside, what is the problem? If the compiler has created code that that has different program states than what would be created without optimisation please file a bug report and/or send something to the mailing list USING A CIVIL TONE, there's no need for swear-words and profanities all the time - use them when you want to emphasise something. Additionally if you are always angry, start calling that state normal then reserve such words for when you are outraged. There are so many emails from you bitching about stuff, I've lost track of what you're bitching about you bitch that much about it. 
Like this standards stuff above (notice I said stuff, not crap or shit). What exactly is your problem, if the compiler is doing something the standard does not permit, or optimising something wrongly (read: puts the program in a different state than if the optimisation was not applied) that is REALLY serious, you are right to report it; but whining like a n00b on Stack-overflow when a question gets closed is not helping. I tried reading back through the emails (I dismissed them previously) but there's just so much ranting, and rants about the standard too (I would trash this if I deemed the effort required to delete was less than the storage of the bytes the message takes up) standardised behaviour is VERY important. So start again, what is the serious problem, have you got any code that would let me replicate it, what is your version of GCC? Oh and lastly! Optimisations are not as casual as "oh, we could do this and it'd work better" unlike kernel work or any other software that is being improved, it is very formal (and rightfully so). I seriously recommend you read the first 40 pages at least of a book called Compiler Design, Analysis and Transformation it's not about the parsing phases or anything, but it develops a good introduction and later a good foundation for exploring the field further. Compilers do not operate on what I call A-level logic and to show what I mean I use the shovel-to-the-face of real analysis, "of course 1/x tends towards 0, it's not gonna be 5!!" = A-level logic. "Let epsilon > 0 be given, then there exists an N" - formal proof. So when one says the compiler can prove it's not some silly thing powered by A-level logic, it is the implementation of something that can be proven to be correct (in the sense of the program states mentioned before) So yeah, calm down and explain - no lashing out at standards bodies, what is the problem? Alec
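The real-analysis aside, stated formally for comparison with the "A-level logic" version:

```latex
\[
\lim_{x\to\infty}\frac{1}{x}=0
\quad\Longleftrightarrow\quad
\forall \varepsilon>0\;\exists N>0:\;
x>N \implies \left|\frac{1}{x}-0\right|<\varepsilon ,
\]
witnessed by $N = 1/\varepsilon$, since $x > 1/\varepsilon \implies 1/x < \varepsilon$.
```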
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, 2014-02-17 at 14:32 -0800, Linus Torvalds wrote: On Mon, Feb 17, 2014 at 2:09 PM, Torvald Riegel trie...@redhat.com wrote: On Sat, 2014-02-15 at 11:15 -0800, Linus Torvalds wrote: if (atomic_load(x, mo_relaxed) == 1) atomic_store(y, 3, mo_relaxed); No, please don't use this idiotic example. It is wrong. It won't be useful in practice in a lot of cases, but that doesn't mean it's wrong. It's clearly not illegal code. It also serves a purpose: a simple example to reason about a few aspects of the memory model. It's not illegal code, but if you claim that you can make that store unconditional, it's a pointless and wrong example. The fact is, if a compiler generates anything but the obvious sequence (read/cmp/branch/store - where branch/store might obviously be done with some other machine conditional like a predicate), the compiler is wrong. Why? I've reasoned why (1) to (3) above allow in certain cases (i.e., the first load always returning 1) for the branch (or other machine conditional) to not be emitted. So please either poke holes into this reasoning, or clarify that you don't in fact, contrary to what you wrote above, agree with (1) to (3). The thing is, the first load DOES NOT RETURN 1. It returns whatever that memory location contains. End of story. The memory location is just an abstraction for state, if it's not volatile. Stop claiming it can return 1.. It *never* returns 1 unless you do the load and *verify* it, or unless the load itself can be made to go away. And with the code sequence given, that just doesn't happen. END OF STORY. void foo() { atomic_int x = 1; if (atomic_load(x, mo_relaxed) == 1) atomic_store(y, 3, mo_relaxed); } This is a counter example to your claim, and yes, the compiler has proof that x is 1. It's deliberately simple, but I can replace this with other more advanced situations. 
For example, if x comes out of malloc (or, on the kernel side, something else that returns non-aliasing memory) and hasn't provably escaped to other threads yet. I haven't posted this full example, but I've *clearly* said that *if* the compiler can prove that the load would always return 1, it can remove it. And it's simple to see why that's the case: If this holds, then in all allowed executions it would load from a known store, the relaxed_mo gives no further ordering guarantees so we can just take the value, and we're good. So your argument is *shit*. Why do you continue to argue it? Maybe because it isn't? Maybe you should try to at least trust that my intentions are good, even if distrusting my ability to reason. I told you how that load can go away, and you agreed. But IT CANNOT GO AWAY any other way. You cannot claim the compiler knows. The compiler doesn't know. It's that simple. Oh yes it can. Because of the same rules that allow you to perform the other transformations. Please try to see the similarities here. You previously said you don't want to mix volatile semantics and atomics. This is something that's being applied in this example. So why do I say you are wrong, after I just gave you an example of how it happens? Because my example went back to the *real* issue, and there are actual real semantically meaningful details with doing things like load merging. To give an example, let's rewrite things a bit more to use an extra variable: atomic_store(x, 1, mo_relaxed); a = atomic_load(x, mo_relaxed); if (a == 1) atomic_store(y, 3, mo_relaxed); which looks exactly the same. I'm confused. Is this a new example? That is a new example. 
The important part is that it has left a trace for the programmer: because 'a' contains the value, the programmer can now look at the value later and say oh, we know we did a store iff a was 1 This sequence: atomic_store(x, 1, mo_relaxed); a = atomic_load(x, mo_relaxed); atomic_store(y, 3, mo_relaxed); is actually - and very seriously - buggy. Why? Because you have effectively split the atomic_load into two loads - one for the value of 'a', and one for your 'proof' that the store is unconditional. I can't follow that, because it isn't clear to me which code sequences are meant to belong together, and which transformations the compiler is supposed to make. If you would clarify that, then I can reply to this part. Basically, if the compiler allows the condition of "I wrote 3 to y, but the programmer sees 'a' has another value than 1 later" then the compiler is one buggy pile of shit. It fundamentally broke the whole concept of atomic accesses. Basically the atomic access to 'x' turned into two different accesses: the one that proved that x had the value 1 (and caused the value 3 to be written), and the other load that then writes that other value into 'a'. It's really not that complicated. Yes that's not complicated, but I assumed this to be
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, 2014-02-17 at 14:47 -0800, Linus Torvalds wrote: On Mon, Feb 17, 2014 at 2:25 PM, Torvald Riegel trie...@redhat.com wrote: On Mon, 2014-02-17 at 14:02 -0800, Linus Torvalds wrote: The argument was that an lvalue doesn't actually access the memory (an rvalue does), so this: volatile int *p = ...; *p; doesn't need to generate a load from memory, because *p is still an lvalue (since you could assign things to it). This isn't an issue in C, because in C, expression statements are always rvalues, but C++ changed that. Huhh. I can see the problems that this creates in terms of C/C++ compatibility. That's not the biggest problem. The biggest problem is that you have compiler writers that don't care about sane *use* of the features they write a compiler for, they just care about the standard. So they don't care about C vs C++ compatibility. Even more importantly, they don't care about the *user* that uses only C++ and the fact that their reading of the standard results in *meaningless* behavior. They point to the standard and say that's what the standard says, suck it, and silently generate code (or in this case, avoid generating code) that makes no sense. There's an underlying problem here that's independent from the actual instance that you're worried about here: "no sense" is ultimately a matter of taste/objectives/priorities as long as the respective specification is logically consistent. If you want to be independent of your sanity being different from other people's sanity (e.g., compiler writers), you need to make sure that the specification is precise and says what you want. IOW, think about the specification being the program, and the people being computers; you better want a well-defined program in this case. 
So it's not about C++ being incompatible with C, it's about C++ having insane and bad semantics unless you just admit that oh, ok, I need to not just read the standard, I also need to use my brain, and admit that a C++ statement expression needs to act as if it is an access wrt volatile variables. 1) I agree that (IMO) a good standard strives for being easy to understand. 2) In practice, there is a trade-off between "easy to understand" and actually producing a specification. A standard is not a tutorial. And that's for good reason, because (a) there might be more than one way to teach something and that should be allowed and (b) that the standard should carry the full precision but still be compact enough to be manageable. 3) Implementations can try to be nice to users by helping them avoiding error-prone corner cases or such. A warning for common problems is such a case. But an implementation has to draw a line somewhere, demarcating cases where it fully exploits what the standard says (e.g., to allow optimizations) from cases where it is more conservative and does what the standard allows but in a potentially more intuitive way. That's especially the case if it's being asked to produce high-performance code. 4) There will be arguments for where the line actually is, simply because different users will have different goals. 5) The way to reduce 4) is to either make the standard more specific, or to provide better user documentation. If the standard has strict requirements, then there will be less misunderstanding. 6) To achieve 5), one way is to get involved in the standards process.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 3:10 PM, Alec Teal a.t...@warwick.ac.uk wrote: You mean unambiguous - try reading a patent (Apple have 1000s of trivial ones, I tried reading one once thinking how could they have phrased it so this got approved, their technique was to make the reader want to start cutting themselves to prove they weren't numb to everything) Oh, I agree, patent language is worse. I'm not going to teach you what rvalues and lvalues are, but! I know what lvalues and rvalues are. I *understand* the thinking that goes on behind the let's not do the access, because it's not an rvalue, so there is no 'access' to the object. I understand it from a technical perspective. I don't understand the compiler writer that uses a *technicality* to argue against generating sane code that is obviously what the user actually asked for. See the difference? So start again, what is the serious problem, have you got any code that would let me replicate it, what is your version of GCC? The volatile problem is long fixed. The people who argued for the legalistically correct, but insane behavior lost (and as mentioned, I think C++11 actually fixed the legalistic reading too). I'm bringing it up because I've had too many cases where compiler writers pointed to the standard and said that is ambiguous or undefined, so we can do whatever the hell we want, regardless of whether that's sensible, or regardless of whether there is a sensible way to get the behavior you want or not. Oh and lastly! Optimisations are not as casual as oh, we could do this and it'd work better unlike kernel work or any other software that is being improved, it is very formal (and rightfully so) Alec, I know compilers. I don't do code generation (quite frankly, register allocation and instruction choice is when I give up), but I did actually write my own for static analysis, including turning things into SSA etc. No, I'm not a compiler person, but I actually do know enough that I understand what goes on.
And exactly because I know enough, I would *really* like atomics to be well-defined, and have very clear - and *local* - rules about how they can be combined and optimized. None of this if you can prove that the read has value X stuff. And things like value speculation should simply not be allowed, because that actually breaks the dependency chain that the CPU architects give guarantees for. Instead, make the rules be very clear, and very simple, like my suggestion: you can never remove a load because you can prove it has some value, but you can combine two consecutive atomic accesses. For example, CPU people actually do tend to give guarantees for certain things, like stores that are causally related being visible in a particular order. If the compiler starts doing value speculation on atomic accesses, you are quite possibly breaking things like that. It's just not a good idea. Don't do it. Write the standard so that it clearly is disallowed. Because you may think that a C standard is machine-independent, but that isn't really the case. The people who write code still write code for a particular machine. Our code works (in the general case) on different byte orderings, different register sizes, different memory ordering models. But in each *instance* we still end up actually coding for each machine. So the rules for atomics should be simple and *specific* enough that when you write code for a particular architecture, you can take the architecture memory ordering *and* the C atomics orderings into account, and do the right thing for that architecture. And that very much means that doing things like value speculation MUST NOT HAPPEN. See? Even if you can prove that your code is equivalent, it isn't. So for example, let's say that you have a pointer, and you have some reason to believe that the pointer has a particular value.
So you rewrite following the pointer from this:

   value = ptr->val;

into

   value = speculated->value;
   tmp = ptr;
   if (unlikely(tmp != speculated))
      value = tmp->value;

and maybe you can now make the critical code-path for the speculated case go faster (since now there is no data dependency for the speculated case, and the actual pointer chasing load is now no longer in the critical path), and you made things faster because your profiling showed that the speculated case was true 99% of the time. Wonderful, right? And clearly, the code provably does the same thing. EXCEPT THAT IS NOT TRUE AT ALL. It very much does not do the same thing at all, and by doing value speculation and proving something was true, the only thing you did was to make incorrect code run faster. Because now the causally related load of value from the pointer isn't actually causally related at all, and you broke the memory ordering. This is why I don't like it when I see Torvald talk about proving things. It's bullshit. You can prove pretty much anything, and in the process lose sight of the bigger issue, namely that there is code that
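Linus's rewrite can be put into compilable form to see why a compiler might consider it provably equivalent (all names here - node, read_speculated, guess - are hypothetical scaffolding, not from the mail): in a single thread both versions always return the same value, yet the speculated version has no data dependency from the pointer load to the value load, which is exactly the dependency that ARM/POWER ordering guarantees hinge on.

```c
struct node { int val; };

/* Original form: the value load carries a data dependency on ptr. */
int read_plain(struct node *ptr)
{
    return ptr->val;
}

/* Speculated rewrite sketched in the mail: use a guessed value with no
   dependency on ptr, then verify the guess and reload on a mismatch.
   Single-threaded this always returns ptr->val, but the dependency
   chain from the ptr load to the value is gone, so hardware ordering
   guarantees based on that dependency no longer apply.  */
int read_speculated(struct node *ptr, struct node *guess, int guessed_val)
{
    int value = guessed_val;        /* no data dependency on ptr */
    struct node *tmp = ptr;
    if (tmp != guess)               /* guess was wrong: do the real load */
        value = tmp->val;
    return value;
}
```

Both functions are observationally identical in isolation, which is the trap: the equivalence proof holds on the abstract machine while the inter-thread ordering quietly changes.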
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 3:17 PM, Torvald Riegel trie...@redhat.com wrote: On Mon, 2014-02-17 at 14:32 -0800, Stop claiming it can return 1.. It *never* returns 1 unless you do the load and *verify* it, or unless the load itself can be made to go away. And with the code sequence given, that just doesn't happen. END OF STORY.

   void foo()
   {
      atomic<int> x = 1;
      if (atomic_load(x, mo_relaxed) == 1)
         atomic_store(y, 3, mo_relaxed);
   }

This is the very example I gave, where the real issue is not that you prove that the load returns 1; you instead say a store followed by a load can be combined. I (in another email I just wrote) tried to show why the prove something is true approach is a very dangerous model. Seriously, it's pure crap. It's broken. If the C standard defines atomics in terms of provable equivalence, it's broken. Exactly because on a *virtual* machine you can prove things that are not actually true in a *real* machine. I have the example of value speculation changing the memory ordering model of the actual machine. See? Linus
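Torvald's snippet can be rendered in standard C11 <stdatomic.h> syntax (a sketch: the variable y and the relaxed orderings come from the mail, the rest is scaffolding). The point under debate: because x is function-local and is written and then immediately read with no possible intervening writer, a compiler may fuse the store/load pair and emit the store to y unconditionally - which, as Linus says, is a different justification than proving the load returns 1.

```c
#include <stdatomic.h>

atomic_int y;

/* C11 rendering of the example from the mail.  The load of x can only
   observe the store just above it (x never escapes this function), so
   the compiler may combine the store/load pair and make the branch
   statically true, storing 3 to y unconditionally.  */
void foo(void)
{
    atomic_int x;
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    if (atomic_load_explicit(&x, memory_order_relaxed) == 1)
        atomic_store_explicit(&y, 3, memory_order_relaxed);
}
```

Either way the observable result is the same here; the disagreement in the thread is purely about which reasoning rule licenses the transformation.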
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 3:41 PM, Torvald Riegel trie...@redhat.com wrote: There's an underlying problem here that's independent from the actual instance that you're worried about here: no sense is ultimately a matter of taste/objectives/priorities as long as the respective specification is logically consistent. Yes. But I don't think it's independent. Exactly *because* some people will read standards without asking does the resulting code generation actually make sense for the programmer that wrote the code, the standard has to be pretty clear. The standard often *isn't* pretty clear. It wasn't clear enough when it came to volatile, and yet that was a *much* simpler concept than atomic accesses and memory ordering. And most of the time it's not a big deal. But because the C standard generally tries to be very portable, and cover different machines, there tends to be a mindset that anything inherently unportable is undefined or implementation defined, and then the compiler writer is basically given free rein to do anything they want (with implementation defined at least requiring that it is reliably the same thing). And when it comes to memory ordering, *everything* is basically non-portable, because different CPU's very much have different rules. I worry that that means that the standard then takes the stance that well, compiler re-ordering is no worse than CPU re-ordering, so we let the compiler do anything. And then we have to either add volatile to make sure the compiler doesn't do that, or use an overly strict memory model at the compiler level that makes it all pointless. So I really really hope that the standard doesn't give compiler writers free hands to do anything that they can prove is equivalent in the virtual C machine model. That's not how you get reliable results. Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 04:18:52PM -0800, Linus Torvalds wrote: On Mon, Feb 17, 2014 at 3:41 PM, Torvald Riegel trie...@redhat.com wrote: There's an underlying problem here that's independent from the actual instance that you're worried about here: no sense is ultimately a matter of taste/objectives/priorities as long as the respective specification is logically consistent. Yes. But I don't think it's independent. Exactly *because* some people will read standards without asking does the resulting code generation actually make sense for the programmer that wrote the code, the standard has to be pretty clear. The standard often *isn't* pretty clear. It wasn't clear enough when it came to volatile, and yet that was a *much* simpler concept than atomic accesses and memory ordering. And most of the time it's not a big deal. But because the C standard generally tries to be very portable, and cover different machines, there tends to be a mindset that anything inherently unportable is undefined or implementation defined, and then the compiler writer is basically given free rein to do anything they want (with implementation defined at least requiring that it is reliably the same thing). And when it comes to memory ordering, *everything* is basically non-portable, because different CPU's very much have different rules. I worry that that means that the standard then takes the stance that well, compiler re-ordering is no worse than CPU re-ordering, so we let the compiler do anything. And then we have to either add volatile to make sure the compiler doesn't do that, or use an overly strict memory model at the compiler level that makes it all pointless. For whatever it is worth, this line of reasoning has been one reason why I have been objecting strenuously every time someone on the committee suggests eliminating volatile from the standard.
Thanx, Paul So I really really hope that the standard doesn't give compiler writers free hands to do anything that they can prove is equivalent in the virtual C machine model. That's not how you get reliable results. Linus
RE: Vectorizer Pragmas
The way Intel present #pragma simd (to users, to the OpenMP committee, to the C and C++ committees, etc.) is that it is not a hint: it has a meaning. The meaning is defined in terms of evaluation order. Both C and C++ define an evaluation order for sequential programs. #pragma simd relaxes the sequential order into a partial order:

0. subsequent iterations of the loop are chunked together and execute in lockstep
1. there is no change in the order of evaluation of expressions within an iteration
2. if X and Y are expressions in the loop, and X(i) is the evaluation of X in iteration i, then for X sequenced before Y and iteration i evaluated before iteration j, X(i) is sequenced before Y(j).

A corollary is that the sequential order is always allowed, since it satisfies the partial order. However, the partial order allows the compiler to group copies of the same expression next to each other, and then to combine the scalar instructions into a vector instruction. There are other corollaries, such as that if multiple loop iterations write into an object defined outside of the loop then it has to be undefined behavior, the vector moral equivalent of a data race. That is why induction variables and reductions are necessary exceptions to this rule and require explicit support. As far as correctness goes, by this definition the programmer has expressed that the loop is correct, and the compiler should not try to prove correctness. On the performance heuristics side, the Intel compiler tries not to second-guess the user. There are users who work much harder than just adding a #pragma simd to unmodified sequential loops. There are various changes that may be necessary, and users who worked hard to get their loops into good shape are unhappy if the compiler does second-guess them. Robert.
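A minimal sketch of the contract Robert describes (the function and data here are hypothetical, not from the mail): the reduction clause names sum as the explicit, supported exception to the rule against multiple iterations writing one object defined outside the loop. A compiler without simd-pragma support simply ignores the pragma, and since the sequential order is always a legal schedule under the partial order, the sequential result is what we can check.

```c
/* Hypothetical loop annotated with the Intel-style simd pragma.
   The reduction clause declares `sum` as the sanctioned exception to
   the no-cross-iteration-writes rule; everything else in the body is
   independent across iterations, so chunked lockstep execution is a
   valid schedule.  */
int sum_squares(const int *a, int n)
{
    int sum = 0;
#pragma simd reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += a[i] * a[i];     /* X(i): no dependence on other iterations */
    return sum;
}
```

Without the reduction clause, the repeated writes to sum from different chunked iterations would be exactly the "vector moral equivalent of a data race" the mail mentions.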
-Original Message- From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of Renato Golin Sent: Monday, February 17, 2014 7:14 AM To: tpri...@computer.org Cc: gcc Subject: Re: Vectorizer Pragmas On 17 February 2014 14:47, Tim Prince n...@aol.com wrote: I'm continuing discussions with former Intel colleagues. If you are asking for insight into how Intel priorities vary over time, I don't expect much, unless the next beta compiler provides some inferences. They have talked about implementing all of OpenMP 4.0 except user defined reduction this year. That would imply more activity in that area than on cilkplus, I'm expecting this. Any proposal to support Cilk in LLVM would be purely temporary and not endorsed in any way. although some fixes have come in the latter. On the other hand I had an issue on omp simd reduction(max: ) closed with the decision will not be fixed. We still haven't got pragmas for induction/reduction logic, so I'm not too worried about them. I have an icc problem report in on fixing omp simd safelen so it is more like the standard and less like the obsolete pragma simd vectorlength. Our width metadata is slightly different in that it means try to use that length rather than it's safe to use that length, which is why I'm holding off on using safelen for the moment. Also, I have some problem reports active attempting to get clarification of their omp target implementation. Same here... RTFM is not enough in this case. ;) You may have noticed that omp parallel for simd in current Intel compilers can be used for combined thread and simd parallelism, including the case where the outer loop is parallelizable and vectorizable but the inner one is not. That's my fear of going with omp simd directly. I don't want to be throwing threads all over the place when all I really want is vector code.
For the time being, my proposal is to use the legacy pragmas: vector/novector, unroll/nounroll and simd vectorlength, which map nicely to the metadata we already have and don't incur the OpenMP overhead. Later on, if OpenMP ends up with simple non-threaded pragmas, we should use those and deprecate the legacy ones. If GCC is trying to do the same thing regarding non-threaded-vector code, I'd be glad to be involved in the discussion. Some LLVM folks think this should be an OpenMP discussion; I personally think it's pushing the boundaries a bit too much on an inherently threaded library extension. cheers, --renato
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 12:18:21PM -0800, Linus Torvalds wrote: On Mon, Feb 17, 2014 at 11:55 AM, Torvald Riegel trie...@redhat.com wrote: Which example do you have in mind here? Haven't we resolved all the debated examples, or did I miss any? Well, Paul seems to still think that the standard possibly allows speculative writes or possibly value speculation in ways that break the hardware-guaranteed orderings. It is not that I know of any specific problems, but rather that I know I haven't looked under all the rocks. Plus my impression from my few years on the committee is that the standard will be pushed to the limit when it comes time to add optimizations. One example that I learned about last week uses the branch-prediction hardware to validate value speculation. And no, I am not at all a fan of value speculation, in case you were curious. However, it is still an educational example. This is where you start:

   p = gp.load_explicit(memory_order_consume); /* AKA rcu_dereference() */
   do_something(p->a, p->b, p->c);
   p->d = 1;

Then you leverage branch-prediction hardware as follows:

   p = gp.load_explicit(memory_order_consume); /* AKA rcu_dereference() */
   if (p == GUESS) {
      do_something(GUESS->a, GUESS->b, GUESS->c);
      GUESS->d = 1;
   } else {
      do_something(p->a, p->b, p->c);
      p->d = 1;
   }

The CPU's branch-prediction hardware squashes speculation in the case where the guess was wrong, and this prevents the speculative store to ->d from ever being visible. However, the then-clause breaks dependencies, which means that the loads -could- be speculated, so that do_something() gets passed pre-initialization values. Now, I hope and expect that the wording in the standard about dependency ordering prohibits this sort of thing. But I do not yet know for certain. And yes, I am being paranoid. But not unnecessarily paranoid. ;-) Thanx, Paul And personally, I can't read standards paperwork.
It is invariably written in some basically impossible-to-understand lawyeristic mode, and then it is read by people (compiler writers) that intentionally try to mis-use the words and do language-lawyering (that depends on what the meaning of 'is' is). The whole lvalue vs rvalue expression vs 'what is a volatile access' thing for C++ was/is a great example of that. So quite frankly, as a result I refuse to have anything to do with the process directly. Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 7:00 PM, Paul E. McKenney paul...@linux.vnet.ibm.com wrote: One example that I learned about last week uses the branch-prediction hardware to validate value speculation. And no, I am not at all a fan of value speculation, in case you were curious. Heh. See the example I used in my reply to Alec Teal. It basically broke the same dependency the same way. Yes, value speculation of reads is simply wrong, the same way speculative writes are simply wrong. The dependency chain matters, and is meaningful, and breaking it is actively bad. As far as I can tell, the intent is that you can't do value speculation (except perhaps for the relaxed, which quite frankly sounds largely useless). But then I do get very very nervous when people talk about proving certain values. Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 7:24 PM, Linus Torvalds torva...@linux-foundation.org wrote: As far as I can tell, the intent is that you can't do value speculation (except perhaps for the relaxed, which quite frankly sounds largely useless). Hmm. The language I see for consume is not obvious: Consume operation: no reads in the current thread dependent on the value currently loaded can be reordered before this load and it could make a compiler writer say that value speculation is still valid, if you do it like this (with ptr being the atomic variable):

   value = ptr->val;

into

   tmp = ptr;
   value = speculated.value;
   if (unlikely(tmp != speculated))
      value = tmp->value;

which is still bogus. The load of ptr does happen before the load of value = speculated->value in the instruction stream, but it would still result in the CPU possibly moving the value read before the pointer read at least on ARM and power. So if you're a compiler person, you think you followed the letter of the spec - as far as *you* were concerned, no load dependent on the value of the atomic load moved to before the atomic load. You go home, happy, knowing you've done your job. Never mind that you generated code that doesn't actually work. I dread having to explain to the compiler person that he may be right in some theoretical virtual machine, but the code is subtly broken and nobody will ever understand why (and likely not be able to create a test-case showing the breakage). But maybe the full standard makes it clear that reordered before this load actually means on the real hardware, not just in the generated instruction stream. Reading it with understanding of the *intent* and understanding all the different memory models that requirement should be obvious (on alpha, you need an rmb instruction after the load), but ... Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 07:24:56PM -0800, Linus Torvalds wrote: On Mon, Feb 17, 2014 at 7:00 PM, Paul E. McKenney paul...@linux.vnet.ibm.com wrote: One example that I learned about last week uses the branch-prediction hardware to validate value speculation. And no, I am not at all a fan of value speculation, in case you were curious. Heh. See the example I used in my reply to Alec Teal. It basically broke the same dependency the same way. ;-) Yes, value speculation of reads is simply wrong, the same way speculative writes are simply wrong. The dependency chain matters, and is meaningful, and breaking it is actively bad. As far as I can tell, the intent is that you can't do value speculation (except perhaps for the relaxed, which quite frankly sounds largely useless). But then I do get very very nervous when people talk about proving certain values. That was certainly my intent, but as you might have noticed in the discussion earlier in this thread, the intent can get lost pretty quickly. ;-) The HPC guys appear to be the most interested in breaking dependencies. Their software doesn't rely on dependencies, and from their viewpoint anything that has any chance of leaving an FP unit of any type idle is a very bad thing. But there are probably other benchmarks for which breaking dependencies gives a few percent performance boost. Thanx, Paul
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 07:42:42PM -0800, Linus Torvalds wrote: On Mon, Feb 17, 2014 at 7:24 PM, Linus Torvalds torva...@linux-foundation.org wrote: As far as I can tell, the intent is that you can't do value speculation (except perhaps for the relaxed, which quite frankly sounds largely useless). Hmm. The language I see for consume is not obvious: Consume operation: no reads in the current thread dependent on the value currently loaded can be reordered before this load and it could make a compiler writer say that value speculation is still valid, if you do it like this (with ptr being the atomic variable):

   value = ptr->val;

into

   tmp = ptr;
   value = speculated.value;
   if (unlikely(tmp != speculated))
      value = tmp->value;

which is still bogus. The load of ptr does happen before the load of value = speculated->value in the instruction stream, but it would still result in the CPU possibly moving the value read before the pointer read at least on ARM and power. So if you're a compiler person, you think you followed the letter of the spec - as far as *you* were concerned, no load dependent on the value of the atomic load moved to before the atomic load. You go home, happy, knowing you've done your job. Never mind that you generated code that doesn't actually work. Agreed, that would be bad. But please see below. I dread having to explain to the compiler person that he may be right in some theoretical virtual machine, but the code is subtly broken and nobody will ever understand why (and likely not be able to create a test-case showing the breakage). If things go as they usually do, such explanations will be required a time or two. But maybe the full standard makes it clear that reordered before this load actually means on the real hardware, not just in the generated instruction stream.
Reading it with understanding of the *intent* and understanding all the different memory models that requirement should be obvious (on alpha, you need an rmb instruction after the load), but ... The key point with memory_order_consume is that it must be paired with some sort of store-release, a category that includes stores tagged with memory_order_release (surprise!), memory_order_acq_rel, and memory_order_seq_cst. This pairing is analogous to the memory-barrier pairing in the Linux kernel. So you have something like this for the rcu_assign_pointer() side:

   p = kmalloc(...);
   if (unlikely(!p))
      return -ENOMEM;
   p->a = 1;
   p->b = 2;
   p->c = 3;
   /* The following would be buried within rcu_assign_pointer(). */
   atomic_store_explicit(gp, p, memory_order_release);

And something like this for the rcu_dereference() side:

   /* The following would be buried within rcu_dereference(). */
   q = atomic_load_explicit(gp, memory_order_consume);
   do_something_with(q->a);

So, let's look at the C11 draft, section 5.1.2.4 Multi-threaded executions and data races. 5.1.2.4p14 says that the atomic_load_explicit() carries a dependency to the argument of do_something_with(). 5.1.2.4p15 says that the atomic_store_explicit() is dependency-ordered before the atomic_load_explicit(). 5.1.2.4p15 also says that the atomic_store_explicit() is dependency-ordered before the argument of do_something_with(). This is because if A is dependency-ordered before X and X carries a dependency to B, then A is dependency-ordered before B. 5.1.2.4p16 says that the atomic_store_explicit() inter-thread happens before the argument of do_something_with(). The assignment to p->a is sequenced before the atomic_store_explicit(). Therefore, combining these last two, the assignment to p->a happens before the argument of do_something_with(), and that means that do_something_with() had better see the 1 assigned to p->a or some later value.
But as far as I know, compiler writers currently take the approach of treating memory_order_consume as if it was memory_order_acquire. Which certainly works, as long as ARM and PowerPC people don't mind an extra memory barrier out of each rcu_dereference(). Which is one thing that compiler writers are permitted to do according to the standard -- substitute a memory-barrier instruction for any given dependency... Thanx, Paul
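As a concrete sketch of the pairing Paul describes, here is the publish/consume pattern in portable C11 <stdatomic.h> form (struct obj, publish and reader are hypothetical names; kmalloc is replaced by malloc, and everything runs single-threaded here so the assertions are trivially safe - in real use the two sides run on different threads, and most current compilers would compile the consume load as an acquire load anyway):

```c
#include <stdatomic.h>
#include <stdlib.h>

struct obj { int a, b, c; };

static _Atomic(struct obj *) gp;

/* Publisher side: fully initialize the object, then release-store the
   pointer - the heart of what rcu_assign_pointer() does.  */
static int publish(void)
{
    struct obj *p = malloc(sizeof *p);
    if (!p)
        return -1;
    p->a = 1;
    p->b = 2;
    p->c = 3;
    atomic_store_explicit(&gp, p, memory_order_release);
    return 0;
}

/* Reader side: consume-load the pointer - the heart of
   rcu_dereference().  Dependency ordering (5.1.2.4p14-16) is what
   entitles the reader to see p->a == 1 through the returned pointer.  */
static struct obj *reader(void)
{
    return atomic_load_explicit(&gp, memory_order_consume);
}
```

The release/consume pair is the C11 analogue of the kernel's smp_store_release()/rcu_dereference() barrier pairing.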
Help Required on Missing GOTO statements in Gimple/SSA/CFG Pass ...
Hi, I am developing plugins for GCC 4.8.2. I am a newbie with plugins. I wrote a plugin and tried to count and see the goto statements using the gimple_stmt_iterator. I get gimple statements printed on my stdout, but I am not able to find the line which has goto statements. I only get other lines such as variable declarations and logic statements, but no goto statements. When I open the Gimple/SSA/CFG dump file separately using the vim editor I find the goto statements are actually present. So, can anyone help me? How can I actually get the count of goto statements, or at least access these goto statements using some iterator? I have used -fdump-tree-all and -fdump-tree-cfg as flags. Here is the pseudocode:

   struct register_pass_info pass_info = {
     &(pass_plugin.pass),   /* Address of new pass, here, the 'struct opt_pass'
                               field of 'gimple_opt_pass' defined above.  */
     "ssa",                 /* Name of the reference pass for hooking up
                               the new pass.  */
     0,                     /* Insert the pass at the specified instance number
                               of the reference pass.  Do it for every
                               instance if it is 0.  */
     PASS_POS_INSERT_AFTER  /* How to insert the new pass: before, after,
                               or replace.  Here we insert it after 'ssa'.  */
   };
   ...
   static unsigned int
   dead_code_elimination (void)
   {
     FOR_EACH_BB_FN (bb, cfun)
       {
         /* gimple_dump_bb (stdout, bb, 0, 0); */
         gsi2 = gsi_after_labels (bb);
         print_gimple_stmt (stdout, gsi_stmt (gsi2), 0, 0);
         /* Iterate over each gimple statement in the basic block.  */
         for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
           {
             g = gsi_stmt (gsi);
             print_gimple_stmt (stdout, g, 0, 0);
             if (gimple_code (g) == GIMPLE_GOTO)
               printf ("\nFound GOTO stmt\n");
           }
       }
   }
Re: Help Required on Missing GOTO statements in Gimple/SSA/CFG Pass ...
On Tue, 2014-02-18 at 11:17 +0530, Mohsin Khan wrote: Hi, I am developing plugins for the GCC-4.8.2. I am a newbie in plugins. I wrote a plugin and tried to count and see the Goto Statements using the gimple_stmt_iterator. I get gimple statements printed on my stdout, but I am not able to find the line which has goto statements. I guess that most GOTOs are just becoming implicit as the link to the next basic block. Probably

   if (!cond) goto end;
   something;
   end:;

has nearly the same Gimple representation as

   while (cond) { something; }

BTW, did you consider using MELT http://gcc-melt.org/ to code your GCC extension? -- Basile STARYNKEVITCH http://starynkevitch.net/Basile/ email: basileatstarynkevitchdotnet mobile: +33 6 8501 2359 8, rue de la Faiencerie, 92340 Bourg La Reine, France *** opinions {are only mine, sont seulement les miennes} ***
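Basile's point - that source-level gotos dissolve into basic-block edges by the time a post-ssa pass runs, so no GIMPLE_GOTO statements remain to count - can be illustrated with a hypothetical pair of functions (names and the counting loop are mine, not from the thread) that express identical control flow, one with explicit gotos and one without; both lower to the same CFG, where the branches are edges rather than goto statements:

```c
/* Explicit-goto form of a counting loop.  After CFG construction the
   gotos become edges between basic blocks, not GIMPLE_GOTO statements.  */
int count_goto(int n)
{
    int i = 0;
again:
    if (!(i < n))
        goto done;
    i++;
    goto again;
done:
    return i;
}

/* Structured form: identical control flow, identical CFG.  */
int count_while(int n)
{
    int i = 0;
    while (i < n)
        i++;
    return i;
}
```

This is why a plugin hooked after the ssa pass sees GIMPLE_COND statements and block edges where the dump files still print goto as a notation for those edges.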
[Bug c/13029] [3.4 Regression] static consts and -Wunused-variable
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13029 Chengnian Sun chengniansun at gmail dot com changed: What|Removed |Added CC||chengniansun at gmail dot com --- Comment #4 from Chengnian Sun chengniansun at gmail dot com --- May I ask what is the design rationale for not warning about unused static const variables? I saw Clang has a different strategy, and it even has a dedicated warning type for it -- [-Wunused-const-variable]
[Bug middle-end/60235] Inlining fails with template specialization and -fPIC on Linux AMD64
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60235 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added CC||hubicka at gcc dot gnu.org, ||jakub at gcc dot gnu.org --- Comment #2 from Jakub Jelinek jakub at gcc dot gnu.org --- The specialization is a regular function, not comdat, thus it is not appropriate to inline it at -O2 -fpic; only -O3 inlines functions regardless of whether they could be interposed, and -O2 without -fpic inlines it because then the symbol can't be interposed. Or use the inline keyword for the specialization.
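A minimal sketch of the situation Jakub describes (the templates f and g and their bodies are hypothetical, not the bug's test case): a non-inline explicit specialization is an ordinary external symbol, so under -O2 -fpic GCC must assume it may be interposed at link time and declines to inline calls to it; marking the specialization inline makes it comdat and removes that barrier.

```cpp
// Primary template: implicitly inline-capable, comdat.
template <typename T> int f(T) { return 0; }

// Non-inline explicit specialization: a regular external function.
// With -O2 -fpic it can be interposed by another DSO, so GCC will not
// inline calls to it (only -O3 inlines regardless of interposition).
template <> int f<int>(int x) { return x + 1; }

template <typename T> int g(T) { return 0; }

// The suggested fix: an inline explicit specialization is emitted as a
// comdat symbol, interposition is off the table, and inlining at
// -O2 -fpic works again.
template <> inline int g<int>(int x) { return x + 1; }
```

Semantically both specializations behave identically; the inline keyword only changes linkage, which is what the inliner's interposition check keys off.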
[Bug fortran/60191] test case gfortran.dg/dynamic_dispatch_1/3.f03 fail on ARMv7
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60191 --- Comment #8 from Bernd Edlinger bernd.edlinger at hotmail dot de --- (In reply to janus from comment #5) (In reply to Bernd Edlinger from comment #3) The function make_real is not invoked directly, but through the type-bound a%real, which is called three times in the test case. Does the failure occur already at the first one (i.e. line 67)? Can you give a reduced test case? Yes it is in line 67. Ok, then I guess the following reduction should be enough to trigger the bug?

   module m
     type :: t1
       integer :: i = 42
     contains
       procedure, pass :: real => make_real
     end type
   contains
     real function make_real (arg)
       class(t1), intent(in) :: arg
       make_real = real (arg%i)
     end function make_real
   end module m

   use m
   class(t1), pointer :: a
   type(t1), target :: b
   a => b
   if (a%real() .ne. real (42)) call abort
   end

The crash occurs if I add the line

   procedure(make_real), pointer :: ptr

to type t1. Additionally you could try if calling 'make_real' directly (without the type-binding) works, i.e. replace the last line by:

   if (make_real(a) .ne. real (42)) call abort

This line does not abort. The type-bound call is transformed into a procedure-pointer-component call, i.e. a._vptr->real (a). Do all the proc_ptr_comp_* test cases work on ARMv7? Yes, the test cases that failed with the last snapshot are: FAIL: gfortran.dg/dynamic_dispatch_1.f03 -O0 execution test FAIL: gfortran.dg/dynamic_dispatch_3.f03 -O0 execution test FAIL: gfortran.dg/select_type_4.f90 -O2 execution test This one might possibly be related. It also involves polymorphism (but no type-bound procedures). I think dynamic_dispatch_3.f03 duplicates this one. But I am not sure about select_type_4.f90:

   $ gfortran -O1 -g select_type_4.f90 -o select_type_4
   $ ./select_type_4
   1.2302
   42
   Node with no data.
   Some other node type.
   4.5594
   $ gfortran -O2 -g select_type_4.f90 -o select_type_4
   $ ./select_type_4
   1.2302
   42

   Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

   Backtrace for this error: Segmentation fault

   Program received signal SIGSEGV, Segmentation fault.
   MAIN__ () at select_type_4.f90:166
   166   if (cnt /= 4) call abort()

but this statement is executed for the third time when the crash happens:

   Breakpoint 2, MAIN__ () at select_type_4.f90:166
   166   if (cnt /= 4) call abort()
   2: /x $r2 = 0x8db4
   1: x/i $pc = 0x8b14 MAIN__+608: ldr r3, [r2]
   (gdb) c
   Continuing.
   1.2302

   Breakpoint 2, MAIN__ () at select_type_4.f90:166
   166   if (cnt /= 4) call abort()
   2: /x $r2 = 0x8dcc
   1: x/i $pc = 0x8b14 MAIN__+608: ldr r3, [r2]
   (gdb) c
   Continuing.
   42

   Breakpoint 2, MAIN__ () at select_type_4.f90:166
   166   if (cnt /= 4) call abort()
   2: /x $r2 = 0xf15aea17
   1: x/i $pc = 0x8b14 MAIN__+608: ldr r3, [r2]

this looks like some loop optimization problem.
[Bug c/13029] [3.4 Regression] static consts and -Wunused-variable
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13029 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #5 from Jakub Jelinek jakub at gcc dot gnu.org --- Well, clang's strategy seems to be not to bother with false positives and always prefer warning over not warning on anything, so usually the clang output is just completely unreadable because among the tons of false positives it is hard to find actual real code problems. GCC's strategy is to find some balance between false positive warnings and missed warnings.
[Bug c/13029] [3.4 Regression] static consts and -Wunused-variable
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13029 --- Comment #6 from Mikael Pettersson mikpelinux at gmail dot com --- (In reply to Chengnian Sun from comment #4) May I ask what is the design rationale for not warning about unused static const variables? See PR28901. There are cases of unused static const where the warning isn't wanted, and so far the decision has been to favour those over the cases where the warning _is_ wanted and would have detected real bugs. Sigh.
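The tension in PR28901 can be seen in a small C sketch (the names are illustrative, not from the PR): a file-scope `static const` can be a legitimately unused named constant in one translation unit while being genuinely dead in another, so warning unconditionally produces false positives.

```c
#include <assert.h>

/* The kind of declaration PR28901 wants to keep quiet: a named
   constant that some translation units including the shared "header"
   never read.  Warning about it there is a false positive from the
   user's point of view. */
static const int k_max_retries = 3;

/* A use in this TU; a TU without such a use would trigger
   -Wunused-variable if static consts were warned about. */
int retries_left(int used)
{
    return k_max_retries - used;
}
```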
[Bug driver/60233] AVX instructions emitted with -march=native on host without AVX support
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60233 --- Comment #5 from Uroš Bizjak ubizjak at gmail dot com --- (In reply to Jakub Jelinek from comment #4) I think the reason for this is that -march=native passes in your case -mf16c, and -mf16c implies -mavx. So, either OPTION_MASK_ISA_F16C_SET should not include OPTION_MASK_ISA_AVX_SET, or the driver shouldn't set -mf16c if AVX support is missing. As at least some of the F16C instructions use ymmN registers, if we'd change OPTION_MASK_ISA_F16C_SET, then the *256 TARGET_F16C patterns would also need to be guarded with TARGET_AVX. For the latter alternative, we would need to do something like: --- gcc/config/i386/driver-i386.c 2014-01-03 11:41:06.393269411 +0100 +++ gcc/config/i386/driver-i386.c 2014-02-17 07:32:41.289022308 +0100 @@ -513,6 +513,7 @@ const char *host_detect_local_cpu (int a has_avx2 = 0; has_fma = 0; has_fma4 = 0; + has_f16c = 0; has_xop = 0; has_xsave = 0; has_xsaveopt = 0; This is the correct approach. We already disable f16c for -mno-avx in common/config/i386/i386-common.c in this way, and it looks that driver-i386.c was not updated accordingly. There are no real processors with F16C and no AVX, but we should be consistent here and follow i386-common.c.
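For completeness, application code (as opposed to the driver) can make the same kind of host check at run time with GCC's `__builtin_cpu_supports` builtin; this is a sketch assuming an x86 target (the `-1` fallback for other targets is our own convention, not a GCC API, and the set of accepted feature names varies with compiler version).

```c
/* Run-time ISA checks mirroring what -march=native probes at build
   time.  __builtin_cpu_supports is a documented GCC builtin on x86;
   on other targets this sketch just reports -1 for "unknown". */
static int host_has_avx(void)
{
#if defined(__x86_64__) || defined(__i386__)
    return __builtin_cpu_supports("avx") ? 1 : 0;
#else
    return -1;
#endif
}

static int host_has_avx2(void)
{
#if defined(__x86_64__) || defined(__i386__)
    return __builtin_cpu_supports("avx2") ? 1 : 0;
#else
    return -1;
#endif
}
```

On real hardware AVX2 implies AVX, just as F16C does, which is why clearing has_f16c when AVX is absent is mainly a consistency fix.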
[Bug driver/60233] AVX instructions emitted with -march=native on host without AVX support
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60233 --- Comment #6 from Uroš Bizjak ubizjak at gmail dot com --- And while looking at driver-i386.c, it looks to me that the whole osxsave state check should be moved below (ext_level 0x8000) processing, otherwise we won't clear FMA4 and XOP flags correctly.
[Bug tree-optimization/60229] wrong code at -O2 and -O3 on x86_64-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60229 --- Comment #3 from Mikael Pettersson mikpelinux at gmail dot com --- Technically there is an overflow there. But GCC defines conversion to a smaller signed integer type, when the value cannot be represented in that smaller type, as a non-signalling truncation. Still, portable code mustn't rely on that.
[Bug fortran/60231] [4.8/4.9 Regression] ICE on undefined generic
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60231 janus at gcc dot gnu.org changed: What|Removed |Added Keywords||ice-on-invalid-code Status|UNCONFIRMED |NEW Last reconfirmed||2014-02-17 CC||janus at gcc dot gnu.org Summary|ICE on undefined generic|[4.8/4.9 Regression] ICE on undefined generic Ever confirmed|0 |1 --- Comment #1 from janus at gcc dot gnu.org --- Confirmed. The ICE occurs with 4.8 and trunk, but 4.7 gives the following:

c0.f90:7.19:
    generic :: Add => Add1, Add2
                   1
Error: 'add1' and 'add2' for GENERIC 'add' at (1) are ambiguous

c0.f90:5.12:
    procedure :: Add1
            1
Error: 'add1' must be a module procedure or an external procedure with an explicit interface at (1)

c0.f90:6.12:
    procedure :: Add2
            1
Error: 'add2' must be a module procedure or an external procedure with an explicit interface at (1)

About the first error one can argue, but the second and third ones are certainly correct. Thus the ICE is a regression.
[Bug c/13029] [3.4 Regression] static consts and -Wunused-variable
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13029 --- Comment #7 from Chengnian Sun chengniansun at gmail dot com --- Thanks, Jakub and Mikael. I see it now. IMHO, it might be worthwhile to add a flag -Wunused-const-variable similar to Clang's, not included in either -Wall or -Wextra. Then the end user can decide whether to enable this warning based on their specific scenario. I think that is better than the current situation, where people who need this warning cannot get it.
[Bug c/13029] [3.4 Regression] static consts and -Wunused-variable
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13029 --- Comment #8 from Jakub Jelinek jakub at gcc dot gnu.org --- (In reply to Chengnian Sun from comment #7) Thanks, Jakub and Mikael. I see it now. IMHO, it might be worthwhile to add a flag -Wunused-const-variable similar to Clang's, not included in either -Wall or -Wextra. Then the end user can decide whether to enable this warning based on their specific scenario. I think that is better than the current situation, where people who need this warning cannot get it. Yeah, I guess that is a possibility.
[Bug fortran/60231] [4.8/4.9 Regression] ICE on undefined generic
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60231 janus at gcc dot gnu.org changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |janus at gcc dot gnu.org --- Comment #2 from janus at gcc dot gnu.org --- This draft patch fixes the ICE:

Index: gcc/fortran/resolve.c
===================================================================
--- gcc/fortran/resolve.c    (revision 207804)
+++ gcc/fortran/resolve.c    (working copy)
@@ -11362,6 +11362,7 @@ check_generic_tbp_ambiguity (gfc_tbp_generic* t1,
 {
   gfc_symbol *sym1, *sym2;
   const char *pass1, *pass2;
+  gfc_formal_arglist *dummy_args;
 
   gcc_assert (t1->specific && t2->specific);
   gcc_assert (!t1->specific->is_generic);
@@ -11384,19 +11385,33 @@ check_generic_tbp_ambiguity (gfc_tbp_generic* t1,
       return false;
     }
 
-  /* Compare the interfaces.  */
+  /* Determine PASS arguments.  */
   if (t1->specific->nopass)
     pass1 = NULL;
   else if (t1->specific->pass_arg)
     pass1 = t1->specific->pass_arg;
   else
-    pass1 = gfc_sym_get_dummy_args (t1->specific->u.specific->n.sym)->sym->name;
+    {
+      dummy_args = gfc_sym_get_dummy_args (t1->specific->u.specific->n.sym);
+      if (dummy_args)
+        pass1 = dummy_args->sym->name;
+      else
+        pass1 = NULL;
+    }
   if (t2->specific->nopass)
     pass2 = NULL;
   else if (t2->specific->pass_arg)
     pass2 = t2->specific->pass_arg;
   else
-    pass2 = gfc_sym_get_dummy_args (t2->specific->u.specific->n.sym)->sym->name;
+    {
+      dummy_args = gfc_sym_get_dummy_args (t2->specific->u.specific->n.sym);
+      if (dummy_args)
+        pass2 = dummy_args->sym->name;
+      else
+        pass2 = NULL;
+    }
+
+  /* Compare the interfaces.  */
   if (gfc_compare_interfaces (sym1, sym2, sym2->name, !t1->is_operator, 0,
                               NULL, 0, pass1, pass2))
     {
[Bug tree-optimization/60236] New: gfortran.dg/vect/pr32380.f fails on ARM
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60236 Bug ID: 60236 Summary: gfortran.dg/vect/pr32380.f fails on ARM Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: bernd.edlinger at hotmail dot de Hi, this test case fails because only 5 of the 6 loops get vectorized: pr32380.f:162:0: note: function is not vectorizable. pr32380.f:162:0: note: not vectorized: relevant stmt not supported: _113 = __builtin_sqrtf (_112); pr32380.f:5:0: note: vectorized 5 loops in function. but the test case expects 6 loops here. gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/home/ed/gnu/arm-linux-gnueabihf/libexec/gcc/armv7l-unknown-linux-gnueabihf/4.9.0/lto-wrapper Target: armv7l-unknown-linux-gnueabihf Configured with: ../gcc-4.9-20140209/configure --prefix=/home/ed/gnu/arm-linux-gnueabihf --enable-languages=c,c++,objc,obj-c++,fortran,ada,go --with-arch=armv7-a --with-tune=cortex-a9 --with-fpu=vfpv3-d16 --with-float=hard Thread model: posix gcc version 4.9.0 20140209 (experimental) (GCC)
[Bug c++/60215] [4.9 Regression] ICE with invalid bit-field size
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60215 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2014-02-17 CC||jakub at gcc dot gnu.org, ||paolo at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #1 from Jakub Jelinek jakub at gcc dot gnu.org --- Started with r205449.
[Bug fortran/60232] [OOP] The rank of the element in the structure constructor does not match that of the component
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60232 janus at gcc dot gnu.org changed: What|Removed |Added Keywords||rejects-valid Status|UNCONFIRMED |NEW Last reconfirmed||2014-02-17 CC||janus at gcc dot gnu.org Summary|OOP False Error: The rank of the element in the structure constructor|[OOP] The rank of the element in the structure constructor does not match that of the component Ever confirmed|0 |1 --- Comment #1 from janus at gcc dot gnu.org --- Reduced test case:

module ObjectLists
  implicit none
  Type TObjectList
  contains
    procedure :: ArrayItem
  end Type
contains
  function ArrayItem(L) result(P)
    Class(TObjectList) :: L
    Class(TObjectList), pointer :: P(:)
  end function
end module

use ObjectLists
implicit none
Type, extends(TObjectList):: TSampleList
end Type
contains
  subroutine TSampleList_ConfidVal(L)
    Class(TSampleList) :: L
  end subroutine
end

Same error with 4.7, 4.8 and trunk. (In 4.6 and earlier, polymorphic arrays are not supported yet.)
[Bug driver/60233] AVX instructions emitted with -march=native on host without AVX support
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60233 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org --- Comment #7 from Jakub Jelinek jakub at gcc dot gnu.org --- Created attachment 32151 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=32151&action=edit gcc49-pr60233.patch Untested fix.
[Bug tree-optimization/60229] wrong code at -O2 and -O3 on x86_64-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60229 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #4 from Jakub Jelinek jakub at gcc dot gnu.org --- Well, the conversion is implementation-defined behavior, and GCC documents what it does in that case (does it?), so you can rely on it; and given that other compilers have made a similar implementation-defined choice for that case, you can portably assume it unless you are targeting extinct architectures.
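GCC's documented choice (in the "C Implementation-Defined Behavior" section of the manual) is that an out-of-range value converted to a narrower signed type is reduced modulo 2^N, i.e. the low bits are kept. A sketch:

```c
/* Conversion of an out-of-range int to signed char: GCC defines the
   result as reduction modulo 2^8 -- a plain truncation to the low
   byte, reinterpreted as signed.  No signal, no trap. */
signed char narrow_to_schar(int v)
{
    return (signed char) v;  /* implementation-defined when v is out of range */
}
```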
[Bug tree-optimization/60172] ARM performance regression from trunk@207239
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60172 --- Comment #7 from Joey Ye joey.ye at arm dot com --- (In reply to Richard Biener from comment #5) (In reply to Joey Ye from comment #4) -fdisable-tree-forwprop4 doesn't help. -fno-tree-ter makes it even worse. The former is strange because it's the only pass that does sth that is changed by the patch? As said, make sure to include the fix for PR59993 in your testing. Does -fno-tree-forwprop fix the regression? I'm sorry, what I meant was: -fdisable-tree-forwprop4 didn't make the benchmark faster. Actually with -fdisable-tree-forwprop4 both revisions before/after 207239 get the same lower score. 207239 O2: low 207238 O2: high 207239 O2 -fdisable-tree-forwprop4: low 207238 O2 -fdisable-tree-forwprop4: low
[Bug tree-optimization/60206] IVOPT has no idea of inline asm
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60206 --- Comment #5 from rguenther at suse dot de rguenther at suse dot de --- On Fri, 14 Feb 2014, wmi at google dot com wrote: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60206 Bug ID: 60206 Summary: IVOPT has no idea of inline asm Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: wmi at google dot com CC: rguenth at gcc dot gnu.org, shenhan at google dot com Host: i386 Target: i386 Created attachment 32141 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=32141&action=edit Testcase This bug is found in google branch but I think the same problem also exists on trunk (but not exposed). For the testcase 1.c attached (1.c is extracted from libgcc/soft-fp/divtf3.c), use trunk compiler gcc-r202164 (Target: x86_64-unknown-linux-gnu) + the patch r204497 could expose the problem. The command: gcc -v -O2 -fno-omit-frame-pointer -fpic -c -S -m32 1.c The error: ./1.c: In function ‘__divtf3’: ./1.c:64:1194: error: ‘asm’ operand has impossible constraints The inline asm in the error message is as follows:

do {
  __asm__ ("sub{l} {%11,%3|%3,%11}\n\t"
           "sbb{l} {%9,%2|%2,%9}\n\t"
           "sbb{l} {%7,%1|%1,%7}\n\t"
           "sbb{l} {%5,%0|%0,%5}"
           : "=r" ((USItype) (A_f[3])),
             "=r" ((USItype) (A_f[2])),
             "=r" ((USItype) (A_f[1])),
             "=r" ((USItype) (A_f[0]))
           : "0" ((USItype) (B_f[2])),
             "g" ((USItype) (A_f[2])),
             "1" ((USItype) (B_f[1])),
             "g" ((USItype) (A_f[1])),
             "2" ((USItype) (B_f[0])),
             "g" ((USItype) (A_f[0])),
             "3" ((USItype) (0)),
             "g" ((USItype) (_n_f[_i])));
} while (0)

Because -fno-omit-frame-pointer is turned on and the command line uses -fpic, there are only 5 registers for register allocation. Before IVOPT, %0, %1, %2, %3 require 4 registers. The index variable i of _n_f[_i] requires another register. So 5 registers are used up here. After IVOPT, the MEM reference _n_f[_i] is converted to MEM[base: _874, index: ivtmp.22_821, offset: 0B]. 
base and index require 2 registers, so now 6 registers are required and LRA cannot find enough registers to allocate. The trunk compiler doesn't expose the problem because of patch r202165. With patch r202165, IVOPT doesn't change _n_f[_i] in the inline asm above. But that just hid the problem. Should IVOPT care about the constraints in inline asm and restrict its optimization in some cases? It's true that ASMs are not in any way special-cased - it may be worth trying whether distinguishing address-uses from other uses helps. It's only a cost thing, of course. In general find_interesting_uses_stmt may need some modernization.
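The register pressure being discussed can be sketched with a minimal GNU C asm using the same `"0"`..`"3"` matching constraints as the soft-fp macro; the template is deliberately empty so the sketch is architecture-neutral, but the allocator still has to keep four distinct registers live across the asm, before any extra `"g"` inputs or IVOPTS-introduced base/index registers are counted.

```c
/* Four "=r" outputs tied to four inputs via matching constraints
   "0".."3": the register allocator must find four distinct registers
   live across the asm.  A real multi-word subtract would put the
   sub/sbb instructions in the (here empty) template. */
static void four_reg_asm(unsigned *a, unsigned *b, unsigned *c, unsigned *d)
{
    unsigned w = *a, x = *b, y = *c, z = *d;
    __asm__ ("" : "=r" (w), "=r" (x), "=r" (y), "=r" (z)
                : "0" (w), "1" (x), "2" (y), "3" (z));
    *a = w; *b = x; *c = y; *d = z;
}
```

With an empty template the values pass through unchanged, which also makes the constraint mechanics easy to test.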
[Bug tree-optimization/60172] ARM performance regression from trunk@207239
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60172 --- Comment #8 from Joey Ye joey.ye at arm dot com --- Here is tree dump and diff of 133t.forwprop4 bb 2: Int_Index_4 = Int_1_Par_Val_3(D) + 5; Int_Loc.0_5 = (unsigned int) Int_Index_4; _6 = Int_Loc.0_5 * 4; _8 = Arr_1_Par_Ref_7(D) + _6; *_8 = Int_2_Par_Val_10(D); _13 = _6 + 4; _14 = Arr_1_Par_Ref_7(D) + _13; *_14 = Int_2_Par_Val_10(D); _17 = _6 + 60; _18 = Arr_1_Par_Ref_7(D) + _17; *_18 = Int_Index_4; pretmp_20 = Int_Loc.0_5 * 100; pretmp_2 = Arr_2_Par_Ref_22(D) + pretmp_20; _42 = (sizetype) Int_1_Par_Val_3(D); _41 = _42 * 4; - _40 = pretmp_2 + _41; // good + _12 = _41 + pretmp_20; // bad + _40 = Arr_2_Par_Ref_22(D) + _12; // bad MEM[(int[25] *)_40 + 20B] = Int_Index_4; MEM[(int[25] *)_40 + 24B] = Int_Index_4; _29 = MEM[(int[25] *)_40 + 16B]; _30 = _29 + 1; MEM[(int[25] *)_40 + 16B] = _30; _32 = pretmp_20 + 1000; _33 = Arr_2_Par_Ref_22(D) + _32; _34 = *_8; - _51 = _33 + _41; // good + _16 = _41 + _32; // bad + _51 = Arr_2_Par_Ref_22(D) + _16; // bad MEM[(int[25] *)_51 + 20B] = _34; Int_Glob = 5; return;
[Bug c++/60222] [4.8/4.9 Regression] ICE with reference as template parameter
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60222 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added Priority|P3 |P1 Status|UNCONFIRMED |NEW Last reconfirmed||2014-02-17 CC||jakub at gcc dot gnu.org Version|4.9.0 |4.8.3 Ever confirmed|0 |1 --- Comment #1 from Jakub Jelinek jakub at gcc dot gnu.org --- Indeed, started with r207167.
[Bug rtl-optimization/49847] [4.7/4.8/4.9 Regression] NULL deref in fold_rtx (prev_insn_cc0 == NULL)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49847 --- Comment #33 from rguenther at suse dot de rguenther at suse dot de --- On Sun, 16 Feb 2014, law at redhat dot com wrote: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49847 --- Comment #32 from Jeffrey A. Law law at redhat dot com --- The problem we're seeing is with the cc0-setter and cc0-user in different blocks; they're separated by a NOTE_BASIC_BLOCK. That causes CSE to blow up because it expects that the cc0-setter and cc0-user are always consecutive. While we're just seeing the failure in CSE right now, I'm sure there's a ton of places that assume the setter/user are inseparable, as that has been the documented form for ~20 years. From rtl.texi: The instruction setting the condition code must be adjacent to the instruction using the condition code; only @code{note} insns may separate them. We either need to relax that and audit all the HAVE_cc0 code to ensure it doesn't make that assumption, or we need to somehow restore the property that the setter and user are inseparable. I think relaxing this constraint and allowing the cc0-setter and cc0-user to be separated by a fallthru edge should be fine (and we should make sure that bb-reorder doesn't later separate the BBs)
[Bug c++/60219] [4.8/4.9 Regression] [c++11] ICE invalid use of variadic template
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60219 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2014-02-17 CC||jakub at gcc dot gnu.org, ||jason at gcc dot gnu.org Version|4.9.0 |4.8.3 Ever confirmed|0 |1 --- Comment #1 from Jakub Jelinek jakub at gcc dot gnu.org --- Started likely with r190653 (works with r190650, fails with r190662, coerce_template_parms+resolve_nondeduced_context in backtrace).
[Bug tree-optimization/60172] ARM performance regression from trunk@207239
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60172 --- Comment #9 from rguenther at suse dot de rguenther at suse dot de --- On Mon, 17 Feb 2014, joey.ye at arm dot com wrote: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60172 --- Comment #8 from Joey Ye joey.ye at arm dot com --- Here is tree dump and diff of 133t.forwprop4 bb 2: Int_Index_4 = Int_1_Par_Val_3(D) + 5; Int_Loc.0_5 = (unsigned int) Int_Index_4; _6 = Int_Loc.0_5 * 4; _8 = Arr_1_Par_Ref_7(D) + _6; *_8 = Int_2_Par_Val_10(D); _13 = _6 + 4; _14 = Arr_1_Par_Ref_7(D) + _13; *_14 = Int_2_Par_Val_10(D); _17 = _6 + 60; _18 = Arr_1_Par_Ref_7(D) + _17; *_18 = Int_Index_4; pretmp_20 = Int_Loc.0_5 * 100; pretmp_2 = Arr_2_Par_Ref_22(D) + pretmp_20; _42 = (sizetype) Int_1_Par_Val_3(D); _41 = _42 * 4; - _40 = pretmp_2 + _41; // good + _12 = _41 + pretmp_20; // bad + _40 = Arr_2_Par_Ref_22(D) + _12; // bad MEM[(int[25] *)_40 + 20B] = Int_Index_4; MEM[(int[25] *)_40 + 24B] = Int_Index_4; _29 = MEM[(int[25] *)_40 + 16B]; _30 = _29 + 1; MEM[(int[25] *)_40 + 16B] = _30; _32 = pretmp_20 + 1000; _33 = Arr_2_Par_Ref_22(D) + _32; _34 = *_8; - _51 = _33 + _41; // good + _16 = _41 + _32; // bad + _51 = Arr_2_Par_Ref_22(D) + _16; // bad MEM[(int[25] *)_51 + 20B] = _34; Int_Glob = 5; return; But that doesn't make sense - it means that -fdisable-tree-forwprop4 should get numbers back to good speed, no? Because that's the only change forwprop4 does. For completeness please base checks on r207316 (it contains a fix for the blamed revision, but as far as I can see it shouldn't make a difference for the testcase). Did you check whether my hackish patch fixes things?
[Bug tree-optimization/54742] Switch elimination in FSM loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54742 --- Comment #35 from Joey Ye joey.ye at arm dot com --- Here is good expansion: ;; _41 = _42 * 4; (insn 20 19 0 (set (reg:SI 126 [ D.5038 ]) (ashift:SI (reg/v:SI 131 [ Int_1_Par_Val ]) (const_int 2 [0x2]))) -1 (nil)) ;; _40 = _2 + _41; (insn 21 20 22 (set (reg:SI 136 [ D.5035 ]) (plus:SI (reg/v/f:SI 130 [ Arr_2_Par_Ref ]) (reg:SI 119 [ D.5036 ]))) -1 (nil)) (insn 22 21 0 (set (reg/f:SI 125 [ D.5035 ]) (plus:SI (reg:SI 136 [ D.5035 ]) (reg:SI 126 [ D.5038 ]))) -1 (nil)) ;; MEM[(int[25] *)_51 + 20B] = _34; (insn 29 28 30 (set (reg:SI 139) (plus:SI (reg/v/f:SI 130 [ Arr_2_Par_Ref ]) (reg:SI 119 [ D.5036 ]))) Proc_8.c:23 -1 (nil)) (insn 30 29 31 (set (reg:SI 140) (plus:SI (reg:SI 139) (reg:SI 126 [ D.5038 ]))) Proc_8.c:23 -1 (nil)) (insn 31 30 32 (set (reg/f:SI 141) (plus:SI (reg:SI 140) (const_int 1000 [0x3e8]))) Proc_8.c:23 -1 (nil)) (insn 32 31 0 (set (mem:SI (plus:SI (reg/f:SI 141) (const_int 20 [0x14])) [2 MEM[(int[25] *)_51 + 20B]+0 S4 A32]) (reg:SI 124 [ D.5039 ])) Proc_8.c:23 -1 (nil)) After cse1 140 can be replaced by 125, thus lead a series of transformation make it much more efficient. 
Here is bad expansion: ;; _40 = Arr_2_Par_Ref_22(D) + _12; (insn 22 21 23 (set (reg:SI 138 [ D.5038 ]) (plus:SI (reg:SI 128 [ D.5038 ]) (reg:SI 121 [ D.5036 ]))) -1 (nil)) (insn 23 22 0 (set (reg/f:SI 127 [ D.5035 ]) (plus:SI (reg/v/f:SI 132 [ Arr_2_Par_Ref ]) (reg:SI 138 [ D.5038 ]))) -1 (nil)) ;; _32 = _20 + 1000; (insn 29 28 0 (set (reg:SI 124 [ D.5038 ]) (plus:SI (reg:SI 121 [ D.5036 ]) (const_int 1000 [0x3e8]))) Proc_8.c:23 -1 (nil)) ;; MEM[(int[25] *)_51 + 20B] = _34; (insn 32 31 33 (set (reg:SI 141) (plus:SI (reg/v/f:SI 132 [ Arr_2_Par_Ref ]) (reg:SI 124 [ D.5038 ]))) Proc_8.c:23 -1 (nil)) (insn 33 32 34 (set (reg/f:SI 142) (plus:SI (reg:SI 141) (reg:SI 128 [ D.5038 ]))) Proc_8.c:23 -1 (nil)) (insn 34 33 0 (set (mem:SI (plus:SI (reg/f:SI 142) (const_int 20 [0x14])) [2 MEM[(int[25] *)_51 + 20B]+0 S4 A32]) (reg:SI 126 [ D.5039 ])) Proc_8.c:23 -1 (nil)) Here cse doesn't happen, resulting in less optimal insns. Reason why cse doesn't happen is unclear yet.
[Bug c++/60216] [4.8/4.9 Regression] [c++11] Trouble with deleted template functions
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60216 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2014-02-17 CC||jakub at gcc dot gnu.org, ||jason at gcc dot gnu.org Version|4.9.0 |4.8.3 Ever confirmed|0 |1 --- Comment #1 from Jakub Jelinek jakub at gcc dot gnu.org --- Started with r198098 (or r198099, but that seems unrelated, r198096 works, r198100 fails).
[Bug middle-end/59448] Code generation doesn't respect C11 address-dependency
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448 --- Comment #11 from algrant at acm dot org --- Where do you get that this is racy if the access to data is not atomic? By design, release/acquire and release/consume sequences don't require wholesale changes to the way the data payload (in the general case, multiple fields within a structure) is first constructed and then used. 1.10#13 makes clear that as a result of the intra-thread sequencing between atomic and non-atomic operations (1.9#14), and the inter-thread ordering between atomic operations (1.10 various), there is a resulting ordering on operations to ordinary (sic) objects. Please see the references to the C++ standard in the source example, for the chain of reasoning here.
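A minimal C11 sketch of the pattern under discussion (the C++ clauses cited have direct C11 counterparts): the payload is an ordinary, non-atomic object; only the pointer through which it is published is atomic, and the consumer relies on the address dependency from the loaded pointer into the payload rather than on a fence. Note that memory_order_consume is the order at issue; current compilers, GCC included, conservatively promote it to acquire.

```c
#include <stdatomic.h>
#include <stddef.h>

struct payload { int data; };                  /* ordinary ("sic") object */
static _Atomic(struct payload *) slot = NULL;  /* atomic pointer mailbox */

void publish(struct payload *p)
{
    /* Plain stores to p->data sequenced before this store are made
       visible by the release ordering. */
    atomic_store_explicit(&slot, p, memory_order_release);
}

int consume(void)
{
    struct payload *p =
        atomic_load_explicit(&slot, memory_order_consume);
    /* The address dependency from p to p->data is what orders the
       non-atomic read; no atomic access to the payload is needed. */
    return p ? p->data : -1;
}
```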
[Bug fortran/60234] [4.9 Regression] [OOP] ICE in generate_finalization_wrapper at fortran/class.c:1883
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60234 janus at gcc dot gnu.org changed: What|Removed |Added Keywords||ice-on-valid-code Status|UNCONFIRMED |NEW Last reconfirmed||2014-02-17 CC||janus at gcc dot gnu.org Summary|OOP internal compiler |[4.9 Regression] [OOP] ICE |error: in |in |generate_finalization_wrapp |generate_finalization_wrapp |er |er at fortran/class.c:1883 Ever confirmed|0 |1 --- Comment #1 from janus at gcc dot gnu.org --- Reduced test case: module ObjectLists implicit none Type TObjectList contains FINAL :: finalize end Type Type, extends(TObjectList):: TRealCompareList end Type contains subroutine finalize(L) Type(TObjectList) :: L end subroutine integer function CompareReal(this) Class(TRealCompareList) :: this end function end module 4.8 rejects it cleanly ('not yet implemented'), so the ICE is a regression.
[Bug fortran/60234] [4.9 Regression] [OOP] ICE in generate_finalization_wrapper at fortran/class.c:1883
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60234 --- Comment #2 from janus at gcc dot gnu.org --- This patchlet seems to be sufficient to fix the ICE:

Index: gcc/fortran/decl.c
===================================================================
--- gcc/fortran/decl.c    (revision 207804)
+++ gcc/fortran/decl.c    (working copy)
@@ -1199,7 +1199,7 @@ build_sym (const char *name, gfc_charlen *cl, bool
   sym->attr.implied_index = 0;
 
   if (sym->ts.type == BT_CLASS)
-    return gfc_build_class_symbol (&sym->ts, &sym->attr, &sym->as, false);
+    return gfc_build_class_symbol (&sym->ts, &sym->attr, &sym->as, true);
 
   return true;
 }

Comment 1 compiles fine with this, but comment 0 hits another ICE: ObjectLists.f90:186:0: internal compiler error: Segmentation fault class is (object_array_pointer) ^ 0x93e90f crash_signal /home/jweil/gcc49/trunk/gcc/toplev.c:337 0x672420 gfc_get_derived_type(gfc_symbol*) /home/jweil/gcc49/trunk/gcc/fortran/trans-types.c:2455 0x672988 gfc_typenode_for_spec(gfc_typespec*) /home/jweil/gcc49/trunk/gcc/fortran/trans-types.c:1112 0x671263 gfc_sym_type(gfc_symbol*) /home/jweil/gcc49/trunk/gcc/fortran/trans-types.c:2137 0x671728 gfc_get_function_type(gfc_symbol*) /home/jweil/gcc49/trunk/gcc/fortran/trans-types.c:2797 0x6721ca gfc_get_ppc_type(gfc_component*) /home/jweil/gcc49/trunk/gcc/fortran/trans-types.c:2322 0x6726a7 gfc_get_derived_type(gfc_symbol*) /home/jweil/gcc49/trunk/gcc/fortran/trans-types.c:2484 0x672988 gfc_typenode_for_spec(gfc_typespec*) /home/jweil/gcc49/trunk/gcc/fortran/trans-types.c:1112 0x671263 gfc_sym_type(gfc_symbol*) /home/jweil/gcc49/trunk/gcc/fortran/trans-types.c:2137 0x637b96 gfc_get_symbol_decl(gfc_symbol*) /home/jweil/gcc49/trunk/gcc/fortran/trans-decl.c:1390 0x639f99 gfc_create_module_variable /home/jweil/gcc49/trunk/gcc/fortran/trans-decl.c:4267 0x607453 do_traverse_symtree /home/jweil/gcc49/trunk/gcc/fortran/symbol.c:3575 0x63ae12 gfc_generate_module_vars(gfc_namespace*) /home/jweil/gcc49/trunk/gcc/fortran/trans-decl.c:4693 0x61cef1 gfc_generate_module_code(gfc_namespace*) 
/home/jweil/gcc49/trunk/gcc/fortran/trans.c:1930 0x5db92b translate_all_program_units /home/jweil/gcc49/trunk/gcc/fortran/parse.c:4523 0x5db92b gfc_parse_file() /home/jweil/gcc49/trunk/gcc/fortran/parse.c:4733 0x618335 gfc_be_parse_file /home/jweil/gcc49/trunk/gcc/fortran/f95-lang.c:188
[Bug c++/60146] [4.8/4.9 Regression] ICE when compiling this code with -fopenmp
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60146 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added CC||jakub at gcc dot gnu.org, ||jason at gcc dot gnu.org --- Comment #2 from Jakub Jelinek jakub at gcc dot gnu.org --- Started with r188939.
[Bug c++/60237] New: isnan fails with -ffast-math
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60237 Bug ID: 60237 Summary: isnan fails with -ffast-math Product: gcc Version: 4.8.1 Status: UNCONFIRMED Severity: major Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: nathanael.schaeffer at gmail dot com With -ffast-math, isnan should return true if passed a NaN value. Otherwise, how is isnan different from (x!=x)? isnan worked as expected with gcc 4.7, but does not with 4.8.1 and 4.8.2. How can I check whether x is a NaN in a portable way (not presuming any compilation option)?
[Bug c++/60237] isnan fails with -ffast-math
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60237 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #1 from Jakub Jelinek jakub at gcc dot gnu.org --- Well, -ffast-math implies -ffinite-math-only, so the compiler is assuming no NaNs or infinites are used as arguments/return values of any expression. So, if you have a program that produces NaNs anyway, you shouldn't be building it with -ffast-math, at least not with -ffinite-math-only.
[Bug c++/60215] [4.9 Regression] ICE with invalid bit-field size
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60215 Paolo Carlini paolo.carlini at oracle dot com changed: What|Removed |Added CC||jason at gcc dot gnu.org --- Comment #2 from Paolo Carlini paolo.carlini at oracle dot com --- Evidently, in case of error recovery we can get here: 9672 case COMPONENT_REF: 9673 if (is_overloaded_fn (t)) 9674 { 9675 /* We can only get here in checking mode via 9676 build_non_dependent_expr, because any expression that 9677 calls or takes the address of the function will have 9678 pulled a FUNCTION_DECL out of the COMPONENT_REF. */ 9679 gcc_checking_assert (allow_non_constant); 9680 *non_constant_p = true; 9681 return t; 9682 } with allow_non_constant == false. Jason suggested the comment (and the assert ;) as part of the fix for 58647, thus I would like to hear from him... Shall we maybe || errorcount ? Seems safe for 4.9.0.
[Bug fortran/60238] New: Allow colon-separated triplet in array initialization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60238 Bug ID: 60238 Summary: Allow colon-separated triplet in array initialization Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: fortran Assignee: unassigned at gcc dot gnu.org Reporter: antony at cosmologist dot info Not really a bug, but ifort (and also, going back, CVF) allows a clean array initialization syntax like this integer :: indices(3) indices=[3:5] as an alternative to the ugly indices = (/ (I, I=3, 5) /) Supporting it would allow easier compiler interoperability.
[Bug fortran/60234] [4.9 Regression] [OOP] ICE in generate_finalization_wrapper at fortran/class.c:1883
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60234 --- Comment #3 from janus at gcc dot gnu.org --- (In reply to janus from comment #2) Comment 1 compiles fine with this, but comment 0 hits another ICE: ObjectLists.f90:186:0: internal compiler error: Segmentation fault class is (object_array_pointer) ^ 0x93e90f crash_signal /home/jweil/gcc49/trunk/gcc/toplev.c:337 0x672420 gfc_get_derived_type(gfc_symbol*) /home/jweil/gcc49/trunk/gcc/fortran/trans-types.c:2455 A reduced test case for this ICE is: integer function Compare(R1) class(*) R1 end function But it seems to be due to the patch in comment 2 and does not occur without it.
[Bug tree-optimization/60183] [4.7/4.8 Regression] phiprop creates invalid code
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60183 Richard Biener rguenth at gcc dot gnu.org changed: What|Removed |Added Known to work||4.9.0 Summary|[4.7/4.8/4.9 Regression]|[4.7/4.8 Regression] |phiprop creates invalid |phiprop creates invalid |code|code --- Comment #6 from Richard Biener rguenth at gcc dot gnu.org --- Fixed on trunk so far.
[Bug c++/60237] isnan fails with -ffast-math
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60237 --- Comment #2 from N Schaeffer nathanael.schaeffer at gmail dot com --- Thank you for your answer. My program (which is a computational fluid dynamics solver) is not supposed to produce NaNs. However, when it does (which means something went wrong), I would like to abort the program and return an error instead of continuing crunching NaNs. I also want it to run as fast as possible (hence the -ffast-math option). I would argue that if printf("%f", x) outputs NaN, isnan(x) should also return true. Do you have a suggestion concerning my last question: How can I check if x is NaN in a portable way (not presuming any compilation option)?
[Bug c++/60237] isnan fails with -ffast-math
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60237 --- Comment #3 from Marc Glisse glisse at gcc dot gnu.org --- (In reply to N Schaeffer from comment #2) Do you have a suggestion concerning my last question: How can I check if x is NaN in a portable way (not presuming any compilation option) ? This should bypass software optimizations. But if the hardware is put in a mode that does strange things with NaN, it will be harder to work around. int my_isnan(double x){ volatile double y=x; return y!=y; }
[Bug fortran/60238] Allow colon-separated triplet in array initialization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60238 --- Comment #1 from Dominique d'Humieres dominiq at lps dot ens.fr --- as an alternative to the ugly indices = (/ (I, I=3, 5) /) You can use indices=[(I, I=3, 5)] if your coding style accepts f2003 syntax. Supporting it would allow easier compiler interoperability. The only way to achieve that is to stick to the Fortran standard, i.e, never use extensions of any kind.
[Bug fortran/60234] [4.9 Regression] [OOP] ICE in generate_finalization_wrapper at fortran/class.c:1883
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60234 janus at gcc dot gnu.org changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |janus at gcc dot gnu.org --- Comment #4 from janus at gcc dot gnu.org --- The test case in comment 0 compiles cleanly when adding the following to the patch in comment 2:

Index: gcc/fortran/class.c
===================================================================
--- gcc/fortran/class.c    (revision 207804)
+++ gcc/fortran/class.c    (working copy)
@@ -637,9 +637,10 @@ gfc_build_class_symbol (gfc_typespec *ts, symbol_a
       if (!gfc_add_component (fclass, "_vptr", &c))
         return false;
       c->ts.type = BT_DERIVED;
-      if (delayed_vtab
-          || (ts->u.derived->f2k_derived
-              && ts->u.derived->f2k_derived->finalizers))
+      if ((delayed_vtab
+           || (ts->u.derived->f2k_derived
+               && ts->u.derived->f2k_derived->finalizers))
+          && !ts->u.derived->attr.unlimited_polymorphic)
         c->ts.u.derived = NULL;
       else
         {
[Bug c++/60237] isnan fails with -ffast-math
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60237

--- Comment #4 from N Schaeffer nathanael.schaeffer at gmail dot com ---
int my_isnan(double x) { volatile double y = x; return y != y; }

is translated to:

   0x00406cf0 <+0>:   movsd  QWORD PTR [rsp-0x8],xmm0
   0x00406cf6 <+6>:   xor    eax,eax
   0x00406cf8 <+8>:   movsd  xmm1,QWORD PTR [rsp-0x8]
   0x00406cfe <+14>:  movsd  xmm0,QWORD PTR [rsp-0x8]
   0x00406d04 <+20>:  comisd xmm1,xmm0
   0x00406d08 <+24>:  setne  al
   0x00406d0b <+27>:  ret

which also fails to detect NaN, which is right according to the documented
behaviour of comisd:
http://www.jaist.ac.jp/iscenter-new/mpc/altix/altixdata/opt/intel/vtune/doc/users_guide/mergedProjects/analyzer_ec/mergedProjects/reference_olh/mergedProjects/instructions/instruct32_hh/vc44.htm
[Bug c++/60239] New: False positive maybe-uninitialized in for loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60239

            Bug ID: 60239
           Summary: False positive maybe-uninitialized in for loop
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: lcid-fire at gmx dot net

The code in
https://github.com/RobertBeckebans/RBDOOM-3-BFG/blob/dd9b8a8710dd7f8c1376eb245ee31fc740eae6eb/neo/renderer/tr_backend_rendertools.cpp
triggers a false positive maybe-uninitialized warning. The code in question
begins at line 1971:

static void RB_DrawText( const char* text, const idVec3& origin, float scale,
                         const idVec4& color, const idMat3& viewAxis, const int align )
{
    // snip
    idVec3 org, p1, p2;
    // snip
    for( i = 0; i < len; i++ )
    {
        if( i == 0 || text[i] == '\n' )
        {
            org = origin - viewAxis[2] * ( line * 36.0f * scale );
            // snip
        }
        org -= viewAxis[1] * ( spacing * scale );
    }

The error message is:

idlib/../idlib/math/Vector.h: In function 'void RB_DrawText(const char*, const
idVec3&, float, const idVec4&, const idMat3&, int)':
idlib/../idlib/math/Vector.h:567:10: error: 'org.idVec3::x' may be used
uninitialized in this function [-Werror=maybe-uninitialized]
/home/andreas/Projects/bfg/neo/renderer/tr_backend_rendertools.cpp:1971:9:
note: 'org.idVec3::x' was declared here
   idVec3 org, p1, p2;

I tried to create a simple version that triggers that false positive but
everything I tried analyzes the code correctly.
[Bug other/60240] New: libbacktrace problems with nested functions
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60240

            Bug ID: 60240
           Summary: libbacktrace problems with nested functions
           Product: gcc
           Version: 4.8.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
          Assignee: unassigned at gcc dot gnu.org
          Reporter: johannespfau at gmail dot com

Created attachment 32152
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32152&action=edit
test case to reproduce the bug

Compile the test case with -lbacktrace -g. Actual output:

test.c:17 (null)

Expected output: the backtrace should contain the function name ('a') instead
of (null).

AFAICS the problem is in read_function_entry. There's an abbrev->has_children
check that assumes all children of a function are inlined instances of the
same function. This is not true: children can also be nested functions.
libbacktrace should check the DW_AT_inline attribute here.
[Bug libffi/60073] [4.9 regression] 64-bit libffi.call/cls_double_va.c FAILs after recent modification
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60073

--- Comment #11 from Eric Botcazou ebotcazou at gcc dot gnu.org ---
Author: ebotcazou
Date: Mon Feb 17 12:00:04 2014
New Revision: 207822

URL: http://gcc.gnu.org/viewcvs?rev=207822&root=gcc&view=rev
Log:
	PR libffi/60073
	* src/sparc/v8.S: Assemble only if !SPARC64.
	* src/sparc/v9.S: Remove obsolete comment.
	* src/sparc/ffitarget.h (enum ffi_abi): Add FFI_COMPAT_V9.
	(V8_ABI_P): New macro.
	(V9_ABI_P): Likewise.
	(FFI_EXTRA_CIF_FIELDS): Define only if SPARC64.
	* src/sparc/ffi.c (ffi_prep_args_v8): Compile only if !SPARC64.
	(ffi_prep_args_v9): Compile only if SPARC64.
	(ffi_prep_cif_machdep_core): Use V9_ABI_P predicate.
	(ffi_prep_cif_machdep): Guard access to nfixedargs field.
	(ffi_prep_cif_machdep_var): Likewise.
	(ffi_v9_layout_struct): Compile only if SPARC64.
	(ffi_call): Deal with FFI_V8PLUS and FFI_COMPAT_V9 and fix warnings.
	(ffi_prep_closure_loc): Use V9_ABI_P and V8_ABI_P predicates.
	(ffi_closure_sparc_inner_v8): Compile only if !SPARC64.
	(ffi_closure_sparc_inner_v9): Compile only if SPARC64.
	Guard access to nfixedargs field.

Modified:
	trunk/libffi/ChangeLog
	trunk/libffi/src/sparc/ffi.c
	trunk/libffi/src/sparc/ffitarget.h
	trunk/libffi/src/sparc/v8.S
	trunk/libffi/src/sparc/v9.S
[Bug middle-end/25140] aliases, including weakref, break alias analysis
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25140

Richard Biener rguenth at gcc dot gnu.org changed:

           What    |Removed |Added
----------------------------------------------------------------
                 CC|        |johannespfau at gmail dot com

--- Comment #12 from Richard Biener rguenth at gcc dot gnu.org ---
*** Bug 60214 has been marked as a duplicate of this bug. ***
[Bug middle-end/60214] Variables with same DECL_ASSEMBLER_NAME are treated as different variables
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60214

Richard Biener rguenth at gcc dot gnu.org changed:

           What    |Removed     |Added
----------------------------------------------------------------
             Status|UNCONFIRMED |RESOLVED
         Resolution|---         |DUPLICATE

--- Comment #2 from Richard Biener rguenth at gcc dot gnu.org ---
Yes, that's a known deficiency in alias analysis.

*** This bug has been marked as a duplicate of bug 25140 ***
[Bug fortran/60234] [4.9 Regression] [OOP] ICE in generate_finalization_wrapper at fortran/class.c:1883
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60234

--- Comment #5 from janus at gcc dot gnu.org ---
(In reply to janus from comment #4)
> The test case in comment 0 compiles cleanly when adding the following to
> the patch in comment 2:

Unfortunately the combination fails on proc_ptr_comp_37 in the testsuite.
[Bug c++/60216] [4.8/4.9 Regression] [c++11] Trouble with deleted template functions
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60216

Richard Biener rguenth at gcc dot gnu.org changed:

           What    |Removed |Added
----------------------------------------------------------------
           Priority|P3      |P2
[Bug c++/60219] [4.8/4.9 Regression] [c++11] ICE invalid use of variadic template
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60219

Richard Biener rguenth at gcc dot gnu.org changed:

           What    |Removed |Added
----------------------------------------------------------------
           Priority|P3      |P2
[Bug fortran/60231] [4.8/4.9 Regression] ICE on undefined generic
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60231

--- Comment #3 from janus at gcc dot gnu.org ---
(In reply to janus from comment #2)
> This draft patch fixes the ICE:

... and regtests cleanly.
[Bug c/60220] Vectorization : simple loop : fails to vectorize
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60220

Richard Biener rguenth at gcc dot gnu.org changed:

           What    |Removed     |Added
----------------------------------------------------------------
             Status|UNCONFIRMED |RESOLVED
         Resolution|---         |WORKSFORME

--- Comment #2 from Richard Biener rguenth at gcc dot gnu.org ---
It's vectorized with -fno-tree-loop-distribute-patterns since at least GCC 4.7
(the oldest still maintained release).
[Bug middle-end/60221] [4.7/4.8/4.9 Regression] gcc -fexceptions generates unnecessary cleanup code
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60221

Richard Biener rguenth at gcc dot gnu.org changed:

           What    |Removed                    |Added
------------------------------------------------------------------------
           Keywords|                           |EH
             Status|UNCONFIRMED                |NEW
   Last reconfirmed|                           |2014-02-17
                 CC|                           |hubicka at gcc dot gnu.org,
                   |                           |matz at gcc dot gnu.org
   Target Milestone|---                        |4.7.4
            Summary|gcc -fexceptions generates |[4.7/4.8/4.9 Regression]
                   |unnecessary cleanup code   |gcc -fexceptions generates
                   |                           |unnecessary cleanup code
     Ever confirmed|0                          |1

--- Comment #2 from Richard Biener rguenth at gcc dot gnu.org ---
Confirmed.
[Bug c++/60227] [4.7/4.8/4.9 Regression] [C++11] ICE using brace-enclosed initializer list to initialize array
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60227

Richard Biener rguenth at gcc dot gnu.org changed:

           What    |Removed |Added
----------------------------------------------------------------
           Priority|P3      |P2
   Target Milestone|---     |4.7.4
[Bug c++/60224] [4.7/4.8/4.9 Regression] ICE using invalid initializer for array
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60224

Richard Biener rguenth at gcc dot gnu.org changed:

           What    |Removed |Added
----------------------------------------------------------------
           Priority|P3      |P2
   Target Milestone|---     |4.7.4
[Bug c++/60225] [4.9 Regression] [c++11] ICE initializing constexpr array
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60225

Richard Biener rguenth at gcc dot gnu.org changed:

           What    |Removed |Added
----------------------------------------------------------------
           Priority|P3      |P1
   Target Milestone|---     |4.9.0