Re: Vectorizer Pragmas
On 16 February 2014 23:44, Tim Prince <n...@aol.com> wrote:
> I don't think many people want to use both OpenMP 4 and older Intel
> directives together.

I'm having less and less incentive to use anything other than omp4, cilk and whatever. I think we should be able to map all our internal needs to those pragmas.

On the other hand, if you guys have any cross discussion with Intel folks about it, I'd love to hear. Since our support for those directives is a bit behind, it would be good not to duplicate the efforts in the long run.

Thanks!
--renato
Re: TYPE_BINFO and canonical types at LTO
On Mon, 17 Feb 2014, Jan Hubicka wrote:
> On Fri, 14 Feb 2014, Jan Hubicka wrote:
> > > This smells bad, since it is given a canonical type that is after the
> > > structural equivalency merging that ignores BINFOs, so it may be a
> > > completely different class with completely different bases than the
> > > original.
> >
> > Bases are structurally merged, too, and may be exchanged for normal
> > fields, because DECL_ARTIFICIAL (that separates bases and fields) does
> > not seem to be part of the canonical type definition in LTO.
>
> Can you elaborate on that DECL_ARTIFICIAL thing? That is, what is
> broken by considering all fields during that merging?

To make the code work with LTO, one cannot merge

  struct B { struct A a; };
  struct B : A {};

These IMO differ only by the DECL_ARTIFICIAL flag on the fields.

> The code == that BINFO walk?

Yes.

> Is that because we walk a completely unrelated BINFO chain? I'd say we
> should have merged its types so that difference shouldn't matter.
> Hopefully ;)

I am trying to make the point that it will matter. Here is the completed testcase from above:

  struct A { int a; };
  struct C : A {};
  struct B { struct A a; };
  struct C *p2;
  struct B *p1;
  int t() { p1->a.a = 2; return p2->a; }

With the patch

  Index: lto/lto.c
  ===================================================================
  --- lto/lto.c   (revision 20)
  +++ lto/lto.c   (working copy)
  @@ -49,6 +49,8 @@ along with GCC; see the file COPYING3.
   #include "data-streamer.h"
   #include "context.h"
   #include "pass_manager.h"
  +#include "print-tree.h"

   /* Number of parallel tasks to run, -1 if we want to use GNU Make jobserver.  */
  @@ -619,6 +621,15 @@ gimple_canonical_type_eq (const void *p1
   {
     const_tree t1 = (const_tree) p1;
     const_tree t2 = (const_tree) p2;
  +  if (gimple_canonical_types_compatible_p (CONST_CAST_TREE (t1),
  +                                           CONST_CAST_TREE (t2))
  +      && TREE_CODE (CONST_CAST_TREE (t1)) == RECORD_TYPE)
  +    {
  +      debug_tree (CONST_CAST_TREE (t1));
  +      fprintf (stderr, "bases:%i\n", BINFO_BASE_BINFOS (TYPE_BINFO (t1))->length ());
  +      debug_tree (CONST_CAST_TREE (t2));
  +      fprintf (stderr, "bases:%i\n", BINFO_BASE_BINFOS (TYPE_BINFO (t2))->length ());
  +    }
     return gimple_canonical_types_compatible_p (CONST_CAST_TREE (t1),
                                                 CONST_CAST_TREE (t2));
   }

I get:

  record_type 0x76c52888 B SI
    size: integer_cst 0x76ae83a0 type integer_type 0x76ae5150 bitsizetype, constant 32
    unit size: integer_cst 0x76ae83c0 type integer_type 0x76ae50a8 sizetype, constant 4
    align 32  symtab 0  alias set -1  canonical type 0x76c52888
    fields: field_decl 0x76adec78 a
      type: record_type 0x76c52738 A SI
        size: integer_cst 0x76ae83a0 32  unit size: integer_cst 0x76ae83c0 4
        align 32  symtab 0  alias set -1  canonical type 0x76c52738
        fields: field_decl 0x76adebe0 a
        context: translation_unit_decl 0x76af2e60 D.2821
        chain: type_decl 0x76af2f18 A
      nonlocal SI  file t.C  line 3  col 20
      size: integer_cst 0x76ae83a0 32  unit size: integer_cst 0x76ae83c0 4
      align 32  offset_align 128
      offset: integer_cst 0x76ae8060 constant 0
      bit offset: integer_cst 0x76ae80e0 constant 0
      context: record_type 0x76c52888 B
      chain: type_decl 0x76c55170 B
        type: record_type 0x76c52930 B
        nonlocal VOID  file t.C  line 3  col 10  align 1
        context: record_type 0x76c52888 B
        result: record_type 0x76c52888 B
    context: translation_unit_decl 0x76af2e60 D.2821
    pointer_to_this: pointer_type 0x76c529d8
    chain: type_decl 0x76c550b8 B
  bases:0

  record_type 0x76c52b28 C SI
    size: integer_cst 0x76ae83a0 type integer_type 0x76ae5150 bitsizetype, constant 32
    unit size: integer_cst 0x76ae83c0 type integer_type 0x76ae50a8 sizetype, constant 4
    align 32  symtab 0  alias set -1  structural equality
    fields: field_decl 0x76adeda8 D.2831
      type: record_type 0x76c52738 A SI
        size: integer_cst 0x76ae83a0 32  unit size: integer_cst 0x76ae83c0 4
        align 32  symtab 0  alias set -1  canonical type 0x76c52738
        fields: field_decl 0x76adebe0 a
        context: translation_unit_decl 0x76af2e60 D.2821
        chain: type_decl 0x76af2f18 A
      ignored SI  file t.C  line 2  col 8
      size: integer_cst 0x76ae83a0 32  unit size: integer_cst 0x76ae83c0 4
      align 32  offset_align 128
      offset: integer_cst 0x76ae8060 constant 0
      bit offset: integer_cst 0x76ae80e0 constant 0
      context: record_type 0x76c52a80 C
      chain: type_decl 0x76c552e0 C
        type: record_type 0x76c52b28 C
        nonlocal VOID  file t.C  line 2  col 12  align 1
        context: record_type 0x76c52a80 C
        result: record_type 0x76c52a80 C

Note the differences: B has the real field "a" where C has the artificial base field "D.2831" (flagged "ignored"), and C still says "structural equality" where B already has a canonical type.
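A minimal C++ sketch of the situation in the dump above (the types are the thread's testcase; the check itself is illustrative, not GCC code): B, which wraps A as an ordinary field, and C, which derives from A, are layout-identical, which is why a purely structural canonical-type hash that ignores DECL_ARTIFICIAL can conflate them.

```cpp
#include <cstddef>

struct A { int a; };
struct C : A {};      // A becomes an artificial base field at offset 0
struct B { A a; };    // A is an ordinary field at offset 0

// True when the two records are indistinguishable by size and field
// placement - which is all a structural merge that ignores
// DECL_ARTIFICIAL gets to see.
bool layouts_match() {
  return sizeof(B) == sizeof(C) && offsetof(B, a) == 0;
}
```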
Re: Need help: Is a VAR_DECL type builtin or not?
On Fri, Feb 14, 2014 at 02:40:44PM +0100, Richard Biener wrote:
> On Fri, Feb 14, 2014 at 9:59 AM, Dominik Vogt <v...@linux.vnet.ibm.com> wrote:
> > Given a specific VAR_DECL tree node, I need to find out whether its
> > type is built in or not. Up to now I have
> >
> >   tree tn = TYPE_NAME (TREE_TYPE (var_decl));
> >   if (tn != NULL_TREE && TREE_CODE (tn) == TYPE_DECL && DECL_NAME (tn))
> >     { ... }
> >
> > This if-condition is true both for
> >
> >   int x;
> >   const int x;
> >   ...
> >
> > and
> >
> >   typedef int i_t;
> >   i_t x;
> >   const i_t x;
> >   ...
> >
> > I need to weed out the class of VAR_DECLs that directly use built in
> > types.
>
> Try DECL_IS_BUILTIN. But I question how you define "builtin" here?

Well, actually I'm working on the variable output function in godump.c. At the moment, if the code comes across

  typedef char c_t;
  char c1;
  c_t c2;

it emits

  type _c_t byte
  var c1 byte
  var c2 byte

This is fine for c1, but for c2 it should really use the type:

  var c2 _c_t

So the rule I'm trying to implement is: given a tree node that is a VAR_DECL, if its type is an alias (defined with typedef/union/struct/class etc.), use the name of the alias; otherwise resolve the type recursively until only types built into the language are left. It's really only about the underlying data types (int, float, _Complex etc.), not about storage classes, pointers, attributes, qualifiers etc.

Well, since godump.c already caches all declarations it has come across, I could assume that these declarations are not built-in and use that in the rule above.

Ciao

Dominik ^_^  ^_^

--
Dominik Vogt
IBM Germany
Re: Need help: Is a VAR_DECL type builtin or not?
On Mon, Feb 17, 2014 at 1:15 PM, Dominik Vogt <v...@linux.vnet.ibm.com> wrote:
> [...]
> So the rule I'm trying to implement is: given a tree node that is a
> VAR_DECL, if its type is an alias (defined with typedef/union/struct/
> class etc.), use the name of the alias; otherwise resolve the type
> recursively until only types built into the language are left.
> [...]
> Well, since godump.c already caches all declarations it has come
> across, I could assume that these declarations are not built-in and
> use that in the rule above.

Not sure what GO presents us as location info, but DECL_IS_BUILTIN checks whether the line the type was declared at is sth impossible (reserved and supposed to be used for all types that do not have to be declared).

Richard.
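The rule Dominik describes can be sketched as a small lookup. This is a hypothetical, string-based model (the real godump.c walks TYPE_NAME/TREE_TYPE tree nodes, and the builtin-to-Go mapping here is invented for illustration):

```cpp
#include <map>
#include <string>

// Hypothetical alias and builtin tables (invented for illustration).
static const std::map<std::string, std::string> aliases = {
  {"c_t", "char"},            // typedef char c_t;
};
static const std::map<std::string, std::string> builtin_to_go = {
  {"char", "byte"}, {"int", "int32"},
};

// If the variable's type is a declared alias, emit a reference to the
// alias (godump prefixes it with "_"); otherwise emit the Go spelling
// of the language builtin.
std::string emit_type(const std::string &ty) {
  if (aliases.count(ty))
    return "_" + ty;                 // e.g. "var c2 _c_t"
  auto it = builtin_to_go.find(ty);
  return it != builtin_to_go.end() ? it->second : ty;
}
```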
Re: Vectorizer Pragmas
On 2/17/2014 4:42 AM, Renato Golin wrote:
> I'm having less and less incentive to use anything other than omp4,
> cilk and whatever. I think we should be able to map all our internal
> needs to those pragmas. On the other hand, if you guys have any cross
> discussion with Intel folks about it, I'd love to hear. Since our
> support for those directives is a bit behind, it would be good not to
> duplicate the efforts in the long run.

I'm continuing discussions with former Intel colleagues. If you are asking for insight into how Intel priorities vary over time, I don't expect much, unless the next beta compiler provides some inferences. They have talked about implementing all of OpenMP 4.0 except user-defined reductions this year. That would imply more activity in that area than on cilkplus, although some fixes have come in the latter. On the other hand, I had an issue on omp simd reduction(max: ) closed with the decision "will not be fixed". I have an icc problem report in on fixing omp simd safelen so it is more like the standard and less like the obsolete pragma simd vectorlength. Also, I have some problem reports active attempting to get clarification of their omp target implementation.

You may have noticed that omp parallel for simd in current Intel compilers can be used for combined thread and simd parallelism, including the case where the outer loop is parallelizable and vectorizable but the inner one is not.

--
Tim Prince
Re: Vectorizer Pragmas
On 17 February 2014 14:47, Tim Prince <n...@aol.com> wrote:
> I'm continuing discussions with former Intel colleagues. If you are
> asking for insight into how Intel priorities vary over time, I don't
> expect much, unless the next beta compiler provides some inferences.
> They have talked about implementing all of OpenMP 4.0 except
> user-defined reductions this year. That would imply more activity in
> that area than on cilkplus,

I'm expecting this. Any proposal to support Cilk in LLVM would be purely temporary and not endorsed in any way.

> although some fixes have come in the latter. On the other hand I had
> an issue on omp simd reduction(max: ) closed with the decision "will
> not be fixed".

We still haven't got pragmas for induction/reduction logic, so I'm not too worried about them.

> I have an icc problem report in on fixing omp simd safelen so it is
> more like the standard and less like the obsolete pragma simd
> vectorlength.

Our "width" metadata is slightly different in that it means "try to use that length", rather than "it's safe to use that length". This is why I'm holding off on using safelen for the moment.

> Also, I have some problem reports active attempting to get
> clarification of their omp target implementation.

Same here... RTFM is not enough in this case. ;)

> You may have noticed that omp parallel for simd in current Intel
> compilers can be used for combined thread and simd parallelism,
> including the case where the outer loop is parallelizable and
> vectorizable but the inner one is not.

That's my fear of going with omp simd directly. I don't want to be throwing threads all over the place when all I really want is vector code. For the time being, my proposal is to use the legacy pragmas: vector/novector, unroll/nounroll and simd vectorlength, which map nicely to the metadata we already have and don't incur OpenMP overhead. Later on, if OpenMP ends up with simple non-threaded pragmas, we should use those and deprecate the legacy ones.

If GCC is trying to do the same thing regarding non-threaded-vector code, I'd be glad to be involved in the discussion. Some LLVM folks think this should be an OpenMP discussion; I personally think it's pushing the boundaries a bit too much on an inherently threaded library extension.

cheers,
--renato
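The distinction drawn above shows up in what safelen actually promises. A sketch (function name invented): the clause is a programmer guarantee that no loop-carried dependence exists at distances shorter than 4, so a compiler may vectorize up to that width but is free to pick less.

```cpp
// "safelen(4)" asserts that iterations at distance < 4 are independent,
// so vectorizing with width up to 4 is safe; it does not demand that
// width.  Without -fopenmp the pragma is simply ignored, and the loop
// stays correct either way.
void scale(float *out, const float *in, int n) {
#pragma omp simd safelen(4)
  for (int i = 0; i < n; ++i)
    out[i] = 2.0f * in[i];
}
```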
Re: [RFC] Offloading Support in libgomp
On 14 Feb 16:43, Jakub Jelinek wrote:
> So, perhaps we should just stop for now oring the copyfrom in and just
> use the copyfrom from the very first mapping only, and wait for what
> the committee actually agrees on.
>
> Jakub

Like this?

  @@ -171,11 +171,16 @@ gomp_map_vars_existing (splay_tree_key oldn, splay_tree_key newn,
                   "[%p..%p) is already mapped",
                   (void *) newn->host_start, (void *) newn->host_end,
                   (void *) oldn->host_start, (void *) oldn->host_end);
  +#if 0
  +  /* FIXME: Remove this when OpenMP 4.0 will be standardized.  Currently it's
  +     unclear regarding overwriting copy_from for the existing mapping.
  +     See http://gcc.gnu.org/ml/gcc/2014-02/msg00208.html for details.  */
     if (((kind & 7) == 2 || (kind & 7) == 3)
         && !oldn->copy_from
         && oldn->host_start == newn->host_start
         && oldn->host_end == newn->host_end)
       oldn->copy_from = true;
  +#endif
     oldn->refcount++;
   }

-- Ilya
Re: [RFC] Offloading Support in libgomp
On Mon, Feb 17, 2014 at 07:59:16PM +0400, Ilya Verbin wrote:
> On 14 Feb 16:43, Jakub Jelinek wrote:
> > So, perhaps we should just stop for now oring the copyfrom in and
> > just use the copyfrom from the very first mapping only, and wait for
> > what the committee actually agrees on.
>
> Like this?
> [...]

Well, OpenMP 4.0 is a released standard, just in some cases ambiguous or buggy. I'd just remove the code rather than putting it into #if 0, patch preapproved. It will stay in the SVN history...

Jakub
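A tiny model (all names invented) of the semantics the thread settles on: after the change, re-mapping an already-mapped range only bumps the reference count, and the copy_from decision made by the very first mapping is never upgraded by later map kinds.

```cpp
#include <cstdint>

// Hypothetical miniature of libgomp's splay-tree entry.
struct mapping {
  std::uintptr_t host_start, host_end;
  bool copy_from;   // copy device data back to the host at unmap?
  int refcount;
};

// Re-mapping an existing range: the removed code would have set
// copy_from = true here when the new kind was "from"/"tofrom"; now the
// first mapping's choice sticks and only the refcount changes.
void map_existing(mapping &m, bool /*new_kind_wants_copy_from*/) {
  ++m.refcount;
}
```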
Re: Need help: Is a VAR_DECL type builtin or not?
On Mon, Feb 17, 2014 at 5:28 AM, Richard Biener <richard.guent...@gmail.com> wrote:
> On Mon, Feb 17, 2014 at 1:15 PM, Dominik Vogt <v...@linux.vnet.ibm.com> wrote:
> [...]
> > Well, since godump.c already caches all declarations it has come
> > across, I could assume that these declarations are not built-in and
> > use that in the rule above.
>
> Not sure what GO presents us as location info, but DECL_IS_BUILTIN
> checks whether the line the type was declared at is sth impossible
> (reserved and supposed to be used for all types that do not have to be
> declared).

godump.c is actually not used by the Go frontend. The purpose of godump.c is to read C header files and dump them in a Go representation. It's used when building the Go library, to get Go versions of system structures like struct stat.

I'm not quite sure what Dominik is after. For system structures, using the basic type, the underlying type of a typedef, is normally what you want. But to answer the question as stated, I think I would look at functions like is_naming_typedef_decl in dwarf2out.c, since this sounds like the kind of question that debug info needs to sort out.

Ian
Re: [RFC][PATCH 0/5] arch: atomic rework
On Wed, Feb 12, 2014 at 07:12:05PM +0100, Peter Zijlstra wrote:
> On Wed, Feb 12, 2014 at 09:42:09AM -0800, Paul E. McKenney wrote:
> > You need volatile semantics to force the compiler to ignore any
> > proofs it might otherwise attempt to construct. Hence all the
> > ACCESS_ONCE() calls in my email to Torvald. (Hopefully I translated
> > your example reasonably.)
>
> My brain gave out for today; but it did appear to have the right
> structure.

I can relate. ;-)

> I would prefer it if C11 would not require the volatile casts. It
> should simply _never_ speculate with atomic writes, volatile or not.

I agree with not needing volatiles to prevent speculated writes. However, they will sometimes be needed to prevent excessive load/store combining. The compiler doesn't have the runtime feedback mechanisms that the hardware has, and thus will need help from the developer from time to time. Or maybe the Linux kernel simply waits to transition to C11 relaxed atomics until the compiler has learned to be sufficiently conservative in its load-store combining decisions.

Thanx, Paul
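For readers unfamiliar with the idiom: ACCESS_ONCE() in the kernel is, roughly, a volatile cast. This is a sketch of the shape, not the kernel's exact definition; the variable name is invented.

```cpp
// Roughly what the kernel's ACCESS_ONCE() does: force exactly one real
// load or store by going through a volatile-qualified lvalue.
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

int shared_flag;

int read_flag() {
  // The compiler must emit one load here; it may not merge this with
  // neighbouring accesses or "prove" the value is unchanged.
  return ACCESS_ONCE(shared_flag);
}
```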
Re: [RFC][PATCH 0/5] arch: atomic rework
On Sat, 15 Feb 2014, Torvald Riegel wrote:
> glibc is a counterexample that comes to mind, although it's a smaller
> code base. (It's currently not using C11 atomics, but transitioning
> there makes sense, and is something I want to get to eventually.)

glibc is using C11 atomics (GCC builtins rather than _Atomic / stdatomic.h, but using __atomic_* with explicitly specified memory model rather than the older __sync_*) on AArch64, plus in certain cases on ARM and MIPS.

--
Joseph S. Myers
jos...@codesourcery.com
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 06:59:31PM +0000, Joseph S. Myers wrote:
> glibc is using C11 atomics (GCC builtins rather than _Atomic /
> stdatomic.h, but using __atomic_* with explicitly specified memory
> model rather than the older __sync_*) on AArch64, plus in certain
> cases on ARM and MIPS.

Hmm, actually that results in a change in behaviour for the __sync_* primitives on AArch64. The documentation for those states that:

  `In most cases, these built-in functions are considered a full
  barrier. That is, no memory operand is moved across the operation,
  either forward or backward. Further, instructions are issued as
  necessary to prevent the processor from speculating loads across the
  operation and from queuing stores after the operation.'

which is stronger than simply mapping them to memory_model_seq_cst, which seems to be what the AArch64 compiler is doing (so you get acquire + release instead of a full fence).

Will
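The two builtin families in question, side by side (illustrative; the AArch64 subtlety concerns the barriers emitted around these operations, which portable code cannot observe, so the results below are identical on any target):

```cpp
// The same read-modify-write through both families.  __sync_* is
// documented as a "full barrier"; __atomic_* with __ATOMIC_SEQ_CST maps
// onto the C11 seq-cst model, which on AArch64 may be implemented as a
// weaker acquire+release pair.
long counter;

long bump_sync()   { return __sync_fetch_and_add(&counter, 1); }
long bump_atomic() { return __atomic_fetch_add(&counter, 1, __ATOMIC_SEQ_CST); }
```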
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, 2014-02-17 at 18:59 +0000, Joseph S. Myers wrote:
> glibc is using C11 atomics (GCC builtins rather than _Atomic /
> stdatomic.h, but using __atomic_* with explicitly specified memory
> model rather than the older __sync_*) on AArch64, plus in certain
> cases on ARM and MIPS.

I think the major steps remaining are moving the other architectures over, and rechecking concurrent code (e.g., for the code that I have seen, it was either asm variants (eg, on x86), or built before C11; ARM pthread_once was lacking memory barriers (see the pthread_once unification patches I posted)). We also need/should move towards using relaxed-MO atomic loads instead of plain loads.
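In C++11 terms (the __atomic_* builtins implement the same model), the last point looks like this: a plain load of a concurrently-written variable is a data race, while a relaxed-MO atomic load provides atomicity without imposing any ordering. Names are invented for illustration.

```cpp
#include <atomic>

std::atomic<int> shared{0};

int peek() {
  // Atomic, but imposes no ordering on surrounding accesses; this is
  // the intended replacement for a racy plain load of a shared int.
  return shared.load(std::memory_order_relaxed);
}
```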
Re: [RFC][PATCH 0/5] arch: atomic rework
On Sat, 2014-02-15 at 10:49 -0800, Linus Torvalds wrote:
> On Sat, Feb 15, 2014 at 9:45 AM, Torvald Riegel <trie...@redhat.com> wrote:
> > I think a major benefit of C11's memory model is that it gives a
> > *precise* specification for how a compiler is allowed to optimize.
>
> Clearly it does *not*. This whole discussion is proof of that. It's
> not at all clear,

It might not be an easy-to-understand specification, but as far as I'm aware it is precise. The Cambridge group's formalization certainly is precise. From that, one can derive (together with the usual rules for as-if etc.) what a compiler is allowed to do (assuming that the standard is indeed precise). My replies in this discussion have been based on reasoning about the standard, and not secret knowledge (with the exception of no-out-of-thin-air, which is required in the standard's prose but not yet formalized). I agree that I'm using the formalization as a kind of placeholder for the standard's prose (which isn't all that easy to follow for me either), but I guess there's no way around an ISO standard using prose. If you see a case in which the standard isn't precise, please bring it up or open a C++ CWG issue for it.

> and the standard apparently is at least debatably allowing things
> that shouldn't be allowed.

Which example do you have in mind here? Haven't we resolved all the debated examples, or did I miss any?

> It's also a whole lot more complicated than volatile, so the
> likelihood of a compiler writer actually getting it right - even if
> the standard does - is lower.

It's not easy, that's for sure, but none of the high-performance alternatives are easy either. There are testing tools out there based on the formalization of the model, and we've found bugs with them.

And the alternative of using something not specified by the standard is even worse, I think, because then you have to guess what a compiler might do, without having any constraints; IOW, one is resorting to "no sane compiler would do that", and that doesn't seem very robust either.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 11:55 AM, Torvald Riegel <trie...@redhat.com> wrote:
> Which example do you have in mind here? Haven't we resolved all the
> debated examples, or did I miss any?

Well, Paul seems to still think that the standard possibly allows speculative writes or possibly value speculation in ways that break the hardware-guaranteed orderings.

And personally, I can't read standards paperwork. It is invariably written in some basically impossible-to-understand lawyeristic mode, and then it is read by people (compiler writers) that intentionally try to mis-use the words and do language-lawyering ("that depends on what the meaning of 'is' is"). The whole lvalue vs rvalue expression vs "what is a volatile access" thing for C++ was/is a great example of that.

So quite frankly, as a result I refuse to have anything to do with the process directly.

Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 08:55:47PM +0100, Torvald Riegel wrote:
> It might not be an easy-to-understand specification, but as far as I'm
> aware it is precise. The Cambridge group's formalization certainly is
> precise. From that, one can derive (together with the usual rules for
> as-if etc.) what a compiler is allowed to do.
> [...]
> If you see a case in which the standard isn't precise, please bring it
> up or open a C++ CWG issue for it.

I suggest that I go through the Linux kernel's requirements for atomics and memory barriers and see how they map to C11 atomics. With that done, we would have very specific examples to go over. Without that done, the discussion won't converge very well. Seem reasonable?

Thanx, Paul
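One entry of the kind of kernel-to-C11 mapping proposed above, sketched with C++11 atomics (the same model as C11): the kernel's smp_store_release()/smp_load_acquire() pairing corresponds to release/acquire operations. Names and values are invented for illustration.

```cpp
#include <atomic>

std::atomic<int> payload{0};
std::atomic<int> ready{0};

// Kernel analogue: ACCESS_ONCE(payload) = 42; smp_store_release(&ready, 1);
void publish() {
  payload.store(42, std::memory_order_relaxed);
  ready.store(1, std::memory_order_release);
}

// Kernel analogue: if (smp_load_acquire(&ready)) use payload.
// The acquire load synchronizes with the release store, so the payload
// write is guaranteed visible once ready == 1 is observed.
int consume() {
  if (ready.load(std::memory_order_acquire))
    return payload.load(std::memory_order_relaxed);
  return -1;   // not published yet
}
```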
Re: [RFC][PATCH 0/5] arch: atomic rework
On February 17, 2014 7:18:15 PM GMT+01:00, Paul E. McKenney <paul...@linux.vnet.ibm.com> wrote:
> I agree with not needing volatiles to prevent speculated writes.
> However, they will sometimes be needed to prevent excessive load/store
> combining. The compiler doesn't have the runtime feedback mechanisms
> that the hardware has, and thus will need help from the developer from
> time to time. Or maybe the Linux kernel simply waits to transition to
> C11 relaxed atomics until the compiler has learned to be sufficiently
> conservative in its load-store combining decisions.

Sounds backwards. Currently the compiler does nothing to the atomics. I'm sure we'll eventually add something. But if testing coverage is zero outside then surely things get worse, not better with time.

Richard.

Thanx, Paul
FreeBSD users of gcc
Greetings,

I am the named maintainer of the freebsd port. I have been for approximately twelve years, although I haven't been very active for the last four. The last major work I put into the freebsd port was at the end of 2009. I have reviewed others' patches since then; but it really hasn't required anything major since David O'Brien and I did the foundational work in the early 200Xs (which itself was based on many others' foundations). Gerald Pfeifer has also done much to keep the port in good shape. (I also don't want to ignore the many patches that came from members of the FreeBSD core team and other FreeBSD users.) To complicate matters, I haven't been using FreeBSD on my primary desktop or otherwise since early 2011.

FreeBSD is listed as a tier one platform. Therefore, I am looking for someone to whom both the GCC steering committee and I would be willing to hand over the reins before I drop my officially-listed maintainership. The expected person will likely already have Write After Approval status. Please contact me directly if you are qualified and interested in becoming the freebsd OS port maintainer.

Regards,
Loren
Re: TYPE_BINFO and canonical types at LTO
> Yeah, ok. But we treat those types (B and C) TBAA equivalent because
> structurally they are the same ;)
>
> Luckily C has a proper field for its base ("proper" meaning that
> offset and size are correct, as well as the type). It indeed has
> DECL_ARTIFICIAL set and yes, we treat those as real fields when doing
> the structural comparison.

Yep, the difference is that depending on whether C or D wins, we will end up walking the BINFO or not. So we should not depend on the BINFO walk for correctness.

> More interesting is of course when we can re-use tail-padding in one
> but not the other (works as expected - not merged).

Yep.

  struct A { A (); short x; bool a; };
  struct C : A { bool b; };
  struct B { struct A a; bool b; };
  struct C *p2;
  struct B *p1;
  int t() { p1->a.a = 2; return p2->a; }

> Yes, zero sized classes are those having no fields (but other stuff,
> type decls, bases etc.)

Yeah, but TBAA obviously doesn't care about type decls and bases. So I guess the conclusion is that the BINFO walk in alias.c is pointless?

Concerning the merging details and LTO aliasing, I think for 4.10 we should make C++ compute mangled names of types (i.e. call DECL_ASSEMBLER_NAME on the associated type_decl + explicitly mark that the type is driven by ODR) and then we can do merging driven by the ODR rule. Non-ODR types born from other frontends will then need to be made to alias all the ODR variants, which can be done by storing them into the current canonical type hash. (I wonder if we want to support cross-language aliasing for non-POD?)

I also think we want an explicit representation of types known to be local to the compilation unit - anonymous namespaces in C/C++, types defined within function bodies in C, and god knows what in Ada/Fortran/Java.

Honza
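The tail-padding example above can be checked directly. The sizes are ABI-specific: under the Itanium C++ ABI (the common ELF ABI), A is non-POD because of its user-declared constructor, so the derived C may place C::b inside A's tail padding, while B, which embeds A as a plain field, may not; hence the two records are not layout-equivalent and are correctly not merged.

```cpp
// A: short(2) + bool(1) + 1 byte of tail padding => size 4, align 2.
struct A { A(); short x; bool a; };
A::A() : x(0), a(false) {}

struct C : A { bool b; };   // b can live in A's tail padding: size 4
struct B { A a; bool b; };  // b must follow the complete A: size 6

// True on Itanium-ABI targets: reusing tail padding makes C smaller
// than the field-wrapping B.
bool tail_padding_reused() {
  return sizeof(C) < sizeof(B);
}
```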
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, 2014-02-17 at 12:23 -0800, Paul E. McKenney wrote:
> I suggest that I go through the Linux kernel's requirements for
> atomics and memory barriers and see how they map to C11 atomics. With
> that done, we would have very specific examples to go over. Without
> that done, the discussion won't converge very well. Seem reasonable?

Sounds good!
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, 2014-02-17 at 12:18 -0800, Linus Torvalds wrote:
> On Mon, Feb 17, 2014 at 11:55 AM, Torvald Riegel <trie...@redhat.com> wrote:
> > Which example do you have in mind here? Haven't we resolved all the
> > debated examples, or did I miss any?
>
> Well, Paul seems to still think that the standard possibly allows
> speculative writes or possibly value speculation in ways that break
> the hardware-guaranteed orderings.

That's true, I just didn't see any specific examples so far.

> And personally, I can't read standards paperwork. It is invariably
> written in some basically impossible-to-understand lawyeristic mode,

Yeah, it's not the most intuitive form for things like the memory model.

> and then it is read by people (compiler writers) that intentionally
> try to mis-use the words and do language-lawyering ("that depends on
> what the meaning of 'is' is").

That assumption about people working on compilers is a little too broad, don't you think? I think that it is important to stick to a specification, in the same way that one wouldn't expect a program with undefined behavior to make any sense of it, magically, in cases where stuff is undefined. However, that of course doesn't include trying to exploit weasel-wording (BTW, both users and compiler writers try to do it). IMHO, weasel-wording in a standard is a problem in itself even if not exploited, and often it indicates that there is a real issue. There might be reasons to have weasel-wording (e.g., because there's no known better way to express it, like in the case of the not really precise no-out-of-thin-air rule today), but nonetheless those aren't ideal.

> The whole lvalue vs rvalue expression vs 'what is a volatile access'
> thing for C++ was/is a great example of that.

I'm not aware of the details of this.

> So quite frankly, as a result I refuse to have anything to do with the
> process directly.

That's unfortunate. Then please work with somebody that isn't uncomfortable with participating directly in the process. But be warned, it may very well be a person working on compilers :)

Have you looked at the formalization of the model by Batty et al.? The overview of this is prose, but the formalized model itself is all formal relations and logic. So there should be no language-lawyering issues with that form. (For me, the formalized model is much easier to reason about.)
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 1:21 PM, Torvald Riegel trie...@redhat.com wrote: On Mon, 2014-02-17 at 12:18 -0800, Linus Torvalds wrote: and then it is read by people (compiler writers) that intentionally try to mis-use the words and do language-lawyering (that depends on what the meaning of 'is' is). That assumption about people working on compilers is a little too broad, don't you think? Let's just say that *some* are that way, and those are the ones that I end up butting heads with. The sane ones I never have to argue with - point them at a bug, and they just say yup, bug. The insane ones say we don't need to fix that, because if you read this copy of the standards that have been translated to chinese and back, it clearly says that this is acceptable. The whole lvalue vs rvalue expression vs 'what is a volatile access' thing for C++ was/is a great example of that. I'm not aware of the details of this. The argument was that an lvalue doesn't actually access the memory (an rvalue does), so this: volatile int *p = ...; *p; doesn't need to generate a load from memory, because *p is still an lvalue (since you could assign things to it). This isn't an issue in C, because in C, expression statements are always rvalues, but C++ changed that. The people involved with the C++ standards have generally been totally clueless about their subtle changes. I may have misstated something, but basically some C++ people tried very hard to make volatile useless. We had other issues too. Like C compiler people who felt that the type-based aliasing should always override anything else, even if the variable accessed (through different types) was statically clearly aliasing and used the exact same pointer. That made it impossible to do a syntactically clean model of this aliases, since the _only_ exception to the type-based aliasing rule was to generate a union for every possible access pairing. 
We turned off type-based aliasing (as I've mentioned before, I think it's a fundamentally broken feature to begin with, and a horrible horrible hack that adds no value for anybody but the HPC people). Gcc eventually ended up having some sane syntax for overriding it, but by then I was too disgusted with the people involved to even care. Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Sat, 2014-02-15 at 11:15 -0800, Linus Torvalds wrote: On Sat, Feb 15, 2014 at 9:30 AM, Torvald Riegel trie...@redhat.com wrote: I think the example is easy to misunderstand, because the context isn't clear. Therefore, let me first try to clarify the background. (1) The abstract machine does not write speculatively. (2) Emitting a branch instruction and executing a branch at runtime is not part of the specified behavior of the abstract machine. Of course, the abstract machine performs conditional execution, but that just specifies the output / side effects that it must produce (e.g., volatile stores) -- not with which hardware instructions it is producing this. (3) A compiled program must produce the same output as if executed by the abstract machine. Ok, I'm fine with that. Thus, we need to be careful what "speculative store" is meant to refer to. A few examples: if (atomic_load(x, mo_relaxed) == 1) atomic_store(y, 3, mo_relaxed); No, please don't use this idiotic example. It is wrong. It won't be useful in practice in a lot of cases, but that doesn't mean it's wrong. It's clearly not illegal code. It also serves a purpose: a simple example to reason about a few aspects of the memory model. The fact is, if a compiler generates anything but the obvious sequence (read/cmp/branch/store - where branch/store might obviously be done with some other machine conditional like a predicate), the compiler is wrong. Why? I've reasoned why (1) to (3) above allow in certain cases (i.e., the first load always returning 1) for the branch (or other machine conditional) to not be emitted. So please either poke holes into this reasoning, or clarify that you don't in fact, contrary to what you wrote above, agree with (1) to (3). Anybody who argues anything else is wrong, or confused, or confusing. I appreciate your opinion, and maybe I'm just one of the three things above (my vote is on confusing). But just saying so, without saying why, doesn't help me see what the misunderstanding is. 
Instead, argue about *other* sequences where the compiler can do something. I'd prefer if we could clarify the misunderstanding for the simple case first that doesn't involve stronger ordering requirements in the form of non-relaxed MOs. For example, this sequence: atomic_store(x, a, mo_relaxed); b = atomic_load(x, mo_relaxed); can validly be transformed to atomic_store(x, a, mo_relaxed); b = (typeof(x)) a; and I think everybody agrees about that. In fact, that optimization can be done even for mo_strict. Yes. But even that obvious optimization has subtle cases. What if the store is relaxed, but the load is strict? You can't do the optimization without a lot of thought, because dropping the strict load would drop an ordering point. So even the store followed by the exact same load case has subtle issues. Yes, if a compiler wants to optimize that, it has to give it more thought. My gut feeling is that either the store should get the stronger ordering, or the accesses should be merged. But I'd have to think more about that one (which I can do on request). With similar caveats, it is perfectly valid to merge two consecutive loads, and to merge two consecutive stores. Now that means that the sequence atomic_store(x, 1, mo_relaxed); if (atomic_load(x, mo_relaxed) == 1) atomic_store(y, 3, mo_relaxed); can first be optimized to atomic_store(x, 1, mo_relaxed); if (1 == 1) atomic_store(y, 3, mo_relaxed); and then you get the end result that you wanted in the first place (including the ability to re-order the two stores due to the relaxed ordering, assuming they can be proven to not alias - and please don't use the idiotic type-based aliasing rules). Bringing up your first example is pure and utter confusion. Sorry if it was confusing. But then maybe we need to talk about it more, because it shouldn't be confusing if we agree on what the memory model allows and what not. I had originally picked the example because it was related to the example Paul/Peter brought up. Don't do it. 
Instead, show what are obvious and valid transformations, and then you can bring up these kinds of combinations as look, this is obviously also correct. I have my doubts whether the best way to reason about the memory model is by thinking about specific compiler transformations. YMMV, obviously. The -- kind of vague -- reason is that the allowed transformations will be more complicated to reason about than the allowed output of a concurrent program when understanding the memory model (ie, ordering and interleaving of memory accesses, etc.). However, I can see that when trying to optimize with a hardware memory model in mind, this might look appealing. What the compiler will do is exploiting knowledge about all possible executions. For example, if it knows that x is always 1, it will do the transform. The user would
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 09:39:54PM +0100, Richard Biener wrote: On February 17, 2014 7:18:15 PM GMT+01:00, Paul E. McKenney paul...@linux.vnet.ibm.com wrote: On Wed, Feb 12, 2014 at 07:12:05PM +0100, Peter Zijlstra wrote: On Wed, Feb 12, 2014 at 09:42:09AM -0800, Paul E. McKenney wrote: You need volatile semantics to force the compiler to ignore any proofs it might otherwise attempt to construct. Hence all the ACCESS_ONCE() calls in my email to Torvald. (Hopefully I translated your example reasonably.) My brain gave out for today; but it did appear to have the right structure. I can relate. ;-) I would prefer if C11 would not require the volatile casts. It should simply _never_ speculate with atomic writes, volatile or not. I agree with not needing volatiles to prevent speculated writes. However, they will sometimes be needed to prevent excessive load/store combining. The compiler doesn't have the runtime feedback mechanisms that the hardware has, and thus will need help from the developer from time to time. Or maybe the Linux kernel simply waits to transition to C11 relaxed atomics until the compiler has learned to be sufficiently conservative in its load-store combining decisions. Sounds backwards. Currently the compiler does nothing to the atomics. I'm sure we'll eventually add something. But if testing coverage is zero outside then surely things get worse, not better with time. Perhaps we solve this chicken-and-egg problem by creating a test suite? Thanx, Paul
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, 2014-02-17 at 14:02 -0800, Linus Torvalds wrote: On Mon, Feb 17, 2014 at 1:21 PM, Torvald Riegel trie...@redhat.com wrote: On Mon, 2014-02-17 at 12:18 -0800, Linus Torvalds wrote: and then it is read by people (compiler writers) that intentionally try to mis-use the words and do language-lawyering (that depends on what the meaning of 'is' is). That assumption about people working on compilers is a little too broad, don't you think? Let's just say that *some* are that way, and those are the ones that I end up butting heads with. The sane ones I never have to argue with - point them at a bug, and they just say yup, bug. The insane ones say we don't need to fix that, because if you read this copy of the standards that have been translated to chinese and back, it clearly says that this is acceptable. The whole lvalue vs rvalue expression vs 'what is a volatile access' thing for C++ was/is a great example of that. I'm not aware of the details of this. The argument was that an lvalue doesn't actually access the memory (an rvalue does), so this: volatile int *p = ...; *p; doesn't need to generate a load from memory, because *p is still an lvalue (since you could assign things to it). This isn't an issue in C, because in C, expression statements are always rvalues, but C++ changed that. Huhh. I can see the problems that this creates in terms of C/C++ compatibility. The people involved with the C++ standards have generally been totally clueless about their subtle changes. This isn't a fair characterization. There are many people that do care, and certainly not all are clueless. But it's a limited set of people, bugs happen, and not all of them will have the same goals. I think one way to prevent such problems in the future could be to have someone in the kernel community volunteer to look through standard revisions before they are published. 
The standard needs to be fixed, because compilers need to conform to the standard (e.g., a compiler's extension fixing the above wouldn't be conforming anymore because it emits more volatile reads than specified). Or maybe those of us working on the standard need to flag potential changes of interest to the kernel folks. But that may be less reliable than someone from the kernel side looking at them; I don't know.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 2:09 PM, Torvald Riegel trie...@redhat.com wrote: On Sat, 2014-02-15 at 11:15 -0800, Linus Torvalds wrote: if (atomic_load(x, mo_relaxed) == 1) atomic_store(y, 3, mo_relaxed); No, please don't use this idiotic example. It is wrong. It won't be useful in practice in a lot of cases, but that doesn't mean it's wrong. It's clearly not illegal code. It also serves a purpose: a simple example to reason about a few aspects of the memory model. It's not illegal code, but if you claim that you can make that store unconditional, it's a pointless and wrong example. The fact is, if a compiler generates anything but the obvious sequence (read/cmp/branch/store - where branch/store might obviously be done with some other machine conditional like a predicate), the compiler is wrong. Why? I've reasoned why (1) to (3) above allow in certain cases (i.e., the first load always returning 1) for the branch (or other machine conditional) to not be emitted. So please either poke holes into this reasoning, or clarify that you don't in fact, contrary to what you wrote above, agree with (1) to (3). The thing is, the first load DOES NOT RETURN 1. It returns whatever that memory location contains. End of story. Stop claiming it can return 1. It *never* returns 1 unless you do the load and *verify* it, or unless the load itself can be made to go away. And with the code sequence given, that just doesn't happen. END OF STORY. So your argument is *shit*. Why do you continue to argue it? I told you how that load can go away, and you agreed. But IT CANNOT GO AWAY any other way. You cannot claim the compiler knows. The compiler doesn't know. It's that simple. So why do I say you are wrong, after I just gave you an example of how it happens? Because my example went back to the *real* issue, and there are actual real semantically meaningful details with doing things like load merging. 
To give an example, let's rewrite things a bit more to use an extra variable: atomic_store(x, 1, mo_relaxed); a = atomic_load(x, mo_relaxed); if (a == 1) atomic_store(y, 3, mo_relaxed); which looks exactly the same. I'm confused. Is this a new example? That is a new example. The important part is that it has left a trace for the programmer: because 'a' contains the value, the programmer can now look at the value later and say oh, we know we did a store iff a was 1 This sequence: atomic_store(x, 1, mo_relaxed); a = atomic_load(x, mo_relaxed); atomic_store(y, 3, mo_relaxed); is actually - and very seriously - buggy. Why? Because you have effectively split the atomic_load into two loads - one for the value of 'a', and one for your 'proof' that the store is unconditional. I can't follow that, because it isn't clear to me which code sequences are meant to belong together, and which transformations the compiler is supposed to make. If you would clarify that, then I can reply to this part. Basically, if the compiler allows the condition of "I wrote 3 to y, but the programmer sees 'a' has another value than 1 later" then the compiler is one buggy pile of shit. It fundamentally broke the whole concept of atomic accesses. Basically the atomic access to 'x' turned into two different accesses: the one that proved that x had the value 1 (and caused the value 3 to be written), and the other load that then writes that other value into 'a'. It's really not that complicated. And this is why descriptions like this should ABSOLUTELY NOT BE WRITTEN as if the compiler can prove that 'x' had the value 1, it can remove the branch. Because that IS NOT SUFFICIENT. That was not a valid transformation of the atomic load. The only valid transformation was the one I stated, namely to remove the load entirely and replace it with the value written earlier in the same execution context. Really, why is so hard to understand? Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, 2014-02-17 at 14:14 -0800, Paul E. McKenney wrote: On Mon, Feb 17, 2014 at 09:39:54PM +0100, Richard Biener wrote: On February 17, 2014 7:18:15 PM GMT+01:00, Paul E. McKenney paul...@linux.vnet.ibm.com wrote: On Wed, Feb 12, 2014 at 07:12:05PM +0100, Peter Zijlstra wrote: On Wed, Feb 12, 2014 at 09:42:09AM -0800, Paul E. McKenney wrote: You need volatile semantics to force the compiler to ignore any proofs it might otherwise attempt to construct. Hence all the ACCESS_ONCE() calls in my email to Torvald. (Hopefully I translated your example reasonably.) My brain gave out for today; but it did appear to have the right structure. I can relate. ;-) I would prefer if C11 would not require the volatile casts. It should simply _never_ speculate with atomic writes, volatile or not. I agree with not needing volatiles to prevent speculated writes. However, they will sometimes be needed to prevent excessive load/store combining. The compiler doesn't have the runtime feedback mechanisms that the hardware has, and thus will need help from the developer from time to time. Or maybe the Linux kernel simply waits to transition to C11 relaxed atomics until the compiler has learned to be sufficiently conservative in its load-store combining decisions. Sounds backwards. Currently the compiler does nothing to the atomics. I'm sure we'll eventually add something. But if testing coverage is zero outside then surely things get worse, not better with time. Perhaps we solve this chicken-and-egg problem by creating a test suite? Perhaps. The test suite might also be a good set of examples showing which cases we expect to be optimized in a certain way, and which not. I suppose the uses of (the equivalent) of atomics in the kernel would be a good start.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 2:25 PM, Torvald Riegel trie...@redhat.com wrote: On Mon, 2014-02-17 at 14:02 -0800, Linus Torvalds wrote: The argument was that an lvalue doesn't actually access the memory (an rvalue does), so this: volatile int *p = ...; *p; doesn't need to generate a load from memory, because *p is still an lvalue (since you could assign things to it). This isn't an issue in C, because in C, expression statements are always rvalues, but C++ changed that. Huhh. I can see the problems that this creates in terms of C/C++ compatibility. That's not the biggest problem. The biggest problem is that you have compiler writers that don't care about sane *use* of the features they write a compiler for, they just care about the standard. So they don't care about C vs C++ compatibility. Even more importantly, they don't care about the *user* that uses only C++ and the fact that their reading of the standard results in *meaningless* behavior. They point to the standard and say "that's what the standard says, suck it", and silently generate code (or in this case, avoid generating code) that makes no sense. So it's not about C++ being incompatible with C, it's about C++ having insane and bad semantics unless you just admit that oh, ok, I need to not just read the standard, I also need to use my brain, and admit that a C++ statement expression needs to act as if it is an access wrt volatile variables. In other words, as a compiler person, you do need to read more than the paper of the standard. You need to also take into account what is reasonable behavior even when the standard could possibly be read some other way. And some compiler people don't. The volatile access in statement expressions did get resolved, sanely, at least in gcc. I think gcc warns about some remaining cases. 
Btw, afaik, C++11 actually clarifies the standard to require the reads, because everybody *knew* that not requiring the read was insane and meaningless behavior, and clearly against the intent of volatile. But that didn't stop compiler writers from saying hey, the standard allows my insane and meaningless behavior, so I'll implement it and not consider it a bug. Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, 17 Feb 2014, Torvald Riegel wrote: On Mon, 2014-02-17 at 18:59 +0000, Joseph S. Myers wrote: On Sat, 15 Feb 2014, Torvald Riegel wrote: glibc is a counterexample that comes to mind, although it's a smaller code base. (It's currently not using C11 atomics, but transitioning there makes sense, and something I want to get to eventually.) glibc is using C11 atomics (GCC builtins rather than _Atomic / stdatomic.h, but using __atomic_* with explicitly specified memory model rather than the older __sync_*) on AArch64, plus in certain cases on ARM and MIPS. I think the major steps remaining are moving the other architectures over, and rechecking concurrent code (e.g., for the code that I have I don't think we'll be ready to require GCC >= 4.7 to build glibc for another year or two, although probably we could move the requirement up from 4.4 to 4.6. (And some platforms only had the C11 atomics optimized later than 4.7.) -- Joseph S. Myers jos...@codesourcery.com
Re: MSP430 in gcc4.9 ... enable interrupts?
I presume these will be part of the headers for the library distributed for msp430 gcc by TI/Redhat? I can't speak for TI's or Red Hat's plans. GNU's typical non-custom embedded runtime is newlib/libgloss, which usually doesn't have that much in the way of chip-specific headers or library functions. is that for the critical attribute that exists in the old msp430 port (which disables interrupts for the duration of the function)? Yes, for things like that. They're documented under Function Attributes in the Extensions to the C Language Family chapter of the current GCC manual.
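A sketch of what such declarations look like for the new GCC MSP430 port (attribute names per the MSP430 entries under Function Attributes in the GCC manual; the vector number is hypothetical, and this only compiles with an msp430 cross-compiler):

```c
/* Interrupts are disabled on entry and restored on exit, like the
   old port's critical attribute. */
void __attribute__((critical)) update_shared_state(void) {
    /* ... touch state shared with interrupt handlers ... */
}

/* Interrupt service routine for a hypothetical vector number. */
void __attribute__((interrupt(11))) timer_isr(void) {
    /* ... */
}
```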
Re: [RFC][PATCH 0/5] arch: atomic rework
On 17/02/14 20:18, Linus Torvalds wrote: On Mon, Feb 17, 2014 at 11:55 AM, Torvald Riegel trie...@redhat.com wrote: Which example do you have in mind here? Haven't we resolved all the debated examples, or did I miss any? Well, Paul seems to still think that the standard possibly allows speculative writes or possibly value speculation in ways that break the hardware-guaranteed orderings. And personally, I can't read standards paperwork. It is invariably Can't = Don't - evidently. written in some basically impossible-to-understand lawyeristic mode, You mean unambiguous - try reading a patent (Apple have 1000s of trivial ones, I tried reading one once thinking how could they have phrased it so this got approved, their technique was to make the reader want to start cutting themselves to prove they weren't numb to everything) and then it is read by people (compiler writers) that intentionally try to mis-use the words and do language-lawyering (that depends on what the meaning of 'is' is). The whole lvalue vs rvalue expression vs 'what is a volatile access' thing for C++ was/is a great example of that. I'm not going to teach you what rvalues and lvalues are, but! http://lmgtfy.com/?q=what+are+rvalues might help. So quite frankly, as a result I refuse to have anything to do with the process directly. Is this goodbye? Linus That aside, what is the problem? If the compiler has created code that that has different program states than what would be created without optimisation please file a bug report and/or send something to the mailing list USING A CIVIL TONE, there's no need for swear-words and profanities all the time - use them when you want to emphasise something. Additionally if you are always angry, start calling that state normal then reserve such words for when you are outraged. There are so many emails from you bitching about stuff, I've lost track of what you're bitching about you bitch that much about it. 
Like this standards stuff above (notice I said stuff, not crap or shit). What exactly is your problem, if the compiler is doing something the standard does not permit, or optimising something wrongly (read: puts the program in a different state than if the optimisation was not applied) that is REALLY serious, you are right to report it; but whining like a n00b on Stack-overflow when a question gets closed is not helping. I tried reading back through the emails (I dismissed them previously) but there's just so much ranting, and rants about the standard too (I would trash this if I deemed the effort required to delete was less than the storage of the bytes the message takes up) standardised behaviour is VERY important. So start again, what is the serious problem, have you got any code that would let me replicate it, what is your version of GCC? Oh and lastly! Optimisations are not as casual as "oh, we could do this and it'd work better" unlike kernel work or any other software that is being improved, it is very formal (and rightfully so). I seriously recommend you read the first 40 pages at least of a book called Compiler Design, Analysis and Transformation it's not about the parsing phases or anything, but it develops a good introduction and later a good foundation for exploring the field further. Compilers do not operate on what I call A-level logic and to show what I mean I use the shovel-to-the-face of real analysis, "of course 1/x tends towards 0, it's not gonna be 5!!" = A-level logic. "Let epsilon > 0 be given, then there exists an N" - formal proof. So when one says the compiler can prove it's not some silly thing powered by A-level logic, it is the implementation of something that can be proven to be correct (in the sense of the program states mentioned before) So yeah, calm down and explain - no lashing out at standards bodies, what is the problem? Alec
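The real-analysis aside, stated formally for comparison with the "A-level logic" version:

```latex
\[
\lim_{x\to\infty}\frac{1}{x}=0
\quad\Longleftrightarrow\quad
\forall \varepsilon>0\;\exists N>0:\;
x>N \implies \left|\frac{1}{x}-0\right|<\varepsilon ,
\]
witnessed by $N = 1/\varepsilon$, since $x > 1/\varepsilon \implies 1/x < \varepsilon$.
```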
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, 2014-02-17 at 14:32 -0800, Linus Torvalds wrote: On Mon, Feb 17, 2014 at 2:09 PM, Torvald Riegel trie...@redhat.com wrote: On Sat, 2014-02-15 at 11:15 -0800, Linus Torvalds wrote: if (atomic_load(x, mo_relaxed) == 1) atomic_store(y, 3, mo_relaxed); No, please don't use this idiotic example. It is wrong. It won't be useful in practice in a lot of cases, but that doesn't mean it's wrong. It's clearly not illegal code. It also serves a purpose: a simple example to reason about a few aspects of the memory model. It's not illegal code, but if you claim that you can make that store unconditional, it's a pointless and wrong example. The fact is, if a compiler generates anything but the obvious sequence (read/cmp/branch/store - where branch/store might obviously be done with some other machine conditional like a predicate), the compiler is wrong. Why? I've reasoned why (1) to (3) above allow in certain cases (i.e., the first load always returning 1) for the branch (or other machine conditional) to not be emitted. So please either poke holes into this reasoning, or clarify that you don't in fact, contrary to what you wrote above, agree with (1) to (3). The thing is, the first load DOES NOT RETURN 1. It returns whatever that memory location contains. End of story. The memory location is just an abstraction for state, if it's not volatile. Stop claiming it can return 1.. It *never* returns 1 unless you do the load and *verify* it, or unless the load itself can be made to go away. And with the code sequence given, that just doesn't happen. END OF STORY. void foo() { atomic_int x = 1; if (atomic_load(x, mo_relaxed) == 1) atomic_store(y, 3, mo_relaxed); } This is a counter example to your claim, and yes, the compiler has proof that x is 1. It's deliberately simple, but I can replace this with other more advanced situations. 
For example, if x comes out of malloc (or, on the kernel side, something else that returns non-aliasing memory) and hasn't provably escaped to other threads yet. I haven't posted this full example, but I've *clearly* said that *if* the compiler can prove that the load would always return 1, it can remove it. And it's simple to see why that's the case: If this holds, then in all allowed executions it would load from a known store, the relaxed_mo gives no further ordering guarantees so we can just take the value, and we're good. So your argument is *shit*. Why do you continue to argue it? Maybe because it isn't? Maybe you should try to at least trust that my intentions are good, even if distrusting my ability to reason. I told you how that load can go away, and you agreed. But IT CANNOT GO AWAY any other way. You cannot claim the compiler knows. The compiler doesn't know. It's that simple. Oh yes it can. Because of the same rules that allow you to perform the other transformations. Please try to see the similarities here. You previously said you don't want to mix volatile semantics and atomics. This is something that's being applied in this example. So why do I say you are wrong, after I just gave you an example of how it happens? Because my example went back to the *real* issue, and there are actual real semantically meaningful details with doing things like load merging. To give an example, let's rewrite things a bit more to use an extra variable: atomic_store(x, 1, mo_relaxed); a = atomic_load(x, mo_relaxed); if (a == 1) atomic_store(y, 3, mo_relaxed); which looks exactly the same. I'm confused. Is this a new example? That is a new example. 
The important part is that it has left a trace for the programmer: because 'a' contains the value, the programmer can now look at the value later and say oh, we know we did a store iff a was 1 This sequence: atomic_store(x, 1, mo_relaxed); a = atomic_load(x, mo_relaxed); atomic_store(y, 3, mo_relaxed); is actually - and very seriously - buggy. Why? Because you have effectively split the atomic_load into two loads - one for the value of 'a', and one for your 'proof' that the store is unconditional. I can't follow that, because it isn't clear to me which code sequences are meant to belong together, and which transformations the compiler is supposed to make. If you would clarify that, then I can reply to this part. Basically, if the compiler allows the condition of "I wrote 3 to y, but the programmer sees 'a' has another value than 1 later" then the compiler is one buggy pile of shit. It fundamentally broke the whole concept of atomic accesses. Basically the atomic access to 'x' turned into two different accesses: the one that proved that x had the value 1 (and caused the value 3 to be written), and the other load that then writes that other value into 'a'. It's really not that complicated. Yes that's not complicated, but I assumed this to be
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, 2014-02-17 at 14:47 -0800, Linus Torvalds wrote: On Mon, Feb 17, 2014 at 2:25 PM, Torvald Riegel trie...@redhat.com wrote: On Mon, 2014-02-17 at 14:02 -0800, Linus Torvalds wrote: The argument was that an lvalue doesn't actually access the memory (an rvalue does), so this: volatile int *p = ...; *p; doesn't need to generate a load from memory, because *p is still an lvalue (since you could assign things to it). This isn't an issue in C, because in C, expression statements are always rvalues, but C++ changed that. Huhh. I can see the problems that this creates in terms of C/C++ compatibility. That's not the biggest problem. The biggest problem is that you have compiler writers that don't care about sane *use* of the features they write a compiler for, they just care about the standard. So they don't care about C vs C++ compatibility. Even more importantly, they don't care about the *user* that uses only C++ and the fact that their reading of the standard results in *meaningless* behavior. They point to the standard and say that's what the standard says, suck it, and silently generate code (or in this case, avoid generating code) that makes no sense. There's an underlying problem here that's independent from the actual instance that you're worried about here: "no sense" is ultimately a matter of taste/objectives/priorities as long as the respective specification is logically consistent. If you want to be independent of your sanity being different from other people's sanity (e.g., compiler writers), you need to make sure that the specification is precise and says what you want. IOW, think about the specification being the program, and the people being computers; you better want a well-defined program in this case. 
So it's not about C++ being incompatible with C, it's about C++ having insane and bad semantics unless you just admit that oh, ok, I need to not just read the standard, I also need to use my brain, and admit that a C++ statement expression needs to act as if it is an access wrt volatile variables. 1) I agree that (IMO) a good standard strives for being easy to understand. 2) In practice, there is a trade-off between "easy to understand" and actually producing a specification. A standard is not a tutorial. And that's for good reason, because (a) there might be more than one way to teach something and that should be allowed and (b) that the standard should carry the full precision but still be compact enough to be manageable. 3) Implementations can try to be nice to users by helping them avoiding error-prone corner cases or such. A warning for common problems is such a case. But an implementation has to draw a line somewhere, demarcating cases where it fully exploits what the standard says (e.g., to allow optimizations) from cases where it is more conservative and does what the standard allows but in a potentially more intuitive way. That's especially the case if it's being asked to produce high-performance code. 4) There will be arguments for where the line actually is, simply because different users will have different goals. 5) The way to reduce 4) is to either make the standard more specific, or to provide better user documentation. If the standard has strict requirements, then there will be less misunderstanding. 6) To achieve 5), one way is to get involved in the standards process.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 3:10 PM, Alec Teal a.t...@warwick.ac.uk wrote: You mean unambiguous - try reading a patent (Apple have 1000s of trivial ones, I tried reading one once thinking how could they have phrased it so this got approved, their technique was to make the reader want to start cutting themselves to prove they weren't numb to everything) Oh, I agree, patent language is worse. I'm not going to teach you what rvalues and lvalues are, but! I know what lvalues and rvalues are. I *understand* the thinking that goes on behind the let's not do the access, because it's not an rvalue, so there is no 'access' to the object. I understand it from a technical perspective. I don't understand the compiler writer that uses a *technicality* to argue against generating sane code that is obviously what the user actually asked for. See the difference? So start again, what is the serious problem, have you got any code that would let me replicate it, what is your version of GCC? The volatile problem is long fixed. The people who argued for the legalistically correct, but insane behavior lost (and as mentioned, I think C++11 actually fixed the legalistic reading too). I'm bringing it up because I've had too many cases where compiler writers pointed to the standard and said that is ambiguous or undefined, so we can do whatever the hell we want, regardless of whether that's sensible, or regardless of whether there is a sensible way to get the behavior you want or not. Oh and lastly! Optimisations are not as casual as oh, we could do this and it'd work better unlike kernel work or any other software that is being improved, it is very formal (and rightfully so) Alec, I know compilers. I don't do code generation (quite frankly, register allocation and instruction choice is when I give up), but I did actually write my own for static analysis, including turning things into SSA etc. No, I'm not a compiler person, but I actually do know enough that I understand what goes on.
And exactly because I know enough, I would *really* like atomics to be well-defined, and have very clear - and *local* - rules about how they can be combined and optimized. None of this if you can prove that the read has value X stuff. And things like value speculation should simply not be allowed, because that actually breaks the dependency chain that the CPU architects give guarantees for. Instead, make the rules be very clear, and very simple, like my suggestion: you can never remove a load because you can prove it has some value, but you can combine two consecutive atomic accesses. For example, CPU people actually do tend to give guarantees for certain things, like stores that are causally related being visible in a particular order. If the compiler starts doing value speculation on atomic accesses, you are quite possibly breaking things like that. It's just not a good idea. Don't do it. Write the standard so that it clearly is disallowed. Because you may think that a C standard is machine-independent, but that isn't really the case. The people who write code still write code for a particular machine. Our code works (in the general case) on different byte orderings, different register sizes, different memory ordering models. But in each *instance* we still end up actually coding for each machine. So the rules for atomics should be simple and *specific* enough that when you write code for a particular architecture, you can take the architecture memory ordering *and* the C atomics orderings into account, and do the right thing for that architecture. And that very much means that doing things like value speculation MUST NOT HAPPEN. See? Even if you can prove that your code is equivalent, it isn't. So for example, let's say that you have a pointer, and you have some reason to believe that the pointer has a particular value.
So you rewrite following the pointer from this:

   value = ptr->val;

into

   value = speculated->value;
   tmp = ptr;
   if (unlikely(tmp != speculated))
      value = tmp->value;

and maybe you can now make the critical code-path for the speculated case go faster (since now there is no data dependency for the speculated case, and the actual pointer chasing load is now no longer in the critical path), and you made things faster because your profiling showed that the speculated case was true 99% of the time. Wonderful, right? And clearly, the code provably does the same thing. EXCEPT THAT IS NOT TRUE AT ALL. It very much does not do the same thing at all, and by doing value speculation and proving something was true, the only thing you did was to make incorrect code run faster. Because now the causally related load of value from the pointer isn't actually causally related at all, and you broke the memory ordering. This is why I don't like it when I see Torvald talk about proving things. It's bullshit. You can prove pretty much anything, and in the process lose sight of the bigger issue, namely that there is code that
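Linus's rewrite can be put into compilable form to see why a compiler might consider it provably equivalent (all names here - node, read_speculated, guess - are hypothetical scaffolding, not from the mail): in a single thread both versions always return the same value, yet the speculated version has no data dependency from the pointer load to the value load, which is exactly the dependency that ARM/POWER ordering guarantees hinge on.

```c
struct node { int val; };

/* Original form: the value load carries a data dependency on ptr. */
int read_plain(struct node *ptr)
{
    return ptr->val;
}

/* Speculated rewrite sketched in the mail: use a guessed value with no
   dependency on ptr, then verify the guess and reload on a mismatch.
   Single-threaded this always returns ptr->val, but the dependency
   chain from the ptr load to the value is gone, so hardware ordering
   guarantees based on that dependency no longer apply.  */
int read_speculated(struct node *ptr, struct node *guess, int guessed_val)
{
    int value = guessed_val;        /* no data dependency on ptr */
    struct node *tmp = ptr;
    if (tmp != guess)               /* guess was wrong: do the real load */
        value = tmp->val;
    return value;
}
```

Both functions are observationally identical in isolation, which is the trap: the equivalence proof holds on the abstract machine while the inter-thread ordering quietly changes.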
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 3:17 PM, Torvald Riegel trie...@redhat.com wrote: On Mon, 2014-02-17 at 14:32 -0800, Stop claiming it can return 1.. It *never* returns 1 unless you do the load and *verify* it, or unless the load itself can be made to go away. And with the code sequence given, that just doesn't happen. END OF STORY.

   void foo()
   {
      atomic<int> x = 1;
      if (atomic_load(x, mo_relaxed) == 1)
         atomic_store(y, 3, mo_relaxed);
   }

This is the very example I gave, where the real issue is not that you prove that the load returns 1; you instead say a store followed by a load can be combined. I (in another email I just wrote) tried to show why the prove something is true approach is a very dangerous model. Seriously, it's pure crap. It's broken. If the C standard defines atomics in terms of provable equivalence, it's broken. Exactly because on a *virtual* machine you can prove things that are not actually true in a *real* machine. I have the example of value speculation changing the memory ordering model of the actual machine. See? Linus
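Torvald's snippet can be rendered in standard C11 <stdatomic.h> syntax (a sketch: the variable y and the relaxed orderings come from the mail, the rest is scaffolding). The point under debate: because x is function-local and is written and then immediately read with no possible intervening writer, a compiler may fuse the store/load pair and emit the store to y unconditionally - which, as Linus says, is a different justification than proving the load returns 1.

```c
#include <stdatomic.h>

atomic_int y;

/* C11 rendering of the example from the mail.  The load of x can only
   observe the store just above it (x never escapes this function), so
   the compiler may combine the store/load pair and make the branch
   statically true, storing 3 to y unconditionally.  */
void foo(void)
{
    atomic_int x;
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    if (atomic_load_explicit(&x, memory_order_relaxed) == 1)
        atomic_store_explicit(&y, 3, memory_order_relaxed);
}
```

Either way the observable result is the same here; the disagreement in the thread is purely about which reasoning rule licenses the transformation.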
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 3:41 PM, Torvald Riegel trie...@redhat.com wrote: There's an underlying problem here that's independent from the actual instance that you're worried about here: no sense is ultimately a matter of taste/objectives/priorities as long as the respective specification is logically consistent. Yes. But I don't think it's independent. Exactly *because* some people will read standards without asking does the resulting code generation actually make sense for the programmer that wrote the code, the standard has to be pretty clear. The standard often *isn't* pretty clear. It wasn't clear enough when it came to volatile, and yet that was a *much* simpler concept than atomic accesses and memory ordering. And most of the time it's not a big deal. But because the C standard generally tries to be very portable, and cover different machines, there tends to be a mindset that anything inherently unportable is undefined or implementation defined, and then the compiler writer is basically given free rein to do anything they want (with implementation defined at least requiring that it is reliably the same thing). And when it comes to memory ordering, *everything* is basically non-portable, because different CPU's very much have different rules. I worry that that means that the standard then takes the stance that well, compiler re-ordering is no worse than CPU re-ordering, so we let the compiler do anything. And then we have to either add volatile to make sure the compiler doesn't do that, or use an overly strict memory model at the compiler level that makes it all pointless. So I really really hope that the standard doesn't give compiler writers free hands to do anything that they can prove is equivalent in the virtual C machine model. That's not how you get reliable results. Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 04:18:52PM -0800, Linus Torvalds wrote: On Mon, Feb 17, 2014 at 3:41 PM, Torvald Riegel trie...@redhat.com wrote: There's an underlying problem here that's independent from the actual instance that you're worried about here: no sense is ultimately a matter of taste/objectives/priorities as long as the respective specification is logically consistent. Yes. But I don't think it's independent. Exactly *because* some people will read standards without asking does the resulting code generation actually make sense for the programmer that wrote the code, the standard has to be pretty clear. The standard often *isn't* pretty clear. It wasn't clear enough when it came to volatile, and yet that was a *much* simpler concept than atomic accesses and memory ordering. And most of the time it's not a big deal. But because the C standard generally tries to be very portable, and cover different machines, there tends to be a mindset that anything inherently unportable is undefined or implementation defined, and then the compiler writer is basically given free rein to do anything they want (with implementation defined at least requiring that it is reliably the same thing). And when it comes to memory ordering, *everything* is basically non-portable, because different CPU's very much have different rules. I worry that that means that the standard then takes the stance that well, compiler re-ordering is no worse than CPU re-ordering, so we let the compiler do anything. And then we have to either add volatile to make sure the compiler doesn't do that, or use an overly strict memory model at the compiler level that makes it all pointless. For whatever it is worth, this line of reasoning has been one reason why I have been objecting strenuously every time someone on the committee suggests eliminating volatile from the standard.
Thanx, Paul So I really really hope that the standard doesn't give compiler writers free hands to do anything that they can prove is equivalent in the virtual C machine model. That's not how you get reliable results. Linus
RE: Vectorizer Pragmas
The way Intel present #pragma simd (to users, to the OpenMP committee, to the C and C++ committees, etc.) is that it is not a hint: it has a meaning. The meaning is defined in terms of evaluation order. Both C and C++ define an evaluation order for sequential programs. #pragma simd relaxes the sequential order into a partial order:

0. subsequent iterations of the loop are chunked together and execute in lockstep
1. there is no change in the order of evaluation of expressions within an iteration
2. if X and Y are expressions in the loop, and X(i) is the evaluation of X in iteration i, then for X sequenced before Y and iteration i evaluated before iteration j, X(i) is sequenced before Y(j).

A corollary is that the sequential order is always allowed, since it satisfies the partial order. However, the partial order allows the compiler to group copies of the same expression next to each other, and then to combine the scalar instructions into a vector instruction. There are other corollaries, such as that if multiple loop iterations write into an object defined outside of the loop then it has to be undefined behavior, the vector moral equivalent of a data race. That is why induction variables and reductions are necessary exceptions to this rule and require explicit support. As far as correctness goes, by this definition the programmer has expressed that the loop is correct, and the compiler should not try to prove correctness. On the performance heuristics side, the Intel compiler tries not to second-guess the user. There are users who work much harder than just adding a #pragma simd to unmodified sequential loops. There are various changes that may be necessary, and users who worked hard to get their loops into good shape are unhappy if the compiler does second-guess them. Robert.
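A minimal sketch of the contract Robert describes (the function and data here are hypothetical, not from the mail): the reduction clause names sum as the explicit, supported exception to the rule against multiple iterations writing one object defined outside the loop. A compiler without simd-pragma support simply ignores the pragma, and since the sequential order is always a legal schedule under the partial order, the sequential result is what we can check.

```c
/* Hypothetical loop annotated with the Intel-style simd pragma.
   The reduction clause declares `sum` as the sanctioned exception to
   the no-cross-iteration-writes rule; everything else in the body is
   independent across iterations, so chunked lockstep execution is a
   valid schedule.  */
int sum_squares(const int *a, int n)
{
    int sum = 0;
#pragma simd reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += a[i] * a[i];     /* X(i): no dependence on other iterations */
    return sum;
}
```

Without the reduction clause, the repeated writes to sum from different chunked iterations would be exactly the "vector moral equivalent of a data race" the mail mentions.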
-Original Message- From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of Renato Golin Sent: Monday, February 17, 2014 7:14 AM To: tpri...@computer.org Cc: gcc Subject: Re: Vectorizer Pragmas On 17 February 2014 14:47, Tim Prince n...@aol.com wrote: I'm continuing discussions with former Intel colleagues. If you are asking for insight into how Intel priorities vary over time, I don't expect much, unless the next beta compiler provides some inferences. They have talked about implementing all of OpenMP 4.0 except user defined reduction this year. That would imply more activity in that area than on cilkplus, I'm expecting this. Any proposal to support Cilk in LLVM would be purely temporary and not endorsed in any way. although some fixes have come in the latter. On the other hand I had an issue on omp simd reduction(max: ) closed with the decision will not be fixed. We still haven't got pragmas for induction/reduction logic, so I'm not too worried about them. I have an icc problem report in on fixing omp simd safelen so it is more like the standard and less like the obsolete pragma simd vectorlength. Our width metadata is slightly different in that it means try to use that length rather than it's safe to use that length, which is why I'm holding off on using safelen for the moment. Also, I have some problem reports active attempting to get clarification of their omp target implementation. Same here... RTFM is not enough in this case. ;) You may have noticed that omp parallel for simd in current Intel compilers can be used for combined thread and simd parallelism, including the case where the outer loop is parallelizable and vectorizable but the inner one is not. That's my fear of going with omp simd directly. I don't want to be throwing threads all over the place when all I really want is vector code.
For the time being, my proposal is to use the legacy pragmas: vector/novector, unroll/nounroll and simd vectorlength, which map nicely to the metadata we already have and don't incur the OpenMP overhead. Later on, if OpenMP ends up with simple non-threaded pragmas, we should use those and deprecate the legacy ones. If GCC is trying to do the same thing regarding non-threaded-vector code, I'd be glad to be involved in the discussion. Some LLVM folks think this should be an OpenMP discussion; I personally think it's pushing the boundaries a bit too much on an inherently threaded library extension. cheers, --renato
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 12:18:21PM -0800, Linus Torvalds wrote: On Mon, Feb 17, 2014 at 11:55 AM, Torvald Riegel trie...@redhat.com wrote: Which example do you have in mind here? Haven't we resolved all the debated examples, or did I miss any? Well, Paul seems to still think that the standard possibly allows speculative writes or possibly value speculation in ways that break the hardware-guaranteed orderings. It is not that I know of any specific problems, but rather that I know I haven't looked under all the rocks. Plus my impression from my few years on the committee is that the standard will be pushed to the limit when it comes time to add optimizations. One example that I learned about last week uses the branch-prediction hardware to validate value speculation. And no, I am not at all a fan of value speculation, in case you were curious. However, it is still an educational example. This is where you start:

   p = gp.load_explicit(memory_order_consume); /* AKA rcu_dereference() */
   do_something(p->a, p->b, p->c);
   p->d = 1;

Then you leverage branch-prediction hardware as follows:

   p = gp.load_explicit(memory_order_consume); /* AKA rcu_dereference() */
   if (p == GUESS) {
      do_something(GUESS->a, GUESS->b, GUESS->c);
      GUESS->d = 1;
   } else {
      do_something(p->a, p->b, p->c);
      p->d = 1;
   }

The CPU's branch-prediction hardware squashes speculation in the case where the guess was wrong, and this prevents the speculative store to ->d from ever being visible. However, the then-clause breaks dependencies, which means that the loads -could- be speculated, so that do_something() gets passed pre-initialization values. Now, I hope and expect that the wording in the standard about dependency ordering prohibits this sort of thing. But I do not yet know for certain. And yes, I am being paranoid. But not unnecessarily paranoid. ;-) Thanx, Paul And personally, I can't read standards paperwork.
It is invariably written in some basically impossible-to-understand lawyeristic mode, and then it is read by people (compiler writers) that intentionally try to mis-use the words and do language-lawyering (that depends on what the meaning of 'is' is). The whole lvalue vs rvalue expression vs 'what is a volatile access' thing for C++ was/is a great example of that. So quite frankly, as a result I refuse to have anything to do with the process directly. Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 7:00 PM, Paul E. McKenney paul...@linux.vnet.ibm.com wrote: One example that I learned about last week uses the branch-prediction hardware to validate value speculation. And no, I am not at all a fan of value speculation, in case you were curious. Heh. See the example I used in my reply to Alec Teal. It basically broke the same dependency the same way. Yes, value speculation of reads is simply wrong, the same way speculative writes are simply wrong. The dependency chain matters, and is meaningful, and breaking it is actively bad. As far as I can tell, the intent is that you can't do value speculation (except perhaps for the relaxed, which quite frankly sounds largely useless). But then I do get very very nervous when people talk about proving certain values. Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 7:24 PM, Linus Torvalds torva...@linux-foundation.org wrote: As far as I can tell, the intent is that you can't do value speculation (except perhaps for the relaxed, which quite frankly sounds largely useless). Hmm. The language I see for consume is not obvious: Consume operation: no reads in the current thread dependent on the value currently loaded can be reordered before this load and it could make a compiler writer say that value speculation is still valid, if you do it like this (with ptr being the atomic variable):

   value = ptr->val;

into

   tmp = ptr;
   value = speculated.value;
   if (unlikely(tmp != speculated))
      value = tmp->value;

which is still bogus. The load of ptr does happen before the load of value = speculated->value in the instruction stream, but it would still result in the CPU possibly moving the value read before the pointer read at least on ARM and power. So if you're a compiler person, you think you followed the letter of the spec - as far as *you* were concerned, no load dependent on the value of the atomic load moved to before the atomic load. You go home, happy, knowing you've done your job. Never mind that you generated code that doesn't actually work. I dread having to explain to the compiler person that he may be right in some theoretical virtual machine, but the code is subtly broken and nobody will ever understand why (and likely not be able to create a test-case showing the breakage). But maybe the full standard makes it clear that reordered before this load actually means on the real hardware, not just in the generated instruction stream. Reading it with understanding of the *intent* and understanding all the different memory models that requirement should be obvious (on alpha, you need an rmb instruction after the load), but ... Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 07:24:56PM -0800, Linus Torvalds wrote: On Mon, Feb 17, 2014 at 7:00 PM, Paul E. McKenney paul...@linux.vnet.ibm.com wrote: One example that I learned about last week uses the branch-prediction hardware to validate value speculation. And no, I am not at all a fan of value speculation, in case you were curious. Heh. See the example I used in my reply to Alec Teal. It basically broke the same dependency the same way. ;-) Yes, value speculation of reads is simply wrong, the same way speculative writes are simply wrong. The dependency chain matters, and is meaningful, and breaking it is actively bad. As far as I can tell, the intent is that you can't do value speculation (except perhaps for the relaxed, which quite frankly sounds largely useless). But then I do get very very nervous when people talk about proving certain values. That was certainly my intent, but as you might have noticed in the discussion earlier in this thread, the intent can get lost pretty quickly. ;-) The HPC guys appear to be the most interested in breaking dependencies. Their software doesn't rely on dependencies, and from their viewpoint anything that has any chance of leaving an FP unit of any type idle is a very bad thing. But there are probably other benchmarks for which breaking dependencies gives a few percent performance boost. Thanx, Paul
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 07:42:42PM -0800, Linus Torvalds wrote: On Mon, Feb 17, 2014 at 7:24 PM, Linus Torvalds torva...@linux-foundation.org wrote: As far as I can tell, the intent is that you can't do value speculation (except perhaps for the relaxed, which quite frankly sounds largely useless). Hmm. The language I see for consume is not obvious: Consume operation: no reads in the current thread dependent on the value currently loaded can be reordered before this load and it could make a compiler writer say that value speculation is still valid, if you do it like this (with ptr being the atomic variable):

   value = ptr->val;

into

   tmp = ptr;
   value = speculated.value;
   if (unlikely(tmp != speculated))
      value = tmp->value;

which is still bogus. The load of ptr does happen before the load of value = speculated->value in the instruction stream, but it would still result in the CPU possibly moving the value read before the pointer read at least on ARM and power. So if you're a compiler person, you think you followed the letter of the spec - as far as *you* were concerned, no load dependent on the value of the atomic load moved to before the atomic load. You go home, happy, knowing you've done your job. Never mind that you generated code that doesn't actually work. Agreed, that would be bad. But please see below. I dread having to explain to the compiler person that he may be right in some theoretical virtual machine, but the code is subtly broken and nobody will ever understand why (and likely not be able to create a test-case showing the breakage). If things go as they usually do, such explanations will be required a time or two. But maybe the full standard makes it clear that reordered before this load actually means on the real hardware, not just in the generated instruction stream.
Reading it with understanding of the *intent* and understanding all the different memory models that requirement should be obvious (on alpha, you need an rmb instruction after the load), but ... The key point with memory_order_consume is that it must be paired with some sort of store-release, a category that includes stores tagged with memory_order_release (surprise!), memory_order_acq_rel, and memory_order_seq_cst. This pairing is analogous to the memory-barrier pairing in the Linux kernel. So you have something like this for the rcu_assign_pointer() side:

   p = kmalloc(...);
   if (unlikely(!p))
      return -ENOMEM;
   p->a = 1;
   p->b = 2;
   p->c = 3;
   /* The following would be buried within rcu_assign_pointer(). */
   atomic_store_explicit(gp, p, memory_order_release);

And something like this for the rcu_dereference() side:

   /* The following would be buried within rcu_dereference(). */
   q = atomic_load_explicit(gp, memory_order_consume);
   do_something_with(q->a);

So, let's look at the C11 draft, section 5.1.2.4 Multi-threaded executions and data races. 5.1.2.4p14 says that the atomic_load_explicit() carries a dependency to the argument of do_something_with(). 5.1.2.4p15 says that the atomic_store_explicit() is dependency-ordered before the atomic_load_explicit(). 5.1.2.4p15 also says that the atomic_store_explicit() is dependency-ordered before the argument of do_something_with(). This is because if A is dependency-ordered before X and X carries a dependency to B, then A is dependency-ordered before B. 5.1.2.4p16 says that the atomic_store_explicit() inter-thread happens before the argument of do_something_with(). The assignment to p->a is sequenced before the atomic_store_explicit(). Therefore, combining these last two, the assignment to p->a happens before the argument of do_something_with(), and that means that do_something_with() had better see the 1 assigned to p->a or some later value.
But as far as I know, compiler writers currently take the approach of treating memory_order_consume as if it was memory_order_acquire. Which certainly works, as long as ARM and PowerPC people don't mind an extra memory barrier out of each rcu_dereference(). Which is one thing that compiler writers are permitted to do according to the standard -- substitute a memory-barrier instruction for any given dependency... Thanx, Paul
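As a concrete sketch of the pairing Paul describes, here is the publish/consume pattern in portable C11 <stdatomic.h> form (struct obj, publish and reader are hypothetical names; kmalloc is replaced by malloc, and everything runs single-threaded here so the assertions are trivially safe - in real use the two sides run on different threads, and most current compilers would compile the consume load as an acquire load anyway):

```c
#include <stdatomic.h>
#include <stdlib.h>

struct obj { int a, b, c; };

static _Atomic(struct obj *) gp;

/* Publisher side: fully initialize the object, then release-store the
   pointer - the heart of what rcu_assign_pointer() does.  */
static int publish(void)
{
    struct obj *p = malloc(sizeof *p);
    if (!p)
        return -1;
    p->a = 1;
    p->b = 2;
    p->c = 3;
    atomic_store_explicit(&gp, p, memory_order_release);
    return 0;
}

/* Reader side: consume-load the pointer - the heart of
   rcu_dereference().  Dependency ordering (5.1.2.4p14-16) is what
   entitles the reader to see p->a == 1 through the returned pointer.  */
static struct obj *reader(void)
{
    return atomic_load_explicit(&gp, memory_order_consume);
}
```

The release/consume pair is the C11 analogue of the kernel's smp_store_release()/rcu_dereference() barrier pairing.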
Help Required on Missing GOTO statements in Gimple/SSA/CFG Pass ...
Hi, I am developing plugins for GCC 4.8.2. I am a newbie with plugins. I wrote a plugin and tried to count and see the goto statements using the gimple_stmt_iterator. I get gimple statements printed on my stdout, but I am not able to find the line which has goto statements. I only get other lines such as variable declarations and logic statements, but no goto statements. When I open the Gimple/SSA/CFG dump file separately using the vim editor I find the goto statements are actually present. So, can anyone help me? How can I actually get the count of goto statements, or at least access these goto statements using some iterator? I have used -fdump-tree-all and -fdump-tree-cfg as flags. Here is the pseudocode:

   struct register_pass_info pass_info = {
     &(pass_plugin.pass),   /* Address of new pass, here, the 'struct opt_pass'
                               field of 'gimple_opt_pass' defined above.  */
     "ssa",                 /* Name of the reference pass for hooking up
                               the new pass.  */
     0,                     /* Insert the pass at the specified instance number
                               of the reference pass.  Do it for every
                               instance if it is 0.  */
     PASS_POS_INSERT_AFTER  /* How to insert the new pass: before, after,
                               or replace.  Here we insert it after 'ssa'.  */
   };
   ...
   static unsigned int
   dead_code_elimination (void)
   {
     FOR_EACH_BB_FN (bb, cfun)
       {
         /* gimple_dump_bb (stdout, bb, 0, 0); */
         gsi2 = gsi_after_labels (bb);
         print_gimple_stmt (stdout, gsi_stmt (gsi2), 0, 0);
         /* Iterate over each gimple statement in the basic block.  */
         for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
           {
             g = gsi_stmt (gsi);
             print_gimple_stmt (stdout, g, 0, 0);
             if (gimple_code (g) == GIMPLE_GOTO)
               printf ("\nFound GOTO stmt\n");
           }
       }
   }
Re: Help Required on Missing GOTO statements in Gimple/SSA/CFG Pass ...
On Tue, 2014-02-18 at 11:17 +0530, Mohsin Khan wrote: Hi, I am developing plugins for the GCC-4.8.2. I am a newbie in plugins. I wrote a plugin and tried to count and see the Goto Statements using the gimple_stmt_iterator. I get gimple statements printed on my stdout, but I am not able to find the line which has goto statements. I guess that most GOTOs are just becoming implicit as the link to the next basic block. Probably

   if (!cond) goto end;
   something;
   end:;

has nearly the same Gimple representation as

   while (cond) { something; }

BTW, did you consider using MELT http://gcc-melt.org/ to code your GCC extension? -- Basile STARYNKEVITCH http://starynkevitch.net/Basile/ email: basileatstarynkevitchdotnet mobile: +33 6 8501 2359 8, rue de la Faiencerie, 92340 Bourg La Reine, France *** opinions {are only mine, sont seulement les miennes} ***
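Basile's point - that source-level gotos dissolve into basic-block edges by the time a post-ssa pass runs, so no GIMPLE_GOTO statements remain to count - can be illustrated with a hypothetical pair of functions (names and the counting loop are mine, not from the thread) that express identical control flow, one with explicit gotos and one without; both lower to the same CFG, where the branches are edges rather than goto statements:

```c
/* Explicit-goto form of a counting loop.  After CFG construction the
   gotos become edges between basic blocks, not GIMPLE_GOTO statements.  */
int count_goto(int n)
{
    int i = 0;
again:
    if (!(i < n))
        goto done;
    i++;
    goto again;
done:
    return i;
}

/* Structured form: identical control flow, identical CFG.  */
int count_while(int n)
{
    int i = 0;
    while (i < n)
        i++;
    return i;
}
```

This is why a plugin hooked after the ssa pass sees GIMPLE_COND statements and block edges where the dump files still print goto as a notation for those edges.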
[Bug c/13029] [3.4 Regression] static consts and -Wunused-variable
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13029 Chengnian Sun chengniansun at gmail dot com changed: What|Removed |Added CC||chengniansun at gmail dot com --- Comment #4 from Chengnian Sun chengniansun at gmail dot com --- May I ask what is the design rationale for not warning about unused static const variables? I saw Clang has a different strategy, and it even has a dedicated warning type for it -- [-Wunused-const-variable]
[Bug middle-end/60235] Inlining fails with template specialization and -fPIC on Linux AMD64
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60235 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added CC||hubicka at gcc dot gnu.org, ||jakub at gcc dot gnu.org --- Comment #2 from Jakub Jelinek jakub at gcc dot gnu.org --- The specialization is a regular function, not comdat, thus it is not appropriate to inline it at -O2 -fpic; only -O3 inlines functions regardless of whether they could be interposed, and -O2 without -fpic inlines it because then the symbol can't be interposed. Or use the inline keyword for the specialization.
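A minimal sketch of the situation Jakub describes (the templates f and g and their bodies are hypothetical, not the bug's test case): a non-inline explicit specialization is an ordinary external symbol, so under -O2 -fpic GCC must assume it may be interposed at link time and declines to inline calls to it; marking the specialization inline makes it comdat and removes that barrier.

```cpp
// Primary template: implicitly inline-capable, comdat.
template <typename T> int f(T) { return 0; }

// Non-inline explicit specialization: a regular external function.
// With -O2 -fpic it can be interposed by another DSO, so GCC will not
// inline calls to it (only -O3 inlines regardless of interposition).
template <> int f<int>(int x) { return x + 1; }

template <typename T> int g(T) { return 0; }

// The suggested fix: an inline explicit specialization is emitted as a
// comdat symbol, interposition is off the table, and inlining at
// -O2 -fpic works again.
template <> inline int g<int>(int x) { return x + 1; }
```

Semantically both specializations behave identically; the inline keyword only changes linkage, which is what the inliner's interposition check keys off.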
[Bug fortran/60191] test case gfortran.dg/dynamic_dispatch_1/3.f03 fail on ARMv7
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60191 --- Comment #8 from Bernd Edlinger bernd.edlinger at hotmail dot de --- (In reply to janus from comment #5) (In reply to Bernd Edlinger from comment #3) The function make_real is not invoked directly, but through the type-bound a%real, which is called three times in the test case. Does the failure occur already at the first one (i.e. line 67)? Can you give a reduced test case? Yes it is in line 67. Ok, then I guess the following reduction should be enough to trigger the bug?

   module m
     type :: t1
       integer :: i = 42
     contains
       procedure, pass :: real => make_real
     end type
   contains
     real function make_real (arg)
       class(t1), intent(in) :: arg
       make_real = real (arg%i)
     end function make_real
   end module m

   use m
   class(t1), pointer :: a
   type(t1), target :: b
   a => b
   if (a%real() .ne. real (42)) call abort
   end

The crash occurs if I add the line

   procedure(make_real), pointer :: ptr

to type t1. Additionally you could try if calling 'make_real' directly (without the type-binding) works, i.e. replace the last line by:

   if (make_real(a) .ne. real (42)) call abort

This line does not abort. The type-bound call is transformed into a procedure-pointer-component call, i.e. a._vptr->real (a). Do all the proc_ptr_comp_* test cases work on ARMv7? Yes, the test cases that failed with the last snapshot are: FAIL: gfortran.dg/dynamic_dispatch_1.f03 -O0 execution test FAIL: gfortran.dg/dynamic_dispatch_3.f03 -O0 execution test FAIL: gfortran.dg/select_type_4.f90 -O2 execution test This one might possibly be related. It also involves polymorphism (but no type-bound procedures). I think dynamic_dispatch_3.f03 duplicates this one. But I am not sure about select_type_4.f90:

   $ gfortran -O1 -g select_type_4.f90 -o select_type_4
   $ ./select_type_4
   1.2302
   42
   Node with no data.
   Some other node type.
   4.5594
   $ gfortran -O2 -g select_type_4.f90 -o select_type_4
   $ ./select_type_4
   1.2302
   42

   Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

   Backtrace for this error: Segmentation fault

   Program received signal SIGSEGV, Segmentation fault.
   MAIN__ () at select_type_4.f90:166
   166   if (cnt /= 4) call abort()

but this statement is executed for the third time when the crash happens:

   Breakpoint 2, MAIN__ () at select_type_4.f90:166
   166   if (cnt /= 4) call abort()
   2: /x $r2 = 0x8db4
   1: x/i $pc = 0x8b14 MAIN__+608: ldr r3, [r2]
   (gdb) c
   Continuing.
   1.2302

   Breakpoint 2, MAIN__ () at select_type_4.f90:166
   166   if (cnt /= 4) call abort()
   2: /x $r2 = 0x8dcc
   1: x/i $pc = 0x8b14 MAIN__+608: ldr r3, [r2]
   (gdb) c
   Continuing.
   42

   Breakpoint 2, MAIN__ () at select_type_4.f90:166
   166   if (cnt /= 4) call abort()
   2: /x $r2 = 0xf15aea17
   1: x/i $pc = 0x8b14 MAIN__+608: ldr r3, [r2]

this looks like some loop optimization problem.
[Bug c/13029] [3.4 Regression] static consts and -Wunused-variable
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13029 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #5 from Jakub Jelinek jakub at gcc dot gnu.org --- Well, clang's strategy seems to be not to bother with false positives and always prefer warning over not warning on anything, so usually the clang output is just completely unreadable because among the tons of false positives it is hard to find actual real code problems. GCC's strategy is to find some balance between false positive warnings and missed warnings.
[Bug c/13029] [3.4 Regression] static consts and -Wunused-variable
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13029 --- Comment #6 from Mikael Pettersson mikpelinux at gmail dot com --- (In reply to Chengnian Sun from comment #4) May I ask what is the design rationale for not warning about unused static const variables? See PR28901. There are cases of unused static const where the warning isn't wanted, and so far the decision has been to favour those over the cases where the warning _is_ wanted and would have detected real bugs. Sigh.
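The tension in PR28901 can be seen in a small C sketch (the names are illustrative, not from the PR): a file-scope `static const` can be a legitimately unused named constant in one translation unit while being genuinely dead in another, so warning unconditionally produces false positives.

```c
#include <assert.h>

/* The kind of declaration PR28901 wants to keep quiet: a named
   constant that some translation units including the shared "header"
   never read.  Warning about it there is a false positive from the
   user's point of view. */
static const int k_max_retries = 3;

/* A use in this TU; a TU without such a use would trigger
   -Wunused-variable if static consts were warned about. */
int retries_left(int used)
{
    return k_max_retries - used;
}
```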
[Bug driver/60233] AVX instructions emitted with -march=native on host without AVX support
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60233 --- Comment #5 from Uroš Bizjak ubizjak at gmail dot com --- (In reply to Jakub Jelinek from comment #4) I think the reason for this is that -march=native passes in your case -mf16c, and -mf16c implies -mavx. So, either OPTION_MASK_ISA_F16C_SET should not include OPTION_MASK_ISA_AVX_SET, or the driver shouldn't set -mf16c if AVX support is missing. As at least some of the F16C instructions use ymmN registers, if we'd change OPTION_MASK_ISA_F16C_SET, then the *256 TARGET_F16C patterns would also need to be guarded with TARGET_AVX. For the latter alternative, we would need to do something like: --- gcc/config/i386/driver-i386.c 2014-01-03 11:41:06.393269411 +0100 +++ gcc/config/i386/driver-i386.c 2014-02-17 07:32:41.289022308 +0100 @@ -513,6 +513,7 @@ const char *host_detect_local_cpu (int a has_avx2 = 0; has_fma = 0; has_fma4 = 0; + has_f16c = 0; has_xop = 0; has_xsave = 0; has_xsaveopt = 0; This is the correct approach. We already disable f16c for -mno-avx in common/config/i386/i386-common.c in this way, and it looks that driver-i386.c was not updated accordingly. There are no real processors with F16C and no AVX, but we should be consistent here and follow i386-common.c.
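For completeness, application code (as opposed to the driver) can make the same kind of host check at run time with GCC's `__builtin_cpu_supports` builtin; this is a sketch assuming an x86 target (the `-1` fallback for other targets is our own convention, not a GCC API, and the set of accepted feature names varies with compiler version).

```c
/* Run-time ISA checks mirroring what -march=native probes at build
   time.  __builtin_cpu_supports is a documented GCC builtin on x86;
   on other targets this sketch just reports -1 for "unknown". */
static int host_has_avx(void)
{
#if defined(__x86_64__) || defined(__i386__)
    return __builtin_cpu_supports("avx") ? 1 : 0;
#else
    return -1;
#endif
}

static int host_has_avx2(void)
{
#if defined(__x86_64__) || defined(__i386__)
    return __builtin_cpu_supports("avx2") ? 1 : 0;
#else
    return -1;
#endif
}
```

On real hardware AVX2 implies AVX, just as F16C does, which is why clearing has_f16c when AVX is absent is mainly a consistency fix.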
[Bug driver/60233] AVX instructions emitted with -march=native on host without AVX support
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60233 --- Comment #6 from Uroš Bizjak ubizjak at gmail dot com --- And while looking at driver-i386.c, it looks to me that the whole osxsave state check should be moved below (ext_level 0x8000) processing, otherwise we won't clear FMA4 and XOP flags correctly.
[Bug tree-optimization/60229] wrong code at -O2 and -O3 on x86_64-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60229 --- Comment #3 from Mikael Pettersson mikpelinux at gmail dot com --- Technically there is an overflow there. But GCC defines conversion to a smaller signed integer type, when the value cannot be represented in that smaller type, as a non-signalling truncation. Still, portable code mustn't rely on that.
[Bug fortran/60231] [4.8/4.9 Regression] ICE on undefined generic
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60231 janus at gcc dot gnu.org changed: What|Removed |Added Keywords||ice-on-invalid-code Status|UNCONFIRMED |NEW Last reconfirmed||2014-02-17 CC||janus at gcc dot gnu.org Summary|ICE on undefined generic|[4.8/4.9 Regression] ICE on undefined generic Ever confirmed|0 |1 --- Comment #1 from janus at gcc dot gnu.org --- Confirmed. The ICE occurs with 4.8 and trunk, but 4.7 gives the following:

c0.f90:7.19:
    generic :: Add => Add1, Add2
                   1
Error: 'add1' and 'add2' for GENERIC 'add' at (1) are ambiguous

c0.f90:5.12:
    procedure :: Add1
            1
Error: 'add1' must be a module procedure or an external procedure with an explicit interface at (1)

c0.f90:6.12:
    procedure :: Add2
            1
Error: 'add2' must be a module procedure or an external procedure with an explicit interface at (1)

About the first error one can argue, but the second and third ones are certainly correct. Thus the ICE is a regression.
[Bug c/13029] [3.4 Regression] static consts and -Wunused-variable
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13029 --- Comment #7 from Chengnian Sun chengniansun at gmail dot com --- Thanks, Jakub and Mikael. I see it now. IMHO, it might be worthwhile to add a flag -Wunused-const-variable similar to Clang's, not included in either -Wall or -Wextra. Then the end user can decide whether to enable this warning based on their specific scenario. I think that is better than the current situation, where people who need this warning cannot get it.
[Bug c/13029] [3.4 Regression] static consts and -Wunused-variable
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13029 --- Comment #8 from Jakub Jelinek jakub at gcc dot gnu.org --- (In reply to Chengnian Sun from comment #7) Thanks, Jakub and Mikael. I see it now. IMHO, it might be worthwhile to add a flag -Wunused-const-variable similar to Clang's, not included in either -Wall or -Wextra. Then the end user can decide whether to enable this warning based on their specific scenario. I think that is better than the current situation, where people who need this warning cannot get it. Yeah, I guess that is a possibility.
[Bug fortran/60231] [4.8/4.9 Regression] ICE on undefined generic
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60231 janus at gcc dot gnu.org changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |janus at gcc dot gnu.org --- Comment #2 from janus at gcc dot gnu.org --- This draft patch fixes the ICE:

Index: gcc/fortran/resolve.c
===================================================================
--- gcc/fortran/resolve.c    (revision 207804)
+++ gcc/fortran/resolve.c    (working copy)
@@ -11362,6 +11362,7 @@ check_generic_tbp_ambiguity (gfc_tbp_generic* t1,
 {
   gfc_symbol *sym1, *sym2;
   const char *pass1, *pass2;
+  gfc_formal_arglist *dummy_args;
 
   gcc_assert (t1->specific && t2->specific);
   gcc_assert (!t1->specific->is_generic);
@@ -11384,19 +11385,33 @@ check_generic_tbp_ambiguity (gfc_tbp_generic* t1,
       return false;
     }
 
-  /* Compare the interfaces.  */
+  /* Determine PASS arguments.  */
   if (t1->specific->nopass)
     pass1 = NULL;
   else if (t1->specific->pass_arg)
     pass1 = t1->specific->pass_arg;
   else
-    pass1 = gfc_sym_get_dummy_args (t1->specific->u.specific->n.sym)->sym->name;
+    {
+      dummy_args = gfc_sym_get_dummy_args (t1->specific->u.specific->n.sym);
+      if (dummy_args)
+        pass1 = dummy_args->sym->name;
+      else
+        pass1 = NULL;
+    }
   if (t2->specific->nopass)
     pass2 = NULL;
   else if (t2->specific->pass_arg)
     pass2 = t2->specific->pass_arg;
   else
-    pass2 = gfc_sym_get_dummy_args (t2->specific->u.specific->n.sym)->sym->name;
+    {
+      dummy_args = gfc_sym_get_dummy_args (t2->specific->u.specific->n.sym);
+      if (dummy_args)
+        pass2 = dummy_args->sym->name;
+      else
+        pass2 = NULL;
+    }
+
+  /* Compare the interfaces.  */
   if (gfc_compare_interfaces (sym1, sym2, sym2->name, !t1->is_operator, 0,
                               NULL, 0, pass1, pass2))
     {
[Bug tree-optimization/60236] New: gfortran.dg/vect/pr32380.f fails on ARM
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60236 Bug ID: 60236 Summary: gfortran.dg/vect/pr32380.f fails on ARM Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: bernd.edlinger at hotmail dot de Hi, this test case fails because only 5 of the 6 loops get vectorized: pr32380.f:162:0: note: function is not vectorizable. pr32380.f:162:0: note: not vectorized: relevant stmt not supported: _113 = __builtin_sqrtf (_112); pr32380.f:5:0: note: vectorized 5 loops in function. but the test case expects 6 loops here. gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/home/ed/gnu/arm-linux-gnueabihf/libexec/gcc/armv7l-unknown-linux-gnueabihf/4.9.0/lto-wrapper Target: armv7l-unknown-linux-gnueabihf Configured with: ../gcc-4.9-20140209/configure --prefix=/home/ed/gnu/arm-linux-gnueabihf --enable-languages=c,c++,objc,obj-c++,fortran,ada,go --with-arch=armv7-a --with-tune=cortex-a9 --with-fpu=vfpv3-d16 --with-float=hard Thread model: posix gcc version 4.9.0 20140209 (experimental) (GCC)
[Bug c++/60215] [4.9 Regression] ICE with invalid bit-field size
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60215 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2014-02-17 CC||jakub at gcc dot gnu.org, ||paolo at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #1 from Jakub Jelinek jakub at gcc dot gnu.org --- Started with r205449.
[Bug fortran/60232] [OOP] The rank of the element in the structure constructor does not match that of the component
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60232 janus at gcc dot gnu.org changed: What|Removed |Added Keywords||rejects-valid Status|UNCONFIRMED |NEW Last reconfirmed||2014-02-17 CC||janus at gcc dot gnu.org Summary|OOP False Error: The rank of the element in the structure constructor|[OOP] The rank of the element in the structure constructor does not match that of the component Ever confirmed|0 |1 --- Comment #1 from janus at gcc dot gnu.org --- Reduced test case:

module ObjectLists
  implicit none
  Type TObjectList
  contains
    procedure :: ArrayItem
  end Type
contains
  function ArrayItem(L) result(P)
    Class(TObjectList) :: L
    Class(TObjectList), pointer :: P(:)
  end function
end module

use ObjectLists
implicit none
Type, extends(TObjectList):: TSampleList
end Type
contains
  subroutine TSampleList_ConfidVal(L)
    Class(TSampleList) :: L
  end subroutine
end

Same error with 4.7, 4.8 and trunk. (In 4.6 and earlier, polymorphic arrays are not supported yet.)
[Bug driver/60233] AVX instructions emitted with -march=native on host without AVX support
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60233 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org --- Comment #7 from Jakub Jelinek jakub at gcc dot gnu.org --- Created attachment 32151 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=32151&action=edit gcc49-pr60233.patch Untested fix.
[Bug tree-optimization/60229] wrong code at -O2 and -O3 on x86_64-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60229 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #4 from Jakub Jelinek jakub at gcc dot gnu.org --- Well, the conversion is implementation-defined behavior, and GCC documents what it does in that case (does it?), so you can rely on it; and given that other compilers have made a similar implementation-defined choice for that case, you can portably assume it unless you are targeting extinct architectures.
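GCC's documented choice (in the "C Implementation-Defined Behavior" section of the manual) is that an out-of-range value converted to a narrower signed type is reduced modulo 2^N, i.e. the low bits are kept. A sketch:

```c
/* Conversion of an out-of-range int to signed char: GCC defines the
   result as reduction modulo 2^8 -- a plain truncation to the low
   byte, reinterpreted as signed.  No signal, no trap. */
signed char narrow_to_schar(int v)
{
    return (signed char) v;  /* implementation-defined when v is out of range */
}
```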
[Bug tree-optimization/60172] ARM performance regression from trunk@207239
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60172 --- Comment #7 from Joey Ye joey.ye at arm dot com --- (In reply to Richard Biener from comment #5) (In reply to Joey Ye from comment #4) -fdisable-tree-forwprop4 doesn't help. -fno-tree-ter makes it even worse. The former is strange because it's the only pass that does sth that is changed by the patch? As said, make sure to include the fix for PR59993 in your testing. Does -fno-tree-forwprop fix the regression? I'm sorry, what I meant was: -fdisable-tree-forwprop4 didn't make the benchmark faster. Actually with -fdisable-tree-forwprop4 both revisions before/after 207239 get the same lower score. 207239 O2: low 207238 O2: high 207239 O2 -fdisable-tree-forwprop4: low 207238 O2 -fdisable-tree-forwprop4: low
[Bug tree-optimization/60206] IVOPT has no idea of inline asm
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60206 --- Comment #5 from rguenther at suse dot de rguenther at suse dot de --- On Fri, 14 Feb 2014, wmi at google dot com wrote: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60206 Bug ID: 60206 Summary: IVOPT has no idea of inline asm Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: wmi at google dot com CC: rguenth at gcc dot gnu.org, shenhan at google dot com Host: i386 Target: i386 Created attachment 32141 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=32141&action=edit Testcase This bug is found in google branch but I think the same problem also exists on trunk (but not exposed). For the testcase 1.c attached (1.c is extracted from libgcc/soft-fp/divtf3.c), use trunk compiler gcc-r202164 (Target: x86_64-unknown-linux-gnu) + the patch r204497 could expose the problem. The command: gcc -v -O2 -fno-omit-frame-pointer -fpic -c -S -m32 1.c The error: ./1.c: In function ‘__divtf3’: ./1.c:64:1194: error: ‘asm’ operand has impossible constraints The inline asm in the error message is as follows:

do {
  __asm__ ("sub{l} {%11,%3|%3,%11}\n\t"
           "sbb{l} {%9,%2|%2,%9}\n\t"
           "sbb{l} {%7,%1|%1,%7}\n\t"
           "sbb{l} {%5,%0|%0,%5}"
           : "=r" ((USItype) (A_f[3])),
             "=r" ((USItype) (A_f[2])),
             "=r" ((USItype) (A_f[1])),
             "=r" ((USItype) (A_f[0]))
           : "0" ((USItype) (B_f[2])),
             "g" ((USItype) (A_f[2])),
             "1" ((USItype) (B_f[1])),
             "g" ((USItype) (A_f[1])),
             "2" ((USItype) (B_f[0])),
             "g" ((USItype) (A_f[0])),
             "3" ((USItype) (0)),
             "g" ((USItype) (_n_f[_i])));
} while (0)

Because -fno-omit-frame-pointer is turned on and the command line uses -fpic, there are only 5 registers for register allocation. Before IVOPT, %0, %1, %2, %3 require 4 registers. The index variable i of _n_f[_i] requires another register. So 5 registers are used up here. After IVOPT, the MEM reference _n_f[_i] is converted to MEM[base: _874, index: ivtmp.22_821, offset: 0B]. 
base and index require 2 registers, so now 6 registers are required and LRA cannot find enough registers to allocate. The trunk compiler doesn't expose the problem because of patch r202165. With patch r202165, IVOPT doesn't change _n_f[_i] in the inline asm above. But that just hid the problem. Should IVOPT care about the constraints in inline asm and restrict its optimization in some cases? It's true that ASMs are not in any way special-cased - it may be worth trying whether distinguishing address-uses from other uses helps. It's only a cost thing, of course. In general find_interesting_uses_stmt may need some modernization.
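The register pressure being discussed can be sketched with a minimal GNU C asm using the same `"0"`..`"3"` matching constraints as the soft-fp macro; the template is deliberately empty so the sketch is architecture-neutral, but the allocator still has to keep four distinct registers live across the asm, before any extra `"g"` inputs or IVOPTS-introduced base/index registers are counted.

```c
/* Four "=r" outputs tied to four inputs via matching constraints
   "0".."3": the register allocator must find four distinct registers
   live across the asm.  A real multi-word subtract would put the
   sub/sbb instructions in the (here empty) template. */
static void four_reg_asm(unsigned *a, unsigned *b, unsigned *c, unsigned *d)
{
    unsigned w = *a, x = *b, y = *c, z = *d;
    __asm__ ("" : "=r" (w), "=r" (x), "=r" (y), "=r" (z)
                : "0" (w), "1" (x), "2" (y), "3" (z));
    *a = w; *b = x; *c = y; *d = z;
}
```

With an empty template the values pass through unchanged, which also makes the constraint mechanics easy to test.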
[Bug tree-optimization/60172] ARM performance regression from trunk@207239
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60172 --- Comment #8 from Joey Ye joey.ye at arm dot com --- Here is tree dump and diff of 133t.forwprop4 bb 2: Int_Index_4 = Int_1_Par_Val_3(D) + 5; Int_Loc.0_5 = (unsigned int) Int_Index_4; _6 = Int_Loc.0_5 * 4; _8 = Arr_1_Par_Ref_7(D) + _6; *_8 = Int_2_Par_Val_10(D); _13 = _6 + 4; _14 = Arr_1_Par_Ref_7(D) + _13; *_14 = Int_2_Par_Val_10(D); _17 = _6 + 60; _18 = Arr_1_Par_Ref_7(D) + _17; *_18 = Int_Index_4; pretmp_20 = Int_Loc.0_5 * 100; pretmp_2 = Arr_2_Par_Ref_22(D) + pretmp_20; _42 = (sizetype) Int_1_Par_Val_3(D); _41 = _42 * 4; - _40 = pretmp_2 + _41; // good + _12 = _41 + pretmp_20; // bad + _40 = Arr_2_Par_Ref_22(D) + _12; // bad MEM[(int[25] *)_40 + 20B] = Int_Index_4; MEM[(int[25] *)_40 + 24B] = Int_Index_4; _29 = MEM[(int[25] *)_40 + 16B]; _30 = _29 + 1; MEM[(int[25] *)_40 + 16B] = _30; _32 = pretmp_20 + 1000; _33 = Arr_2_Par_Ref_22(D) + _32; _34 = *_8; - _51 = _33 + _41; // good + _16 = _41 + _32; // bad + _51 = Arr_2_Par_Ref_22(D) + _16; // bad MEM[(int[25] *)_51 + 20B] = _34; Int_Glob = 5; return;
[Bug c++/60222] [4.8/4.9 Regression] ICE with reference as template parameter
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60222 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added Priority|P3 |P1 Status|UNCONFIRMED |NEW Last reconfirmed||2014-02-17 CC||jakub at gcc dot gnu.org Version|4.9.0 |4.8.3 Ever confirmed|0 |1 --- Comment #1 from Jakub Jelinek jakub at gcc dot gnu.org --- Indeed, started with r207167.
[Bug rtl-optimization/49847] [4.7/4.8/4.9 Regression] NULL deref in fold_rtx (prev_insn_cc0 == NULL)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49847 --- Comment #33 from rguenther at suse dot de rguenther at suse dot de --- On Sun, 16 Feb 2014, law at redhat dot com wrote: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49847 --- Comment #32 from Jeffrey A. Law law at redhat dot com --- The problem we're seeing is with the cc0-setter and cc0-user in different blocks; they're separated by a NOTE_BASIC_BLOCK. That causes CSE to blow up because it expects that the cc0-setter and cc0-user are always consecutive. While we're just seeing the failure in CSE right now, I'm sure there's a ton of places that assume the setter/user are inseparable, as that has been the documented form for ~20 years. From rtl.texi: The instruction setting the condition code must be adjacent to the instruction using the condition code; only @code{note} insns may separate them. We either need to relax that and audit all the HAVE_cc0 code to ensure it doesn't make that assumption, or we need to somehow restore the property that the setter and user are inseparable. I think relaxing this constraint and allowing the cc0-setter and cc0-user to be separated by a fallthru edge should be fine (and we should make sure that bb-reorder doesn't later separate the BBs)
[Bug c++/60219] [4.8/4.9 Regression] [c++11] ICE invalid use of variadic template
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60219 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2014-02-17 CC||jakub at gcc dot gnu.org, ||jason at gcc dot gnu.org Version|4.9.0 |4.8.3 Ever confirmed|0 |1 --- Comment #1 from Jakub Jelinek jakub at gcc dot gnu.org --- Started likely with r190653 (works with r190650, fails with r190662, coerce_template_parms+resolve_nondeduced_context in backtrace).
[Bug tree-optimization/60172] ARM performance regression from trunk@207239
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60172 --- Comment #9 from rguenther at suse dot de rguenther at suse dot de --- On Mon, 17 Feb 2014, joey.ye at arm dot com wrote: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60172 --- Comment #8 from Joey Ye joey.ye at arm dot com --- Here is tree dump and diff of 133t.forwprop4 bb 2: Int_Index_4 = Int_1_Par_Val_3(D) + 5; Int_Loc.0_5 = (unsigned int) Int_Index_4; _6 = Int_Loc.0_5 * 4; _8 = Arr_1_Par_Ref_7(D) + _6; *_8 = Int_2_Par_Val_10(D); _13 = _6 + 4; _14 = Arr_1_Par_Ref_7(D) + _13; *_14 = Int_2_Par_Val_10(D); _17 = _6 + 60; _18 = Arr_1_Par_Ref_7(D) + _17; *_18 = Int_Index_4; pretmp_20 = Int_Loc.0_5 * 100; pretmp_2 = Arr_2_Par_Ref_22(D) + pretmp_20; _42 = (sizetype) Int_1_Par_Val_3(D); _41 = _42 * 4; - _40 = pretmp_2 + _41; // good + _12 = _41 + pretmp_20; // bad + _40 = Arr_2_Par_Ref_22(D) + _12; // bad MEM[(int[25] *)_40 + 20B] = Int_Index_4; MEM[(int[25] *)_40 + 24B] = Int_Index_4; _29 = MEM[(int[25] *)_40 + 16B]; _30 = _29 + 1; MEM[(int[25] *)_40 + 16B] = _30; _32 = pretmp_20 + 1000; _33 = Arr_2_Par_Ref_22(D) + _32; _34 = *_8; - _51 = _33 + _41; // good + _16 = _41 + _32; // bad + _51 = Arr_2_Par_Ref_22(D) + _16; // bad MEM[(int[25] *)_51 + 20B] = _34; Int_Glob = 5; return; But that doesn't make sense - it means that -fdisable-tree-forwprop4 should get numbers back to good speed, no? Because that's the only change forwprop4 does. For completeness please base checks on r207316 (it contains a fix for the blamed revision, but as far as I can see it shouldn't make a difference for the testcase). Did you check whether my hackish patch fixes things?
[Bug tree-optimization/54742] Switch elimination in FSM loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54742 --- Comment #35 from Joey Ye joey.ye at arm dot com --- Here is good expansion: ;; _41 = _42 * 4; (insn 20 19 0 (set (reg:SI 126 [ D.5038 ]) (ashift:SI (reg/v:SI 131 [ Int_1_Par_Val ]) (const_int 2 [0x2]))) -1 (nil)) ;; _40 = _2 + _41; (insn 21 20 22 (set (reg:SI 136 [ D.5035 ]) (plus:SI (reg/v/f:SI 130 [ Arr_2_Par_Ref ]) (reg:SI 119 [ D.5036 ]))) -1 (nil)) (insn 22 21 0 (set (reg/f:SI 125 [ D.5035 ]) (plus:SI (reg:SI 136 [ D.5035 ]) (reg:SI 126 [ D.5038 ]))) -1 (nil)) ;; MEM[(int[25] *)_51 + 20B] = _34; (insn 29 28 30 (set (reg:SI 139) (plus:SI (reg/v/f:SI 130 [ Arr_2_Par_Ref ]) (reg:SI 119 [ D.5036 ]))) Proc_8.c:23 -1 (nil)) (insn 30 29 31 (set (reg:SI 140) (plus:SI (reg:SI 139) (reg:SI 126 [ D.5038 ]))) Proc_8.c:23 -1 (nil)) (insn 31 30 32 (set (reg/f:SI 141) (plus:SI (reg:SI 140) (const_int 1000 [0x3e8]))) Proc_8.c:23 -1 (nil)) (insn 32 31 0 (set (mem:SI (plus:SI (reg/f:SI 141) (const_int 20 [0x14])) [2 MEM[(int[25] *)_51 + 20B]+0 S4 A32]) (reg:SI 124 [ D.5039 ])) Proc_8.c:23 -1 (nil)) After cse1 140 can be replaced by 125, thus lead a series of transformation make it much more efficient. 
Here is bad expansion: ;; _40 = Arr_2_Par_Ref_22(D) + _12; (insn 22 21 23 (set (reg:SI 138 [ D.5038 ]) (plus:SI (reg:SI 128 [ D.5038 ]) (reg:SI 121 [ D.5036 ]))) -1 (nil)) (insn 23 22 0 (set (reg/f:SI 127 [ D.5035 ]) (plus:SI (reg/v/f:SI 132 [ Arr_2_Par_Ref ]) (reg:SI 138 [ D.5038 ]))) -1 (nil)) ;; _32 = _20 + 1000; (insn 29 28 0 (set (reg:SI 124 [ D.5038 ]) (plus:SI (reg:SI 121 [ D.5036 ]) (const_int 1000 [0x3e8]))) Proc_8.c:23 -1 (nil)) ;; MEM[(int[25] *)_51 + 20B] = _34; (insn 32 31 33 (set (reg:SI 141) (plus:SI (reg/v/f:SI 132 [ Arr_2_Par_Ref ]) (reg:SI 124 [ D.5038 ]))) Proc_8.c:23 -1 (nil)) (insn 33 32 34 (set (reg/f:SI 142) (plus:SI (reg:SI 141) (reg:SI 128 [ D.5038 ]))) Proc_8.c:23 -1 (nil)) (insn 34 33 0 (set (mem:SI (plus:SI (reg/f:SI 142) (const_int 20 [0x14])) [2 MEM[(int[25] *)_51 + 20B]+0 S4 A32]) (reg:SI 126 [ D.5039 ])) Proc_8.c:23 -1 (nil)) Here cse doesn't happen, resulting in less optimal insns. Reason why cse doesn't happen is unclear yet.
[Bug c++/60216] [4.8/4.9 Regression] [c++11] Trouble with deleted template functions
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60216 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2014-02-17 CC||jakub at gcc dot gnu.org, ||jason at gcc dot gnu.org Version|4.9.0 |4.8.3 Ever confirmed|0 |1 --- Comment #1 from Jakub Jelinek jakub at gcc dot gnu.org --- Started with r198098 (or r198099, but that seems unrelated, r198096 works, r198100 fails).
[Bug middle-end/59448] Code generation doesn't respect C11 address-dependency
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448 --- Comment #11 from algrant at acm dot org --- Where do you get that this is racy if the access to data is not atomic? By design, release/acquire and release/consume sequences don't require wholesale changes to the way the data payload (in the general case, multiple fields within a structure) is first constructed and then used. 1.10#13 makes clear that as a result of the intra-thread sequencing between atomic and non-atomic operations (1.9#14), and the inter-thread ordering between atomic operations (1.10 various), there is a resulting ordering on operations to ordinary (sic) objects. Please see the references to the C++ standard in the source example, for the chain of reasoning here.
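A minimal C11 sketch of the pattern under discussion (the C++ clauses cited have direct C11 counterparts): the payload is an ordinary, non-atomic object; only the pointer through which it is published is atomic, and the consumer relies on the address dependency from the loaded pointer into the payload rather than on a fence. Note that memory_order_consume is the order at issue; current compilers, GCC included, conservatively promote it to acquire.

```c
#include <stdatomic.h>
#include <stddef.h>

struct payload { int data; };                  /* ordinary ("sic") object */
static _Atomic(struct payload *) slot = NULL;  /* atomic pointer mailbox */

void publish(struct payload *p)
{
    /* Plain stores to p->data sequenced before this store are made
       visible by the release ordering. */
    atomic_store_explicit(&slot, p, memory_order_release);
}

int consume(void)
{
    struct payload *p =
        atomic_load_explicit(&slot, memory_order_consume);
    /* The address dependency from p to p->data is what orders the
       non-atomic read; no atomic access to the payload is needed. */
    return p ? p->data : -1;
}
```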
[Bug fortran/60234] [4.9 Regression] [OOP] ICE in generate_finalization_wrapper at fortran/class.c:1883
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60234 janus at gcc dot gnu.org changed: What|Removed |Added Keywords||ice-on-valid-code Status|UNCONFIRMED |NEW Last reconfirmed||2014-02-17 CC||janus at gcc dot gnu.org Summary|OOP internal compiler |[4.9 Regression] [OOP] ICE |error: in |in |generate_finalization_wrapp |generate_finalization_wrapp |er |er at fortran/class.c:1883 Ever confirmed|0 |1 --- Comment #1 from janus at gcc dot gnu.org --- Reduced test case: module ObjectLists implicit none Type TObjectList contains FINAL :: finalize end Type Type, extends(TObjectList):: TRealCompareList end Type contains subroutine finalize(L) Type(TObjectList) :: L end subroutine integer function CompareReal(this) Class(TRealCompareList) :: this end function end module 4.8 rejects it cleanly ('not yet implemented'), so the ICE is a regression.
[Bug fortran/60234] [4.9 Regression] [OOP] ICE in generate_finalization_wrapper at fortran/class.c:1883
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60234 --- Comment #2 from janus at gcc dot gnu.org --- This patchlet seems to be sufficient to fix the ICE:

Index: gcc/fortran/decl.c
===================================================================
--- gcc/fortran/decl.c    (revision 207804)
+++ gcc/fortran/decl.c    (working copy)
@@ -1199,7 +1199,7 @@ build_sym (const char *name, gfc_charlen *cl, bool
   sym->attr.implied_index = 0;
 
   if (sym->ts.type == BT_CLASS)
-    return gfc_build_class_symbol (&sym->ts, &sym->attr, &sym->as, false);
+    return gfc_build_class_symbol (&sym->ts, &sym->attr, &sym->as, true);
 
   return true;
 }

Comment 1 compiles fine with this, but comment 0 hits another ICE: ObjectLists.f90:186:0: internal compiler error: Segmentation fault class is (object_array_pointer) ^ 0x93e90f crash_signal /home/jweil/gcc49/trunk/gcc/toplev.c:337 0x672420 gfc_get_derived_type(gfc_symbol*) /home/jweil/gcc49/trunk/gcc/fortran/trans-types.c:2455 0x672988 gfc_typenode_for_spec(gfc_typespec*) /home/jweil/gcc49/trunk/gcc/fortran/trans-types.c:1112 0x671263 gfc_sym_type(gfc_symbol*) /home/jweil/gcc49/trunk/gcc/fortran/trans-types.c:2137 0x671728 gfc_get_function_type(gfc_symbol*) /home/jweil/gcc49/trunk/gcc/fortran/trans-types.c:2797 0x6721ca gfc_get_ppc_type(gfc_component*) /home/jweil/gcc49/trunk/gcc/fortran/trans-types.c:2322 0x6726a7 gfc_get_derived_type(gfc_symbol*) /home/jweil/gcc49/trunk/gcc/fortran/trans-types.c:2484 0x672988 gfc_typenode_for_spec(gfc_typespec*) /home/jweil/gcc49/trunk/gcc/fortran/trans-types.c:1112 0x671263 gfc_sym_type(gfc_symbol*) /home/jweil/gcc49/trunk/gcc/fortran/trans-types.c:2137 0x637b96 gfc_get_symbol_decl(gfc_symbol*) /home/jweil/gcc49/trunk/gcc/fortran/trans-decl.c:1390 0x639f99 gfc_create_module_variable /home/jweil/gcc49/trunk/gcc/fortran/trans-decl.c:4267 0x607453 do_traverse_symtree /home/jweil/gcc49/trunk/gcc/fortran/symbol.c:3575 0x63ae12 gfc_generate_module_vars(gfc_namespace*) /home/jweil/gcc49/trunk/gcc/fortran/trans-decl.c:4693 0x61cef1 gfc_generate_module_code(gfc_namespace*) 
/home/jweil/gcc49/trunk/gcc/fortran/trans.c:1930 0x5db92b translate_all_program_units /home/jweil/gcc49/trunk/gcc/fortran/parse.c:4523 0x5db92b gfc_parse_file() /home/jweil/gcc49/trunk/gcc/fortran/parse.c:4733 0x618335 gfc_be_parse_file /home/jweil/gcc49/trunk/gcc/fortran/f95-lang.c:188
[Bug c++/60146] [4.8/4.9 Regression] ICE when compiling this code with -fopenmp
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60146 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added CC||jakub at gcc dot gnu.org, ||jason at gcc dot gnu.org --- Comment #2 from Jakub Jelinek jakub at gcc dot gnu.org --- Started with r188939.
[Bug c++/60237] New: isnan fails with -ffast-math
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60237 Bug ID: 60237 Summary: isnan fails with -ffast-math Product: gcc Version: 4.8.1 Status: UNCONFIRMED Severity: major Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: nathanael.schaeffer at gmail dot com With -ffast-math, isnan should return true if passed a NaN value. Otherwise, how is isnan different from (x!=x)? isnan worked as expected with gcc 4.7, but does not with 4.8.1 and 4.8.2. How can I check whether x is a NaN in a portable way (not presuming any compilation option)?
[Bug c++/60237] isnan fails with -ffast-math
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60237 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #1 from Jakub Jelinek jakub at gcc dot gnu.org --- Well, -ffast-math implies -ffinite-math-only, so the compiler is assuming no NaNs or infinites are used as arguments/return values of any expression. So, if you have a program that produces NaNs anyway, you shouldn't be building it with -ffast-math, at least not with -ffinite-math-only.
[Bug c++/60215] [4.9 Regression] ICE with invalid bit-field size
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60215 Paolo Carlini paolo.carlini at oracle dot com changed: What|Removed |Added CC||jason at gcc dot gnu.org --- Comment #2 from Paolo Carlini paolo.carlini at oracle dot com --- Evidently, in case of error recovery we can get here: 9672 case COMPONENT_REF: 9673 if (is_overloaded_fn (t)) 9674 { 9675 /* We can only get here in checking mode via 9676 build_non_dependent_expr, because any expression that 9677 calls or takes the address of the function will have 9678 pulled a FUNCTION_DECL out of the COMPONENT_REF. */ 9679 gcc_checking_assert (allow_non_constant); 9680 *non_constant_p = true; 9681 return t; 9682 } with allow_non_constant == false. Jason suggested the comment (and the assert ;) as part of the fix for 58647, thus I would like to hear from him... Shall we maybe || errorcount ? Seems safe for 4.9.0.
[Bug fortran/60238] New: Allow colon-separated triplet in array initialization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60238 Bug ID: 60238 Summary: Allow colon-separated triplet in array initialization Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: fortran Assignee: unassigned at gcc dot gnu.org Reporter: antony at cosmologist dot info Not really a bug, but ifort (and also, going back, CVF) allows a clean array initialization syntax like this integer :: indices(3) indices=[3:5] as an alternative to the ugly indices = (/ (I, I=3, 5) /) Supporting it would allow easier compiler interoperability.
[Bug fortran/60234] [4.9 Regression] [OOP] ICE in generate_finalization_wrapper at fortran/class.c:1883
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60234 --- Comment #3 from janus at gcc dot gnu.org --- (In reply to janus from comment #2) Comment 1 compiles fine with this, but comment 0 hits another ICE: ObjectLists.f90:186:0: internal compiler error: Segmentation fault class is (object_array_pointer) ^ 0x93e90f crash_signal /home/jweil/gcc49/trunk/gcc/toplev.c:337 0x672420 gfc_get_derived_type(gfc_symbol*) /home/jweil/gcc49/trunk/gcc/fortran/trans-types.c:2455 A reduced test case for this ICE is: integer function Compare(R1) class(*) R1 end function But it seems to be due to the patch in comment 2 and does not occur without it.
[Bug tree-optimization/60183] [4.7/4.8 Regression] phiprop creates invalid code
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60183 Richard Biener rguenth at gcc dot gnu.org changed: What|Removed |Added Known to work||4.9.0 Summary|[4.7/4.8/4.9 Regression]|[4.7/4.8 Regression] |phiprop creates invalid |phiprop creates invalid |code|code --- Comment #6 from Richard Biener rguenth at gcc dot gnu.org --- Fixed on trunk so far.
[Bug c++/60237] isnan fails with -ffast-math
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60237 --- Comment #2 from N Schaeffer nathanael.schaeffer at gmail dot com --- Thank you for your answer. My program (which is a computational fluid dynamics solver) is not supposed to produce NaNs. However, when it does (which means something went wrong), I would like to abort the program and return an error instead of continuing crunching NaNs. I also want it to run as fast as possible (hence the -ffast-math option). I would argue that if printf("%f", x) outputs NaN, isnan(x) should also return true. Do you have a suggestion concerning my last question: How can I check if x is NaN in a portable way (not presuming any compilation option)?
[Bug c++/60237] isnan fails with -ffast-math
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60237 --- Comment #3 from Marc Glisse glisse at gcc dot gnu.org --- (In reply to N Schaeffer from comment #2) Do you have a suggestion concerning my last question: How can I check if x is NaN in a portable way (not presuming any compilation option) ? This should bypass software optimizations. But if the hardware is put in a mode that does strange things with NaN, it will be harder to work around. int my_isnan(double x){ volatile double y=x; return y!=y; }
[Bug fortran/60238] Allow colon-separated triplet in array initialization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60238 --- Comment #1 from Dominique d'Humieres dominiq at lps dot ens.fr --- as an alternative to the ugly indices = (/ (I, I=3, 5) /) You can use indices=[(I, I=3, 5)] if your coding style accepts f2003 syntax. Supporting it would allow easier compiler interoperability. The only way to achieve that is to stick to the Fortran standard, i.e, never use extensions of any kind.
[Bug fortran/60234] [4.9 Regression] [OOP] ICE in generate_finalization_wrapper at fortran/class.c:1883
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60234 janus at gcc dot gnu.org changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |janus at gcc dot gnu.org --- Comment #4 from janus at gcc dot gnu.org --- The test case in comment 0 compiles cleanly when adding the following to the patch in comment 2:

Index: gcc/fortran/class.c
===================================================================
--- gcc/fortran/class.c    (revision 207804)
+++ gcc/fortran/class.c    (working copy)
@@ -637,9 +637,10 @@ gfc_build_class_symbol (gfc_typespec *ts, symbol_a
       if (!gfc_add_component (fclass, "_vptr", &c))
         return false;
       c->ts.type = BT_DERIVED;
-      if (delayed_vtab
-          || (ts->u.derived->f2k_derived
-              && ts->u.derived->f2k_derived->finalizers))
+      if ((delayed_vtab
+           || (ts->u.derived->f2k_derived
+               && ts->u.derived->f2k_derived->finalizers))
+          && !ts->u.derived->attr.unlimited_polymorphic)
         c->ts.u.derived = NULL;
       else
         {
[Bug c++/60237] isnan fails with -ffast-math
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60237

--- Comment #4 from N Schaeffer nathanael.schaeffer at gmail dot com ---
int my_isnan(double x) { volatile double y = x; return y != y; }

is translated to:

   0x00406cf0 <+0>:   movsd  QWORD PTR [rsp-0x8],xmm0
   0x00406cf6 <+6>:   xor    eax,eax
   0x00406cf8 <+8>:   movsd  xmm1,QWORD PTR [rsp-0x8]
   0x00406cfe <+14>:  movsd  xmm0,QWORD PTR [rsp-0x8]
   0x00406d04 <+20>:  comisd xmm1,xmm0
   0x00406d08 <+24>:  setne  al
   0x00406d0b <+27>:  ret

which also fails to detect NaN, which is right according to the documented
behaviour of comisd:
http://www.jaist.ac.jp/iscenter-new/mpc/altix/altixdata/opt/intel/vtune/doc/users_guide/mergedProjects/analyzer_ec/mergedProjects/reference_olh/mergedProjects/instructions/instruct32_hh/vc44.htm
[Bug c++/60239] New: False positive maybe-uninitialized in for loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60239

            Bug ID: 60239
           Summary: False positive maybe-uninitialized in for loop
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: lcid-fire at gmx dot net

The code in
https://github.com/RobertBeckebans/RBDOOM-3-BFG/blob/dd9b8a8710dd7f8c1376eb245ee31fc740eae6eb/neo/renderer/tr_backend_rendertools.cpp
triggers a false positive maybe-uninitialized warning. The code in question
begins at line 1971:

static void RB_DrawText( const char* text, const idVec3& origin, float scale,
                         const idVec4& color, const idMat3& viewAxis, const int align )
{
    // snip
    idVec3 org, p1, p2;
    // snip
    for( i = 0; i < len; i++ )
    {
        if( i == 0 || text[i] == '\n' )
        {
            org = origin - viewAxis[2] * ( line * 36.0f * scale );
            // snip
        }
        org -= viewAxis[1] * ( spacing * scale );
    }

The error message is:

idlib/../idlib/math/Vector.h: In function 'void RB_DrawText(const char*, const
idVec3&, float, const idVec4&, const idMat3&, int)':
idlib/../idlib/math/Vector.h:567:10: error: 'org.idVec3::x' may be used
uninitialized in this function [-Werror=maybe-uninitialized]
/home/andreas/Projects/bfg/neo/renderer/tr_backend_rendertools.cpp:1971:9:
note: 'org.idVec3::x' was declared here
   idVec3 org, p1, p2;

I tried to create a simple version that triggers that false positive but
everything I tried analyzes the code correctly.
[Bug other/60240] New: libbacktrace problems with nested functions
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60240

            Bug ID: 60240
           Summary: libbacktrace problems with nested functions
           Product: gcc
           Version: 4.8.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
          Assignee: unassigned at gcc dot gnu.org
          Reporter: johannespfau at gmail dot com

Created attachment 32152
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32152&action=edit
test case to reproduce the bug

Compile the test case with -lbacktrace -g. Actual output:

test.c:17 (null)

Expected output: the backtrace should contain the function name ('a') instead
of (null).

AFAICS the problem is in read_function_entry. There's an abbrev->has_children
check that assumes all children of a function are inlined instances of the
same function. This is not true: children can also be nested functions.
libbacktrace should check the DW_AT_inline attribute here.
[Bug libffi/60073] [4.9 regression] 64-bit libffi.call/cls_double_va.c FAILs after recent modification
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60073

--- Comment #11 from Eric Botcazou ebotcazou at gcc dot gnu.org ---
Author: ebotcazou
Date: Mon Feb 17 12:00:04 2014
New Revision: 207822

URL: http://gcc.gnu.org/viewcvs?rev=207822&root=gcc&view=rev
Log:
	PR libffi/60073
	* src/sparc/v8.S: Assemble only if !SPARC64.
	* src/sparc/v9.S: Remove obsolete comment.
	* src/sparc/ffitarget.h (enum ffi_abi): Add FFI_COMPAT_V9.
	(V8_ABI_P): New macro.
	(V9_ABI_P): Likewise.
	(FFI_EXTRA_CIF_FIELDS): Define only if SPARC64.
	* src/sparc/ffi.c (ffi_prep_args_v8): Compile only if !SPARC64.
	(ffi_prep_args_v9): Compile only if SPARC64.
	(ffi_prep_cif_machdep_core): Use V9_ABI_P predicate.
	(ffi_prep_cif_machdep): Guard access to nfixedargs field.
	(ffi_prep_cif_machdep_var): Likewise.
	(ffi_v9_layout_struct): Compile only if SPARC64.
	(ffi_call): Deal with FFI_V8PLUS and FFI_COMPAT_V9 and fix warnings.
	(ffi_prep_closure_loc): Use V9_ABI_P and V8_ABI_P predicates.
	(ffi_closure_sparc_inner_v8): Compile only if !SPARC64.
	(ffi_closure_sparc_inner_v9): Compile only if SPARC64.
	Guard access to nfixedargs field.

Modified:
	trunk/libffi/ChangeLog
	trunk/libffi/src/sparc/ffi.c
	trunk/libffi/src/sparc/ffitarget.h
	trunk/libffi/src/sparc/v8.S
	trunk/libffi/src/sparc/v9.S
[Bug middle-end/25140] aliases, including weakref, break alias analysis
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25140

Richard Biener rguenth at gcc dot gnu.org changed:

           What    |Removed |Added
----------------------------------------------------------------
                 CC|        |johannespfau at gmail dot com

--- Comment #12 from Richard Biener rguenth at gcc dot gnu.org ---
*** Bug 60214 has been marked as a duplicate of this bug. ***
[Bug middle-end/60214] Variables with same DECL_ASSEMBLER_NAME are treated as different variables
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60214

Richard Biener rguenth at gcc dot gnu.org changed:

           What    |Removed     |Added
----------------------------------------------------------------
             Status|UNCONFIRMED |RESOLVED
         Resolution|---         |DUPLICATE

--- Comment #2 from Richard Biener rguenth at gcc dot gnu.org ---
Yes, that's a known deficiency in alias analysis.

*** This bug has been marked as a duplicate of bug 25140 ***
[Bug fortran/60234] [4.9 Regression] [OOP] ICE in generate_finalization_wrapper at fortran/class.c:1883
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60234

--- Comment #5 from janus at gcc dot gnu.org ---
(In reply to janus from comment #4)
> The test case in comment 0 compiles cleanly when adding the following to
> the patch in comment 2:

Unfortunately the combination fails on proc_ptr_comp_37 in the testsuite.
[Bug c++/60216] [4.8/4.9 Regression] [c++11] Trouble with deleted template functions
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60216

Richard Biener rguenth at gcc dot gnu.org changed:

           What    |Removed |Added
----------------------------------------------------------------
           Priority|P3      |P2
[Bug c++/60219] [4.8/4.9 Regression] [c++11] ICE invalid use of variadic template
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60219

Richard Biener rguenth at gcc dot gnu.org changed:

           What    |Removed |Added
----------------------------------------------------------------
           Priority|P3      |P2
[Bug fortran/60231] [4.8/4.9 Regression] ICE on undefined generic
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60231

--- Comment #3 from janus at gcc dot gnu.org ---
(In reply to janus from comment #2)
> This draft patch fixes the ICE:

... and regtests cleanly.
[Bug c/60220] Vectorization : simple loop : fails to vectorize
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60220

Richard Biener rguenth at gcc dot gnu.org changed:

           What    |Removed     |Added
----------------------------------------------------------------
             Status|UNCONFIRMED |RESOLVED
         Resolution|---         |WORKSFORME

--- Comment #2 from Richard Biener rguenth at gcc dot gnu.org ---
It's vectorized with -fno-tree-loop-distribute-patterns since at least GCC 4.7
(the oldest still maintained release).
[Bug middle-end/60221] [4.7/4.8/4.9 Regression] gcc -fexceptions generates unnecessary cleanup code
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60221

Richard Biener rguenth at gcc dot gnu.org changed:

           What    |Removed                    |Added
------------------------------------------------------------------------
           Keywords|                           |EH
             Status|UNCONFIRMED                |NEW
   Last reconfirmed|                           |2014-02-17
                 CC|                           |hubicka at gcc dot gnu.org,
                   |                           |matz at gcc dot gnu.org
   Target Milestone|---                        |4.7.4
            Summary|gcc -fexceptions generates |[4.7/4.8/4.9 Regression]
                   |unnecessary cleanup code   |gcc -fexceptions generates
                   |                           |unnecessary cleanup code
     Ever confirmed|0                          |1

--- Comment #2 from Richard Biener rguenth at gcc dot gnu.org ---
Confirmed.
[Bug c++/60227] [4.7/4.8/4.9 Regression] [C++11] ICE using brace-enclosed initializer list to initialize array
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60227

Richard Biener rguenth at gcc dot gnu.org changed:

           What    |Removed |Added
----------------------------------------------------------------
           Priority|P3      |P2
   Target Milestone|---     |4.7.4
[Bug c++/60224] [4.7/4.8/4.9 Regression] ICE using invalid initializer for array
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60224

Richard Biener rguenth at gcc dot gnu.org changed:

           What    |Removed |Added
----------------------------------------------------------------
           Priority|P3      |P2
   Target Milestone|---     |4.7.4
[Bug c++/60225] [4.9 Regression] [c++11] ICE initializing constexpr array
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60225

Richard Biener rguenth at gcc dot gnu.org changed:

           What    |Removed |Added
----------------------------------------------------------------
           Priority|P3      |P1
   Target Milestone|---     |4.9.0