date:20140228

Re: copyright dates in binutils (and includes/)

2014-02-28 Thread Alan Modra

On Thu, Feb 27, 2014 at 06:47:17PM +, Joseph S. Myers wrote:
 On Thu, 27 Feb 2014, Joel Brobecker wrote:
 
  I should mention, however, that for us to use ranges like this,
  the FSF asked us to add a note explaining that the copyright years
  could be abbreviated into a range. See gdb/README (at the end).
  I suspect that you'll need the same note for binutils.

Thanks Joel.  I'll copy that or the gcc wording.

 And, where a gap in the years is being implicitly filled in by conversion 
 to a range, make sure that either (a) there was a public version control 
 repository for binutils during that year, or (b) there was a release 
 (including beta releases, Cygnus releases etc., not just official 
 releases) during that year.

It looks like the earliest binutils files that are edited by
update-copyright.py have copyright dates starting at 1985.  Of those,
quite a few have skipped years, e.g. binutils/filemode.c is
Copyright 1985, 1990,...

So, CVS goes back to 1991, and there are copies of old binutils
releases for all years from 1988 to 2002 except for 1999 at
ftp://sourceware.org/pub/binutils/old-releases/

Joseph, do you know why implicitly adding years to the claimed
copyright years is a problem?  I'm guessing the file needs to be
published somewhere for each year claimed.
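
To make Joseph's condition concrete, here is a minimal sketch in C (not part of update-copyright.py) that checks whether the gaps in a list of claimed years could be filled in by a range. It assumes only the coverage facts stated above: CVS history from 1991 onward, and old releases for 1988 to 2002 except 1999. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Coverage as described in this message (an assumption of the sketch):
   public CVS history exists from 1991 onward, and old releases exist for
   every year from 1988 to 2002 except 1999.  */
static bool year_is_covered(int year)
{
    bool in_cvs = year >= 1991;
    bool released = year >= 1988 && year <= 2002 && year != 1999;
    return in_cvs || released;
}

/* YEARS is the sorted list of years already claimed in a file.  Return the
   first year inside the list's span that is neither claimed nor covered,
   or 0 if the list could be abbreviated to a single range.  */
static int first_uncovered_gap(const int *years, size_t n)
{
    assert(years != NULL && n > 0);
    size_t next = 0;
    for (int y = years[0]; y <= years[n - 1]; y++) {
        if (next < n && years[next] == y) {
            next++;          /* year is explicitly claimed */
            continue;
        }
        if (!year_is_covered(y))
            return y;        /* implicit year with no known publication */
    }
    return 0;
}
```

For a notice starting "Copyright 1985, 1990", this check reports 1986, so under these assumptions the gap could not simply be folded into 1985-1990.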

-- 
Alan Modra
Australia Development Lab, IBM

RE: [AArch64 05/14] Add AArch64 'prefetch'-pattern.

2014-02-28 Thread Gopalasubramanian, Ganesh

With the locality value received in the instruction pattern, I think it would 
be safe to handle it in the prefetch instruction.
This helps especially because AArch64 has prefetch instructions that can 
encode this locality.

+(define_insn "prefetch"
+  [(prefetch (match_operand:DI 0 "address_operand" "r")
+(match_operand:QI 1 "const_int_operand" "n")
+(match_operand:QI 2 "const_int_operand" "n"))]
+  ""
+  "*
+{
+  int locality = INTVAL (operands[2]);
+
+  gcc_assert (IN_RANGE (locality, 0, 3));
+
+  if (locality == 0)
+ /* non temporal locality */
+ return (INTVAL(operands[1])) ? \"prfm\\tPSTL1STRM, [%0, #0]\" : 
\"prfm\\tPLDL1STRM, [%0, #0]\";
+
+  /* temporal locality */
+  return (INTVAL(operands[1])) ? \"prfm\\tPSTL%2KEEP, [%0, #0]\" : 
\"prfm\\tPLDL%2KEEP, [%0, #0]\";
+}"
+  [(set_attr "type" "prefetch")]
+)
+
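
The selection logic in the pattern above can be modelled outside the machine description. The following is only a sketch in plain C, with hypothetical names; it mirrors what the posted pattern does (locality 0 streams through L1, any other locality is used directly as the cache-level digit) and makes no claim about what was eventually committed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical plain-C model of the hint chosen by the pattern above.  */
struct prfm_hint {
    bool is_store;    /* PST (prefetch for store) vs. PLD (prefetch for load) */
    int cache_level;  /* 1..3, the digit after 'L' */
    bool keep;        /* KEEP (temporal) vs. STRM (streaming) */
};

/* IS_WRITE and LOCALITY mirror operands 1 and 2 of the prefetch pattern.  */
static struct prfm_hint select_prfm_hint(int is_write, int locality)
{
    assert(locality >= 0 && locality <= 3);
    struct prfm_hint hint;
    hint.is_store = is_write != 0;
    if (locality == 0) {
        /* No temporal locality: stream through L1.  */
        hint.cache_level = 1;
        hint.keep = false;
    } else {
        /* As in the pattern, the locality value is used directly as the
           cache-level digit (the "%2" in the output template).  */
        hint.cache_level = locality;
        hint.keep = true;
    }
    return hint;
}

/* Write the prfm operation name, e.g. "PLDL2KEEP", into BUF and return
   the number of characters snprintf reports.  */
static int format_prfm_hint(struct prfm_hint hint, char *buf, size_t len)
{
    return snprintf(buf, len, "%sL%d%s", hint.is_store ? "PST" : "PLD",
                    hint.cache_level, hint.keep ? "KEEP" : "STRM");
}
```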

I have also attached a patch that implements:
*   Prefetch with an immediate offset in the range 0 to 32760 (a multiple of 8). 
Added a predicate for this.
*   Prefetch with an immediate offset in the range -256 to 255 (generated 
only when the offset is negative; emits the prfum instruction). 
Added a predicate for this.
*   Prefetch with a register offset (modified to print the locality).
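
The three addressing cases listed above could be told apart roughly as follows. This is a sketch in plain C with hypothetical names, not the predicates from the attached patch (which is not shown here). It follows the description literally, so prfum is chosen only for negative offsets.

```c
#include <assert.h>

enum prefetch_form {
    PREFETCH_PRFM_IMM,   /* prfm with a scaled, unsigned immediate */
    PREFETCH_PRFUM_IMM,  /* prfum with an unscaled, signed immediate */
    PREFETCH_REG_OFFSET  /* offset must be supplied in a register */
};

/* Classify a byte offset according to the ranges given in the message.  */
static enum prefetch_form classify_prefetch_offset(long offset)
{
    if (offset >= 0 && offset <= 32760 && offset % 8 == 0)
        return PREFETCH_PRFM_IMM;
    /* The message says prfum is only generated for negative offsets.  */
    if (offset >= -256 && offset < 0)
        return PREFETCH_PRFUM_IMM;
    return PREFETCH_REG_OFFSET;
}

/* A few spot checks at the range boundaries.  */
static void check_prefetch_offsets(void)
{
    assert(classify_prefetch_offset(0) == PREFETCH_PRFM_IMM);
    assert(classify_prefetch_offset(32760) == PREFETCH_PRFM_IMM);
    assert(classify_prefetch_offset(-256) == PREFETCH_PRFUM_IMM);
    assert(classify_prefetch_offset(-257) == PREFETCH_REG_OFFSET);
}
```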

Regards
Ganesh

-Original Message-
From: Philipp Tomsich [mailto:philipp.toms...@theobroma-systems.com] 
Sent: Wednesday, February 19, 2014 2:40 AM
To: gcc-patches@gcc.gnu.org
Cc: philipp.toms...@theobroma-systems.com
Subject: [AArch64 05/14] Add AArch64 'prefetch'-pattern.

---
 gcc/config/aarch64/aarch64.md | 17 +
 gcc/config/arm/types.md   |  2 ++
 2 files changed, 19 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md 
index 99a6ac8..b972a1b 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -293,6 +293,23 @@
   [(set_attr "type" "no_insn")]
 )
 
+(define_insn "prefetch"
+  [(prefetch (match_operand:DI 0 "register_operand" "r")
+(match_operand:QI 1 "const_int_operand" "n")
+(match_operand:QI 2 "const_int_operand" "n"))]
+  ""
+  "*
+{
+  if (INTVAL(operands[2]) == 0)
+ /* no temporal locality */
+ return (INTVAL(operands[1])) ? \"prfm\\tPSTL1STRM, [%0, #0]\" : 
+\"prfm\\tPLDL1STRM, [%0, #0]\";
+
+  /* temporal locality */
+  return (INTVAL(operands[1])) ? \"prfm\\tPSTL1KEEP, [%0, #0]\" : 
+\"prfm\\tPLDL1KEEP, [%0, #0]\"; }"
+  [(set_attr "type" "prefetch")]
+)
+
 (define_insn "trap"
   [(trap_if (const_int 1) (const_int 8))]
   ""
diff --git a/gcc/config/arm/types.md b/gcc/config/arm/types.md index 
cc39cd1..1d1280d 100644
--- a/gcc/config/arm/types.md
+++ b/gcc/config/arm/types.md
@@ -117,6 +117,7 @@
 ; mvn_shift_reg  inverting move instruction, shifted operand by a register.
 ; no_insnan insn which does not represent an instruction in the
 ;final output, thus having no impact on scheduling.
+; prefetch  a prefetch instruction
 ; rbit   reverse bits.
 ; revreverse bytes.
 ; sdiv   signed division.
@@ -553,6 +554,7 @@
   call,\
   clz,\
   no_insn,\
+  prefetch,\
   csel,\
   crc,\
   extend,\
--
1.9.0




Re: [PATCH GCC]Allow cfgcleanup to remove forwarder loop preheaders and latches

2014-02-28 Thread Richard Biener

On Fri, Feb 28, 2014 at 1:52 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, Feb 24, 2014 at 9:12 PM, bin.cheng bin.ch...@arm.com wrote:
 Hi,
 This patch fixes the regression reported in PR60280 by removing forward loop
 headers/latches in cfg cleanup where possible.  Several tests are broken by
 this change since cfg cleanup is shared by all optimizers.  Some tests have
 already been fixed by recent patches; I went through and fixed the others.
 One case that needs to be clarified is gcc.dg/tree-prof/update-loopch.c.  When
 GCC removes a basic block, it checks profile information by calling
 check_bb_profile after redirecting incoming edges of the bb.  This certainly
 results in warnings about invalid profile information and causes the case to
 fail.  I will send a patch to skip checking profile information for a basic
 block that is being removed in stage 1 if it sounds reasonable.  For now I just
 tweaked the case itself.

 Bootstrapped and tested on x86_64 and arm_a15.

 Is it OK?


 2014-02-25  Bin Cheng  bin.ch...@arm.com

 PR target/60280
 * tree-cfgcleanup.c (tree_forwarder_block_p): Protect loop
 preheaders and latches only if requested.  Fix latch if it
 is removed.
 * tree-ssa-dom.c (tree_ssa_dominator_optimize): Set
 LOOPS_HAVE_PREHEADERS.


 This change:

 if (dest->loop_father->header == dest)
 -  return false;
 +  {
 +if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
 + && bb->loop_father->header != dest)
 +  return false;
 +
 +if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
 + && bb->loop_father->header == dest)
 +  return false;
 +  }
  }

 miscompiled 435.gromacs in SPEC CPU 2006 on x32 with

 -O3 -funroll-loops -ffast-math -fwhole-program -flto=jobserver
 -fuse-linker-plugin

 This patch changes loops without LOOPS_HAVE_PREHEADERS
 nor LOOPS_HAVE_SIMPLE_LATCHES from returning false to returning
 true.  I don't have a small testcase.  But this patch:

 diff --git a/gcc/tree-cfgcleanup.c b/gcc/tree-cfgcleanup.c
 index b5c384b..2ba673c 100644
 --- a/gcc/tree-cfgcleanup.c
 +++ b/gcc/tree-cfgcleanup.c
 @@ -323,6 +323,10 @@ tree_forwarder_block_p (basic_block bb, bool phi_wanted)
  if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
   && bb->loop_father->header == dest)
return false;
 +
 +if (!loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
 + && !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES))
 +  return false;
}
  }

 fixes the regression.  Does it make any sense?

I think the preheader test isn't fully correct (bb may be in an inner loop
for example).  So a more conservative variant would be

Index: gcc/tree-cfgcleanup.c
===
--- gcc/tree-cfgcleanup.c   (revision 208169)
+++ gcc/tree-cfgcleanup.c   (working copy)
@@ -316,13 +316,13 @@ tree_forwarder_block_p (basic_block bb,
   /* Protect loop preheaders and latches if requested.  */
   if (dest->loop_father->header == dest)
{
- if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
-  && bb->loop_father->header != dest)
-   return false;
-
- if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
-  && bb->loop_father->header == dest)
-   return false;
+ if (bb->loop_father == dest->loop_father)
+   return !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES);
+ else if (bb->loop_father == loop_outer (dest->loop_father))
+   return !loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS);
+ /* Always preserve other edges into loop headers that are
+not simple latches or preheaders.  */
+ return false;
}
 }

that makes sure we can properly update loop information.  It's also
a more conservative change at this point which should still successfully
remove simple latches and preheaders created by loop discovery.
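
The decision made by the conservative variant can be tried out in isolation. The following is a sketch in plain C, not GCC code: the structures are simplified stand-ins that model only the fields the check reads, the flag names and values are hypothetical, and the real tree_forwarder_block_p performs further tests that are not shown in the diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for GCC's loop and basic-block structures.  */
struct bb;
struct loop {
    struct loop *outer;   /* enclosing loop, NULL for the outermost one */
    struct bb *header;
};
struct bb {
    struct loop *loop_father;
};

/* Hypothetical flag values; GCC's own enum is not reproduced here.  */
enum {
    SKETCH_HAVE_PREHEADERS = 1 << 0,
    SKETCH_HAVE_SIMPLE_LATCHES = 1 << 1
};

/* Return true if the header-protection test would allow forwarder block BB
   (whose single successor is DEST) to be removed.  */
static bool header_check_allows_removal(const struct bb *bb,
                                        const struct bb *dest,
                                        unsigned loops_state)
{
    assert(bb && dest && bb->loop_father && dest->loop_father);
    if (dest->loop_father->header != dest)
        return true;      /* DEST is not a loop header: nothing to protect */
    if (bb->loop_father == dest->loop_father)
        return !(loops_state & SKETCH_HAVE_SIMPLE_LATCHES);  /* latch side */
    if (bb->loop_father == dest->loop_father->outer)
        return !(loops_state & SKETCH_HAVE_PREHEADERS);      /* preheader side */
    return false;         /* any other edge into a loop header is preserved */
}
```

A block that sits in a more deeply nested loop and jumps to the header is always preserved, which is the case the original preheader test did not distinguish.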

Does it fix 435.gromacs?

Thanks,
Richard.




 --
 H.J.

RE: [AArch64 05/14] Add AArch64 'prefetch'-pattern.

2014-02-28 Thread Gopalasubramanian, Ganesh

Avoided top-posting and resending.

+  /* temporal locality */
+  return (INTVAL(operands[1])) ? \"prfm\\tPSTL1KEEP, [%0, #0]\" : 
+\"prfm\\tPLDL1KEEP, [%0, #0]\"; }"
+  [(set_attr "type" "prefetch")]
+)
+

With the locality value received in the instruction pattern, I think it would 
be safe to handle it in the prefetch instruction.
This helps especially because AArch64 has prefetch instructions that can 
encode this locality.

+(define_insn "prefetch"
+  [(prefetch (match_operand:DI 0 "address_operand" "r")
+(match_operand:QI 1 "const_int_operand" "n")
+(match_operand:QI 2 "const_int_operand" "n"))]
+  ""
+  "*
+{
+  int locality = INTVAL (operands[2]);
+
+  gcc_assert (IN_RANGE (locality, 0, 3));
+
+  if (locality == 0)
+ /* non temporal locality */
+ return (INTVAL(operands[1])) ? \"prfm\\tPSTL1STRM, [%0, #0]\" : 
\"prfm\\tPLDL1STRM, [%0, #0]\";
+
+  /* temporal locality */
+  return (INTVAL(operands[1])) ? \"prfm\\tPSTL%2KEEP, [%0, #0]\" : 
\"prfm\\tPLDL%2KEEP, [%0, #0]\";
+}"
+  [(set_attr "type" "prefetch")]
+)
+

I have also attached a patch that implements the following:
*   Prefetch with an immediate offset in the range 0 to 32760 (a multiple of 8). 
Added a predicate for this.
*   Prefetch with an immediate offset in the range -256 to 255 (generated 
only when the offset is negative; emits the prfum instruction). 
Added a predicate for this.
*   Prefetch with a register offset (modified to print the locality).

Regards
Ganesh



[gomp4 1/2] Initial support for the OpenACC kernels construct: GIMPLE_OACC_KERNELS.

2014-02-28 Thread Thomas Schwinge

From: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4

gcc/
* gimple.def (GIMPLE_OACC_KERNELS): New code.
* doc/gimple.texi: Document it.
* gimple.h (gimple_has_substatements, CASE_GIMPLE_OMP)
(is_gimple_omp_oacc_specifically): Handle it.
(gimple_statement_oacc_kernels): New struct.
(gimple_build_oacc_kernels): New prototype.
(gimple_oacc_kernels_clauses, gimple_oacc_kernels_clauses_ptr)
(gimple_oacc_kernels_set_clauses, gimple_oacc_kernels_child_fn)
(gimple_oacc_kernels_child_fn_ptr)
(gimple_oacc_kernels_set_child_fn, gimple_oacc_kernels_data_arg)
(gimple_oacc_kernels_data_arg_ptr)
(gimple_oacc_kernels_set_data_arg): New inline functions.
* gimple.c (gimple_build_oacc_kernels): New function.
(gimple_copy): Handle GIMPLE_OACC_KERNELS.
* gimple-low.c (lower_stmt): Likewise.
* gimple-walk.c (walk_gimple_op, walk_gimple_stmt): Likewise.
* gimple-pretty-print.c (pp_gimple_stmt_1): Likewise.
(dump_gimple_oacc_parallel): Rename to dump_gimple_oacc_offload.
Also handle GIMPLE_OACC_KERNELS.  Update all callers.
* gimplify.c (gimplify_omp_workshare, gimplify_expr): Handle
OACC_KERNELS.
* oacc-builtins.def (BUILT_IN_GOACC_KERNELS): New builtin.
* omp-low.c (scan_oacc_parallel, expand_oacc_parallel)
(lower_oacc_parallel): Rename to scan_oacc_offload,
expand_oacc_offload, and lower_oacc_offload.  Also handle
GIMPLE_OACC_KERNELS.  Update all callers.
(scan_sharing_clauses, scan_omp_1_stmt, expand_omp, lower_omp_1)
(diagnose_sb_0, diagnose_sb_1, diagnose_sb_2)
(make_gimple_omp_edges): Handle GIMPLE_OACC_KERNELS.
* tree-inline.c (remap_gimple_stmt, estimate_num_insns): Likewise.
* tree-nested.c (convert_nonlocal_reference_stmt)
(convert_local_reference_stmt, convert_tramp_reference_stmt)
(convert_gimple_call): Likewise.
libgomp/
* libgomp.map (GOACC_2.0): Add GOACC_kernels.
* libgomp_g.h (GOACC_kernels): New prototype.
* oacc-parallel.c (GOACC_kernels): New function.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@208215 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp|  36 +
 gcc/doc/gimple.texi   |   7 +++
 gcc/gimple-low.c  |   1 +
 gcc/gimple-pretty-print.c |  48 -
 gcc/gimple-walk.c |  16 ++
 gcc/gimple.c  |  18 +++
 gcc/gimple.def|  22 +++-
 gcc/gimple.h  | 130 --
 gcc/gimplify.c|   6 ++-
 gcc/oacc-builtins.def |   6 ++-
 gcc/omp-low.c | 116 -
 gcc/tree-inline.c |   2 +
 gcc/tree-nested.c |   4 ++
 libgomp/ChangeLog.gomp|   6 +++
 libgomp/libgomp.map   |   1 +
 libgomp/libgomp_g.h   |   6 ++-
 libgomp/oacc-parallel.c   |  12 -
 17 files changed, 389 insertions(+), 48 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index 3d9b06d..79030d6 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,3 +1,39 @@
+2014-02-28  Thomas Schwinge  tho...@codesourcery.com
+
+   * gimple.def (GIMPLE_OACC_KERNELS): New code.
+   * doc/gimple.texi: Document it.
+   * gimple.h (gimple_has_substatements, CASE_GIMPLE_OMP)
+   (is_gimple_omp_oacc_specifically): Handle it.
+   (gimple_statement_oacc_kernels): New struct.
+   (gimple_build_oacc_kernels): New prototype.
+   (gimple_oacc_kernels_clauses, gimple_oacc_kernels_clauses_ptr)
+   (gimple_oacc_kernels_set_clauses, gimple_oacc_kernels_child_fn)
+   (gimple_oacc_kernels_child_fn_ptr)
+   (gimple_oacc_kernels_set_child_fn, gimple_oacc_kernels_data_arg)
+   (gimple_oacc_kernels_data_arg_ptr)
+   (gimple_oacc_kernels_set_data_arg): New inline functions.
+   * gimple.c (gimple_build_oacc_kernels): New function.
+   (gimple_copy): Handle GIMPLE_OACC_KERNELS.
+   * gimple-low.c (lower_stmt): Likewise.
+   * gimple-walk.c (walk_gimple_op, walk_gimple_stmt): Likewise.
+   * gimple-pretty-print.c (pp_gimple_stmt_1): Likewise.
+   (dump_gimple_oacc_parallel): Rename to dump_gimple_oacc_offload.
+   Also handle GIMPLE_OACC_KERNELS.  Update all callers.
+   * gimplify.c (gimplify_omp_workshare, gimplify_expr): Handle
+   OACC_KERNELS.
+   * oacc-builtins.def (BUILT_IN_GOACC_KERNELS): New builtin.
+   * omp-low.c (scan_oacc_parallel, expand_oacc_parallel)
+   (lower_oacc_parallel): Rename to scan_oacc_offload,
+   expand_oacc_offload, and lower_oacc_offload.  Also handle
+   GIMPLE_OACC_KERNELS.  Update all callers.
+   (scan_sharing_clauses, scan_omp_1_stmt, expand_omp, lower_omp_1)
+   (diagnose_sb_0, diagnose_sb_1, diagnose_sb_2)
+   (make_gimple_omp_edges): Handle

[gomp4 2/2] Initial support for the OpenACC kernels construct in the C front end.

2014-02-28 Thread Thomas Schwinge

From: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4

gcc/c-family/
* c-pragma.c (oacc_pragmas): Add kernels.
* c-pragma.h (enum pragma_kind): Add PRAGMA_OACC_KERNELS.
gcc/c/
* c-parser.c (OACC_KERNELS_CLAUSE_MASK): New macro definition.
(c_parser_oacc_kernels): New function.
(c_parser_omp_construct): Handle PRAGMA_OACC_KERNELS.
* c-tree.h (c_finish_oacc_kernels): New prototype.
* c-typeck.c (c_finish_oacc_kernels): New function.
gcc/testsuite/
* c-c++-common/goacc-gomp/nesting-fail-1.c: Extend for OpenACC
kernels construct.
* c-c++-common/goacc/clauses-fail.c: Likewise.
* c-c++-common/goacc/data-clause-duplicate-1.c: Likewise.
* c-c++-common/goacc/deviceptr-1.c: Likewise.
* c-c++-common/goacc/nesting-fail-1.c: Likewise.
* c-c++-common/goacc/kernels-1.c: New file.
* gcc.dg/goacc/parallel-sb-1.c: Rename to...
* gcc.dg/goacc/sb-1.c: ... this new file, and extend for OpenACC
kernels and data constructs.
* gcc.dg/goacc/parallel-sb-2.c: Rename to...
* gcc.dg/goacc/sb-2.c: ... this new file, and extend for OpenACC
kernels and data constructs.
libgomp/
* testsuite/libgomp.oacc-c/goacc_kernels.c: New file.
* testsuite/libgomp.oacc-c/kernels-1.c: Likewise.
* testsuite/libgomp.oacc-c/parallel-1.c: Add one missing test.
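
For readers unfamiliar with the construct, the kind of source the new parser code is meant to accept looks roughly like the sketch below. It uses standard OpenACC syntax, and the function name is hypothetical. Whether the branch at this revision accepts this particular clause is not stated in this message.

```c
#include <assert.h>
#include <stddef.h>

/* Scale N floats in place.  The pragma asks an OpenACC compiler to offload
   the loop as a kernels region; a compiler without OpenACC support ignores
   it (possibly with a warning) and runs the loop on the host.  */
static void scale(float *a, size_t n, float factor)
{
    assert(a != NULL || n == 0);
#pragma acc kernels copy(a[0:n])
    for (size_t i = 0; i < n; i++)
        a[i] *= factor;
}
```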

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@208216 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/c-family/ChangeLog.gomp|   5 +
 gcc/c-family/c-pragma.c|   1 +
 gcc/c-family/c-pragma.h|   1 +
 gcc/c/ChangeLog.gomp   |   8 +
 gcc/c/c-parser.c   |  42 +
 gcc/c/c-tree.h |   1 +
 gcc/c/c-typeck.c   |  19 +++
 gcc/testsuite/ChangeLog.gomp   |  16 ++
 .../c-c++-common/goacc-gomp/nesting-fail-1.c   |  84 ++
 gcc/testsuite/c-c++-common/goacc/clauses-fail.c|   3 +
 .../c-c++-common/goacc/data-clause-duplicate-1.c   |   4 +-
 gcc/testsuite/c-c++-common/goacc/deviceptr-1.c |  18 +--
 gcc/testsuite/c-c++-common/goacc/kernels-1.c   |   6 +
 gcc/testsuite/c-c++-common/goacc/nesting-fail-1.c  |  20 +++
 gcc/testsuite/gcc.dg/goacc/parallel-sb-1.c |  22 ---
 gcc/testsuite/gcc.dg/goacc/parallel-sb-2.c |  10 --
 gcc/testsuite/gcc.dg/goacc/sb-1.c  |  54 +++
 gcc/testsuite/gcc.dg/goacc/sb-2.c  |  22 +++
 libgomp/ChangeLog.gomp |   4 +
 libgomp/testsuite/libgomp.oacc-c/goacc_kernels.c   |  25 +++
 libgomp/testsuite/libgomp.oacc-c/kernels-1.c   | 170 +
 libgomp/testsuite/libgomp.oacc-c/parallel-1.c  |  14 ++
 22 files changed, 506 insertions(+), 43 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-1.c
 delete mode 100644 gcc/testsuite/gcc.dg/goacc/parallel-sb-1.c
 delete mode 100644 gcc/testsuite/gcc.dg/goacc/parallel-sb-2.c
 create mode 100644 gcc/testsuite/gcc.dg/goacc/sb-1.c
 create mode 100644 gcc/testsuite/gcc.dg/goacc/sb-2.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/goacc_kernels.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/kernels-1.c

diff --git gcc/c-family/ChangeLog.gomp gcc/c-family/ChangeLog.gomp
index 3da377f..3b4a335 100644
--- gcc/c-family/ChangeLog.gomp
+++ gcc/c-family/ChangeLog.gomp
@@ -1,3 +1,8 @@
+2014-02-28  Thomas Schwinge  tho...@codesourcery.com
+
+   * c-pragma.c (oacc_pragmas): Add kernels.
+   * c-pragma.h (enum pragma_kind): Add PRAGMA_OACC_KERNELS.
+
 2014-02-21  Thomas Schwinge  tho...@codesourcery.com
 
* c-pragma.c (oacc_pragmas): Add data.
diff --git gcc/c-family/c-pragma.c gcc/c-family/c-pragma.c
index 08374aa..ee0ee93 100644
--- gcc/c-family/c-pragma.c
+++ gcc/c-family/c-pragma.c
@@ -1170,6 +1170,7 @@ static vec<pragma_ns_name> registered_pp_pragmas;
 struct omp_pragma_def { const char *name; unsigned int id; };
 static const struct omp_pragma_def oacc_pragmas[] = {
   { "data", PRAGMA_OACC_DATA },
+  { "kernels", PRAGMA_OACC_KERNELS },
   { "parallel", PRAGMA_OACC_PARALLEL },
 };
 static const struct omp_pragma_def omp_pragmas[] = {
diff --git gcc/c-family/c-pragma.h gcc/c-family/c-pragma.h
index d092f9f..d55a511 100644
--- gcc/c-family/c-pragma.h
+++ gcc/c-family/c-pragma.h
@@ -28,6 +28,7 @@ typedef enum pragma_kind {
   PRAGMA_NONE = 0,
 
   PRAGMA_OACC_DATA,
+  PRAGMA_OACC_KERNELS,
   PRAGMA_OACC_PARALLEL,
   PRAGMA_OMP_ATOMIC,
   PRAGMA_OMP_BARRIER,
diff --git gcc/c/ChangeLog.gomp gcc/c/ChangeLog.gomp
index 9b95725..0551026 100644
--- gcc/c/ChangeLog.gomp
+++ gcc/c/ChangeLog.gomp
@@ -1,3 +1,11 @@
+2014-02-28  Thomas Schwinge  tho...@codesourcery.com
+
+   * c-parser.c (OACC_KERNELS_CLAUSE_MASK): New macro definition.
+

Re: [AArch64 05/14] Add AArch64 'prefetch'-pattern.

2014-02-28 Thread Dr. Philipp Tomsich

Ganesh,

On 28 Feb 2014, at 10:13 , Gopalasubramanian, Ganesh 
ganesh.gopalasubraman...@amd.com wrote:

 I also have attached a patch that implements the following. 
 * Prefetch with immediate offset in the range 0 to 32760 (multiple of 8). 
 Added a predicate for this.
 * Prefetch with immediate offset - in the range -256 to 255 (Gets 
 generated only when we have a negative offset. Generates prfum instruction). 
 Added a predicate for this.
 * Prefetch with register offset. (modified for printing the locality)

These changes look good to me.
We’ll try them out on the benchmarks that caused us to add prefetching in the 
first place.

Best,
Philipp.

[Patch AArch64] Define TARGET_FLAGS_REGNUM

2014-02-28 Thread Ramana Radhakrishnan


Hi,

	This defines TARGET_FLAGS_REGNUM for AArch64 to be CC_REGNUM. This 
turns on the cmpelim pass after reload, and in a few examples and a 
couple of benchmarks I noticed a number of comparisons getting deleted. 
A similar patch for AArch32 is being tested.
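
As an illustration of the kind of source where this can matter, consider the sketch below (a hypothetical function, not taken from the examples or benchmarks mentioned). The subtraction can be emitted as a flag-setting instruction, which makes a separate compare of the result against zero a candidate for deletion. Whether the compare is actually removed depends on the compiler version and options.

```c
#include <assert.h>
#include <limits.h>

/* Return 1 if A and B are equal, otherwise their difference.  */
static int diff_or_one(int a, int b)
{
    /* Rule out signed overflow, which is undefined behaviour in C.  */
    assert(b >= 0 ? a >= INT_MIN + b : a <= INT_MAX + b);
    int diff = a - b;    /* may become a flag-setting subtract */
    if (diff == 0)       /* the compare against zero is then redundant */
        return 1;
    return diff;
}
```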


Tested cross with aarch64-none-elf on a model with no regressions.

Ok for stage1 ?

regards
Ramana

DATE  Ramana Radhakrishnan  ramana.radhakrish...@arm.com

* config/aarch64/aarch64.c (TARGET_FLAGS_REGNUM): Define.


--
Ramana Radhakrishnan
Principal Engineer
ARM Ltd.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 901ad3de793c2dd6ca3a2458dc6268e56322400a..617f4de494b1c9fa366dcf4a9fc7f22e7d11642a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8536,6 +8536,9 @@ aarch64_cannot_change_mode_class (enum machine_mode from,
 #undef TARGET_FIXED_CONDITION_CODE_REGS
 #define TARGET_FIXED_CONDITION_CODE_REGS aarch64_fixed_condition_code_regs
 
+#undef TARGET_FLAGS_REGNUM
+#define TARGET_FLAGS_REGNUM CC_REGNUM
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-aarch64.h"

Re: [PATCH/AARCH64 1/3] Add AARCH64 ILP32 PCH support

2014-02-28 Thread Richard Earnshaw

On 26/02/14 02:25, Andrew Pinski wrote:
 
 Hi,
   Just like most of the targets out there we should define
 TRY_EMPTY_VM_SPACE to have better PCH support.
 
 OK?  Built and tested on aarch64-linux-gnu with no regressions.
 
 Thanks,
 Andrew Pinski
 
   * config/host-linux.c (TRY_EMPTY_VM_SPACE): Change aarch64 ilp32
   definition.
 ---
  gcc/ChangeLog   |5 +
  gcc/config/host-linux.c |4 +++-
  2 files changed, 8 insertions(+), 1 deletions(-)
 
 diff --git a/gcc/ChangeLog b/gcc/ChangeLog
 index 616d8ec..fd2b6cd 100644
 --- a/gcc/ChangeLog
 +++ b/gcc/ChangeLog
 @@ -1,3 +1,8 @@
 +2014-02-25  Andrew Pinski  apin...@cavium.com
 +
 + * config/host-linux.c (TRY_EMPTY_VM_SPACE): Change aarch64 ilp32
 + definition.
 +
  2014-02-25  Vladimir Makarov  vmaka...@redhat.com
  
   PR rtl-optimization/60317
 diff --git a/gcc/config/host-linux.c b/gcc/config/host-linux.c
 index 17048d7..b298a17 100644
 --- a/gcc/config/host-linux.c
 +++ b/gcc/config/host-linux.c
 @@ -86,8 +86,10 @@
  # define TRY_EMPTY_VM_SPACE  0x6000
  #elif defined(__mc68000__)
  # define TRY_EMPTY_VM_SPACE  0x4000
 -#elif defined(__aarch64__)
 +#elif defined(__aarch64__) && defined(__LP64__)
  # define TRY_EMPTY_VM_SPACE  0x10
 +#elif defined(__aarch64__)
 +# define TRY_EMPTY_VM_SPACE  0x6000
  #elif defined(__ARM_EABI__)
  # define TRY_EMPTY_VM_SPACE 0x6000
  #elif defined(__mips__) && defined(__LP64__)
 

I'd prefer to see this written as:


-#elif defined(__aarch64__)
+#elif defined(__aarch64__) && defined(__ILP32__)
 # define TRY_EMPTY_VM_SPACE0x6000
+#elif defined(__aarch64__)
+# define TRY_EMPTY_VM_SPACE0x10


Since I'd expect there to be a much higher likelihood of another variant
that uses 64-bit pointers (e.g. LLP64) than of there being another variant
that uses 32-bit pointers.
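
The ordering argument can be seen in a small stand-alone sketch. The macro and function names are hypothetical, and a pointer size stands in for the TRY_EMPTY_VM_SPACE value. The specific case (ILP32) is tested first, so any other AArch64 data model, including a future one with 64-bit pointers, falls through to the default.

```c
#include <assert.h>
#include <stddef.h>

/* Test the specific case first; everything else on AArch64 takes the
   64-bit-pointer default.  Other hosts are not modelled and simply use
   their own pointer size.  */
#if defined(__aarch64__) && defined(__ILP32__)
# define EXPECTED_POINTER_BYTES 4
#elif defined(__aarch64__)
# define EXPECTED_POINTER_BYTES 8
#else
# define EXPECTED_POINTER_BYTES sizeof(void *)
#endif

/* Abort if the branch selected above does not match the host.  */
static void check_pointer_size(void)
{
    assert(sizeof(void *) == EXPECTED_POINTER_BYTES);
}
```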

R.

Re: [PATCH GCC]Allow cfgcleanup to remove forwarder loop preheaders and latches

2014-02-28 Thread Richard Biener

On Fri, Feb 28, 2014 at 10:09 AM, Richard Biener
richard.guent...@gmail.com wrote:
 On Fri, Feb 28, 2014 at 1:52 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, Feb 24, 2014 at 9:12 PM, bin.cheng bin.ch...@arm.com wrote:
 Hi,
  This patch fixes the regression reported in PR60280 by removing forward loop
  headers/latches in cfg cleanup where possible.  Several tests are broken by
  this change since cfg cleanup is shared by all optimizers.  Some tests have
  already been fixed by recent patches; I went through and fixed the others.
  One case that needs to be clarified is gcc.dg/tree-prof/update-loopch.c.  When
  GCC removes a basic block, it checks profile information by calling
  check_bb_profile after redirecting incoming edges of the bb.  This certainly
  results in warnings about invalid profile information and causes the case to
  fail.  I will send a patch to skip checking profile information for a basic
  block that is being removed in stage 1 if it sounds reasonable.  For now I just
  tweaked the case itself.

  Bootstrapped and tested on x86_64 and arm_a15.

 Is it OK?


 2014-02-25  Bin Cheng  bin.ch...@arm.com

 PR target/60280
 * tree-cfgcleanup.c (tree_forwarder_block_p): Protect loop
 preheaders and latches only if requested.  Fix latch if it
 is removed.
 * tree-ssa-dom.c (tree_ssa_dominator_optimize): Set
 LOOPS_HAVE_PREHEADERS.


 This change:

  if (dest->loop_father->header == dest)
  -  return false;
  +  {
  +if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
  + && bb->loop_father->header != dest)
  +  return false;
  +
  +if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
  + && bb->loop_father->header == dest)
  +  return false;
 +  }
  }

 miscompiled 435.gromacs in SPEC CPU 2006 on x32 with

 -O3 -funroll-loops -ffast-math -fwhole-program -flto=jobserver
 -fuse-linker-plugin

 This patch changes loops without LOOPS_HAVE_PREHEADERS
 nor LOOPS_HAVE_SIMPLE_LATCHES from returning false to returning
 true.  I don't have a small testcase.  But this patch:

 diff --git a/gcc/tree-cfgcleanup.c b/gcc/tree-cfgcleanup.c
 index b5c384b..2ba673c 100644
 --- a/gcc/tree-cfgcleanup.c
 +++ b/gcc/tree-cfgcleanup.c
 @@ -323,6 +323,10 @@ tree_forwarder_block_p (basic_block bb, bool phi_wanted)
   if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
    && bb->loop_father->header == dest)
 return false;
  +
  +if (!loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
  + && !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES))
  +  return false;
}
  }

  fixes the regression.  Does it make any sense?

 I think the preheader test isn't fully correct (bb may be in an inner loop
 for example).  So a more conservative variant would be

 Index: gcc/tree-cfgcleanup.c
 ===
 --- gcc/tree-cfgcleanup.c   (revision 208169)
 +++ gcc/tree-cfgcleanup.c   (working copy)
 @@ -316,13 +316,13 @@ tree_forwarder_block_p (basic_block bb,
/* Protect loop preheaders and latches if requested.  */
    if (dest->loop_father->header == dest)
 {
 - if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
 -  && bb->loop_father->header != dest)
 -   return false;
 -
 - if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
 -  && bb->loop_father->header == dest)
 -   return false;
 + if (bb->loop_father == dest->loop_father)
 +   return !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES);
 + else if (bb->loop_father == loop_outer (dest->loop_father))
 +   return !loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS);
 + /* Always preserve other edges into loop headers that are
 +not simple latches or preheaders.  */
 + return false;
 }
  }

 that makes sure we can properly update loop information.  It's also
 a more conservative change at this point which should still successfully
 remove simple latches and preheaders created by loop discovery.

I think the patch makes sense anyway and thus I'll install it once it
has passed bootstrap / regtesting.

Another fix that may make sense is to restrict it to
!loops_state_satisfies_p (LOOPS_NEED_FIXUP), though cfgcleanup
itself can end up setting that ... which we eventually should fix if it
still happens.  That is, check if

Index: gcc/tree-cfgcleanup.c
===
--- gcc/tree-cfgcleanup.c   (revision 208169)
+++ gcc/tree-cfgcleanup.c   (working copy)

@@ -729,8 +729,9 @@ cleanup_tree_cfg_noloop (void)

   timevar_pop (TV_TREE_CLEANUP_CFG);

-  if (changed && current_loops)
-loops_state_set (LOOPS_NEED_FIXUP);
+  if (changed && current_loops
+  && !loops_state_satisfies_p (LOOPS_NEED_FIXUP))
+verify_loop_structure ();

   return changed;
 }

trips anywhere (and apply fixes).  That's of course not appropriate at
this stage.

 Does it fix 435.gromacs?

I can't see

Re: [AArch64] 64-bit float vreinterpret implemention

2014-02-28 Thread Alex Velenko


On 25/02/14 18:15, Richard Henderson wrote:

On 02/25/2014 09:02 AM, Alex Velenko wrote:

+(define_expand "aarch64_reinterpretdf<mode>"
+  [(match_operand:DF 0 "register_operand" "")
+   (match_operand:VD_RE 1 "register_operand" "")]
+  "TARGET_SIMD"
+{
+  aarch64_simd_reinterpret (operands[0], operands[1]);
+  DONE;
+})


I believe you want to implement these in aarch64_fold_builtin to fold to a
VIEW_CONVERT_EXPR.  No sense in leaving these opaque until rtl expansion.


r~



Hi Richard,
Thank you for your suggestion. Attached is a patch that implements
it. The testsuite was run on LE and BE compilers with no regressions.

Here is the description of the patch:

This patch introduces a vreinterpret implementation for vectors with 
64-bit float lanes and adds a testcase for those intrinsics.
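
The semantics being folded are "same bits, different type", which is why a VIEW_CONVERT_EXPR is a natural fit. The sketch below shows that idea in portable C for a single 64-bit lane. The helper names are hypothetical; they are deliberately not the arm_neon.h intrinsics, which operate on vector types. The test value assumes IEEE 754 binary64 doubles.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the bit pattern of VALUE as a 64-bit integer.  memcpy is the
   portable way to express a reinterpretation without changing any bits.  */
static uint64_t bits_from_f64(double value)
{
    uint64_t bits;
    assert(sizeof bits == sizeof value);
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* The inverse: view a 64-bit pattern as a double.  */
static double f64_from_bits(uint64_t bits)
{
    double value;
    assert(sizeof value == sizeof bits);
    memcpy(&value, &bits, sizeof value);
    return value;
}
```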


Thanks,
Alex

gcc/

2014-02-28  Alex Velenko  alex.vele...@arm.com

* config/aarch64/aarch64-builtins.c (TYPES_REINTERP): Removed.
(aarch64_types_signed_unsigned_qualifiers): Qualifier added.
(aarch64_types_signed_poly_qualifiers): Likewise.
(aarch64_types_unsigned_signed_qualifiers): Likewise.
(aarch64_types_poly_signed_qualifiers): Likewise.
(TYPES_REINTERP_SS): Type macro added.
(TYPES_REINTERP_SU): Likewise.
(TYPES_REINTERP_SP): Likewise.
(TYPES_REINTERP_US): Likewise.
(TYPES_REINTERP_PS): Likewise.
(aarch64_fold_builtin): New expression folding added.
* config/aarch64/aarch64-simd-builtins.def (REINTERP):
Declarations removed.
(REINTERP_SS): Declarations added.
(REINTERP_US): Likewise.
(REINTERP_PS): Likewise.
(REINTERP_SU): Likewise.
(REINTERP_SP): Likewise.
* config/aarch64/arm_neon.h (vreinterpret_p8_f64): Implemented.
(vreinterpretq_p8_f64): Likewise.
(vreinterpret_p16_f64): Likewise.
(vreinterpretq_p16_f64): Likewise.
(vreinterpret_f32_f64): Likewise.
(vreinterpretq_f32_f64): Likewise.
(vreinterpret_f64_f32): Likewise.
(vreinterpret_f64_p8): Likewise.
(vreinterpret_f64_p16): Likewise.
(vreinterpret_f64_s8): Likewise.
(vreinterpret_f64_s16): Likewise.
(vreinterpret_f64_s32): Likewise.
(vreinterpret_f64_s64): Likewise.
(vreinterpret_f64_u8): Likewise.
(vreinterpret_f64_u16): Likewise.
(vreinterpret_f64_u32): Likewise.
(vreinterpret_f64_u64): Likewise.
(vreinterpretq_f64_f32): Likewise.
(vreinterpretq_f64_p8): Likewise.
(vreinterpretq_f64_p16): Likewise.
(vreinterpretq_f64_s8): Likewise.
(vreinterpretq_f64_s16): Likewise.
(vreinterpretq_f64_s32): Likewise.
(vreinterpretq_f64_s64): Likewise.
(vreinterpretq_f64_u8): Likewise.
(vreinterpretq_f64_u16): Likewise.
(vreinterpretq_f64_u32): Likewise.
(vreinterpretq_f64_u64): Likewise.
(vreinterpret_s64_f64): Likewise.
(vreinterpretq_s64_f64): Likewise.
(vreinterpret_u64_f64): Likewise.
(vreinterpretq_u64_f64): Likewise.
(vreinterpret_s8_f64): Likewise.
(vreinterpretq_s8_f64): Likewise.
(vreinterpret_s16_f64): Likewise.
(vreinterpretq_s16_f64): Likewise.
(vreinterpret_s32_f64): Likewise.
(vreinterpretq_s32_f64): Likewise.
(vreinterpret_u8_f64): Likewise.
(vreinterpretq_u8_f64): Likewise.
(vreinterpret_u16_f64): Likewise.
(vreinterpretq_u16_f64): Likewise.
(vreinterpret_u32_f64): Likewise.
(vreinterpretq_u32_f64): Likewise.

gcc/testsuite/

2014-02-28  Alex Velenko  alex.vele...@arm.com

* gcc.target/aarch64/vreinterpret_f64_1.c: new_testcase
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 5e0e9b94653deb1530955d62d9842c39da95058a..8241f918e3fcfb71144daf1c873ba1ed481a4385 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -147,6 +147,23 @@ aarch64_types_unopu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned };
 #define TYPES_UNOPU (aarch64_types_unopu_qualifiers)
 #define TYPES_CREATE (aarch64_types_unop_qualifiers)
+#define TYPES_REINTERP_SS (aarch64_types_unop_qualifiers)
+static enum aarch64_type_qualifiers
+aarch64_types_unop_su_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_unsigned };
+#define TYPES_REINTERP_SU (aarch64_types_unop_su_qualifiers)
+static enum aarch64_type_qualifiers
+aarch64_types_unop_sp_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_poly };
+#define TYPES_REINTERP_SP (aarch64_types_unop_sp_qualifiers)
+static enum aarch64_type_qualifiers
+aarch64_types_unop_us_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_none };
+#define TYPES_REINTERP_US (aarch64_types_unop_us_qualifiers)
+static enum aarch64_type_qualifiers

Re: [C++ Patch] PR 60314 (ICE with decltype(auto))

2014-02-28 Thread Paolo Carlini


Hi,

On 02/27/2014 08:29 PM, Jason Merrill wrote:

On 02/25/2014 05:03 AM, Paolo Carlini wrote:

here we ICE exactly as we did in c++/53756: the only difference is the
use of decltype(auto) instead of auto. Now, if we compare is_cxx_auto to
is_auto (the front-end helper), evidently there is an inconsistency
about the handling of decltype(auto) and the below fixes the ICE.
However, also clearly the patchlet needs a review, because an out of
class decltype(auto) is already fine. Also, I'm not 100% sure we don't
need a decltype_auto_die, etc.


I think we do need a decltype_auto_die.


Ok, then I tested on x86_64-linux the below.

Thanks!
Paolo.

///
2014-02-28  Paolo Carlini  paolo.carl...@oracle.com

PR c++/60314
* dwarf2out.c (decltype_auto_die): New static.
(gen_subprogram_die): Handle 'decltype(auto)' like 'auto'.
(gen_type_die_with_usage): Handle 'decltype(auto)'.
(is_cxx_auto): Likewise.

/testsuite
2014-02-28  Paolo Carlini  paolo.carl...@oracle.com

PR c++/60314
* g++.dg/cpp1y/auto-fn24.C: New.
Index: dwarf2out.c
===
--- dwarf2out.c (revision 208214)
+++ dwarf2out.c (working copy)
@@ -250,6 +250,9 @@ static GTY(()) section *cold_text_section;
 /* The DIE for C++1y 'auto' in a function return type.  */
 static GTY(()) dw_die_ref auto_die;
 
+/* The DIE for C++1y 'decltype(auto)' in a function return type.  */
+static GTY(()) dw_die_ref decltype_auto_die;
+
 /* Forward declarations for functions defined in this file.  */
 
 static char *stripattributes (const char *);
@@ -10230,7 +10233,8 @@ is_cxx_auto (tree type)
   tree name = TYPE_NAME (type);
   if (TREE_CODE (name) == TYPE_DECL)
name = DECL_NAME (name);
-  if (name == get_identifier ("auto"))
+  if (name == get_identifier ("auto")
+ || name == get_identifier ("decltype(auto)"))
return true;
 }
   return false;
@@ -18022,10 +18026,11 @@ gen_subprogram_die (tree decl, dw_die_ref context_
  if (get_AT_unsigned (old_die, DW_AT_decl_line) != (unsigned) s.line)
add_AT_unsigned (subr_die, DW_AT_decl_line, s.line);
 
- /* If the prototype had an 'auto' return type, emit the real
-type on the definition die.  */
+ /* If the prototype had an 'auto' or 'decltype(auto)' return type,
+emit the real type on the definition die.  */
  if (is_cxx() && debug_info_level > DINFO_LEVEL_TERSE
-  && get_AT_ref (old_die, DW_AT_type) == auto_die)
+  && (get_AT_ref (old_die, DW_AT_type) == auto_die
+ || get_AT_ref (old_die, DW_AT_type) == decltype_auto_die))
add_type_attribute (subr_die, TREE_TYPE (TREE_TYPE (decl)),
0, 0, context_die);
}
@@ -19852,13 +19857,18 @@ gen_type_die_with_usage (tree type, dw_die_ref con
 default:
   if (is_cxx_auto (type))
{
- if (!auto_die)
+ tree name = TYPE_NAME (type);
+ if (TREE_CODE (name) == TYPE_DECL)
+   name = DECL_NAME (name);
+ dw_die_ref *die = (name == get_identifier ("auto")
+? &auto_die : &decltype_auto_die);
+ if (!*die)
{
- auto_die = new_die (DW_TAG_unspecified_type,
- comp_unit_die (), NULL_TREE);
- add_name_attribute (auto_die, "auto");
+ *die = new_die (DW_TAG_unspecified_type,
+ comp_unit_die (), NULL_TREE);
+ add_name_attribute (*die, IDENTIFIER_POINTER (name));
}
- equate_type_number_to_die (type, auto_die);
+ equate_type_number_to_die (type, *die);
  break;
}
   gcc_unreachable ();
Index: testsuite/g++.dg/cpp1y/auto-fn24.C
===
--- testsuite/g++.dg/cpp1y/auto-fn24.C  (revision 0)
+++ testsuite/g++.dg/cpp1y/auto-fn24.C  (working copy)
@@ -0,0 +1,12 @@
+// PR c++/60314
+// { dg-options "-std=c++1y -g" }
+
+// fine
+decltype(auto) qux() { return 42; }
+
+struct foo
+{
+  // also ICEs if not static 
+  static decltype(auto) bar()
+  { return 42; }
+};
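
For readers less familiar with the two placeholder types the patch distinguishes, here is a minimal sketch of how 'auto' and 'decltype(auto)' deduce return types. The names counter, get_by_auto and get_by_decltype_auto are illustrative assumptions and do not come from the patch; only foo::bar mirrors the shape of the testcase.

```cpp
#include <cassert>
#include <type_traits>

// Illustrative global; not taken from the testcase.
int counter = 42;

// Plain 'auto' deduces like a by-value template parameter, so the
// reference-ness of the returned expression is dropped: returns int.
auto get_by_auto () { return (counter); }

// 'decltype(auto)' applies decltype to the returned expression, so a
// parenthesized lvalue yields int&.
decltype(auto) get_by_decltype_auto () { return (counter); }

struct foo
{
  // Same shape as the member in the testcase: a prvalue gives plain int.
  static decltype(auto) bar () { return 42; }
};

static_assert (std::is_same<decltype (get_by_auto ()), int>::value,
               "auto drops the reference");
static_assert (std::is_same<decltype (get_by_decltype_auto ()), int &>::value,
               "decltype(auto) keeps the reference");
static_assert (std::is_same<decltype (foo::bar ()), int>::value,
               "a prvalue return stays int");
```

Because the two placeholders are distinct types in the front end, it seems reasonable that the debug output keeps a separate DIE for each, as the patch does.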

Re: [PATCH v4] PR middle-end/60281

2014-02-28 Thread lin zuojian

On 2014-02-28 15:58, lin zuojian wrote:
 Hi Bernd,
 I agree with you about the mode problem.

 And I have not changed the stack alignment.  What I changed is the
 alignment of the virtual register base.
 Realignment must be done on a !STRICT_ALIGNMENT machine, or emitting
 efficient code is impossible.
Sorry, that should read: realignment must be done on a STRICT_ALIGNMENT machine.
 For example, four "set mem:QI X, REG:QI Y" stores will not be combined into
 one "set mem:SI X1, REG:SI Y1" if X is not known to be SImode aligned.
 To make sure X is SImode aligned, the virtual register base must be realigned.

 With this patch I only intend to make it correct.  Making it optimal is the next task.
 --
 Regards
 lin zuojian.

 On 2014-02-28 15:47, Bernd Edlinger wrote:
 Hi,

 I see the problem too.

 But I think it is not necessary to change the stack alignment
 to solve the problem.

 It appears to me that the code in asan_emit_stack_protection
 is just wrong. It uses SImode when the memory is not aligned
 enough for that mode. This would not happen if that code
 is rewritten to use get_best_mode, and by the way, even on
 x86_64 the emitted code is not optimal, because that target
 could work with DImode more efficiently.

 So, to fix that, it would be better to concentrate on that function,
 and use word_mode instead of SImode, and let get_best_mode
 choose the required mode.


 Regards
 Bernd Edlinger.
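
The point about letting the known alignment pick the store width can be sketched as follows. This is only a toy model: widest_access is a hypothetical helper, it is not GCC's get_best_mode, and it assumes all sizes are given in bytes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not a GCC function.  Returns the widest
// power-of-two access size in bytes that
//   - does not exceed the machine word (word_bytes),
//   - does not exceed the known alignment of the address (align_bytes),
//   - does not exceed the number of bytes still to be written.
std::size_t
widest_access (std::size_t align_bytes, std::size_t word_bytes,
               std::size_t remaining_bytes)
{
  std::size_t size = 1;
  while (size * 2 <= word_bytes
         && size * 2 <= align_bytes
         && size * 2 <= remaining_bytes)
    size *= 2;
  return size;
}
```

With an 8-byte word, an address known to be only 4-byte aligned gets 4-byte stores, while an 8-byte aligned address can use full-word stores, which is the x86_64 case mentioned above.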

Re: [patch] [arm] Fix PR60169 - thumb1 far jump

2014-02-28 Thread Ramana Radhakrishnan

On Fri, Feb 28, 2014 at 2:42 AM, Joey Ye joey...@arm.com wrote:
 Ping. OK for trunk and 4.8?

Ok if no regressions.

Ramana


 -Original Message-
 From: Joey Ye [mailto:joey...@arm.com]
 Sent: 21 February 2014 19:32
 To: gcc-patches@gcc.gnu.org
 Subject: [patch] [arm] Fix PR60169 - thumb1 far jump

 Patch http://gcc.gnu.org/ml/gcc-patches/2012-12/msg01229.html introduced
 this ICE:

 1. thumb1 estimates whether far_jump is used based on the function's insn
 size.
 2. During reload, after the stack layout is finalized, it does
 reload_as_needed.  That, however, increases the insn size, which changes the
 far_jump estimate, which in turn requires saving lr and changing the stack
 layout again.  As there is no chance to change it at that point, GCC crashes.

 Solution:
 Do not change the far_jump estimate if reload_in_progress or
 reload_completed is true.

 LRA is not likely to need a fix, according to Vlad:
 http://gcc.gnu.org/ml/gcc/2014-02/msg00355.html

 ChangeLog:
 * config/arm/arm.c (thumb_far_jump_used_p): Don't change
   if reload in progress or completed.

 * gcc.target/arm/thumb1-far-jump-3.c: New case.
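
The proposed solution can be pictured as an estimate that is frozen once reload has started. The sketch below is a simplified model under stated assumptions: FarJumpState, far_jump_used_p and the 2048-byte limit are hypothetical and are not the code in config/arm/arm.c.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-function state; the real flag lives in GCC's
// machine-specific function data.
struct FarJumpState
{
  bool far_jump_used = false;  // sticky once it has been set
};

// Returns whether the function is treated as needing a far jump.
// limit_bytes is an illustrative branch-range threshold.
bool
far_jump_used_p (FarJumpState &state, std::size_t func_bytes,
                 bool reload_started, std::size_t limit_bytes = 2048)
{
  // A decision made before reload stays in force.
  if (state.far_jump_used)
    return true;

  // Once reload is in progress or completed the stack layout is final,
  // so the estimate must not flip even if the function has grown.
  if (reload_started)
    return false;

  if (func_bytes >= limit_bytes)
    state.far_jump_used = true;
  return state.far_jump_used;
}
```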

[PATCH i386 14/8] [AVX-512] Fix exp2 and sqrt tests.

2014-02-28 Thread Kirill Yukhin

Hello,
This is a relatively obvious patch which eliminates the comparison
of infinities in the exp2 AVX-512 test and properly compares floats
in avx512f-sqrtps-2.c.

Tests pass.

Is it ok for trunk?

gcc/testsuite/
* gcc.target/i386/avx512er-vexp2ps-2.c: Decrease exponent
argument to avoid inf values.
* gcc.target/i386/avx512er-vexp2ps-2.c: Compare results with
UNION_FP_CHECK machinery.

--
Thanks, K

---
 gcc/testsuite/gcc.target/i386/avx512er-vexp2ps-2.c | 2 +-
 gcc/testsuite/gcc.target/i386/avx512f-vdivps-2.c   | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/avx512er-vexp2ps-2.c 
b/gcc/testsuite/gcc.target/i386/avx512er-vexp2ps-2.c
index 06ef68c..ab911c0 100644
--- a/gcc/testsuite/gcc.target/i386/avx512er-vexp2ps-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512er-vexp2ps-2.c
@@ -25,7 +25,7 @@ avx512er_test (void)

   for (i = 0; i < 16; i++)
 {
-  src.a[i] = 179.345 - 6.5645 * i;
+  src.a[i] = 79.345 - 6.5645 * i;
   res2.a[i] = DEFAULT_VALUE;
 }

diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vdivps-2.c 
b/gcc/testsuite/gcc.target/i386/avx512f-vdivps-2.c
index 5249bbd..f5a7b78 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-vdivps-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vdivps-2.c
@@ -46,10 +46,10 @@ TEST (void)
 abort ();

   MASK_MERGE () (res_ref, mask, SIZE);
-  if (UNION_CHECK (AVX512F_LEN,) (res2, res_ref))
+  if (UNION_FP_CHECK (AVX512F_LEN,) (res2, res_ref))
 abort ();

   MASK_ZERO () (res_ref, mask, SIZE);
-  if (UNION_CHECK (AVX512F_LEN,) (res3, res_ref))
+  if (UNION_FP_CHECK (AVX512F_LEN,) (res3, res_ref))
 abort ();
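
Both changes can be illustrated with a short sketch. It shows why an exponent argument near 179 overflows single precision while one near 79 does not, and what a tolerance-based float comparison looks like. nearly_equal and its 1e-5 relative tolerance are assumptions for illustration; they are not the definition of UNION_FP_CHECK.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// True if 2^x is not representable as a finite float.
bool
exp2_overflows (float x)
{
  return std::isinf (std::exp2 (x));
}

// Hypothetical comparison helper: relative tolerance for finite values,
// exact match required when either side is infinite or NaN.
bool
nearly_equal (float a, float b, float rel_tol = 1e-5f)
{
  if (!std::isfinite (a) || !std::isfinite (b))
    return a == b;
  float scale = std::max (std::fabs (a), std::fabs (b));
  return std::fabs (a - b) <= rel_tol * scale;
}
```

Single precision tops out just below 2^128, so the old starting value 179.345 produced infinities for the first several loop iterations, whereas 79.345 keeps every result finite.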

Re: [PATCH] [libgcc,arm] Fix PR 60166 - NAN fraction bits

2014-02-28 Thread Ramana Radhakrishnan

On Fri, Feb 28, 2014 at 7:16 AM, Joey Ye joey...@arm.com wrote:
 This patch is a mirror copy from approved patch in glibc:
 http://sourceware.org/ml/libc-alpha/2014-02/msg00741.html

 OK to trunk, 4.8 and 4.7?

OK everywhere.

Ramana


 ChangeLog.libgcc:

 * config/arm/sfp-machine.h (_FP_NANFRAC_H,
   _FP_NANFRAC_S, _FP_NANFRAC_D, _FP_NANFRAC_Q):
   Set to zero.

 diff --git a/libgcc/config/arm/sfp-machine.h
 b/libgcc/config/arm/sfp-machine.h
 index bb34895..8d45320 100644
 --- a/libgcc/config/arm/sfp-machine.h
 +++ b/libgcc/config/arm/sfp-machine.h
 @@ -19,10 +19,12 @@ typedef int __gcc_CMPtype __attribute__ ((mode
 (__libgcc_cmp_return__)));
  #define _FP_DIV_MEAT_D(R,X,Y)  _FP_DIV_MEAT_2_udiv(D,R,X,Y)
  #define _FP_DIV_MEAT_Q(R,X,Y)  _FP_DIV_MEAT_4_udiv(Q,R,X,Y)

 -#define _FP_NANFRAC_H  ((_FP_QNANBIT_H << 1) - 1)
 -#define _FP_NANFRAC_S  ((_FP_QNANBIT_S << 1) - 1)
 -#define _FP_NANFRAC_D  ((_FP_QNANBIT_D << 1) - 1), -1
 -#define _FP_NANFRAC_Q  ((_FP_QNANBIT_Q << 1) - 1), -1, -1, -1
 +/* According to RTABI, QNAN is only with the most significant bit of the
 +   significand set, and all other significand bits zero.  */
 +#define _FP_NANFRAC_H  0
 +#define _FP_NANFRAC_S  0
 +#define _FP_NANFRAC_D  0, 0
 +#define _FP_NANFRAC_Q  0, 0, 0, 0
  #define _FP_NANSIGN_H  0
  #define _FP_NANSIGN_S  0
  #define _FP_NANSIGN_D  0
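
For single precision, the effect of the change on the generated default NaN can be sketched like this. The constants follow the IEEE 754 binary32 layout; make_qnan_bits and bits_to_float are hypothetical helpers and are not part of soft-fp.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

constexpr std::uint32_t kExpMaskS = 0x7f800000u;   // all exponent bits set
constexpr std::uint32_t kQuietBitS = 0x00400000u;  // top significand bit
constexpr std::uint32_t kFracMaskS = 0x007fffffu;  // 23 fraction bits

// Builds a positive quiet NaN whose remaining fraction bits come from
// nanfrac (the role _FP_NANFRAC_S plays in the patch).
std::uint32_t
make_qnan_bits (std::uint32_t nanfrac)
{
  return kExpMaskS | kQuietBitS | (nanfrac & kFracMaskS);
}

// Reinterprets a bit pattern as a float without aliasing problems.
float
bits_to_float (std::uint32_t bits)
{
  float f;
  std::memcpy (&f, &bits, sizeof f);
  return f;
}
```

With the old all-ones fraction the pattern is 0x7fffffff; with the new zero fraction it is 0x7fc00000, that is, only the most significant significand bit set, as the comment in the patch describes.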

[PATCH] Restrict and fix the PR60280 fix

2014-02-28 Thread Richard Biener


This narrows down the effect of the PR60280 fix (removing more
forwarder blocks during cfg-cleanup when loops are present) to
only remove forwarders of the kind loop_optimizer_init would create.
It also fixes the loop latch updating in remove_forwarder_block
(though that doesn't have any immediate effect as we fix up loops
anyway) - it was set to the wrong loop.  That also made me
realize that we don't honor !LOOPS_MAY_HAVE_MULTIPLE_LATCHES
properly (also fixed).

Maybe one of the above will fix the gromacs miscompare HJ is seeing
(I can't reproduce it).

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2014-02-28  Richard Biener  rguent...@suse.de

PR target/60280
* tree-cfgcleanup.c (tree_forwarder_block_p): Restrict
previous fix and only allow to remove trivial pre-headers
and latches.  Also honor LOOPS_MAY_HAVE_MULTIPLE_LATCHES.
(remove_forwarder_block): Properly update the latch of
a loop.

Index: gcc/tree-cfgcleanup.c
===
--- gcc/tree-cfgcleanup.c   (revision 208216)
+++ gcc/tree-cfgcleanup.c   (working copy)
@@ -316,13 +316,22 @@ tree_forwarder_block_p (basic_block bb,
   /* Protect loop preheaders and latches if requested.  */
   if (dest->loop_father->header == dest)
{
- if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
-  && bb->loop_father->header != dest)
-   return false;
-
- if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
-  && bb->loop_father->header == dest)
-   return false;
+ if (bb->loop_father == dest->loop_father)
+   {
+ if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES))
+   return false;
+ /* If bb doesn't have a single predecessor we'd make this
+loop have multiple latches.  Don't do that if that
+would in turn require disambiguating them.  */
+ return (single_pred_p (bb)
+ || loops_state_satisfies_p
+  (LOOPS_MAY_HAVE_MULTIPLE_LATCHES));
+   }
+ else if (bb->loop_father == loop_outer (dest->loop_father))
+   return !loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS);
+ /* Always preserve other edges into loop headers that are
+not simple latches or preheaders.  */
+ return false;
}
 }
 
@@ -417,6 +426,10 @@ remove_forwarder_block (basic_block bb)
 
   can_move_debug_stmts = MAY_HAVE_DEBUG_STMTS && single_pred_p (dest);
 
+  basic_block pred = NULL;
+  if (single_pred_p (bb))
+pred = single_pred (bb);
+
   /* Redirect the edges.  */
   for (ei = ei_start (bb->preds); (e = ei_safe_edge (ei)); )
 {
@@ -510,7 +523,7 @@ remove_forwarder_block (basic_block bb)
   /* Adjust latch infomation of BB's parent loop as otherwise
  the cfg hook has a hard time not to kill the loop.  */
   if (current_loops && bb->loop_father->latch == bb)
-bb->loop_father->latch = dest;
+bb->loop_father->latch = pred;
 
   /* And kill the forwarder block.  */
   delete_basic_block (bb);
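
The decision made by the first hunk, for a forwarder block bb whose destination is a loop header, can be modelled in isolation as below. This is a toy restatement with hypothetical types (Loop, LoopState, may_remove_forwarder_into_header); it does not use GCC's cfgloop data structures.

```cpp
#include <cassert>

// Toy loop node: only the enclosing loop matters for this decision.
struct Loop
{
  const Loop *outer;
};

// Illustrative stand-ins for the loops-state flags named in the patch.
enum LoopState : unsigned
{
  HAVE_PREHEADERS = 1u << 0,
  HAVE_SIMPLE_LATCHES = 1u << 1,
  MAY_HAVE_MULTIPLE_LATCHES = 1u << 2
};

// bb_loop is the loop containing the forwarder block; dest_loop is the
// loop whose header the forwarder jumps to.
bool
may_remove_forwarder_into_header (const Loop *bb_loop, const Loop *dest_loop,
                                  bool bb_has_single_pred, unsigned state)
{
  if (bb_loop == dest_loop)
    {
      // bb acts as a latch of dest_loop.
      if (state & HAVE_SIMPLE_LATCHES)
        return false;
      // Removing a latch with several predecessors creates several
      // latches, which is only fine if that state is permitted.
      return bb_has_single_pred || (state & MAY_HAVE_MULTIPLE_LATCHES) != 0;
    }
  if (bb_loop == dest_loop->outer)
    // bb acts as a preheader of dest_loop.
    return (state & HAVE_PREHEADERS) == 0;
  // Any other edge into a loop header is preserved.
  return false;
}
```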

Re: [PATCH i386 14/8] [AVX-512] Fix exp2 and sqrt tests.

2014-02-28 Thread Uros Bizjak

On Fri, Feb 28, 2014 at 1:14 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello,
 This is a relatively obvious patch which eliminates the comparison
 of infinities in the exp2 AVX-512 test and properly compares floats
 in avx512f-sqrtps-2.c.

 Tests pass.

 Is it ok for trunk?

 gcc/testsuite/
 * gcc.target/i386/avx512er-vexp2ps-2.c: Decrease exponent
 argument to avoid inf values.
 * gcc.target/i386/avx512er-vexp2ps-2.c: Compare results with
 UNION_FP_CHECK machinery.

You are talking about avx512f-sqrtps-2.c, the ChangeLog refers to
avx512er-vexp2ps-2.c, but the patch is modifying avx512f-vdivps-2.c.

Uros.

Re: copyright dates in binutils (and includes/)

2014-02-28 Thread Joel Brobecker

 Joseph, do you know why implicitly adding years to the claimed
 copyright years is a problem?  I'm guessing the file needs to be
 published somewhere for each year claimed.

IANAL, but from 2 discussions with copyright-clerk:

  1. We start claiming copyright the year the file was committed
 to a medium (hard drive), not the year it was published.

  2. As long as we have evidence of a copyrightable change each year,
 we can include that year in the list of copyright years in
 all files' headers.

For (2), this is how I asked the FSF:

 My question is: As we have evidence of copyrightable changes to the
 GDB project every year since 1986, is it acceptable to fix the copyright
 headers to add the missing holes? And if yes, is it acceptable to go
 straight to the next step, which is reducing the copyright years to
 a single range, even if the original list had holes in it? (we will
 make sure that the first year of the range is always 1986 or later,
 or else investigate to make sure that the range is correct).

 For example, we would reduce:

  Copyright (C) 1986, 1988-1989, 1991-1993, 1999-2000, 2007-2012 Free
  Software Foundation, Inc.

 into:

  1986-2012 Free Software Foundation, Inc.

 Naturally, if the initial year was 1995, then it would be the year
 used as the start of the range!

... to which they answered that it would be acceptable.

Does it mean that the sources needed to be made public that year for
us to be able to claim copyright that year? It did not seem so to me.
But you could ask the FSF (copyright DASH clerk AT fsf DOT org).

-- 
Joel
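
The reduction described in the question to the FSF amounts to replacing a list of years by its first and last year. A minimal sketch follows; collapse_years is a hypothetical helper, not update-copyright.py, and it assumes the caller has already established that every year in the range may legitimately be claimed.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Collapses a list of copyright years into "FIRST-LAST", or a single
// year if only one distinct year is present.  Gaps are filled
// implicitly, so this is only appropriate under the conditions
// discussed in the thread.
std::string
collapse_years (const std::vector<int> &years)
{
  if (years.empty ())
    return "";
  auto range = std::minmax_element (years.begin (), years.end ());
  int first = *range.first;
  int last = *range.second;
  if (first == last)
    return std::to_string (first);
  return std::to_string (first) + "-" + std::to_string (last);
}
```

Applied to the years in the example header above, this yields "1986-2012"; a file first committed in 1995 would start its range at 1995.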

[jit] New API entrypoint: gcc_jit_block_get_function

2014-02-28 Thread David Malcolm

Committed to branch dmalcolm/jit:

gcc/jit/
* libgccjit.h (gcc_jit_block_get_function): New.
* libgccjit.map (gcc_jit_block_get_function): New.
* libgccjit++.h (gccjit::block::get_function): New method.
* libgccjit.c (gcc_jit_block_get_function): New.
---
 gcc/jit/ChangeLog.jit | 7 +++
 gcc/jit/libgccjit++.h | 8 
 gcc/jit/libgccjit.c   | 8 
 gcc/jit/libgccjit.h   | 4 
 gcc/jit/libgccjit.map | 1 +
 5 files changed, 28 insertions(+)

diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit
index c7b2395..6c43ce9 100644
--- a/gcc/jit/ChangeLog.jit
+++ b/gcc/jit/ChangeLog.jit
@@ -1,3 +1,10 @@
+2014-02-28  David Malcolm  dmalc...@redhat.com
+
+   * libgccjit.h (gcc_jit_block_get_function): New.
+   * libgccjit.map (gcc_jit_block_get_function): New.
+   * libgccjit++.h (gccjit::block::get_function): New method.
+   * libgccjit.c (gcc_jit_block_get_function): New.
+
 2014-02-27  David Malcolm  dmalc...@redhat.com
 
* libgccjit.h (gcc_jit_label): Delete in favor of...
diff --git a/gcc/jit/libgccjit++.h b/gcc/jit/libgccjit++.h
index a8801a3..7c1c3be 100644
--- a/gcc/jit/libgccjit++.h
+++ b/gcc/jit/libgccjit++.h
@@ -316,6 +316,8 @@ namespace gccjit
 
 gcc_jit_block *get_inner_block () const;
 
+function get_function () const;
+
 void add_eval (rvalue rvalue,
   location loc = location ());
 
@@ -1109,6 +,12 @@ function::new_local (type type_,
 name.c_str ()));
 }
 
+inline function
+block::get_function () const
+{
+  return function (gcc_jit_block_get_function ( get_inner_block ()));
+}
+
 inline void
 block::add_eval (rvalue rvalue,
 location loc)
diff --git a/gcc/jit/libgccjit.c b/gcc/jit/libgccjit.c
index 1146261..ce7987c 100644
--- a/gcc/jit/libgccjit.c
+++ b/gcc/jit/libgccjit.c
@@ -591,6 +591,14 @@ gcc_jit_block_as_object (gcc_jit_block *block)
   return static_cast <gcc_jit_object *> (block->as_object ());
 }
 
+gcc_jit_function *
+gcc_jit_block_get_function (gcc_jit_block *block)
+{
+  RETURN_NULL_IF_FAIL (block, NULL, "NULL block");
+
+  return static_cast <gcc_jit_function *> (block->get_function ());
+}
+
 gcc_jit_lvalue *
 gcc_jit_context_new_global (gcc_jit_context *ctxt,
gcc_jit_location *loc,
diff --git a/gcc/jit/libgccjit.h b/gcc/jit/libgccjit.h
index c24fddd..f00d672 100644
--- a/gcc/jit/libgccjit.h
+++ b/gcc/jit/libgccjit.h
@@ -503,6 +503,10 @@ gcc_jit_function_new_block (gcc_jit_function *func,
 extern gcc_jit_object *
 gcc_jit_block_as_object (gcc_jit_block *block);
 
+/* Which function is this block within?  */
+extern gcc_jit_function *
+gcc_jit_block_get_function (gcc_jit_block *block);
+
 /**
  lvalues, rvalues and expressions.
  **/
diff --git a/gcc/jit/libgccjit.map b/gcc/jit/libgccjit.map
index 48fd9d2..9f6a466 100644
--- a/gcc/jit/libgccjit.map
+++ b/gcc/jit/libgccjit.map
@@ -11,6 +11,7 @@
 gcc_jit_block_end_with_jump;
 gcc_jit_block_end_with_return;
 gcc_jit_block_end_with_void_return;
+gcc_jit_block_get_function;
 gcc_jit_context_acquire;
 gcc_jit_context_compile;
 gcc_jit_context_dump_to_file;
-- 
1.7.11.7

Re: [C++ Patch] PR 60314 (ICE with decltype(auto))

2014-02-28 Thread Jason Merrill


OK, thanks.

Jason

Re: [C++ Patch] PR 58610

2014-02-28 Thread Jason Merrill


OK.

Jason

Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation

2014-02-28 Thread Ilya Verbin

2014-02-20 22:27 GMT+04:00 Bernd Schmidt ber...@codesourcery.com:
  * Functions and variables now go into different tables, otherwise
intermixing between them could be a problem that causes tables to
go out of sync between host and target (imagine one big table being
generated by ptx lto1/mkoffload, and multiple small table fragments
being linked together on the host side).

If you need 2 different tables for funcs and vars, we can also use
them. But I still don't understand how it will help synchronization
between host and target tables.

  * I've put the begin/end fragments for the host tables into crtstuff,
which seems like the standard way of doing things.

Our plan was that the host side descriptor __OPENMP_TARGET__ will
contain (in addition to func/var table) pointers to the images for all
enabled accelerators (e.g. omp_image_nvptx_start and
omp_image_intelmic_start), therefore we generated it in the
lto-wrapper. But if the number of accelerators and their types/names
will be defined during configuration, then it's ok to generate the
descriptor in crtstuff.

  * Is there a reason to call a register function for the host tables?
The way I've set it up, we register a target function/variable table
while also passing a pointer to the __OPENMP_TARGET__ symbol which
holds information about the host side tables.

In our case we can't register target table with a call to libgomp, it
can be obtained only from the accelerator. Therefore we propose a
target-independent approach: during device initialization libgomp
calls 2 functions from the plugin (or this can be implemented by a
single function):
1. devicep-device_load_image_func, which will load target image (its
pointer will be taken from the host descriptor);
2. devicep-device_get_table_func, which in our case connects to the
device and receives its table. And in your case it will return
func_mappings and var_mappings. Will it work for you?

  * An offload compiler is built with --enable-as-accelerator-for=, which
eliminates the need for -fopenmp-target, and changes install paths so
that the host compiler knows where to find it. No need for
OFFLOAD_TARGET_COMPILERS anymore.

Unfortunately I don't fully understand this configure magic... When a
user specifies 2 or 3 accelerators during configuration with
--enable-accelerators, will several different accel-gccs be built?

Thanks,
  -- Ilya

Re: [PATCH GCC]Allow cfgcleanup to remove forwarder loop preheaders and latches

2014-02-28 Thread H.J. Lu

On Fri, Feb 28, 2014 at 2:09 AM, Richard Biener
richard.guent...@gmail.com wrote:
 On Fri, Feb 28, 2014 at 10:09 AM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Fri, Feb 28, 2014 at 1:52 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, Feb 24, 2014 at 9:12 PM, bin.cheng bin.ch...@arm.com wrote:
 Hi,
 This patch fixes the regression reported in PR60280 by removing forwarder
 loop preheaders/latches in cfg cleanup where possible.  Several tests are
 broken by this change since cfg cleanup is shared by all optimizers.  Some
 tests have already been fixed by recent patches; I went through and fixed
 the others.  One case that needs clarification is
 gcc.dg/tree-prof/update-loopch.c.  When GCC removes a basic block, it
 checks profile information by calling check_bb_profile after redirecting
 the incoming edges of the bb.  This certainly results in warnings about
 invalid profile information and causes the case to fail.  I will send a
 patch to skip checking profile information for a basic block being removed
 in stage 1 if that sounds reasonable.  For now I just tweaked the test case
 itself.

 Bootstrap and tested on x86_64 and arm_a15.

 Is it OK?


 2014-02-25  Bin Cheng  bin.ch...@arm.com

 PR target/60280
 * tree-cfgcleanup.c (tree_forwarder_block_p): Protect loop
 preheaders and latches only if requested.  Fix latch if it
 is removed.
 * tree-ssa-dom.c (tree_ssa_dominator_optimize): Set
 LOOPS_HAVE_PREHEADERS.


 This change:

 if (dest->loop_father->header == dest)
 -  return false;
 +  {
 +if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
 + && bb->loop_father->header != dest)
 +  return false;
 +
 +if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
 + && bb->loop_father->header == dest)
 +  return false;
 +  }
  }

 miscompiled 435.gromacs in SPEC CPU 2006 on x32 with

 -O3 -funroll-loops -ffast-math -fwhole-program -flto=jobserver
 -fuse-linker-plugin

 This patch changes loops without LOOPS_HAVE_PREHEADERS
 nor LOOPS_HAVE_SIMPLE_LATCHES from returning false to returning
 true.  I don't have a small testcase.  But this patch:

 diff --git a/gcc/tree-cfgcleanup.c b/gcc/tree-cfgcleanup.c
 index b5c384b..2ba673c 100644
 --- a/gcc/tree-cfgcleanup.c
 +++ b/gcc/tree-cfgcleanup.c
 @@ -323,6 +323,10 @@ tree_forwarder_block_p (basic_block bb, bool 
 phi_wanted)
  if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
   && bb->loop_father->header == dest)
return false;
 +
 +if (!loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
 + && !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES))
 +  return false;
}
  }

 fixes the regression.  Does it make any sense?

 I think the preheader test isn't fully correct (bb may be in an inner loop
 for example).  So a more conservative variant would be

 Index: gcc/tree-cfgcleanup.c
 ===
 --- gcc/tree-cfgcleanup.c   (revision 208169)
 +++ gcc/tree-cfgcleanup.c   (working copy)
 @@ -316,13 +316,13 @@ tree_forwarder_block_p (basic_block bb,
/* Protect loop preheaders and latches if requested.  */
if (dest->loop_father->header == dest)
 {
 - if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
 -  && bb->loop_father->header != dest)
 -   return false;
 -
 - if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
 -  && bb->loop_father->header == dest)
 -   return false;
 + if (bb->loop_father == dest->loop_father)
 +   return !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES);
 + else if (bb->loop_father == loop_outer (dest->loop_father))
 +   return !loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS);
 + /* Always preserve other edges into loop headers that are
 +not simple latches or preheaders.  */
 + return false;
 }
  }

 that makes sure we can properly update loop information.  It's also
 a more conservative change at this point which should still successfully
 remove simple latches and preheaders created by loop discovery.

 I think the patch makes sense anyway and thus I'll install it once it
 passed bootstrap / regtesting.

 Another fix that may make sense is to restrict it to
 !loops_state_satisfies_p (LOOPS_NEED_FIXUP), though cfgcleanup
 itself can end up setting that ... which we eventually should fix if it
 still happens.  That is, check if

 Index: gcc/tree-cfgcleanup.c
 ===
 --- gcc/tree-cfgcleanup.c   (revision 208169)
 +++ gcc/tree-cfgcleanup.c   (working copy)

 @@ -729,8 +729,9 @@ cleanup_tree_cfg_noloop (void)

timevar_pop (TV_TREE_CLEANUP_CFG);

 -  if (changed && current_loops)
 -loops_state_set (LOOPS_NEED_FIXUP);
 +  if (changed && current_loops
 +   && !loops_state_satisfies_p (LOOPS_NEED_FIXUP))
 +verify_loop_structure ();

return changed;
  }

 trips

Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation

2014-02-28 Thread Bernd Schmidt


On 02/28/2014 05:09 PM, Ilya Verbin wrote:

2014-02-20 22:27 GMT+04:00 Bernd Schmidt ber...@codesourcery.com:

  * Functions and variables now go into different tables, otherwise
intermixing between them could be a problem that causes tables to
go out of sync between host and target (imagine one big table being
generated by ptx lto1/mkoffload, and multiple small table fragments
being linked together on the host side).


If you need 2 different tables for funcs and vars, we can also use
them. But I still don't understand how it will help synchronization
between host and target tables.


I think it won't help that much - I still think this entire scheme is 
likely to fail on nvptx. I'll try to construct an example at some point.


One other thing about the split tables is that we don't have to write a 
useless size of 1 for functions.



  * I've put the begin/end fragments for the host tables into crtstuff,
which seems like the standard way of doing things.


Our plan was that the host side descriptor __OPENMP_TARGET__ will
contain (in addition to func/var table) pointers to the images for all
enabled accelerators (e.g. omp_image_nvptx_start and
omp_image_intelmic_start), therefore we generated it in the
lto-wrapper.


The concept of image is likely to vary somewhat between accelerators. 
For ptx, it's just a string and it can't really be generated the same 
way as for your target where you can manipulate ELF images. So I think 
it is better to have a call to a gomp registration function for every 
offload target. That should also give you the ordering you said you 
wanted between shared libraries.



  * Is there a reason to call a register function for the host tables?
The way I've set it up, we register a target function/variable table
while also passing a pointer to the __OPENMP_TARGET__ symbol which
holds information about the host side tables.


In our case we can't register target table with a call to libgomp, it
can be obtained only from the accelerator. Therefore we propose a
target-independent approach: during device initialization libgomp
calls 2 functions from the plugin (or this can be implemented by a
single function):
1. devicep-device_load_image_func, which will load target image (its
pointer will be taken from the host descriptor);
2. devicep-device_get_table_func, which in our case connects to the
device and receives its table. And in your case it will return
func_mappings and var_mappings. Will it work for you?


Probably. I think the constructor call to the gomp registration function 
would contain an opaque pointer to whatever data the target wants, so it 
can arrange its image/table data in whatever way it likes.


It would help to see the code you have on the libgomp side, I don't 
believe that's been posted yet?



Unfortunately I don't fully understand this configure magic... When a
user specifies 2 or 3 accelerators during configuration with
--enable-accelerators, will several different accel-gccs be built?


No - the idea is that --enable-accelerator= is likely specific to ptx, 
where we really just want to build a gcc and no target libraries, so 
building it alongside the host in an accel-gcc subdirectory is ideal.


For your use case, I'd imagine the offload compiler would be built 
relatively normally as a full build with 
--enable-as-accelerator-for=x86_64-linux, which would install it into 
locations where the host will eventually be able to find it. Then the 
host compiler would be built with another new configure option (as yet 
unimplemented in my patch set) --enable-offload-targets=mic,... which 
would tell the host compiler about the pre-built offload target 
compilers. On the ptx side, --enable-accelerator=ptx would then also 
add ptx to the list of --enable-offload-targets.
Naming of all these configure options can be discussed, I have no real 
preference for any of them.



Bernd

Re: [PATCH] Fix epilogue bb expansion (PR middle-end/60175)

2014-02-28 Thread Richard Henderson

On 02/17/2014 11:45 AM, Jakub Jelinek wrote:
 Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
 
 2014-02-17  Jakub Jelinek  ja...@redhat.com
 
   PR middle-end/60175
   * function.c (expand_function_end): Don't emit
   clobber_return_register sequence if clobber_after is a BARRIER.
   * cfgexpand.c (construct_exit_block): Append instructions before
   return_label to prev_bb.

Ok.


r~

[wwwdocs] GSoC2014 and POWER8 News items

2014-02-28 Thread David Edelsohn

I added a news item for GSoC2014. I also realized that POWER8 support
had not been added to the News announcements, so I inserted an item.

Thanks, David


Index: index.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/index.html,v
retrieving revision 1.905
diff -c -p -r1.905 index.html
*** index.html  17 Feb 2014 08:28:36 -  1.905
--- index.html  28 Feb 2014 16:41:17 -
*** mission statement/a./p
*** 48,58 
  td style=width: 50%; padding-right: 8px; valign=top


-
  h2 style=margin-top:0pt; id=newsNews/h2

  dl class=news

  dtspanIntel AVX-512 support/span
  span class=date[2014-02-17]/span/dt
  ddIntel AVX-512 support was added to GCC.  That includes inline
--- 48,63 
  td style=width: 50%; padding-right: 8px; valign=top


  h2 style=margin-top:0pt; id=newsNews/h2

  dl class=news

+ dtspanGCC Google Summer of Code 2014/span
+ span class=date[2014-02-24]/span/dt
+ ddGCC has been accepted as a
+ a href=http://www.google-melange.com/gsoc/org2/google/gsoc2014/gcc;Goog
le Summer of Code 2014 project/a.
+ Students, mentors and project ideas welcome!/dd
+
  dtspanIntel AVX-512 support/span
  span class=date[2014-02-17]/span/dt
  ddIntel AVX-512 support was added to GCC.  That includes inline
*** mission statement/a./p
*** 109,114 
--- 114,126 
  a href=https://plus.google.com/108467477471815191158; rel=publisher ta
rget=_blankGoogle+/a
   to help developers stay informed of progress./dd

+ dtspanIBM POWER8 support/span
+ span class=date[2013-07-15]/span/dt
+ ddSupport for the POWER8 processor has been contributed by IBM.
+ This includes new VSX, HTM and atomic instructions, new intrinsics,
+ and scheduling improvements. Little Endian support also has been
+ enhanced, including control over vector element endianness./dd
+
  dtspana href=gcc-4.8/GCC 4.8.1/a released/span
  span class=date[2013-05-31]/span/dt
  dd/dd

Re: [PATCH] Handle more COMDAT profiling issues

2014-02-28 Thread Teresa Johnson

 Here's the new patch. The only changes from the earlier patch are in
 handle_missing_profiles, where we now get the counts off of the entry
 and call stmt bbs, and in tree_profiling, where we call
 handle_missing_profiles earlier and I have removed the outlined cgraph
 rebuilding code since it doesn't need to be reinvoked.

 Honza, does this look ok for trunk when stage 1 reopens? David, I can
 send a similar patch for review to google-4_8 if it looks good.

 Thanks,
 Teresa

...

 Spec testing of my earlier patch hit an issue with the call to
 gimple_bb in this routine, since the caller was a thunk and therefore
 the edge did not have a call_stmt set. I've attached a slightly
 modified patch that guards the call by a check to
 cgraph_function_with_gimple_body_p. Regression and spec testing are
 clean.

 Teresa

Ping - Honza, does this patch look ok for stage 1?

Thanks,
Teresa


-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

Re: [PATCH GCC]Allow cfgcleanup to remove forwarder loop preheaders and latches

2014-02-28 Thread H.J. Lu

On Fri, Feb 28, 2014 at 8:11 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, Feb 28, 2014 at 2:09 AM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Fri, Feb 28, 2014 at 10:09 AM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Fri, Feb 28, 2014 at 1:52 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, Feb 24, 2014 at 9:12 PM, bin.cheng bin.ch...@arm.com wrote:
 Hi,
 This patch fixes the regression reported in PR60280 by removing forwarder
 loop preheaders/latches in cfg cleanup where possible.  Several tests are
 broken by this change, since cfg cleanup is shared by all optimizers.  Some
 tests have already been fixed by recent patches; I went through and fixed
 the others.  One case that needs clarification is
 gcc.dg/tree-prof/update-loopch.c.  When GCC removes a basic block, it checks
 profile information by calling check_bb_profile after redirecting the
 incoming edges of the bb.  This inevitably results in warnings about invalid
 profile information and causes the case to fail.  I will send a patch in
 stage 1 to skip checking profile information for a basic block that is being
 removed, if that sounds reasonable.  For now I have just tweaked the test
 case itself.

 Bootstrapped and tested on x86_64 and arm_a15.

 Is it OK?


 2014-02-25  Bin Cheng  bin.ch...@arm.com

 PR target/60280
 * tree-cfgcleanup.c (tree_forwarder_block_p): Protect loop
 preheaders and latches only if requested.  Fix latch if it
 is removed.
 * tree-ssa-dom.c (tree_ssa_dominator_optimize): Set
 LOOPS_HAVE_PREHEADERS.


 This change:

 if (dest->loop_father->header == dest)
 -  return false;
 +  {
 +    if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
 +        && bb->loop_father->header != dest)
 +      return false;
 +
 +    if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
 +        && bb->loop_father->header == dest)
 +      return false;
 +  }
  }

 miscompiled 435.gromacs in SPEC CPU 2006 on x32 with

 -O3 -funroll-loops -ffast-math -fwhole-program -flto=jobserver
 -fuse-linker-plugin

 This patch changes loops without LOOPS_HAVE_PREHEADERS
 nor LOOPS_HAVE_SIMPLE_LATCHES from returning false to returning
 true.  I don't have a small testcase.  But this patch:

 diff --git a/gcc/tree-cfgcleanup.c b/gcc/tree-cfgcleanup.c
 index b5c384b..2ba673c 100644
 --- a/gcc/tree-cfgcleanup.c
 +++ b/gcc/tree-cfgcleanup.c
 @@ -323,6 +323,10 @@ tree_forwarder_block_p (basic_block bb, bool phi_wanted)
        if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
            && bb->loop_father->header == dest)
          return false;
 +
 +      if (!loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
 +          && !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES))
 +        return false;
      }
    }

 fixes the regression.  Does it make any sense?

 I think the preheader test isn't fully correct (bb may be in an inner loop
 for example).  So a more conservative variant would be

 Index: gcc/tree-cfgcleanup.c
 ===
 --- gcc/tree-cfgcleanup.c   (revision 208169)
 +++ gcc/tree-cfgcleanup.c   (working copy)
 @@ -316,13 +316,13 @@ tree_forwarder_block_p (basic_block bb,
        /* Protect loop preheaders and latches if requested.  */
        if (dest->loop_father->header == dest)
          {
 -          if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
 -              && bb->loop_father->header != dest)
 -            return false;
 -
 -          if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
 -              && bb->loop_father->header == dest)
 -            return false;
 +          if (bb->loop_father == dest->loop_father)
 +            return !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES);
 +          else if (bb->loop_father == loop_outer (dest->loop_father))
 +            return !loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS);
 +          /* Always preserve other edges into loop headers that are
 +             not simple latches or preheaders.  */
 +          return false;
          }
      }

 that makes sure we can properly update loop information.  It's also
 a more conservative change at this point which should still successfully
 remove simple latches and preheaders created by loop discovery.

 I think the patch makes sense anyway and thus I'll install it once it
 has passed bootstrap / regtesting.
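
The three variants of the loop-header check discussed above can be compared side by side in a small model. This is only a sketch under stated assumptions: `Loop`, the flag constants and the `keep_*` functions are hypothetical stand-ins, not GCC's data structures, and each predicate models only the quoted hunk (true means the block is kept, i.e. tree_forwarder_block_p would return false).

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for GCC's loop tree and loop-state flags.
struct Loop {
  const Loop* outer;  // enclosing loop, nullptr at the root
  int header;         // id of the loop's header basic block
};

enum : unsigned {
  HAVE_PREHEADERS = 1u << 0,      // models LOOPS_HAVE_PREHEADERS
  HAVE_SIMPLE_LATCHES = 1u << 1,  // models LOOPS_HAVE_SIMPLE_LATCHES
};

// Every predicate assumes bb's single successor is the header of dest_loop
// and answers: must bb be kept?

// Before the PR60280 patch: any block forwarding to a loop header is kept.
bool keep_original(const Loop*, const Loop*, unsigned) { return true; }

// Bin Cheng's hunk: keep the block only when the matching flag is set.
bool keep_bin(const Loop* bb_loop, const Loop* dest_loop, unsigned state) {
  const int dest = dest_loop->header;
  if ((state & HAVE_PREHEADERS) && bb_loop->header != dest) return true;
  if ((state & HAVE_SIMPLE_LATCHES) && bb_loop->header == dest) return true;
  return false;  // with no flags set, nothing is protected any more
}

// H.J. Lu's addition: also keep the block when neither flag is set.
bool keep_hj(const Loop* bb_loop, const Loop* dest_loop, unsigned state) {
  if (keep_bin(bb_loop, dest_loop, state)) return true;
  return !(state & HAVE_PREHEADERS) && !(state & HAVE_SIMPLE_LATCHES);
}

// Richard Biener's variant: decide by the relation between the two loops.
bool keep_richard(const Loop* bb_loop, const Loop* dest_loop, unsigned state) {
  if (bb_loop == dest_loop)  // bb is a latch of dest_loop
    return (state & HAVE_SIMPLE_LATCHES) != 0;
  if (bb_loop == dest_loop->outer)  // bb is a preheader of dest_loop
    return (state & HAVE_PREHEADERS) != 0;
  return true;  // any other edge into a loop header is always preserved
}
```

In this model, with no flags set, a block in a sibling loop that enters another loop's header is removable under `keep_bin` but preserved under `keep_richard`, which is the kind of edge the more conservative variant is meant to protect.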

 Another fix that may make sense is to restrict it to
 !loops_state_satisfies_p (LOOPS_NEED_FIXUP), though cfgcleanup
 itself can end up setting that ... which we eventually should fix if it
 still happens.  That is, check if

 Index: gcc/tree-cfgcleanup.c
 ===
 --- gcc/tree-cfgcleanup.c   (revision 208169)
 +++ gcc/tree-cfgcleanup.c   (working copy)

 @@ -729,8 +729,9 @@ cleanup_tree_cfg_noloop (void)

timevar_pop (TV_TREE_CLEANUP_CFG);

 -  if (changed && current_loops)
 -    loops_state_set (LOOPS_NEED_FIXUP);
 +  if (changed && current_loops
 +      && !loops_state_satisfies_p

[PATCH, AArch64] Sync merge libffi - fix call frame information in ffi_closure_SYSV

2014-02-28 Thread Yufeng Zhang


Hi,

The attached patch fixes a bug in ./src/aarch64/sysv.S:ffi_closure_SYSV 
where stack unwinding information was not generated correctly.  The 
change has been reviewed, approved and merged into the stand-alone 
libffi release tree**.


OK for the trunk?

Thanks,
Yufeng

** http://github.com/atgreen/libffi


2014-02-28  Yufeng Zhang  yufeng.zh...@arm.com

* src/aarch64/sysv.S (ffi_closure_SYSV): Use x29 as the
main CFA reg; update cfi_rel_offset.

diff --git a/libffi/src/aarch64/sysv.S b/libffi/src/aarch64/sysv.S
index b8cd421..ffb16f8 100644
--- a/libffi/src/aarch64/sysv.S
+++ b/libffi/src/aarch64/sysv.S
@@ -231,13 +231,13 @@ ffi_closure_SYSV:
 cfi_rel_offset (x30, 8)
 
 mov x29, sp
+cfi_def_cfa_register (x29)
 
 sub sp, sp, #ffi_closure_SYSV_FS
-   cfi_adjust_cfa_offset (ffi_closure_SYSV_FS)
 
 stp x21, x22, [x29, #-16]
-cfi_rel_offset (x21, 0)
-cfi_rel_offset (x22, 8)
+cfi_rel_offset (x21, -16)
+cfi_rel_offset (x22, -8)
 
 /* Load x21 with call_context.  */
 mov x21, sp
@@ -295,7 +295,7 @@ ffi_closure_SYSV:
 cfi_restore (x22)
 
 mov sp, x29
-   cfi_adjust_cfa_offset (-ffi_closure_SYSV_FS)
+cfi_def_cfa_register (sp)
 
 ldp x29, x30, [sp], #16
cfi_adjust_cfa_offset (-16)
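
One way to read the CFI changes in the patch above is to track the CFA rule by hand. The following is a minimal sketch, not libffi or assembler code: the `Cfa` type and its methods are hypothetical and only mimic what the corresponding `.cfi_*` directives do to a "CFA = register + offset" rule. It assumes the CFA is `sp + 16` once the prologue has pushed x29/x30, and the frame size used in the usage note is an arbitrary illustrative value.

```cpp
#include <cassert>
#include <cstdint>

enum Reg { SP, X29 };  // only the two registers involved in the hunk

// Hypothetical model of a DWARF CFA rule: CFA = value(reg) + offset.
struct Cfa {
  Reg reg;
  std::int64_t offset;

  // .cfi_def_cfa_register: switch the base register, keep the offset.
  void def_cfa_register(Reg r) { reg = r; }
  // .cfi_adjust_cfa_offset: the base register moved by -delta.
  void adjust_cfa_offset(std::int64_t delta) { offset += delta; }
  // .cfi_rel_offset: a save slot given relative to the current CFA register,
  // converted to a CFA-relative offset.
  std::int64_t rel_offset(std::int64_t rel) const { return rel - offset; }
};

// Where x21 is really stored: [x29, #-16], and x29 == CFA - 16.
std::int64_t x21_actual_slot() { return -16 - 16; }

// CFI before the patch: the CFA stays sp-based across "sub sp, sp, #FS".
std::int64_t x21_slot_before_patch(std::int64_t frame_size) {
  Cfa cfa{SP, 16};
  cfa.adjust_cfa_offset(frame_size);  // sub sp, sp, #FS
  return cfa.rel_offset(0);           // cfi_rel_offset (x21, 0)
}

// CFI after the patch: the CFA is tracked through x29, so sp may move freely.
std::int64_t x21_slot_after_patch() {
  Cfa cfa{SP, 16};
  cfa.def_cfa_register(X29);   // mov x29, sp + cfi_def_cfa_register (x29)
  return cfa.rel_offset(-16);  // cfi_rel_offset (x21, -16)
}
```

Under these assumptions `x21_slot_after_patch()` agrees with `x21_actual_slot()`, while `x21_slot_before_patch(64)` describes a different slot, which is consistent with the unwinder being given wrong save locations before the fix.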

Re: copyright dates in binutils (and includes/)

2014-02-28 Thread Joseph S. Myers

On Fri, 28 Feb 2014, Joel Brobecker wrote:

  Joseph, do you know why implicitly adding years to the claimed
  copyright years is a problem?  I'm guessing the file needs to be
  published somewhere for each year claimed.
 
 IANAL, but from 2 discussions with copyright-clerk:
 
   1. We start claiming copyright the year the file was committed
  to a medium (hard drive), not the year it was published.

I don't think it counts unless the version in question got published at 
some point.  The question is about versions that weren't published at the 
time, but were published later when the version control history was 
released.

There was a discussion on bug-standards starting Jan 2012.  Karl's revised 
wording from 11 May 2012 seems to indicate that if a version was committed 
to a version control history that was later released, the dates from that 
history count as copyrightable years (so reducing the number of cases 
where it may not be possible to fill in gaps) - but that revised wording 
doesn't seem to have been committed.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [C++ Patch] PR 58610

2014-02-28 Thread Paolo Carlini


On 02/28/2014 04:57 PM, Jason Merrill wrote:

OK.
Thanks. I'm going to commit as obvious the additional lambda.c hunk 
below, which removes another now redundant STRIP_TEMPLATE use.


Thanks,
Paolo.


/cp
2014-02-28  Paolo Carlini  paolo.carl...@oracle.com

PR c++/58610
* cp-tree.h (DECL_DELETED_FN): Use LANG_DECL_FN_CHECK.
* call.c (print_z_candidate): Remove STRIP_TEMPLATE use.
* lambda.c (maybe_add_lambda_conv_op): Likewise.

/testsuite
2014-02-28  Paolo Carlini  paolo.carl...@oracle.com

PR c++/58610
* g++.dg/cpp0x/constexpr-ice11.C: New.
Index: cp/call.c
===
--- cp/call.c   (revision 208224)
+++ cp/call.c   (working copy)
@@ -3237,7 +3237,7 @@ print_z_candidate (location_t loc, const char *msg
     inform (cloc, "%s%T <conversion>", msg, candidate->fn);
   else if (candidate->viable == -1)
     inform (cloc, "%s%#D <near match>", msg, candidate->fn);
-  else if (DECL_DELETED_FN (STRIP_TEMPLATE (candidate->fn)))
+  else if (DECL_DELETED_FN (candidate->fn))
     inform (cloc, "%s%#D <deleted>", msg, candidate->fn);
   else
     inform (cloc, "%s%#D", msg, candidate->fn);
Index: cp/cp-tree.h
===
--- cp/cp-tree.h(revision 208224)
+++ cp/cp-tree.h(working copy)
@@ -3222,7 +3222,7 @@ more_aggr_init_expr_args_p (const aggr_init_expr_a
 
 /* Nonzero if DECL was declared with '= delete'.  */
 #define DECL_DELETED_FN(DECL) \
-  (DECL_LANG_SPECIFIC (FUNCTION_DECL_CHECK (DECL))->u.base.threadprivate_or_deleted_p)
+  (LANG_DECL_FN_CHECK (DECL)->min.base.threadprivate_or_deleted_p)
 
 /* Nonzero if DECL was declared with '= default' (maybe implicitly).  */
 #define DECL_DEFAULTED_FN(DECL) \
Index: cp/lambda.c
===
--- cp/lambda.c (revision 208224)
+++ cp/lambda.c (working copy)
@@ -975,7 +975,7 @@ maybe_add_lambda_conv_op (tree type)
  the conversion op is used.  */
   if (varargs_function_p (callop))
 {
-  DECL_DELETED_FN (STRIP_TEMPLATE (fn)) = 1;
+  DECL_DELETED_FN (fn) = 1;
   return;
 }
 
Index: testsuite/g++.dg/cpp0x/constexpr-ice11.C
===
--- testsuite/g++.dg/cpp0x/constexpr-ice11.C(revision 0)
+++ testsuite/g++.dg/cpp0x/constexpr-ice11.C(working copy)
@@ -0,0 +1,9 @@
+// PR c++/58610
+// { dg-do compile { target c++11 } }
+
+struct A
+{
+  template<typename> A();
+};
+
+constexpr A a;  // { dg-error "literal|matching" }

Re: [AArch64] Improve vst4_lane intrinsics

2014-02-28 Thread Marcus Shawcroft

On 13 February 2014 16:03, James Greenhalgh james.greenha...@arm.com wrote:

 Hi,

 This patch rewrites the vst4_lane intrinsics in terms of RTL builtins.

 Tested on aarch64-none-elf with no issues.

 OK to queue for Stage 1?

OK for stage 1
/Marcus

Re: RFA: RL78: Add missing instruction patterns

2014-02-28 Thread DJ Delorie


   * config/rl78/rl78-real.md (cbranchsi4_real_signed): Add
   anti-canonical alternatives.
   (negandhi3_real): New pattern.
   * config/rl78/rl78-virt.md (negandhi3_virt): New pattern.

These are fine, although I don't know why gcc would require a negandhi3 
pattern...

Re: [PATCH,GRAPHITE] Fix for P1 bug 58028

2014-02-28 Thread Mircea Namolaru

Hi,

Thanks. Here is the updated patch.

2014-02-26  Tobias Grosser  tob...@grosser.es
Mircea Namolaru  mircea.namol...@inria.fr

 PR tree-optimization/58028
 * graphite-clast-to-gimple.c (set_cloog_options): Don't remove scalar
   dimensions.

Index: gcc/graphite-clast-to-gimple.c
===
--- gcc/graphite-clast-to-gimple.c  (revision 207298)
+++ gcc/graphite-clast-to-gimple.c  (working copy)
@@ -1522,6 +1522,13 @@
  variables.  */
   options->save_domains = 1;
 
+  /* Do not remove scalar dimensions.  CLooG by default removes scalar
+     dimensions very early from the input schedule.  However, they are
+     necessary to correctly derive from the saved domains
+     (options->save_domains) the relationship between the generated loops
+     and the schedule dimensions they are generated from.  */
+  options->noscalars = 1;
+
   /* Disable optimizations and make cloog generate source code closer to the
  input.  This is useful for debugging,  but later we want the optimized
  code.

Mircea

RFA: ipa-devirt PATCH for c++/58678 (devirt causes KDE build failure)

2014-02-28 Thread Jason Merrill

Multiple large C++ projects (KDE and libreoffice, at least) have been 
breaking when GCC speculatively devirtualizes a call to an 
implicitly-declared virtual destructor, because this leads to references 
to base destructors and vtables that might be hidden in another DSO. 
This patch avoids this problem by avoiding speculative devirtualization 
of calls to implicitly-declared functions.
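
For readers unfamiliar with the term, here is a small illustration of an implicitly-declared virtual destructor. It is a sketch that mirrors the shape of the testcase in the patch below; the `destroyed` counter and the inline function bodies are additions made only so that the example runs on its own.

```cpp
#include <cassert>
#include <type_traits>

static int destroyed = 0;  // illustrative counter, not part of the testcase

struct A {
  virtual ~A() { ++destroyed; }
};

// B declares no destructor of its own, so the compiler implicitly declares
// one.  Because A's destructor is virtual, B's implicit one is virtual too,
// and the author of B has no way to keep its definition out of the header.
struct B : A {
  virtual int m_fn1() { return 1; }
};

// "delete b" is a virtual call to the destructor through the vtable unless
// the compiler devirtualizes it.
void fn1(B* b) { delete b; }
```

Here `std::has_virtual_destructor<B>::value` is true even though B never mentions a destructor, and calling `fn1(new B)` runs A's destructor exactly once.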


Tested x86_64-pc-linux-gnu.  OK for trunk?

commit 94eb5df9fb20c796d09151d7293ae89ac012ae79
Author: Jason Merrill ja...@redhat.com
Date:   Fri Feb 28 14:03:19 2014 -0500

	PR c++/58678
	* ipa-devirt.c (ipa_devirt): Don't choose an implicitly-declared
	function.

diff --git a/gcc/ipa-devirt.c b/gcc/ipa-devirt.c
index 21649cb..27dc27d 100644
--- a/gcc/ipa-devirt.c
+++ b/gcc/ipa-devirt.c
@@ -1710,7 +1710,7 @@ ipa_devirt (void)
 
   int npolymorphic = 0, nspeculated = 0, nconverted = 0, ncold = 0;
   int nmultiple = 0, noverwritable = 0, ndevirtualized = 0, nnotdefined = 0;
-  int nwrong = 0, nok = 0, nexternal = 0;;
+  int nwrong = 0, nok = 0, nexternal = 0, nartificial = 0;
 
   FOR_EACH_DEFINED_FUNCTION (n)
 {	
@@ -1820,6 +1820,16 @@ ipa_devirt (void)
 		nexternal++;
 		continue;
 	  }
+	/* Don't use an implicitly-declared destructor (c++/58678).  */
+	struct cgraph_node *real_target
+	  = cgraph_function_node (likely_target);
+	if (DECL_ARTIFICIAL (real_target->decl))
+	  {
+		if (dump_file)
+		  fprintf (dump_file, "Target is implicitly declared\n\n");
+		nartificial++;
+		continue;
+	  }
 	if (cgraph_function_body_availability (likely_target)
 		<= AVAIL_OVERWRITABLE
 		&& symtab_can_be_discarded (likely_target))
@@ -1862,10 +1872,10 @@ ipa_devirt (void)
 	     " %i speculatively devirtualized, %i cold\n"
 	     "%i have multiple targets, %i overwritable,"
 	     " %i already speculated (%i agree, %i disagree),"
-	     " %i external, %i not defined\n",
+	     " %i external, %i not defined, %i artificial\n",
 	 npolymorphic, ndevirtualized, nconverted, ncold,
 	 nmultiple, noverwritable, nspeculated, nok, nwrong,
-	 nexternal, nnotdefined);
+	 nexternal, nnotdefined, nartificial);
   return ndevirtualized ? TODO_remove_functions : 0;
 }
 
diff --git a/gcc/testsuite/g++.dg/ipa/devirt-28.C b/gcc/testsuite/g++.dg/ipa/devirt-28.C
new file mode 100644
index 000..35c8df1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ipa/devirt-28.C
@@ -0,0 +1,17 @@
+// PR c++/58678
+// { dg-options "-O3 -fdump-ipa-devirt" }
+
+struct A {
+  virtual ~A();
+};
+struct B : A {
+  virtual int m_fn1();
+};
+void fn1(B* b) {
+  delete b;
+}
+
+// { dg-final { scan-assembler-not "_ZN1AD2Ev" } }
+// { dg-final { scan-assembler-not "_ZN1BD0Ev" } }
+// { dg-final { scan-ipa-dump "Target is implicitly declared" "devirt" } }
+// { dg-final { cleanup-ipa-dump "devirt" } }

Re: RFA: ipa-devirt PATCH for c++/58678 (devirt causes KDE build failure)

2014-02-28 Thread Jan Hubicka

 Multiple large C++ projects (KDE and libreoffice, at least) have
 been breaking when GCC speculatively devirtualizes a call to an
 implicitly-declared virtual destructor, because this leads to
 references to base destructors and vtables that might be hidden in
 another DSO. This patch avoids this problem by avoiding speculative
 devirtualization of calls to implicitly-declared functions.
 
 Tested x86_64-pc-linux-gnu.  OK for trunk?
 

 commit 94eb5df9fb20c796d09151d7293ae89ac012ae79
 Author: Jason Merrill ja...@redhat.com
 Date:   Fri Feb 28 14:03:19 2014 -0500
 
   PR c++/58678
   * ipa-devirt.c (ipa_devirt): Don't choose an implicitly-declared
   function.
 
 diff --git a/gcc/ipa-devirt.c b/gcc/ipa-devirt.c
 index 21649cb..27dc27d 100644
 --- a/gcc/ipa-devirt.c
 +++ b/gcc/ipa-devirt.c
 @@ -1710,7 +1710,7 @@ ipa_devirt (void)
  
int npolymorphic = 0, nspeculated = 0, nconverted = 0, ncold = 0;
int nmultiple = 0, noverwritable = 0, ndevirtualized = 0, nnotdefined = 0;
 -  int nwrong = 0, nok = 0, nexternal = 0;;
 +  int nwrong = 0, nok = 0, nexternal = 0, nartificial = 0;
  
FOR_EACH_DEFINED_FUNCTION (n)
  {
 @@ -1820,6 +1820,16 @@ ipa_devirt (void)
   nexternal++;
   continue;
 }
 + /* Don't use an implicitly-declared destructor (c++/58678).  */
 + struct cgraph_node *real_target
 +   = cgraph_function_node (likely_target);
 + if (DECL_ARTIFICIAL (real_target->decl))

I think we can safely test here DECL_ARTIFICIAL && (DECL_EXTERNAL ||
DECL_COMDAT).  If the dtor is going to be output anyway, we are safe to use it.

Are those programs valid by the C++ standard? (I believe it is not valid to include
stuff whose implementation you do not link with.) If we just want to avoid
breaking python and libreoffice (I fixed the libreoffice part, however), we may just
go with the ipa-devirt change as you propose (with the external/COMDAT check).

If this is a correctness issue, I think we want to be sure that other
optimizations won't do the same. In that case your check seems misplaced.

If DECL_ARTIFICIAL destructors are not safe to inline, I would add it into
function_attribute_inlinable_p.  If the dtor is not safe to refer to, then I would
add it into can_refer_decl_in_current_unit_p.

Both such changes would however inhibit quite some optimization, since
artificial destructors are quite a common case, right? Or is there some reason why
only speculative devirtualization could possibly work out a reference to these?

Honza

Re: RFA: ipa-devirt PATCH for c++/58678 (devirt causes KDE build failure)

2014-02-28 Thread Jason Merrill


On 02/28/2014 03:56 PM, Jan Hubicka wrote:

I think we can safely test here DECL_ARTIFICIAL && (DECL_EXTERNAL ||
DECL_COMDAT).  If the dtor is going to be output anyway, we are safe to use it.


We already skipped DECL_EXTERNAL decls, and artificial members are 
always DECL_COMDAT, but I'll add the COMDAT check.



Are those programs valid by C++ standard? (I believe it is not valid to include
stuff whose implementation you do not link with.).


Symbol visibility is outside the scope of the standard.


If we just want to avoid
breaking python and libreoffice (I fixed the libreoffice part, however), we may just
go with the ipa-devirt change as you propose (with the external/COMDAT check).

If this is a correctness issue, I think we want to be sure that other
optimizations won't do the same. In that case your check seems misplaced.

If DECL_ARTIFICIAL destructors are not safe to inline, I would add it into
function_attribute_inlinable_p.  If the dtor is not safe to refer, then I would
add it into can_refer_decl_in_current_unit_p



Both such changes would however inhibit quite some optimization, since
artificial destructors are quite a common case, right? Or is there some reason why
only speculative devirtualization could possibly work out a reference to these?


Normally, it's fine to inline destructors, and refer to them.  The 
problem comes when we turn what had been a virtual call (which goes 
through the vtable that is hidden in the DSO) into a direct call to a 
hidden function.  We don't do that for user-defined virtual functions 
because the user controls whether or not they are defined in the header, 
and we don't devirtualize if no definition is available, but 
implicitly-declared functions are different because the user has no way 
to prevent the definition from being available.


This also isn't a problem for cprop devirtualization, because in that 
situation we must have already referred to the vtable.
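
The filtering order described in this exchange (external targets are skipped first, then artificial COMDAT ones) can be sketched as a standalone predicate. This is an assumption-laden model, not the ipa-devirt code: `TargetInfo`, `SkipReason` and `speculation_skip_reason` are hypothetical names, and the three booleans stand in for DECL_EXTERNAL, DECL_ARTIFICIAL and DECL_COMDAT on the non-thunk target.

```cpp
#include <cassert>

// Hypothetical flattened view of the decl flags consulted for the target.
struct TargetInfo {
  bool is_external;    // models DECL_EXTERNAL
  bool is_artificial;  // models DECL_ARTIFICIAL (implicitly declared)
  bool is_comdat;      // models DECL_COMDAT
};

enum class SkipReason { None, External, Artificial };

// Decide whether a likely target should be skipped for speculative
// devirtualization, and why.
SkipReason speculation_skip_reason(const TargetInfo& t) {
  if (t.is_external)
    return SkipReason::External;  // already skipped before this patch
  if (t.is_artificial && t.is_comdat)
    return SkipReason::Artificial;  // the new check from this thread
  return SkipReason::None;
}
```

A user-defined COMDAT function (not artificial) still passes the filter in this model, matching the point that the user controls whether such a definition appears in the header.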


Jason

commit 2a05a09c268ce3abb373aa86cf731d20aac8dd7a
Author: Jason Merrill ja...@redhat.com
Date:   Fri Feb 28 14:03:19 2014 -0500

	PR c++/58678
	* ipa-devirt.c (ipa_devirt): Don't choose an implicitly-declared
	function.

diff --git a/gcc/ipa-devirt.c b/gcc/ipa-devirt.c
index 21649cb..2f84f17 100644
--- a/gcc/ipa-devirt.c
+++ b/gcc/ipa-devirt.c
@@ -1710,7 +1710,7 @@ ipa_devirt (void)
 
   int npolymorphic = 0, nspeculated = 0, nconverted = 0, ncold = 0;
   int nmultiple = 0, noverwritable = 0, ndevirtualized = 0, nnotdefined = 0;
-  int nwrong = 0, nok = 0, nexternal = 0;;
+  int nwrong = 0, nok = 0, nexternal = 0, nartificial = 0;
 
   FOR_EACH_DEFINED_FUNCTION (n)
 {	
@@ -1820,6 +1820,17 @@ ipa_devirt (void)
 		nexternal++;
 		continue;
 	  }
+	/* Don't use an implicitly-declared destructor (c++/58678).  */
+	struct cgraph_node *non_thunk_target
+	  = cgraph_function_node (likely_target);
+	if (DECL_ARTIFICIAL (non_thunk_target->decl)
+		&& DECL_COMDAT (non_thunk_target->decl))
+	  {
+		if (dump_file)
+		  fprintf (dump_file, "Target is artificial\n\n");
+		nartificial++;
+		continue;
+	  }
 	if (cgraph_function_body_availability (likely_target)
 		<= AVAIL_OVERWRITABLE
 		&& symtab_can_be_discarded (likely_target))
@@ -1862,10 +1873,10 @@ ipa_devirt (void)
 	     " %i speculatively devirtualized, %i cold\n"
 	     "%i have multiple targets, %i overwritable,"
 	     " %i already speculated (%i agree, %i disagree),"
-	     " %i external, %i not defined\n",
+	     " %i external, %i not defined, %i artificial\n",
 	 npolymorphic, ndevirtualized, nconverted, ncold,
 	 nmultiple, noverwritable, nspeculated, nok, nwrong,
-	 nexternal, nnotdefined);
+	 nexternal, nnotdefined, nartificial);
   return ndevirtualized ? TODO_remove_functions : 0;
 }
 
diff --git a/gcc/testsuite/g++.dg/ipa/devirt-28.C b/gcc/testsuite/g++.dg/ipa/devirt-28.C
new file mode 100644
index 000..e18b818
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ipa/devirt-28.C
@@ -0,0 +1,17 @@
+// PR c++/58678
+// { dg-options "-O3 -fdump-ipa-devirt" }
+
+struct A {
+  virtual ~A();
+};
+struct B : A {
+  virtual int m_fn1();
+};
+void fn1(B* b) {
+  delete b;
+}
+
+// { dg-final { scan-assembler-not "_ZN1AD2Ev" } }
+// { dg-final { scan-assembler-not "_ZN1BD0Ev" } }
+// { dg-final { scan-ipa-dump "Target is artificial" "devirt" } }
+// { dg-final { cleanup-ipa-dump "devirt" } }

Re: [C++ patch] for C++/52369

2014-02-28 Thread Jason Merrill


On 02/28/2014 04:03 PM, Fabien Chêne wrote:

The first two lines are fine in my opinion. The third line should
actually be split into an error + an inform. By doing that, I think we
also need to reformulate the error message like this:
testsuite/g++.dg/init/pr44086.C:4:8: error: 'struct A' needs its
non-static const members to be initialized
testsuite/g++.dg/init/pr44086.C:6:19: note: 'A::i' should be initialized

What do you think ? (before I bother adjusting the testsuite)


Let's change the C++11 diagnostic to match the C++98 diagnostic.  So, 
"uninitialized const member in %q#T" + "%qD should be initialized".



Incidentally, while moving the diagnostic concerning the uninitialized
field from an error to an inform, I realized that the syntactic sugar
%q#D is no longer honored and is treated as %qD; is that expected?


No, how do you mean?

Jason

[jit] New API entrypoint: gcc_jit_context_new_cast

2014-02-28 Thread David Malcolm

Committed to branch dmalcolm/jit:

gcc/jit/
* libgccjit.h (gcc_jit_context_new_cast): New.
* libgccjit.map (gcc_jit_context_new_cast): New.
* libgccjit++.h (gccjit::context::new_cast): New method.
* libgccjit.c (gcc_jit_context_new_cast): New.

* internal-api.h (gcc::jit::recording::context::new_cast): New method.
(gcc::jit::recording::cast): New subclass of rvalue.
(gcc::jit::playback::context::new_cast): New method.
(gcc::jit::playback::context::build_cast): New method.

* internal-api.c (convert): New.
(gcc::jit::recording::context::new_cast): New.
(gcc::jit::recording::cast::replay_into): New.
(gcc::jit::recording::cast::make_debug_string): New.
(gcc::jit::playback::context::build_cast): New.
(gcc::jit::playback::context::new_cast): New.

* TODO.rst: Update.

gcc/testsuite/
* jit.dg/test-expressions.c (make_test_of_cast): New, to test new
entrypoint gcc_jit_context_new_cast.
(make_tests_of_casts): New.
(create_code): Add call to make_tests_of_casts.
(verify_code): Add call to verify_casts.
---
 gcc/jit/ChangeLog.jit   |  21 ++
 gcc/jit/TODO.rst|   9 +--
 gcc/jit/internal-api.c  | 102 ++
 gcc/jit/internal-api.h  |  33 +
 gcc/jit/libgccjit++.h   |  15 
 gcc/jit/libgccjit.c |  13 
 gcc/jit/libgccjit.h |  11 +++
 gcc/jit/libgccjit.map   |   1 +
 gcc/testsuite/ChangeLog.jit |   8 ++
 gcc/testsuite/jit.dg/test-expressions.c | 126 
 10 files changed, 332 insertions(+), 7 deletions(-)

diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit
index 6c43ce9..625e01a 100644
--- a/gcc/jit/ChangeLog.jit
+++ b/gcc/jit/ChangeLog.jit
@@ -1,5 +1,26 @@
 2014-02-28  David Malcolm  dmalc...@redhat.com
 
+   * libgccjit.h (gcc_jit_context_new_cast): New.
+   * libgccjit.map (gcc_jit_context_new_cast): New.
+   * libgccjit++.h (gccjit::context::new_cast): New method.
+   * libgccjit.c (gcc_jit_context_new_cast): New.
+
+   * internal-api.h (gcc::jit::recording::context::new_cast): New method.
+   (gcc::jit::recording::cast): New subclass of rvalue.
+   (gcc::jit::playback::context::new_cast): New method.
+   (gcc::jit::playback::context::build_cast): New method.
+
+   * internal-api.c (convert): New.
+   (gcc::jit::recording::context::new_cast): New.
+   (gcc::jit::recording::cast::replay_into): New.
+   (gcc::jit::recording::cast::make_debug_string): New.
+   (gcc::jit::playback::context::build_cast): New.
+   (gcc::jit::playback::context::new_cast): New.
+
+   * TODO.rst: Update.
+
+2014-02-28  David Malcolm  dmalc...@redhat.com
+
* libgccjit.h (gcc_jit_block_get_function): New.
* libgccjit.map (gcc_jit_block_get_function): New.
* libgccjit++.h (gccjit::block::get_function): New method.
diff --git a/gcc/jit/TODO.rst b/gcc/jit/TODO.rst
index 227113a..8a2308e 100644
--- a/gcc/jit/TODO.rst
+++ b/gcc/jit/TODO.rst
@@ -23,13 +23,6 @@ Initial Release
 
 * expose the statements in the API? (mostly so they can be stringified?)
 
-* explicit casts::
-
-extern gcc_jit_rvalue *
-gcc_jit_rvalue_cast (gcc_jit_rvalue *, gcc_jit_type *);
-
-  e.g. (void*) to (struct foo*)
-
 * support more arithmetic ops and comparison modes
 
 * access to a function by address::
@@ -119,6 +112,8 @@ Initial Release
   have each block have its own stmt_list, avoiding the need for this
   traversal, and having the block structure show up within tree dumps.
 
+* Implement more kinds of casts e.g. pointers
+
 Bugs
 
 * INTERNAL functions don't seem to work (see e.g. test-quadratic, on trying
diff --git a/gcc/jit/internal-api.c b/gcc/jit/internal-api.c
index fa08e56..573dc67 100644
--- a/gcc/jit/internal-api.c
+++ b/gcc/jit/internal-api.c
@@ -16,12 +16,29 @@
 #include "diagnostic-core.h"
 #include "dumpfile.h"
 #include "tree-cfg.h"
+#include "target.h"
+#include "convert.h"
 
 #include <pthread.h>
 
 #include "internal-api.h"
 #include "jit-builtins.h"
 
+/* gcc::jit::playback::context::build_cast uses the convert.h API,
+   which in turn requires the frontend to provide a convert
+   function, apparently as a fallback.
+
+   Hence we provide this dummy one, with the requirement that any casts
+   are handled before reaching this.  */
+extern tree convert (tree type, tree expr);
+
+tree
+convert (tree /*type*/, tree /*expr*/)
+{
+  error ("unhandled conversion");
+  return error_mark_node;
+}
+
 namespace gcc {
 namespace jit {
 
@@ -474,6 +491,16 @@ recording::context::new_comparison (recording::location 
*loc,
 }
 
 recording::rvalue *
+recording::context::new_cast (recording::location *loc,
+ recording::rvalue *expr,
+ recording::type

Re: [jit] Major API change: blocks rather than labels

2014-02-28 Thread David Malcolm

On Thu, 2014-02-27 at 17:25 -0500, David Malcolm wrote:
 On Thu, 2014-02-27 at 17:11 -0500, David Malcolm wrote:
 
 [...]
 
  With this commit, the API changes to using basic blocks instead: blocks
  are created within functions, and statements are added to blocks, rather
  than to functions.
 
 [...]
 
 I've also ported the jittest example to the new API, as of this
 commit:
 https://github.com/davidmalcolm/jittest/commit/af66efe0386e52a9292b7527174ae402c0af5e43
 
 (though currently it falls foul of type-checking, due to int vs bool
 issues in conditionals; upon hacking out the type-checking from
 libgccjit it compiles and runs OK).

jittest is now fixed, as of:
https://github.com/davidmalcolm/jittest/commit/7af0765c018e15d600016d41f7b444273cc0389a

Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation

2014-02-28 Thread Bernd Schmidt


On 02/28/2014 05:21 PM, Bernd Schmidt wrote:

On 02/28/2014 05:09 PM, Ilya Verbin wrote:

Unfortunately I don't fully understand this configure magic... When a
user specifies 2 or 3 accelerators during configuration with
--enable-accelerators, will several different accel-gccs be built?


No - the idea is that --enable-accelerator= is likely specific to ptx,
where we really just want to build a gcc and no target libraries, so
building it alongside the host in an accel-gcc subdirectory is ideal.

For your use case, I'd imagine the offload compiler would be built
relatively normally as a full build with
--enable-as-accelerator-for=x86_64-linux, which would install it into
locations where the host will eventually be able to find it. Then the
host compiler would be built with another new configure option (as yet
unimplemented in my patch set) --enable-offload-targets=mic,... which
would tell the host compiler about the pre-built offload target
compilers. On the ptx side, --enable-accelerator=ptx would then also
add ptx to the list of --enable-offload-targets.
Naming of all these configure options can be discussed, I have no real
preference for any of them.


IOW, something like the following on top of the other patches. Ideally 
we'd also add error checking to make sure the offload compilers exist in 
the places we'll be looking for them.



Bernd

Index: gomp-4_0-branch/gcc/config.in
===
--- gomp-4_0-branch.orig/gcc/config.in
+++ gomp-4_0-branch/gcc/config.in
@@ -1748,6 +1748,12 @@
 #endif
 
 
+/* Define to hold the list of target names suitable for offloading. */
+#ifndef USED_FOR_TARGET
+#undef OFFLOAD_TARGETS
+#endif
+
+
 /* Define to the address where bug reports for this package should be sent. */
 #ifndef USED_FOR_TARGET
 #undef PACKAGE_BUGREPORT
Index: gomp-4_0-branch/gcc/configure
===
--- gomp-4_0-branch.orig/gcc/configure
+++ gomp-4_0-branch/gcc/configure
@@ -908,6 +908,7 @@ with_bugurl
 enable_languages
 enable_accelerator
 enable_as_accelerator_for
+enable_offload_targets
 with_multilib_list
 enable_rpath
 with_libiconv_prefix
@@ -1618,6 +1619,8 @@ Optional Features:
   --enable-acceleratorbuild accelerator [ARG={no,device-triplet}]
   --enable-as-accelerator-for
   build compiler as accelerator target for given host
+  --enable-offload-targets=LIST
+  enable offloading to devices from LIST
   --disable-rpath do not hardcode runtime library paths
   --enable-sjlj-exceptions
   arrange to use setjmp/longjmp exception handling
@@ -7299,12 +7302,14 @@ else
 fi
 
 
+offload_targets=
 # Check whether --enable-accelerator was given.
 if test "${enable_accelerator+set}" = set; then :
   enableval=$enable_accelerator;
   case $enable_accelerator in
   no) ;;
   *)
+offload_targets=$enable_accelerator
 
 $as_echo "#define ENABLE_OFFLOADING 1" >>confdefs.h
 
@@ -7343,6 +7348,31 @@ fi
 
 
 
+# Check whether --enable-offload-targets was given.
+if test "${enable_offload_targets+set}" = set; then :
+  enableval=$enable_offload_targets;
+  if test x$enable_offload_targets = x; then
+as_fn_error "no offload targets specified" "$LINENO" 5
+  else
+if test x$offload_targets = x; then
+  offload_targets=$enable_offload_targets
+else
+  offload_targets=$offload_targets,$enable_offload_targets
+fi
+  fi
+
+else
+  enable_accelerator=no
+fi
+
+
+offload_targets=`echo $offload_targets | sed -e 's#,#:#'`
+
+cat >>confdefs.h <<_ACEOF
+#define OFFLOAD_TARGETS "$offload_targets"
+_ACEOF
+
+
 
 # Check whether --with-multilib-list was given.
 if test "${with_multilib_list+set}" = set; then :
@@ -17983,7 +18013,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 17986 "configure"
+#line 18016 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -18089,7 +18119,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18092 "configure"
+#line 18122 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
Index: gomp-4_0-branch/gcc/configure.ac
===
--- gomp-4_0-branch.orig/gcc/configure.ac
+++ gomp-4_0-branch/gcc/configure.ac
@@ -839,12 +839,14 @@ AC_ARG_ENABLE(languages,
 esac],
 [enable_languages=c])
 
+offload_targets=
 AC_ARG_ENABLE(accelerator,
 [AS_HELP_STRING([--enable-accelerator], [build accelerator @:@ARG={no,device-triplet}@:@])],
 [
   case $enable_accelerator in
   no) ;;
   *)
+offload_targets=$enable_accelerator
 AC_DEFINE(ENABLE_OFFLOADING, 1,
  [Define this to enable support for offloading.])
 AC_DEFINE_UNQUOTED(ACCEL_TARGET,${enable_accelerator},
@@ -871,6 +873,25 @@ AC_ARG_ENABLE(as-accelerator-for,
 ], [enable_as_accelerator=no])
 AC_SUBST(enable_as_accelerator)
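
The list handling in the configure hunk above can be sketched outside of shell as follows. This is an illustration only: `merge_offload_targets` is a hypothetical name, and the sketch assumes that every comma is meant to become a colon. Note that the `sed` expression in the patch has no `g` flag, so as written it would rewrite only the first comma. The error path for an empty --enable-offload-targets value is not modelled.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper mirroring the shell logic.  ACCEL is the value of
// --enable-accelerator ("" or "no" when no accelerator is built); LIST is
// the comma-separated value of --enable-offload-targets ("" if not given).
std::string
merge_offload_targets (const std::string &accel, const std::string &list)
{
  // The accelerator, if any, goes first in the list.
  std::string result = (accel == "no") ? "" : accel;

  // Append the explicitly requested targets.
  if (!list.empty ())
    result = result.empty () ? list : result + "," + list;

  // Convert every ',' to ':' (the equivalent of sed 's#,#:#g').
  for (std::string::size_type i = 0; i < result.size (); ++i)
    if (result[i] == ',')
      result[i] = ':';
  return result;
}
```

With an accelerator of "ptx" and a target list of "mic,foo", the sketch yields "ptx:mic:foo", which is the form OFFLOAD_TARGETS would presumably take.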

Re: [C++ patch] for C++/52369

2014-02-28 Thread Fabien Chêne

2014-02-28 22:52 GMT+01:00 Fabien Chêne fabien.ch...@gmail.com:
 Incidentally, while moving the diagnostic concerning the uninitialized
 field from an error to an inform, I realized that the syntactic sugar
 %q#D is no longer honored and is treated as %qD, is it expected ?


 No, how do you mean?

 I must be tired, false alarm, sorry.

I guess my mistake comes from the fact that %q#D is not present in the
c++98 diagnostic. Shall we homogenize that as well ?
In favor of %q#D ?

-- 
Fabien

Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation

2014-02-28 Thread Ilya Verbin

On 28 Feb 17:21, Bernd Schmidt wrote:
 It would help to see the code you have on the libgomp side, I don't
 believe that's been posted yet?

It was posted here: http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01777.html
And below is the updated version.

---
 libgomp/libgomp.map |1 +
 libgomp/target.c|  138 ---
 2 files changed, 132 insertions(+), 7 deletions(-)

diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index cb52e45..d33673d 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -208,6 +208,7 @@ GOMP_3.0 {
 
 GOMP_4.0 {
   global:
+   GOMP_register_lib;
GOMP_barrier_cancel;
GOMP_cancel;
GOMP_cancellation_point;
diff --git a/libgomp/target.c b/libgomp/target.c
index a6a5505..7fafa9a 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -84,6 +84,19 @@ struct splay_tree_key_s {
   bool copy_from;
 };
 
+enum library_descr {
+  DESCR_TABLE_START,
+  DESCR_TABLE_END,
+  DESCR_IMAGE_START,
+  DESCR_IMAGE_END
+};
+
+/* Array of pointers to target shared library descriptors.  */
+static void **libraries;
+
+/* Total number of target shared libraries.  */
+static int num_libraries;
+
 /* Array of descriptors of all available devices.  */
 static struct gomp_device_descr *devices;
 
@@ -107,6 +120,12 @@ splay_compare (splay_tree_key x, splay_tree_key y)
 
 #include "splay-tree.h"
 
+struct target_table_s
+{
+  void **entries;
+  int num_entries;
+};
+
 /* This structure describes accelerator device.
It contains name of the corresponding libgomp plugin, function handlers for
interaction with the device, ID-number of the device, and information about
@@ -117,15 +136,21 @@ struct gomp_device_descr
  TARGET construct.  */
   int id;
 
+  /* Set to true when device is initialized.  */
+  bool is_initialized;
+
   /* Plugin file handler.  */
   void *plugin_handle;
 
   /* Function handlers.  */
-  bool (*device_available_func) (void);
+  bool (*device_available_func) (int);
+  void (*device_init_func) (int);
+  struct target_table_s (*device_load_image_func) (void *, int);
   void *(*device_alloc_func) (size_t);
   void (*device_free_func) (void *);
   void *(*device_dev2host_func)(void *, const void *, size_t);
   void *(*device_host2dev_func)(void *, const void *, size_t);
+  void (*device_run_func) (void *, void *);
 
   /* Splay tree containing information about mapped memory regions.  */
   struct splay_tree_s dev_splay_tree;
@@ -471,6 +496,80 @@ gomp_update (struct gomp_device_descr *devicep, size_t 
mapnum,
   gomp_mutex_unlock (&devicep->dev_env_lock);
 }
 
+void
+GOMP_register_lib (const void *openmp_target)
+{
+  libraries = realloc (libraries, (num_libraries + 1) * sizeof (void *));
+
+  if (libraries == NULL)
+return;
+
+  libraries[num_libraries] = (void *) openmp_target;
+
+  num_libraries++;
+}
+
+static void
+gomp_init_device (struct gomp_device_descr *devicep)
+{
+  /* Initialize the target device.  */
+  devicep->device_init_func (devicep->id);
+
+  /* Load shared libraries into target device and
+ perform host-target address mapping.  */
+  int i;
+  for (i = 0; i < num_libraries; i++)
+{
+  /* Get the pointer to the target image from the library descriptor.  */
+  void **lib = libraries[i];
+
+  /* FIXME: Select the proper target image, if there are several.  */
+  void *target_image = lib[DESCR_IMAGE_START];
+  int target_img_size = lib[DESCR_IMAGE_END] - lib[DESCR_IMAGE_START];
+
+  /* Calculate the size of host address table.  */
+  void **host_table_start = lib[DESCR_TABLE_START];
+  void **host_table_end = lib[DESCR_TABLE_END];
+  int host_table_size = host_table_end - host_table_start;
+
+  /* Load library into target device and receive its address table.  */
+  struct target_table_s target_table
+   = devicep->device_load_image_func (target_image, target_img_size);
+
+  if (host_table_size != target_table.num_entries)
+   gomp_fatal ("Can't map target objects");
+
+  void **host_entry, **target_entry;
+  for (host_entry = host_table_start, target_entry = target_table.entries;
+  host_entry < host_table_end; host_entry += 2, target_entry += 2)
+   {
+ struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt));
+ tgt->refcount = 1;
+ tgt->array = gomp_malloc (sizeof (*tgt->array));
+ tgt->tgt_start = (uintptr_t) *target_entry;
+ tgt->tgt_end = tgt->tgt_start + *((uint64_t *) target_entry + 1);
+ tgt->to_free = NULL;
+ tgt->list_count = 0;
+ tgt->device_descr = devicep;
+ splay_tree_node node = tgt->array;
+ splay_tree_key k = &node->key;
+ k->host_start = (uintptr_t) *host_entry;
+ k->host_end = k->host_start + *((uint64_t *) host_entry + 1);
+ k->tgt_offset = 0;
+ k->tgt = tgt;
+ node->left = NULL;
+ node->right = NULL;
+ splay_tree_insert (&devicep->dev_splay_tree, node);
+   }
+
+
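
The table walk in gomp_init_device can be sketched in isolation as below. This is a simplified model, not libgomp code: it assumes, as the loop above suggests, that both tables are flat sequences of (address, size) pairs in the same order. `Mapping` and `pair_tables` are hypothetical names, and plain integers stand in for the pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// One host range and the target range it corresponds to.
struct Mapping
{
  std::uintptr_t host_start, host_end;
  std::uintptr_t tgt_start, tgt_end;
};

// HOST and TARGET hold (address, size) pairs: element 2*i is the start
// address of object i, element 2*i + 1 its size in bytes.
std::vector<Mapping>
pair_tables (const std::vector<std::uintptr_t> &host,
             const std::vector<std::uintptr_t> &target)
{
  // Both sides must describe the same number of objects.
  if (host.size () != target.size () || host.size () % 2 != 0)
    throw std::runtime_error ("Can't map target objects");

  std::vector<Mapping> result;
  for (std::size_t i = 0; i < host.size (); i += 2)
    {
      Mapping m;
      m.host_start = host[i];
      m.host_end = host[i] + host[i + 1];
      m.tgt_start = target[i];
      m.tgt_end = target[i] + target[i + 1];
      result.push_back (m);
    }
  return result;
}
```

In the patch each such pair becomes one node in the device's splay tree; the sketch only collects the ranges in a vector.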

[GOOGLE] Remove size check when loop is very hot

2014-02-28 Thread Dehao Chen

This patch removes the size limit for loop unroll/peel when the loop
is truly hot. This makes the implementation easily maintanable between
FDO and AutoFDO.

Bootstrapped and loadtest perf show neutral impact.

OK for google-4_8?

Thanks,
Dehao

Index: gcc/loop-unroll.c
===
--- gcc/loop-unroll.c (revision 208233)
+++ gcc/loop-unroll.c (working copy)
@@ -347,11 +347,9 @@ code_size_limit_factor(struct loop *loop)
   /* Next, set the value of the codesize-based unroll factor divisor which in
  most loops will need to be set to a value that will reduce or eliminate
  unrolling/peeling.  */
-  if (num_hot_counters < size_threshold * 2
-      && loop->header->count > 0)
+  if (loop->header->count > 0)
 {
-  /* For applications that are less than twice the codesize limit, allow
- limited unrolling for very hot loops.  */
+  /* Allow limited unrolling for very hot loops.  */
   sum_to_header_ratio = profile_info->sum_all / loop->header->count;
   hotness_ratio_threshold = PARAM_VALUE
(PARAM_UNROLLPEEL_HOTNESS_THRESHOLD);
   /* When the profile count sum to loop entry header ratio is smaller than

Re: [GOOGLE] Remove size check when loop is very hot

2014-02-28 Thread Teresa Johnson

Looks good to me.
Thanks, Teresa

On Fri, Feb 28, 2014 at 2:17 PM, Dehao Chen de...@google.com wrote:
 This patch removes the size limit for loop unroll/peel when the loop
 is truly hot. This makes the implementation easily maintainable between
 FDO and AutoFDO.

 Bootstrapped and loadtest perf show neutral impact.

 OK for google-4_8?

 Thanks,
 Dehao

 Index: gcc/loop-unroll.c
 ===
 --- gcc/loop-unroll.c (revision 208233)
 +++ gcc/loop-unroll.c (working copy)
 @@ -347,11 +347,9 @@ code_size_limit_factor(struct loop *loop)
/* Next, set the value of the codesize-based unroll factor divisor which in
   most loops will need to be set to a value that will reduce or eliminate
   unrolling/peeling.  */
 -  if (num_hot_counters < size_threshold * 2
 -      && loop->header->count > 0)
 +  if (loop->header->count > 0)
  {
 -  /* For applications that are less than twice the codesize limit, allow
 - limited unrolling for very hot loops.  */
 +  /* Allow limited unrolling for very hot loops.  */
sum_to_header_ratio = profile_info->sum_all / loop->header->count;
hotness_ratio_threshold = PARAM_VALUE
 (PARAM_UNROLLPEEL_HOTNESS_THRESHOLD);
/* When the profile count sum to loop entry header ratio is smaller 
 than



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

calloc = malloc + memset

2014-02-28 Thread Marc Glisse


Hello,

this is a stage 1 patch, and I'll ping it then, but if you have comments 
now...


Passes bootstrap+testsuite on x86_64-linux-gnu.

2014-02-28  Marc Glisse  marc.gli...@inria.fr

PR tree-optimization/57742
gcc/
* tree-ssa-forwprop.c (simplify_malloc_memset): New function.
(simplify_builtin_call): Call it.
gcc/testsuite/
* g++.dg/tree-ssa/calloc.C: New testcase.
* gcc.dg/tree-ssa/calloc.c: Likewise.

--
Marc Glisse

Index: gcc/testsuite/g++.dg/tree-ssa/calloc.C
===
--- gcc/testsuite/g++.dg/tree-ssa/calloc.C  (revision 0)
+++ gcc/testsuite/g++.dg/tree-ssa/calloc.C  (working copy)
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-std=gnu++11 -O3 -fdump-tree-optimized" } */
+
+#include <new>
+#include <vector>
+#include <cstdlib>
+
+void g(void*);
+inline void* operator new(std::size_t sz) _GLIBCXX_THROW (std::bad_alloc)
+{
+  void *p;
+
+  if (sz == 0)
+sz = 1;
+
+  // Slightly modified from the libsupc++ version, that one has 2 calls
+  // to malloc which makes it too hard to optimize.
+  while ((p = std::malloc (sz)) == 0)
+{
+  std::new_handler handler = std::get_new_handler ();
+  if (! handler)
+_GLIBCXX_THROW_OR_ABORT(std::bad_alloc());
+  handler ();
+}
+  return p;
+}
+
+void f(void*p,int n){
+  new(p)std::vector<int>(n);
+}
+
+/* { dg-final { scan-tree-dump-times "calloc" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "malloc" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "memset" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */

Property changes on: gcc/testsuite/g++.dg/tree-ssa/calloc.C
___
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+Author Date Id Revision URL
\ No newline at end of property
Index: gcc/testsuite/gcc.dg/tree-ssa/calloc.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/calloc.c  (revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/calloc.c  (working copy)
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include <stdlib.h>
+#include <string.h>
+
+extern int a;
+extern int* b;
+int n;
+void* f(long*q){
+  int*p=malloc(n);
+  ++*q;
+  if(p){
+++*q;
+a=2;
+memset(p,0,n);
+*b=3;
+  }
+  return p;
+}
+void* g(void){
+  float*p=calloc(8,4);
+  return memset(p,0,32);
+}
+
+/* { dg-final { scan-tree-dump-times "calloc" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "malloc" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "memset" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */

Property changes on: gcc/testsuite/gcc.dg/tree-ssa/calloc.c
___
Added: svn:keywords
## -0,0 +1 ##
+Author Date Id Revision URL
\ No newline at end of property
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Index: gcc/tree-ssa-forwprop.c
===
--- gcc/tree-ssa-forwprop.c (revision 208224)
+++ gcc/tree-ssa-forwprop.c (working copy)
@@ -1487,20 +1487,149 @@ constant_pointer_difference (tree p1, tr
 }
 
   for (i = 0; i < cnt[0]; i++)
     for (j = 0; j < cnt[1]; j++)
   if (exps[0][i] == exps[1][j])
return size_binop (MINUS_EXPR, offs[0][i], offs[1][j]);
 
   return NULL_TREE;
 }
 
+/* Optimize
+   ptr = malloc (n);
+   memset (ptr, 0, n);
+   into
+   ptr = calloc (n, 1);
+   gsi_p is known to point to a call to __builtin_memset.  */
+static bool
+simplify_malloc_memset (gimple_stmt_iterator *gsi_p)
+{
+  /* First make sure we have:
+ ptr = malloc (n);
+ memset (ptr, 0, n);  */
+  gimple stmt2 = gsi_stmt (*gsi_p);
+  if (!integer_zerop (gimple_call_arg (stmt2, 1)))
+return false;
+  tree ptr1, ptr2 = gimple_call_arg (stmt2, 0);
+  tree size = gimple_call_arg (stmt2, 2);
+  if (TREE_CODE (ptr2) != SSA_NAME) 
+return false;
+  gimple stmt1 = SSA_NAME_DEF_STMT (ptr2);
+  tree callee1;
+  /* Handle the case where STMT1 is a unary PHI, which happens
+ for instance with:
+ while (!(p = malloc (n))) { ... }
+ memset (p, 0, n);  */
+  if (!stmt1)
+return false;
+  if (gimple_code (stmt1) == GIMPLE_PHI
+      && gimple_phi_num_args (stmt1) == 1)
+{
+  ptr1 = gimple_phi_arg_def (stmt1, 0);
+  if (TREE_CODE (ptr1) != SSA_NAME)
+   return false;
+  stmt1 = SSA_NAME_DEF_STMT (ptr1);
+}
+  else
+ptr1 = ptr2;
+  if (!stmt1
+  || !is_gimple_call (stmt1)
+  || !(callee1 = gimple_call_fndecl (stmt1)))
+return false;
+
+  bool is_calloc;
+  if (DECL_FUNCTION_CODE (callee1) == BUILT_IN_MALLOC)
+{
+  is_calloc = false;
+  if (!operand_equal_p (gimple_call_arg (stmt1, 0), size, 0))
+   return false;
+}
+  else if (DECL_FUNCTION_CODE
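
At the source level, the rewrite this pass aims for can be pictured as follows. This is a sketch, not part of the patch: the function names are hypothetical, and the `calloc (n, 1)` argument order is an assumption, since the part of the diff that builds the new call is cut off above.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Two-step form the pass looks for: allocate N bytes, then clear the
// same N bytes.
void *
alloc_zeroed_two_step (std::size_t n)
{
  void *p = std::malloc (n);
  if (p)
    std::memset (p, 0, n);
  return p;
}

// Single-call form with the same observable result.
void *
alloc_zeroed (std::size_t n)
{
  return std::calloc (n, 1);
}

// Returns true if the first N bytes at P are all zero.
bool
all_zero (const void *p, std::size_t n)
{
  const unsigned char *bytes = static_cast<const unsigned char *> (p);
  for (std::size_t i = 0; i < n; ++i)
    if (bytes[i] != 0)
      return false;
  return true;
}
```

Both allocators hand back zero-filled storage (or a null pointer on failure), which is why the memset can be folded into the allocation when the sizes match.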

Re: [C++ patch] for C++/52369

2014-02-28 Thread Jason Merrill


On 02/28/2014 05:04 PM, Fabien Chêne wrote:

I guess my mistake comes from the fact that %q#D is not present in the
c++98 diagnostic. Shall we homogenize that as well ?
In favor of %q#D ?


OK.

Jason

[jit] Add typechecking to binary ops and comparisons

2014-02-28 Thread David Malcolm

Committed to branch dmalcolm/jit:

gcc/jit/
* libgccjit.c (gcc_jit_context_new_binary_op): Check that the
operands have the same type.
(gcc_jit_context_new_comparison): Likewise.
---
 gcc/jit/ChangeLog.jit |  6 ++
 gcc/jit/libgccjit.c   | 18 ++
 2 files changed, 24 insertions(+)

diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit
index 625e01a..f2fea8c 100644
--- a/gcc/jit/ChangeLog.jit
+++ b/gcc/jit/ChangeLog.jit
@@ -1,5 +1,11 @@
 2014-02-28  David Malcolm  dmalc...@redhat.com
 
+   * libgccjit.c (gcc_jit_context_new_binary_op): Check that the
+   operands have the same type.
+   (gcc_jit_context_new_comparison): Likewise.
+
+2014-02-28  David Malcolm  dmalc...@redhat.com
+
* libgccjit.h (gcc_jit_context_new_cast): New.
* libgccjit.map (gcc_jit_context_new_cast): New.
* libgccjit++.h (gccjit::context::new_cast): New method.
diff --git a/gcc/jit/libgccjit.c b/gcc/jit/libgccjit.c
index 6c078ce..d9f63cf 100644
--- a/gcc/jit/libgccjit.c
+++ b/gcc/jit/libgccjit.c
@@ -752,6 +752,15 @@ gcc_jit_context_new_binary_op (gcc_jit_context *ctxt,
   RETURN_NULL_IF_FAIL (result_type, ctxt, "NULL result_type");
   RETURN_NULL_IF_FAIL (a, ctxt, "NULL a");
   RETURN_NULL_IF_FAIL (b, ctxt, "NULL b");
+  RETURN_NULL_IF_FAIL_PRINTF4 (
+    a->get_type () == b->get_type (),
+    ctxt,
+    "mismatching types for binary op:"
+    " a: %s (type: %s) b: %s (type: %s)",
+    a->get_debug_string (),
+    a->get_type ()->get_debug_string (),
+    b->get_debug_string (),
+    b->get_type ()->get_debug_string ());
 
   return (gcc_jit_rvalue *)ctxt->new_binary_op (loc, op, result_type, a, b);
 }
@@ -766,6 +775,15 @@ gcc_jit_context_new_comparison (gcc_jit_context *ctxt,
   /* op is checked by the inner function.  */
   RETURN_NULL_IF_FAIL (a, ctxt, "NULL a");
   RETURN_NULL_IF_FAIL (b, ctxt, "NULL b");
+  RETURN_NULL_IF_FAIL_PRINTF4 (
+    a->get_type () == b->get_type (),
+    ctxt,
+    "mismatching types for comparison:"
+    " a: %s (type: %s) b: %s (type: %s)",
+    a->get_debug_string (),
+    a->get_type ()->get_debug_string (),
+    b->get_debug_string (),
+    b->get_type ()->get_debug_string ());
 
   return (gcc_jit_rvalue *)ctxt->new_comparison (loc, op, a, b);
 }
-- 
1.7.11.7
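
The effect of the new check can be sketched with stand-in types. `Type`, `Rvalue` and `check_same_type` below are hypothetical and are not libgccjit's classes or API; the sketch only assumes what the patch shows, namely that the two operands' types are compared for identity and that a message naming both operands is produced on mismatch.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the recording classes.
struct Type
{
  std::string name;
};

struct Rvalue
{
  const Type *type;
  std::string debug_string;
};

// Returns true when A and B refer to the same type object.  Otherwise
// stores a description of the mismatch in ERR and returns false.
bool
check_same_type (const Rvalue &a, const Rvalue &b, const char *what,
                 std::string &err)
{
  if (a.type == b.type)
    return true;
  err = std::string ("mismatching types for ") + what + ": a: "
        + a.debug_string + " (type: " + a.type->name + ") b: "
        + b.debug_string + " (type: " + b.type->name + ")";
  return false;
}
```

A comparison of an int operand against a bool operand would be rejected by such a check, which fits the int vs bool problem in conditionals mentioned for jittest earlier in this digest.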

[PATCH, rs6000] Restrict reload use of FLOAT_REGS

2014-02-28 Thread Bill Schmidt

Hi,

We've encountered a rare bug that occurs when attempting to reload for
an unaligned store in DImode.  For an unaligned store, using stfd gets
preference over std since stfd doesn't have an alignment restriction and
therefore the m constraint matches.  However, when there is not a
register available for the REG to be stored, register elimination can
replace the REG with its REQ_EQUIV.  When this is a PLUS, we end up with
an attempt to compute an integer add into a floating-point register, and
things rapidly go downhill.

We had some internal discussion and determined the best way to fix this
is to avoid ever using FLOAT_REGS for a PLUS in
rs6000_preferred_reload_class, similar to what's currently done to avoid
loading constants into FLOAT_REGS.  Uli Weigand pointed out that this
existing test is actually a bit too strong, as rclass could be ALL_REGS
and this prevents us from using GENERAL_REGS in that case.  So I've
relaxed that test to only look for superclasses of FLOAT_REGS.  (If you
feel this is too risky, I can avoid that change.)

The patch below fixes the one case where we've observed this bug in the
wild (it occurred for a particular snapshot of code for an internal
build that doesn't match any public branch).  Because it's dependent on
register spill, it is very difficult to try to produce a test case that
isn't too fragile, so I haven't tried to add one.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu
(--with-cpu=power8) and powerpc64-unknown-linux-gnu (--with-cpu=power7)
with no regressions.  Is this ok for trunk?

Thanks,
Bill


2014-02-28  Bill Schmidt  wschm...@linux.vnet.ibm.com

* config/rs6000/rs6000.c (rs6000_preferred_reload_class): Disallow
PLUS rtx's from reloading into a superset of FLOAT_REGS; relax
constraint on constants to only prevent them from being reloaded
into a superset of FLOAT_REGS.


Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 208207)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -16751,7 +16751,8 @@ rs6000_preferred_reload_class (rtx x, enum reg_cla
easy_vector_constant (x, mode))
 return ALTIVEC_REGS;
 
-  if (CONSTANT_P (x) && reg_classes_intersect_p (rclass, FLOAT_REGS))
+  if ((CONSTANT_P (x) || GET_CODE (x) == PLUS)
+      && reg_class_subset_p (FLOAT_REGS, rclass))
 return NO_REGS;
 
   if (GET_MODE_CLASS (mode) == MODE_INT && rclass == NON_SPECIAL_REGS)
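
The difference between the two predicates in the hunk above can be illustrated with register classes modelled as bit sets. This is a toy model under stated assumptions: the sets, their sizes and the helper names are hypothetical and do not correspond to the rs6000 register file. It relies only on the documented meaning of the predicates, i.e. that reg_class_subset_p (c1, c2) holds when c1 is contained in c2.

```cpp
#include <cassert>

// Toy register classes, one bit per register.
typedef unsigned RegSet;

const RegSet GENERAL_SET = 0x0Fu;               // four "integer" registers
const RegSet FLOAT_SET = 0xF0u;                 // four "float" registers
const RegSet ALL_SET = GENERAL_SET | FLOAT_SET; // everything
const RegSet MIXED_SET = GENERAL_SET | 0x10u;   // integers plus one float

// True if A and B share at least one register.
bool classes_intersect (RegSet a, RegSet b) { return (a & b) != 0; }

// True if every register of A is also in B.
bool class_subset (RegSet a, RegSet b) { return (a & ~b) == 0; }

// The test being removed: RCLASS overlaps the float registers.
bool old_test (RegSet rclass) { return classes_intersect (rclass, FLOAT_SET); }

// The test being added: RCLASS contains all of the float registers.
bool new_test (RegSet rclass) { return class_subset (FLOAT_SET, rclass); }
```

In this model a class that merely overlaps the float registers (MIXED_SET) trips the old test but not the new one, while a class containing all of them (ALL_SET) trips both.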

[PATCH, rs6000] Document reserved use of wc constraint

2014-02-28 Thread Bill Schmidt

Hi,

Hal Finkel requested that we define a constraint for representing
individual CR bits.  We agreed to reserve "wc" for this purpose to
maintain compatibility with LLVM.  This patch documents that use.

A pro-forma regstrap is in progress.  Assuming no problems, is this ok
for trunk?

Thanks,
Bill


2014-02-28  Bill Schmidt  wschm...@linux.vnet.ibm.com

* config/rs6000/constraints.md: Document reserved use of wc.


Index: gcc/config/rs6000/constraints.md
===
--- gcc/config/rs6000/constraints.md(revision 208237)
+++ gcc/config/rs6000/constraints.md(working copy)
@@ -56,6 +56,9 @@
 (define_register_constraint "wa" "rs6000_constraints[RS6000_CONSTRAINT_wa]"
   "Any VSX register if the -mvsx option was used or NO_REGS.")
 
+;; NOTE: For compatibility, wc is reserved to represent individual CR bits.
+;; It is currently used for that purpose in LLVM.
+
 (define_register_constraint "wd" "rs6000_constraints[RS6000_CONSTRAINT_wd]"
   "VSX vector register to hold vector double data or NO_REGS.")

[PATCH, LIBITM] Backport libitm bug fixes to FSF 4.8

2014-02-28 Thread Peter Bergner

I'd like to ask for permission to backport the following two LIBITM bug
fixes to the FSF 4.8 branch.  Although these are not technically fixing
regressions, they do fix the libitm.c/reentrant.c testsuite failure on
s390 and powerpc (or at least it will when we finally get our power8
code backported to FSF 4.8).  It also fixes a real bug on x86 that is
latent because we don't currently have a test case that warms up the
x86's RTM hardware enough such that its xbegin succeeds exposing the
bug.  I'd like this backport so that the 4.8 based distros won't need
to carry this as an add-on patch.

It should also be fairly safe as well, since the fixed code is limited
to the arches (x86, s390 and powerpc) that define USE_HTM_FASTPATH,
so all others definitely won't see a difference.

I'll note I CC'd some of the usual suspects interested in TM as well
as the normal RMs, because LIBITM doesn't seem to have a maintainer
or reviewer listed in the MAINTAINERS file.  Is that an oversight or???

Peter

Backport from mainline
2013-06-20  Torvald Riegel  trie...@redhat.com

* query.cc (_ITM_inTransaction): Abort when using the HTM fastpath.
(_ITM_getTransactionId): Same.
* config/x86/target.h (htm_transaction_active): New.

2013-06-20  Torvald Riegel  trie...@redhat.com

PR libitm/57643
* beginend.cc (gtm_thread::begin_transaction): Handle reentrancy in
the HTM fastpath.

Index: libitm/beginend.cc
===
--- libitm/beginend.cc  (revision 208151)
+++ libitm/beginend.cc  (working copy)
@@ -197,6 +197,8 @@
  // We are executing a transaction now.
  // Monitor the writer flag in the serial-mode lock, and abort
  // if there is an active or waiting serial-mode transaction.
+ // Note that this can also happen due to an enclosing
+ // serial-mode transaction; we handle this case below.
  if (unlikely(serial_lock.is_write_locked()))
htm_abort();
  else
@@ -219,6 +221,14 @@
  tx = new gtm_thread();
  set_gtm_thr(tx);
}
+ // Check whether there is an enclosing serial-mode transaction;
+ // if so, we just continue as a nested transaction and don't
+ // try to use the HTM fastpath.  This case can happen when an
+ // outermost relaxed transaction calls unsafe code that starts
+ // a transaction.
+ if (tx->nesting > 0)
+   break;
+ // Another thread is running a serial-mode transaction.  Wait.
  serial_lock.read_lock(tx);
  serial_lock.read_unlock(tx);
  // TODO We should probably reset the retry count t here, unless
Index: libitm/config/x86/target.h
===
--- libitm/config/x86/target.h  (revision 208151)
+++ libitm/config/x86/target.h  (working copy)
@@ -125,6 +125,13 @@
 {
   return begin_ret & _XABORT_RETRY;
 }
+
+/* Returns true iff a hardware transaction is currently being executed.  */
+static inline bool
+htm_transaction_active ()
+{
+  return _xtest() != 0;
+}
 #endif


Index: libitm/query.cc
===
--- libitm/query.cc (revision 208151)
+++ libitm/query.cc (working copy)
@@ -43,6 +43,15 @@
 _ITM_howExecuting ITM_REGPARM
 _ITM_inTransaction (void)
 {
+#if defined(USE_HTM_FASTPATH)
+  // If we use the HTM fastpath, we cannot reliably detect whether we are
+  // in a transaction because this function can be called outside of
+  // a transaction and thus we can't deduce this by looking at just the serial
+  // lock.  This function isn't used in practice currently, so the easiest
+  // way to handle it is to just abort.
+  if (htm_fastpath && htm_transaction_active())
+htm_abort();
+#endif
   struct gtm_thread *tx = gtm_thr();
   if (tx && (tx->nesting > 0))
 {
@@ -58,6 +67,11 @@
 _ITM_transactionId_t ITM_REGPARM
 _ITM_getTransactionId (void)
 {
+#if defined(USE_HTM_FASTPATH)
+  // See ITM_inTransaction.
+  if (htm_fastpath && htm_transaction_active())
+htm_abort();
+#endif
   struct gtm_thread *tx = gtm_thr();
   return (tx && (tx->nesting > 0)) ? tx->id : _ITM_noTransactionId;
 }

Re: RFA: ipa-devirt PATCH for c++/58678 (devirt causes KDE build failure)

2014-02-28 Thread Jason Merrill

I went ahead and checked in my patch so that the regression is fixed 
over the weekend.


Jason

Re: [PATCH, rs6000] Document reserved use of wc constraint

2014-02-28 Thread David Edelsohn

On Fri, Feb 28, 2014 at 7:23 PM, Bill Schmidt
<wschm...@linux.vnet.ibm.com> wrote:
 Hi,

 Hal Finkel requested that we define a constraint for representing
 individual CR bits.  We agreed to reserve wc for this purpose to
 maintain compatibility with LLVM.  This patch documents that use.

 A pro-forma regstrap is in progress.  Assuming no problems, is this ok
 for trunk?

You're not going to implement the new register class?

Okay

Thanks, David

Re: [PATCH, rs6000] Restrict reload use of FLOAT_REGS

2014-02-28 Thread David Edelsohn

On Fri, Feb 28, 2014 at 7:11 PM, Bill Schmidt
<wschm...@linux.vnet.ibm.com> wrote:
 Hi,

 We've encountered a rare bug that occurs when attempting to reload for
 an unaligned store in DImode.  For an unaligned store, using stfd gets
 preference over std since stfd doesn't have an alignment restriction and
 therefore the "m" constraint matches.  However, when there is not a
 register available for the REG to be stored, register elimination can
 replace the REG with its REG_EQUIV.  When this is a PLUS, we end up with
 an attempt to compute an integer add into a floating-point register, and
 things rapidly go downhill.

 We had some internal discussion and determined the best way to fix this
 is to avoid ever using FLOAT_REGS for a PLUS in
 rs6000_preferred_reload_class, similar to what's currently done to avoid
 loading constants into FLOAT_REGS.  Uli Weigand pointed out that this
 existing test is actually a bit too strong, as rclass could be ALL_REGS
 and this prevents us from using GENERAL_REGS in that case.  So I've
 relaxed that test to only look for superclasses of FLOAT_REGS.  (If you
 feel this is too risky, I can avoid that change.)

 The patch below fixes the one case where we've observed this bug in the
 wild (it occurred for a particular snapshot of code for an internal
 build that doesn't match any public branch).  Because it's dependent on
 register spill, it is very difficult to try to produce a test case that
 isn't too fragile, so I haven't tried to add one.

 Bootstrapped and tested on powerpc64le-unknown-linux-gnu
 (--with-cpu=power8) and powerpc64-unknown-linux-gnu (--with-cpu=power7)
 with no regressions.  Is this ok for trunk?

 Thanks,
 Bill


 2014-02-28  Bill Schmidt  <wschm...@linux.vnet.ibm.com>

 * config/rs6000/rs6000.c (rs6000_preferred_reload_class): Disallow
 PLUS rtx's from reloading into a superset of FLOAT_REGS; relax
 constraint on constants to only prevent them from being reloaded
 into a superset of FLOAT_REGS.

This is okay with me. Uli is the best one to comment if this is the right test.

Thanks, David
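
The class-selection rule described in the quoted message can be sketched as follows. This is a simplified model under stated assumptions, not the code in rs6000.c: register classes are modeled as bitmasks, `Kind`, `is_subset` and `preferred_reload_class` are hypothetical names, and falling back to GENERAL_REGS when it is contained in rclass is an assumption drawn from the remark about ALL_REGS.

```cpp
#include <cassert>
#include <cstdint>

// Register classes modeled as sets of register numbers (illustrative values).
using RegSet = std::uint32_t;
constexpr RegSet NO_REGS = 0x00;
constexpr RegSet GENERAL_REGS = 0x0F;
constexpr RegSet FLOAT_REGS = 0xF0;
constexpr RegSet ALL_REGS = GENERAL_REGS | FLOAT_REGS;

// Shape of the value being reloaded; only the cases discussed are modeled.
enum class Kind { Reg, Plus, Constant };

// True when every register in `inner` is also in `outer`.
inline bool is_subset(RegSet inner, RegSet outer) {
  return (inner & ~outer) == 0;
}

// For a PLUS or a constant, refuse any class that contains all of
// FLOAT_REGS. If that class also contains GENERAL_REGS (e.g. ALL_REGS),
// narrow to GENERAL_REGS; otherwise report NO_REGS.
inline RegSet preferred_reload_class(Kind x, RegSet rclass) {
  const bool avoid_float = (x == Kind::Plus || x == Kind::Constant);
  if (avoid_float && is_subset(FLOAT_REGS, rclass))
    return is_subset(GENERAL_REGS, rclass) ? GENERAL_REGS : NO_REGS;
  return rclass;
}
```

In this model a PLUS offered ALL_REGS is steered to GENERAL_REGS instead of being rejected outright, which is the relaxation the message attributes to Uli Weigand's observation.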

Re: [PATCH i386 14/8] [AVX-512] Fix exp2 and sqrt tests.

2014-02-28 Thread Kirill Yukhin

Hello Uroš,
On 28 Feb 13:55, Uros Bizjak wrote:
 On Fri, Feb 28, 2014 at 1:14 PM, Kirill Yukhin <kirill.yuk...@gmail.com> wrote:
  Hello,
  This is a relatively obvious patch which eliminates comparison
  of infinities for the exp2 AVX-512 test and properly compares floats
  for avx512f-sqrtps-2.c.
 
  Tests pass.
 
  Is it ok for trunk?
 
  gcc/testsuite/
  * gcc.target/i386/avx512er-vexp2ps-2.c: Decrease exponent
  argument to avoid inf values.
  * gcc.target/i386/avx512er-vexp2ps-2.c: Compare results with
  UNION_FP_CHECK machinery.
 
 You are talking about avx512f-sqrtps-2.c, the ChangeLog refers to
 avx512er-vexp2ps-2.c, but the patch is modifying avx512f-vdivps-2.c.
Sorry for the mess.
The broken test was avx512f-vdivps-2.c.

Updated testsuite/ChangeLog:
* gcc.target/i386/avx512er-vexp2ps-2.c: Decrease exponent
argument to avoid inf values.
* gcc.target/i386/avx512f-vdivps-2.c: Compare results with
UNION_FP_CHECK machinery.

--
Thanks, K
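
One reason a test may want to avoid infinite results when comparing with a tolerance is that inf - inf is NaN, so a difference-based check cannot succeed on them. The sketch below illustrates that point only; `nearly_equal` is a hypothetical helper and is not the testsuite's UNION_FP_CHECK macro, whose definition is not shown in this thread.

```cpp
#include <cassert>
#include <cmath>

// Relative-tolerance comparison of a computed value against a reference.
// Non-finite inputs are rejected up front: for two infinities the
// difference is NaN, and every ordered comparison with NaN is false.
inline bool nearly_equal(float got, float want, float rel_tol) {
  if (!std::isfinite(got) || !std::isfinite(want))
    return false;
  return std::fabs(got - want) <= rel_tol * std::fabs(want);
}
```

With this helper, keeping the exponent argument small enough that exp2 stays finite lets every element be checked against its reference, rather than failing on the first overflowed lane.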
